[05:17:31] (CR) Kevin Bazira: [C:+1] "thank you for fixing this: https://phabricator.wikimedia.org/P74906" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[05:43:19] (CR) Kevin Bazira: "I've tested this locally: https://phabricator.wikimedia.org/P74907" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1135721 (https://phabricator.wikimedia.org/T388805) (owner: Ilias Sarantopoulos)
[06:45:45] good morning folks!
[06:53:39] Good morning
[08:00:41] (PS5) Ilias Sarantopoulos: articlequality: add async requests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1135721 (https://phabricator.wikimedia.org/T388805)
[08:01:11] (CR) Ilias Sarantopoulos: articlequality: add async requests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1135721 (https://phabricator.wikimedia.org/T388805) (owner: Ilias Sarantopoulos)
[08:03:38] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10737332 (OKarakaya-WMF) I've tried catboost with non-normalized features. Pearson (linear) correlation: 0.838 Best score from ridge regression was: 0.8...
[08:09:59] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10737353 (isarantopoulos) I just added you to the project and to the [[ https://gitlab.wikimedia.org/groups/repos/machine-learning/-/group_members | machin...
[08:43:40] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10737495 (OKarakaya-WMF) Thank you @isarantopoulos!
I keep my notebooks here: https://gitlab.wikimedia.org/repos/machine-learning/exploratory-notebook/-/...
[09:22:45] (CR) Kevin Bazira: [C:+1] articlequality: add async requests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1135721 (https://phabricator.wikimedia.org/T388805) (owner: Ilias Sarantopoulos)
[09:34:56] Lift-Wing, Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10737743 (kevinbazira) Following the `aya-expanse-32b` inference speeds in T385173#10729616, we wanted to understand more about how this isvc would perform with different input and output token...
[09:36:07] o/ here are the inference benchmarking results of `aya-expanse-32b` hosted in the ROCm vLLM image using ml-lab GPUs:
[09:36:07] https://phabricator.wikimedia.org/T385173#10737743
[09:36:07] these were generated using AMD's benchmarking tools from: https://github.com/ROCm/MAD
[10:06:54] this looks great Kevin! I'll take a look a bit later and add comments (if any!)
[10:15:31] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10737911 (OKarakaya-WMF) I've trained two regression models (ridge, catboost) and calculated evaluation metrics: Ridge regression: Pearson (linear) correl...
[10:57:57] isaranto: o/ is it ok if I merge the shap value's patch? https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1133426
[11:11:52] (CR) Ilias Sarantopoulos: "Thank you for the work and the changes and sorry for delaying you." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[11:12:38] aiko: sorry for delaying you on this. could you add a unit test for the postprocess function?
then we can merge it
[11:13:55] we can also do it in parallel: merge and run load tests and in parallel create 1-2 unit tests. I leave it up to you to decide!
[11:26:49] np!
[12:05:40] (CR) Ilias Sarantopoulos: [C:+2] articlequality: add async requests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1135721 (https://phabricator.wikimedia.org/T388805) (owner: Ilias Sarantopoulos)
[12:06:27] (Merged) jenkins-bot: articlequality: add async requests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1135721 (https://phabricator.wikimedia.org/T388805) (owner: Ilias Sarantopoulos)
[12:21:48] Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise - Content Integrity, Patch-For-Review: Load test the language agnostic article-quality model - https://phabricator.wikimedia.org/T388805#10738369 (Isaac) Glad to see the latency dropping! One thought: I suspect if we further instrumented the...
[13:31:10] Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise - Content Integrity: Load test the language agnostic article-quality model - https://phabricator.wikimedia.org/T388805#10738880 (isarantopoulos) After deploying to ml-staging I reran the previous tests (the same test on ml-staging as the first 2...
[13:31:37] --^ got some great results load testing articlequality in ml-staging
[13:40:43] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10738922 (OKarakaya-WMF) I see catboost is also better as a classification task on the same test set. I need to check some steps and then I'll share my tho...
[14:05:14] Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise - Content Integrity: Load test the language agnostic article-quality model - https://phabricator.wikimedia.org/T388805#10739033 (isarantopoulos) @Isaac you're totally right on this.
Although adding async requests is an improvement the reported te...
[15:30:37] Machine-Learning-Team, Data-Engineering, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10739547 (Ottomata) Nice!!! https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#EventSt...
[15:32:45] * isaranto afk
[17:08:46] FIRING: LiftWingServiceErrorRate: ...
[17:08:52] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[17:48:44] RESOLVED: LiftWingServiceErrorRate: ...
[17:48:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate