[04:57:57] (CR) Santhosh: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508) (owner: Santhosh)
[05:07:35] (PS2) Santhosh: Consider special language codes while checking for article existence [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508)
[05:08:15] (CR) CI reject: [V:-1] Consider special language codes while checking for article existence [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508) (owner: Santhosh)
[05:19:28] (PS3) Santhosh: Consider special language codes while checking for article existence [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508)
[05:21:48] (CR) Santhosh: Consider special language codes while checking for article existence (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508) (owner: Santhosh)
[07:25:51] (PS1) Kevin Bazira: RRLA: send prediction results to output event stream [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1129755 (https://phabricator.wikimedia.org/T326179)
[07:35:27] (CR) Thiemo Kreuz (WMDE): [C:+2] build: Update MediaWiki requirement to 1.44 [extensions/ORES] - https://gerrit.wikimedia.org/r/1129487 (owner: Jforrester)
[07:46:03] hello folks!
[07:46:14] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1129762 to move ml-serve2002 to containerd
[08:04:35] howdy!
[08:13:54] (CR) Ilias Sarantopoulos: [C:+1] "Thanks for updating this Aiko. LGTM!"
[machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113755 (owner: AikoChou)
[08:32:48] (Merged) jenkins-bot: build: Update MediaWiki requirement to 1.44 [extensions/ORES] - https://gerrit.wikimedia.org/r/1129487 (owner: Jforrester)
[08:52:40] reimaging 2002 now
[08:56:51] (CR) Ilias Sarantopoulos: "Thanks for the work on this. I'd like to suggest the following:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1129755 (https://phabricator.wikimedia.org/T326179) (owner: Kevin Bazira)
[08:57:54] kevinbazira: o/ I added a comment about publishing events. lemme know what you think, happy to chat about it more!
[09:04:16] (PS1) Ilias Sarantopoulos: locust: change time between requests for edit-check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1129773 (https://phabricator.wikimedia.org/T388817)
[09:05:52] georgekyz: o/ Using the patch above, could you run an additional load test for edit-check for {100, 150, 200} users for 5 minutes each?
[09:06:27] I'd like to see what happens when we go close to 100 rps
[09:07:05] actually, also include 50 users because of the difference between requests, so we have {50, 100, 150, 200} users for 5 minutes each (300 s)
[09:07:30] lemme know if you want any help
[09:21:00] I'm on it
[09:25:04] Bedankt! ("Thanks!")
[09:53:32] Graag gedaan ("You're welcome")
[10:03:36] that is next level Dutch for me :P
[10:06:05] hahaha for me as well! it is a very formal version of "my pleasure" 🤣
[10:09:23] Lift-Wing, Machine-Learning-Team, EditCheck, Patch-For-Review: Load test the peacock edit check service - https://phabricator.wikimedia.org/T388817#10656297 (gkyziridis) **Multiple Locust tests edit-check on GPU** Test specifications: ` wait_time = between(0.0, 0.1) # random number betw...
[10:09:35] locust tests ~~~^^^
[10:17:50] thanks! so that puts some stress on the service.
I'm wondering what the cutoff point in rps/users is, above which latency starts to increase
[10:20:23] we should look into setting up a load test that runs for different user counts so that we can run it in one go (instead of running it for 50 users, collecting results, then running for 100, etc.)
[10:20:37] this could be done using a LoadTestShape https://docs.locust.io/en/stable/custom-load-shape.html
[10:21:43] I'm not sure if the output is broken down per stage by default or if it only gives the aggregate stats though
[10:22:31] Machine-Learning-Team, Patch-For-Review: Migrate all Lift Wing k8s workers to Bookworm and containerd - https://phabricator.wikimedia.org/T387854#10656332 (elukey) Got this for ml-serve2002: ` UEFI0339: The Dual Inline Memory Module (DIMM) in the memory slot B2 is disabled because of initialization erro...
[10:22:46] (CR) Kevin Bazira: "Thanks for the suggestion, Ilias." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1129755 (https://phabricator.wikimedia.org/T326179) (owner: Kevin Bazira)
[10:23:51] isaranto: I will have a look at that one
[10:23:51] georgekyz: I see 3 replicas for edit-check in experimental. The service scaled horizontally due to the increased traffic. I think we should set minreplicas to 1 for now to see what 1 pod can do.
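The staged schedule proposed above ({50, 100, 150, 200} users for 300 s each) maps naturally onto the linked LoadTestShape feature. A minimal sketch of the scheduling logic follows; in locust this would be the body of a `LoadTestShape` subclass's `tick()` method, which locust polls about once per second and which stops the run when it returns `None`. It is written here as a plain function so the schedule is easy to verify; the spawn rate of 10 users/s is an arbitrary assumption, not something from the discussion.

```python
# Sketch of a staged load schedule in the spirit of locust's LoadTestShape
# (https://docs.locust.io/en/stable/custom-load-shape.html). In a real
# locustfile this logic would live in `class StagesShape(LoadTestShape)`
# as the `tick()` method; returning None ends the test.

STAGES = [
    # (stage end time in seconds since test start, target user count)
    (300, 50),
    (600, 100),
    (900, 150),
    (1200, 200),
]
SPAWN_RATE = 10  # users started per second; arbitrary assumption

def tick(run_time):
    """Return (user_count, spawn_rate) for the current run_time, or None to stop."""
    for end_time, users in STAGES:
        if run_time < end_time:
            return (users, SPAWN_RATE)
    return None  # all stages done: locust would stop the run here
```

One caveat from the discussion still applies: locust's default summary aggregates the whole run, so per-stage numbers may need to be pulled from the CSV/timeline output rather than the final table.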
[10:25:12] I'm changing it now on the fly in the experimental ns; we should also add the change in deployment-charts
[10:25:33] Machine-Learning-Team, DC-Ops, ops-codfw: DIMM B1 issues for ml-serve2002 - https://phabricator.wikimedia.org/T389472 (elukey) NEW
[10:25:58] I meant maxreplicas = 1
[10:26:03] Machine-Learning-Team, DC-Ops, ops-codfw: DIMM B1 issues for ml-serve2002 - https://phabricator.wikimedia.org/T389472#10656369 (elukey) The host is completely depooled, please take any action that you need to do :)
[10:26:35] isaranto: o/ I responded to your comment: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1129755/comments/c0290bd0_9849a062
[10:28:02] ml-serve2002 needs to stay down due to T389472, sigh
[10:28:03] Machine-Learning-Team, Patch-For-Review: Migrate all Lift Wing k8s workers to Bookworm and containerd - https://phabricator.wikimedia.org/T387854#10656373 (elukey)
[10:29:29] isaranto: Did you change it already in the isvc?
[10:30:10] doing it now
[10:32:30] georgekyz: done
[10:32:46] I see 1 pod now
[10:36:13] are you running the locust tests now?
[10:36:28] (CR) Ilias Sarantopoulos: "Got it! So only the changeprop requests have some additional latency. Let's keep in mind the background tasks for other cases then!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1129755 (https://phabricator.wikimedia.org/T326179) (owner: Kevin Bazira)
[10:36:54] no, go ahead, sorry for interfering!
[10:38:05] I was just curious about pod resource utilization during the tests, so I was checking grafana and saw 3 pods
[10:58:29] Do we have any examples using LoadTestShape in our locust tests? Just for reference
[10:59:23] no, we haven't used it yet
[12:01:43] * isaranto afk lunch!
[12:27:44] FIRING: LiftWingServiceErrorRate: ...
[12:27:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revision-models&var-backend=reference-need-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:58:38] Machine-Learning-Team, ORES, MediaWiki-Core-Tests, Testing Support, and 3 others: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939#10656932 (zeljkofilipin)
[13:19:12] (CR) Sbisson: [C:+2] Consider special language codes while checking for article existence [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508) (owner: Santhosh)
[13:19:56] (Merged) jenkins-bot: Consider special language codes while checking for article existence [research/recommendation-api] - https://gerrit.wikimedia.org/r/1129205 (https://phabricator.wikimedia.org/T306508) (owner: Santhosh)
[14:29:44] Locust tests for edit-check on a single pod are crashing the pod
[15:03:48] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: DIMM B1 issues for ml-serve2002 - https://phabricator.wikimedia.org/T389472#10657360 (Jhancock.wm) okay, since this has happened before I pulled DIMM_B1 to see if it would boot without it. Got the same error on DIMM_B2. Moved it to DIMM_B1. Error move...
[15:04:09] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: DIMM B1 issues for ml-serve2002 - https://phabricator.wikimedia.org/T389472#10657361 (Jhancock.wm) a:Jhancock.wm
[17:12:36] * isaranto afk!
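The replica cap discussed earlier (maxreplicas = 1, applied on the fly in the experimental namespace and still to be added to deployment-charts) would look roughly like the following on the KServe side. The `minReplicas`/`maxReplicas` fields are from the KServe `InferenceService` predictor spec, but the service name and the exact layout inside the deployment-charts templates are assumptions for illustration, not the actual chart.

```yaml
# Hypothetical sketch: capping the edit-check service at one pod so load
# tests measure a single replica. Autoscaling bounds sit on the predictor
# spec in KServe; how deployment-charts templates render this may differ.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: edit-check        # assumed name, for illustration only
  namespace: experimental
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 1        # prevent horizontal scaling during the load test
```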
[18:07:56] Machine-Learning-Team, ORES, MediaWiki-Core-Tests, Testing Support, and 2 others: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939#10658603 (zeljkofilipin)
[18:08:26] (PS4) AikoChou: locust: add util for fetching recent change revisions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113755
[18:10:25] (CR) AikoChou: "Instructions added!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113755 (owner: AikoChou)
[18:10:43] (CR) AikoChou: [C:+2] "Thanks for the review!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113755 (owner: AikoChou)
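A utility like the one in the merged patch above ("fetching recent change revisions") would typically go through the MediaWiki Action API's `list=recentchanges` module. This is a hedged sketch of that approach, not the patch itself: the function names are invented for illustration, and request building and response parsing are split out so they can be checked without network access.

```python
# Hypothetical sketch of fetching recent-change revision IDs from the
# MediaWiki Action API (list=recentchanges). build_params() and
# extract_rev_ids() are illustrative names, not taken from the patch.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_params(limit=50):
    """Query parameters for the recentchanges module of the Action API."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "ids",       # include revid/old_revid in each entry
        "rclimit": str(limit),
        "format": "json",
    }

def extract_rev_ids(response):
    """Pull revision IDs out of a decoded recentchanges response."""
    changes = response.get("query", {}).get("recentchanges", [])
    # log-only entries carry revid 0; skip those
    return [rc["revid"] for rc in changes if rc.get("revid")]

def fetch_recent_rev_ids(api_url="https://en.wikipedia.org/w/api.php", limit=50):
    """Fetch up to `limit` recent revision IDs (requires network access)."""
    with urlopen(api_url + "?" + urlencode(build_params(limit))) as resp:
        return extract_rev_ids(json.load(resp))
```

Splitting parsing from I/O also makes the helper easy to reuse from a locust task, which is presumably why the patch adds it as a standalone util.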