[05:15:01] Machine-Learning-Team, MediaWiki-extensions-ORES, Data-Persistence, MediaWiki-Recent-changes, and 2 others: Summary table for recent ORES scores - https://phabricator.wikimedia.org/T403003 (tstarling) NEW
[06:57:12] Machine-Learning-Team, Essential-Work: Upgrade revscoring model servers from debian bullseye to bookworm - https://phabricator.wikimedia.org/T400350#11122373 (kevinbazira) Following the EOL concerns with python3.9 discussed in T400350#11120693, I have worked on the updated revscoring model-server blubber...
[06:58:36] good morning!
[07:18:21] good morning
[07:48:20] kevinbazira: regarding the comments ^^ upgrading to python 3.10 sounds like a good middle ground solution to allow us to move forward with the upgrade!
[07:49:48] isaranto: o/ thank you for the feedback. I am going to move forward with the python3.10 approach.
[07:52:38] I'm curious what the .sh script has in it, other than that sgtm!
[07:54:30] here is the .sh script: https://phabricator.wikimedia.org/P81852#328373
[08:25:35] thanks!
[08:33:27] morning!
[08:33:42] (PS1) Kevin Bazira: revscoring: upgrade model-server from bullseye to bookworm [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350)
[08:34:54] (CR) CI reject: [V:-1] revscoring: upgrade model-server from bullseye to bookworm [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[08:37:12] (PS2) Kevin Bazira: revscoring: upgrade model-server from bullseye to bookworm [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350)
[08:37:40] (CR) CI reject: [V:-1] revscoring: upgrade model-server from bullseye to bookworm [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[08:50:50] (PS3) Kevin Bazira: revscoring: upgrade model-server from bullseye to bookworm [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350)
[08:53:22] a patch that upgrades the revscoring model-server to bookworm is ready for review: https://gerrit.wikimedia.org/r/1182495
[08:53:22] when anyone has a minute, please have a look. thanks!
[09:25:31] Machine-Learning-Team, MediaWiki-extensions-ORES, Data-Persistence, MediaWiki-Recent-changes, and 2 others: Summary table for recent ORES scores - https://phabricator.wikimedia.org/T403003#11122566 (tstarling)
[09:27:09] (CR) Ozge: [C:+1] "Looks great! thank you Kevin." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[09:36:40] (CR) Kevin Bazira: [C:+2] "yep, the goal was to avoid dependency hell. thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[09:37:09] (Merged) jenkins-bot: revscoring: upgrade model-server from bullseye to bookworm [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1182495 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[10:10:07] thanks for the review Özge!
[10:10:07] I have pushed a patch to test the new image on revscoring isvcs in staging: https://gerrit.wikimedia.org/r/1182506
[10:15:11] 🙌
[10:55:55] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, PageTriage: ORES (and by extension PageTriage) cannot use newer LiftWing models - https://phabricator.wikimedia.org/T403029#11123128 (Samwalton9-WMF)
[11:21:46] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05), Editing-team (Tracking): Build model training pipeline for tone check using WMF ML Airflow instance - https://phabricator.wikimedia.org/T396495#11123240 (gkyziridis) Thank you for building the PVC and copying the files on it in order...
[11:33:11] klausman: o/ as FYI I am going to reimage ml-serve101[2,3] in a bit with Bookworm, I got the green light from Moritz to use the bookworm-backports kernel
[11:33:29] the caveat is that it should be used only for the transition period, but it is good to start experimenting with the new hosts
[11:59:00] aye, sounds good
[12:20:16] Machine-Learning-Team, Essential-Work: Enable alerts for outdated admin_ng charts for ml-team - https://phabricator.wikimedia.org/T403047 (klausman) NEW
[12:45:54] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11123623 (OKarakaya-WMF) airflow dag mr for staging release: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1...
[12:50:51] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11123691 (OKarakaya-WMF) staging release airflow dag tested on dev with three wikis and it works well. {F65920987}
[12:54:18] hello, I've two MRs for release dags https://gitlab.wikimedia.org/repos/machine-learning/ml-pipelines/-/merge_requests/26 https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1638/diffs can you take a look when you have time? tested on airflow dev with three wikis. I've shared more information on the MRs and the comments https://phabricator.wikimedia.org/T398950#11123623
[12:54:19] https://phabricator.wikimedia.org/T398950#11123691 thank you @kevinbazira
[12:56:34] I'll continue with the inference service changes.
[13:00:59] ack... taking a look!
[13:01:14] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11123737 (OKarakaya-WMF)
[13:04:06] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11123752 (OKarakaya-WMF)
[13:12:38] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11123815 (OKarakaya-WMF) Hey @brouberol I don't have access to yarn logs. Is it expected? ` ozge@stat1010:~$ yarn logs -appOwner analytics-m...
[13:20:44] FIRING: LiftWingServiceErrorRate: ...
[13:20:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=hewiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:23:25] looking ---^
[13:24:10] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11123895 (brouberol) Have you tried running `kerberos-run-command analytics-ml yarn logs -appOwner analytics-ml -applicationId application_17549...
[13:25:44] RESOLVED: LiftWingServiceErrorRate: ...
[13:25:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=hewiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:27:41] klausman: I was able to make everything work on ml-serve1012, the caveat is that I had to pull firmware-amd-graphics as well from bookworm-backports
[13:33:51] @kevinbazira I've not checked if it's related but it reminds me of this one: https://phabricator.wikimedia.org/T401109#11068784
[13:34:39] elukey: yeah, that's not super surprising. Since the fw packages don't really affect anything besides kernel/initrd, it should be fine
[13:38:52] ozge_: ack. thanks for sharing
[13:38:53] logs are showing a couple of errors between 1300 and 1330 hrs GMT: https://logstash.wikimedia.org/goto/b07d169d43c454bc99fcdbf3bf68a01f
[13:42:02] the errors are mainly occurring when the isvc is fetching feature values from the MW API
[14:02:24] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11124122 (OKarakaya-WMF) oh thanks :)
[14:08:23] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#11124157 (elukey) Note for the future - we decided to use bookworm for these nodes, forcing the install of a backported kernel to get more...
[14:09:34] isaranto, georgekyz o/ lemme know when you have a moment to discuss https://phabricator.wikimedia.org/T390706
[14:26:51] elukey: Hey, we can discuss it tomorrow if you want
[14:30:25] georgekyz: sure! It is fine also to follow up in the task, my understanding is that the SLO target is a bit too tight and we may need to review it, and/or to check why the latency is not as expected..
[14:31:10] elukey: alright let's sync tomorrow then
[14:38:57] Machine-Learning-Team, Data-Platform-SRE (2025.08.16 - 2025.09.05): Create an analytics service user for the ML team - https://phabricator.wikimedia.org/T400902#11124365 (brouberol) No problem! I also would like to emphasize that the canonical way to get yarn logs when your airflow instance runs in Kub...
[15:30:02] ack on ^^
[15:30:35] the issue is with the latency SLO so we'll need to revisit and investigate any issues with it
[15:57:06] Machine-Learning-Team, Essential-Work: Investigate revscoring-editquality-damaging alert triggered by MW API fetch errors - https://phabricator.wikimedia.org/T403088 (kevinbazira) NEW
[16:14:03] Machine-Learning-Team, Essential-Work: Investigate revscoring-editquality-damaging alert triggered by MW API fetch errors - https://phabricator.wikimedia.org/T403088#11125140 (kevinbazira) One thing that stands out from the [logs](https://phabricator.wikimedia.org/P81904) is that there are multiple concu...
[17:48:50] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Planning), Epic: Expand language coverage for Tone Check - https://phabricator.wikimedia.org/T394448#11125635 (ppelberg)
[17:49:51] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Planning), Epic: Expand language coverage for Tone Check - https://phabricator.wikimedia.org/T394448#11125647 (ppelberg)
[17:50:29] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Planning), Epic: Expand language coverage for Tone Check - https://phabricator.wikimedia.org/T394448#11125648 (ppelberg)
[19:01:57] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Planning), Epic: Expand language coverage for Tone Check - https://phabricator.wikimedia.org/T394448#11126098 (ppelberg)
[19:03:47] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Planning), Epic: Expand language coverage for Tone Check - https://phabricator.wikimedia.org/T394448#11126104 (ppelberg)
[19:48:32] (PS1) Sbisson: Introduce 'stub' and 'unknown' difficulty levels [research/recommendation-api] - https://gerrit.wikimedia.org/r/1182641
[19:53:57] (PS2) Sbisson: Introduce 'stub' and 'unknown' difficulty levels [research/recommendation-api] - https://gerrit.wikimedia.org/r/1182641
[19:55:16] (PS3) Sbisson: Introduce 'stub' and 'unknown' difficulty levels [research/recommendation-api] - https://gerrit.wikimedia.org/r/1182641
[20:49:48] (PS1) Sbisson: Support difficulty filtering for collection with source_lang=en [research/recommendation-api] - https://gerrit.wikimedia.org/r/1182654