[02:06:43] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[04:52:36] Machine-Learning-Team, Essential-Work: Merge tone-check pipeline DAGs into a single DAG for simplified orchestration - https://phabricator.wikimedia.org/T407212#11275074 (kevinbazira) To fix the OOM issue reported in T407212#11271755, I increased `container_resources.limits.memory` to 16Gi in the `train_...
[06:06:43] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[08:06:56] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: eqiad row C/D Machine Learning host migrations - https://phabricator.wikimedia.org/T405647#11275382 (klausman) ml-cache1002 can be done anytime, it just needs an Icinga/Prometheus downtime. The two ml-serve machines can be done anytime during CET da...
[08:43:31] hello
[09:12:31] morning!
[09:12:42] I'll be taking care of the admin-ng bits in a mo'
[09:13:16] 🙌
[10:01:28] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[10:05:38] Machine-Learning-Team, Data-Platform-SRE: Investigate Label functionality of AMD GPU device plugin on k8s - https://phabricator.wikimedia.org/T373806#11275873 (kevinbazira) @elukey, thank you for working on the GPU node labellers. Following our IRC conversation, I tested the node selector functionality i...
[10:06:24] elukey: o/ I have tested GPU node labels in an airflow DAG and they work as expected: https://phabricator.wikimedia.org/T373806#11275873
[10:06:28] RESOLVED: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[10:17:52] kevinbazira: \o/ thanks for the test! Super so it works as expected, very nice!
[10:18:50] \o/
[10:57:03] Machine-Learning-Team, Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Patch-For-Review: Enable Airflow triggerer process for deferrable operators in airflow-ml and airflow-devenv - https://phabricator.wikimedia.org/T406958#11276034 (brouberol) {F66752118} The triggerer is now runnin...
[11:02:52] Machine-Learning-Team, Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Patch-For-Review: Enable Airflow triggerer process for deferrable operators in airflow-ml and airflow-devenv - https://phabricator.wikimedia.org/T406958#11276047 (brouberol) Because devenvs are using the same `val...
[11:07:20] Machine-Learning-Team, Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Patch-For-Review: Enable Airflow triggerer process for deferrable operators in airflow-ml and airflow-devenv - https://phabricator.wikimedia.org/T406958#11276065 (brouberol) In progress→Resolved
[12:40:16] Machine-Learning-Team, Essential-Work: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11276460 (OKarakaya-WMF) Open→In progress a:OKarakaya-WMF
[12:41:15] Machine-Learning-Team, Essential-Work: Update blubber version in inference services images - https://phabricator.wikimedia.org/T400446#11276465 (OKarakaya-WMF)
[13:42:51] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11276746 (Ottomata) Thanks @achou I really appreciate this summary! > revise...
[13:53:16] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11276791 (dcausse) >>! In T401021#11274339, @achou wrote: > > Q for Search @d...
[14:08:00] Machine-Learning-Team, Semantic Search: Semantic Search POC - In article QA - https://phabricator.wikimedia.org/T405359#11276953 (OKarakaya-WMF) I've added questions from two large models into the prototype ui. gpt-oss:120b, aya:35b Overall evaluation is in progress. Unfortunately, we can not run follow...
[14:40:47] aiko: I just left an update on the task: https://phabricator.wikimedia.org/T407155#11277135, and I uploaded the samples from training data in drive for pre-evaluation. Lets keep this ticket updated with our findings.
[15:03:57] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11277317 (Ottomata) > vs setting up something similar to what was done for art...
[15:08:05] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11277351 (Ottomata) BTW, if we ever do {T214430} (and we might one day now tha...
[17:17:17] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11278239 (achou) > Q: "feed the paragraphs into tone check model and get sco...
[18:17:19] georgekyz: o/ thanks!
[18:20:45] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11278612 (Ottomata) > We want to move to Flink for streaming use cases, but Fl...
[19:18:26] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11278787 (achou) Regarding the timeline, Growth plans to launch A/B test in mi...
[19:34:44] FIRING: LiftWingServiceErrorRate: ...
[19:34:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=svwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[19:39:44] RESOLVED: LiftWingServiceErrorRate: ...
[19:39:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=svwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[23:10:20] Machine-Learning-Team, Add-Link-Structured-Task, Growth-Team (FY2025-26 Q2 Sprint 1): Movement Communications: Rollout "Add a Link" Structured Task to Wikipedias that are supported by V2 model - https://phabricator.wikimedia.org/T407448 (KStoller-WMF) NEW
[23:27:49] (PS1) Tim Starling: Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539
[23:39:55] (CR) CI reject: [V:-1] Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539 (owner: Tim Starling)
[23:47:55] (PS2) Tim Starling: Fix RecentChanges straight join [extensions/ORES] - https://gerrit.wikimedia.org/r/1196539