[05:01:27] Machine-Learning-Team, Essential-Work: Enable Airflow triggerer process for deferrable operators in airflow-ml and airflow-devenv - https://phabricator.wikimedia.org/T406958 (kevinbazira) NEW
[06:48:39] good morning.
[07:35:41] o/ mooorning
[09:22:46] Machine-Learning-Team, Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work: Enable Airflow triggerer process for deferrable operators in airflow-ml and airflow-devenv - https://phabricator.wikimedia.org/T406958#11263514 (BTullis) p:Triage→High
[11:01:28] FIRING: [2x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[11:06:28] FIRING: [4x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[11:11:39] ^^^ on it
[11:12:17] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11264016 (isarantopoulos) >What we're doing here is actually a really common p...
[11:15:02] elukey: applying the admin_ng changes above includes the new cluster role. I think since it's only an add'l role, it's fine to do on Friday
[12:01:18] klausman: o/ yes yes it will not be used anywhere, super safe
[12:01:28] FIRING: [4x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[12:06:28] RESOLVED: [4x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[12:18:52] Machine-Learning-Team, Goal, Patch-For-Review: Q1 FY2025-26 Goal: Scaling Add-a-link to more wikis via production (airflow) pipelines - https://phabricator.wikimedia.org/T398950#11264150 (OKarakaya-WMF) ###__**Reporting (10/10/2025)**__ **Progress update on the hypothesis for the week, including if s...
[12:46:27] filed the two changes to implement the amd node labeller with min privileges, it should work (in theory)
[12:47:21] I thought it would have been a worse thing to add, it looks "relatively" straightforward
[12:51:48] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11264264 (BWojtowicz-WMF) **Weekly Report** Summary of progress: 1. The article topic model now supports `page_id` parameter. The chan...
[12:57:24] Machine-Learning-Team, Semantic Search: Semantic Search POC - In article QA - https://phabricator.wikimedia.org/T405359#11264268 (OKarakaya-WMF) - I've checked several benchmarks related to QA generation: MMLU Helm: https://crfm.stanford.edu/helm/mmlu/latest/#/leaderboard livebench: https://livebench.a...
[14:28:18] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11264532 (Eevans) >>! In T401021#11264016, @isarantopoulos wrote: > > [...] >...
[15:17:20] Machine-Learning-Team, LPL Hypothesis, Recommendation-API: Collection data unavailable in several rec-api hosts - https://phabricator.wikimedia.org/T406854#11264681 (SBisson) p:Unbreak!→Medium It looks like all rec-api instances have been able to update their cache in the last hour. I'm hes...
[15:18:26] Machine-Learning-Team, LPL Hypothesis, Recommendation-API: Collection data unavailable in several rec-api hosts - https://phabricator.wikimedia.org/T406854#11264685 (SBisson) a:SBisson→None Pushing back to incoming since I'm not working on it and I'm not even sure what should be done.
[15:32:43] anybody have any clue on the above --^
[15:32:57] I'm talking about the localhost:6500 failures reported in https://phabricator.wikimedia.org/T406854#11264685
[17:01:08] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11265021 (Michael) > There is however a need for an additional step on the ser...
[20:00:44] FIRING: LiftWingServiceErrorRate: ...
[20:00:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=svwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[20:05:44] RESOLVED: LiftWingServiceErrorRate: ...
[20:05:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=svwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate