[04:29:33] Machine-Learning-Team, MediaWiki-extensions-ORES, FlaggedRevs, MediaWiki-Recent-changes, and 3 others: Special:RecentChanges with complex query (rc_namespace = 1 AND ct_tag = 'visualeditor') slow due to using wrong index - https://phabricator.wikimedia.org/T168096#10742083 (Pppery)
[05:44:51] hello!
[06:23:42] Machine-Learning-Team, Data-Engineering, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10742272 (isarantopoulos) >>! In T326179#10718156, @Ottomata wrote: > Once this stream is deployed and active,...
[06:42:58] Lift-Wing, Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10742306 (isarantopoulos) Nice work Kevin! @kevinbazira @klausman could we run the same benchmark on the MI300X?
[06:50:12] Good morning
[07:36:11] (PS12) AikoChou: edit-check: add SHAP values [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984)
[07:49:22] (CR) AikoChou: "I added unit tests for the preprocess and postprocess function, focusing on the format in the final response." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[07:56:47] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10742507 (OKarakaya-WMF) hey @isarantopoulos, @achou and I have checked feature generation part and I've checked further mwparserfromhtml repo....
[08:32:48] Lift-Wing, Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10742624 (klausman) >>! In T385173#10742305, @isarantopoulos wrote: > Nice work Kevin! > @kevinbazira @klausman could we run the same benchmark on the MI300X? I think so, yes. Will give that a...
[08:54:37] (CR) Ilias Sarantopoulos: "LGTM, nice work Aiko!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[08:54:41] (CR) Ilias Sarantopoulos: [C:+1] edit-check: add SHAP values [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[09:09:49] Lift-Wing, Machine-Learning-Team: [onboarding] Improving language agnostic articlequality model + service - https://phabricator.wikimedia.org/T391679#10742684 (isarantopoulos) Yes, that would be great to avoid any inconsistency between training/serving! One thing to keep in mind while working on this is...
[09:17:08] Morning!
[09:23:34] \o
[09:32:55] (CR) AikoChou: [C:+2] "Thanks for the review! :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[09:33:49] (Merged) jenkins-bot: edit-check: add SHAP values [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1133426 (https://phabricator.wikimedia.org/T387984) (owner: AikoChou)
[10:06:48] Machine-Learning-Team: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940 (isarantopoulos) NEW
[10:08:30] Machine-Learning-Team: Q4 24-25 Simple article summaries: set up the software stack for efficiently serving production LLMs - https://phabricator.wikimedia.org/T391941 (isarantopoulos) NEW
[10:10:26] Machine-Learning-Team: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940#10742850 (isarantopoulos)
[10:12:34] Machine-Learning-Team, Goal: Q4 24-25 Simple article summaries: set up the software stack for efficiently serving production LLMs - https://phabricator.wikimedia.org/T391941#10742853 (isarantopoulos)
[10:12:39] Machine-Learning-Team, Goal: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940#10742854 (isarantopoulos)
[10:32:09] Machine-Learning-Team, Goal: Q4 24-25 Goal: Operational Excellence - LiftWing Platform Updates & Improvements - https://phabricator.wikimedia.org/T391943 (isarantopoulos) NEW
[10:32:30] Machine-Learning-Team, Goal: Q4 24-25 Goal: Simple article summaries: Set up the software stack for efficiently serving production LLMs - https://phabricator.wikimedia.org/T391941#10742914 (isarantopoulos)
[12:13:40] Machine-Learning-Team: Create a new S3 bucket for MinT - https://phabricator.wikimedia.org/T391958 (elukey) NEW
[12:20:39] Machine-Learning-Team: Move Lift Wing models from Thanos Swift to APUS - https://phabricator.wikimedia.org/T391960 (elukey) NEW
[12:45:10] Machine-Learning-Team, Goal: Q4 24-25 Goal: Operational Excellence - LiftWing Platform Updates & Improvements - https://phabricator.wikimedia.org/T391943#10743428 (elukey) If possible, I'd swap "Upgrade k8s" with T369493, that also requires to upgrade the whole eqiad cluster to Lift Wing. It is a sizeab...
[12:50:04] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, Moderator-Tools-Team: Epic - Recent Changes Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964 (DMburugu) NEW
[12:50:48] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, Moderator-Tools-Team: Epic - Recent Changes Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964#10743494 (DMburugu)
[12:55:36] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, Moderator-Tools-Team: Epic - Recent Changes Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964#10743535 (DMburugu)
[12:56:19] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, Moderator-Tools-Team: Epic - Recent Changes Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964#10743539 (DMburugu)
[12:56:45] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, Moderator-Tools-Team: Epic - Recent Changes ORES Enabled Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964#10743544 (DMburugu)
[13:09:21] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, Moderator-Tools-Team: Epic - Recent Changes ORES Enabled Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964#10743587 (Ladsgroup) I suggest creating a dblist similar to flow for e...
[13:09:40] Lift-Wing, Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10743595 (klausman) The tests are still running, but here's the first few results: {F59177908} It looks like the MI300x is about 3-4 times faster than the MI210 in this test case.
[13:14:57] Machine-Learning-Team, Data-Engineering, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10743606 (Ottomata) I don't think so! The data in these streams is the same as in `mediawiki.page_change.v1`,...
[13:30:58] Machine-Learning-Team, Goal: Q4 24-25 Goal: Simple article summaries: Set up the software stack for efficiently serving production LLMs - https://phabricator.wikimedia.org/T391941#10743685 (isarantopoulos)
[13:53:27] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, and 2 others: [Epic] Recent Changes ORES Enabled Revert Risk Powered Filters Rollout Plan - https://phabricator.wikimedia.org/T391964#10743823 (Aklapper)
[14:52:26] Machine-Learning-Team, Goal: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940#10744075 (SSalgaonkar-WMF) Open→Resolved a:SSalgaonkar-WMF
[14:53:34] Machine-Learning-Team, Goal: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940#10744089 (calbon) Resolved→In progress
[14:53:44] Machine-Learning-Team, Goal: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940#10744092 (isarantopoulos) In progress→Open
[14:53:58] Machine-Learning-Team, Goal: Q4 24-25 Goal: Productionize peacock detection model - https://phabricator.wikimedia.org/T391940#10744094 (isarantopoulos) Open→In progress
[15:04:55] Lift-Wing, Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10744134 (klausman) The full chart for running the benchmark as described by Kevin above, on the SMC-provided MI300X test machine (using one of the 8 GPUs). {F59220097}
[15:06:20] Lift-Wing, Machine-Learning-Team: Use rocm/vllm image on Lift Wing - https://phabricator.wikimedia.org/T385173#10744141 (klausman) And the underlying CSV data for the above graph: {F59220995}
[16:35:32] * isaranto afk!
[18:15:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[18:15:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[18:15:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[18:43:10] Machine-Learning-Team, Research: AI/ML Model Request: **Add-a-Link for Orphan Articles** - https://phabricator.wikimedia.org/T392012 (SSalgaonkar-WMF) NEW
[18:43:56] Machine-Learning-Team, Research: AI/ML Model Request: Add-a-Link for Orphan Articles - https://phabricator.wikimedia.org/T392012#10745083 (SSalgaonkar-WMF)
[19:11:18] Machine-Learning-Team, Research: AI/ML Model Request: Copyedit suggestions - https://phabricator.wikimedia.org/T392013 (SSalgaonkar-WMF) NEW
[20:16:24] Machine-Learning-Team, Research: AI/ML Model Request: Copyedit suggestions - https://phabricator.wikimedia.org/T392013#10745536 (SSalgaonkar-WMF)
[22:15:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[22:15:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[22:15:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas