[05:25:20] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Host an OpenVINO model in LiftWing - https://phabricator.wikimedia.org/T395012#10889903 (santhosh) Here is the screencast of everything working together: https://drive.google.com/file/d/1YDSvTm3ePv585-ittck2tYWH2XZ-AHLx/view (MP4 video, 15mins, 88MB)
[06:10:14] Good morning folks!
[06:21:54] good morning! :)
[07:03:07] morning!
[07:21:53] good morning
[08:49:08] (PS1) Bartosz Wójtowicz: ci: Add CI pipeline for pre-commit to be ran on entire repository. Add basic .dockerignore to the repo. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1154242 (https://phabricator.wikimedia.org/T393865)
[10:13:44] FIRING: LiftWingServiceErrorRate: ...
[10:13:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:18:44] RESOLVED: LiftWingServiceErrorRate: ...
[10:18:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:32:30] ^ it seems one of the pods of the `viwiki-reverted-predictor` deployment was restarted around the time the error happened, but I cannot find any logs/events explaining why the restart happened
[10:39:53] what's suspicious is that some of the functions take a very long time to execute: `INFO:root:Function fetch_features took 77.0619 seconds to execute. INFO:root:Function run_in_process_pool took 491.59 seconds to execute.`
[10:57:22] thanks for looking into that, bartosz. Without having taken a deeper look into this, I suspect that it is the known issue described in https://phabricator.wikimedia.org/T363336. We have enabled multiprocessing for that model server: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1039776
[11:48:37] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 3 others: [batch #2] Enable revertrisk filters in recent changes in multiple wikis - https://phabricator.wikimedia.org/T395823#10890652 (gkyziridis)
[12:21:44] FIRING: LiftWingServiceErrorRate: ...
[12:21:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:30:00] nooo
[12:31:44] RESOLVED: LiftWingServiceErrorRate: ...
[12:31:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:33:28] (PS2) Bartosz Wójtowicz: ci: Add CI pipeline for pre-commit to be ran on entire repository. Add basic .dockerignore to the repo. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1154242 (https://phabricator.wikimedia.org/T393865)
[12:38:44] FIRING: LiftWingServiceErrorRate: ...
[12:38:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:41:02] (CR) CI reject: [V:-1] ci: Add CI pipeline for pre-commit to be ran on entire repository. Add basic .dockerignore to the repo. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1154242 (https://phabricator.wikimedia.org/T393865) (owner: Bartosz Wójtowicz)
[12:44:24] Grafana reports 452 active backends; what does this mean?
[12:53:15] which Grafana page are you referring to? Can you share a link?
[12:53:42] usually this means that, according to the filters you have set, there are X active backends/services detected
[12:53:44] RESOLVED: LiftWingServiceErrorRate: ...
[12:53:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-reverted&var-backend=viwiki-reverted-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:27:38] isaranto: When you first open it, it reports 4, and if you refresh it, it reports 452: https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=D-2kXvZnk&var-namespace=revscoring-editquality-reverted&var-backend=$__all&from=now-3h&to=now&timezone=utc&var-response_code=$__all&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99
[13:27:49] it is codfw
[13:31:05] That's a bit of a bug in the dashboard
[13:31:26] the "Active backends" panel uses this query: `count(sum by (destination_service_name) (istio_requests_total{destination_service_name=~"${backend:pipe}",}))`
[13:31:44] Note how it only filters by the contents of the "backend" dropdown, not by "namespace"
[13:32:17] So if you have e.g. rs-eq-reverted selected, but all namespaces, you get, well, everything
[13:35:48] There, fixed it.
[13:36:37] Now, if you select e.g. just the revision-models namespace, but leave backend at "All", it shows 8 active backends. HTH.
[13:57:50] nice, thanks Tobias!
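The "Active backends" miscount discussed above can be reproduced with a small Python simulation. This is a minimal sketch under stated assumptions: the series list is hypothetical, every service name in it other than `viwiki-reverted-predictor-default` is made up for illustration, the `destination_service_namespace` label is taken from Istio's standard metric labels, and the backend dropdown's "All" value is approximated as the regex `.*`. The function mimics `count(sum by (destination_service_name) (...))` by counting distinct service names among the series that pass the label matchers.

```python
import re

# Hypothetical istio_requests_total series, reduced to the two labels that
# matter here. Real series carry many more labels plus a counter value.
SERIES = [
    {"destination_service_namespace": "revscoring-editquality-reverted",
     "destination_service_name": "viwiki-reverted-predictor-default"},
    {"destination_service_namespace": "revscoring-editquality-reverted",
     "destination_service_name": "example-a-reverted-predictor-default"},  # hypothetical
    {"destination_service_namespace": "revision-models",
     "destination_service_name": "example-b-predictor-default"},  # hypothetical
    {"destination_service_namespace": "revision-models",
     "destination_service_name": "example-c-predictor-default"},  # hypothetical
]

# Assumption: "All" in the backend dropdown behaves like a match-everything regex.
ALL = ".*"


def active_backends(series, backend_regex, namespace_regex=None):
    """Mimic count(sum by (destination_service_name) (...)) over matching series.

    PromQL's =~ matcher is fully anchored, so re.fullmatch is used.
    With namespace_regex=None the namespace label is ignored, which is how
    the original panel query behaved.
    """
    names = set()
    for labels in series:
        if not re.fullmatch(backend_regex, labels["destination_service_name"]):
            continue
        if namespace_regex is not None and not re.fullmatch(
            namespace_regex, labels["destination_service_namespace"]
        ):
            continue
        names.add(labels["destination_service_name"])
    # sum by (destination_service_name) yields one series per name;
    # count() then counts those series.
    return len(names)


# Backend "All", namespace ignored: services from every namespace are counted.
print(active_backends(SERIES, ALL))  # prints 4
# Backend "All", namespace matcher applied: only the selected namespace counts.
print(active_backends(SERIES, ALL, "revscoring-editquality-reverted"))  # prints 2
```

The log does not show the change actually made to the dashboard. One plausible form of the fix is to add a namespace matcher to the panel query, e.g. `count(sum by (destination_service_name) (istio_requests_total{destination_service_namespace=~"${namespace:pipe}", destination_service_name=~"${backend:pipe}"}))`, assuming the dashboard variable is named `namespace` as the `var-namespace` URL parameter suggests.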
[14:02:13] (PS1) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253)
[14:05:09] (CR) Gkyziridis: "The previous logic of reporting the statistics at this patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/1151706 was wro" [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253) (owner: Gkyziridis)
[14:18:56] (CR) CI reject: [V:-1] improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253) (owner: Gkyziridis)
[14:29:20] (PS2) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253)
[14:34:20] (PS3) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253)
[15:56:35] Have a nice weekend all!
[16:06:34] o/ enjoy the weekend :)
[17:03:18] Enjoy folks
[17:03:22] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.45-notes (1.45.0-wmf.4; 2025-06-03), Patch-For-Review: Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253#10891736 (gkyziridis) I refactored the backfill script: `./extensions/ORES/maintenance/Populat...
[17:05:25] (PS4) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253)
[17:05:46] (Abandoned) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253) (owner: Gkyziridis)
[17:06:01] (Restored) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253) (owner: Gkyziridis)
[17:07:09] (PS5) Gkyziridis: improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253)
[17:20:32] (CR) CI reject: [V:-1] improve logging logic for PopulateDatabase backfill script [extensions/ORES] - https://gerrit.wikimedia.org/r/1154299 (https://phabricator.wikimedia.org/T395253) (owner: Gkyziridis)