[01:01:44] FIRING: LiftWingServiceErrorRate: ...
[01:01:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[01:36:44] RESOLVED: LiftWingServiceErrorRate: ...
[01:36:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[02:15:44] FIRING: LiftWingServiceErrorRate: ...
[02:15:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[02:17:51] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:25:44] RESOLVED: LiftWingServiceErrorRate: ...
[02:25:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[03:11:44] FIRING: LiftWingServiceErrorRate: ...
[03:11:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[03:51:44] RESOLVED: LiftWingServiceErrorRate: ...
[03:51:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[04:21:44] FIRING: LiftWingServiceErrorRate: ...
[04:21:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[04:36:44] RESOLVED: LiftWingServiceErrorRate: ...
[04:36:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:04:44] FIRING: LiftWingServiceErrorRate: ...
[05:04:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:27:30] Lift-Wing, Machine-Learning-Team, Wikidata, OKR-Work: Optimize revertrisk-wikidata inference service to achieve ~500ms latency target - https://phabricator.wikimedia.org/T414060#11649436 (kevinbazira) For future reference, below is a consolidated report of the optimizations we implemented and the...
[05:54:44] RESOLVED: LiftWingServiceErrorRate: ...
[05:54:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:17:51] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[06:31:44] FIRING: LiftWingServiceErrorRate: ...
[06:31:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:21:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:11:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:13:07] (CR) Kevin Bazira: "thank you for updating the isvc to use a base image that has packages (like `torch==2.5.1+rocm6.1`) which support GPU usage." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[09:21:43] Machine-Learning-Team: Explore gpt-oss-safeguard-20b - https://phabricator.wikimedia.org/T417860#11649814 (BWojtowicz-WMF)
[09:21:44] Machine-Learning-Team, Goal: Q2 FY2025-26 Goal: Host safeguard model on LiftWing - https://phabricator.wikimedia.org/T418267#11649815 (BWojtowicz-WMF)
[09:31:24] (CR) Kevin Bazira: Revertrisk-multilingual: Add predictions to events stream. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[09:31:25] Machine-Learning-Team: Deploy gpt-oss-safeguard-20b on LiftWing - https://phabricator.wikimedia.org/T418350 (BWojtowicz-WMF) NEW
[09:35:46] Machine-Learning-Team: Optimize gpt-oss-safeguard-20b LiftWing deployment - https://phabricator.wikimedia.org/T418351 (BWojtowicz-WMF) NEW
[10:17:52] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:07:41] (CR) Gkyziridis: "We can create a fork of `knowledge_integrity` repo without the torch import dependency and point to the forked version from our inference-" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[11:46:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:59:57] ^ looking
[12:41:04] (PS1) Kevin Bazira: policy-violation: integrate prototype into model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1243810 (https://phabricator.wikimedia.org/T418350)
[12:51:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:16:54] (CR) Kevin Bazira: "removing torch completely from the `knowledge_integrity` repo would introduce a breaking change for other users/tools that rely on this pa" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[13:18:00] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 2 others: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11650585 (Samwalton9-WMF) Open→Resolved a:DPogo...
[13:19:59] (CR) Gkyziridis: "THNX!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[13:31:44] RESOLVED: LiftWingServiceErrorRate: ...
[13:31:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revertrisk&var-backend=revertrisk-multilingual-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:50:12] (CR) Gkyziridis: "`poetry lock` takes more than `1300 secs`, I will wait until is done." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[14:17:52] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[14:42:15] (CR) Nikerabbit: "`" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1240745 (owner: Sbisson)
[14:45:15] (CR) Nikerabbit: "Do we need to update our dependencies file?" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1240745 (owner: Sbisson)
[15:10:26] (CR) Bartosz Wójtowicz: "Thank you so much for prototyping it this fast! Left a couple of comments, not sure if we want to address the input validation now, we can" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1243810 (https://phabricator.wikimedia.org/T418350) (owner: Kevin Bazira)
[15:41:54] (CR) Sbisson: "Maybe Kartik would know if a similar upgrade has or needs to happen in production." [research/recommendation-api] - https://gerrit.wikimedia.org/r/1240745 (owner: Sbisson)
[17:04:11] Machine-Learning-Team, Data-Engineering, Data-Engineering-Radar, Event-Platform, Patch-For-Review: Add Multilingual RevertRisk predictions to mediawiki.page_revert_risk_prediction_change - https://phabricator.wikimedia.org/T415892#11651593 (Ottomata)
[18:10:18] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team (Kanban), and 2 others: Enable revert risk filters for first batch of wikis: < 1000 monthly edits - https://phabricator.wikimedia.org/T411485#11651895 (jsn.sherman) patch [[ https://gerrit.wikimedia.o...
[18:17:52] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[20:15:55] Machine-Learning-Team, ORES, Regression: ORES API query is slow - https://phabricator.wikimedia.org/T418202#11652235 (Novem_Linguae) Open→Resolved
[20:19:36] Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise, Wikimedia Enterprise - Content Integrity: Request to host on Lift Wing - https://phabricator.wikimedia.org/T404911#11652242 (FNavas-foundation)
[22:17:52] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent