[00:17:11] artificial-intelligence, cloud-services-team: Supporting AI, LLM, and data models on WMCS - https://phabricator.wikimedia.org/T336905#11643018 (Sj) @Alaexis and I chatted off-phab, I would love to see more tools trying this out. (An advantage of these fully open tools is that anyone can run them locally...
[02:09:44] (CR) Kevin Bazira: "since you're targeting GPU usage with `RevertRiskMultilingualGPU`, you might want to update this isvc's base image from bookworm:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[02:34:05] FIRING: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[02:34:11] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[04:18:28] (CR) Kevin Bazira: Revertrisk-multilingual: Add predictions to events stream. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[04:19:44] FIRING: [31x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[04:19:49] FIRING: [3x] KubernetesDeploymentUnavailableReplicas: Deployment edit-check-predictor-00001-deployment in edit-check at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[04:24:44] FIRING: [31x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[04:34:44] FIRING: [30x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[04:49:44] FIRING: [30x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[04:59:44] FIRING: [29x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:04:44] FIRING: [29x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:09:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:14:44] FIRING: [26x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:29:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:39:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:44:44] FIRING: [29x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:54:44] FIRING: [30x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:59:44] FIRING: [29x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:14:44] FIRING: [28x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:19:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:24:44] FIRING: [26x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:34:05] FIRING: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[06:34:06] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[06:44:44] FIRING: [26x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:49:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:59:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:04:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:14:44] FIRING: [28x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:19:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:24:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:34:44] FIRING: [24x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:39:44] FIRING: [24x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:44:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:54:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[07:59:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:04:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:14:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:19:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:19:49] FIRING: [3x] KubernetesDeploymentUnavailableReplicas: Deployment edit-check-predictor-00001-deployment in edit-check at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[08:20:26] Machine-Learning-Team, ORES: Rename ORES extension - https://phabricator.wikimedia.org/T377563#11643616 (Novem_Linguae) I'm not a big fan of extension renames. For example, the attempt to rename Extension:Flow to Extension:StructuredDiscussions I find and found to be very confusing. I think having someth...
[08:22:27] Machine-Learning-Team, ORES, Regression: ORES API query is slow - https://phabricator.wikimedia.org/T418202 (Novem_Linguae) NEW
[08:23:33] Machine-Learning-Team, ORES, Regression: ORES API query is slow - https://phabricator.wikimedia.org/T418202#11643630 (Novem_Linguae)
[08:24:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:33:56] Machine-Learning-Team: Explore gpt-oss-safeguard-20b - https://phabricator.wikimedia.org/T417860#11643639 (kevinbazira) Following yesterday's discussion between ML and T&S teams, we wanted to understand more about this model's multilingual support. I have done some digging and found multilingual performance...
[08:34:44] FIRING: [24x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:39:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:54:49] RESOLVED: [3x] KubernetesDeploymentUnavailableReplicas: Deployment edit-check-predictor-00001-deployment in edit-check at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[09:04:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:09:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:14:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:32:16] looking..
[09:34:44] FIRING: [28x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:39:44] FIRING: [28x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:55:25] kserve error logs: aiohttp.client_exceptions.ClientResponseError: 503, message='Service Unavailable', url=URL('http://en.wikipedia.org:80/w/api.php?action=query&prop=revisions&revids=1340098389&rvslots=main&rvprop=comment%7Cids%7Cuserid%7Ccontentmodel%7Csize%7Ccontent%7Cuser%7Ctimestamp&format=json')
[09:56:20] all the revscoring models and the article country model
[09:59:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:09:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:14:44] FIRING: [28x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:17:36] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[10:19:44] FIRING: [27x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:24:44] FIRING: [26x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:29:44] FIRING: [34x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:34:05] FIRING: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[10:34:06] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[10:39:44] FIRING: [34x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:49:44] FIRING: [33x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:54:44] FIRING: [32x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[10:59:44] FIRING: [32x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:04:44] FIRING: [33x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:09:44] FIRING: [33x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:14:44] FIRING: [33x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:19:44] FIRING: [30x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:24:44] FIRING: [30x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:25:34] aiko: o/
[11:26:07] I'd like to check httpbb on both inference endpoints (eqiad and codfw), do you have the command to use ready to go? Otherwise I'll look for it
[11:28:28] httpbb --hosts inference.svc.eqiad.wmnet --https_port 30443 /srv/deployment/httpbb-tests/liftwing/production/test_article-descriptions.yaml
[11:34:44] FIRING: [24x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:39:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:44:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:49:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:54:44] FIRING: [25x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:59:44] FIRING: [26x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:09:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:14:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:19:44] FIRING: [24x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:29:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:39:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:44:44] FIRING: [22x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:49:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:54:44] FIRING: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:59:44] RESOLVED: [23x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:26:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[13:52:59] Machine-Learning-Team, Data-Engineering, Event-Platform: Create new mediawiki links change streams based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399#11645088 (Ottomata) > there is no way to fetch the renderId from the revision API ? Answering my own question:...
[14:17:36] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[14:34:05] FIRING: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[14:34:06] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[15:06:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[15:32:39] Machine-Learning-Team, Data-Engineering, Event-Platform, Patch-For-Review: Add Multilingual RevertRisk predictions to mediawiki.page_revert_risk_prediction_change - https://phabricator.wikimedia.org/T415892#11646013 (gkyziridis) === Update === I updated and tested the model on docker using newer bl...
[15:39:56] (PS4) Gkyziridis: Revertrisk-multilingual: Add predictions to events stream. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1238685 (https://phabricator.wikimedia.org/T415892)
[15:47:44] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, Russian-Sites: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11646300 (jsn.sherman) Note that multilingual revert risk...
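The kserve error logged at 09:55:25 wraps a MediaWiki Action API revision lookup (`action=query&prop=revisions`) that came back with HTTP 503. A minimal sketch that only decodes that logged URL so the requested fields are readable (`%7C` is a percent-encoded `|`, the API's multi-value separator); it does not reproduce or diagnose the 503. The helper names `decode_query` and `rebuild_query` are hypothetical, chosen for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# URL copied verbatim from the 09:55:25 kserve error log.
FAILED_URL = (
    "http://en.wikipedia.org:80/w/api.php?action=query&prop=revisions"
    "&revids=1340098389&rvslots=main"
    "&rvprop=comment%7Cids%7Cuserid%7Ccontentmodel%7Csize%7Ccontent%7Cuser%7Ctimestamp"
    "&format=json"
)


def decode_query(url: str) -> dict[str, str]:
    """Hypothetical helper: return the query parameters with percent-escapes decoded."""
    # parse_qs returns a list of values per key; each key appears once in this URL.
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def rebuild_query(params: dict[str, str]) -> str:
    """Hypothetical helper: re-encode the parameters (urlencode turns '|' back into '%7C')."""
    return urlencode(params)


params = decode_query(FAILED_URL)
for key, value in params.items():
    print(f"{key}={value}")

# rvprop lists the per-revision fields that were requested, separated by '|'.
print(params["rvprop"].split("|"))
```

Re-encoding the decoded parameters with `urlencode` reproduces the query string exactly as it was logged, so the decoded view and the original request stay in sync.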
[15:55:24] Machine-Learning-Team: Edit Suggestions - Edit suggestion generation with loose edit types - https://phabricator.wikimedia.org/T418097#11646445 (OKarakaya-WMF)
[15:59:09] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, Russian-Sites: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11646553 (Samwalton9-WMF) {F72316920} Apparently all mod...
[16:01:44] RESOLVED: LiftWingServiceErrorRate: ...
[16:01:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=articletopic-outlink&var-backend=outlink-topic-model-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[16:09:51] Machine-Learning-Team, Goal: Q2 FY2025-26 Goal: Host safeguard model on LiftWing - https://phabricator.wikimedia.org/T418267 (Sucheta-Salgaonkar-WMF) NEW
[16:25:54] (CR) Abijeet Patro: Cache update: randomize sleep time after failure (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1240745 (owner: Sbisson)
[16:27:53] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, Russian-Sites: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11646880 (Q-bit-array) Same for me: the models appear to...
[16:36:19] (CR) Sbisson: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1240745 (owner: Sbisson)
[16:46:44] FIRING: LiftWingServiceErrorRate: ...
[16:46:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=huwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[16:51:44] RESOLVED: LiftWingServiceErrorRate: ...
[16:51:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=huwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[17:36:44] FIRING: LiftWingServiceErrorRate: ...
[17:36:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=huwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[17:42:19] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, Russian-Sites: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11647519 (Novem_Linguae) Cross-ref (likely the same root...
[17:42:50] Machine-Learning-Team, ORES, Regression: ORES API query is slow - https://phabricator.wikimedia.org/T418202#11647534 (Novem_Linguae) Cross-ref (likely the same root cause, but a different symptom): {T418223}
[17:43:12] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 2 others: ORES/LiftWing infrastructure is not working for filtering Recent Changes edits - https://phabricator.wikimedia.org/T418223#11647537 (Novem_Linguae)
[17:51:44] RESOLVED: LiftWingServiceErrorRate: ...
[17:51:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=huwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[18:15:23] Machine-Learning-Team, EditCheck, Growth-Team, Revise-Tone-Structured-Task, and 3 others: LiftWing edit-check:predict model is 404ing - https://phabricator.wikimedia.org/T418173#11647696 (DLynch)
[18:17:51] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[18:28:50] RESOLVED: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[18:28:51] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[22:17:51] FIRING: [3x] SLOMetricAbsent: revertrisk-la-availability - https://slo.wikimedia.org/?search=revertrisk-la-availability - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[22:21:25] Machine-Learning-Team, EditCheck, Growth-Team, Revise-Tone-Structured-Task, and 3 others: LiftWing edit-check:predict model is 404ing - https://phabricator.wikimedia.org/T418173#11648809 (ppelberg)
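At 11:26:07 the request was to run httpbb against both inference endpoints (eqiad and codfw), but only the eqiad command was posted at 11:28:28. A minimal sketch that builds that same invocation for each datacenter, reusing only the `--hosts` and `--https_port` flags and the test file from the log. The codfw hostname `inference.svc.codfw.wmnet` is an assumption made by analogy with the eqiad one and should be verified, and `httpbb_command`/`run_all` are hypothetical helpers; by default the sketch only prints the commands.

```python
import shlex
import shutil
import subprocess
from typing import Optional

# Port and test file copied from the 11:28:28 message.
HTTPS_PORT = "30443"
TEST_FILE = "/srv/deployment/httpbb-tests/liftwing/production/test_article-descriptions.yaml"

# The eqiad host is from the log; the codfw host is an ASSUMPTION by analogy.
HOSTS = {
    "eqiad": "inference.svc.eqiad.wmnet",
    "codfw": "inference.svc.codfw.wmnet",  # hypothetical: verify before use
}


def httpbb_command(host: str, test_file: str = TEST_FILE) -> list[str]:
    """Hypothetical helper: the httpbb argv from the log, for one endpoint."""
    return ["httpbb", "--hosts", host, "--https_port", HTTPS_PORT, test_file]


def run_all(dry_run: bool = True) -> dict[str, Optional[int]]:
    """Print the command per datacenter and, unless dry_run, execute it.

    Returns the exit status per datacenter, or None when nothing was run
    (dry run, or httpbb not found on PATH).
    """
    results: dict[str, Optional[int]] = {}
    for dc, host in HOSTS.items():
        cmd = httpbb_command(host)
        print(f"{dc}: {shlex.join(cmd)}")
        if dry_run or shutil.which("httpbb") is None:
            results[dc] = None
        else:
            results[dc] = subprocess.run(cmd, check=False).returncode
    return results


if __name__ == "__main__":
    run_all(dry_run=True)
```

Keeping the dry run as the default means the sketch can be pasted anywhere to compare the two command lines before anything is actually sent to either endpoint.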