[05:55:44] FIRING: LiftWingServiceErrorRate: ...
[05:55:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=zhwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:10:44] RESOLVED: LiftWingServiceErrorRate: ...
[06:10:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=zhwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:35:11] good morning!
[06:43:12] Re errors above: we've been hitting a lot of TimeoutErrors on this call: https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/revscoring_model/model_servers/model_servers.py#144
[06:43:48] There are also some Service Unavailable errors we get on the same call: `mwapi.errors.RequestError: 503, message='Service Unavailable', url=URL('http://zh.wikipedia.org:80/w/api.php?action=query&prop=revisions&revids=87568130&rvslots=main&rvprop=ids%7Cuser%7Ctimestamp%7Ccomment%7Ccontent%7Ccontentmodel%7Cuserid%7Csize&format=json')`
[06:44:53] It seems the URL works now, so perhaps it was indeed a problem with the MW API?
[06:50:40] Good morning.
[06:50:47] good morning
[06:55:43] good morning!
[06:58:34] it seems that the above error was transient
[06:58:50] we can keep an eye on it and investigate further if it occurs again
[07:02:14] folks we will start the backport deployment
[07:02:21] if you are interested join the meeting
[08:14:40] regarding the cancelled deployment: I see that the changes were synced under /srv/mediawiki-staging/wmf-config so nothing to do there
[08:16:07] I realized that something else had happened the previous time: I deployed another change before this sync had happened, which caused an error as there was a diff between the wmf-config repo and the local repo on the deployment host
[08:16:41] georgekyz: do you need any help with the patch?
[08:16:54] no no
[08:16:56] I am fine
[08:16:58] we can ask if it is ok to deploy now since we have validated that all other wikis work
[08:17:18] ```
[08:17:18] composer manage-dblist add afwiki ores
[08:17:18] composer manage-dblist add bewiki ores
[08:17:18] composer manage-dblist add bnwiki ores
[08:17:18] composer manage-dblist add cywiki ores
[08:17:18] composer manage-dblist add hawiki ores
[08:17:19] composer manage-dblist add iswiki ores
[08:17:19] composer manage-dblist add kkwiki ores
[08:17:20] ```
[08:17:50] just pasting the commands I used -- not sure if this could be done by one command
[08:19:05] never mind about deploying now. Let's do a proper review and schedule it for the afternoon backport window. no reason to rush things!
[08:24:46] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 3 others: [batch #2] Enable revertrisk filters in recent changes in multiple wikis - https://phabricator.wikimedia.org/T395823#10886749 (isarantopoulos) During the deployment this morning there w...
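On the 08:17:50 question of whether the seven `composer manage-dblist add` calls could be issued in one go: a minimal sketch, assuming a Bash-compatible shell and that `composer manage-dblist add <wiki> <dblist>` takes one wiki per invocation (as in the paste at 08:17), is to loop over the wiki names. The function name `add_wikis_to_dblist` and the `DRY_RUN` variable are hypothetical and exist only for this illustration; they are not part of mediawiki-config.

```shell
#!/usr/bin/env bash
# Sketch: add several wikis to one dblist with a single loop.
# Assumes it runs from a mediawiki-config checkout where
# `composer manage-dblist` is available.
# add_wikis_to_dblist and DRY_RUN are hypothetical names for this example.

add_wikis_to_dblist() {
    local dblist="$1"
    shift
    local wiki
    for wiki in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: only print the command that would be executed.
            echo "composer manage-dblist add ${wiki} ${dblist}"
        else
            # Stop at the first failure so a partial update is noticed.
            composer manage-dblist add "${wiki}" "${dblist}" || return 1
        fi
    done
}

# Dry run over the wikis from the paste; drop DRY_RUN=1 to actually run them.
DRY_RUN=1 add_wikis_to_dblist ores afwiki bewiki bnwiki cywiki hawiki iswiki kkwiki
```

The dry-run mode is there so the generated commands can be compared against the pasted ones before anything is changed.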
[08:25:07] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 3 others: [batch #2] Enable revertrisk filters in recent changes in multiple wikis - https://phabricator.wikimedia.org/T395823#10886753 (isarantopoulos)
[08:26:51] isaranto: patch ready: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1153945 After review I can schedule it for the next deployment window
[08:36:32] georgekyz: great! the patch looks great. I left a suggestion for changing the commit title so that it is not exactly the same as the previous one
[08:50:56] thnx
[08:52:34] +1! let's schedule it. thank you!
[09:00:08] done
[09:02:30] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.45-notes (1.45.0-wmf.4; 2025-06-03): Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253#10886844 (gkyziridis) a:gkyziridis
[09:04:20] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.45-notes (1.45.0-wmf.4; 2025-06-03): Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253#10886847 (gkyziridis) Open→Resolved
[09:07:18] (CR) Harroyo-wmf: [C:+2] LiftWingService: Add method to evaluate pre-save revert risk [extensions/ORES] - https://gerrit.wikimedia.org/r/1152269 (https://phabricator.wikimedia.org/T364705) (owner: Máté Szabó)
[09:07:22] (CR) Harroyo-wmf: [C:+2] Add revertrisk_score AbuseFilter variable [extensions/ORES] - https://gerrit.wikimedia.org/r/1152270 (https://phabricator.wikimedia.org/T364705) (owner: Máté Szabó)
[09:20:50] (Merged) jenkins-bot: LiftWingService: Add method to evaluate pre-save revert risk [extensions/ORES] - https://gerrit.wikimedia.org/r/1152269 (https://phabricator.wikimedia.org/T364705) (owner: Máté Szabó)
[09:21:41] (Merged) jenkins-bot: Add revertrisk_score AbuseFilter variable [extensions/ORES] - https://gerrit.wikimedia.org/r/1152270 (https://phabricator.wikimedia.org/T364705) (owner: Máté Szabó)
[10:13:10] So, folks, what we just learnt from the operations team about the backport deployment: when a sync fails or is declined during a deployment (as happened to us), we first need to revert the patch via gerrit, as we did, and then backport-deploy the revert as well, so that the deployment is left in its previous stable state.
[10:13:10] There is this ticket as well: https://phabricator.wikimedia.org/T396106
[10:14:05] I scheduled the next deployment for 15:00
[10:16:01] exactly! thanks for the follow up George
[11:19:34] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Host an OpenVINO model in LiftWing - https://phabricator.wikimedia.org/T395012#10887319 (santhosh) Updates: I successfully created a production docker image on top of WMF production bookworm image. Upstream provides Dockerfile for Redhat 8 and Ubun...
[11:19:53] FYI, ml-etcd2001 will briefly go down for a Ganeti reboot
[11:31:36] ack!
[12:55:05] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10887581 (isarantopoulos) In order to avoid any confusion with any backend/frontend implementation shall we specifically mention in the title of the SLO that i...
[12:55:48] georgekyz: aiko where shall I offer a review/feedback on the current status of the Tone check SLO in https://wikitech.wikimedia.org/wiki/SLO/ToneCheck?
[12:56:00] on the phabricator task?
[12:59:01] isaranto: we just finished and updated it. I suggest leaving a comment on the Phabricator ticket
[12:59:15] ok! thank you
[13:16:07] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10887628 (achou) The SLO draft is now complete: https://wikitech.wikimedia.org/wiki/SLO/ToneCheck @elukey, we'd appreciate your feedback when you have time :)...
[13:33:42] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 3 others: [batch #2] Enable revertrisk filters in recent changes in multiple wikis - https://phabricator.wikimedia.org/T395823#10887672 (gkyziridis)
[13:41:04] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10887686 (gkyziridis) >>! In T390706#10887628, @achou wrote: > The SLO draft is now complete: https://wikitech.wikimedia.org/wiki/SLO/ToneCheck > @elukey, we'd...
[13:44:08] FYI, ml-etcd2002 will briefly go down for a Ganeti reboot
[15:02:35] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10888041 (isarantopoulos) @achou @gkyziridis thank you for working on this! I have a comment regarding the Success Ratio SLI which is defined as following in...
[16:35:50] (CR) Matěj Suchánek: "I think Máté pushed this through in I0ccf97880001c3d0c81c612bb98f1da5ab9bb452." [extensions/ORES] - https://gerrit.wikimedia.org/r/1051837 (https://phabricator.wikimedia.org/T364705) (owner: Kosta Harlan)
[18:23:43] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikipedia-Android-App-Backlog, FY2024-25 WE4.2: Ensure all ORES i18n messages are available for wikis to add revert risk language agnostic filters to - https://phabricator.wikimedia.org/T395481#10888670 (Kgraessle)
[18:24:08] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, Patch-For-Review: [batch #1] Enable revertrisk filters in simplewiki & trwiki - https://phabricator.wikimedia.org/T395668#10888672 (Kgraessle)
[18:27:08] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikipedia-Android-App-Backlog, FY2024-25 WE4.2: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298#10888681 (Kgraessle)