[00:06:35] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304686 (https://phabricator.wikimedia.org/T414873) (owner: 10Vipz) [00:07:34] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304688 (https://phabricator.wikimedia.org/T414868) (owner: 10Vipz) [00:08:13] (03CR) 10Acamicamacaraca: [C:03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304686 (https://phabricator.wikimedia.org/T414873) (owner: 10Vipz) [00:08:20] (03CR) 10Acamicamacaraca: [C:03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304688 (https://phabricator.wikimedia.org/T414868) (owner: 10Vipz) [00:17:15] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. 
- https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [01:12:13] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1304709 [01:12:13] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1304709 (owner: 10TrainBranchBot) [01:21:04] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1304709 (owner: 10TrainBranchBot) [02:00:26] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image [02:07:25] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 59s) [02:09:40] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:14:40] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [03:12:03] PROBLEM - MariaDB Replica Lag: m2 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 624.23 seconds 
https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response [03:14:03] RECOVERY - MariaDB Replica Lag: m2 on db2160 is OK: OK slave_sql_lag Replication lag: 0.03 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response [03:30:56] 06SRE: New Scroll Request for: Duck (Service) - https://phabricator.wikimedia.org/T426847#12038976 (10Ladsgroup) I feel this is an example service request and duck is not a real case. I don't know what to tag it with in SRE tags [03:37:11] (03PS1) 10Papaul: Remove my old ssh key using FIDO key now [puppet] - 10https://gerrit.wikimedia.org/r/1304711 (https://phabricator.wikimedia.org/T423293) [04:17:15] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [05:05:42] (03CR) 10Giuseppe Lavagetto: hiddenparma: switch to native CAS authentication (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1299475 (https://phabricator.wikimedia.org/T422235) (owner: 10Giuseppe Lavagetto) [05:07:07] (03CR) 10Giuseppe Lavagetto: [C:03+2] hiddenparma: switch to native CAS authentication [puppet] - 10https://gerrit.wikimedia.org/r/1299475 (https://phabricator.wikimedia.org/T422235) (owner: 10Giuseppe Lavagetto) [05:17:29] (03CR) 10ArielGlenn: [C:03+1] "Looks good except for the one typo." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1298031 (https://phabricator.wikimedia.org/T428184) (owner: 10Daniel Kinzler) [05:18:41] (03Abandoned) 10Arnaudb: gitlab: add a hiera key for broadcast_message banner [puppet] - 10https://gerrit.wikimedia.org/r/1302733 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [05:18:50] (03Abandoned) 10Arnaudb: gitlab: announce the SSH hostname migration via banner [puppet] - 10https://gerrit.wikimedia.org/r/1302734 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [05:18:52] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [05:18:52] !log marostegui@cumin1003 dbmaint on es7@codfw T429463 [05:18:58] T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463 [05:19:13] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2039: Upgrading es2039.codfw.wmnet [05:19:45] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2039: Upgrading es2039.codfw.wmnet [05:20:42] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2039.codfw.wmnet with OS trixie [05:37:38] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2039.codfw.wmnet with reason: host reimage [05:37:41] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on es2039.codfw.wmnet with reason: host reimage [05:38:22] (03PS1) 10Giuseppe Lavagetto: hiddenparma: use username, not uid as user identifier [puppet] - 10https://gerrit.wikimedia.org/r/1304712 [05:38:59] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] hiddenparma: use username, not uid as user identifier [puppet] - 10https://gerrit.wikimedia.org/r/1304712 (owner: 10Giuseppe Lavagetto) [05:42:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2039.codfw.wmnet with reason: upgrading [05:43:12] (03PS1) 10Marostegui: es2039: Disable notifications 
[puppet] - 10https://gerrit.wikimedia.org/r/1304714 [05:43:52] 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#12039056 (10jcrespo) > Putting the responsibility to keep the technical infrastructure running on volunteer contributors instead of developing sol... [05:44:46] (03CR) 10Marostegui: [C:03+2] es2039: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1304714 (owner: 10Marostegui) [05:47:56] (03PS1) 10Kevin Bazira: ml-services: deploy cope-b-a4b isvc in LW prod [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304715 (https://phabricator.wikimedia.org/T427497) [05:54:43] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2039.codfw.wmnet with OS trixie [06:03:18] (03PS1) 10Marostegui: Revert "es2039: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1304717 [06:03:56] (03CR) 10Marostegui: [C:03+2] Revert "es2039: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1304717 (owner: 10Marostegui) [06:06:49] FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [06:07:37] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2039: Migration of es2039.codfw.wmnet completed [06:22:01] (03PS1) 10Marostegui: mariadb: Move db1208 to x1 [puppet] - 10https://gerrit.wikimedia.org/r/1304718 (https://phabricator.wikimedia.org/T429562) [06:22:42] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1203: Moving db1203 to x1 T429562 [06:22:46] T429562: Move a host to x1 - https://phabricator.wikimedia.org/T429562 [06:22:55] (03CR) 10Marostegui: 
[C:03+2] mariadb: Move db1208 to x1 [puppet] - 10https://gerrit.wikimedia.org/r/1304718 (https://phabricator.wikimedia.org/T429562) (owner: 10Marostegui) [06:23:01] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1203: Moving db1203 to x1 T429562 [06:23:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[1179,1203].eqiad.wmnet with reason: upgrading [06:24:30] !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:24:31] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:26:04] !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:26:04] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:31:37] !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:31:38] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:32:57] (03PS1) 10Marostegui: clone.py: Trailing "/" being added when cloning [cookbooks] - 10https://gerrit.wikimedia.org/r/1304719 [06:33:23] (03PS2) 10Marostegui: clone.py: Remove trailing "/" being added when cloning [cookbooks] - 10https://gerrit.wikimedia.org/r/1304719 [06:34:39] !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:34:39] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:36:02] (03CR) 10Marostegui: [C:03+2] "commit meesage was wrong, this was db1203" [puppet] - 10https://gerrit.wikimedia.org/r/1304718 (https://phabricator.wikimedia.org/T429562) (owner: 10Marostegui) [06:36:26] (03CR) 10Slyngshede: [C:03+1] Remove my old ssh 
key using FIDO key now [puppet] - 10https://gerrit.wikimedia.org/r/1304711 (https://phabricator.wikimedia.org/T423293) (owner: 10Papaul) [06:36:33] (03CR) 10Marostegui: "Probably not needed, see: https://phabricator.wikimedia.org/P94291#384005" [cookbooks] - 10https://gerrit.wikimedia.org/r/1304719 (owner: 10Marostegui) [06:44:19] !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:44:19] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [06:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [06:51:05] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1179: Moving db1203 to x1 T429562 [06:51:09] T429562: Move a host to x1 - https://phabricator.wikimedia.org/T429562 [07:00:05] Amir1, urbanecm, and awight: UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T0700). Please do the needful. [07:00:05] dcausse and vipz: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [07:00:48] Ready to test. 
[07:00:57] o/ [07:01:09] 06SRE, 10SRE-Access-Requests: Change SSH key for denisse after new laptop provissioning - https://phabricator.wikimedia.org/T429429#12039181 (10MoritzMuehlenhoff) 05In progress→03Resolved Since the key has been replaced, I'm resolving the task to clear the Clinic Duty Dashboard [07:01:18] I can deploy [07:01:42] Vipz: hi, I'll start with your changes, looking at your patches [07:03:22] Hello @dcausse! This is my first time doing any of this, would you be kind enough to guide me through deployment of my patches? [07:04:47] Vipz: sure! first I'm reviewing your patch and then will start deploying, at some point I'll ask you to test using https://wikitech.wikimedia.org/wiki/WikimediaDebug this is a browser extension [07:05:15] it allows you to see the effect of your changes on the "test servers" [07:05:31] (03CR) 10Muehlenhoff: "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1304711 (https://phabricator.wikimedia.org/T423293) (owner: 10Papaul) [07:05:32] if you don't have this browser extension you should install it for testing [07:06:11] Yep, I have that installed on my Firefox as an addon. [07:06:33] Sorry, the message with @ did not ping you? [07:06:47] My IRC client doesn't seem to have /ping :/ [07:08:30] Vipz: it did, just say my name (the @ is not necessary) and I'll be pinged automatically [07:09:35] Fantastic. Alright, I'm all ears regarding deployment of my two patches. 
[07:13:10] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2039: Migration of es2039.codfw.wmnet completed [07:13:11] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [07:17:16] !log jmm@cumin2003 START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm [07:18:05] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304686 (https://phabricator.wikimedia.org/T414873) (owner: 10Vipz) [07:18:05] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304688 (https://phabricator.wikimedia.org/T414868) (owner: 10Vipz) [07:18:35] Vipz: the deployment process just started, in a couple minutes I'll ping you again to test [07:18:58] it is doing both changes at once [07:19:41] Sure thing, patiently waiting :) [07:19:55] (03Merged) 10jenkins-bot: shwiki: update wordmark and tagline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304686 (https://phabricator.wikimedia.org/T414873) (owner: 10Vipz) [07:19:59] (03Merged) 10jenkins-bot: shwiktionary: update logo, wordmark and tagline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304688 (https://phabricator.wikimedia.org/T414868) (owner: 10Vipz) [07:20:33] !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1304686|shwiki: update wordmark and tagline (T414873)]], [[gerrit:1304688|shwiktionary: update logo, wordmark and tagline (T414868)]] [07:20:38] !log jmm@cumin2003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2009.codfw.wmnet with OS bookworm [07:20:39] T414873: Fix Serbo-Croatian Wikipedia wordmark and tagline - https://phabricator.wikimedia.org/T414873 [07:20:39] T414868: Change Serbo-Croatian Wiktionary logo - https://phabricator.wikimedia.org/T414868 [07:23:15] (03PS1) 10Muehlenhoff: Add 
seanleong-wmde to analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/1304725 (https://phabricator.wikimedia.org/T429474) [07:23:53] jmm@cumin2003 reimage (PID 3581433) is awaiting input [07:25:50] 06SRE, 10SRE-Access-Requests: Requesting access for lerickson to deploy the RDF streaming updater on wikikube - https://phabricator.wikimedia.org/T429610#12039262 (10MoritzMuehlenhoff) @lerickson This needs approval by your manager. Can they please approve here on task? @thcipriani This needs your approval for... [07:30:54] !log jmm@cumin2003 START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm [07:32:41] !log update pfw policies - T429543 [07:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:34:14] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1179: Moving db1203 to x1 T429562 [07:34:18] T429562: Move a host to x1 - https://phabricator.wikimedia.org/T429562 [07:37:13] almost there... [07:37:21] !log dcausse@deploy1003 dcausse, vipz: Backport for [[gerrit:1304686|shwiki: update wordmark and tagline (T414873)]], [[gerrit:1304688|shwiktionary: update logo, wordmark and tagline (T414868)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [07:37:27] T414873: Fix Serbo-Croatian Wikipedia wordmark and tagline - https://phabricator.wikimedia.org/T414873 [07:37:28] T414868: Change Serbo-Croatian Wiktionary logo - https://phabricator.wikimedia.org/T414868 [07:37:36] Vipz: this should be ready for testing [07:38:26] dcausse: k8s-mwdebug, do I check any of the boxes? [07:39:00] Vipz: k8s-mwdebug is fine, others are for testing more precisely what cluster to hit [07:39:04] (03CR) 10Abijeet Patro: [V:03+2] Localisation updates from https://translatewiki.net. 
[phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1304025 (owner: 10L10n-bot) [07:40:11] (03CR) 10Muehlenhoff: [C:03+2] Add seanleong-wmde to analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/1304725 (https://phabricator.wikimedia.org/T429474) (owner: 10Muehlenhoff) [07:42:37] dcausse: Testing shwiki first. I am not seeing any changes on shwiki... [07:42:45] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-wmde-users for Seanleong-WMDE - https://phabricator.wikimedia.org/T429474#12039302 (10MoritzMuehlenhoff) 05In progress→03Resolved a:03MoritzMuehlenhoff Your access to analytics-wmde-users has been enabled and will prop... [07:43:07] !log jmm@cumin2003 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage [07:43:15] some images have changed, but indeed, I don't see the cyrillic version when switching from latin to cyrl... [07:43:46] Vipz: the lat vs cyrl variant seems to work OK on wiktionary [07:44:05] but not on sh.wikipedia.org... or I'm not testing properly [07:44:24] (03CR) 10Bartosz Wójtowicz: [C:03+1] ml-services: deploy cope-b-a4b isvc in LW prod [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304715 (https://phabricator.wikimedia.org/T427497) (owner: 10Kevin Bazira) [07:44:33] dcausse: shwiktionary appears to work properly in all aspects, both Vector 2022 and Vector legacy. 
[07:44:47] (03CR) 10Kevin Bazira: [C:03+2] ml-services: deploy cope-b-a4b isvc in LW prod [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304715 (https://phabricator.wikimedia.org/T427497) (owner: 10Kevin Bazira) [07:44:55] so something not right on shwiki, looking closer [07:48:01] (03Merged) 10jenkins-bot: ml-services: deploy cope-b-a4b isvc in LW prod [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304715 (https://phabricator.wikimedia.org/T427497) (owner: 10Kevin Bazira) [07:48:05] So shwiki's logo has been behaving weird in general prior to this. For whatever reason, logged-in users always see the Cyrillic variant and unlogged users see the Latin variant most of the time. [07:48:21] weird... [07:49:18] dcausse: If I want to send screenshots, where do I upload them? [07:49:29] both variants seem to point to wikipedia-wordmark-sh.svg... [07:50:11] Vipz: as a comment of T414873 in phabricator is fine I think [07:50:12] T414873: Fix Serbo-Croatian Wikipedia wordmark and tagline - https://phabricator.wikimedia.org/T414873 [07:50:24] !log jmm@cumin2003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage [07:54:11] Vipz: what should I do? 
From my POV the site is not broken and I'm fine shipping this if you believe that is a step in the right direction but totally OK to abort the deployment if you prefer [07:55:01] (03PS1) 10Federico Ceratto: sre.mysql.clone: set username HTTP header [cookbooks] - 10https://gerrit.wikimedia.org/r/1304730 (https://phabricator.wikimedia.org/T429748) [07:56:25] !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [07:56:59] I suspect that the config you based your changes on assumed that wikipedia-wordmark-sh-latn.svg was used but apparently wikipedia-wordmark-sh.svg is the one used [07:57:33] (03Abandoned) 10Marostegui: clone.py: Remove trailing "/" being added when cloning [cookbooks] - 10https://gerrit.wikimedia.org/r/1304719 (owner: 10Marostegui) [07:57:56] dcausse: Did I not overwrite the prefixless -sh.svg? [07:58:24] !log kevinbazira@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . [07:58:47] 06SRE, 10SRE-tools, 06Infrastructure-Foundations: Traceback with IRC notifications on Trixie - https://phabricator.wikimedia.org/T429681#12039356 (10MoritzMuehlenhoff) 05Open→03Invalid No code change was needed, this was solely triggered by a missing ACL which got fixed with https://gerrit.wikimedia.... [07:58:53] Vipz: you did indeed, sorry, it's now latin [07:59:30] but yes switching to cyrl it still wants to load the suffixless image [07:59:41] so the variant handling appears broken somehow [07:59:56] dcausse: I am not well versed in Phabricator/deployment guidelines, will shipping it without actual confirmation that it works breach any? [08:00:34] !log kevinbazira@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' . 
[08:02:18] Vipz: for me the site is fine, but you have to confirm that you're OK with the current behavior, so far shwiktionary is working as expected but shwiki now displays latin for all variants vs cyrl for all variants [08:02:56] so the big change is the logo on shwiki that seemed to have defaulted to latin without the variant handling working properly [08:03:48] dcausse: I don't believe this is a fault of the new patch. I'll need to examine what's been breaking variant behaviour on shwiki that is probably preventing this patch from working. [08:04:23] So I believe this should be shipped. [08:04:30] Vipz: sounds good [08:05:00] !log dcausse@deploy1003 dcausse, vipz: Continuing with deployment [08:06:19] Vipz: after the deploy I'll have to purge some caches so the change won't be immediately visible (for the files you changed) [08:06:43] jouncebot: nowandnext [08:06:43] No deployments scheduled for the next 1 hour(s) and 53 minute(s) [08:06:43] In 1 hour(s) and 53 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1000) [08:08:25] I mean, where is it going to pull the Cyrillic variant from if it no longer exists as -sh.svg once this patch goes live? [08:08:25] (03CR) 10Elukey: [C:03+2] "LGTM thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1304677 (https://phabricator.wikimedia.org/T388287) (owner: 10Hashar) [08:08:41] 06SRE, 10SRE-swift-storage, 06Commons, 10media-backups, 10MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#12039400 (10TheDJ) As a volunteer, I fully support the comments by @jcrespo [08:10:03] !log jmm@cumin2003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2009.codfw.wmnet with OS bookworm [08:10:19] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [08:10:19] !log marostegui@cumin1003 dbmaint on es7@eqiad T429463 [08:10:24] T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463 [08:10:40] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1048: Upgrading es1048.eqiad.wmnet [08:10:47] (03PS1) 10Muehlenhoff: proton: Bump image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304735 [08:11:37] Vipz: that I don't know, I feel that the variant handling was possibly broken before this patch, my understanding is that it's now going to default to latin instead of cyrl [08:12:20] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc2017: pc7 migration to debian trixie [08:12:20] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [08:12:27] Vipz: if this is a big problem (defaulting to latin) I can ship a revert quickly after just for sh.wikipedia.org [08:12:28] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [08:12:28] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2017: pc7 migration to debian trixie [08:12:51] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [08:12:51] !log marostegui@cumin1003 dbmaint on pc7@codfw T429178 [08:12:55] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [08:12:57] T429178: Upgrade pc7 and pc8 to Debian 
Trixie - https://phabricator.wikimedia.org/T429178 [08:13:26] dcausse: In fact, it's better to have a Latin-script variant as default, so reverting will not be needed. [08:13:40] sounds good [08:14:12] If you look closely, shwiki is a Latin-default at the moment with a one-way langconverter to Cyrillic. [08:14:21] sh defaults to sh-latn [08:14:25] ack [08:14:58] But that's irrelevant to this issue, we're targeting the language settings, not the langconverter ones. [08:15:14] Thank you so much for guiding me through my first deployment dcausse! [08:15:25] Vipz: you're welcome! [08:15:34] I hope we'll see soon when I get around to this bug. :') [08:15:42] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1048: Upgrading es1048.eqiad.wmnet [08:15:46] (03CR) 10Muehlenhoff: [C:03+2] proton: Bump image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304735 (owner: 10Muehlenhoff) [08:15:50] Vipz: sure! please follow up on the task and possibly file a new one for this bug [08:16:38] Vipz: note that the deployment will be fully done once we get a final ping from logmsgbot [08:17:14] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [08:17:14] !log marostegui@cumin1003 dbmaint on pc7@codfw T429178 [08:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. 
- https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [08:17:16] (03CR) 10Ayounsi: Cookbook to configure switch port vlans for cloud hosts (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303397 (https://phabricator.wikimedia.org/T429466) (owner: 10Cathal Mooney) [08:17:17] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [08:17:29] !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304686|shwiki: update wordmark and tagline (T414873)]], [[gerrit:1304688|shwiktionary: update logo, wordmark and tagline (T414868)]] (duration: 56m 56s) [08:17:34] T414873: Fix Serbo-Croatian Wikipedia wordmark and tagline - https://phabricator.wikimedia.org/T414873 [08:17:34] T414868: Change Serbo-Croatian Wiktionary logo - https://phabricator.wikimedia.org/T414868 [08:17:36] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] proton: Bump image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304735 (owner: 10Muehlenhoff) [08:17:40] Vipz: in 1409 it should have been sh-cyrl: instead of shwiki-cyrl: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1304686/1/logos/config.yaml [08:17:54] Vipz: ok it's now deployed, I'll clear the caches in the next 10mins or so [08:18:42] marostegui@cumin1003 major-upgrade (PID 1736046) is awaiting input [08:20:05] extending the backport window to ship another patch [08:20:33] anzx: That isn't supposed to prevent the change from working, is it? I'll be correcting this in the next round of patches, unless it's actual issue. 
[08:21:03] !log jmm@deploy1003 helmfile [staging] START helmfile.d/services/proton: apply [08:21:05] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc2017.codfw.wmnet with reason: Reimage to Trixie [08:21:08] the* actual issue [08:21:08] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1048.eqiad.wmnet with OS trixie [08:21:09] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc2017: Reimage to Trixie [08:21:09] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [08:21:15] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [08:21:15] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2017: Reimage to Trixie [08:21:49] !log jmm@deploy1003 helmfile [staging] DONE helmfile.d/services/proton: apply [08:22:06] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dcausse@deploy1003 using scap backport" [extensions/Translate] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304565 (https://phabricator.wikimedia.org/T429479) (owner: 10DCausse) [08:23:11] Vipz: probably, because shwiki-cyrl is not the correct language variant, which you're supposed to change, so changes may not appear [08:23:34] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS trixie [08:24:03] anzx: Why did shwiktionary-cyrl work as expected for sh.wiktionary.org? 
[08:24:04] (03Merged) 10jenkins-bot: ttmserver-export: pass source language for translation batch IDs [extensions/Translate] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304565 (https://phabricator.wikimedia.org/T429479) (owner: 10DCausse) [08:24:24] !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1304565|ttmserver-export: pass source language for translation batch IDs (T429479)]] [08:24:29] T429479: Investigate difference between batchInsertTranslations and update (from web) codepaths - https://phabricator.wikimedia.org/T429479 [08:24:39] !log jmm@deploy1003 helmfile [codfw] START helmfile.d/services/proton: apply [08:25:50] !log jmm@deploy1003 helmfile [codfw] DONE helmfile.d/services/proton: apply [08:26:17] I don't see any issues with how it's defined at the moment, sh-cyrl is referring to shwiki-cyrl which is defined above... [08:28:23] !log jmm@deploy1003 helmfile [eqiad] START helmfile.d/services/proton: apply [08:28:26] In fact, naming it sh-cyrl might actually break it. I referred to already existing variant naming schemes when naming this. 
[08:28:38] !log dcausse@deploy1003 mwscript-k8s job started: purgeList.php --wiki enwiki # purging cache changed images (T414873, T414868) [08:28:43] T414873: Fix Serbo-Croatian Wikipedia wordmark and tagline - https://phabricator.wikimedia.org/T414873 [08:28:44] T414868: Change Serbo-Croatian Wiktionary logo - https://phabricator.wikimedia.org/T414868 [08:28:48] Vipz: or it might be cache, may work when images are purged https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers#Purging [08:29:21] Vipz: or it might be cache, may work when images are purged https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers#Purging [08:29:37] !log jmm@deploy1003 helmfile [eqiad] DONE helmfile.d/services/proton: apply [08:30:07] !log dcausse@deploy1003 dcausse: Backport for [[gerrit:1304565|ttmserver-export: pass source language for translation batch IDs (T429479)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [08:30:11] T429479: Investigate difference between batchInsertTranslations and update (from web) codepaths - https://phabricator.wikimedia.org/T429479 [08:32:09] !log dcausse@deploy1003 dcausse: Continuing with deployment [08:33:14] Vipz, anzx caches should have been purged for changed images, please let me know if you see something not matching [08:33:23] dcausse, anzx: Everything works properly for non-logged-in users. [08:33:51] Logged-in users are still affected and only the Latin variant shows up regardless of lang settings. [08:36:18] Well, it works as well as it can. We'll probably be revisiting this. 
[08:36:25] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage [08:38:55] !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304565|ttmserver-export: pass source language for translation batch IDs (T429479)]] (duration: 14m 31s) [08:38:59] T429479: Investigate difference between batchInsertTranslations and update (from web) codepaths - https://phabricator.wikimedia.org/T429479 [08:39:29] In non-logged view, &uselang= &variant= when set to one of valid values shows the wordmark/tagline properly: src="/static/images/mobile/copyright/wikipedia-wordmark-sh.svg" style="width: 7.5em; height: 1.3125em;"> [08:39:33] (03CR) 10Marostegui: [C:03+1] "cloning went fine" [cookbooks] - 10https://gerrit.wikimedia.org/r/1304730 (https://phabricator.wikimedia.org/T429748) (owner: 10Federico Ceratto) [08:39:52] alright, I'm done deploying, closing the backport window [08:40:18] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage [08:40:32] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage [08:44:37] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage [08:46:47] 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#12039576 (10MoritzMuehlenhoff) [08:46:52] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1179.eqiad.wmnet onto db1203.eqiad.wmnet [08:47:23] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1179: repool after recloning another host [08:47:27] (03CR) 10Muehlenhoff: [C:03+2] sre.puppet.disable-merges: Avoid using puppet-merge [cookbooks] - 10https://gerrit.wikimedia.org/r/1303997 (https://phabricator.wikimedia.org/T423121) 
(owner: 10Muehlenhoff) [08:47:48] (03CR) 10Federico Ceratto: "Adding Ceri as reviewer." [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto) [08:48:20] (03CR) 10Federico Ceratto: [C:03+2] sre.mysql.clone: set username HTTP header [cookbooks] - 10https://gerrit.wikimedia.org/r/1304730 (https://phabricator.wikimedia.org/T429748) (owner: 10Federico Ceratto) [08:48:46] (03PS1) 10Marostegui: db1203: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1304750 (https://phabricator.wikimedia.org/T429562) [08:48:54] (03PS2) 10Federico Ceratto: sre.mysql.clone: set username HTTP header [cookbooks] - 10https://gerrit.wikimedia.org/r/1304730 (https://phabricator.wikimedia.org/T429748) [08:51:22] (03CR) 10Marostegui: [C:03+2] db1203: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1304750 (https://phabricator.wikimedia.org/T429562) (owner: 10Marostegui) [08:54:40] (03PS1) 10Muehlenhoff: Add laurabarluzzi to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1304751 (https://phabricator.wikimedia.org/T429431) [08:55:00] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12039632 (10MoritzMuehlenhoff) [08:55:59] (03PS1) 10Jforrester: [abstractwiki] Enable Abstract Client mode (and on test wiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304752 [08:56:38] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1048.eqiad.wmnet with OS trixie [09:01:24] I'm going to deploy a private code change [09:03:22] (03PS1) 10Elukey: _cookbook: add the mgmt_password config field [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) [09:04:08] 06SRE, 10SRE-swift-storage, 07Essential-Work: Migrate production swift clusters to trixie - 
https://phabricator.wikimedia.org/T429630#12039706 (10MatthewVernon) [09:06:07] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1048: Migration of es1048.eqiad.wmnet completed [09:06:35] (03CR) 10Federico Ceratto: [V:03+2 C:03+2] sre.mysql.clone: set username HTTP header [cookbooks] - 10https://gerrit.wikimedia.org/r/1304730 (https://phabricator.wikimedia.org/T429748) (owner: 10Federico Ceratto) [09:08:04] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2017.codfw.wmnet with OS trixie [09:08:06] (03CR) 10CI reject: [V:04-1] _cookbook: add the mgmt_password config field [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey) [09:08:59] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc2017: after reimage to trixie [09:08:59] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc2017: after reimage to trixie [09:09:29] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc1017.eqiad.wmnet with reason: Reimage to Trixie [09:09:32] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc1017: Reimage to Trixie [09:09:32] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [09:09:38] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [09:09:38] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc1017: Reimage to Trixie [09:10:57] !log Deployed private code changes to Suggested Investigations [09:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:24] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS trixie [09:29:04] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage [09:32:47] !log marostegui@cumin1003 END 
(PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1179: repool after recloning another host [09:33:03] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1203: repool after recloning another host [09:35:05] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage [09:38:23] (03PS2) 10Jforrester: [abstractwiki] Enable Abstract Client mode (and on test wiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304752 (https://phabricator.wikimedia.org/T422657) [09:39:24] !log atsuko@deploy1003 mwscript-k8s job started: foreachwikiindblist mwscript.dblist extensions/Translate/scripts/ttmserver-export.php --ttmserver eqiad-k8s # T425377 populating translation memory (dblist: https://phabricator.wikimedia.org/P94306) [09:39:28] T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377 [09:39:31] !log atsuko@deploy1003 mwscript-k8s job started: foreachwikiindblist mwscript.dblist extensions/Translate/scripts/ttmserver-export.php --ttmserver codfw-k8s # T425377 populating translation memory (dblist: https://phabricator.wikimedia.org/P94307) [09:41:32] (03PS1) 10Blake: Add wikikube-worker refreshes. [puppet] - 10https://gerrit.wikimedia.org/r/1304756 (https://phabricator.wikimedia.org/T424942) [09:43:07] Doing a quick config deploy. 
[09:43:10] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304752 (https://phabricator.wikimedia.org/T422657) (owner: 10Jforrester) [09:44:10] (03Merged) 10jenkins-bot: [abstractwiki] Enable Abstract Client mode (and on test wiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304752 (https://phabricator.wikimedia.org/T422657) (owner: 10Jforrester) [09:44:26] !log jforrester@deploy1003 Started scap sync-world: Backport for [[gerrit:1304752|[abstractwiki] Enable Abstract Client mode (and on test wiki) (T422657)]] [09:44:30] T422657: Enable abstract client mode on Test Wikipedia - https://phabricator.wikimedia.org/T422657 [09:46:13] !log jforrester@deploy1003 jforrester: Backport for [[gerrit:1304752|[abstractwiki] Enable Abstract Client mode (and on test wiki) (T422657)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [09:48:20] !log jforrester@deploy1003 jforrester: Continuing with deployment [09:51:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1048: Migration of es1048.eqiad.wmnet completed [09:51:37] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [09:52:37] !log jforrester@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304752|[abstractwiki] Enable Abstract Client mode (and on test wiki) (T422657)]] (duration: 08m 11s) [09:52:40] T422657: Enable abstract client mode on Test Wikipedia - https://phabricator.wikimedia.org/T422657 [09:57:33] jouncebot: nowandnext [09:57:33] No deployments scheduled for the next 0 hour(s) and 2 minute(s) [09:57:33] In 0 hour(s) and 2 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1000) [09:57:39] * Raine first try! 
[09:57:39] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS trixie [09:58:15] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations, 10Phabricator: Add logout.d script for Phabricator - https://phabricator.wikimedia.org/T286904#12039921 (10LSobanski) @brennen @Aklapper https://phabricator.wikimedia.org/T406495 is now complete, how would you like to proceed from here? [09:58:33] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1017: after reimage to trixie [09:58:33] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1017: after reimage to trixie [10:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1000) [10:02:53] All right, time to redirect away the API Portal wiki *cracks knuckles* [10:02:55] (03CR) 10Hnowlan: "Please go ahead, thank you!" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1303420 (owner: 10Ssingh) [10:03:39] (03PS1) 10Gkyziridis: ml-services: Deploy Qwen3.6-27B-FP8 model in experimental ns. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1304758 (https://phabricator.wikimedia.org/T425680) [10:04:41] (03PS1) 10Tiziano Fogli: slothslos/report2drive: remove report2drive::instances [labs/private] - 10https://gerrit.wikimedia.org/r/1304759 (https://phabricator.wikimedia.org/T425795) [10:05:06] (03CR) 10Tiziano Fogli: [C:03+2] slothslos/report2drive: remove report2drive::instances [labs/private] - 10https://gerrit.wikimedia.org/r/1304759 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:05:09] (03CR) 10Tiziano Fogli: [V:03+2 C:03+2] slothslos/report2drive: remove report2drive::instances [labs/private] - 10https://gerrit.wikimedia.org/r/1304759 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:05:29] (03CR) 10Clément Goubert: [C:03+2] redirects.dat: Funnel api.w.o to mw.o/wiki/Wikimedia_APIs [puppet] - 10https://gerrit.wikimedia.org/r/1302106 (https://phabricator.wikimedia.org/T418492) (owner: 10Clément Goubert) [10:06:49] (03PS1) 10Kosta Harlan: hCaptcha: Skip blocked-IP risk-score collection for crawlers [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304761 (https://phabricator.wikimedia.org/T429755) [10:07:04] FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [10:10:21] I'll have a private patch about Suggested Investigations to deploy. Can someone ping me when it's okay to deploy it? 
[10:10:59] (03CR) 10Kamila Součková: "They are using it, but with `general` instead of `clusterinfo` because of an oversight on my part, see I2066ac45829c40ed6b68b14be2c7673eba" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304588 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [10:11:43] Msz2001: We're pretty packed for the window between my redirect and Raine updating mw debian base, can we discuss urgency in _security? [10:11:49] RESOLVED: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [10:12:13] It's not urgent, so I'll wait [10:13:01] ack [10:13:46] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1017: repool after maintenance [10:13:46] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1017: repool after maintenance [10:14:07] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1017: repool after maintenance [10:14:07] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1017: repool after maintenance [10:14:18] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc2017: repool after maintenance [10:14:19] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc2017: repool after maintenance [10:14:41] (03CR) 10Sergio Gimeno: "This should be good to go now." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1301349 (https://phabricator.wikimedia.org/T426742) (owner: 10Sergio Gimeno) [10:15:12] thanks Msz2001 + claime <3 [10:15:46] !log cgoubert@deploy1003 Started scap sync-world: T418492 Redirect API Portal wiki URLs to www.mediawiki.org/wiki/Wikimedia_APIs [10:15:50] T418492: Redirect API Portal wiki URLs to www.mediawiki.org/wiki/Wikimedia_APIs - https://phabricator.wikimedia.org/T418492 [10:16:13] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc2017: repool after maintenance [10:16:14] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [10:16:26] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [10:16:27] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc2017: repool after maintenance [10:16:34] !log cgoubert@deploy1003 cgoubert: T418492 Redirect API Portal wiki URLs to www.mediawiki.org/wiki/Wikimedia_APIs synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[10:17:26] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: sync [10:18:07] All tests green, proceeding [10:18:12] !log cgoubert@deploy1003 cgoubert: Continuing with deployment [10:18:14] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: sync [10:18:28] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: sync [10:18:28] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1203: repool after recloning another host [10:19:15] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync [10:19:24] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: sync [10:20:12] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync [10:20:17] (03CR) 10Elukey: log: fix tests (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [10:21:04] (03CR) 10Volans: _cookbook: add the mgmt_password config field (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey) [10:21:49] FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [10:22:06] !log cgoubert@deploy1003 Finished scap sync-world: T418492 Redirect API Portal wiki URLs to www.mediawiki.org/wiki/Wikimedia_APIs (duration: 07m 17s) [10:22:11] T418492: Redirect API Portal wiki URLs to www.mediawiki.org/wiki/Wikimedia_APIs - https://phabricator.wikimedia.org/T418492 [10:22:21] (03CR) 10Kevin Bazira: [C:03+1] ml-services: Deploy Qwen3.6-27B-FP8 model in experimental ns. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1304758 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [10:23:44] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:23:48] (03PS1) 10Tiziano Fogli: slothslos/report2drive: adjust file extension [labs/private] - 10https://gerrit.wikimedia.org/r/1304763 (https://phabricator.wikimedia.org/T425795) [10:24:30] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:24:33] (03CR) 10Tiziano Fogli: [C:03+2] slothslos/report2drive: adjust file extension [labs/private] - 10https://gerrit.wikimedia.org/r/1304763 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:24:35] (03CR) 10Tiziano Fogli: [V:03+2 C:03+2] slothslos/report2drive: adjust file extension [labs/private] - 10https://gerrit.wikimedia.org/r/1304763 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:24:36] PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:24:51] The httpbb errors are expected, I just forgot to run puppet on cumin nodes to update the tests [10:24:53] doing so now [10:26:09] (03PS2) 10JMeybohm: Update istio to 1.29.4 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1303475 (https://phabricator.wikimedia.org/T427401) [10:27:15] (03CR) 10JMeybohm: [V:03+2 C:03+2] Update istio to 1.29.4 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1303475 (https://phabricator.wikimedia.org/T427401) (owner: 
10JMeybohm) [10:27:22] (03PS1) 10Muehlenhoff: Failover url-downloader.codfw CNAME to one of the new Trixie hosts [dns] - 10https://gerrit.wikimedia.org/r/1304764 (https://phabricator.wikimedia.org/T427282) [10:28:32] (03CR) 10Gkyziridis: [C:03+2] ml-services: Deploy Qwen3.6-27B-FP8 model in experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304758 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [10:29:18] (03PS1) 10Hnowlan: hadoop: emit HDFS HA status to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1304766 (https://phabricator.wikimedia.org/T407138) [10:29:20] (03PS1) 10Hnowlan: hadoop: remove migrated hadoop-hdfs-active-namenode icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1304767 (https://phabricator.wikimedia.org/T407138) [10:29:21] (03PS1) 10Hnowlan: hadoop: add hdfs alert for HA status [alerts] - 10https://gerrit.wikimedia.org/r/1304769 (https://phabricator.wikimedia.org/T407138) [10:29:22] (03PS1) 10Hnowlan: hadoop: migrate hdfs topology check to alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/1304768 (https://phabricator.wikimedia.org/T407138) [10:31:10] (03Merged) 10jenkins-bot: ml-services: Deploy Qwen3.6-27B-FP8 model in experimental ns. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1304758 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [10:32:09] All looks fine, Raine I think you're g2g when ready [10:32:18] (03CR) 10CI reject: [V:04-1] hadoop: remove migrated hadoop-hdfs-active-namenode icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1304767 (https://phabricator.wikimedia.org/T407138) (owner: 10Hnowlan) [10:33:16] (03PS2) 10Hnowlan: hadoop: remove migrated hadoop-hdfs-active-namenode icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1304767 (https://phabricator.wikimedia.org/T407138) [10:33:38] (03PS9) 10Tiziano Fogli: slothslos/report2drive: add modules [puppet] - 10https://gerrit.wikimedia.org/r/1298294 (https://phabricator.wikimedia.org/T425795) [10:33:38] (03PS11) 10Tiziano Fogli: slothslos/report2drive: add profiles [puppet] - 10https://gerrit.wikimedia.org/r/1298295 (https://phabricator.wikimedia.org/T425795) [10:33:38] (03PS11) 10Tiziano Fogli: slothslos/report2drive: instantiate resources [puppet] - 10https://gerrit.wikimedia.org/r/1298296 (https://phabricator.wikimedia.org/T425795) [10:33:39] (03PS11) 10Tiziano Fogli: slothslos/report2drive: add Hiera configuration [puppet] - 10https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) [10:33:40] (03PS11) 10Tiziano Fogli: slothslos/report2drive: enable deep merge for vars [puppet] - 10https://gerrit.wikimedia.org/r/1298298 (https://phabricator.wikimedia.org/T425795) [10:33:44] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:34:30] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:34:36] RECOVERY - Check unit status 
of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:34:41] (03PS12) 10Tiziano Fogli: slothslos/report2drive: add profiles [puppet] - 10https://gerrit.wikimedia.org/r/1298295 (https://phabricator.wikimedia.org/T425795) [10:34:41] (03PS12) 10Tiziano Fogli: slothslos/report2drive: instantiate resources [puppet] - 10https://gerrit.wikimedia.org/r/1298296 (https://phabricator.wikimedia.org/T425795) [10:34:41] (03PS12) 10Tiziano Fogli: slothslos/report2drive: add Hiera configuration [puppet] - 10https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) [10:34:42] (03PS12) 10Tiziano Fogli: slothslos/report2drive: enable deep merge for vars [puppet] - 10https://gerrit.wikimedia.org/r/1298298 (https://phabricator.wikimedia.org/T425795) [10:34:54] (03PS1) 10Jforrester: [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304770 [10:35:20] thanks claime <3 on it [10:35:54] (03PS2) 10Jforrester: [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) [10:36:21] !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
[10:37:28] (03CR) 10Kamila Součková: [C:03+2] kubernetes: switch mw-{debug,experimental} to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1302929 (https://phabricator.wikimedia.org/T429030) (owner: 10Kamila Součková) [10:41:13] (03CR) 10Clément Goubert: [C:03+1] dse-k8s-services/*: Fix early inclusion of clusterinfo values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304573 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [10:41:39] (03CR) 10Clément Goubert: [C:03+1] aux-k8s-services/*: Fix early inclusion of clusterinfo values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304572 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [10:41:56] (03PS2) 10Clément Goubert: tls_terminator: Fix ratelimit config [puppet] - 10https://gerrit.wikimedia.org/r/1304586 (https://phabricator.wikimedia.org/T414440) [10:42:20] (03CR) 10Kamila Součková: [C:03+2] aux-k8s-services/*: Fix early inclusion of clusterinfo values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304572 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [10:43:43] (03CR) 10Elukey: [C:04-1] "Had a chat with Riccardo, needs more thinking/refactoring!" 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey) [10:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [10:45:19] (03Merged) 10jenkins-bot: aux-k8s-services/*: Fix early inclusion of clusterinfo values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304572 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [10:45:21] (03CR) 10Elukey: [C:03+1] slothslos/report2drive: add profiles [puppet] - 10https://gerrit.wikimedia.org/r/1298295 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:45:47] !log upgrading mw-debug+mw-experimental to Bookworm [10:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:46:55] (03CR) 10Elukey: slothslos/report2drive: add Hiera configuration (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:47:19] (03CR) 10Elukey: "This can be merged with the next change, or do you prefer to gradually add things?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1298296 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [10:47:28] !log kamila@deploy1003 Started scap sync-world: Upgrading mw-debug,mw-experimental to Debian Bookworm [10:59:09] (03CR) 10JMeybohm: [C:03+1] "Sounds good, thanks" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1303389 (owner: 10Clément Goubert) [10:59:18] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [10:59:18] !log marostegui@cumin1003 dbmaint on es7@eqiad T429463 [10:59:24] T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463 [10:59:38] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1040: Upgrading es1040.eqiad.wmnet [11:02:30] (03PS6) 10Federico Ceratto: tox.ini: Pass cache env var [cookbooks] - 10https://gerrit.wikimedia.org/r/1302159 [11:02:38] (03CR) 10EMcFarland: [C:03+2] "I'm approving this now that wmf.7 is broadly deployed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1301349 (https://phabricator.wikimedia.org/T426742) (owner: 10Sergio Gimeno) [11:02:40] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1040: Upgrading es1040.eqiad.wmnet [11:02:43] !log kamila@deploy1003 sync-world failed: Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.47.0-wmf.6,1.47.0-wmf.7,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/medi [11:02:43] awiki-multiversion --singleversion-image-basename docker-registry.discovery.wmnet/restricted/mediawiki-singleversion --webserver-image-name docker-registry.discovery.wmnet/restricted/mediawiki-webserver --latest-tag latest --label vnd.wikimedia.builder.name=scap --label vnd.wikimedia.builder.version=4.269.0 --label 
vnd.wikimedia.scap.stage_dir=/srv/mediawiki-staging --label vnd.wikimedia.scap.build_state_dir=/srv/mediawik [11:02:43] i-staging/scap/image-build --full' returned non-zero exit status 1. (scap version: 4.269.0) (duration: 16m 07s) [11:03:31] (03Merged) 10jenkins-bot: Remove no longer used eventlogging_HomepageModule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1301349 (https://phabricator.wikimedia.org/T426742) (owner: 10Sergio Gimeno) [11:03:32] PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [11:04:01] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1040.eqiad.wmnet with OS trixie [11:06:53] hi, a Growth engineer naively +2'ed https://gerrit.wikimedia.org/r/1301349 and now it's merged without being actually scheduled for deploy. How should I handle this? Revert + schedule? Or is it ok if I schedule it for the next window? [11:13:31] RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [11:19:37] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: host reimage [11:19:54] !log kamila@deploy1003 Started scap sync-world: Upgrade mw-debug,mw-experimental to debian bookworm [11:20:05] sergi0: next window works, methinks [11:21:08] deployments might take longer than usual though, so perhaps coordinate with folks to batch it [11:21:39] (we're temporarily building 2 versions of images because I'm upgrading the debian version today) [11:23:05] gotcha, I'm gonna schedule it right away and will chime in before window starts. The change is pretty harmless and can go along with others. thank you @Raine ! [11:23:27] sounds good [11:23:58] Thank you! 
[11:24:25] yw :-) [11:24:46] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: host reimage [11:37:18] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841#12040275 (10SLyngshede-WMF) 05Open→03In progress Currently targeting CAS 7.3.X series. [11:37:27] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841#12040277 (10SLyngshede-WMF) a:03SLyngshede-WMF [11:40:19] (03CR) 10Clément Goubert: [C:03+2] ratelimit: Unify statsd-exporter labels [deployment-charts] - 10https://gerrit.wikimedia.org/r/1303389 (owner: 10Clément Goubert) [11:40:22] (03CR) 10EMcFarland: [C:03+2] "A note for my future self: This should have been a +1 and not a +2." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1301349 (https://phabricator.wikimedia.org/T426742) (owner: 10Sergio Gimeno) [11:41:42] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1040.eqiad.wmnet with OS trixie [11:42:27] (03Merged) 10jenkins-bot: ratelimit: Unify statsd-exporter labels [deployment-charts] - 10https://gerrit.wikimedia.org/r/1303389 (owner: 10Clément Goubert) [11:47:17] !log cgoubert@deploy1003 helmfile [staging] START helmfile.d/services/ratelimit: apply [11:47:25] !log cgoubert@deploy1003 helmfile [staging] DONE helmfile.d/services/ratelimit: apply [11:47:32] !log cgoubert@deploy1003 helmfile [staging] START helmfile.d/services/ratelimit: apply [11:47:41] !log cgoubert@deploy1003 helmfile [staging] DONE helmfile.d/services/ratelimit: apply [11:48:00] !log cgoubert@deploy1003 helmfile [codfw] START helmfile.d/services/ratelimit: apply [11:48:25] !log cgoubert@deploy1003 helmfile [codfw] DONE helmfile.d/services/ratelimit: apply [11:48:37] !log cgoubert@deploy1003 helmfile [eqiad] START helmfile.d/services/ratelimit: apply [11:49:05] !log 
cgoubert@deploy1003 helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply [11:50:07] !log cgoubert@deploy1003 helmfile [codfw] START helmfile.d/services/ratelimit: apply [11:50:30] (03CR) 10Kamila Součková: [C:03+2] kubernetes: switch MW canaries to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1302930 (https://phabricator.wikimedia.org/T429030) (owner: 10Kamila Součková) [11:51:32] !log cgoubert@deploy1003 helmfile [codfw] DONE helmfile.d/services/ratelimit: apply [11:52:05] !log kamila@deploy1003 Finished scap sync-world: Upgrade mw-debug,mw-experimental to debian bookworm (duration: 33m 39s) [11:52:33] !log cgoubert@deploy1003 helmfile [eqiad] START helmfile.d/services/ratelimit: apply [11:53:25] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1040: Migration of es1040.eqiad.wmnet completed [11:53:31] !log cgoubert@deploy1003 helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply [11:58:56] jouncebot: nowandnext [11:58:56] No deployments scheduled for the next 1 hour(s) and 1 minute(s) [11:58:56] In 1 hour(s) and 1 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1300) [11:59:13] Raine: are you still deploying? 
[11:59:20] (03PS1) 10Clément Goubert: ratelimit: Fix metric name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304780 [11:59:49] kostajh: I was going to start, but in principle you can go pick up the upgrade if you're not scared :D just give puppet a minute please [11:59:58] Raine: no worries, I'll wait [12:00:02] 06SRE, 10SRE-Access-Requests: Change SSH key for denisse after new laptop provisioning - https://phabricator.wikimedia.org/T429429#12040362 (10Aklapper) [12:00:52] (03PS2) 10Hnowlan: prometheus: use dc label in appservers_red reporting rules [puppet] - 10https://gerrit.wikimedia.org/r/1302185 (https://phabricator.wikimedia.org/T249663) [12:03:02] kostajh: ok, I'm ready for a deployment, you can deploy your change, which will pick up mine along the way (mine only affects canaries and will probably be easy to distinguish) [12:03:17] or I can go first if you prefer [12:03:43] Raine: please go ahead first, and I'll go after you. thanks! [12:03:51] alright, on it [12:05:30] (03PS1) 10Slyngshede: P:idp allow selective mfa enablement [puppet] - 10https://gerrit.wikimedia.org/r/1304784 (https://phabricator.wikimedia.org/T277841) [12:06:13] !log kamila@deploy1003 Started scap sync-world: upgrade canaries to Debian Bookworm [12:06:18] (03PS1) 10Svantje Lilienthal: Global rollout - Sub-ref deployments to group 2 wikis (batch 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 (https://phabricator.wikimedia.org/T428902) [12:06:31] (03CR) 10Slyngshede: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304784 (https://phabricator.wikimedia.org/T277841) (owner: 10Slyngshede) [12:06:42] (03CR) 10Clément Goubert: [C:03+2] ratelimit: Fix metric name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304780 (owner: 10Clément Goubert) [12:06:49] (03CR) 10CI reject: [V:04-1] Global rollout - Sub-ref deployments to group 2 wikis (batch 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 
(https://phabricator.wikimedia.org/T428902) (owner: 10Svantje Lilienthal) [12:08:53] (03Merged) 10jenkins-bot: ratelimit: Fix metric name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304780 (owner: 10Clément Goubert) [12:11:01] (03PS2) 10Slyngshede: P:idp allow selective mfa enablement [puppet] - 10https://gerrit.wikimedia.org/r/1304784 (https://phabricator.wikimedia.org/T277841) [12:11:47] (03CR) 10Slyngshede: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304784 (https://phabricator.wikimedia.org/T277841) (owner: 10Slyngshede) [12:12:47] !log cgoubert@deploy1003 helmfile [staging] START helmfile.d/services/ratelimit: apply [12:12:56] !log cgoubert@deploy1003 helmfile [staging] DONE helmfile.d/services/ratelimit: apply [12:13:03] !log cgoubert@deploy1003 helmfile [codfw] START helmfile.d/services/ratelimit: apply [12:14:25] !log cgoubert@deploy1003 helmfile [codfw] DONE helmfile.d/services/ratelimit: apply [12:15:59] !log kamila@deploy1003 Finished scap sync-world: upgrade canaries to Debian Bookworm (duration: 10m 33s) [12:16:25] kostajh: done [12:16:56] Raine: thanks, deploying now [12:17:14] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304761 (https://phabricator.wikimedia.org/T429755) (owner: 10Kosta Harlan) [12:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [12:18:25] (03CR) 10CI reject: [V:04-1] Localisation updates from https://translatewiki.net. 
[phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1304791 (owner: 10L10n-bot) [12:18:38] (03Merged) 10jenkins-bot: hCaptcha: Skip blocked-IP risk-score collection for crawlers [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304761 (https://phabricator.wikimedia.org/T429755) (owner: 10Kosta Harlan) [12:18:42] !log cgoubert@deploy1003 helmfile [eqiad] START helmfile.d/services/ratelimit: apply [12:19:04] !log cgoubert@deploy1003 helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply [12:21:57] (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304792 [12:22:09] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1304761|hCaptcha: Skip blocked-IP risk-score collection for crawlers (T429755)]] [12:22:14] T429755: hCaptcha: Exclude self-identified crawlers from IP blocked edit notice risk score collection - https://phabricator.wikimedia.org/T429755 [12:22:17] sergi0: I'm syncing your merged patch now too [12:22:43] alright, I'm here [12:25:28] 06SRE, 06Infrastructure-Foundations: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12040448 (10Marostegui) [12:28:03] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1304761|hCaptcha: Skip blocked-IP risk-score collection for crawlers (T429755)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[12:28:08] T429755: hCaptcha: Exclude self-identified crawlers from IP blocked edit notice risk score collection - https://phabricator.wikimedia.org/T429755 [12:28:16] (03CR) 10Clément Goubert: [C:03+1] Failover url-downloader.codfw CNAME to one of the new Trixie hosts [dns] - 10https://gerrit.wikimedia.org/r/1304764 (https://phabricator.wikimedia.org/T427282) (owner: 10Muehlenhoff) [12:29:22] 06SRE, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware, 13Patch-For-Review: wikikube-ctrl100[56] implementation tracking - https://phabricator.wikimedia.org/T418920#12040455 (10MLechvien-WMF) @jasmine_ can you create the decommissioning task for wikikube-ctrl100[23] ? [12:31:05] (03CR) 10Jelto: [V:03+1 C:03+1] "with the `UserKnownHostsFile` set to `~/.ssh/known_hosts.d/wmf-prod` from the new `wmf-laptop` package I successfully cloned `gitlab-ssh.w" [puppet] - 10https://gerrit.wikimedia.org/r/1300763 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [12:32:03] (03PS2) 10Svantje Lilienthal: Global rollout - Sub-ref deployments to group 2 wikis (batch 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 (https://phabricator.wikimedia.org/T428902) [12:32:03] !log kharlan@deploy1003 kharlan: Continuing with deployment [12:33:23] !log atsuko@deploy1003 mwscript-k8s job started: foreachwikiindblist mwscript.dblist extensions/Translate/scripts/ttmserver-export.php --ttmserver eqiad-k8s # T425377 populating translation memory (dblist: https://phabricator.wikimedia.org/P94318) [12:33:26] !log atsuko@deploy1003 mwscript-k8s job started: foreachwikiindblist mwscript.dblist extensions/Translate/scripts/ttmserver-export.php --ttmserver codfw-k8s # T425377 populating translation memory (dblist: https://phabricator.wikimedia.org/P94319) [12:33:28] Msz2001: over to you once https://spiderpig.wikimedia.org/jobs/2342 is done [12:33:29] T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - 
https://phabricator.wikimedia.org/T425377 [12:33:35] Ack [12:38:29] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304761|hCaptcha: Skip blocked-IP risk-score collection for crawlers (T429755)]] (duration: 16m 20s) [12:38:34] T429755: hCaptcha: Exclude self-identified crawlers from IP blocked edit notice risk score collection - https://phabricator.wikimedia.org/T429755 [12:38:45] done [12:38:55] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1040: Migration of es1040.eqiad.wmnet completed [12:38:56] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [12:39:23] (03PS13) 10Tiziano Fogli: slothslos/report2drive: add Hiera configuration [puppet] - 10https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) [12:39:23] (03PS13) 10Tiziano Fogli: slothslos/report2drive: enable deep merge for vars [puppet] - 10https://gerrit.wikimedia.org/r/1298298 (https://phabricator.wikimedia.org/T425795) [12:39:32] (03CR) 10Tiziano Fogli: slothslos/report2drive: add profiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298295 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [12:39:59] (03CR) 10Tiziano Fogli: slothslos/report2drive: add Hiera configuration (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [12:40:23] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 (https://phabricator.wikimedia.org/T428902) (owner: 10Svantje Lilienthal) [12:40:33] thank you @kostajh [12:40:44] yw [12:40:52] Msz2001: over to you [12:41:44] thanks [12:42:40] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC afternoon backport 
window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1300892 (https://phabricator.wikimedia.org/T411771) (owner: 10TChin) [12:44:26] (03CR) 10Kamila Součková: [C:03+1] rest-gateway: put request ID into rate limit respose [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300775 (owner: 10Daniel Kinzler) [12:46:34] (03CR) 10Atsuko: [C:03+2] deployment_server: adding dse monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1303488 (https://phabricator.wikimedia.org/T423078) (owner: 10Atsuko) [12:51:27] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/Echo] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304222 (https://phabricator.wikimedia.org/T428198) (owner: 10Bartosz Dziewoński) [12:55:43] !log Deployed changes to private code (SuggestedInvestigations and PrivateSettings.php) [12:55:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] Lucas_WMDE, urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1300) [13:00:04] manfredi, tgr, sergi0, Msz2001, tchin, and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:17] hi. 
i need someone to deploy my change :) [13:00:59] I'm at a meeting but can deploy in 20 min or so [13:01:58] Hi, im around [13:01:59] (03PS1) 10Giuseppe Lavagetto: Fix utf-8 names for non-mod-cas [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1304799 [13:02:05] o/ [13:02:08] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Fix utf-8 names for non-mod-cas [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1304799 (owner: 10Giuseppe Lavagetto) [13:02:38] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2189.codfw.wmnet [13:02:39] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2189.codfw.wmnet [13:02:48] Hi! I deployed my change to private code just before this window, so it's already done [13:03:06] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2212.codfw.wmnet [13:03:07] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2212.codfw.wmnet [13:03:07] (I'm in a meeting as well, I'd prefer someone else to deploy the changes in this window) [13:03:09] tgr_: Hi Gergo, can you deploy my patches too please? Later is fine to me [13:03:26] sure [13:03:30] thx! [13:03:45] !log oblivian@cumin2003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix utf-8 name handling - oblivian@cumin2003" [13:03:48] !log oblivian@cumin2003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix utf-8 name handling - oblivian@cumin2003 [13:03:58] (03CR) 10Kamila Součková: [C:03+1] Add wikikube-worker refreshes. 
[puppet] - 10https://gerrit.wikimedia.org/r/1304756 (https://phabricator.wikimedia.org/T424942) (owner: 10Blake) [13:04:08] (03PS3) 10Jforrester: [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) [13:04:08] (03PS1) 10Jforrester: [testwiki] Enable Abstract Client integration mode, not just previews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304800 (https://phabricator.wikimedia.org/T422657) [13:04:36] !log oblivian@cumin2003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix utf-8 name handling - oblivian@cumin2003 [13:04:38] !log oblivian@cumin2003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix utf-8 name handling - oblivian@cumin2003" [13:05:02] (03CR) 10CI reject: [V:04-1] [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) (owner: 10Jforrester) [13:05:05] (03CR) 10CI reject: [V:04-1] [testwiki] Enable Abstract Client integration mode, not just previews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304800 (https://phabricator.wikimedia.org/T422657) (owner: 10Jforrester) [13:05:12] (03CR) 10Blake: [C:03+2] Add wikikube-worker refreshes. 
[puppet] - 10https://gerrit.wikimedia.org/r/1304756 (https://phabricator.wikimedia.org/T424942) (owner: 10Blake) [13:06:07] (03PS2) 10Jforrester: [testwiki] Enable Abstract Client integration mode, not just previews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304800 (https://phabricator.wikimedia.org/T422657) [13:06:07] (03PS4) 10Jforrester: [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) [13:08:32] I have spiderpig access but I've never tried it before, I guess I can attempt to deploy mine while everyone else is busy [13:10:01] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tchin@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1300892 (https://phabricator.wikimedia.org/T411771) (owner: 10TChin) [13:10:06] (03CR) 10Jelto: [C:03+1] "lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304573 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [13:10:43] (03CR) 10JMeybohm: [C:03+1] "This is pretty counterintuitive and makes sense 🙈" [puppet] - 10https://gerrit.wikimedia.org/r/1304586 (https://phabricator.wikimedia.org/T414440) (owner: 10Clément Goubert) [13:11:30] (03Merged) 10jenkins-bot: [PageViewInfo] Add new config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1300892 (https://phabricator.wikimedia.org/T411771) (owner: 10TChin) [13:11:47] !log tchin@deploy1003 Started scap sync-world: Backport for [[gerrit:1300892|[PageViewInfo] Add new config (T411771)]] [13:11:53] T411771: Migrate PageViewInfo calls away from rest-gateway - https://phabricator.wikimedia.org/T411771 [13:12:16] 06SRE, 06Infrastructure-Foundations: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12040636 (10CWilliams-WMF) @MoritzMuehlenhoff @Marostegui Here is confirmation from cumin2003 regarding DB access. 
Those without the grant/inaccessible: `sh cwilliams@cumin2003:~/wip $ grep -F ERRO... [13:13:47] 06SRE, 06Infrastructure-Foundations: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12040640 (10Marostegui) Thanks @CWilliams-WMF >>! In T427897#12040635, @CWilliams-WMF wrote: > @MoritzMuehlenhoff @Marostegui > > Here is confirmation from cumin2003 regarding DB access. > > Thos... [13:14:12] (03PS1) 10Muehlenhoff: mirrors: Remove rsync [puppet] - 10https://gerrit.wikimedia.org/r/1304801 (https://phabricator.wikimedia.org/T416707) [13:16:08] (03PS1) 10JavierMonton: stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304802 [13:16:20] 06SRE, 06Infrastructure-Foundations: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12040654 (10MoritzMuehlenhoff) >>! In T427897#12040635, @CWilliams-WMF wrote: > @MoritzMuehlenhoff @Marostegui > > Here is confirmation from cumin2003 regarding DB access. Thanks! [13:16:31] btw, my change was already deployed [13:16:56] (03PS2) 10JavierMonton: stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304802 (https://phabricator.wikimedia.org/T425624) [13:17:50] !log tchin@deploy1003 tchin: Backport for [[gerrit:1300892|[PageViewInfo] Add new config (T411771)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[13:17:54] T411771: Migrate PageViewInfo calls away from rest-gateway - https://phabricator.wikimedia.org/T411771 [13:19:40] !log tchin@deploy1003 tchin: Continuing with deployment [13:21:06] (03PS1) 10Muehlenhoff: Move the the hourly httpbb run to cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1304803 (https://phabricator.wikimedia.org/T427897) [13:21:37] (03CR) 10CI reject: [V:04-1] Move the the hourly httpbb run to cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1304803 (https://phabricator.wikimedia.org/T427897) (owner: 10Muehlenhoff) [13:23:17] (03CR) 10A-pizzata: [C:03+1] stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304802 (https://phabricator.wikimedia.org/T425624) (owner: 10JavierMonton) [13:24:04] 06SRE, 06Infrastructure-Foundations, 10netops: cr2-esams rpd failure after enabling bgp 'graceful-shutdown' (June 2026) - https://phabricator.wikimedia.org/T429386#12040676 (10Papaul) @cmooney thank you for the update [13:24:07] (03CR) 10JavierMonton: [C:03+2] stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304802 (https://phabricator.wikimedia.org/T425624) (owner: 10JavierMonton) [13:24:40] (03CR) 10JMeybohm: [C:03+1] role::kafka::main: add missing ACL for statsv [puppet] - 10https://gerrit.wikimedia.org/r/1304096 (https://phabricator.wikimedia.org/T425528) (owner: 10Elukey) [13:24:51] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [13:24:51] !log marostegui@cumin1003 dbmaint on es7@eqiad T429463 [13:24:58] T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463 [13:25:01] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1039: Upgrading es1039.eqiad.wmnet [13:25:22] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1039: Upgrading es1039.eqiad.wmnet [13:26:11] !log tchin@deploy1003 Finished scap sync-world: Backport for [[gerrit:1300892|[PageViewInfo] 
Add new config (T411771)]] (duration: 14m 24s) [13:26:15] T411771: Migrate PageViewInfo calls away from rest-gateway - https://phabricator.wikimedia.org/T411771 [13:26:26] (03Merged) 10jenkins-bot: stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304802 (https://phabricator.wikimedia.org/T425624) (owner: 10JavierMonton) [13:26:45] (03PS14) 10Ayounsi: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [13:26:55] manfredi: should your patches go out in that order? [13:27:17] (03CR) 10CI reject: [V:04-1] diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [13:27:19] 1304125 first [13:27:39] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS trixie [13:27:52] !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [13:28:02] (03PS1) 10Gerrit maintenance bot: mariadb: Promote es2039 to es7 master [puppet] - 10https://gerrit.wikimedia.org/r/1304804 (https://phabricator.wikimedia.org/T429794) [13:28:07] (03PS2) 10Muehlenhoff: Move the the hourly httpbb run to cumin2003 [puppet] - 10https://gerrit.wikimedia.org/r/1304803 (https://phabricator.wikimedia.org/T427897) [13:28:12] (03PS5) 10Jforrester: wikifunctions: Switch JavaScript evaluator to Rust-based version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300271 (https://phabricator.wikimedia.org/T417870) [13:28:24] (03CR) 10Ayounsi: "I tested it with the migration of diffscan to Trixie, fixed a few bugs on the way and it now runs correctly." 
[puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [13:28:44] (03CR) 10Jforrester: [C:03+2] wikifunctions: Switch JavaScript evaluator to Rust-based version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300271 (https://phabricator.wikimedia.org/T417870) (owner: 10Jforrester) [13:29:20] !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [13:30:05] spiderpig says "🐌 Change(s) 1304125 touch l10n-related files and are likely to trigger a large l10n rebuild, resulting in a slow deployment (~20 minutes). [13:30:25] but it doesn't actually do that, does it? [13:30:39] i don't think it does [13:30:48] no, doesn't look like it… [13:30:59] not sure what could be getting misdetected there [13:31:15] (03Merged) 10jenkins-bot: wikifunctions: Switch JavaScript evaluator to Rust-based version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300271 (https://phabricator.wikimedia.org/T417870) (owner: 10Jforrester) [13:31:18] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tgr@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304125 (https://phabricator.wikimedia.org/T428293) (owner: 10Mmartorana) [13:31:23] (03PS15) 10Ayounsi: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [13:31:47] https://gitlab.wikimedia.org/repos/releng/scap/-/blob/3b03fae7b32a75427d7e8c429f059f080c2d89aa/scap/backport.py#L1793 extension.json, apparently [13:32:10] i guess an extension.json change could be adding i18n directories? 
[13:32:26] let's hope it does not actually rebuild l10n [13:32:42] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [13:33:12] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [13:33:30] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [13:33:47] (03PS16) 10Ayounsi: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [13:34:08] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [13:34:09] (03CR) 10Andrew Bogott: [C:03+1] Put cloudvirt10[77-80] in service [puppet] - 10https://gerrit.wikimedia.org/r/1303962 (https://phabricator.wikimedia.org/T429563) (owner: 10Filippo Giunchedi) [13:34:16] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [13:34:35] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [13:34:37] (03CR) 10WMDE-Fisch: [C:03+1] Global rollout - Sub-ref deployments to group 2 wikis (batch 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 (https://phabricator.wikimedia.org/T428902) (owner: 10Svantje Lilienthal) [13:34:40] (03Merged) 10jenkins-bot: Add email confirmation banner Test Kitchen instrumentation (long-term) [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304125 (https://phabricator.wikimedia.org/T428293) (owner: 10Mmartorana) [13:34:57] !log tgr@deploy1003 Started scap sync-world: Backport for [[gerrit:1304125|Add email confirmation banner Test Kitchen instrumentation (long-term) (T428293)]] [13:35:01] T428293: Instrument email confirmation banner via Test Kitchen instrument (impressions, clicks, confirmations, removals) - https://phabricator.wikimedia.org/T428293 [13:39:34] !log tgr@deploy1003 tgr, mmartorana: Backport for [[gerrit:1304125|Add email confirmation banner Test 
Kitchen instrumentation (long-term) (T428293)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:40:04] manfredi: do you need to test it? [13:40:16] (03PS1) 10Gkyziridis: ml-services: Deploy outlink model latest version on staging. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304805 (https://phabricator.wikimedia.org/T429675) [13:40:19] No, you can go ahead [13:40:34] !log tgr@deploy1003 tgr, mmartorana: Continuing with deployment [13:41:03] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040766 (10Jclark-ctr) @Marostegui Dell believes that reboot was caused by firmware updates or someone initiating reboot can we add server back to service? I ran hardware test on all hard... [13:43:07] (03PS5) 10Jforrester: wikifunctions: Drop temporary Rust evaluator releases [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300272 (https://phabricator.wikimedia.org/T417870) [13:43:41] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: host reimage [13:45:01] (03PS2) 10Elukey: __init__: modify the management_password property [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) [13:46:57] !log tgr@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304125|Add email confirmation banner Test Kitchen instrumentation (long-term) (T428293)]] (duration: 11m 59s) [13:47:02] T428293: Instrument email confirmation banner via Test Kitchen instrument (impressions, clicks, confirmations, removals) - https://phabricator.wikimedia.org/T428293 [13:48:01] (03CR) 10Elukey: [C:03+2] role::kafka::main: add missing ACL for statsv [puppet] - 10https://gerrit.wikimedia.org/r/1304096 (https://phabricator.wikimedia.org/T425528) (owner: 10Elukey) [13:48:26] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tgr@deploy1003 using scap 
backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304122 (https://phabricator.wikimedia.org/T428292) (owner: 10Mmartorana) [13:48:41] (03PS3) 10Elukey: __init__: modify the management_password property [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) [13:48:43] MatmaRex: we can do yours after the meeting if that works for you [13:49:19] (03Merged) 10jenkins-bot: config: Enable EmailConfirmationBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304122 (https://phabricator.wikimedia.org/T428292) (owner: 10Mmartorana) [13:49:27] (03PS4) 10Elukey: __init__: modify the management_password property [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) [13:49:36] !log tgr@deploy1003 Started scap sync-world: Backport for [[gerrit:1304122|config: Enable EmailConfirmationBanner on all wikis (T428292)]] [13:49:41] T428292: Roll out email confirmation banner to all wikis - https://phabricator.wikimedia.org/T428292 [13:49:43] (03CR) 10Kamila Součková: [C:03+2] dse-k8s-services/*: Fix early inclusion of clusterinfo values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304573 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [13:49:45] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: host reimage [13:50:27] (03CR) 10Elukey: __init__: modify the management_password property (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey) [13:50:38] !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
[13:51:26] (03PS1) 10Gkyziridis: Retrigger the deployment for qwen36 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304810 [13:51:31] tgr_: sure [13:52:22] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [13:53:08] (03CR) 10CI reject: [V:04-1] __init__: modify the management_password property [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey) [13:55:34] (03PS1) 10JavierMonton: stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304812 (https://phabricator.wikimedia.org/T425624) [13:55:37] !log tgr@deploy1003 mmartorana, tgr: Backport for [[gerrit:1304122|config: Enable EmailConfirmationBanner on all wikis (T428292)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:55:41] T428292: Roll out email confirmation banner to all wikis - https://phabricator.wikimedia.org/T428292 [13:56:12] tgr_: perfect, it works, you can go ahead [13:56:23] !log tgr@deploy1003 mmartorana, tgr: Continuing with deployment [13:56:28] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040817 (10Marostegui) >>! In T428832#12040766, @Jclark-ctr wrote: > @Marostegui Dell believes that reboot was caused by firmware updates or someone initiating reboot can we add server ba... 
[13:56:38] (03PS1) 10Dpogorzelski: ml-serve: fix vram stats collection [puppet] - 10https://gerrit.wikimedia.org/r/1304813 (https://phabricator.wikimedia.org/T429597) [13:57:06] (03Merged) 10jenkins-bot: dse-k8s-services/*: Fix early inclusion of clusterinfo values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304573 (https://phabricator.wikimedia.org/T388390) (owner: 10Kamila Součková) [14:00:52] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040832 (10Jclark-ctr) I agree. I've been going back and forth with them on this. They have attempted to close the ticket, claiming that this is a new issue rather than a continuation of t... [14:01:42] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040847 (10Marostegui) Thanks - I have started mariadb and it is catching up, once it finishes I will repool. Thanks for the help [14:02:41] (03CR) 10JavierMonton: [C:03+2] stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304812 (https://phabricator.wikimedia.org/T425624) (owner: 10JavierMonton) [14:02:44] !log tgr@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304122|config: Enable EmailConfirmationBanner on all wikis (T428292)]] (duration: 13m 08s) [14:02:49] T428292: Roll out email confirmation banner to all wikis - https://phabricator.wikimedia.org/T428292 [14:03:05] !log UTC afternoon deploys done(ish; will do the remaining two patches in about an hour) [14:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:01] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1375.eqiad.wmnet with OS trixie [14:04:36] (03CR) 10Jforrester: [C:03+2] wikifunctions: Drop temporary Rust evaluator releases [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300272 (https://phabricator.wikimedia.org/T417870) (owner: 
10Jforrester) [14:04:47] Gergo thanks a lot! [14:05:07] (03Merged) 10jenkins-bot: stream: webrequest-page-view-next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304812 (https://phabricator.wikimedia.org/T425624) (owner: 10JavierMonton) [14:06:39] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1376.eqiad.wmnet with OS trixie [14:06:40] (03Merged) 10jenkins-bot: wikifunctions: Drop temporary Rust evaluator releases [deployment-charts] - 10https://gerrit.wikimedia.org/r/1300272 (https://phabricator.wikimedia.org/T417870) (owner: 10Jforrester) [14:06:56] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1377.eqiad.wmnet with OS trixie [14:06:59] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1039.eqiad.wmnet with OS trixie [14:07:10] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1378.eqiad.wmnet with OS trixie [14:07:18] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [14:07:22] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1379.eqiad.wmnet with OS trixie [14:07:29] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [14:07:37] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [14:07:47] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [14:07:57] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [14:08:00] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [14:09:37] (03CR) 10RLazarus: [C:03+1] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1304803 (https://phabricator.wikimedia.org/T427897) (owner: 10Muehlenhoff) [14:12:59] (03PS1) 10MVernon: Pontoon: specmap for swift in pontoon [puppet] - 10https://gerrit.wikimedia.org/r/1304817 (https://phabricator.wikimedia.org/T429630) [14:14:23] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040890 (10Marostegui) @Jclark-ctr it didn't take long for the host to crash again as soon as it got some load. It is unreachable again, no logs on the HW side that I can see on my end. [14:15:11] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040891 (10Jclark-ctr) Thank you! [14:15:46] (03PS1) 10MVernon: swift::proxy: copy rewrite middleware to appropriate python path [puppet] - 10https://gerrit.wikimedia.org/r/1304818 (https://phabricator.wikimedia.org/T429630) [14:16:18] (03CR) 10CI reject: [V:04-1] swift::proxy: copy rewrite middleware to appropriate python path [puppet] - 10https://gerrit.wikimedia.org/r/1304818 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [14:16:46] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1375.eqiad.wmnet with reason: host reimage [14:17:28] (03PS1) 10MVernon: puppetserver::pontoon: add optional swift::fetch_rings [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) [14:19:28] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1039: Migration of es1039.eqiad.wmnet completed [14:19:33] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1376.eqiad.wmnet with reason: host reimage [14:20:02] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040911 (10Jclark-ctr) That might have been my mistake. 
I had rebooted the server a few times while updating additional firmware, and it looks like the PSU firmware update did not actually...
[14:20:13] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1378.eqiad.wmnet with reason: host reimage
[14:20:20] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1377.eqiad.wmnet with reason: host reimage
[14:20:23] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1379.eqiad.wmnet with reason: host reimage
[14:20:39] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1375.eqiad.wmnet with reason: host reimage
[14:20:48] ops-eqiad, SRE, Data-Persistence, DBA, DC-Ops: db1262 crashed - https://phabricator.wikimedia.org/T428832#12040912 (Marostegui) No worries - I just started it again.
[14:21:23] (PS1) MVernon: swift::proxy: look for puppetserver::pontoon in non-prod environments [puppet] - https://gerrit.wikimedia.org/r/1304822 (https://phabricator.wikimedia.org/T429630)
[14:22:04] FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[14:23:23] (PS2) MVernon: swift::proxy: copy rewrite middleware to appropriate python path [puppet] - https://gerrit.wikimedia.org/r/1304818 (https://phabricator.wikimedia.org/T429630)
[14:23:40] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1379.eqiad.wmnet with reason: host reimage
[14:24:26] (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1304818 (https://phabricator.wikimedia.org/T429630) (owner: MVernon)
[14:24:57] (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304822 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [14:27:32] (03CR) 10Arnaudb: "Thanks for the review, one of the questions is still open, please let me know what you want to do with it. I'll reach out either via IRC o" [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [14:27:43] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1378.eqiad.wmnet with reason: host reimage [14:30:04] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1430) [14:30:38] (03PS2) 10Arnaudb: trafficserver: raise ATS timeouts for the gerrit secondary backends [puppet] - 10https://gerrit.wikimedia.org/r/1304808 (https://phabricator.wikimedia.org/T429749) [14:30:57] (03PS1) 10Arnaudb: trafficserver: raise ATS timeouts for the gerrit primary backend [puppet] - 10https://gerrit.wikimedia.org/r/1304821 (https://phabricator.wikimedia.org/T429749) [14:32:01] (03CR) 10Arnaudb: trafficserver: raise ATS timeouts for the gerrit secondary backends (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1304808 (https://phabricator.wikimedia.org/T429749) (owner: 10Arnaudb) [14:32:14] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1377.eqiad.wmnet with reason: host reimage [14:36:20] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1376.eqiad.wmnet with reason: host reimage [14:36:49] RESOLVED: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - 
https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[14:39:02] (CR) CWilliams: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1304818 (https://phabricator.wikimedia.org/T429630) (owner: MVernon)
[14:39:03] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[14:39:04] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1375.eqiad.wmnet with OS trixie
[14:39:56] (CR) CWilliams: [C:+1] swift::proxy: look for puppetserver::pontoon in non-prod environments [puppet] - https://gerrit.wikimedia.org/r/1304822 (https://phabricator.wikimedia.org/T429630) (owner: MVernon)
[14:40:44] SRE, Infrastructure-Foundations, Puppet-Infrastructure: puppetmaster hostcert and hostprivkey point to nonexistent files - https://phabricator.wikimedia.org/T179099#12041029 (MoritzMuehlenhoff) Stalled→Declined No longer relevant, was only an issue with Puppet 5
[14:41:06] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1379.eqiad.wmnet with OS trixie
[14:41:07] SRE, Infrastructure-Foundations, Puppet-Infrastructure, Puppet (Puppet 7.0): Issues which should be fixed by puppet7 upgrade - https://phabricator.wikimedia.org/T351104#12041034 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff We're on Puppet 7 for quite a while now.
[14:42:02] (CR) Elukey: [C:+2] kafka::broker: Stop using the transition package [puppet] - https://gerrit.wikimedia.org/r/1299443 (owner: Muehlenhoff)
[14:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[14:45:03] (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1304822 (https://phabricator.wikimedia.org/T429630) (owner: MVernon)
[14:45:13] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1378.eqiad.wmnet with OS trixie
[14:48:24] (CR) Ozge: [C:+1] ml-services: Deploy outlink model latest version on staging. [deployment-charts] - https://gerrit.wikimedia.org/r/1304805 (https://phabricator.wikimedia.org/T429675) (owner: Gkyziridis)
[14:49:35] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1377.eqiad.wmnet with OS trixie
[14:52:23] !log deny ANONYMOUS write traffic on Kafka main for statsv (only varnishkafka will push events from now on) - T425528
[14:52:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:28] T425528: Rework ACLs on Kafka 3.x clusters - https://phabricator.wikimedia.org/T425528
[14:52:35] Cc: jayme --^
[14:52:50] <#
[14:52:58] SRE: Rework ACLs on Kafka 3.x clusters - https://phabricator.wikimedia.org/T425528#12041088 (elukey) Open→Resolved
[14:53:07] (CR) Scott French: [C:+1] main: Add a namespace for the mw-pretrain service.
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304083 (https://phabricator.wikimedia.org/T427668) (owner: 10Blake) [14:54:00] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1376.eqiad.wmnet with OS trixie [14:54:03] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [14:55:25] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1380.eqiad.wmnet with OS trixie [14:55:33] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1381.eqiad.wmnet with OS trixie [14:55:41] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1382.eqiad.wmnet with OS trixie [14:55:49] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1383.eqiad.wmnet with OS trixie [14:55:58] !log blake@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1384.eqiad.wmnet with OS trixie [14:59:04] (03CR) 10Elukey: sre.hosts.provision: introduce the wmfroot user (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291994 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [14:59:47] (03CR) 10MVernon: [C:03+2] swift::proxy: look for puppetserver::pontoon in non-prod environments [puppet] - 10https://gerrit.wikimedia.org/r/1304822 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [15:00:03] (03CR) 10MVernon: [C:03+2] swift::proxy: copy rewrite middleware to appropriate python path [puppet] - 10https://gerrit.wikimedia.org/r/1304818 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [15:02:23] !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [15:02:29] 
!log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [15:03:53] 06SRE, 10SRE-swift-storage, 07Essential-Work, 13Patch-For-Review: Migrate production swift clusters to trixie - https://phabricator.wikimedia.org/T429630#12041120 (10MatthewVernon) [15:04:58] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1039: Migration of es1039.eqiad.wmnet completed [15:05:00] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [15:06:06] I'll deploy the MediaWiki patches that didn't fit into the backport window [15:06:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tgr@deploy1003 using scap backport" [extensions/Echo] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304222 (https://phabricator.wikimedia.org/T428198) (owner: 10Bartosz Dziewoński) [15:06:54] (03CR) 10TrainBranchBot: [C:03+2] "Approved by tgr@deploy1003 using scap backport" [extensions/CentralAuth] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304690 (https://phabricator.wikimedia.org/T429495) (owner: 10Gergő Tisza) [15:08:24] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1382.eqiad.wmnet with reason: host reimage [15:08:47] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1381.eqiad.wmnet with reason: host reimage [15:08:50] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1380.eqiad.wmnet with reason: host reimage [15:09:00] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1384.eqiad.wmnet with reason: host reimage [15:09:04] !log blake@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1383.eqiad.wmnet with reason: host reimage [15:09:27] !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [15:09:33] !log 
javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [15:11:12] !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [15:11:17] (03PS5) 10Andrew Bogott: cloud-vps vendordata: refactor cumin bastion logic a bit [puppet] - 10https://gerrit.wikimedia.org/r/1304139 (https://phabricator.wikimedia.org/T422801) [15:11:17] (03PS1) 10Andrew Bogott: open firewall for cumin in the octavia cloud-vps project [puppet] - 10https://gerrit.wikimedia.org/r/1304832 (https://phabricator.wikimedia.org/T422801) [15:12:12] !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply [15:12:21] (03CR) 10CI reject: [V:04-1] Fix displaying events with IP agents [extensions/Echo] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304222 (https://phabricator.wikimedia.org/T428198) (owner: 10Bartosz Dziewoński) [15:13:14] 06SRE, 10SRE-Access-Requests: Requesting access for lerickson to deploy the RDF streaming updater on wikikube - https://phabricator.wikimedia.org/T429610#12041162 (10GTurkington-WMF) I'm @lerickson's manager - request is approved, this is needed for her core responsibilities. 
[15:13:28] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1382.eqiad.wmnet with reason: host reimage
[15:13:57] (Merged) jenkins-bot: Preserve redoLocalAuthentication flag when returning from auth domain [extensions/CentralAuth] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304690 (https://phabricator.wikimedia.org/T429495) (owner: Gergő Tisza)
[15:16:49] (CR) Gergő Tisza: "`" [extensions/Echo] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304222 (https://phabricator.wikimedia.org/T428198) (owner: Bartosz Dziewoński)
[15:17:19] (CR) Gergő Tisza: [C:+2] Fix displaying events with IP agents [extensions/Echo] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304222 (https://phabricator.wikimedia.org/T428198) (owner: Bartosz Dziewoński)
[15:17:34] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1381.eqiad.wmnet with reason: host reimage
[15:21:55] SRE, CAS-SSO, Infrastructure-Foundations, Phabricator: Add logout.d script for Phabricator - https://phabricator.wikimedia.org/T286904#12041211 (Aklapper) Hi, I myself am not sure what to add apart from T286904#11091482. Please elaborate if anything is needed from our side - thanks!
[15:22:26] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1380.eqiad.wmnet with reason: host reimage [15:25:16] (03Merged) 10jenkins-bot: Fix displaying events with IP agents [extensions/Echo] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304222 (https://phabricator.wikimedia.org/T428198) (owner: 10Bartosz Dziewoński) [15:27:01] !log tgr@deploy1003 Started scap sync-world: Backport for [[gerrit:1304222|Fix displaying events with IP agents (T428198)]], [[gerrit:1304690|Preserve redoLocalAuthentication flag when returning from auth domain (T429495)]] [15:27:08] T428198: "Error: Call to a member function getTalkPage() on null" when trying to view notifications - https://phabricator.wikimedia.org/T428198 [15:27:10] T429495: The type parameter of CentralAuthPostLoginRedirect is incorrect when recovering from lost tokenstore data - https://phabricator.wikimedia.org/T429495 [15:27:14] PROBLEM - Host wikikube-worker2315 is DOWN: PING CRITICAL - Packet loss = 100% [15:27:19] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1384.eqiad.wmnet with reason: host reimage [15:27:32] PROBLEM - Host wikikube-worker2314 is DOWN: PING CRITICAL - Packet loss = 100% [15:29:42] RECOVERY - Host wikikube-worker2315 is UP: PING OK - Packet loss = 0%, RTA = 32.98 ms [15:30:05] jan_drewniak: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1530). [15:30:42] RECOVERY - Host wikikube-worker2314 is UP: PING OK - Packet loss = 0%, RTA = 31.67 ms [15:31:07] !log tgr@deploy1003 matmarex, tgr: Backport for [[gerrit:1304222|Fix displaying events with IP agents (T428198)]], [[gerrit:1304690|Preserve redoLocalAuthentication flag when returning from auth domain (T429495)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. [15:31:11] (03PS6) 10Federico Ceratto: pyproject.toml: move conf into pyproject [cookbooks] - 10https://gerrit.wikimedia.org/r/1304776 [15:31:22] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1382.eqiad.wmnet with OS trixie [15:31:47] oh, thanks. i can test it [15:31:53] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1383.eqiad.wmnet with reason: host reimage [15:32:33] FIRING: [2x] KubernetesCalicoDown: wikikube-worker2314.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [15:32:42] looks good [15:34:17] 10ops-eqiad, 06SRE, 06DC-Ops: Inbound errors on interface cr1-eqiad:ae2 (asw2-b-eqiad:ae1) - https://phabricator.wikimedia.org/T429116#12041312 (10Jclark-ctr) @cmooney, this link continues to experience errors. If you're available tomorrow and would like me to swap the optics, I can ping you. 
[15:34:47] the CA patch looks good as well
[15:34:53] !log tgr@deploy1003 matmarex, tgr: Continuing with deployment
[15:35:27] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1381.eqiad.wmnet with OS trixie
[15:37:32] RESOLVED: [2x] KubernetesCalicoDown: wikikube-worker2314.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:39:58] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1380.eqiad.wmnet with OS trixie
[15:40:23] (PS7) Federico Ceratto: pyproject.toml: move conf into pyproject [cookbooks] - https://gerrit.wikimedia.org/r/1304776
[15:41:16] !log tgr@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304222|Fix displaying events with IP agents (T428198)]], [[gerrit:1304690|Preserve redoLocalAuthentication flag when returning from auth domain (T429495)]] (duration: 14m 15s)
[15:41:22] T428198: "Error: Call to a member function getTalkPage() on null" when trying to view notifications - https://phabricator.wikimedia.org/T428198
[15:41:23] T429495: The type parameter of CentralAuthPostLoginRedirect is incorrect when recovering from lost tokenstore data - https://phabricator.wikimedia.org/T429495
[15:41:42] !log UTC afternoon deploys double done
[15:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:25] ops-codfw, Data-Persistence, DC-Ops, Infrastructure-Foundations, and 2 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812 (ayounsi) NEW p:Triage→High
[15:42:48] ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#12041420 (ayounsi)
[15:45:02] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host
wikikube-worker1384.eqiad.wmnet with OS trixie [15:48:15] (03CR) 10Blake: main: Add a namespace for the mw-pretrain service. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304083 (https://phabricator.wikimedia.org/T427668) (owner: 10Blake) [15:49:22] !log blake@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1383.eqiad.wmnet with OS trixie [15:54:18] 10ops-codfw, 06Data-Persistence, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12041495 (10jcrespo) ms-backup2003 -> I will take care of stopping vital network services before the window. [15:57:56] (03PS1) 10Sbisson: Recommendation api: update to 2026-06-15-110926 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304840 [16:01:11] (03CR) 10Muehlenhoff: [C:03+2] Remove my old ssh key using FIDO key now [puppet] - 10https://gerrit.wikimedia.org/r/1304711 (https://phabricator.wikimedia.org/T423293) (owner: 10Papaul) [16:02:52] (03PS6) 10Daniel Kinzler: rest-gateway: emit 401 if rate limit is 0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298031 (https://phabricator.wikimedia.org/T428184) [16:03:04] (03CR) 10Daniel Kinzler: rest-gateway: emit 401 if rate limit is 0 (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298031 (https://phabricator.wikimedia.org/T428184) (owner: 10Daniel Kinzler) [16:09:40] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:14:40] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - 
https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:15:23] 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#12041649 (10ayounsi) [16:15:35] (03PS3) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [16:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [16:23:56] !log arlolra@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [16:24:27] (03PS1) 10Alex Paskulin: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) [16:25:25] 10ops-codfw, 06Data-Persistence, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12041701 (10Eevans) No action is needed for the aqs nodes themselves. I //think// that the nightly ETL jobs that load data happen at ~2am UTC, whic... 
[16:25:28] (03CR) 10Kamila Součková: [C:03+2] kubernetes: switch all of MW to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1302931 (https://phabricator.wikimedia.org/T429030) (owner: 10Kamila Součková) [16:25:54] (03CR) 10CI reject: [V:04-1] Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [16:28:22] (03CR) 10Andrew Bogott: [C:03+2] open firewall for cumin in the octavia cloud-vps project [puppet] - 10https://gerrit.wikimedia.org/r/1304832 (https://phabricator.wikimedia.org/T422801) (owner: 10Andrew Bogott) [16:30:39] !log arlolra@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [16:30:41] !log arlolra@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [16:31:48] (03PS4) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [16:31:57] !log atsuko@deploy1003 mwscript-k8s job started: foreachwikiindblist mwscript.dblist extensions/Translate/scripts/ttmserver-export.php --ttmserver eqiad-k8s # T425377 populating translation memory (dblist: https://phabricator.wikimedia.org/P94329) [16:32:02] T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377 [16:32:38] (03PS2) 10Alex Paskulin: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) [16:32:45] (03PS5) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [16:33:29] (03PS6) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [16:33:38] (03CR) 10CI reject: [V:04-1] Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 
(https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [16:35:16] (03PS3) 10Alex Paskulin: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) [16:36:12] (03CR) 10CI reject: [V:04-1] Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [16:37:13] !log arlolra@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [16:38:26] (03PS4) 10Alex Paskulin: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) [16:38:46] !log kamila@deploy1003 Started scap sync-world: upgrade all of prod to debian bookworm [16:40:07] (03PS1) 10AOkoth: site: begin phab2002 decom [puppet] - 10https://gerrit.wikimedia.org/r/1304849 (https://phabricator.wikimedia.org/T423727) [16:40:34] (03CR) 10JHathaway: log: fix tests for pytest 9.1 (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [16:43:17] !log arlolra@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [16:43:26] (03CR) 10JHathaway: "ugh, I was not 😞" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304617 (owner: 10JHathaway) [16:43:45] !log arlolra@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [16:43:46] !log arlolra@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [16:44:15] !log arlolra@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [16:44:59] (03CR) 10Alex Paskulin: "Happy to change this if it's preferable to start with a smaller set of changes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [16:46:01] !log arlolra@deploy1003 helmfile [eqiad] START 
helmfile.d/services/mw-parsoid: apply [16:46:03] !log arlolra@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [16:46:04] !log arlolra@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [16:46:08] !log arlolra@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [16:46:39] (03PS5) 10Alex Paskulin: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) [16:46:51] !log arlolra@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [16:46:53] !log arlolra@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [16:46:54] !log arlolra@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [16:46:58] !log arlolra@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [16:47:09] !log kamila@deploy1003 Finished scap sync-world: upgrade all of prod to debian bookworm (duration: 09m 17s) [16:52:26] (03CR) 10Elukey: [C:03+1] log: fix tests for pytest 9.1 (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [16:53:54] (03PS1) 10Hnowlan: restbase: add disk space alert [alerts] - 10https://gerrit.wikimedia.org/r/1304852 (https://phabricator.wikimedia.org/T407141) [16:54:07] (03PS1) 10Hnowlan: restbase: remove icinga disk space check, use alertmanager check [puppet] - 10https://gerrit.wikimedia.org/r/1304853 (https://phabricator.wikimedia.org/T407141) [16:57:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:00:04] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1700) [17:00:04] ryankemper: #bothumor I � Unicode. 
All rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T1700). [17:00:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 18.35% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:05:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.35% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:09:52] PROBLEM - MariaDB read only wikireplica-s2 on clouddb1014 is CRITICAL: Could not connect to localhost:3312 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only [17:09:52] PROBLEM - MariaDB read only s2 on clouddb1014 is CRITICAL: Could not connect to localhost:3312 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only [17:10:22] PROBLEM - Check unit status of security_group_ssh-from-restricted-bastion_to_project_bd825ee46c214f9e88126a2b290d88f8 on cloudcontrol1006 is CRITICAL: CRITICAL: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_bd825ee46c214f9e88126a2b290d88f8 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [17:10:42] mmm clouddb1014 crashed? 
[17:10:54] RECOVERY - MariaDB read only wikireplica-s2 on clouddb1014 is OK: Version 10.11.16-MariaDB, Uptime 2766490s, read_only: True, event_scheduler: False, 159.60 QPS, connection latency: 0.022338s, query latency: 0.000399s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only [17:10:54] RECOVERY - MariaDB read only s2 on clouddb1014 is OK: Version 10.11.16-MariaDB, Uptime 2766490s, read_only: True, event_scheduler: False, 233.42 QPS, connection latency: 0.019283s, query latency: 0.000334s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only [17:11:45] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 21.76% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:14:13] weird... 
[17:23:23] (huh at saturation spike, keeping an eye out) [17:24:08] (eh looks like a scraper or something, probably not related to the debian upgrade, I hope) [17:25:36] (03CR) 10Volans: [C:04-1] "With this change the debian package will not build for bookworm or trixie" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [17:29:04] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12041904 (10XenoRyet) This has my approval [17:31:24] (03CR) 10Elukey: [C:03+1] log: fix tests for pytest 9.1 (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [17:34:16] (03CR) 10Volans: [C:04-1] log: fix tests for pytest 9.1 (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [17:36:45] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.45% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:37:45] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.83% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:41:25] (03CR) 10Elukey: "First pass with some nits that I found while skimming the code, and I closed comments from 2020 :D" [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [17:43:33] (03CR) 10Elukey: 
[C:03+1] log: fix tests for pytest 9.1 (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [17:48:07] 10ops-eqiad, 06SRE, 06DC-Ops: C/D refresh Nokia switches Exhaust direction is reversed - https://phabricator.wikimedia.org/T428260#12041961 (10Jclark-ctr) a:05Jclark-ctr→03None [17:48:48] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q3:rack/setup/install cloudcephosd105[3456] - https://phabricator.wikimedia.org/T419892#12041963 (10Jclark-ctr) a:05Jclark-ctr→03None [17:49:07] (03PS1) 10Kimberly Sarabia: Remove multimediaviewer-beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) [17:50:43] (03PS2) 10Kimberly Sarabia: Remove multimediaviewer-beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) [18:00:21] RECOVERY - Check unit status of security_group_ssh-from-restricted-bastion_to_project_bd825ee46c214f9e88126a2b290d88f8 on cloudcontrol1006 is OK: OK: Status of the systemd unit security_group_ssh-from-restricted-bastion_to_project_bd825ee46c214f9e88126a2b290d88f8 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [18:05:05] (03CR) 10Scott French: "Since we've now demonstrated that we don't appear to need it in practice, the option to just remove it seems like a solid one - particular" [puppet] - 10https://gerrit.wikimedia.org/r/1304060 (https://phabricator.wikimedia.org/T390251) (owner: 10Elukey) [18:07:12] (03CR) 10Scott French: [C:03+1] "This looks reasonable to me!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1304512 (https://phabricator.wikimedia.org/T427175) (owner: 10Elukey) [18:12:45] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.55% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [18:13:27] 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review, 06Release-Engineering-Team (Radar), 07User-notice: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#12042047 (10Dzahn) seeing CI errors on `labs/codesearch` that seem caused by this [18:21:42] (03CR) 10Volans: [C:04-1] "I did a quick pass. I think there are some blocking errors and potentially could be simplified fixing the regex that parses the nmap data," [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [18:22:17] 06SRE, 10DNS, 06Traffic: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042062 (10Asaf) 05In progress→03Open Thank you for looking into this -- we now have the updated request from our vendor. It is: Please update the DNS configuration for the new ALB and remove the old ALB... [18:24:20] 10ops-eqiad, 06SRE, 06DC-Ops: C/D refresh Nokia switches Exhaust direction is reversed - https://phabricator.wikimedia.org/T428260#12042071 (10RobH) a:03RobH [18:24:58] 10ops-eqiad, 06SRE, 06DC-Ops: C/D refresh Nokia switches Exhaust direction is reversed - https://phabricator.wikimedia.org/T428260#12042072 (10RobH) Myriad confirmed they're preparing a quote to replace all the fans and power supplies that we have F2B to B2F. 
[18:26:18] (03Abandoned) 10Mstyles: ReauthenticateForActions: Add new config var [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298944 (https://phabricator.wikimedia.org/T427947) (owner: 10Mstyles) [18:26:47] (03PS1) 10Volans: WMCS backups: set retention back to sane value [puppet] - 10https://gerrit.wikimedia.org/r/1304865 (https://phabricator.wikimedia.org/T428867) [18:27:17] (03CR) 10Dzahn: "have we succesfully deployed the latest to phab2003 yet?" [puppet] - 10https://gerrit.wikimedia.org/r/1304849 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [18:27:47] (03CR) 10Volans: WMCS backups: set retention back to sane value (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1304865 (https://phabricator.wikimedia.org/T428867) (owner: 10Volans) [18:30:11] (03CR) 10Andrew Bogott: [C:03+1] WMCS backups: set retention back to sane value [puppet] - 10https://gerrit.wikimedia.org/r/1304865 (https://phabricator.wikimedia.org/T428867) (owner: 10Volans) [18:32:39] (03PS1) 10Reedy: Upgrading guzzlehttp/psr7 (2.11.0 => 2.11.1) [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304867 [18:32:39] (03PS1) 10Reedy: Upgrade guzzle/* [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304868 (https://phabricator.wikimedia.org/T429826) [18:35:08] (03CR) 10Eric Gardner: Remove multimediaviewer-beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) (owner: 10Kimberly Sarabia) [18:35:59] (03CR) 10Volans: [C:03+2] WMCS backups: set retention back to sane value [puppet] - 10https://gerrit.wikimedia.org/r/1304865 (https://phabricator.wikimedia.org/T428867) (owner: 10Volans) [18:36:15] (03CR) 10Volans: [C:03+2] WMCS backups: set retention back to sane value (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1304865 (https://phabricator.wikimedia.org/T428867) (owner: 10Volans) [18:36:27] jouncebot: nowandnext [18:36:27] No deployments scheduled for the next 1 
hour(s) and 23 minute(s) [18:36:27] In 1 hour(s) and 23 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2000) [18:37:41] (03CR) 10Reedy: [C:03+2] Upgrade guzzle/* [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304868 (https://phabricator.wikimedia.org/T429826) (owner: 10Reedy) [18:37:45] (03CR) 10Reedy: [C:03+2] Upgrading guzzlehttp/psr7 (2.11.0 => 2.11.1) [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304867 (owner: 10Reedy) [18:43:29] 06SRE, 10DNS, 06Traffic: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042124 (10Asaf) [18:43:33] 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#12042127 (10Papaul) [18:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [18:46:02] (03PS1) 10SBassett: Lazily reject pre-fix parser-cache entries for noreferrer/noopener links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) [18:46:19] (03PS1) 10CDobbins: learn.wiki: update ALB records [dns] - 10https://gerrit.wikimedia.org/r/1304877 (https://phabricator.wikimedia.org/T429628) [18:47:12] (03CR) 10SBassett: [C:04-1] "Hold for config deployment" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) (owner: 10SBassett) [18:48:28] (03Merged) 10jenkins-bot: Upgrading guzzlehttp/psr7 (2.11.0 => 2.11.1) [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304867 (owner: 10Reedy) [18:48:37] (03Merged) 
10jenkins-bot: Upgrade guzzle/* [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304868 (https://phabricator.wikimedia.org/T429826) (owner: 10Reedy) [18:48:47] (03PS1) 10Dduvall: zuul: Use full image refs in hiera [puppet] - 10https://gerrit.wikimedia.org/r/1304878 [18:48:52] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) (owner: 10SBassett) [18:49:14] (03PS1) 10Reedy: Updated guzzlehttp/guzzle from 7.10.0 to 7.12.1 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304879 (https://phabricator.wikimedia.org/T429826) [18:49:49] (03CR) 10Dduvall: "Note this should be a noop." [puppet] - 10https://gerrit.wikimedia.org/r/1304878 (owner: 10Dduvall) [18:54:11] (03PS3) 10Kimberly Sarabia: Remove multimediaviewer-beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) [18:56:01] (03CR) 10Kimberly Sarabia: Remove multimediaviewer-beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) (owner: 10Kimberly Sarabia) [18:58:57] (03CR) 10Eric Gardner: [C:03+1] Remove multimediaviewer-beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) (owner: 10Kimberly Sarabia) [19:02:11] (03CR) 10Reedy: [C:03+2] Updated guzzlehttp/guzzle from 7.10.0 to 7.12.1 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304879 (https://phabricator.wikimedia.org/T429826) (owner: 10Reedy) [19:02:51] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042186 (10BCornwall) @asaf Would you like to remove the stage ALB records as well? 
[19:05:49] (03PS7) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [19:08:09] (03Merged) 10jenkins-bot: Updated guzzlehttp/guzzle from 7.10.0 to 7.12.1 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304879 (https://phabricator.wikimedia.org/T429826) (owner: 10Reedy) [19:08:20] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042204 (10BCornwall) @Asaf CNAME records are not allowed at the apex. [19:12:05] !log reedy@deploy1003 Started scap sync-world: Backport for [[gerrit:1304867|Upgrading guzzlehttp/psr7 (2.11.0 => 2.11.1)]], [[gerrit:1304868|Upgrade guzzle/* (T429826)]], [[gerrit:1304879|Updated guzzlehttp/guzzle from 7.10.0 to 7.12.1 (T429826)]] [19:12:10] T429826: CVE-2026-55568, CVE-2026-55767, CVE-2026-55766: guzzlehttp security advisors that impacts upgrading parsoid version in vendor/ - https://phabricator.wikimedia.org/T429826 [19:12:19] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042216 (10BCornwall) @Asaf one last question: Would you like all the associated ACM records removed? i.e. are you utilizing those certs for any other AWS services (e.g. CloudFront)? [19:13:59] !log reedy@deploy1003 reedy: Backport for [[gerrit:1304867|Upgrading guzzlehttp/psr7 (2.11.0 => 2.11.1)]], [[gerrit:1304868|Upgrade guzzle/* (T429826)]], [[gerrit:1304879|Updated guzzlehttp/guzzle from 7.10.0 to 7.12.1 (T429826)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[19:14:25] !log reedy@deploy1003 reedy: Continuing with deployment [19:21:11] !log reedy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304867|Upgrading guzzlehttp/psr7 (2.11.0 => 2.11.1)]], [[gerrit:1304868|Upgrade guzzle/* (T429826)]], [[gerrit:1304879|Updated guzzlehttp/guzzle from 7.10.0 to 7.12.1 (T429826)]] (duration: 09m 06s) [19:21:15] T429826: CVE-2026-55568, CVE-2026-55767, CVE-2026-55766: guzzlehttp security advisors that impacts upgrading parsoid version in vendor/ - https://phabricator.wikimedia.org/T429826 [19:22:33] (03PS8) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [19:24:53] (03PS2) 10SBassett: Add info-level logging to wmgMonologChannels for timeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304187 (https://phabricator.wikimedia.org/T429654) [19:25:09] (03CR) 10Reedy: [C:03+2] Add info-level logging to wmgMonologChannels for timeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304187 (https://phabricator.wikimedia.org/T429654) (owner: 10SBassett) [19:25:59] (03CR) 10CI reject: [V:04-1] log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [19:26:08] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042251 (10Asaf) 1. No, please keep the stage ALB records for now. 2. Since learn.wiki was previously pointing to the old ALB, could you please update the existing apex record in the sam... 
[19:26:20] (03Merged) 10jenkins-bot: Add info-level logging to wmgMonologChannels for timeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304187 (https://phabricator.wikimedia.org/T429654) (owner: 10SBassett) [19:27:20] !log reedy@deploy1003 Started scap sync-world: Backport for [[gerrit:1304187|Add info-level logging to wmgMonologChannels for timeline (T429654)]] [19:27:25] T429654: Add info-level logging to wmgMonologChannels for timeline - https://phabricator.wikimedia.org/T429654 [19:29:20] !log reedy@deploy1003 sbassett, reedy: Backport for [[gerrit:1304187|Add info-level logging to wmgMonologChannels for timeline (T429654)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [19:29:45] !log reedy@deploy1003 sbassett, reedy: Continuing with deployment [19:33:25] (03PS9) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [19:34:01] !log reedy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304187|Add info-level logging to wmgMonologChannels for timeline (T429654)]] (duration: 06m 41s) [19:34:05] T429654: Add info-level logging to wmgMonologChannels for timeline - https://phabricator.wikimedia.org/T429654 [19:35:24] (03PS2) 10Jdlrobson: Fix duplicate print and other projects menu links in main menu [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) [19:38:41] (03CR) 10CI reject: [V:04-1] log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [19:49:25] (03PS1) 10Arlolra: Add a hidden lint for pre ext tags expanding templates [extensions/Linter] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304887 (https://phabricator.wikimedia.org/T353697) [19:49:40] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC late backport 
window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [extensions/Linter] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304887 (https://phabricator.wikimedia.org/T353697) (owner: 10Arlolra) [19:51:44] (03PS10) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [19:56:27] (03PS1) 10Arlolra: Deploy PRV to 5 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304888 (https://phabricator.wikimedia.org/T429830) [20:00:04] RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2000). [20:00:05] RoanKattouw, bpirkle, Sohom_Datta, and arlolra: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:12] o/ [20:01:03] Hello [20:01:08] I can deploy my own patch [20:01:49] Mine can ride with either of the other config patches, or else I recently got spiderpig and can deploy mine. 
[20:02:57] (03PS1) 10Dduvall: zuul: Update image refs for wmf-14.2.0-1 release [puppet] - 10https://gerrit.wikimedia.org/r/1304889 [20:03:44] (03CR) 10CI reject: [V:04-1] zuul: Update image refs for wmf-14.2.0-1 release [puppet] - 10https://gerrit.wikimedia.org/r/1304889 (owner: 10Dduvall) [20:04:01] (03CR) 10TrainBranchBot: [C:03+2] "Approved by catrope@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304156 (owner: 10Catrope) [20:05:44] (03PS2) 10Dduvall: zuul: Update image refs for wmf-14.2.0-1 release [puppet] - 10https://gerrit.wikimedia.org/r/1304889 [20:08:12] (03CR) 10Kosta Harlan: Lazily reject pre-fix parser-cache entries for noreferrer/noopener links (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) (owner: 10SBassett) [20:08:49] (03PS11) 10JHathaway: log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 [20:08:52] (03CR) 10Dzahn: [C:03+2] zuul: Use full image refs in hiera [puppet] - 10https://gerrit.wikimedia.org/r/1304878 (owner: 10Dduvall) [20:09:12] 06SRE, 10SRE-Access-Requests: Change SSH key for denisse after new laptop provisioning - https://phabricator.wikimedia.org/T429429#12042448 (10andrea.denisse) [20:09:28] (03Merged) 10jenkins-bot: Permissions: Create wmf-officeit group on collabwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304156 (owner: 10Catrope) [20:09:43] !log catrope@deploy1003 Started scap sync-world: Backport for [[gerrit:1304156|Permissions: Create wmf-officeit group on collabwiki]] [20:11:04] (03CR) 10JHathaway: log: fix tests for pytest 9.1 (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [20:11:42] !log catrope@deploy1003 catrope: Backport for [[gerrit:1304156|Permissions: Create wmf-officeit group on collabwiki]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. [20:15:12] !log catrope@deploy1003 catrope: Continuing with deployment [20:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [20:19:28] !log catrope@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304156|Permissions: Create wmf-officeit group on collabwiki]] (duration: 09m 45s) [20:20:24] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042494 (10Asaf) Tagging @BCornwall , as this is time-sensitive before the AWS verification window closes and we'd have to start over. [20:20:30] I'll do mine next [20:20:55] (03CR) 10TrainBranchBot: [C:03+2] "Approved by bpirkle@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304175 (https://phabricator.wikimedia.org/T422770) (owner: 10BPirkle) [20:21:45] I am around to do mine at the end since it touches i18n and I might have a few more patches to add to it [20:21:55] bpirkle: I'm done, go ahead [20:25:36] (03CR) 10Dzahn: [C:03+2] zuul: Update image refs for wmf-14.2.0-1 release [puppet] - 10https://gerrit.wikimedia.org/r/1304889 (owner: 10Dduvall) [20:25:42] (03PS3) 10Dduvall: zuul: Update image refs for wmf-14.2.0-1 release [puppet] - 10https://gerrit.wikimedia.org/r/1304889 [20:25:45] (03CR) 10Dzahn: [C:03+2] zuul: Update image refs for wmf-14.2.0-1 release [puppet] - 10https://gerrit.wikimedia.org/r/1304889 (owner: 10Dduvall) [20:26:16] (03Abandoned) 10Kosta Harlan: hcaptcha: Stop attempting to cache credentialed proxy endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1286873 (https://phabricator.wikimedia.org/T426178) (owner: 10Kosta Harlan) [20:26:23] (03Abandoned) 10Kosta Harlan: 
hcaptcha: Remove ineffective http-level CORS add_headers [puppet] - 10https://gerrit.wikimedia.org/r/1286874 (https://phabricator.wikimedia.org/T426178) (owner: 10Kosta Harlan) [20:27:00] (03Merged) 10jenkins-bot: REST: adjust analytics and wikifunctions REST Sandbox visibility [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304175 (https://phabricator.wikimedia.org/T422770) (owner: 10BPirkle) [20:27:18] !log bpirkle@deploy1003 Started scap sync-world: Backport for [[gerrit:1304175|REST: adjust analytics and wikifunctions REST Sandbox visibility (T422770 T423058 T422771)]] [20:27:26] T422770: REST: Audience Designations - clean up module enabling - https://phabricator.wikimedia.org/T422770 [20:27:27] T423058: REST: Audience Designations - clean up module enabling - enable site.v1 and specs.v0 in core by default - https://phabricator.wikimedia.org/T423058 [20:27:27] T422771: REST: Audience Designations - publish modules to REST Sandbox by default - https://phabricator.wikimedia.org/T422771 [20:29:15] (03PS1) 10CDobbins: learn.wiki: update A records [dns] - 10https://gerrit.wikimedia.org/r/1304893 [20:29:18] !log bpirkle@deploy1003 bpirkle: Backport for [[gerrit:1304175|REST: adjust analytics and wikifunctions REST Sandbox visibility (T422770 T423058 T422771)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[20:30:34] !log bpirkle@deploy1003 bpirkle: Continuing with deployment [20:31:19] (03PS2) 10CDobbins: learn.wiki: update A records [dns] - 10https://gerrit.wikimedia.org/r/1304893 (https://phabricator.wikimedia.org/T429628) [20:34:48] !log bpirkle@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304175|REST: adjust analytics and wikifunctions REST Sandbox visibility (T422770 T423058 T422771)]] (duration: 07m 30s) [20:34:56] T422770: REST: Audience Designations - clean up module enabling - https://phabricator.wikimedia.org/T422770 [20:34:56] T423058: REST: Audience Designations - clean up module enabling - enable site.v1 and specs.v0 in core by default - https://phabricator.wikimedia.org/T423058 [20:34:57] T422771: REST: Audience Designations - publish modules to REST Sandbox by default - https://phabricator.wikimedia.org/T422771 [20:35:05] I'm done [20:36:31] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042591 (10BCornwall) @Asaf: IIRC the record names generated are deterministic so even if you were to delete/re-add them the requested records would be the same. Unfortunately, CNAMES c... [20:37:46] Is Sohom around? [20:40:16] I guess not. 
I will do my patch then [20:40:34] (03CR) 10TrainBranchBot: [C:03+2] "Approved by arlolra@deploy1003 using scap backport" [extensions/Linter] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304887 (https://phabricator.wikimedia.org/T353697) (owner: 10Arlolra) [20:40:54] (03CR) 10SBassett: [C:04-1] Lazily reject pre-fix parser-cache entries for noreferrer/noopener links (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) (owner: 10SBassett) [20:41:19] (03PS1) 10Dzahn: codesearch: ensure docker-cli is installed if on trixie or newer [puppet] - 10https://gerrit.wikimedia.org/r/1304895 (https://phabricator.wikimedia.org/T429828) [20:42:47] jouncebot: nowandnext [20:42:47] For the next 0 hour(s) and 17 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2000) [20:42:47] In 0 hour(s) and 17 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2100) [20:42:59] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042603 (10BCornwall) I also see that some IPs were hardcoded at the apex. This is not correct for AWS as the IPs for the ALBs could change at any time: DNS deployments pointing to ALBs... 
[20:43:07] (03PS1) 10Dreamy Jazz: RiskScoreCollector: Make error_context a string map [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304896 (https://phabricator.wikimedia.org/T429594) [20:43:27] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 22 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304896 (https://phabricator.wikimedia.org/T429594) (owner: 10Dreamy Jazz) [20:44:28] (03CR) 10Dzahn: [V:03+1 C:03+2] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1304889 (owner: 10Dduvall) [20:44:56] (03CR) 10Dzahn: [C:03+2] codesearch: ensure docker-cli is installed if on trixie or newer [puppet] - 10https://gerrit.wikimedia.org/r/1304895 (https://phabricator.wikimedia.org/T429828) (owner: 10Dzahn) [20:46:40] (03Merged) 10jenkins-bot: Add a hidden lint for pre ext tags expanding templates [extensions/Linter] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304887 (https://phabricator.wikimedia.org/T353697) (owner: 10Arlolra) [20:46:58] !log arlolra@deploy1003 Started scap sync-world: Backport for [[gerrit:1304887|Add a hidden lint for pre ext tags expanding templates (T353697)]] [20:47:02] T353697: Parsoid/legacy parser {{Pre}} template rendering difference - https://phabricator.wikimedia.org/T353697 [20:53:33] (03CR) 10Dzahn: [V:03+1 C:03+2] "deployed and restarted:" [puppet] - 10https://gerrit.wikimedia.org/r/1304889 (owner: 10Dduvall) [20:55:32] (03CR) 10Dzahn: [V:03+1 C:03+2] "[zuul1001:~] $ sudo systemctl restart zuul-scheduler" [puppet] - 10https://gerrit.wikimedia.org/r/1304889 (owner: 10Dduvall) [20:57:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed 
[21:00:05] alexsanford, Reedy, sbassett, Maryum, and manfredi: Your horoscope predicts another Weekly Security deployment window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2100). [21:01:23] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042709 (10Ahsan-arbisoft) @BCornwall 166.117.77.114 76.223.6.7 Please use these IPs for the apex domain, as they are the static IPs provided by AWS Global Accelerator and will not ch... [21:04:28] (03PS1) 10Dreamy Jazz: CaptchaPreAuthenticationProvider: Clear solved state on failure [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) [21:04:58] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) (owner: 10Dreamy Jazz) [21:06:41] !log arlolra@deploy1003 arlolra: Backport for [[gerrit:1304887|Add a hidden lint for pre ext tags expanding templates (T353697)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:06:46] T353697: Parsoid/legacy parser {{Pre}} template rendering difference - https://phabricator.wikimedia.org/T353697 [21:08:33] !log arlolra@deploy1003 arlolra: Continuing with deployment [21:09:12] (03CR) 10EarlyWarningBot: "[Failed command](https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php83/86657/consoleFull): `composer run --timeout=0 phpunit" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) (owner: 10Dreamy Jazz) [21:09:18] (03PS3) 10CDobbins: learn.wiki: update DNS records [dns] - 10https://gerrit.wikimedia.org/r/1304893 (https://phabricator.wikimedia.org/T429628) [21:11:44] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042732 (10BCornwall) Are there any ACM validation records that need to be added? [21:12:56] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042735 (10Ahsan-arbisoft) NO [21:14:40] (03CR) 10EarlyWarningBot: "[Failed command](https://integration.wikimedia.org/ci/job/quibble-with-gated-extensions-vendor-mysql-php83/42653/consoleFull): `composer r" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) (owner: 10Dreamy Jazz) [21:14:59] (03CR) 10CI reject: [V:04-1] CaptchaPreAuthenticationProvider: Clear solved state on failure [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) (owner: 10Dreamy Jazz) [21:16:28] (03CR) 10BCornwall: [C:03+1] learn.wiki: update DNS records [dns] - 10https://gerrit.wikimedia.org/r/1304893 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [21:17:27] (03PS2) 10Dreamy Jazz: CaptchaPreAuthenticationProvider: Clear solved state on failure [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 
10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) [21:17:30] (03CR) 10CDobbins: [C:03+2] learn.wiki: update DNS records [dns] - 10https://gerrit.wikimedia.org/r/1304893 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [21:18:19] (03PS3) 10Dreamy Jazz: CaptchaPreAuthenticationProvider: Clear solved state on failure [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) [21:18:44] !log cdobbins@dns1004 START - running authdns-update [21:20:39] !log cdobbins@dns1004 END - running authdns-update [21:20:40] !log arlolra@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304887|Add a hidden lint for pre ext tags expanding templates (T353697)]] (duration: 33m 42s) [21:20:48] T353697: Parsoid/legacy parser {{Pre}} template rendering difference - https://phabricator.wikimedia.org/T353697 [21:21:39] Any security deploys to do? [21:21:47] Or can I proceed [21:24:00] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304896 (https://phabricator.wikimedia.org/T429594) (owner: 10Dreamy Jazz) [21:24:00] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) (owner: 10Dreamy Jazz) [21:25:12] (03PS1) 10Bking: WIP: cirrussearch: set hieradata for OpenSearch 1->2 migration [puppet] - 10https://gerrit.wikimedia.org/r/1304906 (https://phabricator.wikimedia.org/T429844) [21:25:41] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1304906 (https://phabricator.wikimedia.org/T429844) (owner: 10Bking) [21:27:19] (03Merged) 10jenkins-bot: RiskScoreCollector: Make error_context a string map [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 
10https://gerrit.wikimedia.org/r/1304896 (https://phabricator.wikimedia.org/T429594) (owner: 10Dreamy Jazz) [21:31:40] (03CR) 10Scott French: "Thanks, Luca!" [puppet] - 10https://gerrit.wikimedia.org/r/1304596 (https://phabricator.wikimedia.org/T428022) (owner: 10Elukey) [21:31:41] (03Merged) 10jenkins-bot: CaptchaPreAuthenticationProvider: Clear solved state on failure [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304904 (https://phabricator.wikimedia.org/T429705) (owner: 10Dreamy Jazz) [21:32:03] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304896|RiskScoreCollector: Make error_context a string map (T429594)]], [[gerrit:1304904|CaptchaPreAuthenticationProvider: Clear solved state on failure (T429705)]] [21:32:08] T429594: mediawiki.client.error stream validation errors - 2026-06-17 - https://phabricator.wikimedia.org/T429594 [21:32:09] T429705: hCaptcha: Bad login CAPTCHA on second or more attempts shows every other submission - https://phabricator.wikimedia.org/T429705 [21:33:39] (03PS1) 10JHathaway: WIP: ini config rev 2 [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304908 [21:34:49] (03PS1) 10Eric Gardner: MMV Beta Viewer: Improve loading/navigation UX [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304909 (https://phabricator.wikimedia.org/T429193) [21:35:21] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042833 (10Ahsan-arbisoft) @BCornwall Have you also pointed *.learn.wiki` to the CNAME `a40059d1ee67a3468.awsglobalaccelerator.com? [21:36:01] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1304896|RiskScoreCollector: Make error_context a string map (T429594)]], [[gerrit:1304904|CaptchaPreAuthenticationProvider: Clear solved state on failure (T429705)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:36:51] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [21:39:23] (03CR) 10CI reject: [V:04-1] WIP: ini config rev 2 [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304908 (owner: 10JHathaway) [21:43:15] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304896|RiskScoreCollector: Make error_context a string map (T429594)]], [[gerrit:1304904|CaptchaPreAuthenticationProvider: Clear solved state on failure (T429705)]] (duration: 11m 12s) [21:43:15] (03PS1) 10Scott French: shellbox: Pick up images reflecting latest code [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304860 (https://phabricator.wikimedia.org/T428013) [21:43:22] T429594: mediawiki.client.error stream validation errors - 2026-06-17 - https://phabricator.wikimedia.org/T429594 [21:43:22] T429705: hCaptcha: Bad login CAPTCHA on second or more attempts shows every other submission - https://phabricator.wikimedia.org/T429705 [21:47:01] (03PS1) 10Dreamy Jazz: hCaptcha: Enable for badlogin on non-SUL wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304911 (https://phabricator.wikimedia.org/T429843) [21:48:02] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12042856 (10Ahsan-arbisoft) If ***.learn.wiki ** cannot be configured due to similar apex-related limitations. Could you please add the following domains individually and point them to th... 
[21:49:37] (03Restored) 10Jdlrobson: Ensure page tools icons are only shown on small viewports [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304151 (https://phabricator.wikimedia.org/T426131) (owner: 10Jdlrobson) [21:51:06] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304911 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [21:52:41] (03Merged) 10jenkins-bot: hCaptcha: Enable for badlogin on non-SUL wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304911 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [21:52:57] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304911|hCaptcha: Enable for badlogin on non-SUL wikis (T429843)]] [21:53:02] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [21:55:00] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1304911|hCaptcha: Enable for badlogin on non-SUL wikis (T429843)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:58:33] (03PS2) 10SBassett: Lazily reject pre-fix parser-cache entries for noreferrer/noopener links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) [22:04:02] !log dreamyjazz@deploy1003 dreamyjazz: Rolling back deployment [22:04:34] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304911|hCaptcha: Enable for badlogin on non-SUL wikis (T429843)]] (duration: 11m 37s) [22:04:38] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [22:07:06] (03PS1) 10Dreamy Jazz: hCaptcha: Apply generic settings for bad login [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304913 (https://phabricator.wikimedia.org/T429843) [22:07:25] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304913 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [22:08:11] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304913 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [22:08:28] (03PS2) 10Eric Gardner: MMV Beta Viewer: Improve loading/navigation UX [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304909 (https://phabricator.wikimedia.org/T429193) [22:08:28] (03PS1) 10Eric Gardner: Inject service RepoGroup into Hooks [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304914 [22:08:41] (03PS2) 10Eric Gardner: Take the feature out of beta [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304910 (https://phabricator.wikimedia.org/T429509) [22:09:30] (03PS1) 10CDobbins: . 
[dns] - 10https://gerrit.wikimedia.org/r/1304915 [22:10:15] (03Merged) 10jenkins-bot: hCaptcha: Apply generic settings for bad login [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304913 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [22:10:33] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304913|hCaptcha: Apply generic settings for bad login (T429843)]], [[gerrit:1304911|hCaptcha: Enable for badlogin on non-SUL wikis (T429843)]] [22:10:38] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [22:12:29] (03PS2) 10CDobbins: learn.wiki: update DNS records [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) [22:12:34] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1304913|hCaptcha: Apply generic settings for bad login (T429843)]], [[gerrit:1304911|hCaptcha: Enable for badlogin on non-SUL wikis (T429843)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[22:13:29] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [22:16:14] (03CR) 10BCornwall: learn.wiki: update DNS records (033 comments) [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:17:48] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304913|hCaptcha: Apply generic settings for bad login (T429843)]], [[gerrit:1304911|hCaptcha: Enable for badlogin on non-SUL wikis (T429843)]] (duration: 07m 14s) [22:17:52] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [22:20:13] (03PS3) 10CDobbins: learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) [22:22:54] (03CR) 10BCornwall: [C:03+1] learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:23:17] (03PS4) 10CDobbins: learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) [22:23:26] (03CR) 10CDobbins: learn.wiki: set default path to AWS GA (033 comments) [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:23:39] (03CR) 10BCornwall: [C:03+1] learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:24:44] (03PS5) 10CDobbins: learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) [22:25:17] jouncebot: nowandnext [22:25:17] For the next 0 hour(s) and 34 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2100) [22:25:24] In 0 hour(s) and 34 minute(s): Readers deployment window 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2300) [22:26:03] (03CR) 10BCornwall: "[nit] PS4 was fine, PS5 adds in newline shenanigans but it's fine" [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:26:41] Readers will be deploying a few patches when our window comes up in half an hour [22:28:47] (03CR) 10BCornwall: [C:03+1] learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:28:59] (03PS1) 10Dreamy Jazz: [WIP] hCaptcha: Enable for Special:Contact [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) [22:29:13] (03CR) 10CDobbins: [C:03+2] learn.wiki: set default path to AWS GA [dns] - 10https://gerrit.wikimedia.org/r/1304915 (https://phabricator.wikimedia.org/T429628) (owner: 10CDobbins) [22:29:47] !log cdobbins@dns1004 START - running authdns-update [22:31:38] !log cdobbins@dns1004 END - running authdns-update [22:33:57] jouncebot: nowandnext [22:33:57] For the next 0 hour(s) and 26 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2100) [22:33:57] In 0 hour(s) and 26 minute(s): Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2300) [22:35:22] (03PS1) 10Dreamy Jazz: hCaptcha: Enable for bad login on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304920 (https://phabricator.wikimedia.org/T429843) [22:35:41] !log jclark@cumin1003 START - Cookbook sre.dns.netbox [22:35:50] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304920 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [22:38:16] (03Merged) 10jenkins-bot: hCaptcha: Enable for bad login on group1 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/1304920 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [22:38:32] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304920|hCaptcha: Enable for bad login on group1 (T429843)]] [22:38:36] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [22:39:28] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:39:53] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding clouddb1031 to eqiad - jclark@cumin1003" [22:39:58] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding clouddb1031 to eqiad - jclark@cumin1003" [22:39:58] !log jclark@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [22:40:28] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:40:32] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1304920|hCaptcha: Enable for bad login on group1 (T429843)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[22:40:46] !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:41:01] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [22:41:58] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:43:11] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:44:20] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1029.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [22:45:17] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304920|hCaptcha: Enable for bad login on group1 (T429843)]] (duration: 06m 45s) [22:45:21] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [22:46:32] EricGardner: I'm done, so if you wanted to start early it seems you can [22:48:14] !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:49:41] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:53:29] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1030.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:54:10] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host clouddb1029.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:54:22] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1031.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:54:23] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:55:01] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1032.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [22:55:41] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1033.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:00:04] Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260622T2300) [23:03:01] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1030.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:03:26] proceeding [23:03:29] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1031.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:03:49] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdlrobson@deploy1003 using scap backport" [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304151 (https://phabricator.wikimedia.org/T426131) (owner: 10Jdlrobson) [23:03:50] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdlrobson@deploy1003 using scap backport" [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) (owner: 10Jdlrobson) [23:04:03] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1032.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:04:36] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1033.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:04:47] 
!log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1032.eqiad.wmnet with OS trixie [23:04:55] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043086 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1032.eqiad.wmnet with OS trixie [23:05:13] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1033.eqiad.wmnet with OS trixie [23:05:27] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043088 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1033.eqiad.wmnet with OS trixie [23:05:49] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1031.eqiad.wmnet with OS trixie [23:06:05] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1030.eqiad.wmnet with OS trixie [23:06:06] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043089 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1031.eqiad.wmnet with OS trixie [23:06:15] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043090 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1030.eqiad.wmnet with OS trixie [23:12:40] (03Merged) 10jenkins-bot: Ensure page tools icons are only shown on small viewports [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304151 (https://phabricator.wikimedia.org/T426131) (owner: 10Jdlrobson) [23:12:43] (03CR) 10CI reject: [V:04-1] Fix duplicate print and other projects 
menu links in main menu [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) (owner: 10Jdlrobson) [23:13:11] (03PS3) 10Jdlrobson: Fix duplicate print and other projects menu links in main menu [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) [23:13:19] (03CR) 10TrainBranchBot: "Approved by jdlrobson@deploy1003 using scap backport" [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) (owner: 10Jdlrobson) [23:13:23] (03CR) 10TrainBranchBot: "Approved by jdlrobson@deploy1003 using scap backport" [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) (owner: 10Jdlrobson) [23:21:13] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1032.eqiad.wmnet with reason: host reimage [23:25:05] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1032.eqiad.wmnet with reason: host reimage [23:25:26] (03Merged) 10jenkins-bot: Fix duplicate print and other projects menu links in main menu [skins/Vector] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304863 (https://phabricator.wikimedia.org/T429676) (owner: 10Jdlrobson) [23:25:46] !log jdlrobson@deploy1003 Started scap sync-world: Backport for [[gerrit:1304151|Ensure page tools icons are only shown on small viewports (T426131)]], [[gerrit:1304863|Fix duplicate print and other projects menu links in main menu (T429676)]] [23:25:53] T426131: Update tools belt to accommodate watch and bookmark and update page tools to only show icons at lower resolutions - https://phabricator.wikimedia.org/T426131 [23:25:53] T429676: "print/export" and "in other projects" menu items appear in the Vector2022 main menu - https://phabricator.wikimedia.org/T429676 [23:26:05] !log jclark@cumin1003 START - 
Cookbook sre.hosts.provision for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:27:35] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1030.eqiad.wmnet with reason: host reimage [23:27:38] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1031.eqiad.wmnet with reason: host reimage [23:27:46] !log jdlrobson@deploy1003 jdlrobson: Backport for [[gerrit:1304151|Ensure page tools icons are only shown on small viewports (T426131)]], [[gerrit:1304863|Fix duplicate print and other projects menu links in main menu (T429676)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [23:29:50] !log jdlrobson@deploy1003 jdlrobson: Continuing with deployment [23:30:09] EricGardner: all yours when spiderpig is done [23:30:31] Jdlrobson: thanks [23:31:54] !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:32:34] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1030.eqiad.wmnet with reason: host reimage [23:34:05] !log jdlrobson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304151|Ensure page tools icons are only shown on small viewports (T426131)]], [[gerrit:1304863|Fix duplicate print and other projects menu links in main menu (T429676)]] (duration: 08m 18s) [23:34:12] T426131: Update tools belt to accommodate watch and bookmark and update page tools to only show icons at lower resolutions - https://phabricator.wikimedia.org/T426131 [23:34:12] T429676: "print/export" and "in other projects" menu items appear in the Vector2022 main menu - https://phabricator.wikimedia.org/T429676 [23:34:42] Ok, starting next deployment shortly [23:35:37] (03CR) 10TrainBranchBot: [C:03+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] 
(wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304914 (owner: 10Eric Gardner) [23:35:37] (03CR) 10TrainBranchBot: [C:03+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304909 (https://phabricator.wikimedia.org/T429193) (owner: 10Eric Gardner) [23:35:38] (03CR) 10TrainBranchBot: [C:03+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304910 (https://phabricator.wikimedia.org/T429509) (owner: 10Eric Gardner) [23:36:29] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1031.eqiad.wmnet with reason: host reimage [23:38:19] !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:39:53] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003" [23:40:17] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003" [23:40:18] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1032.eqiad.wmnet with OS trixie [23:40:34] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043180 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1032.eqiad.wmnet with OS trixie completed: - cl... 
[23:40:49] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1029.eqiad.wmnet with OS trixie [23:41:04] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043181 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1029.eqiad.wmnet with OS trixie [23:42:26] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1304926 [23:42:26] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1304926 (owner: 10TrainBranchBot) [23:45:53] (03CR) 10CI reject: [V:04-1] Inject service RepoGroup into Hooks [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304914 (owner: 10Eric Gardner) [23:45:53] (03CR) 10CI reject: [V:04-1] MMV Beta Viewer: Improve loading/navigation UX [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304909 (https://phabricator.wikimedia.org/T429193) (owner: 10Eric Gardner) [23:45:54] (03CR) 10CI reject: [V:04-1] Take the feature out of beta [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304910 (https://phabricator.wikimedia.org/T429509) (owner: 10Eric Gardner) [23:47:04] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1028.eqiad.wmnet with OS trixie [23:47:09] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003" [23:47:14] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043186 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host 
clouddb1028.eqiad.wmnet with OS trixie [23:47:25] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003" [23:47:26] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1030.eqiad.wmnet with OS trixie [23:47:37] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043187 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1030.eqiad.wmnet with OS trixie completed: - cl... [23:47:58] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1026.eqiad.wmnet with OS trixie [23:48:08] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043188 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1026.eqiad.wmnet with OS trixie [23:50:00] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART [23:50:46] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003" [23:51:13] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1029.eqiad.wmnet with reason: host reimage [23:51:56] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003" [23:51:58] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1031.eqiad.wmnet with OS trixie [23:51:59] !log jclark@cumin1003 END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host clouddb1033.eqiad.wmnet with OS trixie [23:52:07] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043205 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1031.eqiad.wmnet with OS trixie completed: - cl... [23:52:07] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043204 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1033.eqiad.wmnet with OS trixie executed with e... [23:52:44] (03CR) 10TrainBranchBot: [C:03+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304914 (owner: 10Eric Gardner) [23:52:45] (03CR) 10TrainBranchBot: [C:03+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304909 (https://phabricator.wikimedia.org/T429193) (owner: 10Eric Gardner) [23:52:46] (03CR) 10TrainBranchBot: [C:03+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304910 (https://phabricator.wikimedia.org/T429509) (owner: 10Eric Gardner) [23:53:00] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1033.eqiad.wmnet with OS trixie [23:53:10] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043206 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1033.eqiad.wmnet with OS trixie [23:54:40] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - 
https://phabricator.wikimedia.org/T409162#12043207 (10Jclark-ctr) [23:55:27] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043208 (10Jclark-ctr) [23:55:39] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1029.eqiad.wmnet with reason: host reimage [23:57:23] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1028.eqiad.wmnet with reason: host reimage [23:57:30] !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host clouddb1027.eqiad.wmnet with OS trixie [23:57:44] 10ops-eqiad, 06SRE, 06DC-Ops, 06cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043209 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1027.eqiad.wmnet with OS trixie [23:57:57] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1026.eqiad.wmnet with reason: host reimage