[00:00:10] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1304926 (owner: TrainBranchBot)
[00:00:14] (Merged) jenkins-bot: Inject service RepoGroup into Hooks [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304914 (owner: Eric Gardner)
[00:03:06] (Merged) jenkins-bot: MMV Beta Viewer: Improve loading/navigation UX [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304909 (https://phabricator.wikimedia.org/T429193) (owner: Eric Gardner)
[00:03:18] (Merged) jenkins-bot: Take the feature out of beta [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1304910 (https://phabricator.wikimedia.org/T429509) (owner: Eric Gardner)
[00:03:25] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1033.eqiad.wmnet with reason: host reimage
[00:03:54] !log egardner@deploy1003 Started scap sync-world: Backport for [[gerrit:1304914|Inject service RepoGroup into Hooks]], [[gerrit:1304909|MMV Beta Viewer: Improve loading/navigation UX (T429193)]], [[gerrit:1304910|Take the feature out of beta (T429509)]]
[00:04:01] T429193: Pagination lags or skips because of large images - https://phabricator.wikimedia.org/T429193
[00:04:01] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[00:05:40] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1028.eqiad.wmnet with reason: host reimage
[00:07:45] !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1027.eqiad.wmnet with reason: host reimage
[00:08:40] !log egardner@deploy1003 sync-world failed: Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.47.0-wmf.6,1.47.0-wmf.7,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --singleversion-image-basename docker-registry.discovery.wmnet/restricted/mediawiki-singleversion --webserver-image-name docker-registry.discovery.wmnet/restricted/mediawiki-webserver --latest-tag latest --label vnd.wikimedia.builder.name=scap --label vnd.wikimedia.builder.version=4.269.0 --label vnd.wikimedia.scap.stage_dir=/srv/mediawiki-staging --label vnd.wikimedia.scap.build_state_dir=/srv/mediawiki-staging/scap/image-build' returned non-zero exit status 1. (scap version: 4.269.0) (duration: 04m 45s)
[00:09:45] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1026.eqiad.wmnet with reason: host reimage
[00:10:19] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:10:39] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:10:40] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1029.eqiad.wmnet with OS trixie
[00:10:50] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043251 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1029.eqiad.wmnet with OS trixie completed: - cl...
[00:11:07] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043252 (Jclark-ctr)
[00:11:24] !log egardner@deploy1003 Started scap sync-world: Backport for [[gerrit:1304914|Inject service RepoGroup into Hooks]], [[gerrit:1304909|MMV Beta Viewer: Improve loading/navigation UX (T429193)]], [[gerrit:1304910|Take the feature out of beta (T429509)]]
[00:11:31] T429193: Pagination lags or skips because of large images - https://phabricator.wikimedia.org/T429193
[00:11:31] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[00:14:55] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043255 (Jclark-ctr) a:Jclark-ctr
[00:17:32] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1033.eqiad.wmnet with reason: host reimage
[00:17:32] !log egardner@deploy1003 sync-world failed: Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.47.0-wmf.6,1.47.0-wmf.7,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --singleversion-image-basename docker-registry.discovery.wmnet/restricted/mediawiki-singleversion --webserver-image-name docker-registry.discovery.wmnet/restricted/mediawiki-webserver --latest-tag latest --label vnd.wikimedia.builder.name=scap --label vnd.wikimedia.builder.version=4.269.0 --label vnd.wikimedia.scap.stage_dir=/srv/mediawiki-staging --label vnd.wikimedia.scap.build_state_dir=/srv/mediawiki-staging/scap/image-build' returned non-zero exit status 1. (scap version: 4.269.0) (duration: 03m 50s)
[00:17:33] Ok, I'm not able to backport changes despite the patches merging to the target branch successfully. The error seems to be:
[00:17:33] [mediawiki-publish-83] Error response from daemon: Error processing tar file(exit status 1): write /srv/mediawiki/php-1.47.0-wmf.7/extensions/Newsletter/i18n/hu.json: no space left on device
[00:17:33] It looks like the build host may have run out of disk space -- can anyone from RelEng look into this?
[00:17:33] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[00:17:56] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1027.eqiad.wmnet with reason: host reimage
[00:20:30] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:20:51] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:20:52] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1028.eqiad.wmnet with OS trixie
[00:21:02] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043266 (Jclark-ctr)
[00:21:05] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043267 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1028.eqiad.wmnet with OS trixie completed: - cl...
[00:24:10] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:24:25] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:24:26] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1026.eqiad.wmnet with OS trixie
[00:24:34] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043269 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1026.eqiad.wmnet with OS trixie completed: - cl...
[00:24:39] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043270 (Jclark-ctr)
[00:29:53] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:30:09] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:30:11] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1033.eqiad.wmnet with OS trixie
[00:30:23] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043271 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1033.eqiad.wmnet with OS trixie completed: - cl...
[00:30:25] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043272 (Jclark-ctr)
[00:33:02] !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:33:16] !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:33:18] !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1027.eqiad.wmnet with OS trixie
[00:33:27] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043275 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host clouddb1027.eqiad.wmnet with OS trixie completed: - cl...
[00:33:29] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043276 (Jclark-ctr)
[00:33:45] ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#12043277 (Jclark-ctr) Open→Resolved
[00:39:23] !log codesearch10: systemctl start codesearch-write-config; systemctl restart hound-operations (gerrit:1304848) (T429819)
[00:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:23] T429819: Index the slothslo gitlab repo in codesearch - https://phabricator.wikimedia.org/T429819
[00:46:39] !log manually started scap-clean-images.service on deploy1003 to reclaim /srv space (93% -> 66% utilization)
[00:46:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:03] EricGardner: please retry your deployment
[00:47:11] ah, I guess Eric is gone?
[00:51:20] !log egardner@deploy1003 Started scap sync-world: Backport for [[gerrit:1304914|Inject service RepoGroup into Hooks]], [[gerrit:1304909|MMV Beta Viewer: Improve loading/navigation UX (T429193)]], [[gerrit:1304910|Take the feature out of beta (T429509)]]
[00:51:27] T429193: Pagination lags or skips because of large images - https://phabricator.wikimedia.org/T429193
[00:51:27] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[00:55:04] SRE, DNS, Traffic, Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12043293 (BCornwall) The DNS changes have been applied.
[00:56:14] !log brett@dns7002 START - running authdns-update
[00:57:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:57:56] !log brett@dns7002 END - running authdns-update
[01:00:45] ops-codfw, DC-Ops, Infrastructure-Foundations, netops: codfw: rack A8 maintenance 2026-07-01 10:00 am CT - https://phabricator.wikimedia.org/T429856 (Papaul) NEW
[01:06:46] (PS1) BCornwall: learn.wiki: Explicitly set prod host records [dns] - https://gerrit.wikimedia.org/r/1304934 (https://phabricator.wikimedia.org/T429628)
[01:07:51] (CR) BCornwall: [C:+2] learn.wiki: Explicitly set prod host records [dns] - https://gerrit.wikimedia.org/r/1304934 (https://phabricator.wikimedia.org/T429628) (owner: BCornwall)
[01:08:18] !log brett@dns7002 START - running authdns-update
[01:10:07] !log brett@dns7002 END - running authdns-update
[01:10:36] !log egardner@deploy1003 egardner: Backport for [[gerrit:1304914|Inject service RepoGroup into Hooks]], [[gerrit:1304909|MMV Beta Viewer: Improve loading/navigation UX (T429193)]], [[gerrit:1304910|Take the feature out of beta (T429509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[01:10:43] T429193: Pagination lags or skips because of large images - https://phabricator.wikimedia.org/T429193
[01:10:44] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[01:11:15] (PS1) TrainBranchBot: Branch commit for wmf/1.47.0-wmf.8 [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1304935 (https://phabricator.wikimedia.org/T423917)
[01:11:18] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.47.0-wmf.8 [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1304935 (https://phabricator.wikimedia.org/T423917) (owner: TrainBranchBot)
[01:12:28] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1304936
[01:12:28] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1304936 (owner: TrainBranchBot)
[01:12:40] !log egardner@deploy1003 egardner: Continuing with deployment
[01:20:24] (Merged) jenkins-bot: Branch commit for wmf/1.47.0-wmf.8 [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1304935 (https://phabricator.wikimedia.org/T423917) (owner: TrainBranchBot)
[01:20:32] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1304936 (owner: TrainBranchBot)
[01:24:43] !log egardner@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304914|Inject service RepoGroup into Hooks]], [[gerrit:1304909|MMV Beta Viewer: Improve loading/navigation UX (T429193)]], [[gerrit:1304910|Take the feature out of beta (T429509)]] (duration: 33m 22s)
[01:24:49] T429193: Pagination lags or skips because of large images - https://phabricator.wikimedia.org/T429193
[01:24:49] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[01:28:39] (CR) TrainBranchBot: [C:+2] "Approved by egardner@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) (owner: Kimberly Sarabia)
[01:29:38] (Merged) jenkins-bot: Remove multimediaviewer-beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1304859 (https://phabricator.wikimedia.org/T429509) (owner: Kimberly Sarabia)
[01:29:58] !log egardner@deploy1003 Started scap sync-world: Backport for [[gerrit:1304859|Remove multimediaviewer-beta]]
[01:34:13] !log egardner@deploy1003 egardner, ksarabia: Backport for [[gerrit:1304859|Remove multimediaviewer-beta]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[01:34:44] !log egardner@deploy1003 egardner, ksarabia: Continuing with deployment
[01:41:12] !log egardner@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304859|Remove multimediaviewer-beta]] (duration: 11m 14s)
[02:00:04] Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T0200)
[02:00:51] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:05:49] ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: codfw: rack A8 maintenance 2026-07-01 10:00 am CT - https://phabricator.wikimedia.org/T429856#12043444 (Papaul) p:Triage→Medium
[02:08:39] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 07m 25s)
[02:09:40] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:14:52] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:24:00] ops-codfw, DC-Ops, Infrastructure-Foundations, netops: codfw: rack B2 maintenance 2026-07-01 11:00 am CT - https://phabricator.wikimedia.org/T429861#12043459 (Papaul) a:Papaul
[02:25:44] ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#12043469 (Papaul)
[02:44:23] FIRING: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn
[02:45:37] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[03:00:04] Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T0300)
[03:01:53] (PS1) TrainBranchBot: testwikis to 1.47.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1304946 (https://phabricator.wikimedia.org/T423917)
[03:01:57] (CR) TrainBranchBot: [C:+2] "Initiated by mwpresync@deploy1003" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304946 (https://phabricator.wikimedia.org/T423917) (owner: TrainBranchBot)
[03:04:27] (Merged) jenkins-bot: testwikis to 1.47.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1304946 (https://phabricator.wikimedia.org/T423917) (owner: TrainBranchBot)
[03:04:53] !log mwpresync@deploy1003 Started scap sync-world: testwikis to 1.47.0-wmf.8 refs T423917
[03:11:07] T423917: 1.47.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T423917
[03:12:27] RECOVERY - jenkins_service_running on contint1003 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[03:15:27] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[03:36:11] (CR) Andrea Denisse: [C:+1] "I checked the dashboard you mentioned and its only panel is already using Geomap tho it's not rendering correctly due to an error unrelat" [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/1303453 (owner: Ayounsi)
[03:44:16] !log mwpresync@deploy1003 Finished scap sync-world: testwikis to 1.47.0-wmf.8 refs T423917 (duration: 39m 23s)
[03:44:21] T423917: 1.47.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T423917
[03:46:46] (CR) Arnaudb: [C:+2] gitlab: advertise gitlab-ssh url on gitlab primary [puppet] - https://gerrit.wikimedia.org/r/1300763 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[03:48:54] (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1302185 (https://phabricator.wikimedia.org/T249663) (owner: Hnowlan)
[03:56:25] FIRING: SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:00:05] Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T0400)
[04:01:25] RESOLVED: SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:02:43] !log mwpresync@deploy1003 Pruned MediaWiki: 1.47.0-wmf.5 (duration: 02m 37s)
[04:13:25] ops-codfw, DC-Ops: Unresponsive management for kafka-main2009.mgmt:22 - https://phabricator.wikimedia.org/T429864 (phaultfinder) NEW
[04:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[04:57:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:01:24] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Primary switchover es7 T429794
[05:01:28] T429794: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T429794
[05:01:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set es2039 with weight 0 T429794', diff saved to https://phabricator.wikimedia.org/P94330 and previous config saved to /var/cache/conftool/dbconfig/20260623-050137-marostegui.json
[05:04:57] (CR) Marostegui: [C:+2] mariadb: Promote es2039 to es7 master [puppet] - https://gerrit.wikimedia.org/r/1304804 (https://phabricator.wikimedia.org/T429794) (owner: Gerrit maintenance bot)
[05:07:32] !log Starting es7 codfw failover from es2038 to es2039 - T429794
[05:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:36] T429794: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T429794
[05:07:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote es2039 to es7 primary T429794', diff saved to https://phabricator.wikimedia.org/P94331 and previous config saved to /var/cache/conftool/dbconfig/20260623-050758-marostegui.json
[05:10:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool es2038 T429794', diff saved to https://phabricator.wikimedia.org/P94332 and previous config saved to /var/cache/conftool/dbconfig/20260623-051012-marostegui.json
[05:12:16] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[05:12:16] !log marostegui@cumin1003 dbmaint on es7@codfw T429463
[05:12:22] T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463
[05:12:26] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2038: Upgrading es2038.codfw.wmnet
[05:12:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2038: Upgrading es2038.codfw.wmnet
[05:13:37] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2038.codfw.wmnet with OS trixie
[05:14:42] (CR) Ayounsi: [C:+2] Remove worldmap panel [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/1303453 (owner: Ayounsi)
[05:17:25] (CR) Ayounsi: [V:+2 C:+2] Remove worldmap panel [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/1303453 (owner: Ayounsi)
[05:19:23] RESOLVED: SLOBudgetBurn: Standalone event system success rate is below 99.9% target - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn
[05:20:15] (PS1) Marostegui: control-mariadb-10.11-bookworm: New version [software] - https://gerrit.wikimedia.org/r/1305012 (https://phabricator.wikimedia.org/T428861)
[05:21:25] (CR) Marostegui: [C:+2] control-mariadb-10.11-bookworm: New version [software] - https://gerrit.wikimedia.org/r/1305012 (https://phabricator.wikimedia.org/T428861) (owner: Marostegui)
[05:22:05] (Merged) jenkins-bot: control-mariadb-10.11-bookworm: New version [software] - https://gerrit.wikimedia.org/r/1305012 (https://phabricator.wikimedia.org/T428861) (owner: Marostegui)
[05:26:19] ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: codfw: rack B2 maintenance 2026-07-01 11:00 am CT - https://phabricator.wikimedia.org/T429861#12043625 (ayounsi)
[05:26:36] (PS1) C. Scott Ananian: [parser] Return HeadingPFragments while preprocessing [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305015 (https://phabricator.wikimedia.org/T391624)
[05:27:13] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305015 (https://phabricator.wikimedia.org/T391624) (owner: C. Scott Ananian)
[05:29:40] (PS1) C. Scott Ananian: [REST] Move full-document HTML clients to ::getAsRawHtmlString() [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305016 (https://phabricator.wikimedia.org/T393925)
[05:31:06] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2038.codfw.wmnet with reason: host reimage
[05:31:52] (PS1) C. Scott Ananian: [parser] Add configuration to return experimental PFragment types [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305017
[05:32:18] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305017 (owner: C. Scott Ananian)
[05:33:18] (PS1) C. Scott Ananian: [parser] Return HeadingPFragments while preprocessing [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305018 (https://phabricator.wikimedia.org/T391624)
[05:35:09] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305018 (https://phabricator.wikimedia.org/T391624) (owner: C. Scott Ananian)
[05:35:34] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2038.codfw.wmnet with reason: host reimage
[05:36:25] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305016 (https://phabricator.wikimedia.org/T393925) (owner: C. Scott Ananian)
[05:37:34] (CR) CI reject: [V:-1] [REST] Move full-document HTML clients to ::getAsRawHtmlString() [core] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305016 (https://phabricator.wikimedia.org/T393925) (owner: C. Scott Ananian)
[05:41:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc2018.codfw.wmnet with reason: Reimage to Trixie
[05:41:23] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc2018: Reimage to Trixie
[05:41:23] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache
[05:41:31] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[05:41:32] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2018: Reimage to Trixie
[05:42:14] (CR) CI reject: [V:-1] [parser] Return HeadingPFragments while preprocessing [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305018 (https://phabricator.wikimedia.org/T391624) (owner: C. Scott Ananian)
[05:42:34] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host pc2018.codfw.wmnet with OS trixie
[05:53:23] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2038.codfw.wmnet with OS trixie
[05:55:10] (PS1) Gerrit maintenance bot: mariadb: Promote es1039 to es7 master [puppet] - https://gerrit.wikimedia.org/r/1305020 (https://phabricator.wikimedia.org/T429867)
[05:55:16] (PS1) Gerrit maintenance bot: wmnet: Update es7-master alias [dns] - https://gerrit.wikimedia.org/r/1305021 (https://phabricator.wikimedia.org/T429867)
[05:59:03] (PS1) Gerrit maintenance bot: mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1305022 (https://phabricator.wikimedia.org/T429868)
[05:59:10] (PS1) Gerrit maintenance bot: wmnet: Update s6-master alias [dns] - https://gerrit.wikimedia.org/r/1305023 (https://phabricator.wikimedia.org/T429868)
[05:59:44] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2018.codfw.wmnet with reason: host reimage
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T0600)
[06:00:05] marostegui, Amir1, and federico3: Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T0600). Please do the needful.
[06:00:42] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2038: Migration of es2038.codfw.wmnet completed
[06:01:46] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 21 hosts with reason: Primary switchover s6 T429868
[06:01:50] T429868: Switchover s6 master (db1201 -> db1173) - https://phabricator.wikimedia.org/T429868
[06:01:58] !log fceratto@cumin1003 dbctl commit (dc=all): 'Set db1173 with weight 0 T429868', diff saved to https://phabricator.wikimedia.org/P94335 and previous config saved to /var/cache/conftool/dbconfig/20260623-060157-fceratto.json
[06:04:29] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2018.codfw.wmnet with reason: host reimage
[06:04:51] (CR) Federico Ceratto: [C:+2] mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1305022 (https://phabricator.wikimedia.org/T429868) (owner: Gerrit maintenance bot)
[06:05:49] FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[06:06:49] !log Starting s6 eqiad failover from db1201 to db1173 - T429868
[06:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:54] T429868: Switchover s6 master (db1201 -> db1173) - https://phabricator.wikimedia.org/T429868
[06:07:09] !log fceratto@cumin1003 dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T429868', diff saved to https://phabricator.wikimedia.org/P94336 and previous config saved to /var/cache/conftool/dbconfig/20260623-060708-fceratto.json
[06:07:38] !log fceratto@cumin1003 dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T429868', diff saved to https://phabricator.wikimedia.org/P94337 and previous config saved to /var/cache/conftool/dbconfig/20260623-060737-fceratto.json
[06:07:41] fceratto@cumin1003: Failed to log message to wiki. Somebody should check the error logs.
[06:10:16] (CR) Federico Ceratto: [C:+2] wmnet: Update s6-master alias [dns] - https://gerrit.wikimedia.org/r/1305023 (https://phabricator.wikimedia.org/T429868) (owner: Gerrit maintenance bot)
[06:11:29] !log fceratto@dns1004 START - running authdns-update
[06:13:20] !log fceratto@dns1004 END - running authdns-update
[06:14:16] !log fceratto@cumin1003 dbctl commit (dc=all): 'Depool db1201 T429868', diff saved to https://phabricator.wikimedia.org/P94338 and previous config saved to /var/cache/conftool/dbconfig/20260623-061416-fceratto.json
[06:14:20] T429868: Switchover s6 master (db1201 -> db1173) - https://phabricator.wikimedia.org/T429868
[06:14:53] !log fceratto@cumin1003 START - Cookbook sre.mysql.pool pool db1201: Repooling after switchover
[06:27:29] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2018.codfw.wmnet with OS trixie
[06:28:22] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc2018: after reimage to trixie
[06:28:23] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc2018: after reimage to trixie
[06:31:42] (CR) Elukey: [C:+1] "LGTM, I'll wait for Riccardo's review before proceeding." [software/spicerack] - https://gerrit.wikimedia.org/r/1304183 (owner: JHathaway)
[06:32:03] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc1018.eqiad.wmnet with reason: Reimage to Trixie
[06:32:06] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc1018: Reimage to Trixie
[06:32:06] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache
[06:32:12] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
[06:32:12] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc1018: Reimage to Trixie
[06:32:35] (CR) Elukey: [C:+1] slothslos/report2drive: add Hiera configuration [puppet] - https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) (owner: Tiziano Fogli)
[06:32:51] (CR) Elukey: [C:+1] slothslos/report2drive: enable deep merge for vars [puppet] - https://gerrit.wikimedia.org/r/1298298 (https://phabricator.wikimedia.org/T425795) (owner: Tiziano Fogli)
[06:32:58] (CR) Elukey: [C:+1] slothslos/report2drive: instantiate resources [puppet] - https://gerrit.wikimedia.org/r/1298296 (https://phabricator.wikimedia.org/T425795) (owner: Tiziano Fogli)
[06:34:19] (PS1) C. Scott Ananian: [parser] Return ExtTagPFragments while preprocessing [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305025 (https://phabricator.wikimedia.org/T429624)
[06:34:31] (CR) C. Scott Ananian: "recheck" [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305018 (https://phabricator.wikimedia.org/T391624) (owner: C. Scott Ananian)
[06:35:09] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305025 (https://phabricator.wikimedia.org/T429624) (owner: C. 
Scott Ananian) [06:36:03] marostegui@cumin1003 reimage (PID 2232444) is awaiting input [06:36:11] (03PS1) 10C. Scott Ananian: [parser] Return ExtTagPFragments while preprocessing [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305026 (https://phabricator.wikimedia.org/T429624) [06:37:00] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305026 (https://phabricator.wikimedia.org/T429624) (owner: 10C. Scott Ananian) [06:37:27] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS trixie [06:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [06:45:38] (03PS17) 10Ayounsi: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [06:45:49] (03CR) 10Ayounsi: "Thanks! lots of things to change !" [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [06:45:58] (03CR) 10CI reject: [V:04-1] [parser] Return ExtTagPFragments while preprocessing [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305026 (https://phabricator.wikimedia.org/T429624) (owner: 10C. 
Scott Ananian) [06:46:13] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2038: Migration of es2038.codfw.wmnet completed [06:46:14] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [06:46:21] (03CR) 10CI reject: [V:04-1] diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [06:47:40] (03PS18) 10Ayounsi: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [06:48:17] (03CR) 10CI reject: [V:04-1] diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [06:53:57] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage [07:00:05] Amir1, urbanecm, and awight: How many deployers does it take to do UTC morning backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T0700). [07:00:05] WMDE-Fisch and Dreamy_Jazz: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[07:00:11] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage [07:00:28] \o I would self serve :-) [07:00:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by wmde-fisch@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 (https://phabricator.wikimedia.org/T428902) (owner: 10Svantje Lilienthal) [07:01:51] (03Merged) 10jenkins-bot: Global rollout - Sub-ref deployments to group 2 wikis (batch 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304785 (https://phabricator.wikimedia.org/T428902) (owner: 10Svantje Lilienthal) [07:01:54] !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1201: Repooling after switchover [07:02:40] !log wmde-fisch@deploy1003 Started scap sync-world: Backport for [[gerrit:1304785|Global rollout - Sub-ref deployments to group 2 wikis (batch 1) (T428902)]] [07:02:44] T428902: Global rollout - Sub-ref deployments to group 2 wikis (batch 1) - https://phabricator.wikimedia.org/T428902 [07:07:29] !log wmde-fisch@deploy1003 lilients, wmde-fisch: Backport for [[gerrit:1304785|Global rollout - Sub-ref deployments to group 2 wikis (batch 1) (T428902)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [07:08:39] !log wmde-fisch@deploy1003 lilients, wmde-fisch: Continuing with deployment [07:15:17] !log wmde-fisch@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304785|Global rollout - Sub-ref deployments to group 2 wikis (batch 1) (T428902)]] (duration: 12m 37s) [07:15:22] T428902: Global rollout - Sub-ref deployments to group 2 wikis (batch 1) - https://phabricator.wikimedia.org/T428902 [07:16:43] I'm done. Dreamy_Jazz's patch was already deployed it seems. 
[07:22:41] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1018.eqiad.wmnet with OS trixie [07:23:34] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: after reimage to trixie [07:23:35] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1018: after reimage to trixie [07:30:15] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: repool after recloning another host [07:30:15] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1018: repool after recloning another host [07:30:21] (03Abandoned) 10C. Scott Ananian: [REST] Move full-document HTML clients to ::getAsRawHtmlString() [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305016 (https://phabricator.wikimedia.org/T393925) (owner: 10C. Scott Ananian) [07:30:40] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc2018: repool after recloning another host [07:30:40] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc2018: repool after recloning another host [07:31:22] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: repool after recloning another host [07:31:23] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [07:31:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [07:31:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc1018: repool after recloning another host [07:33:28] I'll have a private code change to deploy in a moment [07:34:20] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc1018: Maintenance on pc2 [07:34:20] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [07:34:28] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [07:34:28] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool 
pc1018: Maintenance on pc2 [07:34:38] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: repool after recloning another host [07:34:38] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1018: repool after recloning another host [07:34:47] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: repool after recloning another host [07:34:47] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [07:35:00] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [07:35:00] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc1018: repool after recloning another host [07:46:18] !log Deployed update to SI private code [07:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [08:22:51] 10ops-eqiad, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429876 (10phaultfinder) 03NEW [08:27:32] 10ops-eqiad, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. 
- https://phabricator.wikimedia.org/T429876#12043968 (10VRiley-WMF) a:03VRiley-WMF [08:30:39] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance [08:30:47] !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1201 (T426633)', diff saved to https://phabricator.wikimedia.org/P94351 and previous config saved to /var/cache/conftool/dbconfig/20260623-083046-fceratto.json [08:32:13] (03PS1) 10Aklapper: phabricator: drop diffusion.allow-http-auth config [puppet] - 10https://gerrit.wikimedia.org/r/1305040 (https://phabricator.wikimedia.org/T418045) [08:33:02] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool pc1018: Maintenance on pc8 [08:33:02] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [08:33:10] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [08:33:10] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc1018: Maintenance on pc8 [08:33:37] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: test [08:33:37] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) pool pc1018: test [08:33:55] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool pc1018: test [08:33:56] !log marostegui@cumin1003 START - Cookbook sre.mysql.parsercache [08:34:09] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) [08:34:09] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc1018: test [08:35:56] (03PS1) 10Aklapper: phabricator: drop differential.allow-self-accept config [puppet] - 10https://gerrit.wikimedia.org/r/1305041 (https://phabricator.wikimedia.org/T330797) [08:36:19] 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12044038 (10cmooney) [08:37:30] !log fceratto@cumin1003 dbctl 
commit (dc=all): 'Repooling after maintenance db1201 (T426633)', diff saved to https://phabricator.wikimedia.org/P94354 and previous config saved to /var/cache/conftool/dbconfig/20260623-083729-fceratto.json [08:37:58] (03PS1) 10Slyngshede: Favicon: Replace ico file with SVG graphics [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1305042 (https://phabricator.wikimedia.org/T258379) [08:41:48] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Document IDP MFA policy and processes - https://phabricator.wikimedia.org/T284725#12044058 (10SLyngshede-WMF) 05Open→03In progress a:03SLyngshede-WMF Preliminary documentation is available here: https://wikitech.wikimedia.org/wiki/CAS-SSO#Configuring_and_... [08:42:10] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Document IDP MFA policy and processes - https://phabricator.wikimedia.org/T284725#12044062 (10SLyngshede-WMF) [08:42:13] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations, 13Patch-For-Review: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841#12044063 (10SLyngshede-WMF) [08:42:16] (03PS1) 10Ozge: ml-services: Update editing-suggestions storage URI. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305043 (https://phabricator.wikimedia.org/T428882) [08:42:58] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Document IDP MFA policy and processes - https://phabricator.wikimedia.org/T284725#12044066 (10SLyngshede-WMF) [08:42:59] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Enable webauthn in CAS to replace U2F - https://phabricator.wikimedia.org/T311236#12044067 (10SLyngshede-WMF) [08:43:00] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations, 13Patch-For-Review: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841#12044068 (10SLyngshede-WMF) [08:43:30] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Enable webauthn in CAS to replace U2F - https://phabricator.wikimedia.org/T311236#12044072 (10SLyngshede-WMF) [08:43:31] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations, 13Patch-For-Review: WebAuthn FIDO2 support in CAS - https://phabricator.wikimedia.org/T277841#12044070 (10SLyngshede-WMF) →14Duplicate dup:03T311236 [08:44:29] (03PS1) 10Aklapper: phabricator: drop diffusion.ssh-host config [puppet] - 10https://gerrit.wikimedia.org/r/1305044 (https://phabricator.wikimedia.org/T429367) [08:44:55] 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#12044078 (10VRiley-WMF) @Dzahn I had a question. In the ticket it lists several ganeti servers and zuul servers. ganeti1023 - zuul1004 ganeti1024 - zuul1005 gan... 
[08:45:05] (03CR) 10Aklapper: [C:04-1] "This first requires merging and deploying https://gitlab.wikimedia.org/repos/phabricator/deployment/-/merge_requests/110" [puppet] - 10https://gerrit.wikimedia.org/r/1305044 (https://phabricator.wikimedia.org/T429367) (owner: 10Aklapper) [08:45:10] (03PS4) 10CWilliams: Cookbook sre.mysql.upgrade should not accept multiple hosts [cookbooks] - 10https://gerrit.wikimedia.org/r/1302745 (https://phabricator.wikimedia.org/T429230) [08:46:11] 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#12044095 (10VRiley-WMF) Physically relabeled zuul1004-1007. Awaiting clarification before proceeding. [08:46:40] (03PS1) 10Marostegui: db1262: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1305047 (https://phabricator.wikimedia.org/T428832) [08:46:51] (03PS19) 10Ayounsi: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [08:47:20] (03CR) 10Marostegui: [C:03+2] db1262: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1305047 (https://phabricator.wikimedia.org/T428832) (owner: 10Marostegui) [08:47:38] !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P94355 and previous config saved to /var/cache/conftool/dbconfig/20260623-084737-fceratto.json [08:48:15] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1262: After HW issues T428832 [08:48:19] T428832: db1262 crashed - https://phabricator.wikimedia.org/T428832 [08:48:31] !log marostegui@cumin1003 END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db1262: After HW issues T428832 [08:48:40] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1262: After HW issues T428832 [08:49:01] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, and 2 others: db1262 crashed - 
https://phabricator.wikimedia.org/T428832#12044118 (10Marostegui) 05Open→03Resolved I've repooled the host, let's see if it crashes again. Thanks for all the help John! [08:49:40] 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12044120 (10cmooney) [08:50:16] (03CR) 10CWilliams: Cookbook sre.mysql.upgrade should not accept multiple hosts (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1302745 (https://phabricator.wikimedia.org/T429230) (owner: 10CWilliams) [08:51:46] (03PS2) 10Slyngshede: P:idp map family_name to SN [puppet] - 10https://gerrit.wikimedia.org/r/1244670 (https://phabricator.wikimedia.org/T338214) [08:55:18] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select data store for webauthn devices - https://phabricator.wikimedia.org/T380173#12044147 (10SLyngshede-WMF) 05Open→03Resolved a:03SLyngshede-WMF JPA backend selected. This means that we will not have to deal with synchronizing JSON files or managi... [08:56:56] 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12044155 (10cmooney) [08:57:10] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select opt-in method for webauthn - https://phabricator.wikimedia.org/T380178#12044157 (10SLyngshede-WMF) 05Open→03In progress a:03SLyngshede-WMF In many cases WebAuthn will be forced. For others we'll simply utilize the built in attribute mechanism. See:... 
[08:57:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:57:45] !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P94357 and previous config saved to /var/cache/conftool/dbconfig/20260623-085744-fceratto.json [08:57:48] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select opt-in method for webauthn - https://phabricator.wikimedia.org/T380178#12044164 (10SLyngshede-WMF) Patch available here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1304784 (We have a ton of tasks on this topic) [09:00:07] 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Evaluate supported for trusted devices - https://phabricator.wikimedia.org/T380179#12044179 (10SLyngshede-WMF) 05Open→03Resolved a:03SLyngshede-WMF We'll be utilizing FIDO2 devices and enable TOTP once CAS reaches version 8.0.0. We have reported a bug... [09:00:33] (03CR) 10Ozge: [C:03+2] ml-services: Update editing-suggestions storage URI. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305043 (https://phabricator.wikimedia.org/T428882) (owner: 10Ozge) [09:01:03] (03PS1) 10Matthias Mullie: Enable MMV carousel on non-en wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305049 (https://phabricator.wikimedia.org/T429509) [09:03:03] (03Merged) 10jenkins-bot: ml-services: Update editing-suggestions storage URI. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305043 (https://phabricator.wikimedia.org/T428882) (owner: 10Ozge) [09:03:36] (03PS1) 10Ayounsi: Add more server_depool policies [puppet] - 10https://gerrit.wikimedia.org/r/1305050 (https://phabricator.wikimedia.org/T327300) [09:04:17] 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12044210 (10Mjalaluddin) Thank you @BCornwall [09:05:26] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305049 (https://phabricator.wikimedia.org/T429509) (owner: 10Matthias Mullie) [09:07:35] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1375.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:07:54] !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T426633)', diff saved to https://phabricator.wikimedia.org/P94359 and previous config saved to /var/cache/conftool/dbconfig/20260623-090752-fceratto.json [09:08:02] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1376.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:09] (03PS1) 10Abijeet Patro: ULS rewrite: Don't initialize IME and undo tooltip on Minerva skin [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) [09:08:09] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1377.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:14] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1378.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:20] !log blake@cumin1003 conftool action : set/weight=10; selector: 
name=wikikube-worker1379.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:26] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1380.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:28] (03CR) 10Cathal Mooney: [C:03+1] "Nice!" [puppet] - 10https://gerrit.wikimedia.org/r/1305050 (https://phabricator.wikimedia.org/T327300) (owner: 10Ayounsi) [09:08:30] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1381.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:35] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1382.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:39] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1383.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:08:44] !log blake@cumin1003 conftool action : set/weight=10; selector: name=wikikube-worker1384.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:11:46] 10ops-eqiad, 06SRE, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429876#12044277 (10VRiley-WMF) 05Open→03Resolved [09:12:01] (03CR) 10Ayounsi: [C:03+2] Add more server_depool policies [puppet] - 10https://gerrit.wikimedia.org/r/1305050 (https://phabricator.wikimedia.org/T327300) (owner: 10Ayounsi) [09:12:22] (03CR) 10Gkyziridis: [C:03+2] ml-services: Deploy outlink model latest version on staging. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304805 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis) [09:12:38] (03PS1) 10Ozge: ml-services: Update editing-suggestions storage URI. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305052 (https://phabricator.wikimedia.org/T428882) [09:13:05] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1375.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:10] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1376.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:15] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1377.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:20] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1378.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:24] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1379.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:28] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1380.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:33] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1381.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:38] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1382.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:43] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1383.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:13:47] !log blake@cumin1003 conftool action : set/pooled=yes; selector: name=wikikube-worker1384.eqiad.wmnet,cluster=kubernetes,service=kubesvc [09:14:01] (03PS2) 10Ozge: ml-services: Update editing-suggestions storage URI. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305052 (https://phabricator.wikimedia.org/T428882) [09:14:33] (03Merged) 10jenkins-bot: ml-services: Deploy outlink model latest version on staging. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1304805 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis) [09:15:38] (03PS3) 10Ozge: ml-services: Update editing-suggestions storage URI. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305052 (https://phabricator.wikimedia.org/T428882) [09:15:59] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) (owner: 10Abijeet Patro) [09:21:30] !log gkyziridis@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . [09:24:39] (03PS2) 10Klausman: home/klausman: Add x bit to script (and make minor edit) [puppet] - 10https://gerrit.wikimedia.org/r/1305053 [09:24:46] (03CR) 10Klausman: [V:03+2 C:03+2] home/klausman: Add x bit to script (and make minor edit) [puppet] - 10https://gerrit.wikimedia.org/r/1305053 (owner: 10Klausman) [09:25:49] FIRING: [2x] HelmReleaseBadStatus: Helm release mw-script/yhn94m3m on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [09:26:56] (03CR) 10CWilliams: mysql: update replication source (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto) [09:28:26] (03CR) 10CWilliams: mysql: update replication source (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto) [09:30:32] (03PS1) 10Gkyziridis: ml-services: Deploy outlink model latest version on prod. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305056 (https://phabricator.wikimedia.org/T429675) [09:33:36] (03CR) 10AikoChou: [C:03+1] ml-services: Update editing-suggestions storage URI. (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305052 (https://phabricator.wikimedia.org/T428882) (owner: 10Ozge) [09:34:05] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1262: After HW issues T428832 [09:34:09] T428832: db1262 crashed - https://phabricator.wikimedia.org/T428832 [09:35:11] (03CR) 10Ozge: [C:03+2] ml-services: Update editing-suggestions storage URI. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305052 (https://phabricator.wikimedia.org/T428882) (owner: 10Ozge) [09:35:38] PROBLEM - OSPF status on cr1-drmrs is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [09:36:36] RECOVERY - OSPF status on cr1-drmrs is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [09:37:22] (03Merged) 10jenkins-bot: ml-services: Update editing-suggestions storage URI. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305052 (https://phabricator.wikimedia.org/T428882) (owner: 10Ozge) [09:37:39] FIRING: CoreBGPDown: Core BGP session down between cr2-eqiad and cr1-drmrs (185.15.58.139) - group Confed_drmrs - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=eqiad&var-device=cr2-eqiad:9804&var-bgp_group=Confed_drmrs&var-bgp_neighbor=cr1-drmrs - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown [09:39:55] (03CR) 10Hnowlan: [C:03+2] prometheus: use dc label in appservers_red reporting rules (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1302185 (https://phabricator.wikimedia.org/T249663) (owner: 10Hnowlan) [09:41:03] !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . [09:41:42] (03PS2) 10Gkyziridis: Deploy Qwen3.6-27b-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304810 (https://phabricator.wikimedia.org/T425680) [09:42:37] (03PS3) 10Gkyziridis: Deploy Qwen3.6-27b-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304810 (https://phabricator.wikimedia.org/T425680) [09:42:39] RESOLVED: CoreBGPDown: Core BGP session down between cr2-eqiad and cr1-drmrs (185.15.58.139) - group Confed_drmrs - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=eqiad&var-device=cr2-eqiad:9804&var-bgp_group=Confed_drmrs&var-bgp_neighbor=cr1-drmrs - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown [09:48:51] (03CR) 10JMeybohm: [C:03+1] "I don't exactly recall since this is 5 years back. 
But I *think* I mainly added this as part of the stress testing prior to the dragonfly " [puppet] - 10https://gerrit.wikimedia.org/r/1304512 (https://phabricator.wikimedia.org/T427175) (owner: 10Elukey) [09:49:16] (03PS1) 10Btullis: Enable access to the urldownloader for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305060 (https://phabricator.wikimedia.org/T428544) [09:51:12] (03PS1) 10Klausman: home/klausman: another attempt at fixing the x bit [puppet] - 10https://gerrit.wikimedia.org/r/1305061 [09:51:20] (03CR) 10Klausman: [V:03+2 C:03+2] home/klausman: another attempt at fixing the x bit [puppet] - 10https://gerrit.wikimedia.org/r/1305061 (owner: 10Klausman) [09:54:38] (03CR) 10Brouberol: [C:03+1] Enable access to the urldownloader for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305060 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [09:57:05] (03CR) 10Klausman: [C:03+1] ml-serve: fix vram stats collection [puppet] - 10https://gerrit.wikimedia.org/r/1304813 (https://phabricator.wikimedia.org/T429597) (owner: 10Dpogorzelski) [09:57:27] (03CR) 10Kevin Bazira: [C:03+1] Deploy Qwen3.6-27b-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304810 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [10:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1000) [10:03:18] (03CR) 10Federico Ceratto: mysql: update replication source (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto) [10:04:20] (03CR) 10Gkyziridis: [C:03+2] Deploy Qwen3.6-27b-FP8 on experimental ns. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1304810 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [10:05:18] (03PS7) 10Federico Ceratto: tox.ini: Pass cache env var [cookbooks] - 10https://gerrit.wikimedia.org/r/1302159 [10:05:29] (03PS1) 10WMDE-Fisch: Improve click intent event logging and exposure tracking [extensions/Cite] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305064 (https://phabricator.wikimedia.org/T426974) [10:05:52] (03CR) 10Ozge: [C:03+1] ml-services: Deploy outlink model latest version on prod. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305056 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis) [10:06:14] (03PS11) 10Federico Ceratto: sre.mysql: split pool/depool [cookbooks] - 10https://gerrit.wikimedia.org/r/1295480 (https://phabricator.wikimedia.org/T422361) [10:06:31] (03Merged) 10jenkins-bot: Deploy Qwen3.6-27b-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304810 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [10:07:51] (03CR) 10Btullis: [C:03+2] Enable access to the urldownloader for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305060 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [10:08:41] (03PS8) 10Federico Ceratto: pyproject.toml: move conf into pyproject [cookbooks] - 10https://gerrit.wikimedia.org/r/1304776 [10:10:00] (03Merged) 10jenkins-bot: Enable access to the urldownloader for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305060 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [10:13:29] (03PS1) 10Btullis: Add a custom connection to the wme_metrics API endpoint [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305065 (https://phabricator.wikimedia.org/T428544) [10:14:57] !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
[10:17:00] (03PS1) 10Klausman: home/klausman: move away from ~/bin scripts [puppet] - 10https://gerrit.wikimedia.org/r/1305067 [10:17:36] (03CR) 10Clément Goubert: [C:03+2] tls_terminator: Fix ratelimit config [puppet] - 10https://gerrit.wikimedia.org/r/1304586 (https://phabricator.wikimedia.org/T414440) (owner: 10Clément Goubert) [10:17:57] (03CR) 10Klausman: [C:03+2] home/klausman: move away from ~/bin scripts [puppet] - 10https://gerrit.wikimedia.org/r/1305067 (owner: 10Klausman) [10:22:00] !log ozge@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [10:24:02] !log ozge@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [10:27:54] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [10:28:33] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [10:30:55] (03PS1) 10Gkyziridis: Deploy Qwen3.6-27b-FP8 on experimental ns. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) [10:31:12] (03PS1) 10Atsuko: translate: remove CirrusSearch endpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305062 (https://phabricator.wikimedia.org/T425377) [10:31:12] (03CR) 10Atsuko: "tested on `k8s-mw-experimental-eqiad`" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305062 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko) [10:32:23] (03CR) 10DCausse: [C:03+1] translate: remove CirrusSearch endpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305062 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko) [10:32:24] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305062 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko) [10:33:37] (03PS1) 10Hnowlan: redis: migrate icinga checks to prometheus [alerts] - 10https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) [10:34:55] (03PS1) 10DCausse: flink-app: renamed deprecate settings [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305073 (https://phabricator.wikimedia.org/T426863) [10:34:58] (03PS1) 10DCausse: cirrus-streaming-updater: upgrade to flink 2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305074 (https://phabricator.wikimedia.org/T426839) [10:35:11] (03PS1) 10Hnowlan: redis: remove nrpe checks, replace with prometheus checks [puppet] - 10https://gerrit.wikimedia.org/r/1305075 (https://phabricator.wikimedia.org/T384924) [10:35:44] (03CR) 10CI reject: [V:04-1] redis: migrate icinga checks to prometheus [alerts] - 10https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) (owner: 10Hnowlan) [10:37:47] (03PS1) 10Marostegui: mariadb: Productionize db1290 [puppet] - 
10https://gerrit.wikimedia.org/r/1305076 (https://phabricator.wikimedia.org/T423069) [10:38:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1290 [10:39:42] (03PS2) 10Hnowlan: redis: migrate icinga checks to prometheus [alerts] - 10https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) [10:39:49] (03PS2) 10Gkyziridis: Deploy Qwen3.6-27b-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) [10:41:03] (03CR) 10JavierMonton: [C:03+1] flink-app: renamed deprecate settings (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305073 (https://phabricator.wikimedia.org/T426863) (owner: 10DCausse) [10:41:54] (03PS1) 10Hnowlan: redis: clean up redis nrpe check components [puppet] - 10https://gerrit.wikimedia.org/r/1305077 (https://phabricator.wikimedia.org/T384924) [10:43:19] (03PS1) 10Slyngshede: data.yaml: extend NDA for hahmed [puppet] - 10https://gerrit.wikimedia.org/r/1305078 [10:43:31] (03PS1) 10Clément Goubert: tls_terminator: Ratelimit accounting and upstream [puppet] - 10https://gerrit.wikimedia.org/r/1305079 (https://phabricator.wikimedia.org/T414440) [10:43:38] PROBLEM - haproxy failover on dbproxy1023 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [10:43:38] PROBLEM - haproxy failover on dbproxy1025 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [10:44:03] expected ^ [10:44:26] (03CR) 10Slyngshede: [C:03+2] data.yaml: extend NDA for hahmed [puppet] - 10https://gerrit.wikimedia.org/r/1305078 (owner: 10Slyngshede) [10:44:40] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:et-0/0/0 (Transport: Arelion (IC-398708) {#20260601}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - 
https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [10:45:16] (03PS3) 10Gkyziridis: ml-services: Deploy Qwen3.6-27B-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) [10:56:38] RECOVERY - haproxy failover on dbproxy1023 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [10:56:38] RECOVERY - haproxy failover on dbproxy1025 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [10:58:57] (03PS2) 10Tiziano Fogli: restbase: disable instance space icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1305083 (https://phabricator.wikimedia.org/T407141) [11:03:37] (03PS4) 10Tiziano Fogli: restabase: remove instance space icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1305084 (https://phabricator.wikimedia.org/T407141) [11:07:31] (03CR) 10DCausse: flink-app: renamed deprecate settings (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305073 (https://phabricator.wikimedia.org/T426863) (owner: 10DCausse) [11:07:57] (03CR) 10Marostegui: [C:03+2] mariadb: Productionize db1290 [puppet] - 10https://gerrit.wikimedia.org/r/1305076 (https://phabricator.wikimedia.org/T423069) (owner: 10Marostegui) [11:08:38] PROBLEM - haproxy failover on dbproxy1023 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [11:08:38] PROBLEM - haproxy failover on dbproxy1025 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [11:09:45] (03CR) 10Tiziano Fogli: [C:03+1] restbase: add disk space alert [alerts] - 10https://gerrit.wikimedia.org/r/1304852 (https://phabricator.wikimedia.org/T407141) (owner: 10Hnowlan) [11:10:18] (03PS1) 10Kosta Harlan: Tally: truncate BLT names by 
character so the result stays encodable [extensions/SecurePoll] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305088 (https://phabricator.wikimedia.org/T427104) [11:10:31] (03PS1) 10Kosta Harlan: Tally: truncate BLT names by character so the result stays encodable [extensions/SecurePoll] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305089 (https://phabricator.wikimedia.org/T427104) [11:10:38] jouncebot: nowandnext [11:10:39] No deployments scheduled for the next 0 hour(s) and 49 minute(s) [11:10:39] In 0 hour(s) and 49 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1200) [11:10:53] going to sync a wmf.7 / wmf.8 change for SecurePoll [11:11:35] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/SecurePoll] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305089 (https://phabricator.wikimedia.org/T427104) (owner: 10Kosta Harlan) [11:11:35] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/SecurePoll] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305088 (https://phabricator.wikimedia.org/T427104) (owner: 10Kosta Harlan) [11:13:06] (03CR) 10Ilias Sarantopoulos: ml-services: Deploy Qwen3.6-27B-FP8 on experimental ns. 
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [11:14:38] (03Merged) 10jenkins-bot: Tally: truncate BLT names by character so the result stays encodable [extensions/SecurePoll] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305089 (https://phabricator.wikimedia.org/T427104) (owner: 10Kosta Harlan) [11:15:15] (03Merged) 10jenkins-bot: Tally: truncate BLT names by character so the result stays encodable [extensions/SecurePoll] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305088 (https://phabricator.wikimedia.org/T427104) (owner: 10Kosta Harlan) [11:15:48] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1305089|Tally: truncate BLT names by character so the result stays encodable (T427104)]], [[gerrit:1305088|Tally: truncate BLT names by character so the result stays encodable (T427104)]] [11:15:53] T427104: SecurePoll STV "Create Tally" Not Responding on WMF Wikis - https://phabricator.wikimedia.org/T427104 [11:17:34] (03PS1) 10Kosta Harlan: hCaptcha: Log sitekeys on a sitekey-mismatch error [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305092 (https://phabricator.wikimedia.org/T429891) [11:17:58] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1305089|Tally: truncate BLT names by character so the result stays encodable (T427104)]], [[gerrit:1305088|Tally: truncate BLT names by character so the result stays encodable (T427104)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[11:18:17] !log kharlan@deploy1003 kharlan: Continuing with deployment [11:18:27] (03CR) 10Filippo Giunchedi: [C:03+1] Favicon: Replace ico file with SVG graphics [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1305042 (https://phabricator.wikimedia.org/T258379) (owner: 10Slyngshede) [11:18:55] (03CR) 10Filippo Giunchedi: [C:03+1] Pontoon: specmap for swift in pontoon [puppet] - 10https://gerrit.wikimedia.org/r/1304817 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [11:20:39] (03PS6) 10Clément Goubert: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:21:45] (03CR) 10CI reject: [V:04-1] Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:21:54] (03CR) 10Filippo Giunchedi: "See inline, LGTM overall" [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [11:22:37] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305089|Tally: truncate BLT names by character so the result stays encodable (T427104)]], [[gerrit:1305088|Tally: truncate BLT names by character so the result stays encodable (T427104)]] (duration: 06m 48s) [11:22:41] T427104: SecurePoll STV "Create Tally" Not Responding on WMF Wikis - https://phabricator.wikimedia.org/T427104 [11:22:55] 06SRE, 10SRE-Access-Requests: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896 (10Nicholusmuwonge_wmde) 03NEW [11:23:54] (03PS7) 10Clément Goubert: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:25:51] (03CR) 10CI reject: [V:04-1] Remove config related to the API Portal 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:27:03] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305092 (https://phabricator.wikimedia.org/T429891) (owner: 10Kosta Harlan) [11:27:38] (03PS8) 10Clément Goubert: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:29:14] (03Merged) 10jenkins-bot: hCaptcha: Log sitekeys on a sitekey-mismatch error [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305092 (https://phabricator.wikimedia.org/T429891) (owner: 10Kosta Harlan) [11:29:43] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1305092|hCaptcha: Log sitekeys on a sitekey-mismatch error (T429891)]] [11:29:47] T429891: hCaptcha: Log the sitekeys used when a sitekey-mismatch error is logged - https://phabricator.wikimedia.org/T429891 [11:31:04] (03PS1) 10Kosta Harlan: hCaptcha: Log sitekeys on a sitekey-mismatch error [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305096 (https://phabricator.wikimedia.org/T429891) [11:31:47] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1305092|hCaptcha: Log sitekeys on a sitekey-mismatch error (T429891)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[11:32:06] !log kharlan@deploy1003 kharlan: Continuing with deployment [11:35:32] (03PS9) 10Clément Goubert: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:36:20] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305092|hCaptcha: Log sitekeys on a sitekey-mismatch error (T429891)]] (duration: 06m 37s) [11:36:24] T429891: hCaptcha: Log the sitekeys used when a sitekey-mismatch error is logged - https://phabricator.wikimedia.org/T429891 [11:36:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305096 (https://phabricator.wikimedia.org/T429891) (owner: 10Kosta Harlan) [11:36:59] (03CR) 10AOkoth: "I believe so but I'll double check with Brennen." [puppet] - 10https://gerrit.wikimedia.org/r/1304849 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [11:37:56] (03PS10) 10Clément Goubert: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:38:56] (03CR) 10Klausman: [C:03+2] role::ml_k8s::staging::master: enable IPIP encapsulation [puppet] - 10https://gerrit.wikimedia.org/r/1294223 (https://phabricator.wikimedia.org/T420438) (owner: 10Elukey) [11:39:25] (03PS11) 10Clément Goubert: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [11:41:16] (03PS2) 10MVernon: puppetserver::pontoon: add optional swift::fetch_rings [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) [11:42:12] (03CR) 10MVernon: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) 
[11:42:26] (03CR) 10MVernon: [C:03+2] Pontoon: specmap for swift in pontoon [puppet] - 10https://gerrit.wikimedia.org/r/1304817 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [11:44:38] (03Merged) 10jenkins-bot: hCaptcha: Log sitekeys on a sitekey-mismatch error [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305096 (https://phabricator.wikimedia.org/T429891) (owner: 10Kosta Harlan) [11:45:08] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1305096|hCaptcha: Log sitekeys on a sitekey-mismatch error (T429891)]] [11:45:12] T429891: hCaptcha: Log the sitekeys used when a sitekey-mismatch error is logged - https://phabricator.wikimedia.org/T429891 [11:47:12] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1305096|hCaptcha: Log sitekeys on a sitekey-mismatch error (T429891)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [11:47:42] !log kharlan@deploy1003 kharlan: Continuing with deployment [11:51:59] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305096|hCaptcha: Log sitekeys on a sitekey-mismatch error (T429891)]] (duration: 06m 51s) [11:52:03] T429891: hCaptcha: Log the sitekeys used when a sitekey-mismatch error is logged - https://phabricator.wikimedia.org/T429891 [11:52:04] (03CR) 10Slyngshede: [V:03+2 C:03+2] "New favicon will roll out with the next update." 
[software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1305042 (https://phabricator.wikimedia.org/T258379) (owner: 10Slyngshede) [11:52:14] (03CR) 10Dpogorzelski: [C:03+2] ml-serve: fix vram stats collection [puppet] - 10https://gerrit.wikimedia.org/r/1304813 (https://phabricator.wikimedia.org/T429597) (owner: 10Dpogorzelski) [12:00:05] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1200) [12:06:23] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/Cite] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305064 (https://phabricator.wikimedia.org/T426974) (owner: 10WMDE-Fisch) [12:08:57] (03PS1) 10Jforrester: tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) [12:10:38] (03PS1) 10Kamila Součková: kubernetes: switch mw images back to publish-83 flavour [puppet] - 10https://gerrit.wikimedia.org/r/1305104 (https://phabricator.wikimedia.org/T429030) [12:11:23] (03CR) 10CI reject: [V:04-1] tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) (owner: 10Jforrester) [12:15:04] (03CR) 10Kamila Součková: [C:03+1] shellbox: Pick up images reflecting latest code [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304860 (https://phabricator.wikimedia.org/T428013) (owner: 10Scott French) [12:15:23] !log klausman@cumin2002 START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-master [12:15:25] !log klausman@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host ml-staging-ctrl2001.codfw.wmnet [12:15:26] !log klausman@cumin2002 END (PASS) - Cookbook 
sre.k8s.pool-depool-node (exit_code=0) depool for host ml-staging-ctrl2001.codfw.wmnet [12:16:50] (03PS2) 10Jforrester: tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) [12:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [12:17:58] (03PS2) 10Dreamy Jazz: hCaptcha: Enable for Special:Contact [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) [12:18:24] (03PS3) 10Dreamy Jazz: hCaptcha: Enable for Special:Contact [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) [12:18:51] !log klausman@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host ml-staging-ctrl2001.codfw.wmnet [12:18:52] !log klausman@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-staging-ctrl2001.codfw.wmnet [12:18:57] !log klausman@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host ml-staging-ctrl2002.codfw.wmnet [12:18:58] !log klausman@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-staging-ctrl2002.codfw.wmnet [12:19:05] (03CR) 10CI reject: [V:04-1] tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) (owner: 10Jforrester) [12:21:49] (03PS2) 10Sbisson: Recommendation api: update to 2026-06-15-110926 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304840 [12:22:05] jouncebot now [12:22:05] For the next 0 hour(s) and 37 minute(s): 
Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1200) [12:22:05] (03PS3) 10Jforrester: tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) [12:22:22] !log klausman@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host ml-staging-ctrl2002.codfw.wmnet [12:22:23] !log klausman@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-staging-ctrl2002.codfw.wmnet [12:22:24] !log klausman@cumin2002 END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-master [12:23:04] 10ops-eqiad, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429906 (10phaultfinder) 03NEW [12:24:01] (03CR) 10DCausse: [C:03+2] flink-app: renamed deprecate settings [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305073 (https://phabricator.wikimedia.org/T426863) (owner: 10DCausse) [12:24:25] (03CR) 10CI reject: [V:04-1] tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) (owner: 10Jforrester) [12:24:30] (03CR) 10DCausse: [C:03+2] cirrus-streaming-updater: upgrade to flink 2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305074 (https://phabricator.wikimedia.org/T426839) (owner: 10DCausse) [12:24:58] (03CR) 10Ilias Sarantopoulos: "nice catch George. Left 2 comments and we're good to go!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis) [12:25:31] (03PS4) 10Jforrester: tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) [12:25:59] (03CR) 10Volans: [C:03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [12:27:05] (03Merged) 10jenkins-bot: flink-app: renamed deprecate settings [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305073 (https://phabricator.wikimedia.org/T426863) (owner: 10DCausse) [12:27:47] (03Merged) 10jenkins-bot: cirrus-streaming-updater: upgrade to flink 2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305074 (https://phabricator.wikimedia.org/T426839) (owner: 10DCausse) [12:27:54] (03CR) 10CI reject: [V:04-1] tables-catalog: Document the new x1 wikifunctions_usage / wikifunctions_usage_wikis tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) (owner: 10Jforrester) [12:31:10] (03CR) 10Sbisson: [C:03+2] Recommendation api: update to 2026-06-15-110926 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304840 (owner: 10Sbisson) [12:31:29] (03PS5) 10Jforrester: tables-catalog: Document the new x1 wikifunctions_usage* tables [puppet] - 10https://gerrit.wikimedia.org/r/1305102 (https://phabricator.wikimedia.org/T428667) [12:31:30] !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [12:32:01] !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:33:33] (03Merged) 10jenkins-bot: Recommendation api: update to 2026-06-15-110926 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304840 (owner: 10Sbisson) [12:35:12] (03CR) 10Volans: "quick reply to comment, I haven't checked the last PS" [puppet] - 
10https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: 10Jbond) [12:35:57] (03PS1) 10Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - 10https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) [12:36:00] (03PS1) 10Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - 10https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112) [12:36:13] !log sbisson@deploy1003 helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [12:37:42] (03PS4) 10Gkyziridis: ml-services: Deploy Qwen3.6-27B-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) [12:38:02] 10ops-eqiad, 06SRE, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429906#12045159 (10Jclark-ctr) a:03Jclark-ctr [12:38:36] (03PS1) 10Klausman: role/ml_k8s/master: add IPIP role [puppet] - 10https://gerrit.wikimedia.org/r/1305110 (https://phabricator.wikimedia.org/T420438) [12:39:50] !log dcausse@deploy1003 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply [12:39:59] !log dcausse@deploy1003 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:40:08] (03CR) 10Ayounsi: [C:03+1] "if it was not staging I'd have recommended a PCC run." [puppet] - 10https://gerrit.wikimedia.org/r/1305110 (https://phabricator.wikimedia.org/T420438) (owner: 10Klausman) [12:40:10] (03CR) 10Btullis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: 10Btullis) [12:40:20] !log sbisson@deploy1003 helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
[12:42:22] (03CR) 10Klausman: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8769/co" [puppet] - 10https://gerrit.wikimedia.org/r/1305110 (https://phabricator.wikimedia.org/T420438) (owner: 10Klausman) [12:42:53] !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply [12:42:58] !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply [12:43:56] (03PS5) 10Gkyziridis: ml-services: Deploy Qwen3.6-27B-FP8 on experimental ns. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) [12:44:39] (03CR) 10Klausman: [V:03+1 C:03+2] role/ml_k8s/master: add IPIP role [puppet] - 10https://gerrit.wikimedia.org/r/1305110 (https://phabricator.wikimedia.org/T420438) (owner: 10Klausman) [12:44:42] (03PS1) 10Jelto: sre.gitlab.upgrade: hold and unhold gitlab-ce package [cookbooks] - 10https://gerrit.wikimedia.org/r/1305112 (https://phabricator.wikimedia.org/T429595) [12:45:56] !log atsuko@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply [12:46:04] !log atsuko@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply [12:46:34] !log sbisson@deploy1003 helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [12:48:29] !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:48:33] 10ops-eqiad, 06SRE, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429906#12045212 (10Jclark-ctr) Replaced Cable. This was one of the older cables we have on site Old Cable 3293 New cable. 7-02407 ` jclark@an-conf1005:~$ cat /sys/class/net/eno1/... 
[12:48:39] 10ops-eqiad, 06SRE, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429906#12045213 (10Jclark-ctr) 05Open→03Resolved [12:48:40] !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:48:50] !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade [12:48:50] !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893 [12:48:57] T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893 [12:49:10] !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1160: Upgrading db1160.eqiad.wmnet [12:49:12] PROBLEM - Host dbproxy1028 is DOWN: PING CRITICAL - Packet loss = 100% [12:49:16] (03CR) 10Arnaudb: [C:03+1] "looks good to me! have you tried to run it on a replica with `test-cookbook`? Either way, thanks for that update. Gitlab maintenance will " [cookbooks] - 10https://gerrit.wikimedia.org/r/1305112 (https://phabricator.wikimedia.org/T429595) (owner: 10Jelto) [12:49:40] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1160: Upgrading db1160.eqiad.wmnet [12:49:48] 10ops-eqiad, 06SRE, 06DC-Ops: eno1 on an-conf1005:9100 has the wrong speed: 1.25e+07. - https://phabricator.wikimedia.org/T429906#12045216 (10Jclark-ctr) updated netbox with new serial [12:53:47] (03CR) 10Filippo Giunchedi: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [12:53:52] !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1160.eqiad.wmnet with OS trixie [12:54:36] 06SRE, 06Data-Platform-SRE: Redeploy cirrus-streaming-updater/producer and cirrus-streaming-updater/consumer to pick up current mirror - https://phabricator.wikimedia.org/T429671#12045232 (10dcausse) 05Open→03Resolved a:03dcausse I deployed the tag `v20260623101803-7c3e7b2` of these two images. 
[12:54:53] (03CR) 10Filippo Giunchedi: [C:03+1] "Not required for this review, though I'm wondering where/when profile::puppetserver::pontoon::swift_fetch_rings: true would be set ? if it" [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [12:55:14] !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade [12:55:14] !log cwilliams@cumin1003 dbmaint on s4@codfw T429893 [12:55:21] T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893 [12:55:35] !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2155: Upgrading db2155.codfw.wmnet [12:55:57] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2155: Upgrading db2155.codfw.wmnet [12:56:01] (03PS6) 10Gkyziridis: ml-services: Configure the Qwen helm chart for consistency with hotfixes on experimental. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) [12:57:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:57:54] (03CR) 10Filippo Giunchedi: [C:03+2] Put cloudvirt10[77-80] in service [puppet] - 10https://gerrit.wikimedia.org/r/1303962 (https://phabricator.wikimedia.org/T429563) (owner: 10Filippo Giunchedi) [12:57:59] (03PS1) 10Dpogorzelski: Revert "ml-serve(grpc): step 5, change lvs state" [puppet] - 10https://gerrit.wikimedia.org/r/1305114 [12:58:17] !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS trixie [13:00:05] Lucas_WMDE, urbanecm, and TheresNoTime: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC afternoon backport window . 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1300). [13:00:05] cscott, matthiasmullie, abijeet, and atsukoito: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:38] (03CR) 10Jelto: [C:03+2] sre.gitlab.upgrade: hold and unhold gitlab-ce package [cookbooks] - 10https://gerrit.wikimedia.org/r/1305112 (https://phabricator.wikimedia.org/T429595) (owner: 10Jelto) [13:01:05] o/ [13:01:19] (03CR) 10MVernon: "I do have some more changes to settings/swift.yaml which probably warrant another CR." [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [13:01:48] fyi, deployments are still going to be building 2 images and thus be slower and eat more space, this will be switched off later today as I didn't manage to squeeze it in before this window [13:01:51] (03CR) 10Dpogorzelski: [C:03+2] Revert "ml-serve(grpc): step 5, change lvs state" [puppet] - 10https://gerrit.wikimedia.org/r/1305114 (owner: 10Dpogorzelski) [13:02:00] I [13:02:06] o/ [13:02:25] RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps2011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:02:30] do y'all need a deployer or are you self-deploying? [13:02:51] (03PS1) 10Dpogorzelski: Revert "ml-serve(grpc): step 4, change lvs state" [puppet] - 10https://gerrit.wikimedia.org/r/1305116 [13:04:14] (03CR) 10Dpogorzelski: [C:03+2] Revert "ml-serve(grpc): step 4, change lvs state" [puppet] - 10https://gerrit.wikimedia.org/r/1305116 (owner: 10Dpogorzelski) [13:04:24] cscott: are you self-deploying yours? 
[13:04:48] (03PS1) 10Dpogorzelski: Revert "ml-serve(grpc): step 3, add service to k8s pools" [puppet] - 10https://gerrit.wikimedia.org/r/1305117 [13:05:08] I can self deploy [13:05:26] (03Merged) 10jenkins-bot: sre.gitlab.upgrade: hold and unhold gitlab-ce package [cookbooks] - 10https://gerrit.wikimedia.org/r/1305112 (https://phabricator.wikimedia.org/T429595) (owner: 10Jelto) [13:05:27] me here, too [13:05:38] in which case, it would make sense for you to go first cscott - go ahead :) [13:05:39] (03CR) 10Dpogorzelski: [C:03+2] Revert "ml-serve(grpc): step 3, add service to k8s pools" [puppet] - 10https://gerrit.wikimedia.org/r/1305117 (owner: 10Dpogorzelski) [13:06:02] * atsukoito can self-deploy, too [13:06:05] Ok! [13:06:24] (03PS1) 10Dpogorzelski: Revert "ml-serve(grpc): step 2, add entry to service catalog" [puppet] - 10https://gerrit.wikimedia.org/r/1305118 [13:07:14] (03CR) 10Dpogorzelski: [C:03+2] Revert "ml-serve(grpc): step 2, add entry to service catalog" [puppet] - 10https://gerrit.wikimedia.org/r/1305118 (owner: 10Dpogorzelski) [13:07:41] (03PS1) 10Dpogorzelski: Revert "ml-serve(grpc): step 1, etcd data for DNS Discovery" [puppet] - 10https://gerrit.wikimedia.org/r/1305120 [13:07:52] I can also take care of my on patch [13:07:57] !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage [13:07:57] own* [13:08:09] 06SRE, 06Data-Platform-SRE (2026-06-05 - 2026-06-26): Redeploy cirrus-streaming-updater/producer and cirrus-streaming-updater/consumer to pick up current mirror - https://phabricator.wikimedia.org/T429671#12045328 (10Gehel) [13:08:20] (03CR) 10Dpogorzelski: [C:03+2] Revert "ml-serve(grpc): step 1, etcd data for DNS Discovery" [puppet] - 10https://gerrit.wikimedia.org/r/1305120 (owner: 10Dpogorzelski) [13:10:59] jouncebot: nowandnext [13:11:00] For the next 0 hour(s) and 49 minute(s): UTC afternoon backport window 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1300) [13:11:00] In 0 hour(s) and 49 minute(s): Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1400) [13:11:24] (03PS1) 10Dreamy Jazz: CaptchaFactory: Fallback config for badloginperuser from badlogin [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305122 (https://phabricator.wikimedia.org/T429902) [13:11:35] (03PS1) 10Dreamy Jazz: CaptchaFactory: Fallback config for badloginperuser from badlogin [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305123 (https://phabricator.wikimedia.org/T429902) [13:11:48] * TheresNoTime will be around for the window if needed, but seems everyone can self-deploy (but ping if needed) [13:11:49] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305123 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [13:12:08] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305122 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [13:12:09] 06SRE, 07Kubernetes: Kserve helm chart - https://phabricator.wikimedia.org/T416580#12045363 (10DPogorzelski-WMF) 05Open→03Declined Closing down as this was solved in another ticket. 
[13:13:13] !log jelto@cumin1003 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test apt-mark hold [13:13:36] (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305015 (https://phabricator.wikimedia.org/T391624) (owner: 10C. Scott Ananian) [13:13:37] (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305026 (https://phabricator.wikimedia.org/T429624) (owner: 10C. Scott Ananian) [13:15:16] PROBLEM - PyBal IPVS diff check on lvs2013 is CRITICAL: (CRITICAL: Mismatch between IPVS and PyBal https://wikitech.wikimedia.org/wiki/PyBal [13:15:22] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage [13:15:41] ok, i'm doing these in two groups to be a little bit safer [13:15:50] (03CR) 10Volans: "LGTM, modulo CI and one minor fix inline for the puppet integration." [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey) [13:16:49] !log jelto@cumin1003 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Test apt-mark hold [13:16:50] !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2155.codfw.wmnet with reason: host reimage [13:17:30] PROBLEM - PyBal IPVS diff check on lvs2014 is CRITICAL: (CRITICAL: Mismatch between IPVS and PyBal https://wikitech.wikimedia.org/wiki/PyBal [13:18:04] 07Puppet, 06SRE, 06Infrastructure-Foundations, 10Puppet-Core, 07Technical-Debt: Uniform cluster nomenclature across puppet - https://phabricator.wikimedia.org/T159411#12045423 (10Aklapper) @Joe: Hi, 9y later, is this still wanted? 
TIA [13:18:39] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [13:23:43] !log cmooney@cumin1003 START - Cookbook sre.dns.netbox [13:23:45] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: host reimage [13:23:56] (03PS1) 10Filippo Giunchedi: Revert "Put cloudvirt10[77-80] in service" [puppet] - 10https://gerrit.wikimedia.org/r/1305124 [13:24:13] (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] "Nothing is live, self-merging" [puppet] - 10https://gerrit.wikimedia.org/r/1305124 (owner: 10Filippo Giunchedi) [13:25:59] (03CR) 10Cathal Mooney: Cookbook to configure switch port vlans for cloud hosts (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1303397 (https://phabricator.wikimedia.org/T429466) (owner: 10Cathal Mooney) [13:26:04] FIRING: [2x] HelmReleaseBadStatus: Helm release mw-script/yhn94m3m on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [13:27:11] 06SRE, 10SRE-Access-Requests: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12045470 (10SuzanneWood-WMDE) Approved [13:27:37] (03Merged) 10jenkins-bot: [parser] Return HeadingPFragments while preprocessing [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305015 (https://phabricator.wikimedia.org/T391624) (owner: 10C. Scott Ananian) [13:27:50] (03Merged) 10jenkins-bot: [parser] Return ExtTagPFragments while preprocessing [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305026 (https://phabricator.wikimedia.org/T429624) (owner: 10C. 
Scott Ananian) [13:28:19] !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for sretest1006 - cmooney@cumin1003" [13:29:20] !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1305015|[parser] Return HeadingPFragments while preprocessing (T391624 T387520 T387521 T384490 T387374)]], [[gerrit:1305026|[parser] Return ExtTagPFragments while preprocessing (T429624)]] [13:29:35] T391624: Parsoid section edit link issues - https://phabricator.wikimedia.org/T391624 [13:29:36] T387520: Support section edit links to nested templates - https://phabricator.wikimedia.org/T387520 [13:29:36] T387521: Section titles failing to resolve redirected templates - https://phabricator.wikimedia.org/T387521 [13:29:37] T384490: Include directives on a line with headings prevent the legacy parser from generating section edit links - https://phabricator.wikimedia.org/T384490 [13:29:37] T387374: Compound templates prevent section edit links where legacy adds them - https://phabricator.wikimedia.org/T387374 [13:29:37] T429624: Link to edit TemplateData is broken with Parsoid Read Views - https://phabricator.wikimedia.org/T429624 [13:29:48] (03CR) 10C. Scott Ananian: [C:03+2] "getting a jump on deploy merge" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305017 (owner: 10C. Scott Ananian) [13:29:51] (03CR) 10C. Scott Ananian: [C:03+2] "getting a jump on deploy merge" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305018 (https://phabricator.wikimedia.org/T391624) (owner: 10C. Scott Ananian) [13:29:54] (03CR) 10C. Scott Ananian: [C:03+2] "getting a jump on deploy merge" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305025 (https://phabricator.wikimedia.org/T429624) (owner: 10C. 
Scott Ananian) [13:30:00] (03CR) 10MVernon: [C:03+2] puppetserver::pontoon: add optional swift::fetch_rings [puppet] - 10https://gerrit.wikimedia.org/r/1304820 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [13:31:27] cmooney@cumin1003 netbox (PID 2420420) is awaiting input [13:31:31] !log cscott@deploy1003 cscott: Backport for [[gerrit:1305015|[parser] Return HeadingPFragments while preprocessing (T391624 T387520 T387521 T384490 T387374)]], [[gerrit:1305026|[parser] Return ExtTagPFragments while preprocessing (T429624)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:31:52] !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for sretest1006 - cmooney@cumin1003" [13:31:52] !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [13:32:20] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS trixie [13:33:21] !log cscott@deploy1003 cscott: Continuing with deployment [13:40:08] (03Merged) 10jenkins-bot: [parser] Add configuration to return experimental PFragment types [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305017 (owner: 10C. 
Scott Ananian) [13:40:42] (03CR) 10JHathaway: "recheck" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1304908 (owner: 10JHathaway) [13:41:51] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2155.codfw.wmnet with OS trixie [13:41:51] !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305015|[parser] Return HeadingPFragments while preprocessing (T391624 T387520 T387521 T384490 T387374)]], [[gerrit:1305026|[parser] Return ExtTagPFragments while preprocessing (T429624)]] (duration: 12m 31s) [13:41:53] TheresNoTime: I just got a "K8s deployment to stage production failed: K8s Deployment had the following errors: [13:41:54] Deployment of mediawiki-dumps-legacy-production-dse-k8s-eqiad failed: Command '['helmfile', '-e', 'dse-k8s-eqiad', '--selector', 'name=production', 'apply', '--context', '5']' returned non-zero exit status 1. [13:41:54] " [13:42:07] T391624: Parsoid section edit link issues - https://phabricator.wikimedia.org/T391624 [13:42:07] T387520: Support section edit links to nested templates - https://phabricator.wikimedia.org/T387520 [13:42:08] T387521: Section titles failing to resolve redirected templates - https://phabricator.wikimedia.org/T387521 [13:42:08] T384490: Include directives on a line with headings prevent the legacy parser from generating section edit links - https://phabricator.wikimedia.org/T384490 [13:42:08] T387374: Compound templates prevent section edit links where legacy adds them - https://phabricator.wikimedia.org/T387374 [13:42:09] T429624: Link to edit TemplateData is broken with Parsoid Read Views - https://phabricator.wikimedia.org/T429624 [13:42:10] TheresNoTime: I hit "retry", hopefully this was transient? [13:42:23] (03Merged) 10jenkins-bot: [parser] Return HeadingPFragments while preprocessing [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305018 (https://phabricator.wikimedia.org/T391624) (owner: 10C. 
Scott Ananian) [13:42:25] I guess it was transient? [13:42:26] (03CR) 10JHathaway: [C:03+2] log: fix tests for pytest 9.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304183 (owner: 10JHathaway) [13:42:36] (03Merged) 10jenkins-bot: [parser] Return ExtTagPFragments while preprocessing [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305025 (https://phabricator.wikimedia.org/T429624) (owner: 10C. Scott Ananian) [13:42:54] (03PS1) 10Fabfur: hiera: disable awslc on magru hosts [puppet] - 10https://gerrit.wikimedia.org/r/1305128 (https://phabricator.wikimedia.org/T419825) [13:43:03] issue looks familiar, one sec [13:44:02] !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1305017|[parser] Add configuration to return experimental PFragment types]], [[gerrit:1305018|[parser] Return HeadingPFragments while preprocessing (T391624 T387520 T387521 T384490 T387374)]], [[gerrit:1305025|[parser] Return ExtTagPFragments while preprocessing (T429624)]] [13:44:52] whoa, do we do scap deployment on dse-k8s? 
(atsuko is from DSE) [13:44:54] (03PS1) 10Fabfur: hiera: disable awslc on codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/1305131 (https://phabricator.wikimedia.org/T419825) [13:44:57] (03PS1) 10Fabfur: hiera: disable awslc on esams hosts [puppet] - 10https://gerrit.wikimedia.org/r/1305132 (https://phabricator.wikimedia.org/T419825) [13:44:57] !log cmooney@cumin1003 START - Cookbook sre.network.cloud-host for host cloudcephosd1054 [13:45:01] !log cmooney@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1054 [13:45:07] I've seen something similar but yeah if it continued after a retry then we'll just see how it goes [13:45:29] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1054 [13:45:29] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.cloud-host (exit_code=0) for host cloudcephosd1054 [13:46:11] !log cscott@deploy1003 cscott: Backport for [[gerrit:1305017|[parser] Add configuration to return experimental PFragment types]], [[gerrit:1305018|[parser] Return HeadingPFragments while preprocessing (T391624 T387520 T387521 T384490 T387374)]], [[gerrit:1305025|[parser] Return ExtTagPFragments while preprocessing (T429624)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:46:37] !log cmooney@cumin1003 START - Cookbook sre.network.cloud-host for host cloudcephosd10543 [13:46:37] !log cmooney@cumin1003 END (FAIL) - Cookbook sre.network.cloud-host (exit_code=99) for host cloudcephosd10543 [13:46:42] !log cmooney@cumin1003 START - Cookbook sre.network.cloud-host for host cloudcephosd1053 [13:46:45] !log cmooney@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1053 [13:46:47] 06SRE, 10SRE-Access-Requests, 06Data-Platform-SRE (2026-06-05 - 2026-06-26): Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12045613 (10Gehel) [13:46:52] !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1160: Migration of db1160.eqiad.wmnet completed [13:47:12] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1053 [13:47:13] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.cloud-host (exit_code=0) for host cloudcephosd1053 [13:47:14] (03CR) 10AOkoth: [C:03+1] "lgtm!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1296495 (https://phabricator.wikimedia.org/T420184) (owner: 10Arnaudb) [13:48:15] !log cscott@deploy1003 cscott: Continuing with deployment [13:48:33] (03CR) 10Clément Goubert: [C:03+1] kubernetes: switch mw images back to publish-83 flavour [puppet] - 10https://gerrit.wikimedia.org/r/1305104 (https://phabricator.wikimedia.org/T429030) (owner: 10Kamila Součková) [13:49:25] (03PS1) 10MVernon: pontoon/swift: stack-specific config changes [puppet] - 10https://gerrit.wikimedia.org/r/1305133 (https://phabricator.wikimedia.org/T429630) [13:49:58] jouncebot: nowandnext [13:49:58] For the next 0 hour(s) and 10 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1300) [13:49:58] In 0 hour(s) and 10 minute(s): Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1400) [13:49:59] (03PS1) 10Hnowlan: video: fix logspam issue [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1305134 (https://phabricator.wikimedia.org/T368180) [13:50:51] !log brennen@deploy1003 Started deploy [phabricator/deployment@a640ed9]: test deploy phab2003 [13:51:20] (03CR) 10MVernon: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/1305133 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [13:51:55] (03PS2) 10Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - 10https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) [13:51:55] (03PS2) 10Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - 10https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112) [13:52:13] !log brennen@deploy1003 Finished deploy [phabricator/deployment@a640ed9]: test deploy phab2003 (duration: 01m 22s) [13:52:30] !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305017|[parser] Add configuration to return 
experimental PFragment types]], [[gerrit:1305018|[parser] Return HeadingPFragments while preprocessing (T391624 T387520 T387521 T384490 T387374)]], [[gerrit:1305025|[parser] Return ExtTagPFragments while preprocessing (T429624)]] (duration: 08m 29s) [13:52:44] T391624: Parsoid section edit link issues - https://phabricator.wikimedia.org/T391624 [13:52:46] T387520: Support section edit links to nested templates - https://phabricator.wikimedia.org/T387520 [13:52:46] T387521: Section titles failing to resolve redirected templates - https://phabricator.wikimedia.org/T387521 [13:52:47] T384490: Include directives on a line with headings prevent the legacy parser from generating section edit links - https://phabricator.wikimedia.org/T384490 [13:52:47] T387374: Compound templates prevent section edit links where legacy adds them - https://phabricator.wikimedia.org/T387374 [13:52:48] T429624: Link to edit TemplateData is broken with Parsoid Read Views - https://phabricator.wikimedia.org/T429624 [13:53:07] !log cmooney@cumin1003 START - Cookbook sre.dns.wipe-cache sretest1006.mgmt.eqiad.wmnet on all recursors [13:53:11] !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1006.mgmt.eqiad.wmnet on all recursors [13:56:03] (03PS1) 10Kevin Bazira: ml: add ROCm build deps for AITER kernel compilation in vllm022 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1305136 (https://phabricator.wikimedia.org/T429667) [13:56:14] i'm done, next up [13:57:11] matthiasmullie, abijeet: who's next? [13:57:31] I've withdrawn; have to leave soon [13:57:44] abijeet: you go! 
[13:58:22] !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie [13:58:35] 06SRE, 06DBA, 07Incident Severity 2, 07Wikimedia-Incident: Edits aren't saving correctly - https://phabricator.wikimedia.org/T418839#12045727 (10MLechvien-WMF) [13:58:44] !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2155: Migration of db2155.codfw.wmnet completed [13:58:47] (think they're idle, probably worth you going atsukoito) [13:59:02] i'm going then! [13:59:19] (03PS1) 10Jelto: Update to v3.30.7 [debs/calico] - 10https://gerrit.wikimedia.org/r/1305137 (https://phabricator.wikimedia.org/T427400) [13:59:20] (03CR) 10TrainBranchBot: [C:03+2] "Approved by atsuko@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305062 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko) [13:59:36] 06SRE, 06DBA: Edits aren't saving correctly - https://phabricator.wikimedia.org/T418839#12045732 (10MLechvien-WMF) [14:00:04] Deploy window Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1400) [14:00:31] (03Merged) 10jenkins-bot: translate: remove CirrusSearch endpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305062 (https://phabricator.wikimedia.org/T425377) (owner: 10Atsuko) [14:00:33] (03Abandoned) 10Jelto: Update to v3.30.7 [debs/calico] - 10https://gerrit.wikimedia.org/r/1305137 (https://phabricator.wikimedia.org/T427400) (owner: 10Jelto) [14:01:01] !log atsuko@deploy1003 Started scap sync-world: Backport for [[gerrit:1305062|translate: remove CirrusSearch endpoints (T425377)]] [14:01:06] T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377 [14:01:35] 06SRE, 06DBA: Edits aren't saving correctly - https://phabricator.wikimedia.org/T418839#12045747 (10MLechvien-WMF) [14:01:47] (03CR) 10ScheduleDeploymentBot: "Scheduled 
for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305049 (https://phabricator.wikimedia.org/T429509) (owner: 10Matthias Mullie) [14:03:33] !log atsuko@deploy1003 atsuko: Backport for [[gerrit:1305062|translate: remove CirrusSearch endpoints (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:03:44] * atsukoito checking changes [14:05:02] (03PS1) 10Jelto: Update to v3.30.7 [debs/calico] (v3.30) - 10https://gerrit.wikimedia.org/r/1305139 (https://phabricator.wikimedia.org/T427400) [14:05:06] (03CR) 10Klausman: [C:03+1] ml: add ROCm build deps for AITER kernel compilation in vllm022 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1305136 (https://phabricator.wikimedia.org/T429667) (owner: 10Kevin Bazira) [14:05:46] !log atsuko@deploy1003 atsuko: Continuing with deployment [14:07:10] (03CR) 10Klausman: [C:03+2] ml: add ROCm build deps for AITER kernel compilation in vllm022 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1305136 (https://phabricator.wikimedia.org/T429667) (owner: 10Kevin Bazira) [14:07:45] (03CR) 10Klausman: [V:03+2 C:03+2] ml: add ROCm build deps for AITER kernel compilation in vllm022 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1305136 (https://phabricator.wikimedia.org/T429667) (owner: 10Kevin Bazira) [14:10:00] !log atsuko@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305062|translate: remove CirrusSearch endpoints (T425377)]] (duration: 08m 59s) [14:10:04] T425377: Migrate Ttmserver (Translatewiki application) indices from production OpenSearch to OpenSearch on k8s - https://phabricator.wikimedia.org/T425377 [14:10:17] abijeet, Dreamy_Jazz: you are next, are you self-deploying? 
[14:10:37] Yes I will be self deploying [14:10:40] Thanks for the ping [14:11:17] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305123 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [14:11:17] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305122 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [14:14:17] atsukoito, i could use some help deploying [14:15:29] * atsukoito checking if TheresNoTime is here [14:16:14] (03CR) 10Gergő Tisza: [C:03+1] "Removing the db section and wgServer entries will probably break all requests to this wiki (e.g. [dumps](https://gitlab.wikimedia.org/repo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [14:16:40] jouncebot: nowandnext [14:16:40] For the next 0 hour(s) and 13 minute(s): Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1400) [14:16:40] In 0 hour(s) and 13 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1430) [14:17:48] abijeet: If you can test your change, I can bundle it in with the backports I'm waiting for merging [14:18:00] PROBLEM - Check if Pybal has been restarted after pybal.conf was changed on lvs2014 is CRITICAL: CRITICAL: Service pybal.service has not been restarted after /etc/pybal/pybal.conf was changed (gt 1h). https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted [14:18:00] PROBLEM - Check if Pybal has been restarted after pybal.conf was changed on lvs2013 is CRITICAL: CRITICAL: Service pybal.service has not been restarted after /etc/pybal/pybal.conf was changed (gt 1h). 
https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted [14:18:13] !log ebysans@deploy1003 Started deploy [analytics/refinery@83cc0ad] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@83cc0ad3] [14:18:37] Dreamy_Jazz, I can do that, thanks! [14:18:50] (03CR) 10Filippo Giunchedi: [C:03+1] "LGTM, only non-blocking comments" [puppet] - 10https://gerrit.wikimedia.org/r/1305133 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [14:19:02] (03Merged) 10jenkins-bot: CaptchaFactory: Fallback config for badloginperuser from badlogin [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305123 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [14:19:14] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305122 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [14:19:14] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) (owner: 10Abijeet Patro) [14:19:17] (03Merged) 10jenkins-bot: CaptchaFactory: Fallback config for badloginperuser from badlogin [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305122 (https://phabricator.wikimedia.org/T429902) (owner: 10Dreamy Jazz) [14:19:20] Dreamy_Jazz, abijeet: I'm leaving the chat then :) sarabada [14:19:29] atsukoito, thanks! [14:19:53] * TheresNoTime is in a meeting now, sorry [14:19:54] Bye! 
[14:20:02] TheresNoTime, I can handle it [14:20:06] (ack, ty) [14:20:13] !log ebysans@deploy1003 Finished deploy [analytics/refinery@83cc0ad] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@83cc0ad3] (duration: 02m 00s) [14:20:45] !log ebysans@deploy1003 Started deploy [analytics/refinery@83cc0ad]: Regular analytics weekly train [analytics/refinery@83cc0ad3] [14:22:12] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [14:22:52] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [14:23:11] !log cscott@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [14:23:27] (03PS10) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) [14:23:42] !log cscott@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [14:23:43] !log cscott@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [14:24:14] !log cscott@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [14:24:22] (03CR) 10Federico Ceratto: "Rebased" [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [14:25:11] (03CR) 10CI reject: [V:04-1] ULS rewrite: Don't initialize IME and undo tooltip on Minerva skin [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) (owner: 10Abijeet Patro) [14:25:22] :-( [14:25:30] (03CR) 10Federico Ceratto: "With the recent changes to zarcillo's API the entry in `servers` will be deleted." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [14:25:40] (03CR) 10Dreamy Jazz: [C:03+2] "Try again" [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) (owner: 10Abijeet Patro) [14:25:53] Yeah :( [14:26:14] Am I crashing into anyone's windows where they need no scap use from me? [14:26:16] jouncebot: nowandnext [14:26:16] For the next 0 hour(s) and 3 minute(s): Test Kitchen UI Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1400) [14:26:16] In 0 hour(s) and 3 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1430) [14:26:29] Looks like not but if so let me know [14:26:36] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) (owner: 10Abijeet Patro) [14:26:42] !log ebysans@deploy1003 Finished deploy [analytics/refinery@83cc0ad]: Regular analytics weekly train [analytics/refinery@83cc0ad3] (duration: 05m 57s) [14:27:21] 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for kafka-main2009.mgmt:22 - https://phabricator.wikimedia.org/T429864#12045888 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm [14:28:56] Dreamy_Jazz: I was going to disable the double build, but you can go first if you want [14:28:57] !log ebysans@deploy1003 Started deploy [analytics/refinery@83cc0ad] (thin): Regular analytics weekly train THIN [analytics/refinery@83cc0ad3] [14:29:14] I don't mind waiting for you to disable the double build? 
[14:29:26] Still waiting on merges to complete [14:29:26] ok, then I'll ping you once it's done [14:29:37] I've stopped scap [14:30:04] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1430) [14:30:07] (03CR) 10Kamila Součková: [C:03+2] kubernetes: switch mw images back to publish-83 flavour [puppet] - 10https://gerrit.wikimedia.org/r/1305104 (https://phabricator.wikimedia.org/T429030) (owner: 10Kamila Součková) [14:30:26] thanks, much appreciated Dreamy_Jazz <3 [14:30:56] !log ebysans@deploy1003 Finished deploy [analytics/refinery@83cc0ad] (thin): Regular analytics weekly train THIN [analytics/refinery@83cc0ad3] (duration: 01m 59s) [14:31:38] RECOVERY - haproxy failover on dbproxy1025 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [14:31:38] RECOVERY - haproxy failover on dbproxy1023 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [14:31:49] 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12045912 (10jcrespo) > These are expected. But... ` jynus@cumin2002:~$ sudo mysql.py -h db1208:3352 # or db1208:analytics_meta Welcome to the MariaDB monitor. Commands end wi... 
[14:32:10] (03CR) 10Zabe: Remove config related to the API Portal (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [14:32:22] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1160: Migration of db1160.eqiad.wmnet completed [14:32:24] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [14:34:03] !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade [14:34:03] !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893 [14:34:10] T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893 [14:34:23] !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1190: Upgrading db1190.eqiad.wmnet [14:34:29] (03Merged) 10jenkins-bot: ULS rewrite: Don't initialize IME and undo tooltip on Minerva skin [extensions/UniversalLanguageSelector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305051 (https://phabricator.wikimedia.org/T429774) (owner: 10Abijeet Patro) [14:34:51] 10ops-codfw, 06SRE, 06DC-Ops: upgrade selected servers from 1G to 10G - https://phabricator.wikimedia.org/T429631#12045938 (10Jhancock.wm) a:03Jhancock.wm [14:34:56] !log cmooney@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1006.eqiad.wmnet with OS trixie [14:35:04] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1190: Upgrading db1190.eqiad.wmnet [14:36:03] 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Wikimedia-Mailing-lists: lists.wikimedia.org subscription email rejected by DKIM - https://phabricator.wikimedia.org/T409137#12045944 (10Aklapper) 05Stalled→03Invalid Unfortunately closing this Phabricator task as no further informati... 
[14:36:45] !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS trixie [14:37:03] !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie [14:38:04] (03PS11) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) [14:38:43] (03CR) 10Clément Goubert: "I've sent a merge request to remove `apiportalwiki` from the airflow dags [0] and movement-insights canonical-data [1]" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [14:40:08] !log kamila@deploy1003 Started scap sync-world: switch default image to Debian Bookworm, disable Bullseye [14:40:16] \o/ [14:40:52] <3 [14:43:29] (03CR) 10Federico Ceratto: "Ah the error is due to the deletion on orchestrator. I'm unsure about the cause: maybe the entry was already deleted? 
Perhaps we might hav" [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [14:44:15] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2155: Migration of db2155.codfw.wmnet completed [14:44:16] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [14:44:55] 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12045982 (10Marostegui) @jcrespo we do not own db1208 either [14:45:27] (03CR) 10Clément Goubert: Remove config related to the API Portal (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [14:45:34] (03PS1) 10Btullis: Add tlsHostnames for wdqs2 services in the new namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305147 (https://phabricator.wikimedia.org/T429313) [14:46:54] (03CR) 10Btullis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: 10Btullis) [14:47:56] (03CR) 10MVernon: pontoon/swift: stack-specific config changes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1305133 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [14:48:13] (03CR) 10MVernon: [C:03+2] pontoon/swift: stack-specific config changes [puppet] - 10https://gerrit.wikimedia.org/r/1305133 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [14:49:06] (03CR) 10Kamila Součková: [C:03+1] redis: migrate icinga checks to prometheus [alerts] - 10https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) (owner: 10Hnowlan) [14:49:08] Dreamy_Jazz, let me know when the patch is ready to test on testwiki [14:49:35] Sure, Raine is doing some changes first to make ours faster [14:49:42] cool [14:49:50] Though maybe the `scap sync-world` they are 
doing might handle our changes too? [14:50:07] 10ops-eqsin, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#12046011 (10Papaul) [14:50:35] I just did an image rebuild [14:50:42] (03PS2) 10Ayounsi: netbox: add a BGP getter/setter [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304554 [14:51:10] not sure if that's going to pick up yours actually, depends on where you left off I suppose [14:51:12] (03CR) 10Zabe: [C:03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin) [14:51:20] !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage [14:51:34] They were all merged into the wmf branches, so presumably scap will pick them up [14:51:39] oh, okay [14:51:44] yeah [14:51:46] (03CR) 10Tiziano Fogli: [C:03+1] "I briefly ran some tests because I’m not very familiar with the context." [puppet] - 10https://gerrit.wikimedia.org/r/1304766 (https://phabricator.wikimedia.org/T407138) (owner: 10Hnowlan) [14:52:36] hmm then I probably should have paused after testservers or something '^^ [14:53:33] Maybe, though Special:Version isn't showing the updated commits so maybe it wasn't included? 
[14:53:41] I guess the scap output would say [14:54:11] it just started deploying, it'll pop up in Special:Version a bit if included [14:54:30] Ah I see [14:55:13] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage [14:55:24] (03CR) 10Trueg: [C:03+1] "lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305147 (https://phabricator.wikimedia.org/T429313) (owner: 10Btullis) [14:55:32] if the change does need manual testing in testservers, it should be ok to ^C the deployment while the httpbb checks are running, before it moves on to canaries [14:55:49] FIRING: [2x] HelmReleaseBadStatus: Helm release mw-script/yhn94m3m on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [14:55:51] Mine doesn't necessarily need manual testing [14:56:21] abijeet: Does yours need to be tested at the testservers stage (or can it just rollout and you test once it's everywhere)? [14:56:29] I presume that's fine as it was only to wmf.8 [14:56:40] !log Deployed Refinery as part of weekly deployment train [14:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:02] 10ops-eqsin, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#12046036 (10Papaul) [14:57:03] So if it is broken, it's just on the testwikis that it's broken [14:57:53] ok, I won't ^C it then [14:58:02] sorry for the confusion, I hadn't realised you'd already started [14:58:02] ah, yeah if it's just affecting .8 then the fallout is pretty minimal :) [15:00:04] jelto, arnoldokoth, mutante, and arnaudb: Time to snap out of that daydream and deploy SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1500). 
[15:02:49] (03CR) 10Brouberol: [C:03+1] Add tlsHostnames for wdqs2 services in the new namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305147 (https://phabricator.wikimedia.org/T429313) (owner: 10Btullis) [15:03:09] (03CR) 10Btullis: [C:03+2] Add tlsHostnames for wdqs2 services in the new namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305147 (https://phabricator.wikimedia.org/T429313) (owner: 10Btullis) [15:05:13] (03CR) 10Btullis: [C:03+2] Add a custom connection to the wme_metrics API endpoint [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305065 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [15:06:09] (03CR) 10Tiziano Fogli: hadoop: add hdfs alert for HA status (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1304769 (https://phabricator.wikimedia.org/T407138) (owner: 10Hnowlan) [15:06:31] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2003.codfw.wmnet with reason: deploy [15:06:57] !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: deploy [15:06:58] !log brennen@deploy1003 Started deploy [phabricator/deployment@4d1f033]: deploy phab2003 for T429925 [15:07:03] T429925: Deploy Phab/Phorge 2026-06-23 - https://phabricator.wikimedia.org/T429925 [15:07:48] !log brennen@deploy1003 Finished deploy [phabricator/deployment@4d1f033]: deploy phab2003 for T429925 (duration: 00m 50s) [15:07:52] Dreamy_Jazz, abijeet: almost done (testservers definitely done) [15:08:17] !log brennen@deploy1003 Started deploy [phabricator/deployment@4d1f033]: deploy phab1004 for T429925 [15:08:56] Thanks [15:09:06] !log brennen@deploy1003 Finished deploy [phabricator/deployment@4d1f033]: deploy phab1004 for T429925 (duration: 00m 48s) [15:09:12] !log kamila@deploy1003 Finished scap sync-world: switch default image to Debian Bookworm, disable Bullseye (duration: 29m 18s) [15:09:19] and really done :D 
[15:09:42] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS trixie [15:09:48] (03CR) 10Tiziano Fogli: [C:04-1] hadoop: remove migrated hadoop-hdfs-active-namenode icinga check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1304767 (https://phabricator.wikimedia.org/T407138) (owner: 10Hnowlan) [15:10:28] (03CR) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [15:10:49] FIRING: [2x] HelmReleaseBadStatus: Helm release mw-script/yhn94m3m on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [15:11:52] (03Merged) 10jenkins-bot: Add tlsHostnames for wdqs2 services in the new namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305147 (https://phabricator.wikimedia.org/T429313) (owner: 10Btullis) [15:14:25] Special:Version still isn't updated, but I'm guessing there is some caching going on [15:14:34] (03Merged) 10jenkins-bot: Add a custom connection to the wme_metrics API endpoint [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305065 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [15:15:34] (03CR) 10Tiziano Fogli: hadoop: migrate hdfs topology check to alertmanager (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1304768 (https://phabricator.wikimedia.org/T407138) (owner: 10Hnowlan) [15:15:38] To check, I re-ran scap and it's showing these commits were not synced [15:15:42] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305123|CaptchaFactory: Fallback config for badloginperuser from badlogin (T429902)]], [[gerrit:1305122|CaptchaFactory: Fallback config for badloginperuser from badlogin (T429902)]], [[gerrit:1305051|ULS 
rewrite: Don't initialize IME and undo tooltip on Minerva skin (T429774)]] [15:15:48] T429902: ConfirmEdit 'badloginperuser' does not fallback to 'badlogin' CAPTCHA config - https://phabricator.wikimedia.org/T429902 [15:15:48] T429774: Input methods are loaded on mobile if ULS rewrite is enabled - https://phabricator.wikimedia.org/T429774 [15:15:51] So going to run scap [15:16:08] jouncebot: nowandnext [15:16:08] For the next 0 hour(s) and 43 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1500) [15:16:09] In 0 hour(s) and 43 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1600) [15:16:21] 07Puppet, 06Release-Engineering-Team: registry-homepage-builder.py doesn't sort images as expected - https://phabricator.wikimedia.org/T388287#12046149 (10hashar) Thanks @elukey for the review and merge. Looking at https://docker-registry.wikimedia.org/releng/node22-test-browser/tags/ , the `22.6.0` is still... [15:16:38] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [15:17:11] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [15:17:26] !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade [15:17:26] !log cwilliams@cumin1003 dbmaint on s4@codfw T429893 [15:17:31] T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893 [15:17:47] !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2172: Upgrading db2172.codfw.wmnet [15:17:52] 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12046154 (10jcrespo) >>! In T427897#12045982, @Marostegui wrote: > @jcrespo we do not own db1208 either Ok, fair, but then should I maintain db1208 grants, or should I notify p... 
[15:18:07] !log btullis@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. [15:18:09] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2172: Upgrading db2172.codfw.wmnet [15:18:24] !log btullis@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. [15:18:33] cscott: is T429928 related to your deployment perhaps? [15:18:33] T429928: PHP Notice: Undefined property: MediaWiki\Parser\Parser::$useParsoidFragments - https://phabricator.wikimedia.org/T429928 [15:19:42] It seems like it, given it's started only just after that deployment [15:19:50] !log dreamyjazz@deploy1003 dreamyjazz, abi: Backport for [[gerrit:1305123|CaptchaFactory: Fallback config for badloginperuser from badlogin (T429902)]], [[gerrit:1305122|CaptchaFactory: Fallback config for badloginperuser from badlogin (T429902)]], [[gerrit:1305051|ULS rewrite: Don't initialize IME and undo tooltip on Minerva skin (T429774)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Cha [15:19:50] nges can now be verified there. [15:19:56] (03PS1) 10Jgiannelos: Publish public PGP key of Yiannis Giannelos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305151 [15:19:59] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [15:20:10] !log dreamyjazz@deploy1003 dreamyjazz, abi: Continuing with deployment [15:20:19] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
[15:20:35] !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS trixie [15:21:30] !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1190: Migration of db1190.eqiad.wmnet completed [15:22:40] (03CR) 10CWilliams: [C:04-1] mysql: update replication source (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto) [15:22:57] (03CR) 10Ladsgroup: [C:03+2] video: fix logspam issue [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1305134 (https://phabricator.wikimedia.org/T368180) (owner: 10Hnowlan) [15:24:14] (03PS2) 10Jgiannelos: Publish public PGP key of Yiannis Giannelos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305151 (https://phabricator.wikimedia.org/T423255) [15:26:25] (03Merged) 10jenkins-bot: video: fix logspam issue [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1305134 (https://phabricator.wikimedia.org/T368180) (owner: 10Hnowlan) [15:26:41] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305123|CaptchaFactory: Fallback config for badloginperuser from badlogin (T429902)]], [[gerrit:1305122|CaptchaFactory: Fallback config for badloginperuser from badlogin (T429902)]], [[gerrit:1305051|ULS rewrite: Don't initialize IME and undo tooltip on Minerva skin (T429774)]] (duration: 10m 59s) [15:26:47] T429902: ConfirmEdit 'badloginperuser' does not fallback to 'badlogin' CAPTCHA config - https://phabricator.wikimedia.org/T429902 [15:26:47] T429774: Input methods are loaded on mobile if ULS rewrite is enabled - https://phabricator.wikimedia.org/T429774 [15:35:20] (03PS1) 10Gerrit maintenance bot: Add isv to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1305153 (https://phabricator.wikimedia.org/T429920) [15:36:41] (03PS1) 10Gerrit maintenance bot: Add bol to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1305159 
(https://phabricator.wikimedia.org/T429921) [15:37:37] (03Abandoned) 10Ladsgroup: Add ur to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1235199 (https://phabricator.wikimedia.org/T415960) (owner: 10Gerrit maintenance bot) [15:38:29] !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage [15:39:19] (03CR) 10BCornwall: [C:03+1] wmnet: Update es7-master alias [dns] - 10https://gerrit.wikimedia.org/r/1305021 (https://phabricator.wikimedia.org/T429867) (owner: 10Gerrit maintenance bot) [15:39:31] (03PS1) 10Dreamy Jazz: hCaptcha: Enable for badlogin on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305165 (https://phabricator.wikimedia.org/T429843) [15:39:46] jouncebot: nowandnext [15:39:46] For the next 0 hour(s) and 20 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1500) [15:39:46] In 0 hour(s) and 20 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1600) [15:40:50] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305165 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [15:41:16] 06SRE, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Cumin hosts to Trixie - https://phabricator.wikimedia.org/T427897#12046541 (10Marostegui) I just added them ` [15:39:16] marostegui@cumin2003:~$ sudo db-mysql db1208:3352 Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB... 
[15:41:21] (03PS2) 10Dreamy Jazz: hCaptcha: Enable for badlogin on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305165 (https://phabricator.wikimedia.org/T429843) [15:42:47] !log rzl@deploy1003:~$ kube-env mw-script-deploy eqiad; helm uninstall yhn94m3m # job completed successfully but helm release stuck in pending-install [15:42:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:04] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305165 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [15:43:48] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage [15:44:07] (03Abandoned) 10Ladsgroup: mariadb: Promote db1210 to s5 master [puppet] - 10https://gerrit.wikimedia.org/r/1277600 (https://phabricator.wikimedia.org/T424551) (owner: 10Gerrit maintenance bot) [15:45:44] (03Merged) 10jenkins-bot: hCaptcha: Enable for badlogin on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305165 (https://phabricator.wikimedia.org/T429843) (owner: 10Dreamy Jazz) [15:45:49] FIRING: [2x] HelmReleaseBadStatus: Helm release mw-script/yhn94m3m on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [15:46:14] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305165|hCaptcha: Enable for badlogin on all wikis (T429843)]] [15:46:18] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [15:48:15] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1305165|hCaptcha: Enable for badlogin on all wikis (T429843)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[15:49:06] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [15:49:59] Dreamy_Jazz: can you let me know once you're done? :D [15:50:07] Yeah sure [15:50:20] Will be done when I'm done with this scap [15:52:37] (03PS3) 10Ayounsi: netbox: add a BGP getter/setter [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304554 [15:53:14] 10ops-eqiad, 06SRE, 06DC-Ops: Inbound errors on interface cr1-eqiad:ae2 (asw2-b-eqiad:ae1) - https://phabricator.wikimedia.org/T429116#12046627 (10cmooney) >>! In T429116#12041312, @Jclark-ctr wrote: > @cmooney, this link continues to experience errors. If you're available tomorrow and would like me to swap... [15:53:19] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305165|hCaptcha: Enable for badlogin on all wikis (T429843)]] (duration: 07m 06s) [15:53:24] T429843: hCaptcha: Show on bad login trigger - https://phabricator.wikimedia.org/T429843 [15:53:59] Amir1: You can go ahead [15:54:11] awesome! [15:54:24] (03PS4) 10Ayounsi: netbox: add a BGP getter/setter [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304554 [15:55:25] (03CR) 10Hashar: "I don't think it will help for T420865, in my experience the connection tend to be aborted when doing the preliminary ref advertisement an" [puppet] - 10https://gerrit.wikimedia.org/r/1302834 (https://phabricator.wikimedia.org/T420865) (owner: 10Arnaudb) [15:57:16] !log cmooney@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1006.eqiad.wmnet with OS trixie [16:00:04] jhathaway and rzl: Time to do the Puppet request window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1600). [16:00:05] No Gerrit patches in the queue for this window AFAICS. 
[16:00:15] (03CR) 10CI reject: [V:04-1] netbox: add a BGP getter/setter [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304554 (owner: 10Ayounsi) [16:01:39] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS trixie [16:05:38] (03CR) 10Ladsgroup: [C:03+2] Add bol to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1305159 (https://phabricator.wikimedia.org/T429921) (owner: 10Gerrit maintenance bot) [16:05:56] !log ladsgroup@dns1004 START - running authdns-update [16:06:24] (03CR) 10Jgiannelos: [C:03+2] Disable parser survey for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302201 (owner: 10MSantos) [16:07:00] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1190: Migration of db1190.eqiad.wmnet completed [16:07:01] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [16:07:15] !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie [16:07:49] !log ladsgroup@dns1004 END - running authdns-update [16:08:23] (03PS2) 10Gerrit maintenance bot: Add isv to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1305153 (https://phabricator.wikimedia.org/T429920) [16:08:26] (03CR) 10Ladsgroup: [C:03+2] Add isv to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1305153 (https://phabricator.wikimedia.org/T429920) (owner: 10Gerrit maintenance bot) [16:08:29] (03CR) 10Ladsgroup: [V:03+2 C:03+2] Add isv to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1305153 (https://phabricator.wikimedia.org/T429920) (owner: 10Gerrit maintenance bot) [16:08:37] !log ladsgroup@dns1004 START - running authdns-update [16:09:40] FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - 
https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:10:12] (03CR) 10CI reject: [V:04-1] Disable parser survey for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302201 (owner: 10MSantos) [16:10:29] !log ladsgroup@dns1004 END - running authdns-update [16:10:42] (03CR) 10Jgiannelos: [C:04-1] "Blocking until deployment window" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302201 (owner: 10MSantos) [16:12:58] (03CR) 10Jgiannelos: [C:04-2] Disable parser survey for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302201 (owner: 10MSantos) [16:14:09] !log cmooney@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1006.eqiad.wmnet with OS trixie [16:14:40] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:15:06] (03PS3) 10Jforrester: [testwiki] Enable Abstract Client integration mode, not just previews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304800 (https://phabricator.wikimedia.org/T422657) [16:15:06] (03PS5) 10Jforrester: [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) [16:15:06] (03PS1) 10Jforrester: WikiLambda: Expose wikilambda-abstract-optin for global group assignment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305182 (https://phabricator.wikimedia.org/T422698) [16:15:09] (03PS1) 10MVernon: Pontoon: stack-specific changes for the swift stack [puppet] - 10https://gerrit.wikimedia.org/r/1305183 (https://phabricator.wikimedia.org/T429630) [16:15:19] !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie [16:16:32] !log cwilliams@cumin1003 START - 
Cookbook sre.mysql.pool pool db2172: Migration of db2172.codfw.wmnet completed [16:16:53] (03CR) 10MVernon: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/1305183 (https://phabricator.wikimedia.org/T429630) (owner: 10MVernon) [16:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [16:29:42] (03PS1) 10Jdlrobson: Restore menu tab underline style [skins/Vector] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305191 (https://phabricator.wikimedia.org/T428519) [16:37:39] (03PS10) 10Trueg: dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) [16:40:18] 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#12047022 (10Dzahn) >>! In T427353#12044078, @VRiley-WMF wrote: > but I also see zuul1008 and zuul1009? Could you please provide more clarification on this? Hi @V... [16:44:33] (03CR) 10Scott French: [C:03+2] shellbox: Pick up images reflecting latest code [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304860 (https://phabricator.wikimedia.org/T428013) (owner: 10Scott French) [16:45:44] FYI, I'll be deploying some shellbox changes to staging only, in advance of some work later on in the upcoming infra window. 
[16:47:05] (03Merged) 10jenkins-bot: shellbox: Pick up images reflecting latest code [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304860 (https://phabricator.wikimedia.org/T428013) (owner: 10Scott French) [16:47:11] swfrench-wmf: I'm about to create some wikis, do you think it'll overlap [16:47:15] I can wait [16:48:28] Amir1: totally fine for that to overlap with what I'm doing in shellbox staging shortly. if you have mediawiki deployments that might run past 17:30 UTC, then it might be preferable if you pause for that 30m (i.e., until 18:00 UTC). [16:49:06] (ah, wait ... that's when the train is, heh. so yeah, maybe pause around 17:30 if you can) [16:49:35] I can see how long it's going to take [16:49:42] I can do parts of it [16:49:54] * swfrench-wmf thumbs up [16:50:34] !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply [16:51:01] !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply [16:51:02] !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply [16:51:10] (03PS1) 10Ladsgroup: Init Wikipedia Interslavic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305196 (https://phabricator.wikimedia.org/T429920) [16:51:14] !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply [16:51:15] !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply [16:51:28] !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply [16:51:29] !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply [16:51:43] !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [16:51:44] !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply [16:51:49] (03CR) 10Ladsgroup: [C:03+2] Init Wikipedia Interslavic [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/1305196 (https://phabricator.wikimedia.org/T429920) (owner: 10Ladsgroup) [16:51:49] jouncebot: nowandnext [16:51:49] For the next 0 hour(s) and 8 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1600) [16:51:49] In 0 hour(s) and 8 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1700) [16:52:01] !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply [16:52:02] !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-video: apply [16:52:03] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305196 (https://phabricator.wikimedia.org/T429920) (owner: 10Ladsgroup) [16:52:24] !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply [16:52:44] (03Merged) 10jenkins-bot: Init Wikipedia Interslavic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305196 (https://phabricator.wikimedia.org/T429920) (owner: 10Ladsgroup) [16:53:11] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1305196|Init Wikipedia Interslavic (T429920)]] [16:53:15] T429920: Create Wikipedia Interslavic - https://phabricator.wikimedia.org/T429920 [16:53:28] Reedy: Amir1 is creating a wiki (in case you were wanting to deploy) [16:53:50] (03CR) 10Jgiannelos: Disable parser survey for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302201 (owner: 10MSantos) [16:54:01] Need to deploy a vendor backport "soon" as it'll no doubt have collateral in other deployments [16:54:33] (03PS1) 10Reedy: Upgrade guzzlehttp/* [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305197 (https://phabricator.wikimedia.org/T429965) [16:54:34] Reedy: +2 the patch, once it's merged, I let you go [16:54:40] FIRING: JobUnavailable: 
Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:54:51] I have like six different patches I need to deploy (three wikis to create) [16:54:52] (03PS1) 10Reedy: Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305198 (https://phabricator.wikimedia.org/T429965) [16:55:11] gonna have to force merge the vendor patch [16:55:16] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1305196|Init Wikipedia Interslavic (T429920)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [16:55:20] (03CR) 10Reedy: [V:03+2 C:03+2] Upgrade guzzlehttp/* [vendor] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305197 (https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [16:55:33] (03PS2) 10Reedy: Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305198 (https://phabricator.wikimedia.org/T429965) [16:55:37] (03CR) 10Reedy: [C:03+2] Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305198 (https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [16:55:48] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment [16:55:56] sounds fun [16:56:09] bah, .8 too [16:56:12] I let you know once this scap goes through [16:56:40] will let that one land before backporting vendor to .8 and upsetting .7 [16:59:40] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [17:00:05] swfrench-wmf: MediaWiki 
infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1700). Please do the needful. [17:00:06] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305196|Init Wikipedia Interslavic (T429920)]] (duration: 06m 55s) [17:00:10] T429920: Create Wikipedia Interslavic - https://phabricator.wikimedia.org/T429920 [17:00:24] !log cmooney@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1006.eqiad.wmnet with OS trixie [17:00:29] Reedy: I'm done for now. Push your changes and let me know once you're done [17:00:48] Amir1: still waiting for jerkins [17:00:57] o/ - Reedy: I'm not touching anything until 17:30 UTC at the earliest, so all yours [17:00:58] cccccbukvgbckevfjjiukjfecgcdbljdvjdibjldkkve [17:01:09] sigh:) as usual [17:01:35] !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie [17:02:04] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2172: Migration of db2172.codfw.wmnet completed [17:02:05] !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) [17:04:03] (03PS1) 10Ladsgroup: Activate Wikipedia Interslavic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305199 (https://phabricator.wikimedia.org/T429920) [17:04:55] I quickly push this then ^ so the wiki is live [17:05:00] yeah np :) [17:05:02] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305199 (https://phabricator.wikimedia.org/T429920) (owner: 10Ladsgroup) [17:05:46] (03PS1) 10Bartosz Dziewoński: mediawiki.org keys.html: Limit height of key code blocks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305200 [17:06:47] (03CR) 10CI reject: [V:04-1] Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305198 
(https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [17:06:54] (03CR) 10CI reject: [V:04-1] Activate Wikipedia Interslavic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305199 (https://phabricator.wikimedia.org/T429920) (owner: 10Ladsgroup) [17:07:04] (03Merged) 10jenkins-bot: Activate Wikipedia Interslavic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305199 (https://phabricator.wikimedia.org/T429920) (owner: 10Ladsgroup) [17:07:12] (03CR) 10AOkoth: "Deploys work as expected... Merging and running the decom." [puppet] - 10https://gerrit.wikimedia.org/r/1304849 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [17:07:13] castor can get in the bin [17:07:19] (03PS2) 10AOkoth: site: begin phab2002 decom [puppet] - 10https://gerrit.wikimedia.org/r/1304849 (https://phabricator.wikimedia.org/T423727) [17:07:33] (03CR) 10Reedy: [V:03+2 C:03+2] "`" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305198 (https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [17:08:18] I think it's pushing both [17:08:27] (03PS1) 10Reedy: Upgrade guzzlehttp/* [vendor] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305201 (https://phabricator.wikimedia.org/T429965) [17:08:27] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1305199|Activate Wikipedia Interslavic (T429920)]] [17:08:32] T429920: Create Wikipedia Interslavic - https://phabricator.wikimedia.org/T429920 [17:09:03] (03CR) 10Reedy: [V:03+2 C:03+2] Upgrade guzzlehttp/* [vendor] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305201 (https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [17:09:21] (03PS1) 10Reedy: Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305202 (https://phabricator.wikimedia.org/T429965) [17:09:34] (03CR) 10Reedy: [C:03+2] Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305202 
(https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [17:10:34] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1305199|Activate Wikipedia Interslavic (T429920)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [17:10:54] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment [17:11:02] (03CR) 10AOkoth: [C:03+2] site: begin phab2002 decom [puppet] - 10https://gerrit.wikimedia.org/r/1304849 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [17:13:51] !log cmooney@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1006.eqiad.wmnet with OS trixie [17:15:11] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305199|Activate Wikipedia Interslavic (T429920)]] (duration: 06m 43s) [17:15:15] T429920: Create Wikipedia Interslavic - https://phabricator.wikimedia.org/T429920 [17:16:11] Reedy: I'm done for now for real :D [17:16:21] Thanks :P [17:16:24] I'm waiting for jerkins again [17:16:24] I think the wmf.7 patch is already deployed too [17:16:37] no big deal if it has :) [17:16:46] aokoth@cumin1003 decommission (PID 2452042) is awaiting input [17:18:54] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [17:19:10] (03CR) 10Dzahn: "seems like you merged in gitlab and this is ok to go now?" [puppet] - 10https://gerrit.wikimedia.org/r/1305044 (https://phabricator.wikimedia.org/T429367) (owner: 10Aklapper) [17:19:19] (03CR) 10Dzahn: [C:03+1] phabricator: drop diffusion.ssh-host config [puppet] - 10https://gerrit.wikimedia.org/r/1305044 (https://phabricator.wikimedia.org/T429367) (owner: 10Aklapper) [17:20:24] (03CR) 10Dzahn: "does this also come with a change in gitlab?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1305041 (https://phabricator.wikimedia.org/T330797) (owner: 10Aklapper) [17:21:08] (03CR) 10Dzahn: [C:03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/1305040 (https://phabricator.wikimedia.org/T418045) (owner: 10Aklapper) [17:21:41] (03Merged) 10jenkins-bot: Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 [core] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305202 (https://phabricator.wikimedia.org/T429965) (owner: 10Reedy) [17:21:47] finally [17:23:06] !log reedy@deploy1003 Started scap sync-world: Backport for [[gerrit:1305202|Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 (T429965)]], [[gerrit:1305201|Upgrade guzzlehttp/* (T429965)]], [[gerrit:1305198|Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 (T429965)]], [[gerrit:1305197|Upgrade guzzlehttp/* (T429965)]] [17:23:11] T429965: Host Confusion via Weak URI Host Validation in guzzlehttp/psr7 - https://phabricator.wikimedia.org/T429965 [17:24:21] FIRING: [3x] SLOBudgetBurn: Search update lag is below 95% target in eqiad - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [17:25:06] !log reedy@deploy1003 reedy: Backport for [[gerrit:1305202|Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 (T429965)]], [[gerrit:1305201|Upgrade guzzlehttp/* (T429965)]], [[gerrit:1305198|Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 (T429965)]], [[gerrit:1305197|Upgrade guzzlehttp/* (T429965)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[17:26:04] (03PS1) 10Btullis: Revert "Add a custom connection to the wme_metrics API endpoint" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305206 [17:29:21] FIRING: [3x] SLOBudgetBurn: Search update lag is below 95% target in eqiad - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [17:29:25] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [17:29:44] (03PS1) 10Btullis: Fix incorrectly indented connection for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305211 (https://phabricator.wikimedia.org/T428544) [17:29:59] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [17:30:22] (03Abandoned) 10Btullis: Revert "Add a custom connection to the wme_metrics API endpoint" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305206 (owner: 10Btullis) [17:30:37] (03CR) 10Brouberol: [C:03+1] Fix incorrectly indented connection for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305211 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [17:30:48] * swfrench-wmf is pleased to see Reedy's deployment is progressing [17:31:16] Reedy: any concerns if I pick up the infra window work when your deployment wraps up? 
[17:33:30] (03PS1) 10Arlolra: Expand strip markers when they are present in attribute values [extensions/Kartographer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305215 (https://phabricator.wikimedia.org/T383004) [17:33:58] (03CR) 10Btullis: [C:03+2] Fix incorrectly indented connection for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305211 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [17:33:59] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/Kartographer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305215 (https://phabricator.wikimedia.org/T383004) (owner: 10Arlolra) [17:34:21] RESOLVED: [3x] SLOBudgetBurn: Search update lag is below 95% target in eqiad - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [17:36:11] (03Merged) 10jenkins-bot: Fix incorrectly indented connection for airflow-analytics-product [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305211 (https://phabricator.wikimedia.org/T428544) (owner: 10Btullis) [17:36:54] !log reedy@deploy1003 reedy: Continuing with deployment [17:37:08] swfrench-wmf: sorry, distracted with other stuff. yeah, your GTG when this is done [17:37:20] Reedy: awesome, thanks! 
[17:37:57] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [17:38:01] !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [17:41:09] !log reedy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305202|Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 (T429965)]], [[gerrit:1305201|Upgrade guzzlehttp/* (T429965)]], [[gerrit:1305198|Updated guzzlehttp/guzzle from 7.12.1 to 7.12.3 (T429965)]], [[gerrit:1305197|Upgrade guzzlehttp/* (T429965)]] (duration: 18m 03s) [17:41:14] T429965: Host Confusion via Weak URI Host Validation in guzzlehttp/psr7 - https://phabricator.wikimedia.org/T429965 [17:41:18] swfrench-wmf: all yours [17:41:25] * swfrench-wmf thumbs up [17:41:53] !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply [17:42:57] !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply [17:43:29] !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply [17:43:37] (03PS1) 10Zaidusyy: Add clickable link to email verification message [software/gerrit] (wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1305218 (https://phabricator.wikimedia.org/T429901) [17:44:01] !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply [17:44:32] !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply [17:44:51] !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply [17:45:14] (03CR) 10Aklapper: [C:04-1] "Yeah but we first need to deploy the Phab change on a phuture Tuesday, afterwards this can get merged (I think, maybe I'm overcautious?)." 
[puppet] - 10https://gerrit.wikimedia.org/r/1305044 (https://phabricator.wikimedia.org/T429367) (owner: 10Aklapper) [17:45:22] !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply [17:45:40] !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [17:46:11] !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply [17:46:38] !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply [17:47:09] !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-video: apply [17:48:10] !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply [17:48:33] (03CR) 10Reedy: "Doesn't this need to go upstream to Gerrit, not just in our deployment branch?" [software/gerrit] (wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1305218 (https://phabricator.wikimedia.org/T429901) (owner: 10Zaidusyy) [17:52:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:56:25] !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox: apply [17:57:41] !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply [17:58:12] !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply [17:58:58] !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply [17:59:29] !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply [17:59:43] !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply [18:00:04] brennen and jeena: I seem to be stuck in Groundhog week. Sigh. 
Time for (yet another) MediaWiki train - Utc-7 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T1800). [18:00:14] !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply [18:00:33] !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [18:00:54] apologies, the infra window is running a wee bit over. ETA ~ 2-3m. [18:01:04] !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply [18:01:23] o/ [18:01:28] no worries swfrench-wmf [18:01:30] !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply [18:02:01] !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply [18:02:56] (03PS1) 10C. Scott Ananian: [parser] Rename mStripExtTags to useParsoidFragments [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305223 [18:03:01] !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply [18:03:39] (03PS2) 10C. Scott Ananian: [parser] Rename mStripExtTags to useParsoidFragments [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305223 (https://phabricator.wikimedia.org/T429928) [18:04:05] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305223 (https://phabricator.wikimedia.org/T429928) (owner: 10C. Scott Ananian) [18:05:56] brennen: alright, I'll continue to keep an eye on my end of things, but given that I don't yet see any smoke coming out, I think you're good to proceed at your convenience :) [18:09:06] swfrench-wmf: ack, ty [18:09:10] yeah train folks, i goofed in a backport to wmf.7 earlier today. 
I don't know if https://phabricator.wikimedia.org/T429928 is considered a train blocker, but i've got another backport which will fix it if so [18:10:03] cscott: if it happens much at all, seems like we might as well go ahead and backport [18:10:08] i can do that [18:10:50] it's https://gerrit.wikimedia.org/r/c/1305223/ and i've scheduled it for the 4pm window if its not done by then. [18:12:06] according to arlo, this is probably only being triggered by our team's internal RT testing, not triggered in production, so it's probably not a train blocker. (aka it could wait if there are more pressing items) [18:13:26] (03CR) 10Eric Gardner: [C:03+1] Enable MMV carousel on non-en wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305049 (https://phabricator.wikimedia.org/T429509) (owner: 10Matthias Mullie) [18:14:58] (03CR) 10TrainBranchBot: [C:03+2] "Approved by brennen@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305223 (https://phabricator.wikimedia.org/T429928) (owner: 10C. Scott Ananian) [18:15:34] we're early in the week and have plenty of window here, so nothing more pressing at the moment i think [18:16:06] (i'm working under the assumption that T429720 is currently a beta-only situation, unless someone tells me otherwise soonish.) [18:16:08] T429720: Files on Betacommons can be moved once, then never again - https://phabricator.wikimedia.org/T429720 [18:26:24] (03Merged) 10jenkins-bot: [parser] Rename mStripExtTags to useParsoidFragments [core] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305223 (https://phabricator.wikimedia.org/T429928) (owner: 10C. 
Scott Ananian) [18:26:53] !log brennen@deploy1003 Started scap sync-world: Backport for [[gerrit:1305223|[parser] Rename mStripExtTags to useParsoidFragments (T429928)]] [18:26:57] T429928: PHP Notice: Undefined property: MediaWiki\Parser\Parser::$useParsoidFragments - https://phabricator.wikimedia.org/T429928 [18:28:54] !log brennen@deploy1003 brennen, cscott: Backport for [[gerrit:1305223|[parser] Rename mStripExtTags to useParsoidFragments (T429928)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [18:30:02] cscott: anything to test? [18:30:26] (based on previous discussion i'm assuming not) [18:31:07] Not really. [18:31:21] cool [18:31:25] !log brennen@deploy1003 brennen, cscott: Continuing with deployment [18:31:25] After it deploys well re-start roundtrip-testing and the logs should stop [18:37:42] !log brennen@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305223|[parser] Rename mStripExtTags to useParsoidFragments (T429928)]] (duration: 10m 49s) [18:37:47] T429928: PHP Notice: Undefined property: MediaWiki\Parser\Parser::$useParsoidFragments - https://phabricator.wikimedia.org/T429928 [18:41:10] (03PS1) 10TrainBranchBot: group0 to 1.47.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305228 (https://phabricator.wikimedia.org/T423917) [18:41:13] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by brennen@deploy1003" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305228 (https://phabricator.wikimedia.org/T423917) (owner: 10TrainBranchBot) [18:45:58] (03Merged) 10jenkins-bot: group0 to 1.47.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305228 (https://phabricator.wikimedia.org/T423917) (owner: 10TrainBranchBot) [18:46:43] (03PS3) 10Ahmon Dancy: modules/beta/files/wmf-beta-update-databases.py: Keep update.php jobs topped up [puppet] - 10https://gerrit.wikimedia.org/r/1302910 [18:52:17] !log brennen@deploy1003 rebuilt and synchronized wikiversions 
files: group0 to 1.47.0-wmf.8 refs T423917 [18:52:21] T423917: 1.47.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T423917 [19:23:30] 10ops-eqiad, 06SRE, 06DC-Ops: Q4: eqiad: (12) PDUs for ML expansion - https://phabricator.wikimedia.org/T400778#12047877 (10VRiley-WMF) Ticket for rack powerup is 1-261534072456 [19:31:32] FIRING: [2x] SLOBudgetBurn: Search update lag is below 95% target in codfw - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [19:36:32] RESOLVED: [2x] SLOBudgetBurn: Search update lag is below 95% target in codfw - https://alerts.wikimedia.org/?q=alertname%3DSLOBudgetBurn [19:46:04] FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [19:58:55] (03CR) 10Eevans: [C:03+1] "Insofar as I grok these types of alerts (hint: only a bit), this looks good to me." [alerts] - 10https://gerrit.wikimedia.org/r/1304852 (https://phabricator.wikimedia.org/T407141) (owner: 10Hnowlan) [19:59:39] !log arlolra@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [20:00:05] RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: May I have your attention please! UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T2000) [20:00:05] sbassett, WMDE-Fisch, matthiasmullie, arlolra, and cscott: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[20:00:06] o/ [20:00:12] \o [20:00:12] !log arlolra@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [20:00:13] !log arlolra@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [20:00:19] o/ [20:00:29] (03CR) 10Eevans: [C:03+1] "I gather this happens after the alertmanager check is live?" [puppet] - 10https://gerrit.wikimedia.org/r/1305083 (https://phabricator.wikimedia.org/T407141) (owner: 10Tiziano Fogli) [20:00:43] (03CR) 10Eevans: [C:03+1] restabase: remove instance space icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1305084 (https://phabricator.wikimedia.org/T407141) (owner: 10Tiziano Fogli) [20:00:48] o/ [20:00:49] !log arlolra@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [20:01:29] I could self serve [20:01:49] Not sure if it makes sense to bundle anything? [20:02:00] (03PS1) 10Dreamy Jazz: hCaptcha: Define login interface name for secureEnclave.js [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305241 (https://phabricator.wikimedia.org/T429963) [20:02:15] (03PS1) 10Dreamy Jazz: hCaptcha: Define login interface name for secureEnclave.js [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305242 (https://phabricator.wikimedia.org/T429963) [20:02:31] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305241 (https://phabricator.wikimedia.org/T429963) (owner: 10Dreamy Jazz) [20:02:52] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305242 (https://phabricator.wikimedia.org/T429963) (owner: 10Dreamy Jazz) [20:03:05] \o 
[20:03:21] I can also self-serve, I just have a CS.php config patch [20:03:34] sbassett: Then go ahead! [20:03:53] maybe the two config patche can go together? [20:04:39] matthiasmullie: ^ [20:04:50] I'm happy to have mine bundled with anyone else's (though I can also self serve), I shouldn't need to do much testing [20:05:06] same ;-) [20:05:28] Ok, what other config patches need bundling? I can add them to my spiderpig prompt... [20:05:47] The other config patch is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1305049 [20:05:57] Ok [20:05:58] Based on what I'm seeing at wikitech [20:06:21] Ok, I’ll deploy these now [20:06:30] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sbassett@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) (owner: 10SBassett) [20:06:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by sbassett@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305049 (https://phabricator.wikimedia.org/T429509) (owner: 10Matthias Mullie) [20:07:00] yeah, don't mind mine being combined with something else! 
[20:07:27] (03Merged) 10jenkins-bot: Lazily reject pre-fix parser-cache entries for noreferrer/noopener links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304876 (https://phabricator.wikimedia.org/T429090) (owner: 10SBassett) [20:07:35] (03Merged) 10jenkins-bot: Enable MMV carousel on non-en wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305049 (https://phabricator.wikimedia.org/T429509) (owner: 10Matthias Mullie) [20:08:07] !log sbassett@deploy1003 Started scap sync-world: Backport for [[gerrit:1304876|Lazily reject pre-fix parser-cache entries for noreferrer/noopener links (T429090 T429244)]], [[gerrit:1305049|Enable MMV carousel on non-en wikipedias (T429509)]] [20:08:13] T429090: Add "noreferrer" to the "rel" attribute for links leading to archive.today or one of its mirrors - https://phabricator.wikimedia.org/T429090 [20:08:14] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509 [20:10:12] !log sbassett@deploy1003 sbassett, mlitn: Backport for [[gerrit:1304876|Lazily reject pre-fix parser-cache entries for noreferrer/noopener links (T429090 T429244)]], [[gerrit:1305049|Enable MMV carousel on non-en wikipedias (T429509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:10:53] matthiasmullie: [20:10:59] Any changes you’d like to verify? 
[20:11:10] checking [20:11:22] (03PS1) 10Hcoplin: Dumps user-agent enforcement messaging [puppet] - 10https://gerrit.wikimedia.org/r/1305243 (https://phabricator.wikimedia.org/T427836) [20:11:55] (03CR) 10CI reject: [V:04-1] Dumps user-agent enforcement messaging [puppet] - 10https://gerrit.wikimedia.org/r/1305243 (https://phabricator.wikimedia.org/T427836) (owner: 10Hcoplin) [20:13:55] (03PS2) 10Mooeypoo: Dumps: user-agent enforcement messaging [puppet] - 10https://gerrit.wikimedia.org/r/1305243 (https://phabricator.wikimedia.org/T427836) (owner: 10Hcoplin) [20:13:57] sbassett: LGTM! [20:14:03] thanks, continuing [20:14:07] !log sbassett@deploy1003 sbassett, mlitn: Continuing with deployment [20:17:16] FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished [20:18:23] !log sbassett@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304876|Lazily reject pre-fix parser-cache entries for noreferrer/noopener links (T429090 T429244)]], [[gerrit:1305049|Enable MMV carousel on non-en wikipedias (T429509)]] (duration: 10m 17s) [20:18:30] T429090: Add "noreferrer" to the "rel" attribute for links leading to archive.today or one of its mirrors - https://phabricator.wikimedia.org/T429090 [20:18:31] T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509 [20:18:49] Ok, matthiasmullie and I should be good now. So whomever’s next on the list… [20:18:55] sbassett: thanks! [20:20:57] Soo arlolra Dreamy_Jazz should we bundle our .7 patches? 
[20:21:06] Sure [20:21:09] ok [20:21:09] I see the other one is done already [20:21:32] I've got a .7 and a .8 to do [20:21:41] But should be fine to do them both together? [20:21:51] That's what I don't know [20:21:53] :-) [20:22:05] From my point of view, doing both mine together is fine [20:22:17] Yeah, should be fine [20:22:31] Then we put the all together I guess ;-) [20:23:18] Anyone have a desire to run scap for all of us? [20:23:19] Dreamy_Jazz: Want to have the honor? [20:23:22] Sure [20:23:25] ;-D [20:25:23] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305241 (https://phabricator.wikimedia.org/T429963) (owner: 10Dreamy Jazz) [20:25:23] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305242 (https://phabricator.wikimedia.org/T429963) (owner: 10Dreamy Jazz) [20:25:23] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/Cite] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305064 (https://phabricator.wikimedia.org/T426974) (owner: 10WMDE-Fisch) [20:25:26] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/Kartographer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305215 (https://phabricator.wikimedia.org/T383004) (owner: 10Arlolra) [20:26:52] (03Merged) 10jenkins-bot: hCaptcha: Define login interface name for secureEnclave.js [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305241 (https://phabricator.wikimedia.org/T429963) (owner: 10Dreamy Jazz) [20:26:56] (03Merged) 10jenkins-bot: hCaptcha: Define login interface name for secureEnclave.js [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305242 (https://phabricator.wikimedia.org/T429963) 
(owner: 10Dreamy Jazz) [20:32:49] (03Merged) 10jenkins-bot: Improve click intent event logging and exposure tracking [extensions/Cite] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305064 (https://phabricator.wikimedia.org/T426974) (owner: 10WMDE-Fisch) [20:32:54] (03Merged) 10jenkins-bot: Expand strip markers when they are present in attribute values [extensions/Kartographer] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305215 (https://phabricator.wikimedia.org/T383004) (owner: 10Arlolra) [20:33:24] (03CR) 10Bartosz Dziewoński: [C:03+1] Publish public PGP key of Yiannis Giannelos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305151 (https://phabricator.wikimedia.org/T423255) (owner: 10Jgiannelos) [20:33:27] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305241|hCaptcha: Define login interface name for secureEnclave.js (T429963)]], [[gerrit:1305242|hCaptcha: Define login interface name for secureEnclave.js (T429963)]], [[gerrit:1305064|Improve click intent event logging and exposure tracking]], [[gerrit:1305215|Expand strip markers when they are present in attribute values (T383004)]] [20:33:34] T429963: hCaptcha: Set interface name for login in secureEnclave.js - https://phabricator.wikimedia.org/T429963 [20:33:35] T383004: Parsoid read views: map with extension (cite, templatestyles) in caption results in raw UNIQ QINU marker - https://phabricator.wikimedia.org/T383004 [20:34:02] (03PS2) 10Bartosz Dziewoński: mediawiki.org keys.html: Limit height of key code blocks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305200 [20:35:26] !log dreamyjazz@deploy1003 dreamyjazz, wmde-fisch, arlolra: Backport for [[gerrit:1305241|hCaptcha: Define login interface name for secureEnclave.js (T429963)]], [[gerrit:1305242|hCaptcha: Define login interface name for secureEnclave.js (T429963)]], [[gerrit:1305064|Improve click intent event logging and exposure tracking]], [[gerrit:1305215|Expand strip markers when they 
are present in attribute values (T383004)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:35:49] Any testing to be done for either of your backports? [20:36:14] Yes, give me a minute [20:36:50] Tested, I'm fine! [20:37:10] Good to go [20:37:18] I've also tested, proceeding... [20:37:21] !log dreamyjazz@deploy1003 dreamyjazz, wmde-fisch, arlolra: Continuing with deployment [20:41:36] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305241|hCaptcha: Define login interface name for secureEnclave.js (T429963)]], [[gerrit:1305242|hCaptcha: Define login interface name for secureEnclave.js (T429963)]], [[gerrit:1305064|Improve click intent event logging and exposure tracking]], [[gerrit:1305215|Expand strip markers when they are present in attribute values (T383004)]] (duration: 08m 09s) [20:41:43] T429963: hCaptcha: Set interface name for login in secureEnclave.js - https://phabricator.wikimedia.org/T429963 [20:41:43] T383004: Parsoid read views: map with extension (cite, templatestyles) in caption results in raw UNIQ QINU marker - https://phabricator.wikimedia.org/T383004 [20:41:57] Dreamy_Jazz: thanks [20:43:53] Dreamy_Jazz: Thank You! 
[20:44:09] That seems like the window is now done [20:44:26] !log Evening UTC backport window done [20:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:05] Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T2100) [21:06:06] jouncebot: nowandnext [21:06:06] For the next 0 hour(s) and 53 minute(s): Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T2100) [21:06:06] In 8 hour(s) and 53 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T0600) [21:06:17] (03PS1) 10Dreamy Jazz: Revert "SourceEditorOverlay: Show CAPTCHA if has content in onStageChanges" [extensions/MobileFrontend] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305251 (https://phabricator.wikimedia.org/T429996) [21:07:23] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/MobileFrontend] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305251 (https://phabricator.wikimedia.org/T429996) (owner: 10Dreamy Jazz) [21:16:07] (03Merged) 10jenkins-bot: Revert "SourceEditorOverlay: Show CAPTCHA if has content in onStageChanges" [extensions/MobileFrontend] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305251 (https://phabricator.wikimedia.org/T429996) (owner: 10Dreamy Jazz) [21:16:37] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305251|Revert "SourceEditorOverlay: Show CAPTCHA if has content in onStageChanges" (T429996)]] [21:16:41] T429996: MobileFrontend source editor: "Enter confirmation code" appears to all users who have skipcaptcha - https://phabricator.wikimedia.org/T429996 [21:18:38] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1305251|Revert "SourceEditorOverlay: Show CAPTCHA if has content in onStageChanges" (T429996)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:18:54] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [21:19:11] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [21:23:27] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305251|Revert "SourceEditorOverlay: Show CAPTCHA if has content in onStageChanges" (T429996)]] (duration: 06m 50s) [21:23:34] T429996: MobileFrontend source editor: "Enter confirmation code" appears to all users who have skipcaptcha - https://phabricator.wikimedia.org/T429996 [21:38:17] (03PS4) 10Ahmon Dancy: modules/beta/files/wmf-beta-update-databases.py: Keep update.php jobs topped up [puppet] - 10https://gerrit.wikimedia.org/r/1302910 [21:39:50] (03CR) 10Ahmon Dancy: "Revised to use ThreadPoolExecutor" [puppet] - 10https://gerrit.wikimedia.org/r/1302910 (owner: 10Ahmon Dancy) [21:52:02] (03PS1) 10Dreamy Jazz: Update HCaptchaRiskScoreRetrievedForBlocks hook to not use IDs [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305255 (https://phabricator.wikimedia.org/T428659) [21:52:26] (03PS1) 10Dreamy Jazz: hCaptcha: Temporarily disable risk score block collection [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305256 (https://phabricator.wikimedia.org/T428659) [21:52:33] (03CR) 10CI reject: [V:04-1] Update HCaptchaRiskScoreRetrievedForBlocks hook to not use IDs [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305255 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [21:52:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:53:01] jouncebot: nowandnext [21:53:01] For the next 0 hour(s) and 6 minute(s): Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260623T2100) [21:53:01] In 8 hour(s) and 6 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T0600) [21:53:11] (03CR) 10Dreamy Jazz: [C:03+2] hCaptcha: Temporarily disable risk score block collection [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305256 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [21:53:46] (03CR) 10Dreamy Jazz: "recheck" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305255 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [21:57:01] (03Merged) 10jenkins-bot: hCaptcha: Temporarily disable risk score block collection [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305256 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [21:57:19] (03PS1) 10Dreamy Jazz: hCaptcha: Reenable risk score block collection [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305257 (https://phabricator.wikimedia.org/T428659) [21:58:41] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305257 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [21:58:42] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305255 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [22:06:02] (03Merged) 10jenkins-bot: Update HCaptchaRiskScoreRetrievedForBlocks hook to not 
use IDs [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305255 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [22:06:04] (03Merged) 10jenkins-bot: hCaptcha: Reenable risk score block collection [extensions/WikimediaEvents] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1305257 (https://phabricator.wikimedia.org/T428659) (owner: 10Dreamy Jazz) [22:06:36] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305256|hCaptcha: Temporarily disable risk score block collection (T428659)]], [[gerrit:1305257|hCaptcha: Reenable risk score block collection (T428659)]], [[gerrit:1305255|Update HCaptchaRiskScoreRetrievedForBlocks hook to not use IDs (T428659)]] [22:06:41] T428659: hCaptcha risk score collection caused globalblock table reads to triple - https://phabricator.wikimedia.org/T428659 [22:08:36] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1305256|hCaptcha: Temporarily disable risk score block collection (T428659)]], [[gerrit:1305257|hCaptcha: Reenable risk score block collection (T428659)]], [[gerrit:1305255|Update HCaptchaRiskScoreRetrievedForBlocks hook to not use IDs (T428659)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[22:11:32] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [22:15:50] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305256|hCaptcha: Temporarily disable risk score block collection (T428659)]], [[gerrit:1305257|hCaptcha: Reenable risk score block collection (T428659)]], [[gerrit:1305255|Update HCaptchaRiskScoreRetrievedForBlocks hook to not use IDs (T428659)]] (duration: 09m 14s) [22:15:54] T428659: hCaptcha risk score collection caused globalblock table reads to triple - https://phabricator.wikimedia.org/T428659 [22:20:21] !log cmooney@cumin1003 START - Cookbook sre.network.peering with action 'configure' for AS: 12389 [22:21:07] !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12389 [22:34:02] (03PS1) 10Cwhite: logstash: send thumbor logs to test partition [puppet] - 10https://gerrit.wikimedia.org/r/1305260 (https://phabricator.wikimedia.org/T368180)