[00:07:51] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1189988
[00:07:51] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1189988 (owner: TrainBranchBot)
[00:29:48] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1189988 (owner: TrainBranchBot)
[00:41:20] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:41:26] FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:44:01] FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[01:00:36] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[01:12:51] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 12m 14s)
[01:36:26] FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:41:20] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:55:01] SRE, Wikimedia-Mailing-lists: Request for a mailing list for Moore Wikimedians - https://phabricator.wikimedia.org/T405164#11199964 (Bugreporter) >>! In T405164#11199893, @Ladsgroup wrote: > Is it an officially recognized UG by AffCom? I do not believe it is required. Previously we created mail lists fo...
[01:56:21] SRE, Traffic-Icebox, MobileFrontend (Tracking): QA features on the new mobile URLs - https://phabricator.wikimedia.org/T403638#11199968 (Edtadros) == Requirement Scope: Mobile domain sunsetting QA (desktop + mobile). - Both standard and mobile subdomain URLs must correctly render Minerva for mobile...
[02:21:02] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate kibana.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[02:26:10] FIRING: BFDdown: BFD session down between cr2-magru and fe80::ee38:73ff:fee8:9c58 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:29:12] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:31:10] RESOLVED: BFDdown: BFD session down between cr2-magru and fe80::ee38:73ff:fee8:9c58 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[03:31:49] FIRING: MailmanBounceQueueHigh: Mailman bounce queue on lists1004:9100 has more than 50 messages - https://wikitech.wikimedia.org/wiki/Mailman/Runbooks#MailmanBounceQueueHigh - https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3?forceLogin&from=now-3h&orgId=1&to=now&viewPanel=2 - https://alerts.wikimedia.org/?q=alertname%3DMailmanBounceQueueHigh
[03:44:01] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[04:43:06] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (conflict) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[04:44:01] FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[04:48:06] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (conflict) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[04:57:46] (PS1) Superpes15: [gewikimedia] Update logo and wordmark and change sitename [mediawiki-config] - https://gerrit.wikimedia.org/r/1190000 (https://phabricator.wikimedia.org/T405147)
[05:09:01] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:34:01] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:36:41] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:05:09] SRE, Wikimedia-Mailing-lists: Request for a mailing list for Moore Wikimedians - https://phabricator.wikimedia.org/T405164#11200111 (Hasslaebetch) >>! In T405164#11199893, @Ladsgroup wrote: > Is it an officially recognized UG by AffCom? Moore Wikimedia Community has recently applied for a user group and...
[06:20:28] (PS1) Giuseppe Lavagetto: tls: ban default UAs with forge URLs [puppet] - https://gerrit.wikimedia.org/r/1190004 (https://phabricator.wikimedia.org/T400119)
[06:21:02] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate kibana.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[06:27:43] (CR) Kosta Harlan: hCaptcha: Enable on API account creation on test2wiki (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1189879 (https://phabricator.wikimedia.org/T405107) (owner: Kosta Harlan)
[06:29:12] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:34:25] (PS1) Muehlenhoff: maps2011: Re-enable imports of OSM and waterlines [puppet] - https://gerrit.wikimedia.org/r/1190045 (https://phabricator.wikimedia.org/T381565)
[06:36:20] (CR) Kosta Harlan: hCaptcha: Enable on API account creation on test2wiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1189879 (https://phabricator.wikimedia.org/T405107) (owner: Kosta Harlan)
[06:37:28] (CR) Muehlenhoff: [C:+2] maps2011: Re-enable imports of OSM and waterlines [puppet] - https://gerrit.wikimedia.org/r/1190045 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[06:37:40] (PS2) Kosta Harlan: hCaptcha: Enable on API account creation on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1189879 (https://phabricator.wikimedia.org/T405107)
[06:38:39] (PS3) Kosta Harlan: hCaptcha: Enable on API account creation on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1189879 (https://phabricator.wikimedia.org/T405107)
[06:38:58] jouncebot: nowandnext
[06:38:59] For the next 0 hour(s) and 21 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250921T0700)
[06:38:59] In 0 hour(s) and 21 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T0700)
[06:40:16] (Merged) jenkins-bot: hCaptcha: Enable on API account creation on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1189879 (https://phabricator.wikimedia.org/T405107) (owner: Kosta Harlan)
[06:40:44] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1189879|hCaptcha: Enable on API account creation on test2wiki (T405107)]]
[06:40:49] T405107: hCaptcha: Enable on API account creations on test2wiki - https://phabricator.wikimedia.org/T405107
[06:48:36] (CR) Kosta Harlan: SI: Add a configuration flag to hide SI even if the feature is enabled (1 comment) [extensions/CheckUser] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189869 (https://phabricator.wikimedia.org/T405076) (owner: Dreamy Jazz)
[06:51:44] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:51:44] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:54:53] (PS1) Brouberol: dse-k8s-eqiad/mediawiki-dumps-legacy: bump resource quotas [deployment-charts] - https://gerrit.wikimedia.org/r/1190103
[06:56:42] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54829 bytes in 7.040 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:56:42] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9235 bytes in 7.211 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:00:05] Amir1, Urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T0700).
[07:00:05] phuedx, kostajh, and Superpes: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:17] Hi :)
[07:00:20] hi, I'm finishing up deploying my patch
[07:04:35] o/ (patch opened from Sam Smith (phuedx))
[07:04:43] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1189879|hCaptcha: Enable on API account creation on test2wiki (T405107)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:04:48] T405107: hCaptcha: Enable on API account creations on test2wiki - https://phabricator.wikimedia.org/T405107
[07:05:36] !log kharlan@deploy1003 kharlan: Continuing with sync
[07:06:44] hueitan: let's deploy when kostajh's patch is done.
[07:06:53] kart_ thanks!
[07:08:06] !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts install1004.wikimedia.org
[07:08:19] jfyi, we've taken the last patch out of the backport window, we'll deploy it in a one-off window when everything else is done.
[07:11:14] jmm@cumin2002 decommission (PID 2142053) is awaiting input
[07:13:43] SRE, Infrastructure-Foundations, Patch-For-Review: Move install servers to Bookworm - https://phabricator.wikimedia.org/T396487#11200180 (MoritzMuehlenhoff)
[07:14:30] o/
[07:14:52] Sorry I'm late
[07:14:52] nearly done here
[07:17:29] (PS1) Ilias Sarantopoulos: ml-services: update articletopic in prod and remove trasnsformer [deployment-charts] - https://gerrit.wikimedia.org/r/1189871 (https://phabricator.wikimedia.org/T404294)
[07:17:46] !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:18:26] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1189879|hCaptcha: Enable on API account creation on test2wiki (T405107)]] (duration: 37m 41s)
[07:18:30] T405107: hCaptcha: Enable on API account creations on test2wiki - https://phabricator.wikimedia.org/T405107
[07:18:58] done
[07:19:53] ready kart_
[07:20:06] kart_, hueitan: Are you deploying?
[07:20:27] phuedx: yes.
[07:20:32] oh, one of u deploy is fine
[07:20:49] phuedx: go ahead then :D
[07:21:33] (PS1) Muehlenhoff: installserver: Select the custom nginx provider with no additional modules [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529)
[07:21:34] OK
[07:21:52] (CR) TrainBranchBot: [C:+2] "Approved by phuedx@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189877 (https://phabricator.wikimedia.org/T404420) (owner: Phuedx)
[07:22:42] SRE, Infrastructure-Foundations, Patch-For-Review: Adapt profile::nginx to new packaging scheme introduced in Bookworm - https://phabricator.wikimedia.org/T329529#11200208 (MoritzMuehlenhoff)
[07:22:49] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[07:23:19] (Merged) jenkins-bot: xLab: Fix instrument to produce valid events [extensions/WikimediaEvents] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189877 (https://phabricator.wikimedia.org/T404420) (owner: Phuedx)
[07:23:30] jmm@cumin2002 decommission (PID 2142053) is awaiting input
[07:23:40] !log phuedx@deploy1003 Started scap sync-world: Backport for [[gerrit:1189877|xLab: Fix instrument to produce valid events (T404420)]]
[07:23:45] T404420: Enable 13 wikis for MinT for Wiki Readers A/A test - https://phabricator.wikimedia.org/T404420
[07:29:20] !log phuedx@deploy1003 phuedx: Backport for [[gerrit:1189877|xLab: Fix instrument to produce valid events (T404420)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:29:25] T404420: Enable 13 wikis for MinT for Wiki Readers A/A test - https://phabricator.wikimedia.org/T404420
[07:29:39] !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:30:37] kart_, hueitan: I've checked the source for the instrument on the test servers and it's coming through OK
[07:30:48] !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:30:48] !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:30:49] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install1004.wikimedia.org
[07:30:50] That is, the translation: {} part is there
[07:30:53] (y)
[07:31:06] SRE, Infrastructure-Foundations, Patch-For-Review: Move install servers to Bookworm - https://phabricator.wikimedia.org/T396487#11200223 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `install1004.wikimedia.org` - install1004.wikimedia.org (**PASS**)...
[07:31:07] !log phuedx@deploy1003 phuedx: Continuing with sync
[07:31:15] SRE, Infrastructure-Foundations, Patch-For-Review: Move install servers to Bookworm - https://phabricator.wikimedia.org/T396487#11200226 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff All done!
[07:31:35] RESOLVED: MailmanBounceQueueHigh: Mailman bounce queue on lists1004:9100 has more than 50 messages - https://wikitech.wikimedia.org/wiki/Mailman/Runbooks#MailmanBounceQueueHigh - https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3?forceLogin&from=now-3h&orgId=1&to=now&viewPanel=2 - https://alerts.wikimedia.org/?q=alertname%3DMailmanBounceQueueHigh
[07:33:51] RECOVERY - mailman3_queue_size on lists1004 is OK: OK: mailman3 queues are below the limits https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[07:36:32] (CR) Muehlenhoff: "PCC failure on P5 can be ignored, the Puppet manifests on install use P7-specific code" [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[07:39:09] !log phuedx@deploy1003 Finished scap sync-world: Backport for [[gerrit:1189877|xLab: Fix instrument to produce valid events (T404420)]] (duration: 15m 28s)
[07:39:14] T404420: Enable 13 wikis for MinT for Wiki Readers A/A test - https://phabricator.wikimedia.org/T404420
[07:39:18] Superpes: Over to you :)
[07:39:53] Yep thanks :)
[07:43:44] (CR) Elukey: [C:+1] installserver: Select the custom nginx provider with no additional modules (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[07:44:01] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[07:51:13] (CR) Slyngshede: [V:+1] P:puppetserver::volatile Include XCheeseScore private repo (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688) (owner: Slyngshede)
[07:51:44] (PS17) Slyngshede: P:puppetserver::volatile Include XCheeseScore private repo [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688)
[07:54:26] (CR) Bartosz Wójtowicz: [C:+1] "LGTM!" [deployment-charts] - https://gerrit.wikimedia.org/r/1189871 (https://phabricator.wikimedia.org/T404294) (owner: Ilias Sarantopoulos)
[07:54:31] (CR) CI reject: [V:-1] P:puppetserver::volatile Include XCheeseScore private repo [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688) (owner: Slyngshede)
[07:54:58] (PS18) Slyngshede: P:puppetserver::volatile Include XCheeseScore private repo [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688)
[07:55:56] (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7000/co" [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688) (owner: Slyngshede)
[07:56:44] (CR) Ilias Sarantopoulos: [C:+2] ml-services: update articletopic in prod and remove trasnsformer [deployment-charts] - https://gerrit.wikimedia.org/r/1189871 (https://phabricator.wikimedia.org/T404294) (owner: Ilias Sarantopoulos)
[07:58:20] (PS19) Slyngshede: P:puppetserver::volatile Include XCheeseScore private repo [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688)
[07:58:41] (Merged) jenkins-bot: ml-services: update articletopic in prod and remove trasnsformer [deployment-charts] - https://gerrit.wikimedia.org/r/1189871 (https://phabricator.wikimedia.org/T404294) (owner: Ilias Sarantopoulos)
[07:58:59] Superpes: are you planning to deploy, or is the morning window finished?
[07:59:10] (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7001/co" [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688) (owner: Slyngshede)
[07:59:17] I'm not a deployer :/
[07:59:20] (PS1) Kosta Harlan: hCaptcha: Enable account creation trial on phase 2 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1190184 (https://phabricator.wikimedia.org/T402366)
[08:00:05] WMDE-Fisch and awight: It is that lovely time of the day again! You are hereby commanded to deploy Sub-referencing pilot deployment (dewiki). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T0800).
[08:00:05] WMDE-Fisch: A patch you scheduled for Sub-referencing pilot deployment (dewiki) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:22] (CR) Kosta Harlan: [C:-2] "For September 24" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190184 (https://phabricator.wikimedia.org/T402366) (owner: Kosta Harlan)
[08:00:25] (PS2) Muehlenhoff: installserver: Select the custom nginx provider with no additional modules [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529)
[08:00:36] (CR) Muehlenhoff: installserver: Select the custom nginx provider with no additional modules (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[08:00:55] !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:01:00] awight: can you deploy Superpes' patches? If not, I can do it
[08:01:04] If there is no one available I will re-schedule the patches for the next window!
[08:01:20] Superpes: If that's okay for you (non-urgent?) then that would be great
[08:01:28] Sorry that Monday morning deployments are particularly slow
[08:01:33] Yep no problem
[08:01:37] ack!
[08:01:40] Superpes: ty!
[08:02:19] !log beginning dewiki sub-referencing deployment window
[08:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:34] !log upgrading Envoy on webperf hosts T403663
[08:02:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:38] T403663: Upgrade Envoy to v1.29.12 - https://phabricator.wikimedia.org/T403663
[08:04:26] (CR) Awight: [C:+1] Enable sub-references on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1189949 (https://phabricator.wikimedia.org/T398669) (owner: WMDE-Fisch)
[08:04:31] (CR) TrainBranchBot: [C:+2] "Approved by awight@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1189949 (https://phabricator.wikimedia.org/T398669) (owner: WMDE-Fisch)
[08:04:47] Yeah, I know it's a slow window, but it's the best window for me, and I'm always hesitant to ask for deployment access, otherwise I'd solve the problem :/ But it's not such a serious problem! I'll reschedule it asap :)
[08:05:52] (Merged) jenkins-bot: Enable sub-references on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1189949 (https://phabricator.wikimedia.org/T398669) (owner: WMDE-Fisch)
[08:06:10] !log awight@deploy1003 Started scap sync-world: Backport for [[gerrit:1189949|Enable sub-references on dewiki (T398669)]]
[08:06:15] T398669: Deploy sub-referencing MVP to dewiki (stub) - https://phabricator.wikimedia.org/T398669
[08:06:49] (PS1) Bartosz Wójtowicz: ml-services: Remove the transformer config from articletopic staging. [deployment-charts] - https://gerrit.wikimedia.org/r/1190185 (https://phabricator.wikimedia.org/T404294)
[08:07:47] (CR) Ilias Sarantopoulos: [C:+2] ml-services: Remove the transformer config from articletopic staging. [deployment-charts] - https://gerrit.wikimedia.org/r/1190185 (https://phabricator.wikimedia.org/T404294) (owner: Bartosz Wójtowicz)
[08:09:48] (Merged) jenkins-bot: ml-services: Remove the transformer config from articletopic staging. [deployment-charts] - https://gerrit.wikimedia.org/r/1190185 (https://phabricator.wikimedia.org/T404294) (owner: Bartosz Wójtowicz)
[08:11:12] (CR) Fabfur: [C:+1] "overall ok, question: why don't re-use `ua_library_default`?" [puppet] - https://gerrit.wikimedia.org/r/1190004 (https://phabricator.wikimedia.org/T400119) (owner: Giuseppe Lavagetto)
[08:12:12] !log awight@deploy1003 wmde-fisch, awight: Backport for [[gerrit:1189949|Enable sub-references on dewiki (T398669)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:12:12] !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:12:16] T398669: Deploy sub-referencing MVP to dewiki (stub) - https://phabricator.wikimedia.org/T398669
[08:14:47] (PS1) Tchanders: Deploy temporary accounts to itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1190188 (https://phabricator.wikimedia.org/T405195)
[08:18:11] !log awight@deploy1003 wmde-fisch, awight: Continuing with sync
[08:18:34] (PS1) WMDE-Fisch: Fix pulsating dot not disappearing for logged in users [extensions/Cite] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190189 (https://phabricator.wikimedia.org/T403693)
[08:22:11] (CR) Thiemo Kreuz (WMDE): [C:+1] Fix pulsating dot not disappearing for logged in users [extensions/Cite] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190189 (https://phabricator.wikimedia.org/T403693) (owner: WMDE-Fisch)
[08:22:42] (CR) Elukey: [C:+2] Add support for AMD MI300X GPUs (kernel and firmwares) [puppet] - https://gerrit.wikimedia.org/r/1189807 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[08:23:24] (CR) Elukey: [C:+2] profile::amd_gpu: add support for amd-smi from ROCm 6.4.3 [puppet] - https://gerrit.wikimedia.org/r/1189815 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[08:23:36] !log awight@deploy1003 Finished scap sync-world: Backport for [[gerrit:1189949|Enable sub-references on dewiki (T398669)]] (duration: 17m 26s)
[08:23:40] (CR) Elukey: [C:+2] Enable ROCm 6.4.3 amd-smi on ml-serve{1012,1013} [puppet] - https://gerrit.wikimedia.org/r/1189816 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[08:23:41] T398669: Deploy sub-referencing MVP to dewiki (stub) - https://phabricator.wikimedia.org/T398669
[08:23:51] (PS7) Elukey: Enable ROCm 6.4.3 amd-smi on ml-serve{1012,1013} [puppet] - https://gerrit.wikimedia.org/r/1189816 (https://phabricator.wikimedia.org/T403697)
[08:24:21] (CR) TrainBranchBot: [C:+2] "Approved by awight@deploy1003 using scap backport" [extensions/Cite] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190189 (https://phabricator.wikimedia.org/T403693) (owner: WMDE-Fisch)
[08:27:58] (PS1) Kosta Harlan: hCaptcha: Log error message from upstream [extensions/ConfirmEdit] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190191
[08:28:28] awight: can you let me know when you're done deploying, please?
[08:29:06] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 22 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189858 (owner: Jforrester)
[08:32:41] kostajh: It looks like we'll take the full hour, if there's nothing else too urgent?
[08:33:20] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 22 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189857 (owner: Jforrester)
[08:33:25] awight: nothing urgent, no
[08:33:29] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 22 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189856 (owner: Jforrester)
[08:33:47] !log klausman@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:33:48] ack :D
[08:35:07] (CR) Btullis: [C:+1] dse-k8s-eqiad/mediawiki-dumps-legacy: bump resource quotas [deployment-charts] - https://gerrit.wikimedia.org/r/1190103 (owner: Brouberol)
[08:37:16] (CR) Brouberol: [C:+2] dse-k8s-eqiad/mediawiki-dumps-legacy: bump resource quotas [deployment-charts] - https://gerrit.wikimedia.org/r/1190103 (owner: Brouberol)
[08:39:03] (PS1) Slyngshede: User Block: Add option for unsetting email address [software/bitu] - https://gerrit.wikimedia.org/r/1190198 (https://phabricator.wikimedia.org/T404430)
[08:40:29] (Merged) jenkins-bot: Fix pulsating dot not disappearing for logged in users [extensions/Cite] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190189 (https://phabricator.wikimedia.org/T403693) (owner: WMDE-Fisch)
[08:40:44] !log awight@deploy1003 Started scap sync-world: Backport for [[gerrit:1190189|Fix pulsating dot not disappearing for logged in users (T403693)]]
[08:40:49] T403693: Implement tooltips onboarding for subreferencs - https://phabricator.wikimedia.org/T403693
[08:43:42] (PS2) Slyngshede: User Block: Add option for unsetting email address [software/bitu] - https://gerrit.wikimedia.org/r/1190198 (https://phabricator.wikimedia.org/T404430)
[08:44:01] FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[08:44:59] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[08:45:31] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[08:46:41] !log awight@deploy1003 awight, wmde-fisch: Backport for [[gerrit:1190189|Fix pulsating dot not disappearing for logged in users (T403693)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:46:45] T403693: Implement tooltips onboarding for subreferencs - https://phabricator.wikimedia.org/T403693
[08:47:05] (CR) FNegri: [C:+1] P:wmcs: root-keys: Update Taavi's keys to security keys [puppet] - https://gerrit.wikimedia.org/r/1189830 (owner: Majavah)
[08:47:44] !log awight@deploy1003 awight, wmde-fisch: Continuing with sync
[08:49:33] (PS8) Elukey: Enable ROCm 6.4.3 amd-smi on ml-serve{1012,1013} [puppet] - https://gerrit.wikimedia.org/r/1189816 (https://phabricator.wikimedia.org/T403697)
[08:49:33] (PS1) Elukey: profile::amd_gpu: refactor the class to allow more use cases [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697)
[08:49:59] (CR) FNegri: [C:+1] P:mariadb::cloudinfra: Drop buster support [puppet] - https://gerrit.wikimedia.org/r/1189476 (owner: Majavah)
[08:53:10] !log awight@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190189|Fix pulsating dot not disappearing for logged in users (T403693)]] (duration: 12m 26s)
[08:53:15] T403693: Implement tooltips onboarding for subreferencs - https://phabricator.wikimedia.org/T403693
[08:53:49] !log finished special dewiki sub-referencing window
[08:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:56] kostajh: finally done, thanks!
[08:54:07] awight: thanks!
[08:54:10] jouncebot: nowandnext
[08:54:10] For the next 0 hour(s) and 5 minute(s): Sub-referencing pilot deployment (dewiki) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T0800)
[08:54:11] In 1 hour(s) and 5 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1000)
[08:54:32] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190191 (owner: Kosta Harlan)
[08:58:27] (PS1) Brouberol: Enroll dse-k8s-worker1014 [puppet] - https://gerrit.wikimedia.org/r/1190202 (https://phabricator.wikimedia.org/T405209)
[08:59:31] (PS2) Brouberol: Enroll dse-k8s-worker1014 [puppet] - https://gerrit.wikimedia.org/r/1190202 (https://phabricator.wikimedia.org/T405209)
[09:00:58] (CR) Muehlenhoff: [C:+2] Reset maps1011 [puppet] - https://gerrit.wikimedia.org/r/1189829 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[09:01:02] (CR) Brouberol: [V:+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7004/co" [puppet] - https://gerrit.wikimedia.org/r/1190202 (https://phabricator.wikimedia.org/T405209) (owner: Brouberol)
[09:01:10] !log klausman@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
[09:01:45] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:01:45] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:01:51] !log klausman@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
[09:04:42] (CR) Muehlenhoff: [C:+2] installserver: Select the custom nginx provider with no additional modules [puppet] - https://gerrit.wikimedia.org/r/1190109 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[09:05:03] (CR) Stevemunene: [C:+1] Enroll dse-k8s-worker1014 [puppet] - https://gerrit.wikimedia.org/r/1190202 (https://phabricator.wikimedia.org/T405209) (owner: Brouberol)
[09:05:22] !log bump space for prometheus k8s-dse in eqiad
[09:05:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:44] !log klausman@cumin2002 START - Cookbook sre.discovery.service-route check 2 services: maintenance
[09:05:45] !log klausman@cumin2002 END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
[09:06:34] !log klausman@cumin2002 START - Cookbook sre.discovery.service-route check 2 services: maintenance
[09:06:34] !log klausman@cumin2002 END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
[09:07:42] (PS2) Elukey: profile::amd_gpu: refactor the class to allow more use cases [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697)
[09:07:42] (PS9) Elukey: Enable ROCm 6.4.3 amd-smi on ml-serve{1012,1013} [puppet] - https://gerrit.wikimedia.org/r/1189816 (https://phabricator.wikimedia.org/T403697)
[09:08:34] (Merged) jenkins-bot: hCaptcha: Log error message from upstream [extensions/ConfirmEdit] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190191 (owner: Kosta Harlan)
[09:08:50] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1190191|hCaptcha: Log error message from upstream]]
[09:09:28] !log bwojtowicz@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:10:55] jouncebot: nowandnext
[09:10:55] No deployments scheduled for the next 0 hour(s) and 49 minute(s)
[09:10:55] In 0 hour(s) and 49 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1000)
[09:11:11] !log klausman@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:11:14] (PS1) Ladsgroup: User: Reduce locking severity of ::getInstanceForUpdate() [core] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190205
[09:11:19] (CR) Ladsgroup: [C:+2] User: Reduce locking severity of ::getInstanceForUpdate() [core] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190205 (owner: Ladsgroup)
[09:11:40] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54829 bytes in 5.940 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:11:40] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9235 bytes in 6.104 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:13:04] (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 5 DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[09:14:29] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1190191|hCaptcha: Log error message from upstream]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:15:59] (CR) Elukey: [V:+1 C:-1] "On paper this would be good, but in practice we include profile::amd_gpu in a lot of places where a gpus is not installed (like on all had" [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[09:16:17] !log prune now obsolete nginx packages from install* T329529
[09:16:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:22] T329529: Adapt profile::nginx to new packaging scheme introduced in Bookworm - https://phabricator.wikimedia.org/T329529
[09:18:27] !log klausman@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:18:28] !log kharlan@deploy1003 kharlan: Continuing with sync
[09:18:39] !log klausman@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:19:33] (PS1) Majavah: P:base::cloud_production: Remove kernel_messages script [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300)
[09:20:00] (CR) CI reject: [V:-1] P:base::cloud_production: Remove kernel_messages script [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300) (owner: Majavah)
[09:20:39] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7006/co" [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300) (owner: Majavah)
[09:21:18] (PS2) Majavah: P:base::cloud_production: Remove kernel_messages script [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300)
[09:22:15] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7007/co" [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300) (owner: Majavah)
[09:22:46] !log klausman@cumin2002 START - Cookbook sre.discovery.service-route depool inference in eqiad: maintenance
[09:23:43] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190191|hCaptcha: Log error message from upstream]] (duration: 14m 53s)
[09:24:06] SRE, Infrastructure-Foundations, Patch-For-Review: Adapt profile::nginx to new packaging scheme introduced in Bookworm - https://phabricator.wikimedia.org/T329529#11200807 (MoritzMuehlenhoff)
[09:24:07] (CR) Majavah: [C:+2] P:wmcs: root-keys: Update Taavi's keys to security keys [puppet] - https://gerrit.wikimedia.org/r/1189830 (owner: Majavah)
[09:24:55] SRE, Infrastructure-Foundations, Patch-For-Review: Adapt profile::nginx to new packaging scheme introduced in Bookworm - https://phabricator.wikimedia.org/T329529#11200810 (MoritzMuehlenhoff)
[09:25:03] (CR) Majavah: [C:+2] P:mariadb::cloudinfra: Drop buster support [puppet] - https://gerrit.wikimedia.org/r/1189476 (owner: Majavah)
[09:26:21] (CR) FNegri: [C:+1] "I initially thought we could remove the alert, but keep the prom metrics... but realistically I don't think the metrics are gonna be usefu" [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300) (owner: Majavah)
[09:26:42] (CR) Majavah: [V:+1 C:+2] P:base::cloud_production: Remove kernel_messages script [puppet] - https://gerrit.wikimedia.org/r/1190207 (https://phabricator.wikimedia.org/T404300) (owner: Majavah)
[09:27:04] (Merged) jenkins-bot: User: Reduce locking severity of ::getInstanceForUpdate() [core] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190205 (owner: Ladsgroup)
[09:27:35] !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host maps1011.eqiad.wmnet with OS bookworm
[09:27:49] SRE, SRE-Unowned, Maps, Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11200824 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host maps1011.eqiad.wmnet with OS bookworm
[09:27:51] !log klausman@cumin2002 END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool inference in eqiad: maintenance
[09:27:56] Hi! I want to run a maintenance script to add wikidata support for mswikiquote. Let me know if this is a bad time, otherwise i'll proceed
[09:28:32] (CR) Elukey: [V:+1 C:-1] "PCC SUCCESS (CORE_DIFF 5 DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[09:28:55] !log bwojtowicz@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:29:43] !log installing libfastjson security updates
[09:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:56] !log btullis@cumin1003 START - Cookbook sre.hosts.decommission for hosts an-backup-datanode[1032,1034-1046].eqiad.wmnet
[09:30:17] !log klausman@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:31:05] SRE, Patch-For-Review, Trust and Safety Product Sprint (Sprint Dadar Gulung (September 8 - September 26)), WE4.2 Bot detection (WE4.2 hCaptcha account creation trial): Investigate options for automatic fallback to FancyCAPTCHA - https://phabricator.wikimedia.org/T404204#11200831 (kostajh) My curr...
[09:31:12] sre-alert-triage, Infrastructure-Foundations: Alert in need of triage: Dell PowerEdge or Supermicro Broadcom RAID Controller (instance an-worker1187) - https://phabricator.wikimedia.org/T405217 (LSobanski) NEW
[09:31:57] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1190205|User: Reduce locking severity of ::getInstanceForUpdate()]]
[09:33:18] (CR) Brouberol: [V:+1 C:+2] Enroll dse-k8s-worker1014 [puppet] - https://gerrit.wikimedia.org/r/1190202 (https://phabricator.wikimedia.org/T405209) (owner: Brouberol)
[09:34:10] !log klausman@cumin2002 START - Cookbook sre.discovery.service-route pool inference in eqiad: maintenance
[09:36:29] !log joelyrookewmde@deploy1003 mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # T404704
[09:36:34] T404704: Add Wikidata support for mswikiquote - https://phabricator.wikimedia.org/T404704
[09:36:41] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:37:02] (CR) Filippo Giunchedi: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1189870 (https://phabricator.wikimedia.org/T405078) (owner: Majavah)
[09:37:59] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1190205|User: Reduce locking severity of ::getInstanceForUpdate()]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:38:44] !log ladsgroup@deploy1003 ladsgroup: Continuing with sync
[09:39:17] !log klausman@cumin2002 END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool inference in eqiad: maintenance
[09:39:33] btullis@cumin1003 decommission (PID 3649024) is awaiting input
[09:39:57] (CR) Majavah: [C:+2] P:toolforge::k8s::haproxy: Handle API gateway external access [puppet] - https://gerrit.wikimedia.org/r/1189841 (https://phabricator.wikimedia.org/T283948) (owner: Majavah)
[09:40:20] (PS3) Elukey: profile::amd_gpu: refactor the class to allow more use cases [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697)
[09:40:20] (PS10) Elukey: Enable ROCm 6.4.3 amd-smi on ml-serve{1012,1013} [puppet] - https://gerrit.wikimedia.org/r/1189816 (https://phabricator.wikimedia.org/T403697)
[09:43:57] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190205|User: Reduce locking severity of ::getInstanceForUpdate()]] (duration: 11m 59s)
[09:44:12] (CR) Elukey: [V:+1] "PCC SUCCESS (DIFF 7 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[09:44:21] (CR) Majavah: [C:+2] P:toolforge::k8s::haproxy: Allow passing list of IPs for VIPs [puppet] - https://gerrit.wikimedia.org/r/1189870 (https://phabricator.wikimedia.org/T405078) (owner: Majavah)
[09:45:31] !log installing imagemagick security updates
[09:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:30] (CR) Btullis: [C:+1] Enroll dse-k8s-worker1014 [puppet] - https://gerrit.wikimedia.org/r/1190202 (https://phabricator.wikimedia.org/T405209) (owner: Brouberol)
[09:46:36] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on maps1011.eqiad.wmnet with reason: host reimage
[09:50:05] (CR) Elukey: [V:+1] "Hi! I think that this change would go really nicely with a revert of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1184533, so we'l" [puppet] - https://gerrit.wikimedia.org/r/1190201 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[09:52:12] (CR) Joal: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1187896 (https://phabricator.wikimedia.org/T404473) (owner: SD0001)
[09:53:36] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1011.eqiad.wmnet with reason: host reimage
[09:54:28] sre-alert-triage, Data-Platform-SRE: Alert in need of triage: Dell PowerEdge or Supermicro Broadcom RAID Controller (instance an-worker1187) - https://phabricator.wikimedia.org/T405217#11200877 (elukey)
[09:56:44] !log installing modsecurity-apache security updates
[09:56:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:41] !log Finished populateSitesTable for mswikiquote (T404704)
[09:59:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:45] T404704: Add Wikidata support for mswikiquote - https://phabricator.wikimedia.org/T404704
[10:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1000)
[10:08:44] !log installing qemu security updates
[10:08:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:26] (PS3) Dreamy Jazz: CheckUser: Enable SI on enwiki and frwiki while hiding special page [mediawiki-config] - https://gerrit.wikimedia.org/r/1189890 (https://phabricator.wikimedia.org/T405109)
[10:09:31] (CR) Kosta Harlan: SI: Add a configuration flag to hide SI even if the feature is enabled (1 comment) [extensions/CheckUser] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189869 (https://phabricator.wikimedia.org/T405076) (owner: Dreamy Jazz)
[10:09:55] (CR) Dreamy Jazz: CheckUser: Enable SI on enwiki and frwiki while hiding special page (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1189890 (https://phabricator.wikimedia.org/T405109) (owner: Dreamy Jazz)
[10:11:30] (PS4) Dreamy Jazz: CheckUser: Enable SI on enwiki and frwiki while hiding special page [mediawiki-config] - https://gerrit.wikimedia.org/r/1189890 (https://phabricator.wikimedia.org/T405109)
[10:12:54] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps1011.eqiad.wmnet with OS bookworm
[10:13:08] SRE, SRE-Unowned, Maps, Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11200908 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host maps1011.eqiad.wmnet with OS bookworm completed: - maps1011 (**PASS**) - Downt...
[10:16:37] btullis@cumin1003 decommission (PID 3649024) is awaiting input
[10:19:48] SRE-SLO: Pyrra calculations for the Initial error budget value of calendar windows - https://phabricator.wikimedia.org/T403729#11200918 (elukey) After a long research I think I found a good reply for the use case highlighted above. In my head every SLO window should theoretically start from 100% of remainin...
[10:21:02] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate kibana.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[10:24:12] (PS1) Muehlenhoff: Disable replication timers in eqiad/bookworm [puppet] - https://gerrit.wikimedia.org/r/1190226 (https://phabricator.wikimedia.org/T381565)
[10:24:14] (PS1) Muehlenhoff: Reapply master_bookworm role to maps1011 [puppet] - https://gerrit.wikimedia.org/r/1190227 (https://phabricator.wikimedia.org/T381565)
[10:24:46] SRE-SLO, EditCheck, Lift-Wing, Machine-Learning-Team, Editing-team (Tracking): Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#11200929 (elukey) To keep archives happy, I added a more detailed explanation of the current limits that Pyrra sho...
[10:29:09] !log temporary updating haproxykafka version on cp5021 to check for non-parsable character (T404427)
[10:29:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:12] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:30:38] SRE, Wikimedia-Mailing-lists: Request for a mailing list for Moore Wikimedians - https://phabricator.wikimedia.org/T405164#11200935 (Ladsgroup) It would have been great if we had some level of scrutiny before creating mailing lists but that's for future. I think `moore-wikimedians@lists.wikimedia.org` is...
[10:32:49] !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[10:36:43] !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode[1032,1034-1046].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
[10:38:45] !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode[1032,1034-1046].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
[10:38:45] !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:38:47] !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-backup-datanode[1032,1034-1046].eqiad.wmnet
[10:42:29] (CR) Btullis: [C:+2] Remove the manifests for the absented product_analytics jobs [puppet] - https://gerrit.wikimedia.org/r/1189204 (https://phabricator.wikimedia.org/T404639) (owner: Btullis)
[10:43:27] !log fceratto@cumin1002 START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
[10:48:20] !log fceratto@cumin1002 END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
[10:48:37] !log fceratto@cumin1002 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
[10:51:45] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[10:51:45] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[10:53:42] fceratto@cumin1002 sanitize-wiki (PID 1744898) is awaiting input
[10:55:46] ops-eqiad, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26): Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11200987 (BTullis) a:BTullis→None
[11:01:32] jouncebot: nowandnext
[11:01:32] No deployments scheduled for the next 1 hour(s) and 58 minute(s)
[11:01:32] In 1 hour(s) and 58 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1300)
[11:01:35] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54829 bytes in 1.095 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[11:01:35] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9235 bytes in 1.224 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[11:01:48] (CR) Zabe: [C:+2] Attach thwikimedia to SUL [mediawiki-config] - https://gerrit.wikimedia.org/r/1189981 (https://phabricator.wikimedia.org/T400001) (owner: Zabe)
[11:01:52] (CR) Zabe: [C:+2] Set timezone and project namespace for thwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1189985 (https://phabricator.wikimedia.org/T400001) (owner: Zabe)
[11:02:56] (Merged) jenkins-bot: Attach thwikimedia to SUL [mediawiki-config] - https://gerrit.wikimedia.org/r/1189981 (https://phabricator.wikimedia.org/T400001) (owner: Zabe)
[11:02:58] (Merged) jenkins-bot: Set timezone and project namespace for thwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1189985 (https://phabricator.wikimedia.org/T400001) (owner: Zabe)
[11:03:17] !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1189981|Attach thwikimedia to SUL (T400001)]], [[gerrit:1189985|Set timezone and project namespace for thwikimedia (T400001)]]
[11:03:22] T400001: Create a Wiki for Wikimedia Thailand - https://phabricator.wikimedia.org/T400001
[11:03:56] (PS1) Btullis: Remove old code for decommissioned hadoop workers [puppet] - https://gerrit.wikimedia.org/r/1190235 (https://phabricator.wikimedia.org/T404970)
[11:04:50] (PS2) Btullis: Remove old code for decommissioned hadoop workers [puppet] - https://gerrit.wikimedia.org/r/1190235 (https://phabricator.wikimedia.org/T404970)
[11:06:52] fceratto@cumin1002 sanitize-wiki (PID 1744898) is awaiting input
[11:08:48] !log zabe@deploy1003 zabe: Backport for [[gerrit:1189981|Attach thwikimedia to SUL (T400001)]], [[gerrit:1189985|Set timezone and project namespace for thwikimedia (T400001)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[11:08:53] T400001: Create a Wiki for Wikimedia Thailand - https://phabricator.wikimedia.org/T400001
[11:09:56] !log zabe@deploy1003 zabe: Continuing with sync
[11:12:25] ops-eqiad, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26), Patch-For-Review: Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11201028 (BTullis) >>! In T404970#11194862, @BTullis wrote: > Ple...
[11:15:13] !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1189981|Attach thwikimedia to SUL (T400001)]], [[gerrit:1189985|Set timezone and project namespace for thwikimedia (T400001)]] (duration: 11m 55s)
[11:15:20] T400001: Create a Wiki for Wikimedia Thailand - https://phabricator.wikimedia.org/T400001
[11:18:03] (CR) Clément Goubert: [C:+1] api-gateway: Remove .tpl extension from yaml files [deployment-charts] - https://gerrit.wikimedia.org/r/1189440 (owner: Daniel Kinzler)
[11:18:16] ops-eqiad, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26), Patch-For-Review: Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11201043 (BTullis) I have cleaned up the kerberos principals and...
[11:18:45] (CR) Muehlenhoff: [C:+2] Add my second FIDO token [puppet] - https://gerrit.wikimedia.org/r/1189490 (owner: Muehlenhoff)
[11:19:36] !log zabe@deploy1003 mwscript-k8s job started: extensions/CentralAuth/maintenance/attachAccount.php --wiki=thwikimedia --userlist users.txt
[11:20:07] ops-eqiad, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26), Patch-For-Review: Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11201045 (Jclark-ctr) a:Jclark-ctr
[11:20:09] (CR) Cathal Mooney: [C:+2] Nokia: EBGP configuration base build [homer/public] - https://gerrit.wikimedia.org/r/1187092 (https://phabricator.wikimedia.org/T402577) (owner: Cathal Mooney)
[11:22:01] (Merged) jenkins-bot: Nokia: EBGP configuration base build [homer/public] - https://gerrit.wikimedia.org/r/1187092 (https://phabricator.wikimedia.org/T402577) (owner: Cathal Mooney)
[11:24:18] (CR) Muehlenhoff: [C:+1] "LGTM" [software/bitu] - https://gerrit.wikimedia.org/r/1190198 (https://phabricator.wikimedia.org/T404430) (owner: Slyngshede)
[11:27:42] SRE-swift-storage, Thumbor: Gradually drop all thumbnails as a one-off clean up - https://phabricator.wikimedia.org/T379942#11201059 (Ladsgroup)
[11:40:33] ops-eqiad, Data-Platform-SRE, DC-Ops: hw troubleshooting: power up failure for an-worker1132.eqiad.wmnet - https://phabricator.wikimedia.org/T405221 (BTullis) NEW
[11:41:05] SRE, CXServer, envoy, Language and Product Localization, Patch-For-Review: Allow proxy server to accept another valid http header instead of 'HOST' - https://phabricator.wikimedia.org/T404291#11201131 (akosiaris) I 've already replied in T394982#11201112, but I find it improbable that #SRE wi...
[11:41:11] !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
[11:41:40] !log upgrading Envoy on puppetboard hosts T403663
[11:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:45] T403663: Upgrade Envoy to v1.29.12 - https://phabricator.wikimedia.org/T403663
[11:42:28] (PS2) Tchanders: Enable temporary accounts on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1190188 (https://phabricator.wikimedia.org/T405195)
[11:44:01] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[11:45:16] (CR) JMeybohm: [C:+2] haproxy ipblocks-all: Filter disabled ipblocks [puppet] - https://gerrit.wikimedia.org/r/1188300 (https://phabricator.wikimedia.org/T402014) (owner: JMeybohm)
[11:47:15] (CR) Brouberol: [C:+1] Remove old code for decommissioned hadoop workers [puppet] - https://gerrit.wikimedia.org/r/1190235 (https://phabricator.wikimedia.org/T404970) (owner: Btullis)
[11:49:52] (PS1) JMeybohm: Revert "haproxy ipblocks-all: Filter disabled ipblocks" [puppet] - https://gerrit.wikimedia.org/r/1190256
[11:50:52] !log fceratto@cumin1002 START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
[11:52:38] (CR) CI reject: [V:-1] Revert "haproxy ipblocks-all: Filter disabled ipblocks" [puppet] - https://gerrit.wikimedia.org/r/1190256 (owner: JMeybohm)
[11:53:20] SRE, Wikimedia-Mailing-lists: Request for a mailing list for Moore Wikimedians - https://phabricator.wikimedia.org/T405164#11201156 (Hasslaebetch) >>! In T405164#11200935, @Ladsgroup wrote: > It would have been great if we had some level of scrutiny before creating mailing lists but that's for future. I...
[11:54:18] (PS2) JMeybohm: Revert "haproxy ipblocks-all: Filter disabled ipblocks" [puppet] - https://gerrit.wikimedia.org/r/1190256
[11:55:08] fceratto@cumin1002 sanitize-wiki (PID 1857879) is awaiting input
[11:57:38] (CR) JMeybohm: [C:+2] Revert "haproxy ipblocks-all: Filter disabled ipblocks" [puppet] - https://gerrit.wikimedia.org/r/1190256 (owner: JMeybohm)
[11:58:05] (PS2) Majavah: openstack: Drop obsolete linuxbridge config files [puppet] - https://gerrit.wikimedia.org/r/1189393
[11:58:06] (PS2) Majavah: P:openstack: nova: Drop obsolete settings [puppet] - https://gerrit.wikimedia.org/r/1189394
[11:58:51] (CR) Majavah: [C:+2] openstack: Drop obsolete linuxbridge config files [puppet] - https://gerrit.wikimedia.org/r/1189393 (owner: Majavah)
[11:59:26] (CR) STran: [C:+1] Enable temporary accounts on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1190188 (https://phabricator.wikimedia.org/T405195) (owner: Tchanders)
[12:00:31] (PS1) Slyngshede: data.yaml: add FIDO enabled SSH key [puppet] - https://gerrit.wikimedia.org/r/1190258
[12:03:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.938s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:03:54] (CR) Cathal Mooney: data.yaml: add FIDO enabled SSH key (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1190258 (owner: Slyngshede)
[12:07:42] SRE, Wikimedia-Mailing-lists: Excessive Spam on wikipedia-bn@lists.wikimedia.org Mailing List - https://phabricator.wikimedia.org/T388958#11201174 (Aklapper) Open→Invalid In that case @Aftabuzzaman would be the person to change the mailing list settings. Nothing that SRE can do here, apart fr...
[12:08:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 2.185s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:11:25] (CR) Klausman: [C:+2] Enable ROCm 6.4.3 amd-smi on ml-serve{1012,1013} [puppet] - https://gerrit.wikimedia.org/r/1189816 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[12:12:02] (CR) Muehlenhoff: data.yaml: add FIDO enabled SSH key (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1190258 (owner: Slyngshede)
[12:15:09] (CR) Cathal Mooney: data.yaml: add FIDO enabled SSH key (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1190258 (owner: Slyngshede)
[12:15:29] jouncebot: nowandnext
[12:15:29] No deployments scheduled for the next 0 hour(s) and 44 minute(s)
[12:15:29] In 0 hour(s) and 44 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1300)
[12:15:40] I'll start my deployments early as the calendar is busy
[12:16:04] (CR) Dreamy Jazz: [C:+2] SI: Add a configuration flag to hide SI even if the feature is enabled [extensions/CheckUser] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189869 (https://phabricator.wikimedia.org/T405076) (owner: Dreamy Jazz)
[12:16:08] (CR) Dreamy Jazz: [C:+2] CheckUser: Enable SI on enwiki and frwiki while hiding special page [mediawiki-config] - https://gerrit.wikimedia.org/r/1189890 (https://phabricator.wikimedia.org/T405109) (owner: Dreamy Jazz)
[12:17:37] !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
[12:17:42] (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/CheckUser] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189869 (https://phabricator.wikimedia.org/T405076) (owner: Dreamy Jazz)
[12:17:42] (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1189890 (https://phabricator.wikimedia.org/T405109) (owner: Dreamy Jazz)
[12:19:42] (CR) Slyngshede: data.yaml: add FIDO enabled SSH key (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1190258 (owner: Slyngshede)
[12:28:12] (Merged) jenkins-bot: SI: Add a configuration flag to hide SI even if the feature is enabled [extensions/CheckUser] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189869 (https://phabricator.wikimedia.org/T405076) (owner: Dreamy Jazz)
[12:28:17] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
[12:34:21] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
[12:38:14] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
[12:40:11] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1189869|SI: Add a configuration flag to hide SI even if the feature is enabled (T405076)]], [[gerrit:1189890|CheckUser: Enable SI on enwiki and frwiki while hiding special page (T405109)]]
[12:40:17] T405076: Suggested investigations: Add a configuration setting to disable the special page when the feature is otherwise enabled - https://phabricator.wikimedia.org/T405076
[12:40:17] T405109: Suggested investigations: Enable case creation on French and English Wikipedias - https://phabricator.wikimedia.org/T405109
[12:40:27] (CR) Muehlenhoff: [C:+1] "Key verified out-of-band as well" [puppet] - https://gerrit.wikimedia.org/r/1190258 (owner: Slyngshede)
[12:43:10] (CR) Slyngshede: [C:+2] data.yaml: add FIDO enabled SSH key [puppet] - https://gerrit.wikimedia.org/r/1190258 (owner: Slyngshede)
[12:44:01] FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[12:44:22] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
[12:46:16] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1189869|SI: Add a configuration flag to hide SI even if the feature is enabled (T405076)]], [[gerrit:1189890|CheckUser: Enable SI on enwiki and frwiki while hiding special page (T405109)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[12:46:23] T405076: Suggested investigations: Add a configuration setting to disable the special page when the feature is otherwise enabled - https://phabricator.wikimedia.org/T405076
[12:46:24] T405109: Suggested investigations: Enable case creation on French and English Wikipedias - https://phabricator.wikimedia.org/T405109
[12:48:11] (PS1) JMeybohm: haproxy ipblocks-all: Filter disabled ipblocks [puppet] - https://gerrit.wikimedia.org/r/1190274 (https://phabricator.wikimedia.org/T402014)
[12:48:11] SRE, Commons, MediaWiki-Uploading, Traffic: HTTP 503 error when uploading images on Wikimedia Commons - https://phabricator.wikimedia.org/T383274#11201286 (Aklapper) @Underbar_dk: two months later, would you know if this issue is continuing?
[12:49:41] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 22 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1188799 (https://phabricator.wikimedia.org/T327063) (owner: Sbisson)
[12:50:32] (CR) JMeybohm: "Second try...I'm not sure why my local test had not shown the error I saw on cp* after merging (`: error callin" [puppet] - https://gerrit.wikimedia.org/r/1190274 (https://phabricator.wikimedia.org/T402014) (owner: JMeybohm)
[12:53:06] SRE, Commons, MediaWiki-Uploading, Traffic: HTTP 503 error when uploading images on Wikimedia Commons - https://phabricator.wikimedia.org/T383274#11201294 (Underbar_dk) I still have not had the opportunity to upload images to Commons, though I sometimes have trouble making edits to large articles...
[12:56:00] (PS1) DCausse: cirrus-streaming-updater: drop custom java17 configs [deployment-charts] - https://gerrit.wikimedia.org/r/1190275
[12:56:06] !log brouberol@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[12:56:40] !log brouberol@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[12:57:23] Still testing....
[12:58:39] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with sync
[12:58:49] (CR) Elukey: [C:+1] Reapply master_bookworm role to maps1011 [puppet] - https://gerrit.wikimedia.org/r/1190227 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[12:59:05] (CR) Elukey: [C:+1] Disable replication timers in eqiad/bookworm [puppet] - https://gerrit.wikimedia.org/r/1190226 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[12:59:05] (PS1) Btullis: Update the scaffolding helpers for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190277 (https://phabricator.wikimedia.org/T381416)
[12:59:07] (PS1) Btullis: Fix the networkpolicy allowing the k8s api to call the spark-operator webhook [deployment-charts] - https://gerrit.wikimedia.org/r/1190278 (https://phabricator.wikimedia.org/T381416)
[12:59:10] (PS1) Btullis: Add the required securitycontext to pods created by the spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190279 (https://phabricator.wikimedia.org/T381416)
[13:00:04] Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1300).
[13:00:05] Dreamy Jazz, Superpes, and edsanders: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:09] o/
[13:00:44] \o
[13:01:09] (CR) CI reject: [V:-1] Add the required securitycontext to pods created by the spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190279 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:01:32] (CR) Btullis: [C:+2] Remove old code for decommissioned hadoop workers [puppet] - https://gerrit.wikimedia.org/r/1190235 (https://phabricator.wikimedia.org/T404970) (owner: Btullis)
[13:01:40] I can deploy my own (three maint script patches together) once Dream_Jazz has finished
[13:03:12] edsanders And then could you also deploy my patches (together)? :P
[13:04:09] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1189869|SI: Add a configuration flag to hide SI even if the feature is enabled (T405076)]], [[gerrit:1189890|CheckUser: Enable SI on enwiki and frwiki while hiding special page (T405109)]] (duration: 23m 58s)
[13:04:16] (CR) Muehlenhoff: [C:+2] Disable replication timers in eqiad/bookworm [puppet] - https://gerrit.wikimedia.org/r/1190226 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[13:04:16] T405076: Suggested investigations: Add a configuration setting to disable the special page when the feature is otherwise enabled - https://phabricator.wikimedia.org/T405076
[13:04:17] T405109: Suggested investigations: Enable case creation on French and English Wikipedias - https://phabricator.wikimedia.org/T405109
[13:04:23] Superpes: ok
[13:04:42] !log brouberol@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[13:04:59] !log brouberol@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[13:06:13] (CR) Brouberol: [C:+1] cirrus-streaming-updater: drop custom java17 configs [deployment-charts] - https://gerrit.wikimedia.org/r/1190275 (owner: DCausse)
[13:06:22] (CR) DCausse: [C:+2] cirrus-streaming-updater: drop custom java17 configs [deployment-charts] - https://gerrit.wikimedia.org/r/1190275 (owner: DCausse)
[13:06:32] Superpes: Ok, I'll start now
[13:07:06] Thanks :) I'll wait!
[13:08:04] (CR) TrainBranchBot: [C:+2] "Approved by esanders@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1189930 (https://phabricator.wikimedia.org/T405069) (owner: Superpes15)
[13:08:04] (CR) TrainBranchBot: [C:+2] "Approved by esanders@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190000 (https://phabricator.wikimedia.org/T405147) (owner: Superpes15)
[13:08:05] (CR) TrainBranchBot: [C:+2] "Approved by esanders@deploy1003 using scap backport" [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189856 (owner: Jforrester)
[13:08:07] (CR) TrainBranchBot: [C:+2] "Approved by esanders@deploy1003 using scap backport" [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189857 (owner: Jforrester)
[13:08:14] (CR) TrainBranchBot: [C:+2] "Approved by esanders@deploy1003 using scap backport" [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189858 (owner: Jforrester)
[13:08:40] (Merged) jenkins-bot: cirrus-streaming-updater: drop custom java17 configs [deployment-charts] - https://gerrit.wikimedia.org/r/1190275 (owner: DCausse)
[13:09:07] (Merged) jenkins-bot: [enwiki] Throttle exemption for training events [mediawiki-config] - https://gerrit.wikimedia.org/r/1189930 (https://phabricator.wikimedia.org/T405069) (owner: Superpes15)
[13:09:10] (Merged) jenkins-bot: [gewikimedia] Update logo and wordmark and change sitename [mediawiki-config] - https://gerrit.wikimedia.org/r/1190000 (https://phabricator.wikimedia.org/T405147) (owner: Superpes15)
[13:09:55] !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[13:10:08] !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:11:38] (PS1) Phuedx: Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190280
[13:12:16] (CR) Muehlenhoff: [C:+2] Reapply master_bookworm role to maps1011 [puppet] - https://gerrit.wikimedia.org/r/1190227 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[13:14:19] SRE, Patch-For-Review, Trust and Safety Product Sprint (Sprint Dadar Gulung (September 8 - September 26)), WE4.2 Bot detection (WE4.2 hCaptcha account creation trial): Investigate options for automatic fallback to FancyCAPTCHA - https://phabricator.wikimedia.org/T404204#11201395 (kostajh) p:Tr...
[13:16:51] (Merged) jenkins-bot: LQT->Flow converter: Add a dryRun flag [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189856 (owner: Jforrester)
[13:16:52] (Merged) jenkins-bot: LQT->Flow converter: Add flag to ignore $wgFlowReadOnly [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189857 (owner: Jforrester)
[13:16:54] (Merged) jenkins-bot: LQT->Flow converter: Skip pages which have no threads [extensions/Flow] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1189858 (owner: Jforrester)
[13:17:13] !log esanders@deploy1003 Started scap sync-world: Backport for [[gerrit:1189930|[enwiki] Throttle exemption for training events (T405069)]], [[gerrit:1190000|[gewikimedia] Update logo and wordmark and change sitename (T405147)]], [[gerrit:1189856|LQT->Flow converter: Add a dryRun flag]], [[gerrit:1189857|LQT->Flow converter: Add flag to ignore $wgFlowReadOnly]], [[gerrit:1189858|LQT->Flow converter: Skip pages which have
[13:17:13] no threads]]
[13:17:20] T405069: IP lift for two training events at Nottingham Central Library (UK) ON 24 Sept & 18 Oct - https://phabricator.wikimedia.org/T405069
[13:17:21] T405147: gewikimedia: update wordmark and title - https://phabricator.wikimedia.org/T405147
[13:21:20] !log esanders@deploy1003 esanders, superpes, jforrester: Backport for [[gerrit:1189930|[enwiki] Throttle exemption for training events (T405069)]], [[gerrit:1190000|[gewikimedia] Update logo and wordmark and change sitename (T405147)]], [[gerrit:1189856|LQT->Flow converter: Add a dryRun flag]], [[gerrit:1189857|LQT->Flow converter: Add flag to ignore $wgFlowReadOnly]], [[gerrit:1189858|LQT->Flow converter: Skip pages whic
[13:21:20] h have no threads]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:21:51] LGTM :)
[13:21:52] o/
[13:21:52] (Abandoned) Btullis: Add the required securitycontext to pods created by the spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190279 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:22:20] (PS1) DDesouza: Deploy Newcomers survey on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1190281 (https://phabricator.wikimedia.org/T402915)
[13:22:28] Superpes: ok - proceeding
[13:22:44] !log esanders@deploy1003 esanders, superpes, jforrester: Continuing with sync
[13:23:44] !log trigger full planet import for maps eqiad/bookworm T381565
[13:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:48] T381565: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565
[13:24:07] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 22 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190281 (https://phabricator.wikimedia.org/T402915) (owner: DDesouza)
[13:26:44] Does resetAuthenticationThrottle.php work with ranges? I don't think so :/
[13:27:57] !log esanders@deploy1003 Finished scap sync-world: Backport for [[gerrit:1189930|[enwiki] Throttle exemption for training events (T405069)]], [[gerrit:1190000|[gewikimedia] Update logo and wordmark and change sitename (T405147)]], [[gerrit:1189856|LQT->Flow converter: Add a dryRun flag]], [[gerrit:1189857|LQT->Flow converter: Add flag to ignore $wgFlowReadOnly]], [[gerrit:1189858|LQT->Flow converter: Skip pages which have
[13:27:57] no threads]] (duration: 10m 44s)
[13:28:03] T405069: IP lift for two training events at Nottingham Central Library (UK) ON 24 Sept & 18 Oct - https://phabricator.wikimedia.org/T405069
[13:28:04] T405147: gewikimedia: update wordmark and title - https://phabricator.wikimedia.org/T405147
[13:28:55] (CR) Bking: [C:+2] opensearch-operator: allow egress traffic from the operator to the k8s api [deployment-charts] - https://gerrit.wikimedia.org/r/1189817 (https://phabricator.wikimedia.org/T404906) (owner: Brouberol)
[13:28:58] !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[13:29:04] !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:30:28] !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[13:30:44] !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:34:44] (PS6) Federico Ceratto: clone_es.py: clone readonly es* hosts [cookbooks] - https://gerrit.wikimedia.org/r/1183646
[13:36:30] (CR) Stevemunene: [C:+2] Register ingress CNAME record for the echoserver-dse-k8s-codfw service [dns] - https://gerrit.wikimedia.org/r/1189795 (https://phabricator.wikimedia.org/T404433) (owner: Stevemunene)
[13:36:41] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:37:22] !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[13:37:25] edsanders: Is that all of them? I've got a config change that I'd like to deploy
[13:37:29] !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[13:37:40] (PS2) Btullis: Update the scaffolding helper module for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190277 (https://phabricator.wikimedia.org/T381416)
[13:37:40] (PS2) Btullis: Fix the networkpolicy allowing the k8s api to call the spark-operator webhook [deployment-charts] - https://gerrit.wikimedia.org/r/1190278 (https://phabricator.wikimedia.org/T381416)
[13:37:40] (PS1) Btullis: Enable external-services for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190282 (https://phabricator.wikimedia.org/T381416)
[13:37:41] (PS1) Btullis: Rename the release from spark-operator to production [deployment-charts] - https://gerrit.wikimedia.org/r/1190283 (https://phabricator.wikimedia.org/T381416)
[13:37:46] phuedx: yes
[13:37:59] !log stevemunene@dns1004 START - running authdns-update
[13:38:18] Thanks for your assistance edsanders :) Could you please run the scripts here? https://pastebin.com/WzyrRrxX
[13:38:51] Unfortunately it can't be done for the whole range :/
[13:39:25] !log stevemunene@dns1004 END - running authdns-update
[13:41:12] (CR) TrainBranchBot: [C:+2] "Approved by phuedx@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190280 (owner: Phuedx)
[13:42:06] (Merged) jenkins-bot: Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190280 (owner: Phuedx)
[13:42:20] !log phuedx@deploy1003 Started scap sync-world: Backport for [[gerrit:1190280|Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis"]]
[13:45:03] (PS1) Elukey: Decrease the Pyrra window to 4w for AW's SLO [puppet] - https://gerrit.wikimedia.org/r/1190284
[13:45:18] SRE, Fundraising-Backlog, Fundraising-Tech-Roadmap, MediaWiki-extensions-CentralNotice, Traffic: Set expiry time for GeoIP cookies - https://phabricator.wikimedia.org/T122097#11201498 (ssingh) >>! In T122097#11185306, @AKanji-WMF wrote: > @XenoRyet and I discussed getting this into our next S...
[13:46:19] (CR) JMeybohm: [C:-1] "Fine by me if we don't leak credentials into regular CI builds. One comment inline" [puppet] - https://gerrit.wikimedia.org/r/1165477 (https://phabricator.wikimedia.org/T404437) (owner: Elukey)
[13:48:23] !log phuedx@deploy1003 phuedx: Backport for [[gerrit:1190280|Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:48:23] (PS1) Btullis: Add kubernetes tokens for an analytics-test namespace on dse-k8s [puppet] - https://gerrit.wikimedia.org/r/1190285 (https://phabricator.wikimedia.org/T381416)
[13:49:14] (PS12) Bking: opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246)
[13:49:23] !log phuedx@deploy1003 phuedx: Continuing with sync
[13:49:45] (PS4) Elukey: docker_registry::web: allow prod-build to pull restricted images [puppet] - https://gerrit.wikimedia.org/r/1165477 (https://phabricator.wikimedia.org/T404437)
[13:49:47] (CR) Elukey: docker_registry::web: allow prod-build to pull restricted images (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1165477 (https://phabricator.wikimedia.org/T404437) (owner: Elukey)
[13:50:41] (CR) CI reject: [V:-1] opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246) (owner: Bking)
[13:52:55] (CR) Brouberol: [C:+1] Update the scaffolding helper module for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190277 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:53:08] (CR) Brouberol: [C:+1] Fix the networkpolicy allowing the k8s api to call the spark-operator webhook [deployment-charts] - https://gerrit.wikimedia.org/r/1190278 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:53:19] (CR) Brouberol: [C:+1] Enable external-services for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190282 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:53:27] (CR) Brouberol: [C:+1] Rename the release from spark-operator to production [deployment-charts] - https://gerrit.wikimedia.org/r/1190283 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:54:46] !log phuedx@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190280|Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis"]] (duration: 12m 25s)
[13:57:29] (CR) Btullis: [C:+2] Rename the release from spark-operator to production [deployment-charts] - https://gerrit.wikimedia.org/r/1190283 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:57:37] (CR) Btullis: [C:+2] Enable external-services for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190282 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:57:39] (CR) Btullis: [C:+2] Fix the networkpolicy allowing the k8s api to call the spark-operator webhook [deployment-charts] - https://gerrit.wikimedia.org/r/1190278 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:57:42] (CR) Btullis: [C:+2] Update the scaffolding helper module for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190277 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[13:58:31] !log delete list: sectrainings@lists.wikimedia.org [no archives, project obsolete since 2022]
[13:58:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:23] SRE-SLO, observability, Data-Platform-SRE (2025.09.05 - 2025.09.26), Essential-Work: Update WDQS SLO lag queries to reflect graph split changes - https://phabricator.wikimedia.org/T393966#11201577 (elukey) @Gehel @RKemper Hi! A while ago I had a chat with Ryan to figure out how to improve the cur...
[14:03:22] (CR) Brouberol: [C:+2] flink-operator: allow upstream config to be applied [deployment-charts] - https://gerrit.wikimedia.org/r/1189823 (https://phabricator.wikimedia.org/T404340) (owner: DCausse)
[14:04:15] (PS2) Btullis: Add kubernetes tokens for an analytics-test namespace on dse-k8s [puppet] - https://gerrit.wikimedia.org/r/1190285 (https://phabricator.wikimedia.org/T381416)
[14:04:43] (CR) Ssingh: [C:+1] P:hcaptcha: add keepalive_timeout [puppet] - https://gerrit.wikimedia.org/r/1187828 (https://phabricator.wikimedia.org/T403416) (owner: Effie Mouzeli)
[14:06:16] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1190288
[14:06:44] (Merged) jenkins-bot: Update the scaffolding helper module for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190277 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[14:06:45] (Merged) jenkins-bot: Fix the networkpolicy allowing the k8s api to call the spark-operator webhook [deployment-charts] - https://gerrit.wikimedia.org/r/1190278 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[14:06:47] (Merged) jenkins-bot: Enable external-services for spark-operator [deployment-charts] - https://gerrit.wikimedia.org/r/1190282 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[14:06:48] (Merged) jenkins-bot: Rename the release from spark-operator to production [deployment-charts] - https://gerrit.wikimedia.org/r/1190283 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[14:07:00] (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7010/co" [puppet] - https://gerrit.wikimedia.org/r/1190285 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[14:07:52] !log brouberol@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[14:08:51] !log brouberol@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[14:09:50] !log brouberol@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[14:10:02] !log brouberol@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[14:11:56] !log brouberol@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[14:12:47] !log brouberol@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[14:18:11] (PS2) Federico Ceratto: instances.yaml: add es2050 to dbctl [puppet] - https://gerrit.wikimedia.org/r/1189148 (https://phabricator.wikimedia.org/T402859)
[14:18:11] (PS1) Federico Ceratto: es2050.yaml: enable notifications [puppet] - https://gerrit.wikimedia.org/r/1190289 (https://phabricator.wikimedia.org/T402859)
[14:20:02] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[14:20:34] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[14:21:02] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate kibana.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[14:21:52] !log brouberol@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[14:21:59] (CR) CDanis: [C:+2] docker_registry: limit layer blobs to 4.5GB [puppet] - https://gerrit.wikimedia.org/r/1189886 (https://phabricator.wikimedia.org/T404742) (owner: CDanis)
[14:22:40] !log brouberol@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[14:24:11] (CR) Ssingh: [C:+1] "Thanks for the detailed commit message to understand what this means and +1 for a careful rollout." [puppet] - https://gerrit.wikimedia.org/r/1189132 (https://phabricator.wikimedia.org/T401396) (owner: Hnowlan)
[14:29:12] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:30:05] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1430)
[14:31:02] SRE-Access-Requests, Data-Engineering, Data-Platform-SRE: Requesting Kerberos access for sd - https://phabricator.wikimedia.org/T405219#11201701 (Ottomata) Approved.
[14:31:51] !log dcausse@deploy1003 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[14:31:52] !log dcausse@deploy1003 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:32:01] !log dcausse@deploy1003 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[14:32:11] !log dcausse@deploy1003 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:38:31] (PS1) Giuseppe Lavagetto: Create busted image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1190290
[14:41:22] SRE-OnFire, cloud-services-team, Toolforge, Sustainability (Incident Followup): [k8s,infra,o11y] Add paging alert when many tools are unreachable - https://phabricator.wikimedia.org/T399870#11201754 (taavi) a:taavi
[14:44:55] (CR) CDanis: [C:+2] admin: analytics-admins: allow sudo puppet-run [puppet] - https://gerrit.wikimedia.org/r/1188408 (https://phabricator.wikimedia.org/T404630) (owner: CDanis)
[14:56:10] (PS12) Superpes15: Initial configuration for arbcom_plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1185058 (https://phabricator.wikimedia.org/T391009)
[14:56:52] sre-alert-triage, Infrastructure-Foundations, netops: Alert in need of triage: SwitchCoreInterfaceDown (instance ssw1-f1-codfw:9804) - https://phabricator.wikimedia.org/T404946#11201792 (cmooney) Open→Resolved a:cmooney All back up now.
[15:01:51] ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: Move lvs1020 link from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T404959#11201828 (ssingh) Thanks for the discovery and writing this up @Cmooney! No concerns from Traffic since as you mentioned it is the ba...
[15:09:01] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:10:07] jouncebot: nowandnext
[15:10:07] No deployments scheduled for the next 0 hour(s) and 19 minute(s)
[15:10:07] In 0 hour(s) and 19 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1530)
[15:11:13] !log pt1979@cumin2002 DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ‘pfw1-codfw’ with reason: ‘pfw1
[15:15:48] !log installing clamav security updates
[15:15:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:46] !log pt1979@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pfw1-codfw with reason: pfw1-codfw relocation
[15:19:05] !log pt1979@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on fasw2-c8a-codfw,fasw2-c8b-codfw with reason: pfw1-codfw relocation
[15:20:21] (PS1) Jasmine: debug.json: order codfw (primary) DC backends first [mediawiki-config] - https://gerrit.wikimedia.org/r/1190293 (https://phabricator.wikimedia.org/T399891)
[15:22:30] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1019.eqiad.wmnet with OS bookworm
[15:23:00] (CR) Giuseppe Lavagetto: [V:+2 C:+2] Create busted image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1190290 (owner: Giuseppe Lavagetto)
[15:23:11] (PS1) Santiago Faci: xLab: Deploying v1.0.4 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1190294 (https://phabricator.wikimedia.org/T404787)
[15:25:14] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 22 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1189550 (https://phabricator.wikimedia.org/T405016) (owner: Arlolra)
[15:27:03] (PS1) Andrew Bogott: cloudcephosd1019 -> bookworm/reef [puppet] - https://gerrit.wikimedia.org/r/1190295
[15:27:03] (PS1) Andrew Bogott: cloudcephosd1020 -> reef and bookworm [puppet] - https://gerrit.wikimedia.org/r/1190296
[15:27:45] (CR) Andrew Bogott: [C:+2] cloudcephosd1019 -> bookworm/reef [puppet] - https://gerrit.wikimedia.org/r/1190295 (owner: Andrew Bogott)
[15:29:25] SRE-SLO, EditCheck, Editing-team (Kanban Board), Essential-Work: Fix EditCheck's SLO metrics and create a dashboard for it - https://phabricator.wikimedia.org/T395444#11202029 (elukey) @DLynch I was about to close the task but I noticed that the error budget is very much in the red for the last m...
[15:30:05] jan_drewniak: How many deployers does it take to do Wikimedia Portals Update deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1530).
[15:30:49] Hello! Today I will be playing the role of jan_drewniak
[15:31:10] And enwiki search suggestions will be playing the role of portals
[15:31:32] (CR) Stoyofuku-wmf: [C:+1] "Let's try this way first" [mediawiki-config] - https://gerrit.wikimedia.org/r/1187052 (https://phabricator.wikimedia.org/T402048) (owner: Jdlrobson)
[15:32:21] (CR) TrainBranchBot: [C:+2] "Approved by toyofuku@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1187052 (https://phabricator.wikimedia.org/T402048) (owner: Jdlrobson)
[15:32:36] Deploying via UI my favorite
[15:33:02] Stream golden: https://open.spotify.com/track/64CfsZTgVJgOnj1jhEDuNM
[15:33:29] (Merged) jenkins-bot: Enable search recommendation on Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1187052 (https://phabricator.wikimedia.org/T402048) (owner: Jdlrobson)
[15:33:42] (PS1) Jasmine: wmnet: update deployment CNAME record to deploy2002 [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891)
[15:33:45] !log toyofuku@deploy1003 Started scap sync-world: Backport for [[gerrit:1187052|Enable search recommendation on Wikipedia (T402048)]]
[15:33:50] T402048: Enable empty search recommendations to English Wikipedia - https://phabricator.wikimedia.org/T402048
[15:34:01] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:36:00] toyofuku: That sounds like a feature request to add a Spotify embed to the SpiderPig UI 😆
[15:36:54] SRE-SLO: Reduce the pyrra's multi-dc configurations where it makes sense - https://phabricator.wikimedia.org/T398534#11202083 (elukey) Open→Resolved We solved this adding versioning to each SLO, and we are going to roll it out very soon. Closing this task.
[15:37:55] !log toyofuku@deploy1003 jdlrobson, toyofuku: Backport for [[gerrit:1187052|Enable search recommendation on Wikipedia (T402048)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:38:51] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/0/1:0 (Core: pfw1-codfw:xe-7/2/0 {#11923_12249-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[15:39:05] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to analytics-privatedata-users for hueitan - https://phabricator.wikimedia.org/T404681#11202099 (hueitan) Thanks, adding @Nikerabbit here
[15:39:43] !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
[15:39:52] SRE-SLO: Pyrra calculations for the Initial error budget value of calendar windows - https://phabricator.wikimedia.org/T403729#11202102 (elukey) The other side of the coin, namely when there are no previous outages: T395444#11202029 {F66097816} {F66097817}
[15:40:00] !log toyofuku@deploy1003 jdlrobson, toyofuku: Continuing with sync
[15:40:22] lollll
[15:40:33] I can always promote kpdh in my commit messages
[15:40:51] We're continuing with the deploy!
[15:41:28] Things look good - I had to remind myself whether suggestions on the main page is good or not since there aren't any on eswiki, but there are on itwiki so presumably it's more the case that they're not always available
[15:42:33] (PS1) Jasmine: hieradata: update deployment_server to deploy2002 [puppet] - https://gerrit.wikimedia.org/r/1190300 (https://phabricator.wikimedia.org/T399891)
[15:43:35] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
[15:44:01] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[15:44:42] (CR) Scott French: [C:+1] "Thanks, Janis!" [puppet] - https://gerrit.wikimedia.org/r/1190274 (https://phabricator.wikimedia.org/T402014) (owner: JMeybohm)
[15:45:20] !log toyofuku@deploy1003 Finished scap sync-world: Backport for [[gerrit:1187052|Enable search recommendation on Wikipedia (T402048)]] (duration: 11m 35s)
[15:45:26] T402048: Enable empty search recommendations to English Wikipedia - https://phabricator.wikimedia.org/T402048
[15:46:21] We're all done!
[15:46:29] Thanks for playing
[15:46:30] (CR) Scott French: [C:+1] debug.json: order codfw (primary) DC backends first [mediawiki-config] - https://gerrit.wikimedia.org/r/1190293 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[15:48:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/0/1:3 (pfw1-codfw:xe-0/2/0 {#11922_12273-4}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[15:52:26] (CR) Elukey: "As FYI I got confirmation from Releng that they don't use prod-build in their pipelines (but ci-restricted): https://phabricator.wikimedia" [puppet] - https://gerrit.wikimedia.org/r/1165477 (https://phabricator.wikimedia.org/T404437) (owner: Elukey)
[15:52:54] (CR) Stevemunene: [C:+1] Add kubernetes tokens for an analytics-test namespace on dse-k8s [puppet] - https://gerrit.wikimedia.org/r/1190285 (https://phabricator.wikimedia.org/T381416) (owner: Btullis)
[15:53:04] (CR) JHathaway: [C:+1] P:puppetserver::volatile Include XCheeseScore private repo [puppet] - https://gerrit.wikimedia.org/r/1188770 (https://phabricator.wikimedia.org/T404688) (owner: Slyngshede)
[15:53:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/0/1:3 (pfw1-codfw:xe-0/2/0 {#11922_12273-4}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[15:54:57] (CR) Clare Ming: [C:+2] xLab: Deploying v1.0.4 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1190294 (https://phabricator.wikimedia.org/T404787) (owner: Santiago Faci)
[15:56:42] (Merged) jenkins-bot: xLab: Deploying v1.0.4 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1190294 (https://phabricator.wikimedia.org/T404787) (owner: Santiago Faci)
[15:59:22] (CR)
FNegri: "Thanks @joal@wikimedia.org for reviewing -- I will merge this tomorrow and apply to clouddb hosts." [puppet] - https://gerrit.wikimedia.org/r/1187896 (https://phabricator.wikimedia.org/T404473) (owner: SD0001)
[16:01:56] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1019.eqiad.wmnet with OS bookworm
[16:05:04] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1020.eqiad.wmnet with OS bookworm
[16:05:21] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/0/1:3 (pfw1-codfw:xe-0/2/0 {#11922_12273-4}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[16:05:59] SRE-SLO, EditCheck, Editing-team (Kanban Board), Essential-Work: Fix EditCheck's SLO metrics and create a dashboard for it - https://phabricator.wikimedia.org/T395444#11202338 (DLynch) Hm. That starts on the 9th, which doesn't conveniently coincide with any config change to launch something... so...
[16:06:21] ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: Move lvs1020 link from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T404959#11202340 (cmooney) @Jclark-ctr when you have some time can we have a look at this one? No particular rush thanks.
[16:08:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:10:54] !log jhathaway@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sretest2001.codfw.wmnet with reason: T383173
[16:10:59] T383173: Supermicro: UEFI HTTP boot request hangs on cold boot - https://phabricator.wikimedia.org/T383173
[16:12:34] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[16:13:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:14:39] (CR) Jforrester: "Done in e9950cb090be18af9a47552f18f6fa7311a1ca19. Alas."
[mediawiki-config] - https://gerrit.wikimedia.org/r/1051343 (https://phabricator.wikimedia.org/T367949) (owner: Jforrester)
[16:14:42] (Abandoned) Jforrester: Drop bare-metal servers from Wikimedia Debug tool config [mediawiki-config] - https://gerrit.wikimedia.org/r/1051343 (https://phabricator.wikimedia.org/T367949) (owner: Jforrester)
[16:19:53] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26): Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11202396 (Jclark-ctr)
[16:22:18] !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: host reimage
[16:22:45] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26): Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11202403 (Jclark-ctr) @BTullis Thanks for the bump. I did see the original co...
[16:22:51] jouncebot: nowandnext
[16:22:51] No deployments scheduled for the next 0 hour(s) and 37 minute(s)
[16:22:51] In 0 hour(s) and 37 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1700)
[16:22:51] In 0 hour(s) and 37 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1700)
[16:24:57] (PS1) Andrew Bogott: cloudcephosd1023 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190304
[16:24:57] (PS1) Andrew Bogott: cloudcephosd1023 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190305
[16:26:59] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.09.05 - 2025.09.26): Decommission the 46 hadoop workers and 2 namenode servers that were planned for the hadoop-backup cluster - https://phabricator.wikimedia.org/T404970#11202412 (Jclark-ctr) @BTullis can you look at an-worker1095 it seems to be...
[16:27:12] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to analytics-privatedata-users for hueitan - https://phabricator.wikimedia.org/T404681#11202413 (Nikerabbit) Approved.
[16:28:20] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: host reimage
[16:31:56] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[16:32:04] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1025.eqiad.wmnet']
[16:32:33] (CR) Andrew Bogott: [C:+2] cloudcephosd1020 -> reef and bookworm [puppet] - https://gerrit.wikimedia.org/r/1190296 (owner: Andrew Bogott)
[16:34:56] ops-codfw, SRE, DC-Ops, Traffic, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11202432 (elukey) >>!
In T392851#11194034, @Jhancock.wm wrote: > @elukey I can wait! wasn't trying to rush you. lemme know next week and we'll take care of it...
[16:41:13] !log andrew@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1025.eqiad.wmnet']
[16:43:34] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[16:44:01] FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[16:45:46] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1020.eqiad.wmnet with OS bookworm
[16:48:24] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[16:50:29] ops-eqiad, SRE, Ceph, cloud-services-team, and 2 others: cloudcephosd1025 won't reimage - https://phabricator.wikimedia.org/T405258 (Andrew) NEW
[16:53:28] (PS1) Jasmine: wmnet: remove mwmaint discovery aliases since turning down production servers [0] [dns] - https://gerrit.wikimedia.org/r/1190309 (https://phabricator.wikimedia.org/T397017)
[16:54:23] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1023.eqiad.wmnet with OS bookworm
[16:54:54] (PS2) Andrew Bogott: cloudcephosd1024 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190305
[16:55:03] (CR) Andrew Bogott: [C:+2] cloudcephosd1023 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190304 (owner: Andrew Bogott)
[17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1700)
[17:00:05] ryankemper: May I have your attention please! Wikidata Query Service weekly deploy.
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T1700)
[17:03:43] ops-eqiad, SRE, Ceph, cloud-services-team, and 2 others: cloudcephosd1025 won't reimage - https://phabricator.wikimedia.org/T405258#11202559 (Jclark-ctr) I see the Drives are not setup correctly and some list as foreign state.
[17:11:26] !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: host reimage
[17:15:21] RESOLVED: CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/0/1:0 (Core: pfw1-codfw:xe-7/2/0 {#11923_12249-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[17:16:50] SRE-SLO, EditCheck, Editing-team (Kanban Board), Essential-Work: Fix EditCheck's SLO metrics and create a dashboard for it - https://phabricator.wikimedia.org/T395444#11202613 (ppelberg) a:elukey→DLynch
[17:18:52] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: host reimage
[17:22:21] FIRING: SwitchCoreInterfaceDown: Switch core interface down - fasw2-c8b-codfw:xe-0/0/47 (Core: pfw1-codfw:xe-7/2/1 {#122422}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=fasw2-c8b-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:22:21] FIRING: SwitchCoreInterfaceDown: Switch core interface down - fasw2-c8b-codfw:xe-0/0/47 (Core: pfw1-codfw:xe-7/2/1 {#122422}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down -
https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=fasw2-c8b-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:23:35] !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[17:24:06] !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[17:26:28] ops-eqiad, SRE, Ceph, cloud-services-team, and 2 others: cloudcephosd1025 won't reimage - https://phabricator.wikimedia.org/T405258#11202643 (Jclark-ctr) Cleared foreign configs and converted drives to non-raid
[17:32:40] (CR) BCornwall: [C:+1] wmnet: remove mwmaint discovery aliases since turning down production servers [0] [dns] - https://gerrit.wikimedia.org/r/1190309 (https://phabricator.wikimedia.org/T397017) (owner: Jasmine)
[17:33:10] (PS13) Bking: opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246)
[17:34:08] (CR) Clément Goubert: [C:+1] wmnet: remove mwmaint discovery aliases since turning down production servers [0] [dns] - https://gerrit.wikimedia.org/r/1190309 (https://phabricator.wikimedia.org/T397017) (owner: Jasmine)
[17:34:40] (CR) CI reject: [V:-1] opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246) (owner: Bking)
[17:36:42] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:36:49] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1023.eqiad.wmnet with OS bookworm
[17:38:58] !log andrew@cumin2002 START -
Cookbook sre.hosts.reimage for host cloudcephosd1024.eqiad.wmnet with OS bookworm
[17:48:25] (PS14) Bking: opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246)
[17:49:57] (CR) CI reject: [V:-1] opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246) (owner: Bking)
[17:50:14] (CR) Scott French: wmnet: update deployment CNAME record to deploy2002 (1 comment) [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[17:51:05] (CR) Scott French: [C:+1] hieradata: update deployment_server to deploy2002 [puppet] - https://gerrit.wikimedia.org/r/1190300 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[17:52:21] RESOLVED: SwitchCoreInterfaceDown: Switch core interface down - fasw2-c8b-codfw:xe-0/0/47 (Core: pfw1-codfw:xe-7/2/1 {#122422}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=fasw2-c8b-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:52:21] RESOLVED: SwitchCoreInterfaceDown: Switch core interface down - fasw2-c8b-codfw:xe-0/0/47 (Core: pfw1-codfw:xe-7/2/1 {#122422}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=fasw2-c8b-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:55:17] (CR) BCornwall: [C:-1] "See swfrench's comment!"
[dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[17:55:33] !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: host reimage
[17:55:42] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[17:56:13] (CR) BCornwall: [C:+1] wmnet: update CNAME records for DB masters for dc switchover [dns] - https://gerrit.wikimedia.org/r/1189939 (https://phabricator.wikimedia.org/T399891) (owner: Gerrit maintenance bot)
[17:58:27] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: host reimage
[17:59:41] (PS2) Jasmine: wmnet: update deployment CNAME record to deploy2002 [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891)
[18:00:28] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[18:00:30] (CR) Jasmine: wmnet: update deployment CNAME record to deploy2002 (1 comment) [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[18:00:32] (PS15) Bking: opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246)
[18:02:27] (CR) CI reject: [V:-1] opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246) (owner: Bking)
[18:08:31] PROBLEM - ps1-f5-codfw-infeed-load-tower-B-phase-Y on ps1-f5-codfw is CRITICAL: SNMP CRITICAL - ps1-f5-codfw-infeed-load-tower-B-phase-Y *-1* https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:08:57] PROBLEM - ps1-f5-codfw-infeed-load-tower-B-phase-Z on ps1-f5-codfw is
CRITICAL: SNMP CRITICAL - ps1-f5-codfw-infeed-load-tower-B-phase-Z *-1* https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:08:57] PROBLEM - ps1-f5-codfw-infeed-load-tower-B-phase-X on ps1-f5-codfw is CRITICAL: SNMP CRITICAL - ps1-f5-codfw-infeed-load-tower-B-phase-X *-1* https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:09:10] (CR) Andrew Bogott: [C:+2] cloudcephosd1024 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190305 (owner: Andrew Bogott)
[18:15:29] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1024.eqiad.wmnet with OS bookworm
[18:15:57] (CR) Scott French: [C:+1] "Thanks, Jasmine!" [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[18:19:05] (CR) Scott French: [C:+1] wmnet: update deployment CNAME record to deploy2002 (1 comment) [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891) (owner: Jasmine)
[18:20:37] (PS3) Jasmine: wmnet: update deployment CNAME record to deploy2002 [dns] - https://gerrit.wikimedia.org/r/1190298 (https://phabricator.wikimedia.org/T399891)
[18:20:49] (CR) Andrea Denisse: [C:+2] alert: Add Slack route to send Prometheus alerts [puppet] - https://gerrit.wikimedia.org/r/1184611 (https://phabricator.wikimedia.org/T401730) (owner: Andrea Denisse)
[18:21:02] (CR) Zabe: es2050.yaml: enable notifications (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1190289 (https://phabricator.wikimedia.org/T402859) (owner: Federico Ceratto)
[18:21:02] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate kibana.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[18:22:31] RECOVERY -
ps1-f5-codfw-infeed-load-tower-B-phase-Y on ps1-f5-codfw is OK: SNMP OK - ps1-f5-codfw-infeed-load-tower-B-phase-Y 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:22:57] RECOVERY - ps1-f5-codfw-infeed-load-tower-B-phase-X on ps1-f5-codfw is OK: SNMP OK - ps1-f5-codfw-infeed-load-tower-B-phase-X 33 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:22:57] RECOVERY - ps1-f5-codfw-infeed-load-tower-B-phase-Z on ps1-f5-codfw is OK: SNMP OK - ps1-f5-codfw-infeed-load-tower-B-phase-Z 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:23:58] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bookworm
[18:29:49] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:37:04] (CR) Ssingh: [C:+1] varnish: Improve GeoIP to use cookie domain similar to prod [puppet] - https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) (owner: Krinkle)
[18:40:41] (PS1) Jforrester: wikifunctions: Configure supported evaluator languages with ZIDs [deployment-charts] - https://gerrit.wikimedia.org/r/1190318 (https://phabricator.wikimedia.org/T287154)
[18:41:13] andrew@cumin2002 reimage (PID 2488172) is awaiting input
[18:43:45] FIRING: Emergency syslog message: Alert for device pfw1-codfw.wikimedia.org - Emergency syslog message - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[18:43:46] FIRING: Emergency syslog message: Alert for device pfw1-codfw.wikimedia.org - Emergency syslog message - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[18:48:45] RESOLVED: Emergency syslog message: Device pfw1-codfw.wikimedia.org recovered
from Emergency syslog message - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[18:48:46] RESOLVED: Emergency syslog message: Device pfw1-codfw.wikimedia.org recovered from Emergency syslog message - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[18:50:49] jouncebot: nowandnext
[18:50:49] No deployments scheduled for the next 1 hour(s) and 9 minute(s)
[18:50:49] In 1 hour(s) and 9 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T2000)
[18:51:20] (PS1) Kosta Harlan: PrefUpdateInstrumentation: Track PSI related preferences [extensions/WikimediaEvents] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190319
[18:51:33] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190319 (owner: Kosta Harlan)
[19:00:41] (Merged) jenkins-bot: PrefUpdateInstrumentation: Track PSI related preferences [extensions/WikimediaEvents] (wmf/1.45.0-wmf.19) - https://gerrit.wikimedia.org/r/1190319 (owner: Kosta Harlan)
[19:01:00] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1190319|PrefUpdateInstrumentation: Track PSI related preferences]]
[19:06:07] (PS1) RLazarus: deployment_server: Add --resume_at to charlie [puppet] - https://gerrit.wikimedia.org/r/1190324
[19:06:07] (PS1) RLazarus: deployment_server: Rework interactive prompts in charlie [puppet] - https://gerrit.wikimedia.org/r/1190325
[19:06:07] (PS1) RLazarus: deployment_server: Refactor charlie preparatory to adding rety options [puppet] - https://gerrit.wikimedia.org/r/1190326
[19:06:08] (PS1) RLazarus: deployment_server: Add retry options to charlie [puppet] - https://gerrit.wikimedia.org/r/1190327
[19:06:41] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1190319|PrefUpdateInstrumentation: Track PSI related preferences]] synced to
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[19:13:22] !log kharlan@deploy1003 kharlan: Continuing with sync
[19:14:56] RECOVERY - Postfix SMTP on crm2001 is OK: OK - Certificate crm2001.codfw.wmnet will expire on Mon 20 Oct 2025 06:40:00 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mail%23Troubleshooting
[19:16:38] (PS2) GergesShamon: eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/1189902 (https://phabricator.wikimedia.org/T405095)
[19:16:46] (CR) CI reject: [V:-1] eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/1189902 (https://phabricator.wikimedia.org/T405095) (owner: GergesShamon)
[19:18:47] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190319|PrefUpdateInstrumentation: Track PSI related preferences]] (duration: 17m 47s)
[19:20:25] SRE, Wikimedia-Mailing-lists: Reports of unsubscribe from wikitech-ambassadors failing to work - https://phabricator.wikimedia.org/T405153#11202973 (Quiddity)
[19:20:55] (PS1) GergesShamon: eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/1190328 (https://phabricator.wikimedia.org/T405095)
[19:21:00] (Abandoned) GergesShamon: eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/1189902 (https://phabricator.wikimedia.org/T405095) (owner: GergesShamon)
[19:23:17] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1026.eqiad.wmnet with OS bookworm
[19:23:43] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bookworm
[19:29:26] !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
[19:29:55] SRE,
Wikimedia-Mailing-lists: Reports of unsubscribe from wikitech-ambassadors failing to work - https://phabricator.wikimedia.org/T405153#11202993 (Quiddity) That email address is not currently subscribed to that mailing list. Either your latest unsubscription attempt worked, or someone else manually u...
[19:32:32] (PS16) Bking: opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246)
[19:34:10] (CR) CI reject: [V:-1] opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246) (owner: Bking)
[19:36:50] !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host cloudcephosd1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:40:27] !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
[19:42:34] !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:43:36] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
[19:44:08] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[19:51:17] (PS17) Bking: opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206 (https://phabricator.wikimedia.org/T397246)
[19:52:38] (CR) CI reject: [V:-1] opensearch-cluster: Add chart for review (3/3) [deployment-charts] - https://gerrit.wikimedia.org/r/1182206
(https://phabricator.wikimedia.org/T397246) (owner: Bking)
[19:55:18] (PS1) Andrew Bogott: cloudcephosd1026 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190330
[19:56:01] (CR) Andrew Bogott: [C:+2] cloudcephosd1026 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190330 (owner: Andrew Bogott)
[20:00:04] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: Your horoscope predicts another UTC late backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T2000).
[20:00:05] _Gerges, stephanebisson, danisztls, Superpes, and arlolra: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:07] <_Gerges> Here
[20:00:12] o/
[20:00:13] (PS1) Andrew Bogott: cloudcephosd1027 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190332
[20:00:14] (PS1) Andrew Bogott: cloudcephosd1028 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190333
[20:00:14] (PS1) Andrew Bogott: cloudcephosd1029 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190334
[20:00:14] (PS1) Andrew Bogott: cloudcephosd1030 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190335
[20:00:15] (PS1) Andrew Bogott: cloudcephosd1031 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190336
[20:00:16] (PS1) Andrew Bogott: cloudcephosd1032 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190337
[20:00:20] (PS1) Andrew Bogott: cloudcephosd1033 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190338
[20:00:24] (PS1) Andrew Bogott: cloudcephosd1034 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190339
[20:01:01] _Gerges do you want to deploy your change?
[20:01:03] o/ I can self-deploy
[20:01:13] <_Gerges> Yes
[20:01:23] Go for it, I'll go next
[20:02:44] \o
[20:04:39] _Gerges Go for it, I'll go next
[20:06:24] <_Gerges> Who will deploy this window?
[20:07:07] _Gerges i can deploy your change if you want
[20:07:31] <_Gerges> Ok
[20:08:04] (PS2) Andrew Bogott: cloudcephosd1027 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190332
[20:08:05] (PS2) Andrew Bogott: cloudcephosd1028 -> reef + bookworm [puppet] - https://gerrit.wikimedia.org/r/1190333
[20:08:05] (PS2) Andrew Bogott: cloudcephosd1029 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190334
[20:08:05] (PS2) Andrew Bogott: cloudcephosd1030 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190335
[20:08:06] (PS2) Andrew Bogott: cloudcephosd1031 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190336
[20:08:08] (PS2) Andrew Bogott: cloudcephosd1032 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190337
[20:08:12] (PS2) Andrew Bogott: cloudcephosd1033 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190338
[20:08:16] (PS2) Andrew Bogott: cloudcephosd1034 -> bookworm + ceph 'reef' [puppet] - https://gerrit.wikimedia.org/r/1190339
[20:08:18] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bookworm
[20:08:20] (PS1) Andrew Bogott: Move cloudcephosd1026.yaml host file into hieradata/hosts [puppet] - https://gerrit.wikimedia.org/r/1190341
[20:08:28] (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190328 (https://phabricator.wikimedia.org/T405095) (owner: GergesShamon)
[20:09:40] _Gerges will you be able to verify it on the test server using the WikimediaDebug extension?
[20:09:56] <_Gerges> No
[20:12:57] (CR) Andrew Bogott: [C:+2] Move cloudcephosd1026.yaml host file into hieradata/hosts [puppet] - https://gerrit.wikimedia.org/r/1190341 (owner: Andrew Bogott)
[20:13:08] _Gerges could you please make sure you have another task to remove those exceptions with a due date after the events?
[20:13:21] (Merged) jenkins-bot: eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/1190328 (https://phabricator.wikimedia.org/T405095) (owner: GergesShamon)
[20:13:34] !log sbisson@deploy1003 Started scap sync-world: Backport for [[gerrit:1190328|eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T405095)]]
[20:13:39] T405095: Lift IP cap on these dates 2025-09-29; 2025-10-06; 2025-10-13; 2025-10-20 for edit-a-thon for eswiki, commons and wikidata - https://phabricator.wikimedia.org/T405095
[20:14:52] <_Gerges> I will try to do
[20:17:49] (CR) BCornwall: [V:+1 C:+2] varnish: Improve GeoIP to use cookie domain similar to prod [puppet] - https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) (owner: Krinkle)
[20:19:28] !log sbisson@deploy1003 sbisson, gergesshamon: Backport for [[gerrit:1190328|eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T405095)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:19:33] T405095: Lift IP cap on these dates 2025-09-29; 2025-10-06; 2025-10-13; 2025-10-20 for edit-a-thon for eswiki, commons and wikidata - https://phabricator.wikimedia.org/T405095
[20:19:51] !log sbisson@deploy1003 sbisson, gergesshamon: Continuing with sync
[20:25:05] andrew@cumin2002 reimage (PID 2548338) is awaiting input
[20:25:08] !log sbisson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190328|eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T405095)]] (duration: 11m 34s)
[20:25:13] T405095: Lift IP cap on these dates 2025-09-29; 2025-10-06; 2025-10-13; 2025-10-20 for edit-a-thon for eswiki, commons and wikidata - https://phabricator.wikimedia.org/T405095
[20:25:23] (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1188799 (https://phabricator.wikimedia.org/T327063) (owner: Sbisson)
[20:26:33] (Merged) jenkins-bot: SpecialContribute: configure new page target [mediawiki-config] - https://gerrit.wikimedia.org/r/1188799 (https://phabricator.wikimedia.org/T327063) (owner: Sbisson)
[20:26:46] !log sbisson@deploy1003 Started scap sync-world: Backport for [[gerrit:1188799|SpecialContribute: configure new page target (T327063)]]
[20:26:51] T327063: Adjust "New page" option of the Contribute options to point to a community page when it exists - https://phabricator.wikimedia.org/T327063
[20:27:03] <_Gerges> Thanks stephanebisson
[20:31:31] Superpes, arlolra: I can deploy your changes together with mine to speed up the deployment but I will not be able to run server commands since I will be using spiderpig
[20:32:06] sounds good to me
[20:32:38] !log sbisson@deploy1003 sbisson: Backport for [[gerrit:1188799|SpecialContribute: configure new page target (T327063)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:32:43] T327063: Adjust "New page" option of the Contribute options to point to a community page when it exists - https://phabricator.wikimedia.org/T327063
[20:32:53] Gotcha!
[20:33:00] !log sbisson@deploy1003 sbisson: Continuing with sync
[20:34:48] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bookworm
[20:36:15] (PS1) Bking: opensearch-operator: move WMF-specific values to chart values.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/1190343 (https://phabricator.wikimedia.org/T404906)
[20:38:26] (PS2) Bking: opensearch-operator: move WMF-specific values to chart values.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/1190343 (https://phabricator.wikimedia.org/T404906)
[20:38:26] !log sbisson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1188799|SpecialContribute: configure new page target (T327063)]] (duration: 11m 33s)
[20:38:26] T327063: Adjust "New page" option of the Contribute options to point to a community page when it exists - https://phabricator.wikimedia.org/T327063
[20:39:05] over to you danisztls
[20:39:32] (PS3) Bking: opensearch-operator: move WMF-specific values to chart values.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/1190343 (https://phabricator.wikimedia.org/T404906)
[20:39:34] (CR) TrainBranchBot: [C:+2] "Approved by dani@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1190281 (https://phabricator.wikimedia.org/T402915) (owner: DDesouza)
[20:39:34] (CR) TrainBranchBot: [C:+2] "Approved by dani@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1185058 (https://phabricator.wikimedia.org/T391009) (owner: Superpes15)
[20:39:34] (CR) TrainBranchBot: [C:+2] "Approved by dani@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1189550 (https://phabricator.wikimedia.org/T405016) (owner: Arlolra)
[20:40:29] (Merged) jenkins-bot: Deploy Newcomers survey on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1190281 (https://phabricator.wikimedia.org/T402915) (owner: DDesouza)
[20:40:40] (Merged) jenkins-bot: Initial configuration for arbcom_plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1185058 (https://phabricator.wikimedia.org/T391009) (owner: Superpes15)
[20:40:42] (Merged) jenkins-bot: Deploy Parsoid Read Views to 28 Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/1189550 (https://phabricator.wikimedia.org/T405016) (owner: Arlolra)
[20:40:58] !log dani@deploy1003 Started scap sync-world: Backport for [[gerrit:1190281|Deploy Newcomers survey on enwiki (T402915)]], [[gerrit:1185058|Initial configuration for arbcom_plwiki (T391009)]], [[gerrit:1189550|Deploy Parsoid Read Views to 28 Wikipedias (T405016)]]
[20:41:10] T402915: Newcomer survey: first test, then launch a quicksurvey - https://phabricator.wikimedia.org/T402915
[20:41:10] T391009: Create wikipedia-pl-arbcom.wikimedia.org - https://phabricator.wikimedia.org/T391009
[20:41:11] T405016: Parsoid Read Views to Wikipedia deploy ~2025-09-22 - https://phabricator.wikimedia.org/T405016
[20:44:08] FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[20:44:52] (PS2) Krinkle: Disable wmgUseMdotRouting on fawiki and metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1189511 (https://phabricator.wikimedia.org/T403510)
[20:44:57] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1027.eqiad.wmnet with OS bookworm
[20:45:19] !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bookworm
[20:45:25] (PS4) Bking: opensearch-operator: move WMF-specific values to chart values.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/1190343 (https://phabricator.wikimedia.org/T404906)
[20:47:13] !log dani@deploy1003 arlolra, superpes, dani: Backport for [[gerrit:1190281|Deploy Newcomers survey on enwiki (T402915)]], [[gerrit:1185058|Initial configuration for arbcom_plwiki (T391009)]], [[gerrit:1189550|Deploy Parsoid Read Views to 28 Wikipedias (T405016)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:47:21] T402915: Newcomer survey: first test, then launch a quicksurvey - https://phabricator.wikimedia.org/T402915
[20:47:21] T391009: Create wikipedia-pl-arbcom.wikimedia.org - https://phabricator.wikimedia.org/T391009
[20:47:22] T405016: Parsoid Read Views to Wikipedia deploy ~2025-09-22 - https://phabricator.wikimedia.org/T405016
[20:47:55] arlolra and Superpes: changes are synced to test servers
[20:48:50] Everything is fine from my side
[20:48:51] danisztls: lgtm
[20:49:15] !log dani@deploy1003 arlolra, superpes, dani: Continuing with sync
[20:49:24] (PS1) Krinkle: varnish: Enable unified mobile routing on wikinews wikis [puppet] - https://gerrit.wikimedia.org/r/1190345 (https://phabricator.wikimedia.org/T403510)
[20:49:39] (PS1) Krinkle: Disable wmgUseMdotRouting on wikinews wikis (group1) [mediawiki-config] - https://gerrit.wikimedia.org/r/1190346 (https://phabricator.wikimedia.org/T403510)
[20:54:25] !log dani@deploy1003 Finished scap sync-world: Backport for [[gerrit:1190281|Deploy Newcomers survey on enwiki (T402915)]], [[gerrit:1185058|Initial configuration for arbcom_plwiki (T391009)]], [[gerrit:1189550|Deploy Parsoid Read Views to 28 Wikipedias (T405016)]] (duration: 13m 23s)
[20:54:29] T402915: Newcomer survey: first test, then launch a quicksurvey - https://phabricator.wikimedia.org/T402915
[20:54:29] T391009: Create wikipedia-pl-arbcom.wikimedia.org - https://phabricator.wikimedia.org/T391009
[20:54:30] T405016: Parsoid Read Views to Wikipedia deploy ~2025-09-22 - https://phabricator.wikimedia.org/T405016
[20:54:37] sync is done
[20:56:35] Thanks :)
[20:56:44] Thank you
[20:56:59] Can someone run a script?
[20:57:21] mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=arbcom_plwiki
[21:00:05] Reedy, sbassett, Maryum, and manfredi: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250922T2100).
[21:00:43] (CR) Scott French: [C:+1] "Thanks, Ahmon!" [deployment-charts] - https://gerrit.wikimedia.org/r/1189905 (https://phabricator.wikimedia.org/T405110) (owner: Ahmon Dancy)