[00:03:23] <wikibugs>	 (CR) Dzahn: [V:+1 C:+2] "like this the diff is just some inconsistencies about "status_matches" but the default value should be 200." [puppet] - https://gerrit.wikimedia.org/r/1161509 (owner: Filippo Giunchedi)
[00:04:30] <icinga-wm>	 PROBLEM - OSPF status on cr1-drmrs is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 1/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:04:54] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:06:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:06:39] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[00:08:17] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1166943
[00:08:17] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1166943 (owner: TrainBranchBot)
[00:12:53] <wikibugs>	 (CR) Dzahn: [V:+1 C:+2] "noop on people* and alert*" [puppet] - https://gerrit.wikimedia.org/r/1161509 (owner: Filippo Giunchedi)
[00:15:52] <wikibugs>	 (CR) Dzahn: [V:+1 C:+2] "https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Hiera#Puppet_enc_system" [puppet] - https://gerrit.wikimedia.org/r/1166263 (https://phabricator.wikimedia.org/T396936) (owner: BryanDavis)
[00:16:56] <wikibugs>	 (CR) Dzahn: [C:+2] "https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Hiera#Puppet_enc_system" [puppet] - https://gerrit.wikimedia.org/r/1166262 (https://phabricator.wikimedia.org/T397591) (owner: BryanDavis)
[00:19:25] <wikibugs>	 (CR) Dzahn: [C:+2] gitlab: Allow WMCS runners to talk to deployment-prep wikis (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1166262 (https://phabricator.wikimedia.org/T397591) (owner: BryanDavis)
[00:20:04] <wikibugs>	 (PS3) BryanDavis: gitlab: Allow WMCS runners to talk to puppet-enc.cloudinfra [puppet] - https://gerrit.wikimedia.org/r/1166263 (https://phabricator.wikimedia.org/T396936)
[00:20:39] <logmsgbot>	 !log zabe@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[00:21:03] <logmsgbot>	 !log zabe@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[00:21:15] <wikibugs>	 (CR) Dzahn: [C:+2] gitlab: Allow WMCS runners to talk to puppet-enc.cloudinfra [puppet] - https://gerrit.wikimedia.org/r/1166263 (https://phabricator.wikimedia.org/T396936) (owner: BryanDavis)
[00:33:48] <wikibugs>	 (CR) CI reject: [V:-1] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1166943 (owner: TrainBranchBot)
[01:08:02] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.45.0-wmf.9 [core] (wmf/1.45.0-wmf.9) - https://gerrit.wikimedia.org/r/1166950 (https://phabricator.wikimedia.org/T392179)
[01:08:04] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.45.0-wmf.9 [core] (wmf/1.45.0-wmf.9) - https://gerrit.wikimedia.org/r/1166950 (https://phabricator.wikimedia.org/T392179) (owner: TrainBranchBot)
[01:19:27] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.45.0-wmf.9 [core] (wmf/1.45.0-wmf.9) - https://gerrit.wikimedia.org/r/1166950 (https://phabricator.wikimedia.org/T392179) (owner: TrainBranchBot)
[01:42:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:57:48] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:59:35] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
[02:00:04] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0200)
[02:20:19] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
[02:21:40] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-registry.service on registry2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:23:36] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
[02:42:12] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
[03:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0300)
[04:00:04] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0400)
[04:04:28] <logmsgbot>	 !log mwpresync@deploy1003 Pruned MediaWiki: 1.45.0-wmf.6 (duration: 04m 24s)
[04:06:25] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:06:39] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[04:14:34] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops: db1237 is not booting up - https://phabricator.wikimedia.org/T398794#10981837 (Marostegui) @VRiley-WMF from our side the host is fine. If you or @Jclark-ctr need to work on upgrade firmwares and BIOS, please let me know so I can depool it and have it ready for it.
[04:17:41] <wikibugs>	 (PS1) Marostegui: db1237: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1166959 (https://phabricator.wikimedia.org/T397279)
[04:18:15] <wikibugs>	 (CR) Marostegui: [C:+2] db1237: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1166959 (https://phabricator.wikimedia.org/T397279) (owner: Marostegui)
[04:23:47] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db1222 to s2 master [puppet] - https://gerrit.wikimedia.org/r/1166960 (https://phabricator.wikimedia.org/T398906)
[04:23:51] <wikibugs>	 (PS1) Gerrit maintenance bot: wmnet: Update s2-master alias [dns] - https://gerrit.wikimedia.org/r/1166961 (https://phabricator.wikimedia.org/T398906)
[04:26:20] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T398906
[04:26:23] <stashbot>	 T398906: Switchover s2 master (db1162 -> db1222) - https://phabricator.wikimedia.org/T398906
[04:26:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set db1222 with weight 0 T398906', diff saved to https://phabricator.wikimedia.org/P78780 and previous config saved to /var/cache/conftool/dbconfig/20250708-042646-root.json
[04:31:20] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote db1222 to s2 master [puppet] - https://gerrit.wikimedia.org/r/1166960 (https://phabricator.wikimedia.org/T398906) (owner: Gerrit maintenance bot)
[04:33:59] <wikibugs>	 (PS3) KartikMistry: machinetranslation: Use s3 for model download in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1166543 (https://phabricator.wikimedia.org/T335491)
[04:36:15] <marostegui>	 !log Starting s2 eqiad failover from db1162 to db1222 - T398906
[04:36:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:36:18] <stashbot>	 T398906: Switchover s2 master (db1162 -> db1222) - https://phabricator.wikimedia.org/T398906
[04:36:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T398906', diff saved to https://phabricator.wikimedia.org/P78781 and previous config saved to /var/cache/conftool/dbconfig/20250708-043628-root.json
[04:36:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T398906', diff saved to https://phabricator.wikimedia.org/P78782 and previous config saved to /var/cache/conftool/dbconfig/20250708-043654-root.json
[04:37:20] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[04:37:30] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update s2-master alias [dns] - https://gerrit.wikimedia.org/r/1166961 (https://phabricator.wikimedia.org/T398906) (owner: Gerrit maintenance bot)
[04:38:04] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[04:38:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1162 T398906', diff saved to https://phabricator.wikimedia.org/P78783 and previous config saved to /var/cache/conftool/dbconfig/20250708-043814-marostegui.json
[04:38:36] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[04:39:23] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[04:40:31] <wikibugs>	 (PS1) Marostegui: db1162: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1166963 (https://phabricator.wikimedia.org/T396549)
[04:41:00] <wikibugs>	 (CR) Marostegui: [C:+2] db1162: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1166963 (https://phabricator.wikimedia.org/T396549) (owner: Marostegui)
[04:47:20] <icinga-wm>	 PROBLEM - Host an-worker1095 is DOWN: PING CRITICAL - Packet loss = 100%
[04:47:48] <wikibugs>	 (PS1) Marostegui: db1237: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1166964
[04:48:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P78784 and previous config saved to /var/cache/conftool/dbconfig/20250708-044803-root.json
[04:51:54] <wikibugs>	 (CR) Marostegui: [C:+2] db1237: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1166964 (owner: Marostegui)
[04:58:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78785 and previous config saved to /var/cache/conftool/dbconfig/20250708-045812-root.json
[05:03:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78786 and previous config saved to /var/cache/conftool/dbconfig/20250708-050308-root.json
[05:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:13:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78787 and previous config saved to /var/cache/conftool/dbconfig/20250708-051318-root.json
[05:16:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:18:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78788 and previous config saved to /var/cache/conftool/dbconfig/20250708-051814-root.json
[05:28:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78789 and previous config saved to /var/cache/conftool/dbconfig/20250708-052823-root.json
[05:33:10] <wikibugs>	 (PS1) Giuseppe Lavagetto: Stop loggging requests that would not be rate-limited [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1166967
[05:33:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78790 and previous config saved to /var/cache/conftool/dbconfig/20250708-053320-root.json
[05:33:28] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+2 C:+2] Stop loggging requests that would not be rate-limited [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1166967 (owner: Giuseppe Lavagetto)
[05:33:40] <logmsgbot>	 !log arnaudb@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2003.wikimedia.org with reason: WIP
[05:35:14] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: better logging of varnish rate-limits - oblivian@cumin1003"
[05:35:15] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: better logging of varnish rate-limits - oblivian@cumin1003
[05:35:47] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: better logging of varnish rate-limits - oblivian@cumin1003
[05:35:48] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: better logging of varnish rate-limits - oblivian@cumin1003"
[05:41:35] <wikibugs>	 (PS1) Giuseppe Lavagetto: Revert "Stop loggging requests that would not be rate-limited" [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1166968
[05:41:42] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+2 C:+2] Revert "Stop loggging requests that would not be rate-limited" [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1166968 (owner: Giuseppe Lavagetto)
[05:41:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:41:58] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
[05:41:59] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
[05:42:28] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
[05:42:29] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
[05:42:36] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
[05:42:37] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
[05:43:04] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
[05:43:06] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
[05:43:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78791 and previous config saved to /var/cache/conftool/dbconfig/20250708-054329-root.json
[05:48:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78792 and previous config saved to /var/cache/conftool/dbconfig/20250708-054825-root.json
[05:50:49] <wikibugs>	 (PS1) Marostegui: s3 codfw: Migrate to SBR [puppet] - https://gerrit.wikimedia.org/r/1166969 (https://phabricator.wikimedia.org/T383795)
[05:51:23] <wikibugs>	 (CR) Marostegui: "This is a NOOP until the change is made lively on the hosts (or mariadb is restarted)" [puppet] - https://gerrit.wikimedia.org/r/1166969 (https://phabricator.wikimedia.org/T383795) (owner: Marostegui)
[05:51:27] <wikibugs>	 (CR) Marostegui: [C:+2] s3 codfw: Migrate to SBR [puppet] - https://gerrit.wikimedia.org/r/1166969 (https://phabricator.wikimedia.org/T383795) (owner: Marostegui)
[05:51:48] <jinxer-wm>	 RESOLVED: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:52:43] <marostegui>	 !log Migrate s3 codfw to SBR T383795
[05:52:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:52:46] <stashbot>	 T383795: Move sX to STATEMENT based replication - https://phabricator.wikimedia.org/T383795
[05:53:08] <wikibugs>	 (CR) Arnaudb: [C:+1] "looks good to me!" [puppet] - https://gerrit.wikimedia.org/r/1129920 (https://phabricator.wikimedia.org/T387833) (owner: Dzahn)
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0600)
[06:00:05] <jouncebot>	 marostegui, Amir1, and federico3: Time to do the Primary database switchover deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0600).
[06:13:21] <wikibugs>	 (PS1) Giuseppe Lavagetto: Fix varnish logging of rate-limiting, take 2 [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1166970
[06:13:34] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+2 C:+2] Fix varnish logging of rate-limiting, take 2 [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1166970 (owner: Giuseppe Lavagetto)
[06:14:27] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix varnis logging (take 2) - oblivian@cumin1003"
[06:14:28] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix varnis logging (take 2) - oblivian@cumin1003
[06:14:58] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix varnis logging (take 2) - oblivian@cumin1003
[06:15:00] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix varnis logging (take 2) - oblivian@cumin1003"
[06:16:20] <icinga-wm>	 PROBLEM - Exim SMTP on lists1004 is CRITICAL: connect to address 208.80.154.81 and port 25: Connection refused https://wikitech.wikimedia.org/wiki/Exim
[06:19:25] <icinga-wm>	 RECOVERY - Exim SMTP on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 07 Aug 2025 09:25:51 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Exim
[06:21:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-registry.service on registry2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:30:00] <wikibugs>	 (PS1) Giuseppe Lavagetto: Revert "Fix varnish logging of rate-limiting, take 2" [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1167078
[06:30:29] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+2 C:+2] Revert "Fix varnish logging of rate-limiting, take 2" [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1167078 (owner: Giuseppe Lavagetto)
[06:30:50] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Revert - oblivian@cumin1003"
[06:30:51] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert - oblivian@cumin1003
[06:31:22] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert - oblivian@cumin1003
[06:31:23] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Revert - oblivian@cumin1003"
[06:35:47] <moritzm>	 !log rebalance following reimages T382513
[06:35:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:49] <stashbot>	 T382513: Update Ganeti servers in drmrs to Bookworm - https://phabricator.wikimedia.org/T382513
[06:36:38] <wikibugs>	 (PS1) Giuseppe Lavagetto: Revert logging changes [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1167079
[06:38:26] <wikibugs>	 (CR) Elukey: pyrra: remove multi-dc for istio-based SLOs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1166076 (https://phabricator.wikimedia.org/T398534) (owner: Elukey)
[06:38:39] <wikibugs>	 (CR) Vgutierrez: [C:+2] hiera: Remove esams and magru bgp peer overrides [puppet] - https://gerrit.wikimedia.org/r/1166870 (owner: Vgutierrez)
[06:42:09] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1167080
[06:50:26] <wikibugs>	 (CR) Filippo Giunchedi: "How did you pick 5m ? The current puppet runs on alert hosts take ~3m so 5m would mean puppet-agent basically running all the time, is tha" [puppet] - https://gerrit.wikimedia.org/r/1166846 (https://phabricator.wikimedia.org/T398444) (owner: Herron)
[06:56:15] <wikibugs>	 (CR) Jgiannelos: [C:+2] mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1167080 (owner: PipelineBot)
[06:58:09] <wikibugs>	 (Merged) jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1167080 (owner: PipelineBot)
[07:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: #bothumor I � Unicode. All rise for UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0700).
[07:00:04] <jouncebot>	 Tchanders: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:52] <Tchanders>	 o/
[07:00:57] <Tchanders>	 I'll deploy my own patch
[07:01:27] <jinxer-wm>	 FIRING: ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:01:45] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tchanders@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1166791 (https://phabricator.wikimedia.org/T381845) (owner: Tchanders)
[07:02:06] <wikibugs>	 (PS1) Volans: tox.ini: skip Python 3.10 in CI [software/spicerack] - https://gerrit.wikimedia.org/r/1167081
[07:02:25] <wikibugs>	 (PS2) Volans: cookbook API: simplify -t/--task-id support [software/spicerack] - https://gerrit.wikimedia.org/r/1154787
[07:02:25] <wikibugs>	 (CR) Volans: "ready for review" [software/spicerack] - https://gerrit.wikimedia.org/r/1154787 (owner: Volans)
[07:03:06] <wikibugs>	 (CR) Nikerabbit: [C:+1] CX: Add virtual-cx-shared DatabaseVirtualDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/1152065 (https://phabricator.wikimedia.org/T348513) (owner: Abijeet Patro)
[07:03:18] <wikibugs>	 (Merged) jenkins-bot: temp accounts: Separate digits in user names with hyphens [mediawiki-config] - https://gerrit.wikimedia.org/r/1166791 (https://phabricator.wikimedia.org/T381845) (owner: Tchanders)
[07:03:42] <logmsgbot>	 !log tchanders@deploy1003 Started scap sync-world: Backport for [[gerrit:1166791|temp accounts: Separate digits in user names with hyphens (T381845)]]
[07:03:44] <stashbot>	 T381845: Add hyphens to break temporary user names into groups of <5 digits - https://phabricator.wikimedia.org/T381845
[07:05:48] <logmsgbot>	 !log tchanders@deploy1003 tchanders: Backport for [[gerrit:1166791|temp accounts: Separate digits in user names with hyphens (T381845)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:06:27] <jinxer-wm>	 RESOLVED: ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:09:18] <logmsgbot>	 !log tchanders@deploy1003 tchanders: Continuing with sync
[07:14:44] <logmsgbot>	 !log tchanders@deploy1003 Finished scap sync-world: Backport for [[gerrit:1166791|temp accounts: Separate digits in user names with hyphens (T381845)]] (duration: 11m 02s)
[07:14:48] <stashbot>	 T381845: Add hyphens to break temporary user names into groups of <5 digits - https://phabricator.wikimedia.org/T381845
[07:17:13] <Tchanders>	 My patch is done, but I won't log that the window is done, in case anyone else wants to deploy something in the next 40 minutes
[07:19:28] <logmsgbot>	 !log jelto@cumin1003 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
[07:22:16] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:22:35] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1135643 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[07:26:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: httpbb_kubernetes_mw-api-int_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:30:32] <logmsgbot>	 !log jelto@cumin1003 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
[07:30:37] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Update Ganeti servers in drmrs to Bookworm - https://phabricator.wikimedia.org/T382513#10982110 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff All done.
[07:32:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:36:41] <wikibugs>	 (CR) Gmodena: [C:+2] services: mw-page-content-change-enrich: version bump image. [deployment-charts] - https://gerrit.wikimedia.org/r/1166923 (https://phabricator.wikimedia.org/T347282) (owner: Gmodena)
[07:38:17] <wikibugs>	 (Merged) jenkins-bot: services: mw-page-content-change-enrich: version bump image. [deployment-charts] - https://gerrit.wikimedia.org/r/1166923 (https://phabricator.wikimedia.org/T347282) (owner: Gmodena)
[07:42:06] <wikibugs>	 (CR) Fabfur: [C:+2] cache: install benthos on all cp hosts [puppet] - https://gerrit.wikimedia.org/r/1135643 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[07:42:14] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[07:42:36] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[07:45:12] <fabfur>	 !log temporary disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/1135643 (T329332)
[07:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:48] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:52:00] <wikibugs>	 (PS1) Marostegui: s3 eqiad: Migrate to SBR [puppet] - https://gerrit.wikimedia.org/r/1167142 (https://phabricator.wikimedia.org/T383795)
[07:52:19] <wikibugs>	 (PS1) Vgutierrez: hiera: Issue dedicated certs for probenet endpoints [puppet] - https://gerrit.wikimedia.org/r/1167143 (https://phabricator.wikimedia.org/T398596)
[07:53:03] <wikibugs>	 (CR) Marostegui: "This is a NOOP until the change is made lively on the databases or we restart mariadb" [puppet] - https://gerrit.wikimedia.org/r/1167142 (https://phabricator.wikimedia.org/T383795) (owner: Marostegui)
[07:53:07] <wikibugs>	 (CR) Marostegui: [C:+2] s3 eqiad: Migrate to SBR [puppet] - https://gerrit.wikimedia.org/r/1167142 (https://phabricator.wikimedia.org/T383795) (owner: Marostegui)
[07:53:53] <wikibugs>	 (PS2) Vgutierrez: hiera: Issue dedicated certs for probenet endpoints [puppet] - https://gerrit.wikimedia.org/r/1167143 (https://phabricator.wikimedia.org/T398596)
[07:54:21] <marostegui>	 !log Migrate s3 eqiad to SBR T383795
[07:54:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:24] <stashbot>	 T383795: Move sX to STATEMENT based replication - https://phabricator.wikimedia.org/T383795
[07:55:27] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1167143 (https://phabricator.wikimedia.org/T398596) (owner: 10Vgutierrez)
[07:55:28] <wikibugs>	 (03CR) 10Klausman: [C:03+1] machinetranslation: Use s3 for model download in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166543 (https://phabricator.wikimedia.org/T335491) (owner: 10KartikMistry)
[07:55:48] <fabfur>	 !log enabling puppet on A:cp (T329332)
[07:55:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:45] <wikibugs>	 (03CR) 10Jgiannelos: [C:04-1] "Overall other than the kafka topic, it looks OK." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1165550 (https://phabricator.wikimedia.org/T381565) (owner: 10Elukey)
[08:00:05] <jouncebot>	 andre and jnuche: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - Utc-0 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0800).
[08:00:36] <wikibugs>	 (03CR) 10Jgiannelos: [C:04-1] services: configure tegola in codfw to use maps-test (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1165550 (https://phabricator.wikimedia.org/T381565) (owner: 10Elukey)
[08:00:54] <wikibugs>	 06SRE, 06Traffic: Benthos -  remove the kafka output module - https://phabricator.wikimedia.org/T398916 (10Fabfur) 03NEW
[08:01:58] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[08:02:14] <logmsgbot>	 !log gmodena@deploy1003 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[08:06:25] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[08:06:47] <logmsgbot>	 !log gmodena@deploy1003 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
[08:06:54] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[08:06:57] <logmsgbot>	 !log gmodena@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[08:10:38] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3585 MB (3% inode=98%): /tmp 3585 MB (3% inode=98%): /var/tmp 3585 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[08:11:21] <wikibugs>	 (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621) (owner: 10Fabfur)
[08:11:34] <logmsgbot>	 !log gmodena@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
[08:11:42] <logmsgbot>	 !log gmodena@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[08:11:45] <moritzm>	 !log installing postgresql-15 security updates
[08:11:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:48] <wikibugs>	 (03PS1) 10TrainBranchBot: testwikis to 1.45.0-wmf.9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167144 (https://phabricator.wikimedia.org/T392179)
[08:14:50] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] testwikis to 1.45.0-wmf.9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167144 (https://phabricator.wikimedia.org/T392179) (owner: 10TrainBranchBot)
[08:15:51] <wikibugs>	 (03Merged) 10jenkins-bot: testwikis to 1.45.0-wmf.9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167144 (https://phabricator.wikimedia.org/T392179) (owner: 10TrainBranchBot)
[08:16:17] <logmsgbot>	 !log aklapper@deploy1003 Started scap sync-world: testwikis to 1.45.0-wmf.9  refs T392179
[08:16:21] <stashbot>	 T392179: 1.45.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T392179
[08:17:41] <wikibugs>	 (03PS9) 10Fabfur: varnish: replace X-Public-Cloud with new X-Provenance header check [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621)
[08:21:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: httpbb_kubernetes_mw-api-int_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:21:51] <wikibugs>	 (03CR) 10Gmodena: [C:03+2] dse: mw-content-history: version bump image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166921 (https://phabricator.wikimedia.org/T347282) (owner: 10Gmodena)
[08:22:16] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:23:29] <wikibugs>	 (03Merged) 10jenkins-bot: dse: mw-content-history: version bump image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166921 (https://phabricator.wikimedia.org/T347282) (owner: 10Gmodena)
[08:26:00] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1167143 (https://phabricator.wikimedia.org/T398596) (owner: 10Vgutierrez)
[08:26:11] <logmsgbot>	 !log gmodena@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[08:26:18] <logmsgbot>	 !log gmodena@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[08:28:30] <logmsgbot>	 !log gmodena@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[08:28:54] <wikibugs>	 (03PS1) 10Tiziano Fogli: Review access change [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1167145
[08:30:07] <logmsgbot>	 !log gmodena@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
[08:30:23] <moritzm>	 !log created a stub user "bumpuid" to move the allocation of UIDs for accounts created in Wikimedia IDM to 100000+ T355663
[08:30:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:26] <stashbot>	 T355663: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663
[08:30:35] <logmsgbot>	 !log gmodena@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[08:30:44] <logmsgbot>	 !log gmodena@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
[08:35:04] <wikibugs>	 (03PS9) 10Btullis: Add the new cephosd200[1-3] servers in codfw to their role [puppet] - 10https://gerrit.wikimedia.org/r/1166866 (https://phabricator.wikimedia.org/T374923)
[08:36:50] <wikibugs>	 (03PS3) 10Ladsgroup: tables-catalog: Mark vision to 1 [puppet] - 10https://gerrit.wikimedia.org/r/1166854 (https://phabricator.wikimedia.org/T363581)
[08:36:56] <wikibugs>	 (03CR) 10Ladsgroup: [V:03+2 C:03+2] tables-catalog: Mark vision to 1 [puppet] - 10https://gerrit.wikimedia.org/r/1166854 (https://phabricator.wikimedia.org/T363581) (owner: 10Ladsgroup)
[08:38:12] <wikibugs>	 (03CR) 10Vgutierrez: "looks good, added some inline comments about discrepancies between regex in requestcl and here and a suggestion about how to improve one of" [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621) (owner: 10Fabfur)
[08:39:10] <wikibugs>	 (03PS1) 10Ladsgroup: Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1167148 (https://phabricator.wikimedia.org/T398033)
[08:39:22] <wikibugs>	 (03PS1) 10Ladsgroup: Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167149 (https://phabricator.wikimedia.org/T398033)
[08:39:34] <wikibugs>	 (03Abandoned) 10Majavah: etcd: Use cfssl for peer-to-peer communication [puppet] - 10https://gerrit.wikimedia.org/r/674077 (owner: 10Majavah)
[08:39:42] <Amir1>	 jouncebot: nowandnext
[08:39:42] <jouncebot>	 For the next 1 hour(s) and 20 minute(s): MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T0800)
[08:39:42] <jouncebot>	 In 1 hour(s) and 20 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1000)
[08:40:01] <wikibugs>	 (03PS1) 10Hashar: Review access change [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1167150
[08:40:27] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.10 point update - https://phabricator.wikimedia.org/T389034#10982431 (10MoritzMuehlenhoff)
[08:40:45] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.10 point update - https://phabricator.wikimedia.org/T389034#10982433 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff All done
[08:42:20] <wikibugs>	 (03PS2) 10Hashar: Remove specific force push to refs/sandbox/* branches [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1167150 (https://phabricator.wikimedia.org/T398921)
[08:43:51] <wikibugs>	 (03Abandoned) 10Tiziano Fogli: Review access change [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1167145 (owner: 10Tiziano Fogli)
[08:44:07] <wikibugs>	 (03CR) 10Hashar: [V:03+2 C:03+2] Remove specific force push to refs/sandbox/* branches [puppet] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1167150 (https://phabricator.wikimedia.org/T398921) (owner: 10Hashar)
[08:45:17] <wikibugs>	 (03CR) 10Brouberol: Add the new cephosd200[1-3] servers in codfw to their role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1166866 (https://phabricator.wikimedia.org/T374923) (owner: 10Btullis)
[08:46:14] <wikibugs>	 (03Abandoned) 10Majavah: Ensure service catalog schema matches spicerack release [puppet] - 10https://gerrit.wikimedia.org/r/931241 (https://phabricator.wikimedia.org/T339243) (owner: 10Majavah)
[08:48:06] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167149 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[08:48:09] <moritzm>	 !log installing Redis security updates
[08:48:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:01] <wikibugs>	 (03PS3) 10Majavah: Remove l10nupdate manifests [puppet] - 10https://gerrit.wikimedia.org/r/928582
[08:50:27] <wikibugs>	 (03CR) 10Majavah: "found this while cleaning up my puppet.git clone.. this still looks relevant?" [puppet] - 10https://gerrit.wikimedia.org/r/928582 (owner: 10Majavah)
[08:52:50] <moritzm>	 !log installing nginx security updates
[08:52:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:02] <wikibugs>	 (03Abandoned) 10Majavah: openstack::util::patch: add define [puppet] - 10https://gerrit.wikimedia.org/r/958931 (owner: 10David Caro)
[08:54:15] <wikibugs>	 (03Abandoned) 10Majavah: P:toolforge::grid: add bash completion to exec-manage [puppet] - 10https://gerrit.wikimedia.org/r/815780 (owner: 10Majavah)
[08:55:13] <wikibugs>	 (03Abandoned) 10Majavah: aptrepo: cleanup haproxy update and component names [puppet] - 10https://gerrit.wikimedia.org/r/969819 (owner: 10Majavah)
[08:56:12] <wikibugs>	 (03PS1) 10Vgutierrez: varnish: Prevent unknown clients from reaching /evt-103e/v2/events [puppet] - 10https://gerrit.wikimedia.org/r/1167151 (https://phabricator.wikimedia.org/T398181)
[08:58:05] <wikibugs>	 (03Abandoned) 10Majavah: kerberos: manage users with custom puppet type [puppet] - 10https://gerrit.wikimedia.org/r/751100 (https://phabricator.wikimedia.org/T292389) (owner: 10Majavah)
[08:59:15] <wikibugs>	 (03CR) 10Btullis: [V:03+1] Add the new cephosd200[1-3] servers in codfw to their role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1166866 (https://phabricator.wikimedia.org/T374923) (owner: 10Btullis)
[08:59:35] <logmsgbot>	 !log aklapper@deploy1003 Finished scap sync-world: testwikis to 1.45.0-wmf.9  refs T392179 (duration: 43m 18s)
[08:59:38] <stashbot>	 T392179: 1.45.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T392179
[09:02:46] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 to 1.45.0-wmf.9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167152 (https://phabricator.wikimedia.org/T392179)
[09:02:47] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] group0 to 1.45.0-wmf.9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167152 (https://phabricator.wikimedia.org/T392179) (owner: 10TrainBranchBot)
[09:03:47] <wikibugs>	 (03Merged) 10jenkins-bot: group0 to 1.45.0-wmf.9 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167152 (https://phabricator.wikimedia.org/T392179) (owner: 10TrainBranchBot)
[09:04:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-eqiad
[09:08:13] <wikibugs>	 (03PS2) 10Vgutierrez: varnish: Prevent unknown clients from reaching /evt-103e/v2/events [puppet] - 10https://gerrit.wikimedia.org/r/1167151 (https://phabricator.wikimedia.org/T398181)
[09:12:55] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-eqiad
[09:15:10] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:15:25] <logmsgbot>	 !log aklapper@deploy1003 rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.9  refs T392179
[09:15:28] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:15:30] <stashbot>	 T392179: 1.45.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T392179
[09:17:00] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.233 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:17:18] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54224 bytes in 0.067 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:18:53] <wikibugs>	 (03PS3) 10Vgutierrez: varnish: Prevent unknown clients from reaching /evt-103e/v2/events [puppet] - 10https://gerrit.wikimedia.org/r/1167151 (https://phabricator.wikimedia.org/T398181)
[09:18:57] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Codfw: management down to racks D3 and D8 (switch port down) - https://phabricator.wikimedia.org/T398598#10982612 (10cmooney) 05Open→03Resolved >>! In T398598#10980766, @Jhancock.wm wrote: > reset the tripped breaker in D3. On...
[09:19:11] <akosiaris>	 I had a quick look at lists1004, nothing out of the ordinary
[09:19:26] <akosiaris>	 If this croaks again in the day I'll have a more serious look
[09:19:54] <wikibugs>	 (03CR) 10Vgutierrez: "varnishtests are happy: `0 tests failed, 0 tests skipped, 39 tests passed`" [puppet] - 10https://gerrit.wikimedia.org/r/1167151 (https://phabricator.wikimedia.org/T398181) (owner: 10Vgutierrez)
[09:21:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2329.mgmt:22 - https://phabricator.wikimedia.org/T398559#10982616 (10cmooney) 05Open→03Resolved a:03cmooney
[09:21:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for bast2003.mgmt:22 - https://phabricator.wikimedia.org/T398557#10982619 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:02] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2219.mgmt:22 - https://phabricator.wikimedia.org/T398556#10982622 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:19] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for db2181.mgmt:22 - https://phabricator.wikimedia.org/T398573#10982625 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:27] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for aux-k8s-worker2009.mgmt:22 - https://phabricator.wikimedia.org/T398572#10982628 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:33] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for db2213.mgmt:22 - https://phabricator.wikimedia.org/T398571#10982631 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:41] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for es2040.mgmt:22 - https://phabricator.wikimedia.org/T398570#10982634 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for es2044.mgmt:22 - https://phabricator.wikimedia.org/T398569#10982637 (10cmooney) 05Open→03Resolved a:03cmooney
[09:22:54] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for db2182.mgmt:22 - https://phabricator.wikimedia.org/T398568#10982640 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:02] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for puppetdb2003.mgmt:22 - https://phabricator.wikimedia.org/T398567#10982643 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:15] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for db2173.mgmt:22 - https://phabricator.wikimedia.org/T398565#10982646 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:23] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2217.mgmt:22 - https://phabricator.wikimedia.org/T398564#10982649 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:29] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2330.mgmt:22 - https://phabricator.wikimedia.org/T398563#10982652 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:37] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2320.mgmt:22 - https://phabricator.wikimedia.org/T398562#10982655 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:44] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2201.mgmt:22 - https://phabricator.wikimedia.org/T398561#10982658 (10cmooney) 05Open→03Resolved a:03cmooney
[09:23:52] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for pc2016.mgmt:22 - https://phabricator.wikimedia.org/T398560#10982661 (10cmooney) 05Open→03Resolved a:03cmooney
[09:30:38] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3437 MB (3% inode=98%): /tmp 3437 MB (3% inode=98%): /var/tmp 3437 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[09:38:46] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167158
[09:39:17] <wikibugs>	 (03CR) 10Jgiannelos: [C:03+2] mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167158 (owner: 10PipelineBot)
[09:40:40] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.11 point update - https://phabricator.wikimedia.org/T394489#10982781 (10MoritzMuehlenhoff)
[09:40:50] <wikibugs>	 (03Merged) 10jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167158 (owner: 10PipelineBot)
[09:41:08] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[09:41:34] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[09:44:41] <wikibugs>	 (03PS1) 10Ladsgroup: tables-catalog: Temporarily set categorylinks to partially public [puppet] - 10https://gerrit.wikimedia.org/r/1167159 (https://phabricator.wikimedia.org/T299951)
[09:45:39] <wikibugs>	 (03PS1) 10Jgiannelos: pcs: Enable profiler on staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167160
[09:46:34] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] api-gateway: use ratelimit's inbuilt promethus-statsd agent [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166790 (https://phabricator.wikimedia.org/T388804) (owner: 10Hnowlan)
[09:46:46] <wikibugs>	 (03PS2) 10Jgiannelos: pcs: Enable profiler on staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167160
[09:47:21] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] pcs: Enable profiler on staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167160 (owner: 10Jgiannelos)
[09:47:31] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] tables-catalog: Temporarily set categorylinks to partially public [puppet] - 10https://gerrit.wikimedia.org/r/1167159 (https://phabricator.wikimedia.org/T299951) (owner: 10Ladsgroup)
[09:48:05] <wikibugs>	 (03CR) 10Jgiannelos: [C:03+2] pcs: Enable profiler on staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167160 (owner: 10Jgiannelos)
[09:50:02] <wikibugs>	 (03Merged) 10jenkins-bot: pcs: Enable profiler on staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167160 (owner: 10Jgiannelos)
[09:50:59] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[09:51:06] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[09:51:22] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[09:51:31] <moritzm>	 !log installing openssl security updates on Bullseye
[09:51:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:40] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[09:52:17] <wikibugs>	 (03PS4) 10Tiziano Fogli: prom/metamonitor: add dead man switch and public endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1167157 (https://phabricator.wikimedia.org/T397003)
[09:53:10] <Amir1>	 !log dropping term store tables on s8 sanitarium master (T351820)
[09:53:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:13] <stashbot>	 T351820: Move Wikidata term store to separate database cluster - https://phabricator.wikimedia.org/T351820
[09:55:11] <wikibugs>	 (03PS1) 10Zabe: Remove redundant group0 config for categorylinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167162
[09:55:57] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Remove redundant group0 config for categorylinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167162 (owner: 10Zabe)
[09:56:18] <wikibugs>	 (03PS2) 10Zabe: Remove redundant group0 config for categorylinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167162
[09:57:59] <wikibugs>	 (03PS1) 10Zabe: Set categorylinks to read new in cebwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167164 (https://phabricator.wikimedia.org/T397912)
[09:58:05] <wikibugs>	 (03PS1) 10Majavah: hieradata: Bump Striker to 2025-07-08-094946-production [puppet] - 10https://gerrit.wikimedia.org/r/1167165 (https://phabricator.wikimedia.org/T355663)
[09:59:46] <wikibugs>	 (03CR) 10Majavah: [C:03+2] hieradata: Bump Striker to 2025-07-08-094946-production [puppet] - 10https://gerrit.wikimedia.org/r/1167165 (https://phabricator.wikimedia.org/T355663) (owner: 10Majavah)
[10:00:00] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1000)
[10:01:55] <wikibugs>	 (03PS1) 10Marostegui: db2157: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167168 (https://phabricator.wikimedia.org/T398928)
[10:02:12] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s8 on clouddb1020 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 618.40 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:03:34] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db2157: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167168 (https://phabricator.wikimedia.org/T398928) (owner: 10Marostegui)
[10:03:46] <Amir1>	 it'll recover soon
[10:04:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P78795 and previous config saved to /var/cache/conftool/dbconfig/20250708-100434-marostegui.json
[10:05:16] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10Data-Platform-SRE (2025.07.05 - 2025.07.25): Q3: an-worker data volumes HDD upgrade tracking task - https://phabricator.wikimedia.org/T385485#10982848 (10BTullis)
[10:05:48] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10Data-Platform-SRE (2025.07.05 - 2025.07.25): Q3: an-worker data volumes HDD upgrade tracking task - https://phabricator.wikimedia.org/T385485#10982849 (10BTullis) 05Open→03Resolved a:03BTullis
[10:06:14] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Upgrade db1216 & db2201 MariaDB package to 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167173 (https://phabricator.wikimedia.org/T398928)
[10:07:14] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.11 point update - https://phabricator.wikimedia.org/T394489#10982862 (10MoritzMuehlenhoff)
[10:07:29] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
[10:09:15] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
[10:11:40] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
[10:12:33] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
[10:13:12] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s8 on clouddb1020 is OK: OK slave_sql_lag Replication lag: 0.28 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:13:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
[10:14:36] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
[10:14:37] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
[10:16:45] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
[10:20:33] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[10:21:11] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[10:21:12] <wikibugs>	 (03CR) 10Fabfur: varnish: replace X-Public-Cloud with new X-Provenance header check (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621) (owner: 10Fabfur)
[10:21:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78796 and previous config saved to /var/cache/conftool/dbconfig/20250708-102114-marostegui.json
[10:21:35] <wikibugs>	 (03PS10) 10Fabfur: varnish: replace X-Public-Cloud with new X-Provenance header check [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621)
[10:21:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78797 and previous config saved to /var/cache/conftool/dbconfig/20250708-102140-root.json
[10:21:46] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-conf1004.eqiad.wmnet
[10:25:17] <wikibugs>	 (03PS1) 10Ladsgroup: api-testing: Loosen the assert on max-age header [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167176
[10:25:49] <wikibugs>	 (03PS2) 10Ladsgroup: Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167149 (https://phabricator.wikimedia.org/T398033)
[10:26:07] <Amir1>	 jouncebot: nowandnext
[10:26:07] <jouncebot>	 For the next 0 hour(s) and 33 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1000)
[10:26:08] <jouncebot>	 In 1 hour(s) and 33 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1200)
[10:26:29] <wikibugs>	 (03PS1) 10Clément Goubert: Revert "mw-cron: Disable memory limit" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167177
[10:26:44] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167149 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[10:26:48] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] api-testing: Loosen the assert on max-age header [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167176 (owner: 10Ladsgroup)
[10:26:53] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1167148 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[10:27:07] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1004.eqiad.wmnet
[10:27:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78798 and previous config saved to /var/cache/conftool/dbconfig/20250708-102746-root.json
[10:29:30] <wikibugs>	 (03PS1) 10Marostegui: db1159: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167180 (https://phabricator.wikimedia.org/T398928)
[10:30:01] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db1159: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167180 (https://phabricator.wikimedia.org/T398928) (owner: 10Marostegui)
[10:30:23] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [extensions/FlaggedRevs] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167149 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[10:30:23] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167176 (owner: 10Ladsgroup)
[10:30:23] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [extensions/FlaggedRevs] (wmf/1.45.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1167148 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[10:31:03] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[10:31:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78799 and previous config saved to /var/cache/conftool/dbconfig/20250708-103106-marostegui.json
[10:32:53] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:33:27] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "please note that this will break at least`cache-text/public_cloud_deprecated_api`" [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621) (owner: 10Fabfur)
[10:34:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
[10:34:31] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] varnish: replace X-Public-Cloud with new X-Provenance header check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621) (owner: 10Fabfur)
[10:36:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78800 and previous config saved to /var/cache/conftool/dbconfig/20250708-103645-root.json
[10:37:15] <Emperor>	 !log reboot apus frontends in eqiad T395240
[10:37:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:34] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reboot-cluster
[10:37:53] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:38:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78801 and previous config saved to /var/cache/conftool/dbconfig/20250708-103826-root.json
[10:42:16] <wikibugs>	 (03Merged) 10jenkins-bot: api-testing: Loosen the assert on max-age header [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167176 (owner: 10Ladsgroup)
[10:42:18] <wikibugs>	 (03Merged) 10jenkins-bot: Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167149 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[10:42:21] <wikibugs>	 (03Merged) 10jenkins-bot: Fully get rid of tracking and updating pages [extensions/FlaggedRevs] (wmf/1.45.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1167148 (https://phabricator.wikimedia.org/T398033) (owner: 10Ladsgroup)
[10:42:56] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1167149|Fully get rid of tracking and updating pages (T398033)]], [[gerrit:1167176|api-testing: Loosen the assert on max-age header]], [[gerrit:1167148|Fully get rid of tracking and updating pages (T398033)]]
[10:42:59] <stashbot>	 T398033: Traffic spike on s7 due to heavy update query - https://phabricator.wikimedia.org/T398033
[10:43:47] <wikibugs>	 (03PS1) 10Marostegui: db1175: Remove RBR [puppet] - 10https://gerrit.wikimedia.org/r/1167184
[10:43:58] <wikibugs>	 (03PS1) 10Jcrespo: dbbackups: Upgrade dbprov1005 & dbprov2005 MariaDB package to 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167185 (https://phabricator.wikimedia.org/T394487)
[10:44:22] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-conf1005.eqiad.wmnet
[10:44:29] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db1175: Remove RBR [puppet] - 10https://gerrit.wikimedia.org/r/1167184 (owner: 10Marostegui)
[10:45:07] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1167149|Fully get rid of tracking and updating pages (T398033)]], [[gerrit:1167176|api-testing: Loosen the assert on max-age header]], [[gerrit:1167148|Fully get rid of tracking and updating pages (T398033)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[10:47:00] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with sync
[10:49:29] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1005.eqiad.wmnet
[10:51:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78802 and previous config saved to /var/cache/conftool/dbconfig/20250708-105151-root.json
[10:52:29] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167149|Fully get rid of tracking and updating pages (T398033)]], [[gerrit:1167176|api-testing: Loosen the assert on max-age header]], [[gerrit:1167148|Fully get rid of tracking and updating pages (T398033)]] (duration: 09m 33s)
[10:52:32] <stashbot>	 T398033: Traffic spike on s7 due to heavy update query - https://phabricator.wikimedia.org/T398033
[10:53:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78803 and previous config saved to /var/cache/conftool/dbconfig/20250708-105332-root.json
[10:53:52] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
[10:54:04] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
[10:54:06] <Emperor>	 !log reboot apus frontends in codfw T395240
[10:54:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:12] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reboot-cluster
[10:56:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
[10:56:25] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
[10:58:09] <wikibugs>	 (03CR) 10Fabfur: "ack, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621) (owner: 10Fabfur)
[10:58:20] <wikibugs>	 (03PS11) 10Fabfur: varnish: replace X-Public-Cloud with new X-Provenance header check [puppet] - 10https://gerrit.wikimedia.org/r/1159995 (https://phabricator.wikimedia.org/T396621)
[11:00:17] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
[11:00:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
[11:02:50] <sukhe>	 4
[11:03:40] <logmsgbot>	 !log jynus@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet,db1216.eqiad.wmnet with reason: MariaDB package update
[11:04:03] <logmsgbot>	 !log jmm@cumin1002 START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
[11:06:20] <logmsgbot>	 !log jmm@cumin1002 END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
[11:06:30] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
[11:06:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78805 and previous config saved to /var/cache/conftool/dbconfig/20250708-110656-root.json
[11:07:08] <logmsgbot>	 !log jmm@cumin1002 START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
[11:07:46] <wikibugs>	 (03PS1) 10Majavah: openstack::patch: Disable fuzzing patch locations [puppet] - 10https://gerrit.wikimedia.org/r/1167189
[11:08:20] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] cache::haproxy: Fix requestctl= sanitization [puppet] - 10https://gerrit.wikimedia.org/r/1166775 (https://phabricator.wikimedia.org/T397917) (owner: 10Vgutierrez)
[11:08:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78806 and previous config saved to /var/cache/conftool/dbconfig/20250708-110838-root.json
[11:09:27] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
[11:09:54] <wikibugs>	 (03CR) 10Majavah: [C:03+2] openstack::patch: Disable fuzzing patch locations [puppet] - 10https://gerrit.wikimedia.org/r/1167189 (owner: 10Majavah)
[11:10:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
[11:11:20] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Upgrade db1216 & db2201 MariaDB package to 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167173 (https://phabricator.wikimedia.org/T398928)
[11:13:25] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] mariadb: Upgrade db1216 & db2201 MariaDB package to 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167173 (https://phabricator.wikimedia.org/T398928) (owner: 10Jcrespo)
[11:15:16] <moritzm>	 !log restarting slapd on seaborgium/serpens to pick up OpenSSL updates
[11:15:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:00] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Netbox: remove old cr2-codfw Switch Control Board inventory items - https://phabricator.wikimedia.org/T398940 (10cmooney) 03NEW p:05Triage→03Medium
[11:19:25] <zabe>	 jouncebot: nowandnext
[11:19:25] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 40 minute(s)
[11:19:25] <jouncebot>	 In 0 hour(s) and 40 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1200)
[11:20:20] <wikibugs>	 (03CR) 10Zabe: [C:03+2] Remove redundant group0 config for categorylinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167162 (owner: 10Zabe)
[11:20:21] <wikibugs>	 (03CR) 10Zabe: [C:03+2] Set categorylinks to read new in cebwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167164 (https://phabricator.wikimedia.org/T397912) (owner: 10Zabe)
[11:20:37] <jynus>	 !log upgrade db1216 mariadb package T394487
[11:20:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:39] <stashbot>	 T394487: Migrate backup sources to MariaDB 10.11 - https://phabricator.wikimedia.org/T394487
[11:21:10] <wikibugs>	 (03Merged) 10jenkins-bot: Remove redundant group0 config for categorylinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167162 (owner: 10Zabe)
[11:21:12] <wikibugs>	 (03Merged) 10jenkins-bot: Set categorylinks to read new in cebwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167164 (https://phabricator.wikimedia.org/T397912) (owner: 10Zabe)
[11:22:03] <logmsgbot>	 !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1167162|Remove redundant group0 config for categorylinks]], [[gerrit:1167164|Set categorylinks to read new in cebwiki (T397912)]]
[11:22:06] <stashbot>	 T397912: Set categorylinks to read new - https://phabricator.wikimedia.org/T397912
[11:23:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1159 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78807 and previous config saved to /var/cache/conftool/dbconfig/20250708-112344-root.json
[11:24:00] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
[11:24:08] <logmsgbot>	 !log zabe@deploy1003 zabe: Backport for [[gerrit:1167162|Remove redundant group0 config for categorylinks]], [[gerrit:1167164|Set categorylinks to read new in cebwiki (T397912)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[11:25:59] <logmsgbot>	 !log zabe@deploy1003 zabe: Continuing with sync
[11:27:01] <wikibugs>	 (03PS1) 10Clément Goubert: check_user: Use deploy instead of mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/1167195 (https://phabricator.wikimedia.org/T397017)
[11:27:04] <wikibugs>	 (03PS1) 10Clément Goubert: mwaint: Remove from scap [puppet] - 10https://gerrit.wikimedia.org/r/1167196 (https://phabricator.wikimedia.org/T397017)
[11:27:06] <wikibugs>	 (03PS1) 10Clément Goubert: mwmaint: deprecate mwmaint servers [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017)
[11:27:20] <logmsgbot>	 !log jmm@cumin1002 END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
[11:27:24] <moritzm>	 !log restarting apache on mirror1001 to pick up openssl sec updates
[11:27:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:44] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: sync
[11:29:50] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: sync
[11:31:39] <logmsgbot>	 !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167162|Remove redundant group0 config for categorylinks]], [[gerrit:1167164|Set categorylinks to read new in cebwiki (T397912)]] (duration: 09m 35s)
[11:31:42] <stashbot>	 T397912: Set categorylinks to read new - https://phabricator.wikimedia.org/T397912
[11:33:49] <wikibugs>	 (03CR) 10Muehlenhoff: "I think you can simply ignore this script; it's already broken and we'll most likely just remove it: https://phabricator.wikimedia.org/T39" [puppet] - 10https://gerrit.wikimedia.org/r/1167195 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:34:54] <wikibugs>	 (03CR) 10Clément Goubert: "Ack." [puppet] - 10https://gerrit.wikimedia.org/r/1167195 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:35:02] <hashar>	 !log Restarted Apache on gerrit1003 and gerrit2002
[11:35:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:09] <wikibugs>	 (03Abandoned) 10Clément Goubert: check_user: Use deploy instead of mwmaint [puppet] - 10https://gerrit.wikimedia.org/r/1167195 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:35:12] <wikibugs>	 (03CR) 10Muehlenhoff: mwmaint: deprecate mwmaint servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:35:18] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] mwaint: Remove from scap [puppet] - 10https://gerrit.wikimedia.org/r/1167196 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:35:44] <logmsgbot>	 !log btullis@cumin1003 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet
[11:36:13] <logmsgbot>	 !log jmm@cumin1002 START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
[11:36:16] <icinga-wm>	 PROBLEM - MariaDB Replica IO: matomo on db1208 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:36:18] <icinga-wm>	 PROBLEM - mysqld processes on db1208 is CRITICAL: PROCS CRITICAL: 1 process with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[11:36:18] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: matomo on db1208 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:36:18] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: matomo on db1208 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:36:18] <icinga-wm>	 PROBLEM - MariaDB read only matomo on db1208 is CRITICAL: Could not connect to localhost:3351 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:36:26] <wikibugs>	 (03PS2) 10Clément Goubert: mwmaint: deprecate mwmaint servers [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017)
[11:36:26] <marostegui>	 btullis: ^
[11:36:36] <wikibugs>	 (03CR) 10Clément Goubert: mwmaint: deprecate mwmaint servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:37:04] <logmsgbot>	 !log jmm@cumin1002 END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
[11:37:43] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] Remove l10nupdate manifests [puppet] - 10https://gerrit.wikimedia.org/r/928582 (owner: 10Majavah)
[11:38:05] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] mwmaint: deprecate mwmaint servers [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:39:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:39:16] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] Revert "mw-cron: Disable memory limit" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167177 (owner: 10Clément Goubert)
[11:39:18] <icinga-wm>	 RECOVERY - mysqld processes on db1208 is OK: PROCS OK: 2 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[11:39:18] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: matomo on db1208 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:39:20] <icinga-wm>	 RECOVERY - MariaDB read only matomo on db1208 is OK: Version 10.6.18-MariaDB-log, Uptime 26s, read_only: True, event_scheduler: True, 11.22 QPS, connection latency: 0.033039s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[11:39:20] <btullis>	 Apologies for the noise re db1208 - that was me.
[11:39:43] <wikibugs>	 (03PS1) 10Vgutierrez: Revert "cache,haproxy: Remove http response captures" [puppet] - 10https://gerrit.wikimedia.org/r/1167200
[11:39:44] <jynus>	 !log upgrade db2201 mariadb package T394487
[11:39:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:46] <stashbot>	 T394487: Migrate backup sources to MariaDB 10.11 - https://phabricator.wikimedia.org/T394487
[11:39:58] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "cache,haproxy: Remove http response captures" [puppet] - 10https://gerrit.wikimedia.org/r/1167200 (owner: 10Vgutierrez)
[11:40:16] <icinga-wm>	 RECOVERY - MariaDB Replica IO: matomo on db1208 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:40:18] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: matomo on db1208 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:40:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C:04-1] "Actually, I forgot one thing: These are still on Puppet 5, and the insetup roles default to Puppet 7, so instead we'll need to set these t" [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:41:36] <wikibugs>	 (03PS3) 10Clément Goubert: mwmaint: deprecate mwmaint servers [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017)
[11:41:57] <wikibugs>	 (03CR) 10Clément Goubert: "Done." [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:42:13] <logmsgbot>	 !log jmm@cumin1002 START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
[11:43:05] <wikibugs>	 (03PS2) 10Vgutierrez: Revert "cache,haproxy: Remove http response captures" [puppet] - 10https://gerrit.wikimedia.org/r/1167200 (https://phabricator.wikimedia.org/T397917)
[11:43:56] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] api-gateway: use ratelimit's inbuilt promethus-statsd agent [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166790 (https://phabricator.wikimedia.org/T388804) (owner: 10Hnowlan)
[11:44:18] <wikibugs>	 (03CR) 10KartikMistry: [C:03+2] machinetranslation: Use s3 for model download in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166543 (https://phabricator.wikimedia.org/T335491) (owner: 10KartikMistry)
[11:44:39] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
[11:45:50] <wikibugs>	 (03Merged) 10jenkins-bot: api-gateway: use ratelimit's inbuilt promethus-statsd agent [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166790 (https://phabricator.wikimedia.org/T388804) (owner: 10Hnowlan)
[11:46:05] <wikibugs>	 (03Merged) 10jenkins-bot: machinetranslation: Use s3 for model download in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1166543 (https://phabricator.wikimedia.org/T335491) (owner: 10KartikMistry)
[11:46:19] <wikibugs>	 (03CR) 10Hashar: gerrit: avoid hardcoded hostnames, replace with hiera lookups (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1129920 (https://phabricator.wikimedia.org/T387833) (owner: 10Dzahn)
[11:47:05] <wikibugs>	 (03CR) 10Vgutierrez: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6195/co" [puppet] - 10https://gerrit.wikimedia.org/r/1167200 (https://phabricator.wikimedia.org/T397917) (owner: 10Vgutierrez)
[11:49:08] <moritzm>	 !log restarting exim on Phabricator nodes to pick up OpenSSL updates
[11:49:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:24] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
[11:49:59] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] Add the new cephosd200[1-3] servers in codfw to their role [puppet] - 10https://gerrit.wikimedia.org/r/1166866 (https://phabricator.wikimedia.org/T374923) (owner: 10Btullis)
[11:51:03] <wikibugs>	 (03CR) 10Hashar: gerrit: config replicas for rename-project plugin (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1165832 (https://phabricator.wikimedia.org/T239693) (owner: 10Hashar)
[11:51:17] <wikibugs>	 (03PS4) 10Hashar: gerrit: config replicas for rename-project plugin [puppet] - 10https://gerrit.wikimedia.org/r/1165832 (https://phabricator.wikimedia.org/T239693)
[11:52:30] <moritzm>	 !log restarting FPM on Phabricator nodes to pick up OpenSSL updates
[11:52:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:33] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
[11:52:53] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
[11:52:53] <wikibugs>	 (03PS1) 10Vgutierrez: cache::haproxy: Replace res.hdr() with res.fhdr() [puppet] - 10https://gerrit.wikimedia.org/r/1167203 (https://phabricator.wikimedia.org/T397917)
[11:52:58] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
[11:53:21] <wikibugs>	 (03CR) 10Btullis: [V:03+1 C:03+2] Add the new cephosd200[1-3] servers in codfw to their role [puppet] - 10https://gerrit.wikimedia.org/r/1166866 (https://phabricator.wikimedia.org/T374923) (owner: 10Btullis)
[11:54:26] <logmsgbot>	 !log btullis@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cephosd[2001-2003].codfw.wmnet with reason: Bootstrapping new ceph cluster
[11:54:52] <wikibugs>	 (03PS2) 10Vgutierrez: cache::haproxy: Replace hdr() with fhdr() [puppet] - 10https://gerrit.wikimedia.org/r/1167203 (https://phabricator.wikimedia.org/T397917)
[11:56:47] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1167203 (https://phabricator.wikimedia.org/T397917) (owner: 10Vgutierrez)
[11:57:33] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1167197 (https://phabricator.wikimedia.org/T397017) (owner: 10Clément Goubert)
[11:59:32] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
[11:59:54] <wikibugs>	 (03CR) 10FNegri: [C:03+1] "Let's keep it around for now, it might be useful for T381587." [puppet] - 10https://gerrit.wikimedia.org/r/989542 (owner: 10Majavah)
[12:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1200)
[12:00:08] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
[12:00:31] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] START helmfile.d/services/api-gateway: apply
[12:00:42] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] DONE helmfile.d/services/api-gateway: apply
[12:00:59] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+1] "LGTM; I would've fixed only X-analytics in this patch, to reduce the risk, but proceed as you prefer." [puppet] - 10https://gerrit.wikimedia.org/r/1167203 (https://phabricator.wikimedia.org/T397917) (owner: 10Vgutierrez)
[12:01:10] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
[12:01:49] <wikibugs>	 (03PS3) 10Majavah: P:wmcs::db: querysampler: cleanup [puppet] - 10https://gerrit.wikimedia.org/r/989542
[12:02:25] <wikibugs>	 (03CR) 10FNegri: [C:03+1] P:wmcs::db: querysampler: cleanup [puppet] - 10https://gerrit.wikimedia.org/r/989542 (owner: 10Majavah)
[12:02:42] <logmsgbot>	 !log jmm@cumin1002 END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
[12:03:04] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-eqiad
[12:03:55] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] cache::haproxy: Replace hdr() with fhdr() [puppet] - 10https://gerrit.wikimedia.org/r/1167203 (https://phabricator.wikimedia.org/T397917) (owner: 10Vgutierrez)
[12:04:08] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-eqiad
[12:06:05] <wikibugs>	 (03CR) 10Majavah: [C:03+2] P:wmcs::db: querysampler: cleanup [puppet] - 10https://gerrit.wikimedia.org/r/989542 (owner: 10Majavah)
[12:06:08] <wikibugs>	 14SRE-Sprint-Week-Sustainability-March2023, 06DBA, 13Patch-For-Review, 10Sustainability (Incident Followup): Automatically compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253#10983399 (10Ladsgroup) The above patch needs reworking to take advantage of tables cat...
[12:06:25] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[12:06:34] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] START helmfile.d/services/api-gateway: apply
[12:06:46] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] DONE helmfile.d/services/api-gateway: apply
[12:06:54] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[12:08:59] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
[12:09:04] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
[12:10:16] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/api-gateway: apply
[12:10:35] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
[12:10:39] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3414 MB (3% inode=98%): /tmp 3414 MB (3% inode=98%): /var/tmp 3414 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[12:11:13] <wikibugs>	 (03CR) 10Jcrespo: "Are you aware of the details of db-compare? In addition to ids, sometimes an --order-by is needed as it may produce false positives due to" [puppet] - 10https://gerrit.wikimedia.org/r/979390 (https://phabricator.wikimedia.org/T207253) (owner: 10Ladsgroup)
[12:12:22] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/api-gateway: apply
[12:12:41] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
[12:20:01] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bookworm
[12:20:15] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.move-vlan for host cephosd2001
[12:20:48] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bookworm
[12:20:58] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to LogStash for DSantamaria (IDP) - https://phabricator.wikimedia.org/T398956 (10DSantamaria) 03NEW
[12:21:02] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.move-vlan for host cephosd2002
[12:21:22] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bookworm
[12:21:28] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[12:21:37] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.move-vlan for host cephosd2003
[12:21:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-registry.service on registry2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:24:49] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cephosd2001 - btullis@cumin1003"
[12:24:53] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cephosd2001 - btullis@cumin1003"
[12:24:53] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:24:53] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache cephosd2001.codfw.wmnet 133.0.192.10.in-addr.arpa 3.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:24:56] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2001.codfw.wmnet 133.0.192.10.in-addr.arpa 3.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:24:57] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cephosd2001
[12:25:04] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[12:25:39] <logmsgbot>	 !log mvernon@cumin1003 START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
[12:26:04] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
[12:26:06] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to LogStash for DSantamaria (IDP) - https://phabricator.wikimedia.org/T398956#10983504 (10MoritzMuehlenhoff) Access to Logstash is handled via Wikimedia IDM, please see https://wikitech.wikimedia.org/wiki/SRE/LDAP/Groups/Request_access for details
[12:26:13] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[12:26:24] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[12:26:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2001
[12:26:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2001
[12:27:41] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:27:41] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache cephosd2003.codfw.wmnet 240.48.192.10.in-addr.arpa 0.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:27:44] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2003.codfw.wmnet 240.48.192.10.in-addr.arpa 0.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:27:44] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cephosd2003
[12:28:11] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[12:28:56] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2003
[12:28:56] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2003
[12:30:42] <logmsgbot>	 !log mvernon@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
[12:30:46] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:30:46] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache cephosd2002.codfw.wmnet 235.32.192.10.in-addr.arpa 5.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:30:50] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2002.codfw.wmnet 235.32.192.10.in-addr.arpa 5.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:30:50] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cephosd2002
[12:31:09] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
[12:31:34] <logmsgbot>	 !log mvernon@cumin1003 START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
[12:32:05] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reboot-single for host apus-be2004.codfw.wmnet
[12:32:23] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2002
[12:32:23] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2002
[12:33:10] <wikibugs>	 (03PS1) 10Btullis: Add the new dse-k8s hosts to site.pp so that we can create the VMs [puppet] - 10https://gerrit.wikimedia.org/r/1167209 (https://phabricator.wikimedia.org/T397293)
[12:34:52] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1167209 (https://phabricator.wikimedia.org/T397293) (owner: 10Btullis)
[12:36:55] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Add the new dse-k8s hosts to site.pp so that we can create the VMs [puppet] - 10https://gerrit.wikimedia.org/r/1167209 (https://phabricator.wikimedia.org/T397293) (owner: 10Btullis)
[12:38:01] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2004.codfw.wmnet
[12:38:10] <logmsgbot>	 !log mvernon@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
[12:39:43] <logmsgbot>	 !log mvernon@cumin1003 START - Cookbook sre.hosts.reboot-single for host apus-be1004.eqiad.wmnet
[12:40:17] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
[12:40:38] <moritzm>	 !log installing commons-beanutils security updates
[12:40:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:46] <logmsgbot>	 !log mvernon@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1004.eqiad.wmnet
[12:44:41] <logmsgbot>	 !log mvernon@cumin1003 START - Cookbook sre.hosts.reboot-single for host moss-be1001.eqiad.wmnet
[12:45:46] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
[12:46:44] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2001.codfw.wmnet
[12:46:45] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[12:46:49] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
[12:48:47] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
[12:49:50] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[12:49:54] <logmsgbot>	 !log mvernon@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1001.eqiad.wmnet
[12:49:56] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[12:50:04] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [eqiad] START helmfile.d/services/mobileapps: apply
[12:50:43] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
[12:51:22] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
[12:51:32] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [codfw] START helmfile.d/services/mobileapps: apply
[12:51:45] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
[12:52:26] <logmsgbot>	 btullis@cumin1003 makevm (PID 816983) is awaiting input
[12:52:29] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
[12:52:44] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
[12:54:28] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
[12:54:37] <wikibugs>	 (03PS2) 10Jcrespo: dbbackups: Upgrade dbprov1005 & dbprov2005 MariaDB package to 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167185 (https://phabricator.wikimedia.org/T394487)
[12:54:44] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
[12:54:49] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
[12:54:49] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:54:49] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2001.codfw.wmnet on all recursors
[12:54:52] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2001.codfw.wmnet on all recursors
[12:55:16] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
[12:55:21] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
[12:56:27] <moritzm>	 !log installing ICU security updates
[12:56:27] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
[12:56:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:39] <moritzm>	 !log installing ICU security updates on Bookworm
[12:56:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:30] <icinga-wm>	 PROBLEM - Host cephosd2002 is DOWN: PING CRITICAL - Packet loss = 100%
[12:58:21] <logmsgbot>	 btullis@cumin1003 makevm (PID 816983) is awaiting input
[12:59:16] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
[13:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: Your horoscope predicts another UTC afternoon backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1300).
[13:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[13:02:33] <icinga-wm>	 RECOVERY - Host cephosd2002 is UP: PING OK - Packet loss = 0%, RTA = 30.32 ms
[13:04:38] <logmsgbot>	 !log jynus@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbprov2005.codfw.wmnet,dbprov1005.eqiad.wmnet with reason: MariaDB package update
[13:07:29] <wikibugs>	 (03PS1) 10CDobbins: varnish: selectively increase NetworkProbeLimit [puppet] - 10https://gerrit.wikimedia.org/r/1167215
[13:07:44] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] dbbackups: Upgrade dbprov1005 & dbprov2005 MariaDB package to 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167185 (https://phabricator.wikimedia.org/T394487) (owner: 10Jcrespo)
[13:10:48] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Roll-forward again [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1167216
[13:11:09] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Roll-forward again [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1167216 (owner: 10Giuseppe Lavagetto)
[13:12:21] <wikibugs>	 (03PS1) 10Ladsgroup: Set purge values for parsercache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167217 (https://phabricator.wikimedia.org/T398806)
[13:14:53] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2001.codfw.wmnet with OS bookworm
[13:14:56] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Revert logging changes [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1167079 (owner: 10Giuseppe Lavagetto)
[13:17:10] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Do not log rate-limiting rules if it wouldn\'t be applied - oblivian@cumin1003"
[13:17:11] <logmsgbot>	 !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Do not log rate-limiting rules if it wouldn\'t be applied - oblivian@cumin1003
[13:17:47] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Do not log rate-limiting rules if it wouldn\'t be applied - oblivian@cumin1003
[13:17:48] <logmsgbot>	 !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Do not log rate-limiting rules if it wouldn\'t be applied - oblivian@cumin1003"
[13:18:03] <moritzm>	 !log restarting Postfix on mx* and crm2001 to pick up ICU security updates
[13:18:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:31] <moritzm>	 !log restart clamav on VRTS to pick up ICU security updates
[13:20:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:43] <logmsgbot>	 !log cmooney@cumin2002 START - Cookbook sre.dns.netbox
[13:23:41] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:23:58] <logmsgbot>	 !log cmooney@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new ML mega-hosts in eqiad - cmooney@cumin2002"
[13:24:31] <icinga-wm>	 RECOVERY - OSPF status on cr1-drmrs is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:25:11] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.11 point update - https://phabricator.wikimedia.org/T394489#10983753 (10MoritzMuehlenhoff)
[13:26:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[13:26:22] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] varnish: Prevent unknown clients from reaching /evt-103e/v2/events [puppet] - 10https://gerrit.wikimedia.org/r/1167151 (https://phabricator.wikimedia.org/T398181) (owner: 10Vgutierrez)
[13:26:39] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[13:27:03] <logmsgbot>	 cmooney@cumin2002 netbox (PID 2989694) is awaiting input
[13:27:56] <logmsgbot>	 !log cmooney@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new ML mega-hosts in eqiad - cmooney@cumin2002"
[13:27:57] <logmsgbot>	 !log cmooney@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:27:58] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations, 06Security-Team: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921#10983772 (10Arendpieter)
[13:29:01] <moritzm>	 !llog installing rsync security updates
[13:30:58] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10983794 (10cmooney)
[13:32:13] <wikibugs>	 (03Abandoned) 10Vgutierrez: Revert "cache,haproxy: Remove http response captures" [puppet] - 10https://gerrit.wikimedia.org/r/1167200 (https://phabricator.wikimedia.org/T397917) (owner: 10Vgutierrez)
[13:32:46] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10983806 (10cmooney) Updated task description there.  I ran the  Provision a server Netbox script for ml-serve1012, ml-serve1013 and ml-serve1014 just now, as well as t...
[13:32:48] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10983807 (10Jclark-ctr) @cmooney  thanks for assisting with network ports
[13:34:19] <wikibugs>	 06SRE, 06Traffic, 05FY2025-26 WE3.3 Engaging core audiences: [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#10983824 (10Jdrewniak)
[13:35:01] <wikibugs>	 06SRE, 06Traffic, 05FY2025-26 WE3.3 Engaging core audiences: [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#10983830 (10Jdrewniak)
[13:37:46] <wikibugs>	 (03PS1) 10Ssingh: cookbook: add sre.cdn.roll-restart-haproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222
[13:38:07] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] varnish: Prevent unknown clients from reaching /evt-103e/v2/events [puppet] - 10https://gerrit.wikimedia.org/r/1167151 (https://phabricator.wikimedia.org/T398181) (owner: 10Vgutierrez)
[13:40:14] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to LogStash for DSantamaria (IDP) - https://phabricator.wikimedia.org/T398956#10983916 (10DSantamaria) 05Open→03Resolved a:03DSantamaria Thanks!
[13:40:44] <moritzm>	 !log installing werkzeug security updates
[13:40:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:46] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "tested with ` test-cookbook -d -c 1167222 sre.cdn.roll-restart-haproxy --alias cp-ulsfo_upload --reason 'OpenSSL update'` and `test-cookbo" [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222 (owner: 10Ssingh)
[13:45:41] <wikibugs>	 (03CR) 10Ssingh: [V:03+2 C:03+2] cookbook: add sre.cdn.roll-restart-haproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222 (owner: 10Ssingh)
[13:47:41] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] gerrit: avoid hardcoded hostnames, replace with hiera lookups (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1129920 (https://phabricator.wikimedia.org/T387833) (owner: 10Dzahn)
[13:48:59] <wikibugs>	 (03CR) 10Ssingh: [V:03+2 C:03+2] "The +2 was not intentional and a mistake on my part. CI has now finished running." [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222 (owner: 10Ssingh)
[13:49:31] <wikibugs>	 (03CR) 10Ssingh: [V:03+1 C:03+2] cookbook: add sre.cdn.roll-restart-haproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222 (owner: 10Ssingh)
[13:49:37] <wikibugs>	 (03CR) 10Ssingh: [V:03+1 C:03+2] "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222 (owner: 10Ssingh)
[13:50:38] <logmsgbot>	 !log arnaudb@cumin1003 START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
[13:50:40] <logmsgbot>	 !log arnaudb@cumin1003 END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
[13:52:37] <wikibugs>	 (03Merged) 10jenkins-bot: cookbook: add sre.cdn.roll-restart-haproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/1167222 (owner: 10Ssingh)
[13:53:14] <logmsgbot>	 !log arnaudb@cumin1003 START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
[13:53:18] <logmsgbot>	 !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
[13:53:25] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp4043 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[13:53:52] <sukhe>	 ^ vgutierrez, is this you and the recent change?
[13:53:56] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2222.mgmt:22 - https://phabricator.wikimedia.org/T398577#10984003 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm pings. D3 breaker reset
[13:54:10] <vgutierrez>	 hmmm no
[13:54:17] <sukhe>	 interesting
[13:54:19] <vgutierrez>	 that sounds like _joe_/fabfur 
[13:54:20] <sukhe>	 let's look
[13:54:21] <vgutierrez>	 but let me check
[13:54:30] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for restbase2035.mgmt:22 - https://phabricator.wikimedia.org/T398576#10984009 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[13:54:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for db2214.mgmt:22 - https://phabricator.wikimedia.org/T398575#10984015 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[13:55:09] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2227.mgmt:22 - https://phabricator.wikimedia.org/T398574#10984023 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings
[13:55:09] <vgutierrez>	 oh.. reload vcl
[13:55:10] <vgutierrez>	 that's me
[13:55:31] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2225.mgmt:22 - https://phabricator.wikimedia.org/T398566#10984029 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings
[13:55:44] <wikibugs>	 (03PS1) 10Xcollazo: analytics: Absent rsync scripts that import Dumps 1 XML into HDFS [puppet] - 10https://gerrit.wikimedia.org/r/1167224 (https://phabricator.wikimedia.org/T396031)
[13:55:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2223.mgmt:22 - https://phabricator.wikimedia.org/T398558#10984036 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[13:56:43] <vgutierrez>	 sukhe: I might need some coffee but I don't see the problem on cp4043
[13:56:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for restbase2038.mgmt:22 - https://phabricator.wikimedia.org/T398555#10984045 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[13:58:21] <_joe_>	 sukhe: confd seems to think everything's fine by itself
[13:58:28] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2218.mgmt:22 - https://phabricator.wikimedia.org/T398554#10984054 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[13:58:42] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for maps2008.mgmt:22 - https://phabricator.wikimedia.org/T398553#10984058 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[13:58:49] <vgutierrez>	 where is that alert coming from?
[13:59:03] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for mc-misc2002.mgmt:22 - https://phabricator.wikimedia.org/T398552#10984062 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[13:59:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2226.mgmt:22 - https://phabricator.wikimedia.org/T398551#10984066 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset.
[13:59:29] <vgutierrez>	 that's confd_resource_healthy...
[13:59:44] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2193.mgmt:22 - https://phabricator.wikimedia.org/T398550#10984072 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[13:59:57] <denisse>	 !incidents
[13:59:58] <sirenbot>	 6459 (RESOLVED)  [3x] ATSBackendErrorsHigh cache_text sre (eventgate-analytics-external.discovery.wmnet)
[13:59:58] <sirenbot>	 6458 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (eventgate-analytics-external.discovery.wmnet eqsin)
[13:59:58] <sirenbot>	 6457 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (eventgate-analytics-external.discovery.wmnet eqsin)
[14:00:04] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2319.mgmt:22 - https://phabricator.wikimedia.org/T398549#10984079 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[14:00:23] <vgutierrez>	 `confd_vcl_reload_success 0`
[14:00:40] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2216.mgmt:22 - https://phabricator.wikimedia.org/T398548#10984083 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[14:00:51] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2192.mgmt:22 - https://phabricator.wikimedia.org/T398547#10984090 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[14:00:52] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] tox.ini: skip Python 3.10 in CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167081 (owner: 10Volans)
[14:01:05] <vgutierrez>	 so that comes from confd-reload-vcl.sh
[14:01:13] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for conf2006.mgmt:22 - https://phabricator.wikimedia.org/T398546#10984094 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[14:01:17] <vgutierrez>	 _joe_: latest requestctl commit triggered that error for some reason?
[14:01:43] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2220.mgmt:22 - https://phabricator.wikimedia.org/T398545#10984099 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[14:02:20] <vgutierrez>	 crap... meeting :D
[14:02:30] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for arclamp2001.mgmt:22 - https://phabricator.wikimedia.org/T398543#10984117 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[14:02:33] <sukhe>	 same :| 
[14:02:34] <sukhe>	 will be back in 15
[14:02:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for gerrit2002.mgmt:22 - https://phabricator.wikimedia.org/T398542#10984121 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[14:03:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for cirrussearch2115.mgmt:22 - https://phabricator.wikimedia.org/T398541#10984128 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[14:03:50] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2200.mgmt:22 - https://phabricator.wikimedia.org/T398540#10984132 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[14:04:07] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for db2152.mgmt:22 - https://phabricator.wikimedia.org/T398539#10984136 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[14:04:21] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for thanos-fe2007.mgmt:22 - https://phabricator.wikimedia.org/T398538#10984140 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm rebooted mgmt switch in D8. pings.
[14:05:10] <wikibugs>	 10ops-codfw, 06SRE, 06DBA, 06DC-Ops: PSU issue on db2213 - https://phabricator.wikimedia.org/T398537#10984148 (10Jhancock.wm) 05Open→03Resolved breaker reset. alert cleared.
[14:05:31] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Unresponsive management for wikikube-worker2318.mgmt:22 - https://phabricator.wikimedia.org/T398536#10984162 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. pings.
[14:11:08] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Power Supply - PS2 Status - issue on wikikube-worker2320:9290 - https://phabricator.wikimedia.org/T398514#10984187 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm D3 breaker reset. alert cleared.
[14:18:11] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2001.codfw.wmnet with reason: host reimage
[14:21:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2001.codfw.wmnet with reason: host reimage
[14:22:29] <wikibugs>	 (03PS1) 10Vgutierrez: cdn.roll-upgrade-haproxy: Run puppet and then restart haproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/1167229
[14:22:45] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10984249 (10Marostegui)
[14:23:53] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2002.codfw.wmnet
[14:23:54] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[14:24:57] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10decommission-hardware: decommission ganeti2019 / ganeti2020 - https://phabricator.wikimedia.org/T398671#10984256 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[14:25:16] <wikibugs>	 (03PS1) 10Marostegui: db1185: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167231 (https://phabricator.wikimedia.org/T398928)
[14:25:48] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db1185: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1167231 (https://phabricator.wikimedia.org/T398928) (owner: 10Marostegui)
[14:26:08] <logmsgbot>	 !log btullis@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[14:26:26] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[14:26:27] <jinxer-wm>	 FIRING: ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:26:32] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
[14:26:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1185 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78811 and previous config saved to /var/cache/conftool/dbconfig/20250708-142635-marostegui.json
[14:26:55] <wikibugs>	 (03PS1) 10Arnaudb: gerrit: standardize expected rc on systemctl check [cookbooks] - 10https://gerrit.wikimedia.org/r/1167226 (https://phabricator.wikimedia.org/T387833)
[14:26:55] <wikibugs>	 (03CR) 10Arnaudb: "You can test this patch with:" [cookbooks] - 10https://gerrit.wikimedia.org/r/1167226 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb)
[14:27:46] <wikibugs>	 10ops-codfw, 06SRE, 06cloud-services-team, 06DC-Ops, 10decommission-hardware: decommission cloudcephosd2003-dev.codfw.wmnet - https://phabricator.wikimedia.org/T397979#10984285 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[14:28:41] <logmsgbot>	 !log btullis@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[14:30:05] <jouncebot>	 Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1430)
[14:30:08] <wikibugs>	 (03PS1) 10Muehlenhoff: Rebuild against latest package versions in bookworm [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1167233
[14:31:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:31:41] <logmsgbot>	 btullis@cumin1003 makevm (PID 834139) is awaiting input
[14:34:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78812 and previous config saved to /var/cache/conftool/dbconfig/20250708-143422-root.json
[14:34:58] <jinxer-wm>	 FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[14:36:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:36:35] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bookworm
[14:36:35] <wikibugs>	 (03PS3) 10Hnowlan: hcaptcha: initial commit for proxy config [puppet] - 10https://gerrit.wikimedia.org/r/1164432 (https://phabricator.wikimedia.org/T397841) (owner: 10Kamila Součková)
[14:39:07] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2001.codfw.wmnet with OS bookworm
[14:39:07] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2001.codfw.wmnet
[14:41:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:41:30] <moritzm>	 !log installing shadow security updates
[14:41:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:38] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Fair enough!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1167229 (owner: 10Vgutierrez)
[14:41:52] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] cdn.roll-upgrade-haproxy: Run puppet and then restart haproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/1167229 (owner: 10Vgutierrez)
[14:44:14] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] Add fake hcaptcha proxy secrets. [labs/private] - 10https://gerrit.wikimedia.org/r/1155221 (https://phabricator.wikimedia.org/T397841) (owner: 10Kamila Součková)
[14:44:22] <wikibugs>	 (03CR) 10Hnowlan: [V:03+2 C:03+2] Add fake hcaptcha proxy secrets. [labs/private] - 10https://gerrit.wikimedia.org/r/1155221 (https://phabricator.wikimedia.org/T397841) (owner: 10Kamila Součková)
[14:45:44] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2003.codfw.wmnet
[14:45:46] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[14:45:48] <wikibugs>	 (03CR) 10Hnowlan: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1164432 (https://phabricator.wikimedia.org/T397841) (owner: 10Kamila Součková)
[14:46:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:46:29] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] "Thank you!" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1167233 (owner: 10Muehlenhoff)
[14:46:31] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp4043 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[14:46:38] <vgutierrez>	 sukhe: ^^
[14:47:01] <sukhe>	 :D
[14:47:23] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5017,5025].eqsin.wmnet} and A:cp - 2.8.15 upgrade (T398720)
[14:47:26] <stashbot>	 T398720: Upgrade to haproxy 2.8.15 - https://phabricator.wikimedia.org/T398720
[14:48:20] <wikibugs>	 (03CR) 10Herron: "Yes, it's based on the typical run time and with understanding that sometimes the agent may hit the lock and wait another 5m.  Aiming for a" [puppet] - 10https://gerrit.wikimedia.org/r/1166846 (https://phabricator.wikimedia.org/T398444) (owner: 10Herron)
[14:49:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78813 and previous config saved to /var/cache/conftool/dbconfig/20250708-144928-root.json
[14:49:58] <jinxer-wm>	 RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[14:49:59] <pmiazga>	 !log Ran fixStuckGlobalRename.php for T398837
[14:50:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:01] <stashbot>	 T398837: Unblock stuck global rename of Princekng1425 - https://phabricator.wikimedia.org/T398837
[14:50:19] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
[14:50:23] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
[14:50:23] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:50:23] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2003.codfw.wmnet on all recursors
[14:50:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2003.codfw.wmnet on all recursors
[14:50:30] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[14:50:47] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
[14:50:56] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
[14:51:00] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Rebuild against latest package versions in bookworm [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1167233 (owner: 10Muehlenhoff)
[14:51:15] <wikibugs>	 (03CR) 10Fabfur: [C:03+1] "so long mwdebug!" [puppet] - 10https://gerrit.wikimedia.org/r/1164207 (https://phabricator.wikimedia.org/T397498) (owner: 10Effie Mouzeli)
[14:51:30] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] "looks good, the empty string as a signal to create a noop request feels a little odd, but I don't have a great alternative suggestion." [software/spicerack] - 10https://gerrit.wikimedia.org/r/1154787 (owner: 10Volans)
[14:52:00] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2003.codfw.wmnet with OS bookworm
[14:53:06] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:53:06] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2002.codfw.wmnet on all recursors
[14:53:09] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2002.codfw.wmnet on all recursors
[14:53:32] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2002.codfw.wmnet - btullis@cumin1003"
[14:53:34] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-magru
[14:55:49] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-ulsfo
[14:56:37] <logmsgbot>	 btullis@cumin1003 makevm (PID 834139) is awaiting input
[14:57:33] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2002.codfw.wmnet - btullis@cumin1003"
[14:58:00] <wikibugs>	 (03CR) 10Volans: [C:03+2] "Ack, thanks. Yes that's what we came out with Luca in wmflib as a way to allow a more ease use on the spicerack side, but is behind a flag" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1154787 (owner: 10Volans)
[14:58:05] <wikibugs>	 (03CR) 10Volans: [C:03+2] tox.ini: skip Python 3.10 in CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167081 (owner: 10Volans)
[15:00:05] <jouncebot>	 jelto, arnoldokoth, and mutante: gettimeofday() says it's time for SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1500)
[15:00:34] <logmsgbot>	 btullis@cumin1003 makevm (PID 834139) is awaiting input
[15:00:58] <wikibugs>	 (03Abandoned) 10JHathaway: Add vendor exclusion to DHCPConfMac [software/spicerack] - 10https://gerrit.wikimedia.org/r/1163801 (owner: 10JHathaway)
[15:02:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bookworm
[15:04:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78814 and previous config saved to /var/cache/conftool/dbconfig/20250708-150434-root.json
[15:05:56] <wikibugs>	 (03CR) 10Herron: [C:03+2] pyrra-filesystem: clear output files on service start [puppet] - 10https://gerrit.wikimedia.org/r/1165571 (https://phabricator.wikimedia.org/T302995) (owner: 10Herron)
[15:09:59] <wikibugs>	 (03Merged) 10jenkins-bot: cookbook API: simplify -t/--task-id support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1154787 (owner: 10Volans)
[15:10:00] <wikibugs>	 (03Merged) 10jenkins-bot: tox.ini: skip Python 3.10 in CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167081 (owner: 10Volans)
[15:11:19] <wikibugs>	 (03PS1) 10Muehlenhoff: thumbor: Update service image to latest rebuild [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167240
[15:11:46] <bvibber>	 gonna deploy a JsonConfig fix for Charts -- 1166942
[15:11:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "Ok thank you for the explanation" [puppet] - 10https://gerrit.wikimedia.org/r/1166846 (https://phabricator.wikimedia.org/T398444) (owner: 10Herron)
[15:12:05] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2002.codfw.wmnet with OS bookworm
[15:12:10] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by bvibber@deploy1003 using scap backport" [extensions/JsonConfig] (wmf/1.45.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1166942 (https://phabricator.wikimedia.org/T398597) (owner: 10Bvibber)
[15:12:53] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] thumbor: Update service image to latest rebuild [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167240 (owner: 10Muehlenhoff)
[15:13:45] <wikibugs>	 (03PS1) 10Zabe: Enable categorylinks read new on a few large wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167241 (https://phabricator.wikimedia.org/T397912)
[15:18:08] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2003.codfw.wmnet with reason: host reimage
[15:18:14] <wikibugs>	 (03PS1) 10Tchanders: Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243
[15:18:48] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-magru
[15:19:25] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Thanks" [puppet] - 10https://gerrit.wikimedia.org/r/1166846 (https://phabricator.wikimedia.org/T398444) (owner: 10Herron)
[15:19:40] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-esams
[15:19:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78815 and previous config saved to /var/cache/conftool/dbconfig/20250708-151939-root.json
[15:20:43] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-ulsfo
[15:21:54] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2003.codfw.wmnet with reason: host reimage
[15:22:06] <wikibugs>	 (03Merged) 10jenkins-bot: Support null values in data columns in transform output [extensions/JsonConfig] (wmf/1.45.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1166942 (https://phabricator.wikimedia.org/T398597) (owner: 10Bvibber)
[15:22:34] <logmsgbot>	 !log bvibber@deploy1003 Started scap sync-world: Backport for [[gerrit:1166942|Support null values in data columns in transform output (T398597)]]
[15:22:39] <stashbot>	 T398597: Transformed .chart pages crash when the underlying .tab page contains null values - https://phabricator.wikimedia.org/T398597
[15:22:47] <wikibugs>	 (03CR) 10Herron: [C:03+2] alerting_host: set puppet agent to 5m interval [puppet] - 10https://gerrit.wikimedia.org/r/1166846 (https://phabricator.wikimedia.org/T398444) (owner: 10Herron)
[15:24:40] <logmsgbot>	 !log bvibber@deploy1003 bvibber: Backport for [[gerrit:1166942|Support null values in data columns in transform output (T398597)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:25:48] <logmsgbot>	 !log bvibber@deploy1003 bvibber: Continuing with sync
[15:26:16] <wikibugs>	 (03PS1) 10Tchanders: Revert "Add user-related link colors to LinkRenderer::getLinkClasses" [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167244 (https://phabricator.wikimedia.org/T392775)
[15:27:58] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2002.codfw.wmnet with reason: host reimage
[15:28:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243 (owner: 10Tchanders)
[15:30:02] <wikibugs>	 (03PS11) 10Volans: git::clone: remove remote_name parameter [puppet] - 10https://gerrit.wikimedia.org/r/1148267
[15:30:10] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1148267 (owner: 10Volans)
[15:31:14] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2002.codfw.wmnet with reason: host reimage
[15:31:26] <logmsgbot>	 !log bvibber@deploy1003 Finished scap sync-world: Backport for [[gerrit:1166942|Support null values in data columns in transform output (T398597)]] (duration: 08m 52s)
[15:31:30] <stashbot>	 T398597: Transformed .chart pages crash when the underlying .tab page contains null values - https://phabricator.wikimedia.org/T398597
[15:31:36] <bvibber>	 \o/ done
[15:32:00] <Lucas_WMDE>	 uh oh. whither wmopbot
[15:38:16] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2003.codfw.wmnet with OS bookworm
[15:38:16] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2003.codfw.wmnet
[15:42:11] <wikibugs>	 (03PS2) 10Federico Ceratto: Add parsercache pooling/depooling cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389)
[15:42:12] <wikibugs>	 (03CR) 10Federico Ceratto: "Tests are passing. It's a simple cookbook but some care could be needed with handling failure modes around dbctl" [cookbooks] - 10https://gerrit.wikimedia.org/r/1165546 (https://phabricator.wikimedia.org/T388389) (owner: 10Federico Ceratto)
[15:43:01] <wikibugs>	 (03PS1) 10Volans: administrative: add support for empty task ID [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167247
[15:44:11] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] administrative: add support for empty task ID [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167247 (owner: 10Volans)
[15:44:24] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-esams
[15:48:39] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2002.codfw.wmnet with OS bookworm
[15:48:39] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2002.codfw.wmnet
[15:49:44] <wikibugs>	 (03PS1) 10Hnowlan: changeprop: don't process File: pages for mobile html pages in PCS [deployment-charts] - 10https://gerrit.wikimedia.org/r/1167249 (https://phabricator.wikimedia.org/T397750)
[15:51:46] <wikibugs>	 (03CR) 10CI reject: [V:04-1] administrative: add support for empty task ID [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167247 (owner: 10Volans)
[15:52:39] <wikibugs>	 10SRE-SLO, 13Patch-For-Review: Reduce the pyrra's multi-dc configurations where it makes sense - https://phabricator.wikimedia.org/T398534#10984766 (10herron) Today I reviewed a sampling of our published SLO docs and while some do make mention of 'datacenter' and specific names like 'eqiad' 'codfw', I didn't s...
[15:52:59] <wikibugs>	 (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167247 (owner: 10Volans)
[15:57:34] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-drmrs
[16:00:05] <jouncebot>	 jhathaway and moritzm: May I have your attention please! Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1600)
[16:00:05] <jouncebot>	 dancy: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[16:00:22] <moritzm>	 already deployed
[16:00:27] <dancy>	 Indeed.  Thanks Moritz!
[16:01:06] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.11 point update - https://phabricator.wikimedia.org/T394489#10984815 (10MoritzMuehlenhoff)
[16:02:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[16:06:59] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl2001.codfw.wmnet
[16:07:01] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[16:07:26] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl2002.codfw.wmnet
[16:07:54] <wikibugs>	 (03CR) 10Volans: [C:03+2] administrative: add support for empty task ID [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167247 (owner: 10Volans)
[16:10:57] <mszabo>	 jouncebot: nowandnext
[16:10:57] <jouncebot>	 For the next 0 hour(s) and 49 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1600)
[16:10:58] <jouncebot>	 In 0 hour(s) and 49 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1700)
[16:11:21] <wikibugs>	 (03PS4) 10Pppery: Catalog newsletter tables [puppet] - 10https://gerrit.wikimedia.org/r/1167252 (https://phabricator.wikimedia.org/T398941)
[16:11:40] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
[16:12:03] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[16:12:12] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszabo@deploy1003 using scap backport" [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167244 (https://phabricator.wikimedia.org/T392775) (owner: 10Tchanders)
[16:12:12] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszabo@deploy1003 using scap backport" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243 (owner: 10Tchanders)
[16:12:35] <logmsgbot>	 btullis@cumin1003 makevm (PID 848937) is awaiting input
[16:14:11] <wikibugs>	 (03PS1) 10Btullis: Update the IP addresses for cephosd200[1-3] post vlan-move [puppet] - 10https://gerrit.wikimedia.org/r/1167254 (https://phabricator.wikimedia.org/T374923)
[16:14:21] <wikibugs>	 (03PS1) 10Máté Szabó: UpdateMessageJobTest: Read expected transver from latest [extensions/Translate] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167256 (https://phabricator.wikimedia.org/T398904)
[16:14:44] <wikibugs>	 (03PS2) 10Máté Szabó: Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243 (owner: 10Tchanders)
[16:14:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "Add user-related link colors to LinkRenderer::getLinkClasses" [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167244 (https://phabricator.wikimedia.org/T392775) (owner: 10Tchanders)
[16:15:07] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszabo@deploy1003 using scap backport" [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167244 (https://phabricator.wikimedia.org/T392775) (owner: 10Tchanders)
[16:15:07] <wikibugs>	 (03CR) 10TrainBranchBot: "Approved by mszabo@deploy1003 using scap backport" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243 (owner: 10Tchanders)
[16:15:07] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszabo@deploy1003 using scap backport" [extensions/Translate] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167256 (https://phabricator.wikimedia.org/T398904) (owner: 10Máté Szabó)
[16:15:15] <wikibugs>	 (03Merged) 10jenkins-bot: administrative: add support for empty task ID [software/spicerack] - 10https://gerrit.wikimedia.org/r/1167247 (owner: 10Volans)
[16:17:39] <logmsgbot>	 btullis@cumin1003 makevm (PID 848966) is awaiting input
[16:17:48] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[16:18:52] <wikibugs>	 (03CR) 10Máté Szabó: "recheck" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243 (owner: 10Tchanders)
[16:20:07] <wikibugs>	 (03PS1) 10Krinkle: beta: Remove beta-specific 'http' entry for wgGraphAllowedDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167258
[16:20:07] <wikibugs>	 (03PS1) 10Krinkle: beta: Move beta wikipedia canonical to beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167259 (https://phabricator.wikimedia.org/T289318)
[16:21:39] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5017,5025].eqsin.wmnet} and A:cp - 2.8.15 upgrade (T398720)
[16:21:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-registry.service on registry2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:21:42] <stashbot>	 T398720: Upgrade to haproxy 2.8.15 - https://phabricator.wikimedia.org/T398720
[16:22:37] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-drmrs
[16:23:26] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-eqiad
[16:23:44] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bookworm
[16:27:04] <Krinkle>	 !log Add ATS routing to profile::trafficserver::backend::mapping_rules in Hiera (Horizon puppet prefix: cache-text) for a wmcloud version of config-master.wikimedia.beta.wmflabs.org
[16:27:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:27:49] <wikibugs>	 (03Merged) 10jenkins-bot: UpdateMessageJobTest: Read expected transver from latest [extensions/Translate] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167256 (https://phabricator.wikimedia.org/T398904) (owner: 10Máté Szabó)
[16:28:31] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Update the IP addresses for cephosd200[1-3] post vlan-move [puppet] - 10https://gerrit.wikimedia.org/r/1167254 (https://phabricator.wikimedia.org/T374923) (owner: 10Btullis)
[16:28:48] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer" [extensions/CampaignEvents] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167243 (owner: 10Tchanders)
[16:28:52] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Add user-related link colors to LinkRenderer::getLinkClasses" [core] (wmf/1.45.0-wmf.9) - 10https://gerrit.wikimedia.org/r/1167244 (https://phabricator.wikimedia.org/T392775) (owner: 10Tchanders)
[16:30:27] <logmsgbot>	 !log mszabo@deploy1003 Started scap sync-world: Backport for [[gerrit:1167244|Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952)]], [[gerrit:1167243|Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer"]], [[gerrit:1167256|UpdateMessageJobTest: Read expected transver from latest (T398904)]]
[16:31:05] <stashbot>	 T392775: Add link color for temporary usernames - https://phabricator.wikimedia.org/T392775
[16:31:05] <stashbot>	 T398714: "Show IP" appearing twice - https://phabricator.wikimedia.org/T398714
[16:31:06] <stashbot>	 T398717: IPInfo button only appears for the first temporary account - https://phabricator.wikimedia.org/T398717
[16:31:06] <stashbot>	 T398952: Inconsistent/confusing styles for temporary account links - https://phabricator.wikimedia.org/T398952
[16:31:07] <stashbot>	 T398904: UpdateMessageJobTest failing as of 2025-07-07 - https://phabricator.wikimedia.org/T398904
[16:31:54] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
[16:32:33] <logmsgbot>	 !log mszabo@deploy1003 tchanders, mszabo: Backport for [[gerrit:1167244|Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952)]], [[gerrit:1167243|Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer"]], [[gerrit:1167256|UpdateMessageJobTest: Read expected transver from latest (T398904)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[16:33:51] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
[16:34:12] <logmsgbot>	 !log mszabo@deploy1003 tchanders, mszabo: Continuing with sync
[16:35:16] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
[16:36:30] <logmsgbot>	 !log cdanis@cumin1002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
[16:36:31] <logmsgbot>	 !log cdanis@cumin1002 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
[16:37:00] <logmsgbot>	 !log cdanis@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
[16:37:01] <logmsgbot>	 !log cdanis@cumin1002 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
[16:39:38] <logmsgbot>	 !log mszabo@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167244|Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952)]], [[gerrit:1167243|Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer"]], [[gerrit:1167256|UpdateMessageJobTest: Read expected transver from latest (T398904)]] (duration: 09m 10s)
[16:39:50] <stashbot>	 T392775: Add link color for temporary usernames - https://phabricator.wikimedia.org/T392775
[16:39:52] <stashbot>	 T398714: "Show IP" appearing twice - https://phabricator.wikimedia.org/T398714
[16:39:52] <stashbot>	 T398717: IPInfo button only appears for the first temporary account - https://phabricator.wikimedia.org/T398717
[16:39:52] <stashbot>	 T398952: Inconsistent/confusing styles for temporary account links - https://phabricator.wikimedia.org/T398952
[16:39:52] <stashbot>	 T398904: UpdateMessageJobTest failing as of 2025-07-07 - https://phabricator.wikimedia.org/T398904
[16:40:39] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.187.0" for 2 host(s)
[16:40:59] <logmsgbot>	 !log bking@cumin1002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
[16:41:02] <stashbot>	 T397227: Build and deploy OpenSearch plugins package for updated regex search - https://phabricator.wikimedia.org/T397227
[16:42:07] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on an-worker1189 - https://phabricator.wikimedia.org/T398773#10985142 (10Jclark-ctr) @BTullis I am having issues with this server after Hard drive replacements it will not rebuild VD    I did not want to clear cache it says it could cause data loss   ` STOR305: Una...
[16:42:29] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.187.0" completed for 2 hosts
[16:42:41] <wikibugs>	 (PS1) CDanis: feat: Reverse Depends [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1167261
[16:42:55] <cdanis>	 I always forget I need to mess with the deploy repo
[16:43:09] <wikibugs>	 (CR) CDanis: [V:+2 C:+2] feat: Reverse Depends [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1167261 (owner: CDanis)
[16:43:21] <logmsgbot>	 !log cdanis@cumin1002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
[16:43:23] <logmsgbot>	 !log cdanis@cumin1002 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
[16:43:51] <wikibugs>	 (PS1) Cwhite: logstash: nest_root_fields.rb fix memory leak and add tests [puppet] - https://gerrit.wikimedia.org/r/1167262 (https://phabricator.wikimedia.org/T398990)
[16:43:53] <logmsgbot>	 !log cdanis@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
[16:43:54] <logmsgbot>	 !log cdanis@cumin1002 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
[16:45:00] <wikibugs>	 (PS2) Krinkle: beta: Move beta wikipedia canonical to beta.wmcloud.org [mediawiki-config] - https://gerrit.wikimedia.org/r/1167259 (https://phabricator.wikimedia.org/T289318)
[16:46:11] <wikibugs>	 (CR) CI reject: [V:-1] logstash: nest_root_fields.rb fix memory leak and add tests [puppet] - https://gerrit.wikimedia.org/r/1167262 (https://phabricator.wikimedia.org/T398990) (owner: Cwhite)
[16:48:31] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-eqiad
[16:50:05] <wikibugs>	 (PS2) Cwhite: logstash: nest_root_fields.rb fix memory leak and add tests [puppet] - https://gerrit.wikimedia.org/r/1167262 (https://phabricator.wikimedia.org/T398990)
[16:52:13] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
[16:53:39] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
[16:56:32] <wikibugs>	 (PS3) Cwhite: logstash: nest_root_fields.rb fix memory leak and add tests [puppet] - https://gerrit.wikimedia.org/r/1167262 (https://phabricator.wikimedia.org/T398990)
[16:58:36] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
[16:58:54] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[16:59:54] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T1700)
[17:02:52] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:02:58] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:03:40] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:03:56] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:04:29] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:04:38] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:06:28] <wikibugs>	 (CR) Herron: [C:+1] "Thanks for the refactor, looks good!" [puppet] - https://gerrit.wikimedia.org/r/1166135 (https://phabricator.wikimedia.org/T398534) (owner: Elukey)
[17:09:03] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10985333 (Jclark-ctr) @klausman  Will this be legacy or uefi? it is reachable  @elukey  The first machine learning server is cabled ml-serve1012  The provisioning scr...
[17:10:33] <logmsgbot>	 !log bking@cumin1002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
[17:10:36] <stashbot>	 T397227: Build and deploy OpenSearch plugins package for updated regex search - https://phabricator.wikimedia.org/T397227
[17:10:37] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9600 on cirrussearch2076 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9600 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:10:39] <logmsgbot>	 !log btullis@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[17:10:51] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
[17:11:34] <wikibugs>	 (CR) Cwhite: [C:+2] logstash: nest_root_fields.rb fix memory leak and add tests [puppet] - https://gerrit.wikimedia.org/r/1167262 (https://phabricator.wikimedia.org/T398990) (owner: Cwhite)
[17:13:22] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.netbox
[17:13:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
[17:13:26] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:13:26] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl2001.codfw.wmnet on all recursors
[17:13:29] <jinxer-wm>	 FIRING: SystemdUnitFailed: push_cross_cluster_settings_9600.service on cirrussearch2076:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:13:30] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl2001.codfw.wmnet on all recursors
[17:13:43] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:13:51] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:13:53] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
[17:13:58] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
[17:14:18] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm
[17:14:31] <wikibugs>	 (PS4) Ssingh: team-traffic: add dnsbox alert for service status mismatch [alerts] - https://gerrit.wikimedia.org/r/1166225 (https://phabricator.wikimedia.org/T374619)
[17:18:27] <wikibugs>	 (CR) Ssingh: "The label_replace mangling is intentional here so as to avoid making changes to the existing setup, both for anycast-hc and how we generat" [alerts] - https://gerrit.wikimedia.org/r/1166225 (https://phabricator.wikimedia.org/T374619) (owner: Ssingh)
[17:18:34] <wikibugs>	 (CR) Ssingh: "(Ready for review)" [alerts] - https://gerrit.wikimedia.org/r/1166225 (https://phabricator.wikimedia.org/T374619) (owner: Ssingh)
[17:18:50] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
[17:18:55] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
[17:18:55] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:18:55] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl2002.codfw.wmnet on all recursors
[17:18:58] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl2002.codfw.wmnet on all recursors
[17:19:21] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
[17:21:07] <wikibugs>	 ops-codfw, SRE, DC-Ops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10985424 (cmooney) Resolved→Open @Jhancock.wm I notice these devices are still in Netbox?  https://netbox.wikimedia.org/dcim/devices/?manufacturer_id=96  Not sure...
[17:21:21] <wikibugs>	 (PS1) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[17:21:41] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
[17:22:26] <logmsgbot>	 btullis@cumin1003 makevm (PID 848966) is awaiting input
[17:22:38] <wikibugs>	 (PS2) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[17:22:44] <wikibugs>	 (PS3) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[17:22:45] <wikibugs>	 (CR) Krinkle: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[17:25:30] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
[17:25:41] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm
[17:28:29] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: push_cross_cluster_settings_9600.service on cirrussearch2076:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:29:24] <wikibugs>	 (CR) Krinkle: deployment-prep: Add Apache vhost aliases for *.beta.wmcloud.org [puppet] - https://gerrit.wikimedia.org/r/1153764 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[17:29:33] <wikibugs>	 (PS4) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[17:30:01] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.reimage for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
[17:30:37] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9600 on cirrussearch2076 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9600 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:33:05] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: host reimage
[17:36:23] <wikibugs>	 (PS1) Krinkle: beta: Add redirect for upload.wikimedia.beta.wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/1167268 (https://phabricator.wikimedia.org/T289318)
[17:39:05] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: host reimage
[17:39:37] <wikibugs>	 ops-codfw, SRE, DC-Ops: codfw expansion infrastructure racking task - https://phabricator.wikimedia.org/T387504#10985509 (cmooney) @Jhancock.wm in terms of the new Nokia switches in the expansion cage we can cable them to the spines like this:  |Spine|Spine Port|Leaf|Leaf Port| |------|-------------|...
[17:43:17] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: host reimage
[17:48:04] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: host reimage
[17:50:16] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
[17:52:46] <logmsgbot>	 !log ebernhardson@deploy1003 Started deploy [airflow-dags/search@5c0689d]: sync rdf-spark-tools 0.3.158 artifacts
[17:53:06] <logmsgbot>	 !log ebernhardson@deploy1003 Finished deploy [airflow-dags/search@5c0689d]: sync rdf-spark-tools 0.3.158 artifacts (duration: 00m 19s)
[17:53:31] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
[17:53:43] <wikibugs>	 (PS7) Krinkle: deployment-prep: Add Apache vhost aliases for *.beta.wmcloud.org [puppet] - https://gerrit.wikimedia.org/r/1153764 (https://phabricator.wikimedia.org/T289318)
[17:55:09] <wikibugs>	 (PS4) Krinkle: beta: Document beta-specific "w.beta.wmcloud.org" handling [puppet] - https://gerrit.wikimedia.org/r/1160441 (https://phabricator.wikimedia.org/T396012)
[17:55:14] <wikibugs>	 (PS5) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[17:55:55] <wikibugs>	 (PS6) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[17:55:55] <wikibugs>	 (PS2) Krinkle: beta: Add redirect for upload.wikimedia.beta.wmflabs.org [puppet] - https://gerrit.wikimedia.org/r/1167268 (https://phabricator.wikimedia.org/T289318)
[17:58:20] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm
[17:58:20] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl2001.codfw.wmnet
[18:00:18] <wikibugs>	 (CR) Andrea Denisse: "I just left a small question, otherwise LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1167157 (https://phabricator.wikimedia.org/T397003) (owner: Tiziano Fogli)
[18:04:33] <wikibugs>	 SRE, collaboration-services, Patch-For-Review: setup gerrit2003 with gerrit service (gerrit on bookworm) - https://phabricator.wikimedia.org/T372804#10985582 (Dzahn) Since yesterday we are now replicating to the new machine gerrit2003 again.  https://gerrit.wikimedia.org/r/c/operations/puppet/+/1153265
[18:05:05] <wikibugs>	 (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [alerts] - https://gerrit.wikimedia.org/r/1166225 (https://phabricator.wikimedia.org/T374619) (owner: Ssingh)
[18:05:47] <wikibugs>	 SRE, collaboration-services, Patch-For-Review: setup gerrit2003 with gerrit service (gerrit on bookworm) - https://phabricator.wikimedia.org/T372804#10985584 (Dzahn) @ABran-WMF I wonder if you have thoughts on my original question on this ticket, back in August 2024 I said:  "determine if this is res...
[18:07:42] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm
[18:07:42] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl2002.codfw.wmnet
[18:09:35] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-codfw
[18:11:54] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
[18:14:02] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-eqsin
[18:22:32] <wikibugs>	 (CR) Ssingh: [C:+1] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[18:26:40] <logmsgbot>	 !log kcvelaga@deploy1003 Started deploy [airflow-dags/analytics_product@52ec646]: T394526
[18:26:43] <stashbot>	 T394526: Data pipeline to aggregate CX monthly machine translation service usage - https://phabricator.wikimedia.org/T394526
[18:28:11] <logmsgbot>	 !log kcvelaga@deploy1003 Finished deploy [airflow-dags/analytics_product@52ec646]: T394526 (duration: 01m 35s)
[18:32:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[18:34:15] <wikibugs>	 (PS1) Krinkle: beta: Change beta.wmcloud.org stub redirect to new Meta-Wiki canonical [puppet] - https://gerrit.wikimedia.org/r/1167273 (https://phabricator.wikimedia.org/T289318)
[18:34:24] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-codfw
[18:39:28] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-eqsin
[18:40:40] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#10985734 (aranyap) >>! In T398650#10974346, @Clement_Goubert wrote: > Please make sure the [[ https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikim...
[18:42:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on wdqs2023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:52:48] <jinxer-wm>	 RESOLVED: PuppetFailure: Puppet has failed on wdqs2023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:59:08] <logmsgbot>	 !log bking@cumin1002 DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for wdqs2022.codfw.wmnet: Renew puppet certificate - bking@cumin1002
[19:09:21] <wikibugs>	 (CR) Ssingh: [C:+1] "I forgot to add to the +1: 0 tests failed, 0 tests skipped, 18 tests passed" [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[19:12:02] <wikibugs>	 (Abandoned) JHathaway: WIP: do not merge [cookbooks] - https://gerrit.wikimedia.org/r/1165598 (owner: JHathaway)
[19:13:31] <wikibugs>	 (CR) Dzahn: [V:+1 C:+1] gerrit: avoid hardcoded hostnames, replace with hiera lookups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1129920 (https://phabricator.wikimedia.org/T387833) (owner: Dzahn)
[19:13:35] <wikibugs>	 (PS7) Krinkle: varnish: Swap hardcoded upload.wm.o cond for upload_domain in path normalize [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318)
[19:13:56] <wikibugs>	 (CR) Krinkle: "@sukhe: Thanks, I've debased this from the rest of the beta stack to ease landing." [puppet] - https://gerrit.wikimedia.org/r/1167266 (https://phabricator.wikimedia.org/T289318) (owner: Krinkle)
[19:19:18] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[19:24:07] <wikibugs>	 (CR) Dzahn: [C:+1] "I have tested this and it returns exit code 0 whether service is active or inactive. lgtm" [cookbooks] - https://gerrit.wikimedia.org/r/1167226 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[19:24:43] <wikibugs>	 (CR) Dzahn: gerrit: config replicas for rename-project plugin (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1165832 (https://phabricator.wikimedia.org/T239693) (owner: Hashar)
[19:30:32] <wikibugs>	 SRE, Data-Engineering, LDAP-Access-Requests: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10986005 (Dzahn) a:Arnoldokoth→None
[19:31:41] <wikibugs>	 SRE, Data-Engineering, LDAP-Access-Requests: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10986023 (Dzahn) a:SKivlehan-WMF
[19:32:28] <wikibugs>	 SRE, Data-Engineering, LDAP-Access-Requests: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10986025 (Dzahn) Still stalled. Assigned to user because we are waiting for their response.
[19:35:20] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1167285
[19:37:19] <wikibugs>	 (CR) Dbrant: [C:+2] mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1167285 (owner: PipelineBot)
[19:38:55] <wikibugs>	 (Merged) jenkins-bot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1167285 (owner: PipelineBot)
[19:40:11] <logmsgbot>	 !log dbrant@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: apply
[19:41:35] <logmsgbot>	 !log dbrant@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[19:41:55] <wikibugs>	 (PS1) Xcollazo: analytics: deprioritize druid MapReduce jobs if needed [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013)
[19:41:57] <logmsgbot>	 !log dbrant@deploy1003 helmfile [eqiad] START helmfile.d/services/mobileapps: apply
[19:42:21] <wikibugs>	 (CR) CI reject: [V:-1] analytics: deprioritize druid MapReduce jobs if needed [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013) (owner: Xcollazo)
[19:42:46] <logmsgbot>	 !log dbrant@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
[19:42:55] <logmsgbot>	 !log dbrant@deploy1003 helmfile [codfw] START helmfile.d/services/mobileapps: apply
[19:43:32] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 1234298072 and 57 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:43:44] <logmsgbot>	 !log dbrant@deploy1003 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
[19:47:32] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 483120 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:48:34] <wikibugs>	 (CR) Xcollazo: "@btullis@wikimedia.org can you help me with the proper `Host` definition so that we can PPC this?" [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013) (owner: Xcollazo)
[20:00:04] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: Time to snap out of that daydream and deploy UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T2000).
[20:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[20:02:04] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986083 (VRiley-WMF)
[20:04:52] <sbassett>	 Hey all - going to use the backport window here (no changes scheduled) to get a private mitigation update deployed.
[20:15:06] <sbassett>	 !log Deployed security mitigation update for T395468
[20:15:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-registry.service on registry2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:23:46] <wikibugs>	 (PS1) Cwhite: logstash: drop most mobileapps-staging outgoing request logs [puppet] - https://gerrit.wikimedia.org/r/1167287 (https://phabricator.wikimedia.org/T397252)
[20:25:29] <wikibugs>	 (CR) BCornwall: [C:+1] "Looks good, and bless you for having a runbook alongside this from the beginning." [alerts] - https://gerrit.wikimedia.org/r/1166225 (https://phabricator.wikimedia.org/T374619) (owner: Ssingh)
[20:35:59] <wikibugs>	 (PS1) Dzahn: rename build pipelines for sourcebot [container/codesearch] - https://gerrit.wikimedia.org/r/1167290 (https://phabricator.wikimedia.org/T268199)
[20:37:35] <wikibugs>	 (CR) Dzahn: [C:+2] "still experimenting with image builds" [container/codesearch] - https://gerrit.wikimedia.org/r/1167290 (https://phabricator.wikimedia.org/T268199) (owner: Dzahn)
[20:50:29] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986258 (VRiley-WMF) es1047  cableID: 1089 port 9 Rack A3 U 5  es1048 cableID: 5180 port 27 Rack B5 U7
[20:56:00] <wikibugs>	 (CR) Btullis: "I think that you should be able to use:" [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013) (owner: Xcollazo)
[20:56:27] <wikibugs>	 (PS5) Tiziano Fogli: prom/metamonitor: add dead man switch and public endpoint [puppet] - https://gerrit.wikimedia.org/r/1167157 (https://phabricator.wikimedia.org/T397003)
[21:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250708T2100)
[21:02:36] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:02:51] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:04:44] <wikibugs>	 (CR) Cwhite: [C:+2] logstash: drop most mobileapps-staging outgoing request logs [puppet] - https://gerrit.wikimedia.org/r/1167287 (https://phabricator.wikimedia.org/T397252) (owner: Cwhite)
[21:06:09] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Machine-Learning-Team: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10986306 (Jclark-ctr)  ml-serve1013  is cabled ml-serve1012 manually configured the root account and password
[21:13:11] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.reimage for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
[21:13:54] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[21:16:12] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[21:16:52] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[21:20:09] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt  es1047 - vriley@cumin1002"
[21:20:41] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt  es1047 - vriley@cumin1002"
[21:20:42] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:21:08] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[21:23:28] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[21:24:16] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[21:24:48] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host es1047
[21:26:35] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1047
[21:27:30] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:28:10] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1048 - vriley@cumin1002"
[21:31:15] <logmsgbot>	 vriley@cumin1002 netbox (PID 3617261) is awaiting input
[21:31:24] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1048 - vriley@cumin1002"
[21:31:24] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[21:31:28] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[21:33:37] <logmsgbot>	 !log andrew@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
[21:33:49] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[21:34:17] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[21:37:34] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cloudcephosd1048,49 - jclark@cumin1002"
[21:37:51] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cloudcephosd1048,49 - jclark@cumin1002"
[21:37:51] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:38:24] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
[21:38:25] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd1051
[21:38:28] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
[21:38:31] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1049
[21:38:38] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1049
[21:38:46] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1048
[21:38:54] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1048
[21:40:27] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:40:34] <wikibugs>	 (PS1) Zabe: Remove stdClass type hint from ApiFeedContributions::feedItem() for now [core] (wmf/1.45.0-wmf.9) - https://gerrit.wikimedia.org/r/1167296 (https://phabricator.wikimedia.org/T398925)
[21:41:14] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[21:43:27] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:44:13] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:45:28] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:45:35] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10986464 (Jclark-ctr) @dcaro  @Andrew  @cmooney  @ayounsi  I need some assistance. I need to open a block of 4x ports on cloudsw1-f4-eqiad. The least dis...
[21:47:37] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1049
[21:47:45] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1049
[21:47:45] <wikibugs>	 (PS1) Ryan Kemper: Replace elasticsearch api with python requests [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860)
[21:47:49] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1048
[21:47:57] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1048
[21:51:15] <logmsgbot>	 !log zabe@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[21:51:55] <logmsgbot>	 !log zabe@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[21:55:14] <wikibugs>	 (CR) CI reject: [V:-1] Replace elasticsearch api with python requests [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860) (owner: Ryan Kemper)
[21:55:31] <wikibugs>	 (CR) Zabe: [C:+2] Enable categorylinks read new on a few large wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1167241 (https://phabricator.wikimedia.org/T397912) (owner: Zabe)
[21:55:45] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:56:25] <wikibugs>	 (Merged) jenkins-bot: Enable categorylinks read new on a few large wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1167241 (https://phabricator.wikimedia.org/T397912) (owner: Zabe)
[21:56:51] <logmsgbot>	 !log andrew@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
[21:57:53] <zabe>	 dancy: are you currently deploying?
[21:58:16] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10986480 (Jclark-ctr)
[21:58:17] <dancy>	 I am running an experiment but I can get out of your way
[21:58:54] <dancy>	 Stand by
[21:59:36] <dancy>	 zabe: All yours
[21:59:51] <zabe>	 Alright
[21:59:52] <zabe>	 Thanks
[22:00:12] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:00:36] <logmsgbot>	 !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1167241|Enable categorylinks read new on a few large wikis (T397912)]]
[22:00:39] <stashbot>	 T397912: Set categorylinks to read new - https://phabricator.wikimedia.org/T397912
[22:01:32] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:02:45] <logmsgbot>	 !log zabe@deploy1003 zabe: Backport for [[gerrit:1167241|Enable categorylinks read new on a few large wikis (T397912)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:03:41] <logmsgbot>	 !log zabe@deploy1003 zabe: Continuing with sync
[22:05:04] <wikibugs>	 (CR) Zabe: [C:+2] Remove stdClass type hint from ApiFeedContributions::feedItem() for now [core] (wmf/1.45.0-wmf.9) - https://gerrit.wikimedia.org/r/1167296 (https://phabricator.wikimedia.org/T398925) (owner: Zabe)
[22:08:55] <logmsgbot>	 !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167241|Enable categorylinks read new on a few large wikis (T397912)]] (duration: 08m 19s)
[22:08:58] <stashbot>	 T397912: Set categorylinks to read new - https://phabricator.wikimedia.org/T397912
[22:09:00] <wikibugs>	 (Merged) jenkins-bot: Remove stdClass type hint from ApiFeedContributions::feedItem() for now [core] (wmf/1.45.0-wmf.9) - https://gerrit.wikimedia.org/r/1167296 (https://phabricator.wikimedia.org/T398925) (owner: Zabe)
[22:09:36] <logmsgbot>	 !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1167296|Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)]]
[22:09:39] <stashbot>	 T398925: TypeError: MediaWiki\Api\ApiFeedContributions::feedItem(): Argument #1 ($row) must be of type stdClass, Flow\Formatter\ContributionsRow given, called in /srv/mediawiki/php-1.45.0-wmf.9/includes/api/ApiFeedContributions.php on l - https://phabricator.wikimedia.org/T398925
[22:11:45] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986498 (VRiley-WMF)
[22:11:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[22:12:11] <logmsgbot>	 !log zabe@deploy1003 zabe: Backport for [[gerrit:1167296|Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:12:49] <wikibugs>	 (PS1) Zabe: Revert "Enable categorylinks read new on a few large wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1167303
[22:12:53] <wikibugs>	 (CR) Zabe: [C:+2] Revert "Enable categorylinks read new on a few large wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1167303 (owner: Zabe)
[22:13:01] <logmsgbot>	 !log zabe@deploy1003 zabe: Continuing with sync
[22:13:43] <wikibugs>	 (Merged) jenkins-bot: Revert "Enable categorylinks read new on a few large wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1167303 (owner: Zabe)
[22:13:56] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
[22:14:04] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti105[34].eqiad.wmnet - https://phabricator.wikimedia.org/T381576#10986503 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host ganeti1053.eqiad.wmnet with OS bookworm
[22:16:11] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host es1047.eqiad.wmnet with OS bookworm
[22:16:17] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986507 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host es1047.eqiad.wmnet with OS bookworm
[22:18:10] <logmsgbot>	 !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167296|Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)]] (duration: 08m 33s)
[22:18:13] <stashbot>	 T398925: TypeError: MediaWiki\Api\ApiFeedContributions::feedItem(): Argument #1 ($row) must be of type stdClass, Flow\Formatter\ContributionsRow given, called in /srv/mediawiki/php-1.45.0-wmf.9/includes/api/ApiFeedContributions.php on l - https://phabricator.wikimedia.org/T398925
[22:18:37] <logmsgbot>	 !log zabe@deploy1003 Started scap sync-world: Backport for [[gerrit:1167303|Revert "Enable categorylinks read new on a few large wikis"]]
[22:20:45] <logmsgbot>	 !log zabe@deploy1003 zabe: Backport for [[gerrit:1167303|Revert "Enable categorylinks read new on a few large wikis"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:21:48] <jinxer-wm>	 RESOLVED: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[22:21:58] <logmsgbot>	 !log zabe@deploy1003 zabe: Continuing with sync
[22:27:16] <logmsgbot>	 !log zabe@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167303|Revert "Enable categorylinks read new on a few large wikis"]] (duration: 08m 38s)
[22:29:06] <logmsgbot>	 jclark@cumin1002 provision (PID 3666053) is awaiting input
[22:29:35] <zabe>	 dancy: I am done if you want to continue experimenting :)
[22:30:56] <dancy>	 thx.
[22:36:11] <wikibugs>	 SRE, collaboration-services, Release-Engineering-Team (Radar): Redirect revisions from svn.wikimedia.org to https://static-codereview.wikimedia.org - https://phabricator.wikimedia.org/T119846#10986675 (Dzahn) I am uploading a patch for that.  I did notice though that the "SVN repo browser" part is st...
[22:37:24] <wikibugs>	 (PS1) Dzahn: redirects: update SVN rewrite rules, do not link to Phabricator anymore [puppet] - https://gerrit.wikimedia.org/r/1167306 (https://phabricator.wikimedia.org/T119846)
[22:38:18] <wikibugs>	 (PS2) Dzahn: redirects: update SVN rewrite rules, do not link to Phabricator anymore [puppet] - https://gerrit.wikimedia.org/r/1167306 (https://phabricator.wikimedia.org/T119846)
[22:38:50] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
[22:43:17] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
[22:53:23] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host es1048.eqiad.wmnet with OS bookworm
[22:53:34] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986696 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host es1048.eqiad.wmnet with OS bookworm
[23:06:08] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[23:09:13] <logmsgbot>	 vriley@cumin1002 reimage (PID 3685715) is awaiting input
[23:09:31] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[23:09:32] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1047.eqiad.wmnet with OS bookworm
[23:09:43] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986720 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host es1047.eqiad.wmnet with OS bookworm completed: - es1047 (**PASS**)   - Remo...
[23:10:12] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986721 (VRiley-WMF)
[23:11:32] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:15:48] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
[23:16:26] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:19:24] <wikibugs>	 SRE, Data-Engineering, LDAP-Access-Requests: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10986735 (SKivlehan-WMF) Apologies, @Dzahn ! I have requested wmf LDAP access as provided by @elukey  above. Thank you.
[23:19:26] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
[23:34:10] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
[23:34:17] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti105[34].eqiad.wmnet - https://phabricator.wikimedia.org/T381576#10986762 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host ganeti1053.eqiad.wmnet with OS bookworm executed...
[23:38:02] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1167309
[23:38:02] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1167309 (owner: TrainBranchBot)
[23:43:25] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[23:43:51] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[23:43:52] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1048.eqiad.wmnet with OS bookworm
[23:43:58] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986766 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host es1048.eqiad.wmnet with OS bookworm completed: - es1048 (**PASS**)   - Remo...
[23:44:19] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986767 (VRiley-WMF)
[23:44:37] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q#:rack/setup/install es104[78] - https://phabricator.wikimedia.org/T393107#10986768 (VRiley-WMF) Open→Resolved The servers should be all set and ready to go
[23:50:39] <wikibugs>	 (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1167309 (owner: TrainBranchBot)
[23:54:13] <wikibugs>	 (PS2) Xcollazo: analytics: deprioritize druid MapReduce jobs if needed [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013)
[23:54:59] <wikibugs>	 (CR) Xcollazo: "Thank you, added." [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013) (owner: Xcollazo)
[23:58:02] <wikibugs>	 (CR) Xcollazo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1167286 (https://phabricator.wikimedia.org/T399013) (owner: Xcollazo)
[23:58:13] <logmsgbot>	 !log zabe@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[23:58:36] <logmsgbot>	 !log zabe@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply