[00:24:29] <jinxer-wm>	 FIRING: SystemdUnitFailed: ifup@eno12399np0.service on wikikube-worker1290:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:38:12] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1101233
[00:38:12] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1101233 (owner: TrainBranchBot)
[00:55:44] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1101233 (owner: TrainBranchBot)
[01:08:11] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1101234
[01:08:11] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1101234 (owner: TrainBranchBot)
[01:28:24] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1101234 (owner: TrainBranchBot)
[01:36:48] <icinga-wm>	 PROBLEM - MD RAID on aqs1014 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 11, Failed: 1, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[01:36:49] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on aqs1014 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 11, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T381742 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[01:36:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T381742 (ops-monitoring-bot) NEW
[01:37:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[01:38:31] <jinxer-wm>	 FIRING: Primary inbound port utilisation over 80%  #page: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[01:42:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[01:43:31] <jinxer-wm>	 RESOLVED: Primary inbound port utilisation over 80%  #page: Device asw2-b-eqiad.mgmt.eqiad.wmnet recovered from Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[02:40:43] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job mysql-test in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:57:42] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tstarling@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100217 (https://phabricator.wikimedia.org/T33951) (owner: Tim Starling)
[02:58:22] <wikibugs>	 (Merged) jenkins-bot: Prepare for migration of the Interwiki extension to core [mediawiki-config] - https://gerrit.wikimedia.org/r/1100217 (https://phabricator.wikimedia.org/T33951) (owner: Tim Starling)
[02:59:04] <logmsgbot>	 !log tstarling@deploy2002 Started scap sync-world: Backport for [[gerrit:1100217|Prepare for migration of the Interwiki extension to core (T33951)]]
[02:59:08] <stashbot>	 T33951: Merge Interwiki extension into MediaWiki core - https://phabricator.wikimedia.org/T33951
[03:04:28] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:05:43] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job mysql-test in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:10:36] <logmsgbot>	 !log tstarling@deploy2002 tstarling: Backport for [[gerrit:1100217|Prepare for migration of the Interwiki extension to core (T33951)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[03:10:39] <stashbot>	 T33951: Merge Interwiki extension into MediaWiki core - https://phabricator.wikimedia.org/T33951
[03:20:20] <logmsgbot>	 !log tstarling@deploy2002 tstarling: Continuing with sync
[03:30:21] <logmsgbot>	 !log tstarling@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100217|Prepare for migration of the Interwiki extension to core (T33951)]] (duration: 31m 17s)
[03:30:25] <stashbot>	 T33951: Merge Interwiki extension into MediaWiki core - https://phabricator.wikimedia.org/T33951
[03:44:18] <logmsgbot>	 !log tstarling@deploy2002 Started deploy [restbase/deploy@6d0b97e]: no-op test deploy
[03:55:40] <logmsgbot>	 !log tstarling@deploy2002 Finished deploy [restbase/deploy@6d0b97e]: no-op test deploy (duration: 11m 22s)
[03:57:59] <logmsgbot>	 !log tstarling@deploy2002 Started deploy [restbase/deploy@27f4a8e]: add 3 wikis T380726
[03:58:03] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[04:08:45] <logmsgbot>	 !log tstarling@deploy2002 Finished deploy [restbase/deploy@27f4a8e]: add 3 wikis T380726 (duration: 10m 46s)
[04:08:49] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[04:20:37] <logmsgbot>	 !log tstarling@deploy2002 Started deploy [restbase/deploy@27f4a8e]: try again, seems like restbase2026 at least was skipped T380726
[04:20:40] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[04:24:29] <jinxer-wm>	 FIRING: SystemdUnitFailed: ifup@eno12399np0.service on wikikube-worker1290:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:29:37] <logmsgbot>	 !log tstarling@deploy2002 Finished deploy [restbase/deploy@27f4a8e]: try again, seems like restbase2026 at least was skipped T380726 (duration: 09m 00s)
[04:29:40] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[04:31:15] <logmsgbot>	 !log tstarling@deploy2002 Started deploy [restbase/deploy@0531d4e]: try again after removing decom servers T380790 T380726
[04:31:20] <stashbot>	 T380790: decommission restbase202[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T380790
[04:45:51] <logmsgbot>	 !log tstarling@deploy2002 Finished deploy [restbase/deploy@0531d4e]: try again after removing decom servers T380790 T380726 (duration: 14m 36s)
[04:45:56] <stashbot>	 T380790: decommission restbase202[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T380790
[04:45:57] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[05:23:42] <logmsgbot>	 !log tstarling@deploy2002 Started deploy [restbase/deploy@8184836]: also deploy to restbase2036-9  T380726 T377896
[05:23:48] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[05:23:48] <stashbot>	 T377896: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896
[05:39:49] <logmsgbot>	 !log tstarling@deploy2002 Finished deploy [restbase/deploy@8184836]: also deploy to restbase2036-9  T380726 T377896 (duration: 16m 06s)
[05:39:52] <TimStarling>	 here I am getting old waiting for this deployment to finish for the 5th time, I wonder what is taking so long?
[05:39:54] <stashbot>	 T380726: Create Wikivoyage Indonesian - https://phabricator.wikimedia.org/T380726
[05:39:54] <stashbot>	 T377896: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896
[05:39:59] <TimStarling>	 1154875  |       \_ /var/lib/scap/scap/bin/python3 /usr/bin/scap deploy-local -v --repo restbase/deploy -g default promote --refresh-config
[05:39:59] <TimStarling>	 1155000  |           \_ sleep 52
[05:40:11] <TimStarling>	 at least I know how to speed it up in future
[05:40:22] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[05:41:31] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[05:53:44] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[05:54:42] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[05:58:06] <wikibugs>	 (PS1) Tim Starling: Enable canShellboxGetTempUrl [mediawiki-config] - https://gerrit.wikimedia.org/r/1101239 (https://phabricator.wikimedia.org/T292322)
[06:28:50] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[06:29:48] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[06:49:45] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[06:50:40] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[07:04:28] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:05:43] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job mysql-test in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:18:51] <jelto>	 !log homer 'cr*eqiad*' commit 'T377876'
[07:18:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:55] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[07:21:47] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Comm Error: backplane 0 when reimaging wikikube-worker1057 - https://phabricator.wikimedia.org/T381676#10389323 (Jelto) The following commands have to be executed when the host is back (just noting it down so I don't forget it):  ` cookbook sre.hosts.reimage --...
[07:34:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1056.eqiad.wmnet
[07:34:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1056.eqiad.wmnet
[07:35:24] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10389333 (Jelto)
[07:38:18] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Deprecate system::role for wikireplicas roles [puppet] - https://gerrit.wikimedia.org/r/1101068 (owner: Muehlenhoff)
[07:41:26] <wikibugs>	 (CR) Muehlenhoff: [C:+2] maps: Remove support for osm2pgsql as OSM engine [puppet] - https://gerrit.wikimedia.org/r/1100784 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[07:44:11] <wikibugs>	 (PS1) Jelto: Rename kubernetes[1039-1042] to wikikube-worker[1064-1067] [puppet] - https://gerrit.wikimedia.org/r/1101449 (https://phabricator.wikimedia.org/T377876)
[07:52:50] <wikibugs>	 (CR) Muehlenhoff: [C:+2] osm_master: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1100788 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[07:55:41] <wikibugs>	 (PS4) Anzx: jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Editation [mediawiki-config] - https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729)
[07:56:06] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729) (owner: Anzx)
[07:56:53] <wikibugs>	 (PS2) Anzx: idwikivoyage: add timezone, sitename and project namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080)
[07:57:04] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080) (owner: Anzx)
[07:58:42] <wikibugs>	 (PS1) Elukey: modules: add helper_1.1.4.tpl [deployment-charts] - https://gerrit.wikimedia.org/r/1101450
[07:58:42] <wikibugs>	 (PS1) Elukey: modules: remove tpl() usage in base:helper's resourcesDataChecksum [deployment-charts] - https://gerrit.wikimedia.org/r/1101451
[07:58:42] <wikibugs>	 (PS1) Elukey: [WIP] charts: Add kartotherian [deployment-charts] - https://gerrit.wikimedia.org/r/1101452
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: It is that lovely time of the day again! You are hereby commanded to deploy UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T0800).
[08:00:05] <jouncebot>	 mszabo and anzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:13] <anzx>	 o/
[08:02:15] <wikibugs>	 (CR) Brouberol: [C:+1] modules: add helper_1.1.4.tpl [deployment-charts] - https://gerrit.wikimedia.org/r/1101450 (owner: Elukey)
[08:03:49] <wikibugs>	 (CR) Brouberol: "Let's add a changelog entry?" [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:05:05] <wikibugs>	 (PS2) Muehlenhoff: maps: Allow disabling the installation of kartotherian [puppet] - https://gerrit.wikimedia.org/r/1100456
[08:07:50] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1100456 (owner: Muehlenhoff)
[08:08:13] <wikibugs>	 (PS2) Elukey: modules: remove tpl() usage in base:helper's resourcesDataChecksum [deployment-charts] - https://gerrit.wikimedia.org/r/1101451
[08:08:13] <wikibugs>	 (PS2) Elukey: [WIP] charts: Add kartotherian [deployment-charts] - https://gerrit.wikimedia.org/r/1101452
[08:08:25] <wikibugs>	 (CR) Elukey: "Right added!" [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:15:12] <wikibugs>	 (PS1) Muehlenhoff: Add a define to determine the postgresql version used for a Debian release [puppet] - https://gerrit.wikimedia.org/r/1101454
[08:15:19] <wikibugs>	 (CR) Brouberol: [C:+1] "Nicely done!" [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:17:08] <wikibugs>	 (CR) Elukey: [C:+2] modules: add helper_1.1.4.tpl [deployment-charts] - https://gerrit.wikimedia.org/r/1101450 (owner: Elukey)
[08:17:10] <mszabo>	 o/
[08:17:13] <wikibugs>	 (CR) Elukey: [C:+2] modules: remove tpl() usage in base:helper's resourcesDataChecksum [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:17:20] <wikibugs>	 (PS3) Elukey: modules: remove tpl() usage in base:helper's resourcesDataChecksum [deployment-charts] - https://gerrit.wikimedia.org/r/1101451
[08:17:20] <wikibugs>	 (CR) CI reject: [V:-1] modules: remove tpl() usage in base:helper's resourcesDataChecksum [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:17:34] <wikibugs>	 (CR) Elukey: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:19:31] <wikibugs>	 (Merged) jenkins-bot: modules: remove tpl() usage in base:helper's resourcesDataChecksum [deployment-charts] - https://gerrit.wikimedia.org/r/1101451 (owner: Elukey)
[08:23:05] <wikibugs>	 (PS3) Muehlenhoff: maps: Allow disabling the installation of kartotherian [puppet] - https://gerrit.wikimedia.org/r/1100456 (https://phabricator.wikimedia.org/T381565)
[08:24:29] <jinxer-wm>	 FIRING: SystemdUnitFailed: ifup@eno12399np0.service on wikikube-worker1290:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:24:57] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10389360 (elukey) >>! In T378368#10386835, @elukey wrote: > I am reviewing the quote of these nodes to figure out what t...
[08:26:29] <wikibugs>	 (PS1) JMeybohm: Enable pki external service in cfssl-issuer deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1101455
[08:28:17] <wikibugs>	 (CR) JMeybohm: [C:+1] Rename kubernetes[1039-1042] to wikikube-worker[1064-1067] [puppet] - https://gerrit.wikimedia.org/r/1101449 (https://phabricator.wikimedia.org/T377876) (owner: Jelto)
[08:29:45] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1039-1042].eqiad.wmnet
[08:32:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1039-1042].eqiad.wmnet
[08:32:33] <wikibugs>	 (PS2) JMeybohm: Enable pki external service in cfssl-issuer deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1101455
[08:32:33] <wikibugs>	 (PS1) JMeybohm: cfssl-issuer: Add external_services to chart fixture [deployment-charts] - https://gerrit.wikimedia.org/r/1101456
[08:32:35] <wikibugs>	 (CR) Jelto: [C:+2] Rename kubernetes[1039-1042] to wikikube-worker[1064-1067] [puppet] - https://gerrit.wikimedia.org/r/1101449 (https://phabricator.wikimedia.org/T377876) (owner: Jelto)
[08:33:08] <wikibugs>	 (PS1) Elukey: sre.hosts.provision: add uefi only devices for Supermicro [cookbooks] - https://gerrit.wikimedia.org/r/1101457 (https://phabricator.wikimedia.org/T378368)
[08:34:39] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host cloudelastic1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:35:10] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:35:15] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1039 to wikikube-worker1064
[08:35:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:35:43] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job mysql-test in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:36:29] <wikibugs>	 (PS2) Elukey: sre.hosts.provision: add uefi only devices for Supermicro [cookbooks] - https://gerrit.wikimedia.org/r/1101457 (https://phabricator.wikimedia.org/T378368)
[08:36:44] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host cloudelastic1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:38:27] <wikibugs>	 (CR) CI reject: [V:-1] [WIP] charts: Add kartotherian [deployment-charts] - https://gerrit.wikimedia.org/r/1101452 (owner: Elukey)
[08:39:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1039 to wikikube-worker1064 - jelto@cumin1002"
[08:40:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1039 to wikikube-worker1064 - jelto@cumin1002"
[08:40:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:40:11] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1064
[08:40:54] <wikibugs>	 (CR) Elukey: [C:+1] maps: Allow disabling the installation of kartotherian [puppet] - https://gerrit.wikimedia.org/r/1100456 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[08:41:59] <logmsgbot>	 !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:42:00] <wikibugs>	 (CR) Elukey: [C:+1] Add a define to determine the postgresql version used for a Debian release [puppet] - https://gerrit.wikimedia.org/r/1101454 (owner: Muehlenhoff)
[08:42:42] <wikibugs>	 (CR) Muehlenhoff: [C:+2] maps: Allow disabling the installation of kartotherian [puppet] - https://gerrit.wikimedia.org/r/1100456 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[08:42:43] <wikibugs>	 (CR) Jelto: [C:+1] "looks good to me" [deployment-charts] - https://gerrit.wikimedia.org/r/1101455 (owner: JMeybohm)
[08:42:46] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1064
[08:42:58] <wikibugs>	 (CR) Jelto: [C:+1] "looks good to me" [deployment-charts] - https://gerrit.wikimedia.org/r/1101456 (owner: JMeybohm)
[08:43:24] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1039 to wikikube-worker1064
[08:45:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1040 to wikikube-worker1065
[08:45:40] <jinxer-wm>	 FIRING: [3x] KubernetesRsyslogDown: rsyslog on kubernetes1040:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:46:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:49:37] <wikibugs>	 (CR) JMeybohm: [C:+2] Enable pki external service in cfssl-issuer deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1101455 (owner: JMeybohm)
[08:49:40] <wikibugs>	 (CR) JMeybohm: [C:+2] cfssl-issuer: Add external_services to chart fixture [deployment-charts] - https://gerrit.wikimedia.org/r/1101456 (owner: JMeybohm)
[08:49:51] <wikibugs>	 (CR) JMeybohm: [V:+2 C:+2] "Thanks!" [deployment-charts] - https://gerrit.wikimedia.org/r/1099837 (owner: Wziko)
[08:50:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1040 to wikikube-worker1065 - jelto@cumin1002"
[08:50:49] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1040 to wikikube-worker1065 - jelto@cumin1002"
[08:50:49] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:50:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1065
[08:52:27] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1065
[08:53:06] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1040 to wikikube-worker1065
[08:53:26] <wikibugs>	 (Merged) jenkins-bot: feat(cfssl-issuer): change default value for external_services in cfssl issuer helm chart [deployment-charts] - https://gerrit.wikimedia.org/r/1099837 (owner: Wziko)
[08:53:41] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1041 to wikikube-worker1066
[08:53:43] <wikibugs>	 (Merged) jenkins-bot: cfssl-issuer: Add external_services to chart fixture [deployment-charts] - https://gerrit.wikimedia.org/r/1101456 (owner: JMeybohm)
[08:53:43] <wikibugs>	 (Merged) jenkins-bot: Enable pki external service in cfssl-issuer deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1101455 (owner: JMeybohm)
[08:54:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:57:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1041 to wikikube-worker1066 - jelto@cumin1002"
[08:58:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1041 to wikikube-worker1066 - jelto@cumin1002"
[08:58:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:58:09] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1066
[08:59:19] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1066
[08:59:26] <wikibugs>	 (CR) Volans: "question inline" [cookbooks] - https://gerrit.wikimedia.org/r/1101457 (https://phabricator.wikimedia.org/T378368) (owner: Elukey)
[08:59:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1041 to wikikube-worker1066
[09:00:51] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1042 to wikikube-worker1067
[09:00:59] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-worker[2074-2075,2091,2124].codfw.wmnet with reason: reimage
[09:01:11] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[09:01:19] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-worker[2074-2075,2091,2124].codfw.wmnet with reason: reimage
[09:02:10] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2074-2075,2091,2124].codfw.wmnet
[09:04:25] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2074-2075,2091,2124].codfw.wmnet
[09:04:39] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1042 to wikikube-worker1067 - jelto@cumin1002"
[09:04:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1042 to wikikube-worker1067 - jelto@cumin1002"
[09:04:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:04:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1067
[09:05:12] <wikibugs>	 (PS1) Muehlenhoff: maps::postgresql_common: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1101461
[09:05:23] <wikibugs>	 (PS2) Muehlenhoff: maps::postgresql_common: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1101461
[09:05:47] <wikibugs>	 SRE, serviceops, Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10389434 (JMeybohm)
[09:06:27] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1067
[09:07:02] <wikibugs>	 (CR) Harroyo-wmf: [C:+1] dialog: Fix wrong title on Types of unacceptable behavior step [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101069 (https://phabricator.wikimedia.org/T381529) (owner: Máté Szabó)
[09:07:06] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1042 to wikikube-worker1067
[09:07:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1064.eqiad.wmnet wikikube-worker1065.eqiad.wmnet wikikube-worker1066.eqiad.wmnet wikikube-worker1067.eqiad.wmnet on all recursors
[09:07:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1064.eqiad.wmnet wikikube-worker1065.eqiad.wmnet wikikube-worker1066.eqiad.wmnet wikikube-worker1067.eqiad.wmnet on all recursors
[09:08:14] <icinga-wm>	 PROBLEM - BGP status on lsw1-a5-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:09:33] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101461 (owner: Muehlenhoff)
[09:10:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1064.eqiad.wmnet with OS bookworm
[09:10:53] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1065.eqiad.wmnet with OS bookworm
[09:11:37] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on wikikube-worker2091 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T381747 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[09:11:43] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on wikikube-worker2091 - https://phabricator.wikimedia.org/T381747 (ops-monitoring-bot) NEW
[09:12:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1066.eqiad.wmnet with OS bookworm
[09:12:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1067.eqiad.wmnet with OS bookworm
[09:12:36] <icinga-wm>	 PROBLEM - BGP status on lsw1-a6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:13:05] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2074.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:13:43] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2075.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:14:00] <wikibugs>	 (03PS3) 10Muehlenhoff: maps::postgresql_common: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1101461
[09:14:15] <wikibugs>	 (03CR) 10Harroyo-wmf: [C:03+1] dialog: Fix spacing between buttons in the dialog footer [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101070 (https://phabricator.wikimedia.org/T381530) (owner: 10Máté Szabó)
[09:14:18] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2091.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:14:36] <icinga-wm>	 RECOVERY - BGP status on lsw1-a6-codfw.mgmt is OK: BGP OK - up: 44, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:16:08] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2124.codfw.wmnet with OS bookworm
[09:16:19] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2124.codfw.wmnet with OS bookworm
[09:16:49] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1101461 (owner: 10Muehlenhoff)
[09:18:40] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2124.codfw.wmnet with OS bookworm
[09:18:45] <wikibugs>	 (03CR) 10Volans: [C:03+2] style: a pass of black on all files [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100772 (owner: 10Volans)
[09:21:16] <icinga-wm>	 RECOVERY - BGP status on lsw1-a5-codfw.mgmt is OK: BGP OK - up: 32, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:21:36] <icinga-wm>	 PROBLEM - BGP status on lsw1-a6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:21:57] <wikibugs>	 (03PS3) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[09:22:05] <wikibugs>	 (03PS2) 10Gergő Tisza: Fix protocol for .well-known/change-password Apache rule [puppet] - 10https://gerrit.wikimedia.org/r/1101462 (https://phabricator.wikimedia.org/T381625)
[09:25:56] <wikibugs>	 (03PS4) 10Muehlenhoff: maps::postgresql_common: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1101461
[09:28:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1064.eqiad.wmnet with reason: host reimage
[09:28:41] <wikibugs>	 (03Merged) 10jenkins-bot: style: a pass of black on all files [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100772 (owner: 10Volans)
[09:28:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1065.eqiad.wmnet with reason: host reimage
[09:29:53] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1066.eqiad.wmnet with reason: host reimage
[09:30:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1067.eqiad.wmnet with reason: host reimage
[09:31:11] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1064.eqiad.wmnet with reason: host reimage
[09:31:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Add a define to determine the postgresql version used for a Debian release [puppet] - 10https://gerrit.wikimedia.org/r/1101454 (owner: 10Muehlenhoff)
[09:34:05] <wikibugs>	 (03PS1) 10JMeybohm: move-vlan: Don't fail if there is nothing to do [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464
[09:35:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1066.eqiad.wmnet with reason: host reimage
[09:35:27] <wikibugs>	 (03PS5) 10Anzx: idwikivoyage: add logo, wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101459 (https://phabricator.wikimedia.org/T381080)
[09:36:52] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101459 (https://phabricator.wikimedia.org/T381080) (owner: 10Anzx)
[09:37:36] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101069 (https://phabricator.wikimedia.org/T381529) (owner: 10Máté Szabó)
[09:37:48] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101070 (https://phabricator.wikimedia.org/T381530) (owner: 10Máté Szabó)
[09:37:58] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100101 (owner: 10Máté Szabó)
[09:38:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1067.eqiad.wmnet with reason: host reimage
[09:38:24] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10389536 (10JMeybohm)
[09:38:28] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2124.codfw.wmnet with reason: host reimage
[09:39:05] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2091.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:39:12] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2074.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:39:18] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2075.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:40:42] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2091.codfw.wmnet with OS bookworm
[09:40:50] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2074.codfw.wmnet with OS bookworm
[09:40:54] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2075.codfw.wmnet with OS bookworm
[09:41:11] <kostajh>	 jouncebot: nowandnext
[09:41:11] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 18 minute(s)
[09:41:11] <jouncebot>	 In 1 hour(s) and 18 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1100)
[09:42:19] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2124.codfw.wmnet with reason: host reimage
[09:43:57] <wikibugs>	 (03CR) 10Volans: move-vlan: Don't fail if there is nothing to do (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:44:19] <icinga-wm>	 PROBLEM - BGP status on lsw1-a5-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:46:17] <wikibugs>	 (03PS2) 10JMeybohm: move-vlan: Don't fail if there is nothing to do [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464
[09:46:25] <wikibugs>	 (03CR) 10Jelto: "two comments in-line" [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:46:28] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1065.eqiad.wmnet with reason: host reimage
[09:47:21] <wikibugs>	 (03CR) 10Elukey: sre.hosts.provision: add uefi only devices for Supermicro (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1101457 (https://phabricator.wikimedia.org/T378368) (owner: 10Elukey)
[09:47:54] <wikibugs>	 (03CR) 10Jelto: move-vlan: Don't fail if there is nothing to do (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:48:41] <wikibugs>	 (03CR) 10JMeybohm: move-vlan: Don't fail if there is nothing to do (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:49:54] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/1101457 (https://phabricator.wikimedia.org/T378368) (owner: 10Elukey)
[09:49:54] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1064.eqiad.wmnet with OS bookworm
[09:51:27] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:52:06] <wikibugs>	 (03CR) 10CI reject: [V:04-1] charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[09:53:11] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm now" [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:53:12] <wikibugs>	 (03PS1) 10Muehlenhoff: maps/postgresql: Support bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565)
[09:54:15] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1066.eqiad.wmnet with OS bookworm
[09:56:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1055493 (https://phabricator.wikimedia.org/T370677) (owner: 10Dzahn)
[09:56:19] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] move-vlan: Don't fail if there is nothing to do [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[09:56:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] tests: validate deploy-tag values [alerts] - 10https://gerrit.wikimedia.org/r/1101019 (owner: 10Filippo Giunchedi)
[09:56:32] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1067.eqiad.wmnet with OS bookworm
[09:59:40] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2074.codfw.wmnet with reason: host reimage
[09:59:50] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2075.codfw.wmnet with reason: host reimage
[10:00:06] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2091.codfw.wmnet with reason: host reimage
[10:01:49] <icinga-wm>	 RECOVERY - BGP status on lsw1-a6-codfw.mgmt is OK: BGP OK - up: 44, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:01:53] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10389656 (10JMeybohm)
[10:02:04] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2124.codfw.wmnet with OS bookworm
[10:02:47] <wikibugs>	 (03Merged) 10jenkins-bot: move-vlan: Don't fail if there is nothing to do [cookbooks] - 10https://gerrit.wikimedia.org/r/1101464 (owner: 10JMeybohm)
[10:03:26] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2074.codfw.wmnet with reason: host reimage
[10:04:29] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: mediawiki_job_translationnotifications-metawiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:04:59] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1065.eqiad.wmnet with OS bookworm
[10:06:23] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2075.codfw.wmnet with reason: host reimage
[10:06:43] <jelto>	 !log homer 'cr*eqiad*' commit 'T377876'
[10:06:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:46] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[10:08:34] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: ganeti2042 seems to have a broken CPU? (new Supermicro node) - https://phabricator.wikimedia.org/T378358#10389669 (10MoritzMuehlenhoff) Looks fine, the server is running stable now and the error message disappeared from IPMI logs:   `   40 | 11/11/2024 | 05:35:48 PM UTC | Proces...
[10:08:42] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594#10389670 (10MoritzMuehlenhoff)
[10:08:42] <icinga-wm>	 PROBLEM - Swift https frontend on ms-fe1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[10:09:26] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[10:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:09:32] <icinga-wm>	 RECOVERY - Swift https frontend on ms-fe1010 is OK: HTTP OK: HTTP/1.1 200 OK - 294 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Swift
[10:10:10] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2091.codfw.wmnet with reason: host reimage
[10:10:16] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1010 is OK: HTTP OK: HTTP/1.1 200 OK - 501 bytes in 0.059 second response time https://wikitech.wikimedia.org/wiki/Swift
[10:10:34] <moritzm>	 !log rebalance Ganeti cluster in codfw/A following server refresh T376594
[10:10:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:37] <stashbot>	 T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594
[10:14:10] <icinga-wm>	 PROBLEM - Disk space on titan2001 is CRITICAL: DISK CRITICAL - free space: /srv 23738MiB (1% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=titan2001&var-datasource=codfw+prometheus/ops
[10:15:14] <godog>	 mmhh I'll take a look at that ^
[10:20:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1064-1067].eqiad.wmnet
[10:20:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1064-1067].eqiad.wmnet
[10:21:17] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10389703 (10Jelto)
[10:22:55] <wikibugs>	 06SRE, 06Data-Platform-SRE, 06Infrastructure-Foundations, 10netops: Add QoS markings to profile Hadoop/HDFS analytics traffic - https://phabricator.wikimedia.org/T381389#10389706 (10BTullis) This change looks fine to me, but would it be OK to wait until the New Year to implement it? I'm just a bit cautious...
[10:23:06] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2074.codfw.wmnet with OS bookworm
[10:23:30] <jinxer-wm>	 FIRING: Primary inbound port utilisation over 80%  #page: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[10:24:30] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[10:25:31] <wikibugs>	 06SRE-OnFire, 10MW-on-K8s, 06serviceops, 13Patch-For-Review, 10Sustainability (Incident Followup): mwscript-k8s creates too many resources - https://phabricator.wikimedia.org/T376795#10389740 (10dcausse) The search platform team is working on migrating a set of tools from `mwscript` to `mwscript-k8s` (T3...
[10:25:51] <godog>	 checking
[10:25:56] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2075.codfw.wmnet with OS bookworm
[10:28:30] <jinxer-wm>	 RESOLVED: Primary inbound port utilisation over 80%  #page: Device asw2-b-eqiad.mgmt.eqiad.wmnet recovered from Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[10:29:09] <wikibugs>	 (03PS1) 10Jelto: Rename kubernetes[1043-1046] to wikikube-worker[1068-1071] [puppet] - 10https://gerrit.wikimedia.org/r/1101473 (https://phabricator.wikimedia.org/T377876)
[10:29:30] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[10:29:32] <icinga-wm>	 RECOVERY - BGP status on lsw1-a5-codfw.mgmt is OK: BGP OK - up: 32, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:30:02] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2091.codfw.wmnet with OS bookworm
[10:32:32] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rename kubernetes[1043-1046] to wikikube-worker[1068-1071] [puppet] - 10https://gerrit.wikimedia.org/r/1101473 (https://phabricator.wikimedia.org/T377876) (owner: 10Jelto)
[10:32:58] <wikibugs>	 (03PS1) 10Slyngshede: Prevent leak via window.opener [software/bitu] - 10https://gerrit.wikimedia.org/r/1101474 (https://phabricator.wikimedia.org/T381637)
[10:33:18] <wikibugs>	 06SRE: The ops-maint-gcal.js script is missing support for some vendors - https://phabricator.wikimedia.org/T381680#10389761 (10Aklapper)
[10:33:18] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100417 (https://phabricator.wikimedia.org/T381322) (owner: 10Gmodena)
[10:34:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1043-1046].eqiad.wmnet
[10:34:10] <icinga-wm>	 RECOVERY - Disk space on titan2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=titan2001&var-datasource=codfw+prometheus/ops
[10:35:46] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2074-2075,2091,2124].codfw.wmnet
[10:35:49] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2074-2075,2091,2124].codfw.wmnet
[10:36:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1043-1046].eqiad.wmnet
[10:36:20] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Prevent leak via window.opener [software/bitu] - 10https://gerrit.wikimedia.org/r/1101474 (https://phabricator.wikimedia.org/T381637) (owner: 10Slyngshede)
[10:36:52] <wikibugs>	 (03CR) 10Jelto: [C:03+2] Rename kubernetes[1043-1046] to wikikube-worker[1068-1071] [puppet] - 10https://gerrit.wikimedia.org/r/1101473 (https://phabricator.wikimedia.org/T377876) (owner: 10Jelto)
[10:37:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [software/bitu] - 10https://gerrit.wikimedia.org/r/1101474 (https://phabricator.wikimedia.org/T381637) (owner: 10Slyngshede)
[10:37:57] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10389785 (10JMeybohm)
[10:38:31] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565) (owner: 10Muehlenhoff)
[10:38:33] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-worker[2103-2106].codfw.wmnet with reason: reimage
[10:38:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1043 to wikikube-worker1068
[10:38:53] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-worker[2103-2106].codfw.wmnet with reason: reimage
[10:38:57] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[10:39:05] <wikibugs>	 (03Merged) 10jenkins-bot: Prevent leak via window.opener [software/bitu] - 10https://gerrit.wikimedia.org/r/1101474 (https://phabricator.wikimedia.org/T381637) (owner: 10Slyngshede)
[10:39:23] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2103-2106].codfw.wmnet
[10:41:41] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2103-2106].codfw.wmnet
[10:42:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1043 to wikikube-worker1068 - jelto@cumin1002"
[10:42:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1043 to wikikube-worker1068 - jelto@cumin1002"
[10:42:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:42:52] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1068
[10:44:06] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1068
[10:44:21] <wikibugs>	 (03PS2) 10Muehlenhoff: maps/postgresql: Support bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565)
[10:44:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1043 to wikikube-worker1068
[10:45:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1044 to wikikube-worker1069
[10:45:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[10:46:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] apt::repository: Fix configuration of source-only repositories on bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1100814 (https://phabricator.wikimedia.org/T379343) (owner: 10Muehlenhoff)
[10:47:17] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1101461 (owner: 10Muehlenhoff)
[10:47:40] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565) (owner: 10Muehlenhoff)
[10:49:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1044 to wikikube-worker1069 - jelto@cumin1002"
[10:49:40] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on kubernetes1045:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:49:51] <wikibugs>	 (03PS4) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[10:50:01] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1044 to wikikube-worker1069 - jelto@cumin1002"
[10:50:01] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:50:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1069
[10:50:46] <icinga-wm>	 PROBLEM - BGP status on lsw1-b6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:51:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1069
[10:51:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1044 to wikikube-worker1069
[10:52:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1045 to wikikube-worker1070
[10:52:30] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[10:54:12] <wikibugs>	 (03PS1) 10FNegri: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477
[10:54:32] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es2024.codfw.wmnet with reason: cloning
[10:54:46] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2024.codfw.wmnet with reason: cloning
[10:55:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2024 to clone es2045', diff saved to https://phabricator.wikimedia.org/P71639 and previous config saved to /var/cache/conftool/dbconfig/20241209-105508-marostegui.json
[10:55:24] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on wikikube-worker2106 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 2, Failed: 0, Spare: 1 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T381765 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[10:55:33] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on wikikube-worker2106 - https://phabricator.wikimedia.org/T381765 (10ops-monitoring-bot) 03NEW
[10:55:48] <icinga-wm>	 RECOVERY - BGP status on lsw1-b6-codfw.mgmt is OK: BGP OK - up: 38, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:55:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[10:56:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1045 to wikikube-worker1070 - jelto@cumin1002"
[10:56:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1045 to wikikube-worker1070 - jelto@cumin1002"
[10:56:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:56:32] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1070
[10:56:38] <wikibugs>	 (03PS5) 10Elukey: charts: Add kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101452 (https://phabricator.wikimedia.org/T216826)
[10:57:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1070
[10:58:24] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1045 to wikikube-worker1070
[10:58:38] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Move db1159 to s5 [puppet] - 10https://gerrit.wikimedia.org/r/1101478 (https://phabricator.wikimedia.org/T381550)
[10:59:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1046 to wikikube-worker1071
[10:59:05] <wikibugs>	 (03PS2) 10FNegri: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477
[10:59:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[10:59:37] <wikibugs>	 (03PS1) 10Filippo Giunchedi: sre: add multi-team to conntrack alert [alerts] - 10https://gerrit.wikimedia.org/r/1101480
[10:59:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1210 to clone db1159 T381550', diff saved to https://phabricator.wikimedia.org/P71640 and previous config saved to /var/cache/conftool/dbconfig/20241209-105941-marostegui.json
[10:59:45] <stashbot>	 T381550: Move db1159 to s5 - https://phabricator.wikimedia.org/T381550
[10:59:53] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Move db1159 to s5 [puppet] - 10https://gerrit.wikimedia.org/r/1101478 (https://phabricator.wikimedia.org/T381550) (owner: 10Marostegui)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1100)
[11:00:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[11:00:32] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1159.eqiad.wmnet with reason: cloning
[11:00:39] <wikibugs>	 (03PS1) 10Aklapper: Phabricator: Add "video/webm" to files.viewable-mime-types [puppet] - 10https://gerrit.wikimedia.org/r/1101481 (https://phabricator.wikimedia.org/T309222)
[11:00:45] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1159.eqiad.wmnet with reason: cloning
[11:01:16] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: cloning
[11:01:29] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: cloning
[11:01:44] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2103.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:01:51] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2104.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:01:57] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2105.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:02:03] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2106.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:03:01] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1046 to wikikube-worker1071 - jelto@cumin1002"
[11:03:02] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.mysql.clone of db1210.eqiad.wmnet onto db1159.eqiad.wmnet
[11:03:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1046 to wikikube-worker1071 - jelto@cumin1002"
[11:03:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:03:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1071
[11:03:51] <icinga-wm>	 PROBLEM - BGP status on lsw1-b6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/
[11:03:51] <icinga-wm>	 monitoring%23BGP_status
[11:04:27] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1071
[11:04:28] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:05:06] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1046 to wikikube-worker1071
[11:05:17] <wikibugs>	 (03PS3) 10Muehlenhoff: maps/postgresql: Support bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565)
[11:05:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1068.eqiad.wmnet wikikube-worker1069.eqiad.wmnet wikikube-worker1070.eqiad.wmnet wikikube-worker1071.eqiad.wmnet on all recursors
[11:05:21] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1068.eqiad.wmnet wikikube-worker1069.eqiad.wmnet wikikube-worker1070.eqiad.wmnet wikikube-worker1071.eqiad.wmnet on all recursors
[11:05:41] <wikibugs>	 (03PS1) 10Elukey: profile::k8s::deployment_server: add config for Kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/1101483 (https://phabricator.wikimedia.org/T216826)
[11:08:03] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565) (owner: 10Muehlenhoff)
[11:08:24] <wikibugs>	 (03PS3) 10FNegri: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477
[11:08:51] <icinga-wm>	 RECOVERY - BGP status on lsw1-b6-codfw.mgmt is OK: BGP OK - up: 38, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:09:38] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[11:11:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1068.eqiad.wmnet with OS bookworm
[11:11:44] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1069.eqiad.wmnet with OS bookworm
[11:12:11] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1070.eqiad.wmnet with OS bookworm
[11:12:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1071.eqiad.wmnet with OS bookworm
[11:15:46] <wikibugs>	 (03PS1) 10Elukey: admin_ng: add the kartotherian namespace on Wikikube [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101487 (https://phabricator.wikimedia.org/T216826)
[11:15:48] <wikibugs>	 (03PS1) 10Elukey: services: add helmfile config for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101488 (https://phabricator.wikimedia.org/T216826)
[11:21:09] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2105.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:21:14] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2106.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:21:17] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2103.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:21:20] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2104.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[11:23:21] <wikibugs>	 06SRE, 06Infrastructure-Foundations: ganeti105[34] implementation tracking - https://phabricator.wikimedia.org/T381581#10389915 (10MoritzMuehlenhoff) p:05Triage→03Medium
[11:23:29] <wikibugs>	 (03PS1) 10Btullis: Add hadoop/HTTP keytabs for labs hadoop workers [labs/private] - 10https://gerrit.wikimedia.org/r/1101490 (https://phabricator.wikimedia.org/T381087)
[11:24:15] <wikibugs>	 (03CR) 10Btullis: [V:03+2 C:03+2] Add hadoop/HTTP keytabs for labs hadoop workers [labs/private] - 10https://gerrit.wikimedia.org/r/1101490 (https://phabricator.wikimedia.org/T381087) (owner: 10Btullis)
[11:24:34] <wikibugs>	 (03CR) 10Zoe: [C:03+1] "I don't have +2 permissions here" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1099658 (owner: 10PipelineBot)
[11:25:34] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2103.codfw.wmnet with OS bookworm
[11:25:45] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2103
[11:25:45] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2103
[11:25:45] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2104.codfw.wmnet with OS bookworm
[11:25:55] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2104
[11:25:55] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2104
[11:26:41] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2105.codfw.wmnet with OS bookworm
[11:26:51] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2105
[11:26:52] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2105
[11:26:57] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2106.codfw.wmnet with OS bookworm
[11:27:08] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2106
[11:27:08] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2106
[11:27:28] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1068.eqiad.wmnet with OS bookworm
[11:28:08] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101494
[11:30:09] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1070.eqiad.wmnet with reason: host reimage
[11:30:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1071.eqiad.wmnet with reason: host reimage
[11:32:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1068.eqiad.wmnet with OS bookworm
[11:33:34] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] "Much much better than my approach." [alerts] - 10https://gerrit.wikimedia.org/r/1101480 (owner: 10Filippo Giunchedi)
[11:33:48] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1 C:03+2] P:prometheus::ops JMX collector for IDP hosts [puppet] - 10https://gerrit.wikimedia.org/r/1100771 (https://phabricator.wikimedia.org/T380402) (owner: 10Slyngshede)
[11:34:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1070.eqiad.wmnet with reason: host reimage
[11:36:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] sre: add multi-team to conntrack alert [alerts] - 10https://gerrit.wikimedia.org/r/1101480 (owner: 10Filippo Giunchedi)
[11:37:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1071.eqiad.wmnet with reason: host reimage
[11:40:34] <wikibugs>	 (03PS4) 10Muehlenhoff: maps/postgresql: Support bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565)
[11:42:10] <wikibugs>	 (03PS2) 10Hnowlan: mediawiki: add debug flag for mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101081 (https://phabricator.wikimedia.org/T371701)
[11:42:55] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1210.eqiad.wmnet onto db1159.eqiad.wmnet
[11:45:01] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2103.codfw.wmnet with reason: host reimage
[11:45:23] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2104.codfw.wmnet with reason: host reimage
[11:45:48] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+1] mediawiki: add debug flag for mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101081 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[11:45:56] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2105.codfw.wmnet with reason: host reimage
[11:46:01] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2106.codfw.wmnet with reason: host reimage
[11:48:06] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mediawiki: add debug flag for mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101081 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[11:48:21] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565) (owner: 10Muehlenhoff)
[11:48:39] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1101483 (https://phabricator.wikimedia.org/T216826) (owner: 10Elukey)
[11:48:41] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2103.codfw.wmnet with reason: host reimage
[11:48:56] <icinga-wm>	 PROBLEM - BGP status on lsw1-b6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_
[11:48:56] <icinga-wm>	 ng%23BGP_status
[11:49:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1068.eqiad.wmnet with reason: host reimage
[11:50:30] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: add debug flag for mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101081 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[11:51:52] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2104.codfw.wmnet with reason: host reimage
[11:52:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1070.eqiad.wmnet with OS bookworm
[11:55:16] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1068.eqiad.wmnet with reason: host reimage
[11:55:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1071.eqiad.wmnet with OS bookworm
[12:00:15] <wikibugs>	 (03PS1) 10Muehlenhoff: netbox::db: Use new helper function [puppet] - 10https://gerrit.wikimedia.org/r/1101497
[12:02:10] <wikibugs>	 (03PS1) 10Hnowlan: mediawiki: fix mercurius argument order [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101498 (https://phabricator.wikimedia.org/T371701)
[12:02:30] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2105.codfw.wmnet with reason: host reimage
[12:04:31] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1069.eqiad.wmnet with OS bookworm
[12:05:15] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1069.eqiad.wmnet with OS bookworm
[12:05:25] <wikibugs>	 (03PS4) 10FNegri: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477
[12:06:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[12:06:58] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2106.codfw.wmnet with reason: host reimage
[12:07:06] <moritzm>	 !log installing reportbug bugfix updates
[12:07:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:23] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+1] mediawiki: fix mercurius argument order [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101498 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[12:08:48] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2103.codfw.wmnet with OS bookworm
[12:08:59] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mediawiki: fix mercurius argument order [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101498 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[12:10:51] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: fix mercurius argument order [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101498 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[12:12:00] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2104.codfw.wmnet with OS bookworm
[12:12:04] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bullseye 11.10 point update - https://phabricator.wikimedia.org/T368288#10390060 (10MoritzMuehlenhoff)
[12:13:49] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1068.eqiad.wmnet with OS bookworm
[12:15:34] <wikibugs>	 (03PS5) 10David Caro: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[12:16:36] <wikibugs>	 (03PS6) 10David Caro: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[12:19:04] <wikibugs>	 (03CR) 10FNegri: [C:03+1] WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[12:19:12] <wikibugs>	 (03CR) 10David Caro: [C:03+2] WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[12:21:12] <wikibugs>	 (03Merged) 10jenkins-bot: WMCS: fix expected number of active nodes [alerts] - 10https://gerrit.wikimedia.org/r/1101477 (owner: 10FNegri)
[12:22:48] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2105.codfw.wmnet with OS bookworm
[12:26:48] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2106.codfw.wmnet with OS bookworm
[12:27:00] <icinga-wm>	 RECOVERY - BGP status on lsw1-b6-codfw.mgmt is OK: BGP OK - up: 38, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:30:06] <icinga-wm>	 PROBLEM - Disk space on ml-lab1001 is CRITICAL: DISK CRITICAL - free space: /srv 0MiB (0% inode=96%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-lab1001&var-datasource=eqiad+prometheus/ops
[12:50:06] <icinga-wm>	 RECOVERY - Disk space on ml-lab1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ml-lab1001&var-datasource=eqiad+prometheus/ops
[12:52:06] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s2 on dbstore1007 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table recentchanges is corrupt: try to repair it on query. Default database: nlwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:57:33] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1101497 (owner: 10Muehlenhoff)
[12:57:44] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1069.eqiad.wmnet with OS bookworm
[12:59:36] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on dbstore1007 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 630.41 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:07:08] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: amd-pytorch25: add torch 2.5.1 + ROCm 6.1 image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1101524
[13:07:22] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: "`" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1101524 (owner: 10Ilias Sarantopoulos)
[13:07:28] <wikibugs>	 (03CR) 10Jforrester: [C:03+1] "Thanks!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100217 (https://phabricator.wikimedia.org/T33951) (owner: 10Tim Starling)
[13:09:24] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: "`" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1101524 (owner: 10Ilias Sarantopoulos)
[13:12:40] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: fix jmx_idp config [puppet] - 10https://gerrit.wikimedia.org/r/1101525
[13:15:38] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Comm Error: backplane 0 when reimaging wikikube-worker1069 - https://phabricator.wikimedia.org/T381770 (10Jelto) 03NEW
[13:16:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] prometheus: fix jmx_idp config [puppet] - 10https://gerrit.wikimedia.org/r/1101525 (owner: 10Filippo Giunchedi)
[13:16:19] <jelto>	 !log homer 'cr*eqiad*' commit 'T377876'
[13:16:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:23] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[13:24:08] <wikibugs>	 (03PS5) 10Anzx: jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729)
[13:26:11] <wikibugs>	 06SRE, 06Traffic: Occasional saturation of asw2-b-eqiad / cr port uplink and cache upload usage - https://phabricator.wikimedia.org/T381771 (10fgiunchedi) 03NEW
[13:28:07] <Lucas_WMDE>	 jouncebot: now
[13:28:07] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 31 minute(s)
[13:30:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1068,1070-1071].eqiad.wmnet
[13:30:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1068,1070-1071].eqiad.wmnet
[13:32:39] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10390178 (10Jelto)
[13:35:32] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Prod-Kubernetes, and 2 others: Comm Error: backplane 0 when reimaging wikikube-worker1069 - https://phabricator.wikimedia.org/T381770#10390181 (10Jelto) The following commands have to be executed when the host is back (just noting it down so I don't forget it):  ` cookbook s...
[13:41:42] <wikibugs>	 (03PS1) 10Jelto: Rename kubernetes[1047-1050] to wikikube-worker[1072-1075] [puppet] - 10https://gerrit.wikimedia.org/r/1101526 (https://phabricator.wikimedia.org/T377876)
[13:42:14] <Lucas_WMDE>	 I’ll run a maintenance script in a moment if nobody objects
[13:42:40] <wikibugs>	 (03PS1) 10Stevemunene: Enable airflow-analytics-test access to mx server [deployment-charts] - 10https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926)
[13:45:47] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rename kubernetes[1047-1050] to wikikube-worker[1072-1075] [puppet] - 10https://gerrit.wikimedia.org/r/1101526 (https://phabricator.wikimedia.org/T377876) (owner: 10Jelto)
[13:46:10] <wikibugs>	 (03PS1) 10Btullis: Add a truststore password for the hadoopcluster in labs [labs/private] - 10https://gerrit.wikimedia.org/r/1101528 (https://phabricator.wikimedia.org/T381087)
[13:46:18] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2103-2106].codfw.wmnet
[13:46:21] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2103-2106].codfw.wmnet
[13:46:34] <wikibugs>	 (03CR) 10Btullis: [V:03+2 C:03+2] Add a truststore password for the hadoopcluster in labs [labs/private] - 10https://gerrit.wikimedia.org/r/1101528 (https://phabricator.wikimedia.org/T381087) (owner: 10Btullis)
[13:46:43] <wikibugs>	 (03CR) 10Elukey: [C:03+2] sre.hosts.provision: add uefi only devices for Supermicro [cookbooks] - 10https://gerrit.wikimedia.org/r/1101457 (https://phabricator.wikimedia.org/T378368) (owner: 10Elukey)
[13:47:03] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10390202 (10JMeybohm)
[13:47:23] <Lucas_WMDE>	 about to start PropertySuggester UpdateTable.php for wikidatawiki on deploy2002
[13:47:29] <Lucas_WMDE>	 (will log when it’s done)
[13:48:31] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Discovery-Search, and 2 others: Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10390203 (10elukey) @Jclark-ctr @bking I updated the provision cookbook to support this case, but the TL;DR is that we may need to use UEFI to avoid weird co...
[13:49:00] <wikibugs>	 (03CR) 10Elukey: [C:03+1] maps/postgresql: Support bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565) (owner: 10Muehlenhoff)
[13:49:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1047-1050].eqiad.wmnet
[13:54:18] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1047-1050].eqiad.wmnet
[13:55:52] <wikibugs>	 (03CR) 10Jelto: [C:03+2] Rename kubernetes[1047-1050] to wikikube-worker[1072-1075] [puppet] - 10https://gerrit.wikimedia.org/r/1101526 (https://phabricator.wikimedia.org/T377876) (owner: 10Jelto)
[13:57:53] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1047 to wikikube-worker1072
[13:58:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1400).
[14:00:05] <jouncebot>	 wangombe_g, joelyrookewmde, abijeet, anzx, and mszabo: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:13] * anzx 👋
[14:00:44] <Lucas_WMDE>	 !log 'Updated the Wikidata property suggester with data from 20241125’s JSON dump: mwscript-k8s --attach -- extensions/PropertySuggester/maintenance/UpdateTable.php --wiki wikidatawiki --file php://stdin < wbs_propertypairs.csv # T377986, T376604'
[14:00:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:49] <stashbot>	 T377986: Q4 2024 update of Property Suggester data - https://phabricator.wikimedia.org/T377986
[14:00:50] <stashbot>	 T376604: [PS] Update PropertySuggester update process for mwscript-k8s - https://phabricator.wikimedia.org/T376604
[14:01:08] <icinga-wm>	 PROBLEM - BGP status on lsw1-e3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Netw
[14:01:08] <icinga-wm>	 toring%23BGP_status
[14:02:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1047 to wikikube-worker1072 - jelto@cumin1002"
[14:02:12] <Lucas_WMDE>	 I can deploy, I think
[14:02:52] <wangombe_g>	 ✋🏽
[14:03:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1047 to wikikube-worker1072 - jelto@cumin1002"
[14:03:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:03:21] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1072
[14:03:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1072
[14:04:12] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1047 to wikikube-worker1072
[14:04:19] <Lucas_WMDE>	 let’s start with wangombe_g then :)
[14:04:34] <wikibugs>	 (03PS1) 10Btullis: Add hadoop keystore_keypassword [labs/private] - 10https://gerrit.wikimedia.org/r/1101530 (https://phabricator.wikimedia.org/T381087)
[14:05:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[14:05:24] <wikibugs>	 (03PS2) 10Btullis: Add hadoop keystore_keypassword [labs/private] - 10https://gerrit.wikimedia.org/r/1101530 (https://phabricator.wikimedia.org/T381087)
[14:05:30] <wangombe_g>	 testing
[14:05:40] <wikibugs>	 (03CR) 10Btullis: [V:03+2 C:03+2] Add hadoop keystore_keypassword [labs/private] - 10https://gerrit.wikimedia.org/r/1101530 (https://phabricator.wikimedia.org/T381087) (owner: 10Btullis)
[14:05:46] <Lucas_WMDE>	 way too early for testing
[14:05:47] <wikibugs>	 (03Merged) 10jenkins-bot: Add Metrics Platform stream configuration for translate_extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[14:05:59] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1048 to wikikube-worker1073
[14:06:10] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1097499|Add Metrics Platform stream configuration for translate_extension (T364460)]]
[14:06:14] <stashbot>	 T364460: Implement the instrumentation to track usage of MinT in the Translate extension - https://phabricator.wikimedia.org/T364460
[14:06:20] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:08:17] <wangombe_g>	 changes look good on my end.
[14:08:36] <wangombe_g>	 on testwiki, that is...
[14:08:37] <Lucas_WMDE>	 that’s strange, because they have not yet been fully deployed to the test hosts
[14:08:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubernetes1050:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes1050 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:08:53] <wangombe_g>	 Oh?
[14:09:01] <Lucas_WMDE>	 when did you start testing?
[14:09:12] <Lucas_WMDE>	 they started rolling out to the test servers at 14:07:21 UTC according to scap
[14:09:27] <Lucas_WMDE>	 so if it was after that, then I guess it’s possible that you coincidentally hit a server that already had the change
[14:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:09:30] <wangombe_g>	 It's a config change, not a feature, so I'm looking for errors, warnings...
[14:09:30] <Lucas_WMDE>	 but you’re not supposed to test yet :)
[14:09:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1048 to wikikube-worker1073 - jelto@cumin1002"
[14:10:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1048 to wikikube-worker1073 - jelto@cumin1002"
[14:10:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:10:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1073
[14:10:10] <Lucas_WMDE>	 (that said, I’m not sure why sync-testservers-k8s is taking almost three minutes already o_O currently at 83%)
[14:10:32] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1073
[14:11:08] <abijeet>	 hello
[14:11:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1048 to wikikube-worker1073
[14:11:19] <wangombe_g>	 Makes sense why I didn't find any 😄
[14:11:38] * Lucas_WMDE waves at abijeet 
[14:11:45] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, wangombe: Backport for [[gerrit:1097499|Add Metrics Platform stream configuration for translate_extension (T364460)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:11:48] <abijeet>	 o/
[14:11:49] <stashbot>	 T364460: Implement the instrumentation to track usage of MinT in the Translate extension - https://phabricator.wikimedia.org/T364460
[14:11:56] <Lucas_WMDE>	 wangombe_g: now you can test ^^
[14:12:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1049 to wikikube-worker1074
[14:12:00] <Lucas_WMDE>	 (with WikimediaDebug)
[14:12:06] <wangombe_g>	 👍
[14:12:08] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:14:40] <wikibugs>	 (PS1) Btullis: Remove hadoop_clusters_secrets for labs from common.yaml [labs/private] - https://gerrit.wikimedia.org/r/1101532 (https://phabricator.wikimedia.org/T381087)
[14:15:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1049 to wikikube-worker1074 - jelto@cumin1002"
[14:16:07] <Lucas_WMDE>	 wangombe_g: are you still testing?
[14:16:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1049 to wikikube-worker1074 - jelto@cumin1002"
[14:16:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:16:11] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1074
[14:16:20] <Lucas_WMDE>	 (just want to make sure there’s no misunderstanding between us and we’re both waiting for each other ^^)
[14:16:21] <wangombe_g>	 Done. It's good.
[14:16:24] <Lucas_WMDE>	 ok, thanks!
[14:16:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, wangombe: Continuing with sync
[14:17:09] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1074
[14:17:15] <Lucas_WMDE>	 I don’t see joelyrookewmde yet so I guess abijeet’s config change will be next once the current change is done
[14:17:43] <mszabo>	 o/ sorry, i'm around just forgot to post a notice
[14:17:48] <Lucas_WMDE>	 hi!
[14:17:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1049 to wikikube-worker1074
[14:18:00] <Lucas_WMDE>	 I’m not sure we’ll have enough time for your backports though, it’s a full window :/
[14:18:11] <abijeet>	 Lucas_WMDE, ok
[14:18:14] <mszabo>	 no problem, I can self-service later in that case
[14:18:52] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1050 to wikikube-worker1075
[14:19:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[14:20:10] <wikibugs>	 (CR) Btullis: [V:+2 C:+2] Remove hadoop_clusters_secrets for labs from common.yaml [labs/private] - https://gerrit.wikimedia.org/r/1101532 (https://phabricator.wikimedia.org/T381087) (owner: Btullis)
[14:23:07] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1050 to wikikube-worker1075 - jelto@cumin1002"
[14:23:23] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1097499|Add Metrics Platform stream configuration for translate_extension (T364460)]] (duration: 17m 12s)
[14:23:26] <stashbot>	 T364460: Implement the instrumentation to track usage of MinT in the Translate extension - https://phabricator.wikimedia.org/T364460
[14:23:51] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101008 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:24:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1050 to wikikube-worker1075 - jelto@cumin1002"
[14:24:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:24:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1075
[14:24:35] <wikibugs>	 (Merged) jenkins-bot: Translate: Enable message group subscription for 6 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1101008 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:24:51] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1101008|Translate: Enable message group subscription for 6 wikis (T372386)]]
[14:24:54] <stashbot>	 T372386: Enable message group subscription feature on Wikimedia wikis - https://phabricator.wikimedia.org/T372386
[14:24:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1075
[14:25:22] <wikibugs>	 (CR) Muehlenhoff: [C:+2] maps/postgresql: Support bookworm [puppet] - https://gerrit.wikimedia.org/r/1101465 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[14:25:36] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1050 to wikikube-worker1075
[14:25:51] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1072.eqiad.wmnet wikikube-worker1073.eqiad.wmnet wikikube-worker1074.eqiad.wmnet wikikube-worker1075.eqiad.wmnet on all recursors
[14:25:54] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1072.eqiad.wmnet wikikube-worker1073.eqiad.wmnet wikikube-worker1074.eqiad.wmnet wikikube-worker1075.eqiad.wmnet on all recursors
[14:29:39] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 abi, lucaswerkmeister-wmde: Backport for [[gerrit:1101008|Translate: Enable message group subscription for 6 wikis (T372386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:29:47] <Lucas_WMDE>	 abijeet: please test :)
[14:29:51] <abijeet>	 Lucas_WMDE, on it
[14:33:13] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2089-2090].codfw.wmnet
[14:33:36] <abijeet>	 Lucas_WMDE, looks OK
[14:33:40] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 abi, lucaswerkmeister-wmde: Continuing with sync
[14:33:42] <Lucas_WMDE>	 ok!
[14:34:24] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2089-2090].codfw.wmnet
[14:34:47] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-worker[2089-2090].codfw.wmnet with reason: reimage
[14:34:53] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-worker[2089-2090].codfw.wmnet with reason: reimage
[14:35:04] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1072.eqiad.wmnet with OS bookworm
[14:35:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1073.eqiad.wmnet with OS bookworm
[14:35:26] <wikibugs>	 (PS1) Btullis: Add HTTP keytabs to hadoop masters in labs [labs/private] - https://gerrit.wikimedia.org/r/1101534 (https://phabricator.wikimedia.org/T381087)
[14:35:42] <wikibugs>	 (CR) Btullis: [V:+2 C:+2] Add HTTP keytabs to hadoop masters in labs [labs/private] - https://gerrit.wikimedia.org/r/1101534 (https://phabricator.wikimedia.org/T381087) (owner: Btullis)
[14:35:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1074.eqiad.wmnet with OS bookworm
[14:36:00] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1075.eqiad.wmnet with OS bookworm
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=upload&var-origin=swift.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[14:37:28] <godog>	 checking ^
[14:38:20] <herron>	 looks like the spike has already passed
[14:38:22] <herron>	 !incidents
[14:38:23] <sirenbot>	 5530 (UNACKED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[14:38:23] <sirenbot>	 5526 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[14:38:24] <sirenbot>	 5525 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (asw2-b-eqiad.mgmt.eqiad.wmnet)
[14:38:24] <sirenbot>	 5524 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (asw2-b-eqiad.mgmt.eqiad.wmnet)
[14:38:24] <sirenbot>	 5523 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[14:38:32] <herron>	 !ack 5530
[14:38:32] <sirenbot>	 5530 (ACKED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[14:38:42] <godog>	 indeed
[14:38:44] * Lucas_WMDE currently has a scap running ftr
[14:39:16] <icinga-wm>	 PROBLEM - BGP status on lsw1-b8-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:39:25] <godog>	 ack thank you Lucas_WMDE 
[14:39:25] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101008|Translate: Enable message group subscription for 6 wikis (T372386)]] (duration: 14m 34s)
[14:39:29] <stashbot>	 T372386: Enable message group subscription feature on Wikimedia wikis - https://phabricator.wikimedia.org/T372386
[14:39:33] <wikibugs>	 SRE, serviceops, Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10390302 (JMeybohm)
[14:39:45] <Lucas_WMDE>	 anzx: still there? (just checking ^^)
[14:39:54] <anzx>	 Lucas_WMDE: yes around 
[14:39:55] <Lucas_WMDE>	 godog: can you let me know when it’s okay to continue deploying? (holding off for now)
[14:39:59] <fabfur>	 godog: on #wikimedia-traffic `FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX`
[14:40:04] <fabfur>	 could be related ? 
[14:40:17] <Lucas_WMDE>	 anzx: okay! currently pausing deployments due to the above incidents
[14:40:28] * Lucas_WMDE reviews the gerrit change in the meantime
[14:40:31] <wikibugs>	 SRE, serviceops, Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10390304 (JMeybohm) In progress→Resolved a:JMeybohm Well, that was a pretty painful experience - thanks @Clement_Goubert for working out the p...
[14:40:46] <godog>	 Lucas_WMDE: will do
[14:40:53] <godog>	 fabfur: not sure yet tbh
[14:42:20] <wikibugs>	 (PS1) Btullis: Revert "Remove hadoop_clusters_secrets for labs from common.yaml" [labs/private] - https://gerrit.wikimedia.org/r/1101535
[14:42:27] <wikibugs>	 (CR) Btullis: [V:+2 C:+2] Revert "Remove hadoop_clusters_secrets for labs from common.yaml" [labs/private] - https://gerrit.wikimedia.org/r/1101535 (owner: Btullis)
[14:42:55] <wikibugs>	 (CR) Brouberol: Enable airflow-analytics-test access to mx server (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1101527 (https://phabricator.wikimedia.org/T377926) (owner: Stevemunene)
[14:43:16] <icinga-wm>	 RECOVERY - BGP status on lsw1-b8-codfw.mgmt is OK: BGP OK - up: 16, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:44:52] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2090.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:44:59] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2089.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:45:05] <godog>	 Lucas_WMDE: still looking if we're ok to proceed with the deployments btw
[14:45:26] <Lucas_WMDE>	 ack
[14:46:08] <godog>	 Lucas_WMDE: I think we're okay, please go ahead
[14:46:17] <Lucas_WMDE>	 ok, thanks!
[14:46:30] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.provision for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:46:35] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101459 (https://phabricator.wikimedia.org/T381080) (owner: Anzx)
[14:46:51] <jinxer-wm>	 RESOLVED: ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=upload&var-origin=swift.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[14:47:16] <icinga-wm>	 PROBLEM - BGP status on lsw1-b8-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:47:18] <wikibugs>	 (Merged) jenkins-bot: idwikivoyage: add logo, wordmark [mediawiki-config] - https://gerrit.wikimedia.org/r/1101459 (https://phabricator.wikimedia.org/T381080) (owner: Anzx)
[14:47:35] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1101459|idwikivoyage: add logo, wordmark (T381080)]]
[14:47:38] <stashbot>	 T381080: Post-creation work for idwikivoyage - https://phabricator.wikimedia.org/T381080
[14:51:10] <wikibugs>	 ops-codfw, SRE, DC-Ops: ganeti2042 seems to have a broken CPU? (new Supermicro node) - https://phabricator.wikimedia.org/T378358#10390324 (Jhancock.wm) Open→Resolved good to know. closing ticket and sending back the part.
[14:51:57] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, anzx: Backport for [[gerrit:1101459|idwikivoyage: add logo, wordmark (T381080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:52:06] <Lucas_WMDE>	 anzx: please test :)
[14:52:16] <icinga-wm>	 RECOVERY - BGP status on lsw1-b8-codfw.mgmt is OK: BGP OK - up: 16, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:52:17] <anzx>	 Lucas_WMDE: checking 
[14:52:17] <anzx>	 ok
[14:52:19] <wikibugs>	 (CR) JHathaway: [C:+2] hadoop: sort local-dirs [puppet] - https://gerrit.wikimedia.org/r/1101093 (https://phabricator.wikimedia.org/T381538) (owner: JHathaway)
[14:53:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1072.eqiad.wmnet with reason: host reimage
[14:53:22] <anzx>	 Lucas_WMDE: looks good, both skin logos
[14:53:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, anzx: Continuing with sync
[14:53:32] <Lucas_WMDE>	 thanks!
[14:53:41] <wikibugs>	 (PS1) Hnowlan: php8.1: rebuild to pick up new mercurius images. [docker-images/production-images] - https://gerrit.wikimedia.org/r/1101536 (https://phabricator.wikimedia.org/T371701)
[14:53:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1074.eqiad.wmnet with reason: host reimage
[14:54:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1075.eqiad.wmnet with reason: host reimage
[14:55:24] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729) (owner: Anzx)
[14:56:49] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1072.eqiad.wmnet with reason: host reimage
[14:56:58] <wikibugs>	 (CR) JMeybohm: [C:+1] php8.1: rebuild to pick up new mercurius images. [docker-images/production-images] - https://gerrit.wikimedia.org/r/1101536 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[14:58:04] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2089.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:58:31] <wikibugs>	 (CR) Hnowlan: [V:+2 C:+2] php8.1: rebuild to pick up new mercurius images. [docker-images/production-images] - https://gerrit.wikimedia.org/r/1101536 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[14:58:51] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2089.codfw.wmnet with OS bookworm
[14:59:02] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2089
[14:59:02] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2089
[14:59:19] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101459|idwikivoyage: add logo, wordmark (T381080)]] (duration: 11m 44s)
[14:59:23] <stashbot>	 T381080: Post-creation work for idwikivoyage - https://phabricator.wikimedia.org/T381080
[14:59:33] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2090.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[15:00:01] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[15:00:08] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2090.codfw.wmnet with OS bookworm
[15:00:18] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2090
[15:00:19] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2090
[15:00:20] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1074.eqiad.wmnet with reason: host reimage
[15:00:59] <wikibugs>	 (PS6) Anzx: jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton [mediawiki-config] - https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729)
[15:01:46] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[15:01:46] <wikibugs>	 SRE, Traffic: Occasional saturation of asw2-b-eqiad / cr port uplink and cache upload usage - https://phabricator.wikimedia.org/T381771#10390368 (Fabfur) Contacted WME SRE that kindly agreed to lower current requests parallelism and check for results
[15:01:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:53] <Lucas_WMDE>	 I don’t have time to continue deploying, sorry
[15:02:06] <Lucas_WMDE>	 anzx: please reschedule the remaining changes at your convenience
[15:02:17] <icinga-wm>	 PROBLEM - BGP status on lsw1-b8-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:02:25] <anzx>	 Lucas_WMDE: ok, thanks 
[15:02:43] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729) (owner: Anzx)
[15:02:44] <mszabo>	 jouncebot: now
[15:02:45] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 27 minute(s)
[15:02:45] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] "LGTM – previous deployments suggest no special steps are needed for changing these settings – but ran out of time to deploy today" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080) (owner: Anzx)
[15:04:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1075.eqiad.wmnet with reason: host reimage
[15:04:25] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080) (owner: Anzx)
[15:04:28] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:05:21] <wikibugs>	 (CR) Anzx: jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729) (owner: Anzx)
[15:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:08:07] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mszabo@deploy2002 using scap backport" [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101069 (https://phabricator.wikimedia.org/T381529) (owner: Máté Szabó)
[15:08:08] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mszabo@deploy2002 using scap backport" [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101070 (https://phabricator.wikimedia.org/T381530) (owner: Máté Szabó)
[15:08:08] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mszabo@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100101 (owner: Máté Szabó)
[15:08:57] <wikibugs>	 (Merged) jenkins-bot: Prep IRS config for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1100101 (owner: Máté Szabó)
[15:15:28] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1072.eqiad.wmnet with OS bookworm
[15:18:45] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
[15:18:46] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1074.eqiad.wmnet with OS bookworm
[15:20:11] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2090.codfw.wmnet with reason: host reimage
[15:20:13] <wikibugs>	 (Merged) jenkins-bot: dialog: Fix wrong title on Types of unacceptable behavior step [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101069 (https://phabricator.wikimedia.org/T381529) (owner: Máté Szabó)
[15:20:14] <wikibugs>	 (Merged) jenkins-bot: dialog: Fix spacing between buttons in the dialog footer [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - https://gerrit.wikimedia.org/r/1101070 (https://phabricator.wikimedia.org/T381530) (owner: Máté Szabó)
[15:20:36] <logmsgbot>	 !log mszabo@deploy2002 Started scap sync-world: Backport for [[gerrit:1101069|dialog: Fix wrong title on Types of unacceptable behavior step (T381529)]], [[gerrit:1101070|dialog: Fix spacing between buttons in the dialog footer (T381530)]], [[gerrit:1100101|Prep IRS config for testwiki]]
[15:20:41] <stashbot>	 T381529: Wrong title on Types of unacceptable behavior step - https://phabricator.wikimedia.org/T381529
[15:20:41] <stashbot>	 T381530: Missing spacing between buttons - https://phabricator.wikimedia.org/T381530
[15:21:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1075.eqiad.wmnet with OS bookworm
[15:22:11] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
[15:24:44] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381720#10390469 (phaultfinder)
[15:25:06] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2090.codfw.wmnet with reason: host reimage
[15:25:06] <logmsgbot>	 !log mszabo@deploy2002 mszabo: Backport for [[gerrit:1101069|dialog: Fix wrong title on Types of unacceptable behavior step (T381529)]], [[gerrit:1101070|dialog: Fix spacing between buttons in the dialog footer (T381530)]], [[gerrit:1100101|Prep IRS config for testwiki]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:28:49] <logmsgbot>	 !log mszabo@deploy2002 mszabo: Continuing with sync
[15:28:59] <hnowlan>	 jouncebot: nowandnext
[15:28:59] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 1 minute(s)
[15:28:59] <jouncebot>	 In 1 hour(s) and 1 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1630)
[15:31:09] <mszabo>	 IRS will need a followup config change (these changes are good as they are but do not actually enable the extension on testwiki...)
[15:32:22] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on wikikube-worker2091 - https://phabricator.wikimedia.org/T381747#10390489 (Jhancock.wm) Open→Resolved a:Jhancock.wm T 358489 - probably a false error from this. server is fine right now. logged into idrac and all disks are active.
[15:33:39] <Emperor>	 !log depool/restart swift/repool ms-fe1010
[15:33:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:57] <wikibugs>	 (PS1) Máté Szabó: Actually load IRS in production [mediawiki-config] - https://gerrit.wikimedia.org/r/1101541
[15:34:15] <logmsgbot>	 !log mszabo@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101069|dialog: Fix wrong title on Types of unacceptable behavior step (T381529)]], [[gerrit:1101070|dialog: Fix spacing between buttons in the dialog footer (T381530)]], [[gerrit:1100101|Prep IRS config for testwiki]] (duration: 13m 39s)
[15:34:20] <stashbot>	 T381529: Wrong title on Types of unacceptable behavior step - https://phabricator.wikimedia.org/T381529
[15:34:21] <stashbot>	 T381530: Missing spacing between buttons - https://phabricator.wikimedia.org/T381530
[15:34:37] <Emperor>	 !log depool/restart swift/repool ms-fe1012
[15:34:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:07] <wikibugs>	 (PS2) Slyngshede: Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075)
[15:38:58] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Move ganeti test cluster to UEFI - https://phabricator.wikimedia.org/T381780 (MoritzMuehlenhoff) NEW
[15:39:09] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Move ganeti test cluster to UEFI - https://phabricator.wikimedia.org/T381780#10390513 (MoritzMuehlenhoff) p:Triage→Medium
[15:41:47] <hnowlan>	 I'm going to do a scap sync to rebuild images to pick up new php 8.1 base images
[15:43:01] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2089.codfw.wmnet with OS bookworm
[15:44:25] <icinga-wm>	 RECOVERY - BGP status on lsw1-b8-codfw.mgmt is OK: BGP OK - up: 16, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:44:31] <logmsgbot>	 !log hnowlan@deploy2002 Started scap sync-world: Rebuild and deploy to pick up new php8.1 base
[15:45:08] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2090.codfw.wmnet with OS bookworm
[15:55:08] <wikibugs>	 ops-codfw, SRE, cloud-services-team, DC-Ops: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10390547 (Jhancock.wm) @Andrew  we wanna swap the power supplies. It looks like all three happened on PSU2. We need to shut it off to s...
[15:55:27] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1073.eqiad.wmnet with OS bookworm
[15:56:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1073.eqiad.wmnet with OS bookworm
[16:00:20] <wikibugs>	 (PS1) Samtar: IS/IS-l: wgUseCodexSpecialBlock for beta, prod test.wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1101545 (https://phabricator.wikimedia.org/T377121)
[16:04:27] <wikibugs>	 (PS1) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1101547
[16:05:10] <logmsgbot>	 !log hnowlan@deploy2002 Finished scap sync-world: Rebuild and deploy to pick up new php8.1 base (duration: 23m 00s)
[16:05:29] <wikibugs>	 (PS2) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1101547
[16:05:31] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101547 (owner: CDanis)
[16:06:25] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2089-2090].codfw.wmnet
[16:06:27] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2089-2090].codfw.wmnet
[16:07:19] <wikibugs>	 (CR) Kosta Harlan: [C:+1] Actually load IRS in production [mediawiki-config] - https://gerrit.wikimedia.org/r/1101541 (owner: Máté Szabó)
[16:07:55] <wikibugs>	 (CR) CI reject: [V:-1] WIP [puppet] - https://gerrit.wikimedia.org/r/1101547 (owner: CDanis)
[16:12:33] <moritzm>	 !log rebalance Ganeti cluster in codfw/B following server refresh T376594
[16:12:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:37] <stashbot>	 T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594
[16:14:29] <wikibugs>	 (PS3) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1101547
[16:15:50] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101547 (owner: CDanis)
[16:16:53] <wikibugs>	 (CR) CI reject: [V:-1] WIP [puppet] - https://gerrit.wikimedia.org/r/1101547 (owner: CDanis)
[16:17:33] <wikibugs>	 (PS4) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771)
[16:17:39] <wikibugs>	 SRE, collaboration-services, Wikimedia-Mailing-lists: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045#10390615 (Dzahn) We just installed package version 3.3.8-2~deb12u2 and this should be fixed now. Please let us know how it looks.
[16:18:00] <wikibugs>	 (PS5) CDanis: Skip cache on WME upload.wm.o HEAD reqs [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771)
[16:18:25] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[16:22:26] <hnowlan>	 jouncebot: nowandnext
[16:22:26] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 7 minute(s)
[16:22:26] <jouncebot>	 In 0 hour(s) and 7 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1630)
[16:24:12] <hnowlan>	 I'm going to do another sync-world to rebuild the 8.1 images to pick something up that was missed last time
[16:26:37] <logmsgbot>	 !log hnowlan@deploy2002 Started scap sync-world: Rebuild and deploy to pick up new php8.1 base
[16:28:00] <wikibugs>	 (CR) Fabfur: [C:+1] "I would say it's ok, I'd prefer someone else have a look anyway" [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[16:29:29] <wikibugs>	 (PS6) CDanis: Skip cache on WME upload.wm.o HEAD reqs [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771)
[16:30:05] <jouncebot>	 jan_drewniak: I, the Bot under the Fountain, call upon thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1630).
[16:34:45] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381720#10390671 (phaultfinder)
[16:39:25] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - cxserver_4002: Servers kubernetes2056.codfw.wmnet, wikikube-worker2063.codfw.wmnet, mw2338.codfw.wmnet, mw2370.codfw.wmnet, wikikube-worker2155.codfw.wmnet, wikikube-worker2076.codfw.wmnet, wikikube-worker2071.codfw.wmnet, wikikube-worker2022.codfw.wmnet, wikikube-worker2157.codfw.wmnet, wikikube-worker2139.codfw.wmnet, wikikube-worker2058.codfw.wmne
[16:39:25] <icinga-wm>	 ube-worker2065.codfw.wmnet, wikikube-worker2055.codfw.wmnet, kubernetes2039.codfw.wmnet, wikikube-worker2062.codfw.wmnet, wikikube-worker2045.codfw.wmnet, kubernetes2022.codfw.wmnet, mw2419.codfw.wmnet, wikikube-worker2014.codfw.wmnet, wikikube-worker2156.codfw.wmnet, wikikube-worker2133.codfw.wmnet, wikikube-worker2127.codfw.wmnet, wikikube-worker2087.codfw.wmnet, wikikube-worker2013.codfw.wmnet, wikikube-worker2106.codfw.wmnet, mw2372.c
[16:39:25] <icinga-wm>	 et, wikikube-worker2104.codfw.wmnet, wikikube-worker2146.codfw.wmnet, wikikube-worker2035.codfw.wmnet, wikikube-worker2024.codfw.wmnet, kubernetes2017.codfw.wmnet, wikikube-worker2112.c https://wikitech.wikimedia.org/wiki/PyBal
[16:40:25] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:40:29] <wikibugs>	 (CR) BBlack: [C:+1] "SGTM! Nice catch!" [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[16:41:01] <wikibugs>	 (CR) CDanis: [C:+2] Skip cache on WME upload.wm.o HEAD reqs [puppet] - https://gerrit.wikimedia.org/r/1101547 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[16:43:56] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101497 (owner: Muehlenhoff)
[16:45:04] <wikibugs>	 (CR) Klausman: [C:+1] amd-pytorch25: add torch 2.5.1 + ROCm 6.1 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1101524 (owner: Ilias Sarantopoulos)
[16:46:58] <wikibugs>	 SRE, Traffic, Patch-For-Review: Occasional saturation of asw2-b-eqiad / cr port uplink and cache upload usage - https://phabricator.wikimedia.org/T381771#10390692 (Fabfur) Adding a comment to not forget:  - Investigate why (if) Varnish performs GET for each HEAD request, and if this is the rationale...
[16:47:08] <logmsgbot>	 !log hnowlan@deploy2002 Finished scap sync-world: Rebuild and deploy to pick up new php8.1 base (duration: 21m 09s)
[16:48:08] <wikibugs>	 (PS1) Herron: thanos: add bool_gauge recording rules for search/wdqs update lag slos [puppet] - https://gerrit.wikimedia.org/r/1101558 (https://phabricator.wikimedia.org/T302995)
[16:57:22] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on wikikube-worker2106 - https://phabricator.wikimedia.org/T381765#10390723 (Jhancock.wm) Open→Resolved a:Jhancock.wm T 358489 - probably a false error from this. server is fine right now. logged into idrac and all disks are active.
[16:57:36] <wikibugs>	 (PS1) Herron: pyrra: onboard wdqs/serach update lag slos [puppet] - https://gerrit.wikimedia.org/r/1101560 (https://phabricator.wikimedia.org/T302995)
[16:58:16] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[16:58:57] <wikibugs>	 (PS1) CDanis: Skip cache on all WME upload.wm.o reqs [puppet] - https://gerrit.wikimedia.org/r/1101561 (https://phabricator.wikimedia.org/T381771)
[16:59:10] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[17:09:03] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:09:53] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:11:47] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53071 bytes in 2.672 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:11:53] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.193 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:12:04] <wikibugs>	 (CR) BBlack: [C:+1] Skip cache on all WME upload.wm.o reqs [puppet] - https://gerrit.wikimedia.org/r/1101561 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[17:14:52] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
[17:15:08] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, and 4 others: Q2:rack/setup/install wdqs102[567] - https://phabricator.wikimedia.org/T378030#10390785 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs1025.eqiad.wmnet with OS bullseye
[17:15:15] <cdanis>	 !log 💙cdanis@cumin1002.eqiad.wmnet ~ 🕛☕ sudo cumin 'A:cp'  'disable-puppet "cdanis testing in production I464702d8fb T381771"'
[17:15:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:18] <stashbot>	 T381771: Occasional saturation of asw2-b-eqiad / cr port uplink and cache upload usage - https://phabricator.wikimedia.org/T381771
[17:15:47] <wikibugs>	 (CR) CDanis: [C:+2] Skip cache on all WME upload.wm.o reqs [puppet] - https://gerrit.wikimedia.org/r/1101561 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[17:16:21] <logmsgbot>	 !log jelto@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1073.eqiad.wmnet with OS bookworm
[17:18:26] <cdanis>	 !log T381771 💙cdanis@cp1107.eqiad.wmnet ~ 🕧☕ sudo run-puppet-agent --force
[17:18:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:43] <wikibugs>	 ops-codfw, DC-Ops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788 (Jhancock.wm) NEW
[17:18:59] <wikibugs>	 ops-eqiad, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Comm Error: backplane 0 when reimaging wikikube-worker1073 - https://phabricator.wikimedia.org/T381789 (Jelto) NEW
[17:19:42] <wikibugs>	 ops-eqiad, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Comm Error: backplane 0 when reimaging wikikube-worker1073 - https://phabricator.wikimedia.org/T381789#10390817 (Jelto) The following commands have to be executed when the host is back (just noting it down so I don't forget it):  ` c...
[17:19:44] <wikibugs>	 ops-codfw, DC-Ops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788#10390819 (Jhancock.wm) @bking I believe this is part of your team. But please correct me if I'm wrong. Is it possible to move this server? the down time would only be for a few minutes. If yes, when wo...
[17:20:11] <jelto>	 !log homer 'lsw1-e3-eqiad*' commit 'T377876'
[17:20:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:15] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[17:22:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1072,1074-1075].eqiad.wmnet
[17:22:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1072,1074-1075].eqiad.wmnet
[17:23:14] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504#10390832 (Jelto)
[17:24:57] <wikibugs>	 (PS7) Hnowlan: mediawiki: add multi-job support to mercurius [deployment-charts] - https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701)
[17:25:47] <wikibugs>	 (CR) Fabfur: [C:+1] Skip cache on all WME upload.wm.o reqs [puppet] - https://gerrit.wikimedia.org/r/1101561 (https://phabricator.wikimedia.org/T381771) (owner: CDanis)
[17:25:56] <wikibugs>	 (CR) CI reject: [V:-1] mediawiki: add multi-job support to mercurius [deployment-charts] - https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:26:45] <jinxer-wm>	 FIRING: KubernetesDeploymentUnavailableReplicas: ...
[17:26:46] <jinxer-wm>	 Deployment cxserver-production in cxserver at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=cxserver&var-deployment=cxserver-production - ...
[17:26:46] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[17:30:42] <wikibugs>	 (CR) Klausman: [V:+2 C:+2] amd-pytorch25: add torch 2.5.1 + ROCm 6.1 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1101524 (owner: Ilias Sarantopoulos)
[17:35:31] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[17:36:05] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[17:38:09] <wikibugs>	 (CR) Dzahn: [C:+2] phabricator: switch firewall provider to nftables [puppet] - https://gerrit.wikimedia.org/r/1055493 (https://phabricator.wikimedia.org/T370677) (owner: Dzahn)
[17:41:45] <jinxer-wm>	 RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[17:41:46] <jinxer-wm>	 Deployment cxserver-production in cxserver at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=cxserver&var-deployment=cxserver-production - ...
[17:41:46] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[17:43:57] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1025.eqiad.wmnet with reason: host reimage
[17:44:13] <cdanis>	 !log 💙cdanis@cumin1002.eqiad.wmnet ~ 🕧☕ sudo cumin 'A:cp'  'enable-puppet "cdanis testing in production I464702d8fb T381771"'
[17:44:15] <wikibugs>	 (PS1) Hnowlan: jobqueue: disable webvideotranscodeprioritized [deployment-charts] - https://gerrit.wikimedia.org/r/1101565 (https://phabricator.wikimedia.org/T371701)
[17:44:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:17] <stashbot>	 T381771: Occasional saturation of asw2-b-eqiad / cr port uplink and cache upload usage - https://phabricator.wikimedia.org/T381771
[17:44:24] <wikibugs>	 (CR) Dzahn: [C:+2] "Aware of the need to reboot, planning that for tomorrow during the window where we sometimes do phab deployments." [puppet] - https://gerrit.wikimedia.org/r/1055493 (https://phabricator.wikimedia.org/T370677) (owner: Dzahn)
[17:44:57] <wikibugs>	 (CR) Scott French: [C:+1] jobqueue: disable webvideotranscodeprioritized [deployment-charts] - https://gerrit.wikimedia.org/r/1101565 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:47:02] <wikibugs>	 (CR) Hnowlan: [C:+2] jobqueue: disable webvideotranscodeprioritized [deployment-charts] - https://gerrit.wikimedia.org/r/1101565 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:47:49] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1025.eqiad.wmnet with reason: host reimage
[17:48:18] <wikibugs>	 (Merged) jenkins-bot: jobqueue: disable webvideotranscodeprioritized [deployment-charts] - https://gerrit.wikimedia.org/r/1101565 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[17:51:06] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
[17:52:13] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
[17:58:28] <wikibugs>	 (PS1) Herron: thanos: add bool_gauge recording rules for search/wdqs update lag slos [puppet] - https://gerrit.wikimedia.org/r/1101558 (https://phabricator.wikimedia.org/T302995)
[17:58:28] <wikibugs>	 (CR) Herron: [C:+2] "self merging for slo onboarding" [puppet] - https://gerrit.wikimedia.org/r/1101558 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[18:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1800)
[18:00:04] <jouncebot>	 ryankemper: I, the Bot under the Fountain, call upon thee, The Deployer, to do Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T1800).
[18:05:05] <wikibugs>	 (PS1) Herron: pyrra: onboard wdqs/serach update lag slos [puppet] - https://gerrit.wikimedia.org/r/1101560 (https://phabricator.wikimedia.org/T302995)
[18:05:05] <wikibugs>	 (CR) Herron: [C:+2] "self merge for onboarding" [puppet] - https://gerrit.wikimedia.org/r/1101560 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[18:06:07] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1025.eqiad.wmnet with OS bullseye
[18:06:26] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, and 4 others: Q2:rack/setup/install wdqs102[567] - https://phabricator.wikimedia.org/T378030#10391035 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs1025.eqiad.wmnet with OS bullseye completed: - wdqs...
[18:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:12:20] <wikibugs>	 (PS1) Herron: add onboarded notes to wdqs/search update lag slos [grafana-grizzly] - https://gerrit.wikimedia.org/r/1101567
[18:12:47] <wikibugs>	 (CR) Herron: [V:+2 C:+2] add onboarded notes to wdqs/search update lag slos [grafana-grizzly] - https://gerrit.wikimedia.org/r/1101567 (owner: Herron)
[18:14:25] <wikibugs>	 SRE, Traffic: Survey the third-party library market for UA policy compliance - https://phabricator.wikimedia.org/T313634#10391067 (Scott_French) Tagging this as #Traffic for consideration as it's likely a better fit than #SRE as a whole (though I realize @CDanis may have interest in picking this back up).
[18:16:39] <icinga-wm>	 PROBLEM - Disk space on build2001 is CRITICAL: DISK CRITICAL - free space: / 10413 MB (4% inode=79%): /tmp 10413 MB (4% inode=79%): /var/tmp 10413 MB (4% inode=79%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=build2001&var-datasource=codfw+prometheus/ops
[18:17:38] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[18:17:44] <logmsgbot>	 !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
[18:22:27] <wikibugs>	 (CR) Ottomata: mediawiki.org/beacon/event/index.php - use EventLoggingLegacyConverter::submitEvent (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817) (owner: Ottomata)
[18:29:20] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, and 4 others: Q2:rack/setup/install wdqs102[567] - https://phabricator.wikimedia.org/T378030#10391170 (bking) It took a few tries, but `wdqs1025` is now running off UEFI. I left some notes [[ https://wikitech.wikimedia.org/wiki/Talk:UEFI_Boot#Results_...
[18:45:20] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100158 (https://phabricator.wikimedia.org/T377128) (owner: Ebernhardson)
[18:47:39] <wikibugs>	 (PS1) Jdlrobson: Expand support for dark mode for anonymous users (itwiki, enwikivoyage) [mediawiki-config] - https://gerrit.wikimedia.org/r/1101573 (https://phabricator.wikimedia.org/T379352)
[18:47:39] <wikibugs>	 (CR) Jly: [C:+1] Fix protocol for .well-known/change-password Apache rule [puppet] - https://gerrit.wikimedia.org/r/1101462 (https://phabricator.wikimedia.org/T381625) (owner: Gergő Tisza)
[18:47:41] <wikibugs>	 (PS1) Jdlrobson: Disable QuickSurveys for recommendations [mediawiki-config] - https://gerrit.wikimedia.org/r/1101574 (https://phabricator.wikimedia.org/T379241)
[18:47:54] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101573 (https://phabricator.wikimedia.org/T379352) (owner: Jdlrobson)
[18:48:05] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101574 (https://phabricator.wikimedia.org/T379241) (owner: Jdlrobson)
[18:55:09] <wikibugs>	 (PS1) Eevans: aqs1010: canary Cassandra 4.1.7 [puppet] - https://gerrit.wikimedia.org/r/1101576 (https://phabricator.wikimedia.org/T380420)
[18:57:25] <wikibugs>	 (PS1) Arlolra: Add Atieno's public key [mediawiki-config] - https://gerrit.wikimedia.org/r/1101577
[18:57:52] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101576 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[19:01:53] <wikibugs>	 (PS2) Eevans: aqs1010: canary Cassandra 4.1.7 [puppet] - https://gerrit.wikimedia.org/r/1101576 (https://phabricator.wikimedia.org/T380420)
[19:03:20] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1101576 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[19:04:28] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:06:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381720#10391297 (VRiley-WMF) Open→Resolved a:VRiley-WMF Rebalanced Power
[19:22:43] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.11.30 - 2024.12.20): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10391331 (bking) @elukey I'm fine with focusing our efforts on UEFI, it seems like the best use of our time.  Ping me in...
[19:35:13] <wikibugs>	 (PS1) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584
[19:36:01] <wikibugs>	 (PS2) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584
[19:37:13] <wikibugs>	 (CR) CI reject: [V:-1] WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (owner: FNegri)
[19:41:37] <wikibugs>	 (CR) Wangombe: Add Metrics Platform stream configuration for translate_extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[19:45:07] <wikibugs>	 (CR) AOkoth: "- The miscweb chart was chosen primarily just to reduce the "blast radius" of this change. We did not want to a change that might affect o" [deployment-charts] - https://gerrit.wikimedia.org/r/1098486 (https://phabricator.wikimedia.org/T350794) (owner: AOkoth)
[19:54:13] <wikibugs>	 (CR) Eevans: [C:+2] aqs1010: canary Cassandra 4.1.7 [puppet] - https://gerrit.wikimedia.org/r/1101576 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[19:58:30] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 09 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101541 (owner: Máté Szabó)
[19:58:49] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Upgrading to Cassandra 4.1.7 — T380420 - eevans@cumin1002
[19:58:53] <stashbot>	 T380420: Upgrade Cassandra clusters to v4.1.7 - https://phabricator.wikimedia.org/T380420
[20:00:48] <wikibugs>	 (PS2) CDanis: Actually load IRS in production [mediawiki-config] - https://gerrit.wikimedia.org/r/1101541 (https://phabricator.wikimedia.org/T374105) (owner: Máté Szabó)
[20:07:27] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Upgrading to Cassandra 4.1.7 — T380420 - eevans@cumin1002
[20:07:30] <stashbot>	 T380420: Upgrade Cassandra clusters to v4.1.7 - https://phabricator.wikimedia.org/T380420
[20:17:01] <wikibugs>	 (PS1) Bking: wdqs1025: enable as wdqs-internal-main host [puppet] - https://gerrit.wikimedia.org/r/1101588 (https://phabricator.wikimedia.org/T376150)
[20:18:29] <wikibugs>	 (PS3) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807)
[20:19:45] <wikibugs>	 (CR) CI reject: [V:-1] WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807) (owner: FNegri)
[20:21:12] <wikibugs>	 (PS4) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807)
[20:22:26] <wikibugs>	 (CR) CI reject: [V:-1] WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807) (owner: FNegri)
[20:23:36] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@1d9b4b5]: Canary events generation: pooling
[20:24:25] <wikibugs>	 (PS5) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807)
[20:25:23] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@1d9b4b5]: Canary events generation: pooling (duration: 01m 46s)
[20:33:03] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmde, ldap/nda for SuzanneWood-WMDE - https://phabricator.wikimedia.org/T380487#10391531 (KFrancis) Hi all, I am confirming the NDA is complete.  Please proceed with next steps.  Thanks!
[20:33:20] <wikibugs>	 (PS6) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807)
[20:35:03] <wikibugs>	 (PS7) FNegri: WMCS: fix expr in TooManyCloud*Down [alerts] - https://gerrit.wikimedia.org/r/1101584 (https://phabricator.wikimedia.org/T381807)
[20:46:09] <wikibugs>	 (PS8) Bking: dse-k8s-services: introduce Blunderbuss config [deployment-charts] - https://gerrit.wikimedia.org/r/1091827 (https://phabricator.wikimedia.org/T371994)
[20:51:57] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmde, ldap/nda for SuzanneWood-WMDE - https://phabricator.wikimedia.org/T380487#10391566 (Scott_French) Stalled→In progress a:Scott_French Great, thank you! I'll take this from here.
[20:58:56] <wikibugs>	 (PS9) Bking: dse-k8s-services: introduce Blunderbuss config [deployment-charts] - https://gerrit.wikimedia.org/r/1091827 (https://phabricator.wikimedia.org/T371994)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T2100). Please do the needful.
[21:00:05] <jouncebot>	 anzx, ebernhardson, Jdlrobson, and kostajh: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:09] <ebernhardson>	 \o
[21:01:35] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T381742#10391603 (Eevans) There is History™ here, see: {T362841}.  The original drive that failed then was `/dev/sdg` (disk:2 of the second controller).  Another disk was pulled in the process `sdf` (disk:1, second c...
[21:02:09] <Jdlrobson>	 o/
[21:03:54] <kostajh>	 hi
[21:05:40] <kostajh>	 I'd prefer not to be the deployer, as it's late here
[21:06:05] <cjming>	 hi - sorry to be late - i'll deploy
[21:06:22] <kostajh>	 thanks cjming!
[21:06:35] <cjming>	 np!
[21:06:51] <cjming>	 anzx: are you around?
[21:07:08] <kostajh>	 cjming: any chance we could start with mine, if it's the same to the others?
[21:07:15] <cjming>	 sure!
[21:07:19] <Jdlrobson>	 thanks cjming for running it today!
[21:07:57] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1101541 (https://phabricator.wikimedia.org/T374105) (owner: Máté Szabó)
[21:08:39] <wikibugs>	 (Merged) jenkins-bot: Actually load IRS in production [mediawiki-config] - https://gerrit.wikimedia.org/r/1101541 (https://phabricator.wikimedia.org/T374105) (owner: Máté Szabó)
[21:08:57] <kostajh>	 thx
[21:08:58] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101541|Actually load IRS in production (T374105)]]
[21:09:02] <stashbot>	 T374105: Incident Reporting System - MVP - https://phabricator.wikimedia.org/T374105
[21:12:51] <cjming>	 kostajh: up on test servers if it's testable
[21:13:20] <kostajh>	 cjming: thanks, looking
[21:13:31] <logmsgbot>	 !log cjming@deploy2002 cjming, mszabo: Backport for [[gerrit:1101541|Actually load IRS in production (T374105)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:14:45] <kostajh>	 cjming: lgtm
[21:14:50] <cjming>	 cool - syncing
[21:14:52] <logmsgbot>	 !log cjming@deploy2002 cjming, mszabo: Continuing with sync
[21:17:04] <cjming>	 ebernhardson: i'll do yours next - can you rebase?
[21:17:25] <ebernhardson>	 cjming: sure, sec
[21:17:44] <wikibugs>	 (PS4) Ebernhardson: cirrus: Enable mlr-2024 for select wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1100158 (https://phabricator.wikimedia.org/T377128)
[21:21:27] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101541|Actually load IRS in production (T374105)]] (duration: 12m 29s)
[21:21:31] <stashbot>	 T374105: Incident Reporting System - MVP - https://phabricator.wikimedia.org/T374105
[21:21:56] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100158 (https://phabricator.wikimedia.org/T377128) (owner: Ebernhardson)
[21:22:05] <cjming>	 kostajh: should be live :)
[21:22:10] <kostajh>	 cjming: thanks!
[21:22:17] <cjming>	 yw!
[21:22:42] <wikibugs>	 (Merged) jenkins-bot: cirrus: Enable mlr-2024 for select wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1100158 (https://phabricator.wikimedia.org/T377128) (owner: Ebernhardson)
[21:22:56] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1100158|cirrus: Enable mlr-2024 for select wikis (T377128)]]
[21:23:00] <stashbot>	 T377128: Import recent MLR models built by MjoLniR in production and test them - https://phabricator.wikimedia.org/T377128
[21:25:58] <wikibugs>	 (PS1) Scott French: admin: Add suzannewood to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/1101591 (https://phabricator.wikimedia.org/T380487)
[21:25:58] <wikibugs>	 (CR) Scott French: "Thanks in advance, Reuven!" [puppet] - https://gerrit.wikimedia.org/r/1101591 (https://phabricator.wikimedia.org/T380487) (owner: Scott French)
[21:26:34] <cjming>	 ebernhardson: on mwdebug if verifiable
[21:26:51] <ebernhardson>	 cjming: sorta, looking
[21:27:13] <wikibugs>	 (03PS2) 10Jdlrobson: Expand support for dark mode for anonymous users (itwiki, enwikivoyage) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101573 (https://phabricator.wikimedia.org/T379352)
[21:27:15] <logmsgbot>	 !log cjming@deploy2002 cjming, ebernhardson: Backport for [[gerrit:1100158|cirrus: Enable mlr-2024 for select wikis (T377128)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:27:28] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[21:27:32] <ebernhardson>	 cjming: seems reasonable
[21:27:40] <cjming>	 great
[21:27:44] <logmsgbot>	 !log cjming@deploy2002 cjming, ebernhardson: Continuing with sync
[21:28:57] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[21:29:54] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[21:31:33] <wikibugs>	 (03CR) 10RLazarus: [C:03+1] admin: Add suzannewood to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/1101591 (https://phabricator.wikimedia.org/T380487) (owner: 10Scott French)
[21:32:57] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[21:33:25] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100158|cirrus: Enable mlr-2024 for select wikis (T377128)]] (duration: 10m 28s)
[21:33:28] <anzx>	 cjming: o/
[21:33:29] <stashbot>	 T377128: Import recent MLR models built by MjoLniR in production and test them - https://phabricator.wikimedia.org/T377128
[21:33:54] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101573 (https://phabricator.wikimedia.org/T379352) (owner: 10Jdlrobson)
[21:34:04] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[21:34:09] <wikibugs>	 (03CR) 10Scott French: [C:03+2] admin: Add suzannewood to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/1101591 (https://phabricator.wikimedia.org/T380487) (owner: 10Scott French)
[21:34:22] <cjming>	 hi anzx: just finishing up Jdlrobson's patches and we can do yours - maybe in 10-15 minutes?
[21:34:33] <anzx>	 sure 
[21:34:37] <wikibugs>	 (03Merged) 10jenkins-bot: Expand support for dark mode for anonymous users (itwiki, enwikivoyage) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101573 (https://phabricator.wikimedia.org/T379352) (owner: 10Jdlrobson)
[21:34:45] <cjming>	 ebernhardson: your patch should be live :)
[21:34:51] <ebernhardson>	 cjming: awesome, thanks!
[21:34:54] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101573|Expand support for dark mode for anonymous users (itwiki, enwikivoyage) (T379352)]]
[21:34:57] <stashbot>	 T379352: [Spike] Evaluate and provide feedback on itwiki automatic night mode color-darkening - https://phabricator.wikimedia.org/T379352
[21:38:09] <cjming>	 Jdlrobson: 1st patch up on test servers if you want to check
[21:38:44] <logmsgbot>	 !log cjming@deploy2002 jdlrobson, cjming: Backport for [[gerrit:1101573|Expand support for dark mode for anonymous users (itwiki, enwikivoyage) (T379352)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:39:04] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[21:39:13] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[21:39:54] <Jdlrobson>	 cjming: on it
[21:40:18] <Jdlrobson>	 cjming: that one looks good to sync!
[21:40:24] <logmsgbot>	 !log cjming@deploy2002 jdlrobson, cjming: Continuing with sync
[21:40:51] <wikibugs>	 (03PS2) 10Jdlrobson: Disable QuickSurveys for recommendations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101574 (https://phabricator.wikimedia.org/T379241)
[21:41:28] <cjming>	 Jdlrobson: can you rebase your 2nd patch?
[21:41:34] <Jdlrobson>	 cjming: done
[21:41:39] <cjming>	 ty
[21:41:55] <Jdlrobson>	 cjming: I also have a beta cluster patch https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1101094?usp=search that I need to get merged (I don't know if that's a case of you just hitting +2...?). If it's more than that, I can get someone to merge it after the deploy window is done.
[21:42:09] <Jdlrobson>	 (typo fix)
[21:42:29] <cjming>	 np - i can do that real quick
[21:44:03] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[21:46:02] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101573|Expand support for dark mode for anonymous users (itwiki, enwikivoyage) (T379352)]] (duration: 11m 08s)
[21:46:09] <stashbot>	 T379352: [Spike] Evaluate and provide feedback on itwiki automatic night mode color-darkening - https://phabricator.wikimedia.org/T379352
[21:46:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101574 (https://phabricator.wikimedia.org/T379241) (owner: 10Jdlrobson)
[21:46:29] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[21:46:53] <wikibugs>	 (03Merged) 10jenkins-bot: Disable QuickSurveys for recommendations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101574 (https://phabricator.wikimedia.org/T379241) (owner: 10Jdlrobson)
[21:46:54] <Jdlrobson>	 thanks cjming i really appreciate it!
[21:47:06] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101574|Disable QuickSurveys for recommendations (T379241 T380778)]]
[21:47:16] <stashbot>	 T379241: Set up quicksurveys for non-UI experiment pt 2 - https://phabricator.wikimedia.org/T379241
[21:47:17] <stashbot>	 T380778: Simple summary experiment - Rerun QuickSurvey for browser extension - https://phabricator.wikimedia.org/T380778
[21:47:32] <_Gerges>	 Hi
[21:49:17] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Grant Access to ldap/wmde, ldap/nda for SuzanneWood-WMDE - https://phabricator.wikimedia.org/T380487#10391918 (10Scott_French) 05In progress→03Resolved This should all be done now. I'll follow up on T380994 for the next part.
[21:49:42] <cjming>	 Jdlrobson: happy to help - it's all because of your encouragement that i'm even part of the regular deployment roster 😀
[21:50:29] <cjming>	 2nd patch on test servers
[21:51:08] <_Gerges>	 If a deployer has time, could you deploy my patch? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UniversalLanguageSelector/+/1101592
[21:51:09] <wikibugs>	 (03PS2) 10Jdlrobson: Fixes A/B test for beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101094 (https://phabricator.wikimedia.org/T378115)
[21:51:11] <logmsgbot>	 !log cjming@deploy2002 cjming, jdlrobson: Backport for [[gerrit:1101574|Disable QuickSurveys for recommendations (T379241 T380778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:51:35] <Jdlrobson>	 cjming: that one also looks good to sync! thanks!
[21:51:40] <cjming>	 cool beans
[21:51:41] <logmsgbot>	 !log cjming@deploy2002 cjming, jdlrobson: Continuing with sync
[21:51:48] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[21:52:16] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[21:57:22] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101574|Disable QuickSurveys for recommendations (T379241 T380778)]] (duration: 10m 15s)
[21:57:27] <stashbot>	 T379241: Set up quicksurveys for non-UI experiment pt 2 - https://phabricator.wikimedia.org/T379241
[21:57:28] <stashbot>	 T380778: Simple summary experiment - Rerun QuickSurvey for browser extension - https://phabricator.wikimedia.org/T380778
[21:57:42] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101094 (https://phabricator.wikimedia.org/T378115) (owner: 10Jdlrobson)
[21:58:01] <cjming>	 hi Gerges - if you can get a +2 on your patch, it should ride the train this week -- if you need it on 1.44.0-wmf.6, please create the backport patches and add them to one of the deployment windows -- i still have to do a few more patches for anzx
[21:58:22] <wikibugs>	 (03Merged) 10jenkins-bot: Fixes A/B test for beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101094 (https://phabricator.wikimedia.org/T378115) (owner: 10Jdlrobson)
[21:58:25] <cjming>	 anzx: still around?
[21:58:40] <anzx>	 cjming: yes, i am around 
[21:58:55] <wikibugs>	 (03PS7) 10Anzx: jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729)
[21:59:23] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729) (owner: 10Anzx)
[22:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: It is that lovely time of the day again! You are hereby commanded to deploy Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241209T2200).
[22:00:07] <wikibugs>	 (03Merged) 10jenkins-bot: jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101231 (https://phabricator.wikimedia.org/T381729) (owner: 10Anzx)
[22:00:10] <anzx>	 cjming: no need for checking on this, you can sync 
[22:00:16] <cjming>	 cool - thx
[22:00:29] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101231|jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton (T381729)]]
[22:00:33] <stashbot>	 T381729: Lift IP cap on 2024-12-17 and 2025-01-14 for Editation for jawiki - https://phabricator.wikimedia.org/T381729
[22:01:15] <cjming>	 Jdlrobson: all your patches should be live, including the beta cluster one
[22:03:28] <Jdlrobson>	 thanks a bunch cjming really appreciate all your help here!
[22:03:32] <cjming>	 yw!
[22:05:10] <logmsgbot>	 !log cjming@deploy2002 cjming, anzx: Backport for [[gerrit:1101231|jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton (T381729)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:05:11] <logmsgbot>	 !log cjming@deploy2002 cjming, anzx: Continuing with sync
[22:05:56] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10391965 (10Scott_French) a:03Scott_French
[22:07:14] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10391970 (10Scott_French)
[22:07:36] <cjming>	 Gerges: i don't think there will be time for your backports -- i have one more config patch and we're already running over -- if you need backports for ULS on 1.44.0-wmf.6, you'll need to create those patches after you get the master patch merged.
[22:08:01] <wikibugs>	 (03PS3) 10Anzx: idwikivoyage: add timezone, sitename and project namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080)
[22:09:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: mediawiki_job_translationnotifications-mediawikiwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:10:32] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101231|jawiki: lift IP cap on 2024-12-17 and 2025-01-14 for Edit-a-ton (T381729)]] (duration: 10m 02s)
[22:10:37] <stashbot>	 T381729: Lift IP cap on 2024-12-17 and 2025-01-14 for Editation for jawiki - https://phabricator.wikimedia.org/T381729
[22:11:24] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080) (owner: 10Anzx)
[22:11:54] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10391996 (10Scott_French) @odimitrijevic @Milimetric @Ahoelzl @Ottomata - Could one of you please approve access to `ana...
[22:12:03] <wikibugs>	 (03Merged) 10jenkins-bot: idwikivoyage: add timezone, sitename and project namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101185 (https://phabricator.wikimedia.org/T381080) (owner: 10Anzx)
[22:12:19] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1101185|idwikivoyage: add timezone, sitename and project namespace (T381080)]]
[22:12:24] <stashbot>	 T381080: Post-creation work for idwikivoyage - https://phabricator.wikimedia.org/T381080
[22:12:36] <_Gerges>	 cjming: Should I upload the woff font files?
[22:13:17] <cjming>	 Gerges: i'm not sure - i've never dealt with those before
[22:13:19] <wikibugs>	 (03PS1) 10Daimona Eaytoy: beta: Enable $wgCampaignEventsEnableEventWikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101596 (https://phabricator.wikimedia.org/T380077)
[22:14:04] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10392008 (10Ottomata) Approved!
[22:14:50] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1101596 (https://phabricator.wikimedia.org/T380077) (owner: 10Daimona Eaytoy)
[22:15:14] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10392011 (10Ottomata) > Also, if WMDE staff are similarly covered by the recent streamlining in T370424, it would be gre...
[22:16:15] <cjming>	 anzx: 2nd patch up on test servers
[22:16:21] <anzx>	 cjming: checking 
[22:16:24] <logmsgbot>	 !log cjming@deploy2002 cjming, anzx: Backport for [[gerrit:1101185|idwikivoyage: add timezone, sitename and project namespace (T381080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:17:21] <anzx>	 cjming: looks good
[22:17:25] <logmsgbot>	 !log cjming@deploy2002 cjming, anzx: Continuing with sync
[22:17:39] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10392013 (10Scott_French)
[22:21:08] <anzx>	 cjming: need to run namespacedupes for idwikivoyage after deploy
[22:21:28] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10392028 (10Scott_French) Great, thank you very much @Ottomata.
[22:22:50] <cjming>	 ah - thanks for the reminder - will do
[22:23:06] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1101185|idwikivoyage: add timezone, sitename and project namespace (T381080)]] (duration: 10m 46s)
[22:23:10] <stashbot>	 T381080: Post-creation work for idwikivoyage - https://phabricator.wikimedia.org/T381080
[22:25:28] <wikibugs>	 (03PS1) 10Scott French: admin: add suzannewood to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1101595 (https://phabricator.wikimedia.org/T380994)
[22:25:28] <wikibugs>	 (03CR) 10Scott French: "Thanks in advance for the review, Reuven." [puppet] - 10https://gerrit.wikimedia.org/r/1101595 (https://phabricator.wikimedia.org/T380994) (owner: 10Scott French)
[22:26:39] <cjming>	 anzx: your patches should be live - i ran namespacedupes for idwikivoyage
[22:27:37] <anzx>	 cjming: thanks for deployment 
[22:27:45] <cjming>	 yw :)
[22:28:23] <wikibugs>	 (03CR) 10RLazarus: [C:03+1] admin: add suzannewood to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1101595 (https://phabricator.wikimedia.org/T380994) (owner: 10Scott French)
[22:28:50] <cjming>	 !log end of UTC late backport window
[22:28:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:29:35] <ryankemper>	 !log [wdqs-internal graph split] Cleared away old categories units on 5 hosts (`wdqs20[18-20],wdqs202[6-7]`)
[22:29:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:33:29] <wikibugs>	 (03CR) 10Scott French: [C:03+2] admin: add suzannewood to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1101595 (https://phabricator.wikimedia.org/T380994) (owner: 10Scott French)
[22:34:28] <jinxer-wm>	 RESOLVED: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:40:52] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Suzanne Wood (WMDE) - https://phabricator.wikimedia.org/T380994#10392073 (10Scott_French) 05Stalled→03Resolved Alright, this should now be complete, though the underlying chang...
[22:46:14] <_Gerges>	 cjming: for my patch 1101592, should I change it to a wmf/* branch?
[22:53:58] <wikibugs>	 06SRE, 06Data-Engineering, 06Data-Platform-SRE: Data Platform access streamlining for WMDE staff - https://phabricator.wikimedia.org/T381824 (10Scott_French) 03NEW
[22:54:48] <wikibugs>	 06SRE, 06Data-Platform-SRE, 10Data-Engineering (Q2 2024 October 1st - December 31th): Streamline Data Platform access approvals for WMF staff - https://phabricator.wikimedia.org/T370424#10392108 (10Scott_French) See T381824 for potentially extending the same streamlining to WMDE staff.
[23:15:05] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:15:55] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.196 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:50:33] <wikibugs>	 (03PS1) 10Bvibber: LanguageConverter: Ignore content inside <math> and <svg> elements [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101600 (https://phabricator.wikimedia.org/T381617)
[23:52:21] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1101600 (https://phabricator.wikimedia.org/T381617) (owner: 10Bvibber)