[00:09:01] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[00:09:48] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[00:11:44] <wikibugs>	 ops-eqiad, DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T381454 (phaultfinder) NEW
[00:13:01] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
[00:13:01] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1084.eqiad.wmnet with OS bullseye
[00:13:13] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10378082 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host ms-be1084.eqiad.wmnet with OS bullseye complete...
[00:16:46] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[00:18:15] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1085.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:18:45] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1085.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:22:54] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:26:57] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[00:28:54] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 1/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:30:07] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
[00:30:10] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150
[00:31:12] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1085.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:36:22] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1042.eqiad.wmnet with reason: host reimage
[00:37:12] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[00:38:04] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1100216
[00:38:04] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1100216 (owner: TrainBranchBot)
[00:40:11] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1042.eqiad.wmnet with reason: host reimage
[00:41:41] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1085.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:42:01] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:42:12] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:42:50] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS13030/IPv4: Idle - Init7, AS13030/IPv6: Idle - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:43:09] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:43:20] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:45:05] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10378111 (VRiley-WMF)
[00:47:22] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[00:47:53] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be1085.eqiad.wmnet with OS bullseye
[00:48:04] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10378117 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host ms-be1085.eqiad.wmnet with OS bullseye
[00:48:19] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:48:29] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[00:50:22] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[00:51:40] <icinga-wm>	 PROBLEM - BGP status on cr1-drmrs is CRITICAL: BGP CRITICAL - AS13030/IPv6: Connect - Init7, AS13030/IPv4: Connect - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:52:02] <icinga-wm>	 PROBLEM - BGP status on cr2-drmrs is CRITICAL: BGP CRITICAL - AS13030/IPv6: Connect - Init7, AS13030/IPv4: Connect - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:52:51] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[00:53:19] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[00:54:10] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1100216 (owner: TrainBranchBot)
[00:54:16] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10378121 (VRiley-WMF)
[00:55:16] <wikibugs>	 (PS1) Tim Starling: Prepare for migration of the Interwiki extension to core [mediawiki-config] - https://gerrit.wikimedia.org/r/1100217 (https://phabricator.wikimedia.org/T33951)
[00:56:47] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[00:57:33] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[00:57:33] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1042.eqiad.wmnet with OS bookworm
[00:57:44] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378124 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1042.eqiad.wmnet with OS bookworm co...
[01:00:07] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:00:18] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:01:56] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 185.15.59.129, interfaces up: 67, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:02:50] <icinga-wm>	 RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 67, down: 2, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[01:02:52] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1041.eqiad.wmnet with OS bookworm
[01:02:56] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 185.15.59.129, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:03:00] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378128 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1041.eqiad.wmnet with OS bookworm ex...
[01:03:08] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1041.eqiad.wmnet with OS bookworm
[01:03:16] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378129 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1041.eqiad.wmnet with OS bookworm
[01:07:37] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware, serviceops: Decommission  mc-gp100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T381173#10378133 (VRiley-WMF)
[01:08:18] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1100219
[01:08:18] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1100219 (owner: TrainBranchBot)
[01:15:22] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[01:15:52] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[01:19:21] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1041.eqiad.wmnet with reason: host reimage
[01:20:43] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1045.eqiad.wmnet with OS bookworm
[01:20:55] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1046.eqiad.wmnet with OS bookworm
[01:20:57] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378136 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1045.eqiad.wmnet with OS bookworm
[01:21:04] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378137 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1046.eqiad.wmnet with OS bookworm
[01:22:42] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1041.eqiad.wmnet with reason: host reimage
[01:22:56] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:23:06] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[01:24:11] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T381454#10378140 (VRiley-WMF) a:VRiley-WMF
[01:24:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware, serviceops: Decommission  mc-gp100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T381173#10378138 (VRiley-WMF) Open→Resolved
[01:25:44] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1100219 (owner: TrainBranchBot)
[01:28:02] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:28:58] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission ganeti1015 / ganeti1021 - https://phabricator.wikimedia.org/T381157#10378145 (VRiley-WMF) a:VRiley-WMF
[01:36:47] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1046.eqiad.wmnet with reason: host reimage
[01:36:58] <icinga-wm>	 PROBLEM - BFD status on cr2-esams is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:37:02] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 1/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:39:29] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1046.eqiad.wmnet with reason: host reimage
[01:39:58] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[01:42:23] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1044.eqiad.wmnet with OS bookworm
[01:42:29] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378147 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1044.eqiad.wmnet with OS bookworm ex...
[01:46:08] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1043.eqiad.wmnet with OS bookworm
[01:46:17] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378150 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1043.eqiad.wmnet with OS bookworm ex...
[01:52:54] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T381454#10378151 (VRiley-WMF) Open→Resolved Reseated Power supply
[01:54:28] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:56:33] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[02:01:42] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission ganeti1012 / ganeti1022 - https://phabricator.wikimedia.org/T381385#10378156 (VRiley-WMF)
[02:06:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-logging-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[02:07:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:08:09] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1085.eqiad.wmnet with OS bullseye
[02:08:14] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10378157 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host ms-be1085.eqiad.wmnet with OS bullseye executed...
[02:09:25] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission ganeti1012 / ganeti1022 - https://phabricator.wikimedia.org/T381385#10378158 (VRiley-WMF)
[02:09:32] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission ganeti1012 / ganeti1022 - https://phabricator.wikimedia.org/T381385#10378159 (VRiley-WMF) Open→Resolved
[02:12:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:14:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381230#10378160 (phaultfinder)
[02:18:43] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381268#10378161 (VRiley-WMF)
[02:32:00] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381268#10378169 (VRiley-WMF)
[02:32:16] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, and 3 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381268#10378170 (VRiley-WMF) Open→Resolved
[02:32:41] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[02:32:42] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1041.eqiad.wmnet with OS bookworm
[02:32:55] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378172 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1041.eqiad.wmnet with OS bookworm co...
[02:33:27] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[02:33:27] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1046.eqiad.wmnet with OS bookworm
[02:33:41] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378173 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1046.eqiad.wmnet with OS bookworm co...
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:40:50] <wikibugs>	 (PS4) Srishakatux: Add new namespaces to hsb wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1090502 (https://phabricator.wikimedia.org/T373634)
[02:40:57] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1045.eqiad.wmnet with OS bookworm
[02:41:11] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10378174 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1045.eqiad.wmnet with OS bookworm ex...
[02:42:02] <icinga-wm>	 RECOVERY - BFD status on cr2-esams is OK: UP: 4 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:42:06] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:44:06] <icinga-wm>	 RECOVERY - BGP status on cr1-drmrs is OK: BGP OK - up: 107, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[02:45:10] <icinga-wm>	 RECOVERY - BGP status on cr2-drmrs is OK: BGP OK - up: 110, down: 4, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[02:54:06] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 1/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:55:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission ganeti1015 / ganeti1021 - https://phabricator.wikimedia.org/T381157#10378186 (VRiley-WMF)
[02:56:04] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission ganeti1015 / ganeti1021 - https://phabricator.wikimedia.org/T381157#10378187 (VRiley-WMF) Open→Resolved
[03:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:22:08] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:30:10] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 1/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:52:08] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:54:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T381230#10378202 (phaultfinder)
[04:02:08] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 1/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:15:08] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:18:52] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:19:02] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 19 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:19:12] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 398710024 and 19 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:20:12] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 12072 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:31:26] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 219, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:38:06] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:38:10] <icinga-wm>	 RECOVERY - OSPF status on cr2-drmrs is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:38:10] <icinga-wm>	 RECOVERY - BFD status on cr2-drmrs is OK: UP: 6 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:38:12] <icinga-wm>	 RECOVERY - BFD status on cr2-eqdfw is OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:54:28] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:06:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-logging-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[06:08:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71507 and previous config saved to /var/cache/conftool/dbconfig/20241204-060808-root.json
[06:08:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71508 and previous config saved to /var/cache/conftool/dbconfig/20241204-060834-root.json
[06:10:26] <wikibugs>	 (PS1) Marostegui: instances.yaml: Add es2042 [puppet] - https://gerrit.wikimedia.org/r/1100232 (https://phabricator.wikimedia.org/T381259)
[06:16:22] <wikibugs>	 (CR) Marostegui: [C:+2] instances.yaml: Add es2042 [puppet] - https://gerrit.wikimedia.org/r/1100232 (https://phabricator.wikimedia.org/T381259) (owner: Marostegui)
[06:18:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add es2042 to dbctl depooled T381259', diff saved to https://phabricator.wikimedia.org/P71509 and previous config saved to /var/cache/conftool/dbconfig/20241204-061821-marostegui.json
[06:18:25] <stashbot>	 T381259: Productionize es204[1-6] - https://phabricator.wikimedia.org/T381259
[06:23:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71510 and previous config saved to /var/cache/conftool/dbconfig/20241204-062313-root.json
[06:23:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71511 and previous config saved to /var/cache/conftool/dbconfig/20241204-062339-root.json
[06:31:20] <wikibugs>	 (PS2) Abijeet Patro: Translate: Enable message group subscription for 6 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386)
[06:38:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71512 and previous config saved to /var/cache/conftool/dbconfig/20241204-063819-root.json
[06:38:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71513 and previous config saved to /var/cache/conftool/dbconfig/20241204-063844-root.json
[06:44:09] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[06:51:51] <wikibugs>	 (CR) Arnaudb: "convolutions flattened, one question still open" [software/spicerack] - https://gerrit.wikimedia.org/r/1100114 (https://phabricator.wikimedia.org/T381086) (owner: Arnaudb)
[06:53:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71514 and previous config saved to /var/cache/conftool/dbconfig/20241204-065324-root.json
[06:53:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71515 and previous config saved to /var/cache/conftool/dbconfig/20241204-065349-root.json
[06:56:22] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T0700)
[07:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:08:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71516 and previous config saved to /var/cache/conftool/dbconfig/20241204-070829-root.json
[07:08:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71517 and previous config saved to /var/cache/conftool/dbconfig/20241204-070855-root.json
[07:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:31:40] <wikibugs>	 (CR) KCVelaga: Add Metrics Platform stream configuration for translate_extension (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[07:35:20] <wikibugs>	 (CR) Slyngshede: [C:+1] "Looks good" [software/bitu] - https://gerrit.wikimedia.org/r/1100132 (owner: Muehlenhoff)
[07:35:46] <wikibugs>	 (PS4) Wangombe: Add Metrics Platform stream configuration for translate_extension [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460)
[07:36:04] <wikibugs>	 (CR) Wangombe: Add Metrics Platform stream configuration for translate_extension (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[07:36:42] <wikibugs>	 (CR) Slyngshede: [C:-1] Extend access request email template (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1100133 (owner: Muehlenhoff)
[07:37:46] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to <wmf group> for <cpetrillo> - https://phabricator.wikimedia.org/T381464 (Cpetrillo) NEW
[07:45:03] <wikibugs>	 (PS1) Marostegui: es2042: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1100387
[07:46:02] <wikibugs>	 (PS1) Slyngshede: Updated notification handling [software/bitu] - https://gerrit.wikimedia.org/r/1100388 (https://phabricator.wikimedia.org/T381075)
[07:46:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2042 (re)pooling @ 10%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71518 and previous config saved to /var/cache/conftool/dbconfig/20241204-074629-root.json
[07:46:34] <wikibugs>	 (03PS1) 10Jelto: trafficserver: switch query-scholarly to wikikube [puppet] - 10https://gerrit.wikimedia.org/r/1100389 (https://phabricator.wikimedia.org/T350793)
[07:46:35] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] es2042: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1100387 (owner: 10Marostegui)
[07:46:43] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Fix typo in SUL reminder [software/bitu] - 10https://gerrit.wikimedia.org/r/1100132 (owner: 10Muehlenhoff)
[07:47:03] <wikibugs>	 (03CR) 10Jelto: [C:03+2] "re-revert: Ic67e16343ecb4deb58e9ba2019af0468bf99e13a" [puppet] - 10https://gerrit.wikimedia.org/r/1098891 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto)
[07:48:46] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add es2046 [puppet] - 10https://gerrit.wikimedia.org/r/1100390 (https://phabricator.wikimedia.org/T381259)
[07:49:16] <wikibugs>	 (03Merged) 10jenkins-bot: Fix typo in SUL reminder [software/bitu] - 10https://gerrit.wikimedia.org/r/1100132 (owner: 10Muehlenhoff)
[07:51:13] <wikibugs>	 (03CR) 10Muehlenhoff: New ferm rule to permit HDFS data flows and mark as low-prio for qos (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389) (owner: 10Cathal Mooney)
[07:52:16] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] instances.yaml: Add es2046 [puppet] - 10https://gerrit.wikimedia.org/r/1100390 (https://phabricator.wikimedia.org/T381259) (owner: 10Marostegui)
[07:54:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add es2046 to es5 depooled T381259', diff saved to https://phabricator.wikimedia.org/P71519 and previous config saved to /var/cache/conftool/dbconfig/20241204-075427-marostegui.json
[07:54:31] <stashbot>	 T381259: Productionize es204[1-6] - https://phabricator.wikimedia.org/T381259
[07:55:24] <wikibugs>	 (03PS1) 10Marostegui: es2046: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1100391 (https://phabricator.wikimedia.org/T381259)
[07:56:09] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] es2046: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1100391 (https://phabricator.wikimedia.org/T381259) (owner: 10Marostegui)
[07:57:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2046 (re)pooling @ 1%: Pooling in es5', diff saved to https://phabricator.wikimedia.org/P71520 and previous config saved to /var/cache/conftool/dbconfig/20241204-075703-root.json
[07:58:15] <wikibugs>	 (03PS1) 10Slyngshede: Release v0.1.3 [software/bitu] - 10https://gerrit.wikimedia.org/r/1100393
[07:58:25] <wikibugs>	 (03CR) 10Jelto: [C:03+2] trafficserver: switch query-scholarly to wikikube [puppet] - 10https://gerrit.wikimedia.org/r/1100389 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto)
[07:59:19] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [software/bitu] - 10https://gerrit.wikimedia.org/r/1100393 (owner: 10Slyngshede)
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: Your horoscope predicts another UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T0800).
[08:00:04] <jouncebot>	 kostajh: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:09] <kostajh>	 hello
[08:00:26] <kostajh>	 I'll deploy
[08:00:54] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Release v0.1.3 [software/bitu] - 10https://gerrit.wikimedia.org/r/1100393 (owner: 10Slyngshede)
[08:01:25] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100117 (https://phabricator.wikimedia.org/T381189) (owner: 10Kosta Harlan)
[08:01:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2042 (re)pooling @ 25%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71522 and previous config saved to /var/cache/conftool/dbconfig/20241204-080134-root.json
[08:03:37] <wikibugs>	 (03Merged) 10jenkins-bot: Release v0.1.3 [software/bitu] - 10https://gerrit.wikimedia.org/r/1100393 (owner: 10Slyngshede)
[08:05:13] <wikibugs>	 (03PS16) 10Arnaudb: mysql: add port number to MysqlClient [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100114 (https://phabricator.wikimedia.org/T381086)
[08:05:13] <wikibugs>	 (03CR) 10Arnaudb: "tests are written down" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100114 (https://phabricator.wikimedia.org/T381086) (owner: 10Arnaudb)
[08:11:20] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.197 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:12:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2046 (re)pooling @ 10%: Pooling in es5', diff saved to https://phabricator.wikimedia.org/P71524 and previous config saved to /var/cache/conftool/dbconfig/20241204-081208-root.json
[08:12:11] <wikibugs>	 (03Merged) 10jenkins-bot: dialog: Don't duplicate the footer in the behaviour list template [extensions/ReportIncident] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100117 (https://phabricator.wikimedia.org/T381189) (owner: 10Kosta Harlan)
[08:13:18] <logmsgbot>	 !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1100117|dialog: Don't duplicate the footer in the behaviour list template (T381189)]]
[08:13:20] <stashbot>	 T381189: Footer text on types of unacceptable behavior step is not in dialog footer - https://phabricator.wikimedia.org/T381189
[08:13:43] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
[08:14:04] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594#10378493 (10ops-monitoring-bot) Draining ganeti2018.codfw.wmnet of running VMs
[08:16:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2042 (re)pooling @ 50%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71525 and previous config saved to /var/cache/conftool/dbconfig/20241204-081640-root.json
[08:18:11] <wikibugs>	 (03PS1) 10Slyngshede: Switch to upgraded Bitu node [dns] - 10https://gerrit.wikimedia.org/r/1100395
[08:18:27] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1100117|dialog: Don't duplicate the footer in the behaviour list template (T381189)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:18:30] <stashbot>	 T381189: Footer text on types of unacceptable behavior step is not in dialog footer - https://phabricator.wikimedia.org/T381189
[08:18:42] <wikibugs>	 (03CR) 10DCausse: [C:04-1] "Will consider using the versioned stream conventions" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1099727 (https://phabricator.wikimedia.org/T374919) (owner: 10DCausse)
[08:18:55] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Continuing with sync
[08:20:53] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
[08:21:18] <wikibugs>	 10SRE-swift-storage, 10Observability-Metrics: Capacity planning/estimation for Thanos - https://phabricator.wikimedia.org/T357747#10378504 (10tappof)
[08:21:26] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv4: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:23:41] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
[08:23:54] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594#10378507 (10ops-monitoring-bot) Draining ganeti2018.codfw.wmnet of running VMs
[08:25:26] <logmsgbot>	 !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100117|dialog: Don't duplicate the footer in the behaviour list template (T381189)]] (duration: 12m 08s)
[08:25:28] <stashbot>	 T381189: Footer text on types of unacceptable behavior step is not in dialog footer - https://phabricator.wikimedia.org/T381189
[08:27:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2046 (re)pooling @ 25%: Pooling in es5', diff saved to https://phabricator.wikimedia.org/P71526 and previous config saved to /var/cache/conftool/dbconfig/20241204-082714-root.json
[08:29:56] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/1100395 (owner: 10Slyngshede)
[08:31:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2042 (re)pooling @ 75%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71527 and previous config saved to /var/cache/conftool/dbconfig/20241204-083145-root.json
[08:35:18] <moritzm>	 !log rebalance Ganeti eqiad/C following server refreshes
[08:35:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:31] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Switch to upgraded Bitu node [dns] - 10https://gerrit.wikimedia.org/r/1100395 (owner: 10Slyngshede)
[08:42:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2046 (re)pooling @ 50%: Pooling in es5', diff saved to https://phabricator.wikimedia.org/P71528 and previous config saved to /var/cache/conftool/dbconfig/20241204-084219-root.json
[08:42:51] <wikibugs>	 (03Abandoned) 10Gehel: java: introduce a standard list of GC logging options for Java 8 [puppet] - 10https://gerrit.wikimedia.org/r/954060 (https://phabricator.wikimedia.org/T345355) (owner: 10Gehel)
[08:43:56] <wikibugs>	 (03CR) 10Gehel: "Oh, I see! There are accesses to top level variables in statistics::published. That's confusing!" [puppet] - 10https://gerrit.wikimedia.org/r/924946 (owner: 10Gehel)
[08:46:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2042 (re)pooling @ 100%: Repooling cloning', diff saved to https://phabricator.wikimedia.org/P71529 and previous config saved to /var/cache/conftool/dbconfig/20241204-084650-root.json
[08:51:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es2020 to es4 master T381259', diff saved to https://phabricator.wikimedia.org/P71530 and previous config saved to /var/cache/conftool/dbconfig/20241204-085124-marostegui.json
[08:51:28] <stashbot>	 T381259: Productionize es204[1-6] - https://phabricator.wikimedia.org/T381259
[08:51:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2022 to clone es2043', diff saved to https://phabricator.wikimedia.org/P71531 and previous config saved to /var/cache/conftool/dbconfig/20241204-085143-marostegui.json
[08:51:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es2022.codfw.wmnet with reason: cloning
[08:52:12] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2022.codfw.wmnet with reason: cloning
[08:54:29] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Productionize es2043 [puppet] - 10https://gerrit.wikimedia.org/r/1100399 (https://phabricator.wikimedia.org/T381259)
[08:55:36] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Productionize es2043 [puppet] - 10https://gerrit.wikimedia.org/r/1100399 (https://phabricator.wikimedia.org/T381259) (owner: 10Marostegui)
[08:57:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2046 (re)pooling @ 75%: Pooling in es5', diff saved to https://phabricator.wikimedia.org/P71532 and previous config saved to /var/cache/conftool/dbconfig/20241204-085724-root.json
[09:01:01] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] mediawiki: support for service.deployment: none [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081449 (https://phabricator.wikimedia.org/T377040) (owner: 10Scott French)
[09:02:34] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 317, down: 1, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:05:43] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[2440,2442-2444].codfw.wmnet
[09:07:57] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[2440,2442-2444].codfw.wmnet
[09:12:16] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:12:22] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[2440,2442-2444].codfw.wmnet with reason: T377877
[09:12:24] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:12:25] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[09:12:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2046 (re)pooling @ 100%: Pooling in es5', diff saved to https://phabricator.wikimedia.org/P71533 and previous config saved to /var/cache/conftool/dbconfig/20241204-091229-root.json
[09:12:43] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[2440,2442-2444].codfw.wmnet with reason: T377877
[09:13:12] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on mw2440 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T381469 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[09:13:20] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on mw2440 - https://phabricator.wikimedia.org/T381469 (10ops-monitoring-bot) 03NEW
[09:14:51] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2440.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:15:31] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2442.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:21:31] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.decommission for hosts an-presto1001.eqiad.wmnet
[09:24:30] <wikibugs>	 (03PS1) 10Brouberol: aliases: change the an-presto-canary host [puppet] - 10https://gerrit.wikimedia.org/r/1100400 (https://phabricator.wikimedia.org/T381407)
[09:26:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1100400 (https://phabricator.wikimedia.org/T381407) (owner: 10Brouberol)
[09:28:18] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 234, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:28:28] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 317, down: 1, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:29:15] <wikibugs>	 (03PS3) 10Cathal Mooney: New ferm rule to permit HDFS data flows and mark as low-prio for qos [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389)
[09:29:43] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] aliases: change the an-presto-canary host [puppet] - 10https://gerrit.wikimedia.org/r/1100400 (https://phabricator.wikimedia.org/T381407) (owner: 10Brouberol)
[09:29:52] <wikibugs>	 (03CR) 10Cathal Mooney: New ferm rule to permit HDFS data flows and mark as low-prio for qos (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389) (owner: 10Cathal Mooney)
[09:29:55] <wikibugs>	 (03CR) 10CI reject: [V:04-1] New ferm rule to permit HDFS data flows and mark as low-prio for qos [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389) (owner: 10Cathal Mooney)
[09:30:32] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[09:30:36] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:30:46] <wikibugs>	 (03PS4) 10Cathal Mooney: New ferm rule to permit HDFS data flows and mark as low-prio for qos [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389)
[09:31:18] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:31:30] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:32:52] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2442.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:33:00] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2443.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:33:11] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2444.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:34:17] <wikibugs>	 (03CR) 10Cathal Mooney: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389) (owner: 10Cathal Mooney)
[09:34:36] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-presto1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin2002"
[09:35:07] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-presto1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin2002"
[09:35:08] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:35:09] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-presto1001.eqiad.wmnet
[09:35:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es2023 to es5 master T381259', diff saved to https://phabricator.wikimedia.org/P71534 and previous config saved to /var/cache/conftool/dbconfig/20241204-093519-marostegui.json
[09:35:23] <stashbot>	 T381259: Productionize es204[1-6] - https://phabricator.wikimedia.org/T381259
[09:35:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2024 to clone es2045', diff saved to https://phabricator.wikimedia.org/P71535 and previous config saved to /var/cache/conftool/dbconfig/20241204-093541-marostegui.json
[09:35:56] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es2024.codfw.wmnet with reason: cloning
[09:36:10] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2024.codfw.wmnet with reason: cloning
[09:38:18] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Productionize es2045 [puppet] - 10https://gerrit.wikimedia.org/r/1100404 (https://phabricator.wikimedia.org/T381259)
[09:39:01] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.decommission for hosts an-presto1002.eqiad.wmnet
[09:39:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Productionize es2045 [puppet] - 10https://gerrit.wikimedia.org/r/1100404 (https://phabricator.wikimedia.org/T381259) (owner: 10Marostegui)
[09:42:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: "We'll need 443 access from internal network too, e.g. prometheus sends probes towards 443" [puppet] - 10https://gerrit.wikimedia.org/r/1100144 (owner: 10Muehlenhoff)
[09:44:08] <wikibugs>	 (03CR) 10Muehlenhoff: "These are covered fleet-wide via the generic full-monitoring-metrics-access rule" [puppet] - 10https://gerrit.wikimedia.org/r/1100144 (owner: 10Muehlenhoff)
[09:45:45] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10378675 (10JMeybohm)
[09:46:22] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2440.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:47:39] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on mw2444 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T381472 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[09:47:44] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on mw2444 - https://phabricator.wikimedia.org/T381472 (10ops-monitoring-bot) 03NEW
[09:49:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389) (owner: 10Cathal Mooney)
[09:50:28] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2443.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:50:29] <wikibugs>	 (03PS1) 10Jaime Nuche: bootstrap-scap-target.sh: handle multiple wheel versions [puppet] - 10https://gerrit.wikimedia.org/r/1100407
[09:50:31] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "Doh, of course! LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1100144 (owner: 10Muehlenhoff)
[09:50:34] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2444.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[09:52:09] <wikibugs>	 (03PS1) 10Tiziano Fogli: thanos/compactor: increase downsampling/compaction concurrency [puppet] - 10https://gerrit.wikimedia.org/r/1100403 (https://phabricator.wikimedia.org/T381466)
[09:52:09] <wikibugs>	 (03CR) 10Tiziano Fogli: "The changes are ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/1100403 (https://phabricator.wikimedia.org/T381466) (owner: 10Tiziano Fogli)
[09:52:42] <icinga-wm>	 PROBLEM - Host mr1-esams.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[09:54:28] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:55:00] <wikibugs>	 (03PS1) 10JMeybohm: Rename mw244[02-4] to wikikube-worker201[56],wikikube-worker217[12] [puppet] - 10https://gerrit.wikimedia.org/r/1100408 (https://phabricator.wikimedia.org/T377877)
[09:55:55] <wikibugs>	 (03PS2) 10Jaime Nuche: bootstrap-scap-target.sh: handle multiple wheel versions [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772)
[09:56:20] <godog>	 !log bump space for prometheus k8s-mlserve in eqiad
[09:56:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:39] <wikibugs>	 (03PS3) 10Jaime Nuche: bootstrap-scap-target.sh: handle multiple wheel versions [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772)
[09:57:30] <wikibugs>	 07sre-alert-triage, 06Data-Platform-SRE, 06DBA: Alert in need of triage: PrometheusMysqldExporterFailed (instance db1208:13351) - https://phabricator.wikimedia.org/T376978#10378704 (10Marostegui) I believe so yes, and also this host is part of Analytics.
[09:58:30] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[09:59:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: "See inline, LGTM overall" [puppet] - 10https://gerrit.wikimedia.org/r/1100403 (https://phabricator.wikimedia.org/T381466) (owner: 10Tiziano Fogli)
[10:00:17] <wikibugs>	 07sre-alert-triage, 06DBA, 10Data-Platform-SRE (2024.11.30 - 2024.12.20): Alert in need of triage: PrometheusMysqldExporterFailed (instance db1208:13351) - https://phabricator.wikimedia.org/T376978#10378721 (10BTullis) a:03BTullis Apologies for the delay. I'll have a look at this. If I recall correctly, th...
[10:02:49] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-presto1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin2002"
[10:03:34] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-presto1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin2002"
[10:03:34] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:03:35] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-presto1002.eqiad.wmnet
[10:04:23] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.decommission for hosts an-presto1003.eqiad.wmnet
[10:04:30] <wikibugs>	 07sre-alert-triage, 06DBA, 10Data-Platform-SRE (2024.11.30 - 2024.12.20): Alert in need of triage: PrometheusMysqldExporterFailed (instance db1208:13351) - https://phabricator.wikimedia.org/T376978#10378749 (10BTullis) 05Open→03Resolved I had already masked the service and reset the failed unit, but...
[10:04:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
[10:06:58] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-logging-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[10:07:29] <wikibugs>	 (03CR) 10Volans: [C:04-1] "It looks to me that it can be simplified quite a bit" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100114 (https://phabricator.wikimedia.org/T381086) (owner: 10Arnaudb)
[10:09:11] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on 8 hosts with reason: Rebooting
[10:09:20] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Rebooting
[10:10:13] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[10:13:42] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-presto1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin2002"
[10:13:43] <wikibugs>	 (03CR) 10Marostegui: mysql: add port number to MysqlClient (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100114 (https://phabricator.wikimedia.org/T381086) (owner: 10Arnaudb)
[10:15:36] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.197 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:17:23] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install db224[12] - https://phabricator.wikimedia.org/T379757#10378785 (10Marostegui)
[10:17:26] <wikibugs>	 (03CR) 10Volans: [WIP, DNM] create sre.k8s.roll-reimage-nodes (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1094494 (https://phabricator.wikimedia.org/T377857) (owner: 10Kamila Součková)
[10:17:45] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install db224[12] - https://phabricator.wikimedia.org/T379757#10378790 (10Marostegui) That was fast! Thank you Jenn!
[10:18:33] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1100408 (https://phabricator.wikimedia.org/T377877) (owner: 10JMeybohm)
[10:18:56] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Rename mw244[02-4] to wikikube-worker201[56],wikikube-worker217[12] [puppet] - 10https://gerrit.wikimedia.org/r/1100408 (https://phabricator.wikimedia.org/T377877) (owner: 10JMeybohm)
[10:19:12] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[10:19:12] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[10:19:43] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[10:19:54] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[10:19:55] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-presto1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin2002"
[10:19:55] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:19:56] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-presto1003.eqiad.wmnet
[10:20:49] <wikibugs>	 (03PS6) 10Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344)
[10:20:54] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2440 to wikikube-worker2015
[10:20:54] <wikibugs>	 (03CR) 10KCVelaga: Add Metrics Platform stream configuration for translate_extension (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[10:21:05] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:21:57] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[10:22:05] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.decommission for hosts an-presto1004.eqiad.wmnet
[10:22:18] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2442 to wikikube-worker20160
[10:22:24] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[10:22:30] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2442 to wikikube-worker20160
[10:22:40] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2443 to wikikube-worker2171
[10:22:53] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2444 to wikikube-worker2172
[10:23:10] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[10:23:20] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:23:47] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2442 to wikikube-worker2016
[10:23:59] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2442 to wikikube-worker2016
[10:25:15] <wikibugs>	 (PS5) Wangombe: Add Metrics Platform stream configuration for translate_extension [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460)
[10:25:46] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10378808 (elukey) @Jclark-ctr I fixed the provisioning of ms-be1086, for some reasons if the BMC doesn't have IPv6 enabled the settings that errored ou...
[10:25:49] <wikibugs>	 (CR) Wangombe: Add Metrics Platform stream configuration for translate_extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[10:26:45] <jayme>	 brouberol: merged your an-presto1004 netbox changes
[10:27:11] <wikibugs>	 (CR) Vgutierrez: [C:+2] hiera: Extend bwlimit to upload cluster globally [puppet] - https://gerrit.wikimedia.org/r/1100137 (owner: Vgutierrez)
[10:27:28] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2442 to wikikube-worker2016
[10:27:31] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2444 to wikikube-worker2172 - jayme@cumin2002"
[10:27:40] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2442 to wikikube-worker2016
[10:27:55] <wikibugs>	 (CR) KCVelaga: [C:+1] "Looks good to me." [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[10:28:23] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:28:24] <vgutierrez>	 !log enabling outbound bandwidth limits enforced by haproxy on the upload cluster
[10:28:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:31] <vgutierrez>	 _joe_: ^^
[10:28:44] <brouberol>	 jayme: thanks! I have a decom cookbook running atm
[10:29:03] <brouberol>	 ah, you currently hold the lock :D
[10:29:07] <jayme>	 brouberol: yeah, I saw that here - that's why I did not ask for confirmation :)
[10:29:14] <brouberol>	 np
[10:29:43] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2444 to wikikube-worker2172 - jayme@cumin2002"
[10:29:43] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:29:44] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2172
[10:30:09] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2172
[10:30:46] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:30:47] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2171
[10:30:49] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2444 to wikikube-worker2172
[10:30:50] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[10:31:22] <wikibugs>	 (CR) Elukey: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[10:32:34] <wikibugs>	 (PS2) Tiziano Fogli: thanos/compactor: increase downsampling/compation concurrency [puppet] - https://gerrit.wikimedia.org/r/1100403 (https://phabricator.wikimedia.org/T381466)
[10:33:12] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:33:13] <moritzm>	 !log removing ganeti2018 from active Ganeti nodes T376594
[10:33:13] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-presto1004.eqiad.wmnet
[10:33:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:16] <stashbot>	 T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594
[10:33:42] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594#10378828 (MoritzMuehlenhoff)
[10:34:22] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:34:55] <wikibugs>	 (PS1) Hnowlan: mediawiki: various mercurius fixes [deployment-charts] - https://gerrit.wikimedia.org/r/1100412 (https://phabricator.wikimedia.org/T371701)
[10:35:10] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2171
[10:35:23] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1100403 (https://phabricator.wikimedia.org/T381466) (owner: Tiziano Fogli)
[10:35:44] <wikibugs>	 (PS1) Marostegui: installserver: Do not reimage es2041, es2042 [puppet] - https://gerrit.wikimedia.org/r/1100413
[10:35:51] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2443 to wikikube-worker2171
[10:36:02] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2172.codfw.wmnet with OS bookworm
[10:36:10] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti2018 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[10:36:10] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti2018 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 112 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[10:36:13] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2172
[10:36:34] <wikibugs>	 (PS7) Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344)
[10:36:43] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:36:44] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2015
[10:36:53] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:36:55] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2171.codfw.wmnet with OS bookworm
[10:37:06] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2171
[10:37:08] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti2018:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:37:19] <wikibugs>	 (PS8) Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344)
[10:37:29] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2015
[10:37:50] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2442 to wikikube-worker2016
[10:38:06] <wikibugs>	 (PS1) Muehlenhoff: ganeti2018: Update site.pp [puppet] - https://gerrit.wikimedia.org/r/1100414 (https://phabricator.wikimedia.org/T376594)
[10:38:09] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2440 to wikikube-worker2015
[10:38:25] <wikibugs>	 (CR) Klausman: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4632/co" [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[10:38:40] <wikibugs>	 (CR) Clément Goubert: [C:+1] mediawiki: various mercurius fixes [deployment-charts] - https://gerrit.wikimedia.org/r/1100412 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[10:38:49] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2015.codfw.wmnet with OS bookworm
[10:39:27] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.decommission for hosts an-presto1005.eqiad.wmnet
[10:39:29] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+1] mediawiki: various mercurius fixes (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1100412 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[10:39:47] <wikibugs>	 (PS9) Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344)
[10:40:13] <wikibugs>	 (CR) Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[10:40:38] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2172 - jayme@cumin2002"
[10:40:51] <wikibugs>	 (CR) Klausman: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4633/co" [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[10:41:03] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2172 - jayme@cumin2002"
[10:41:03] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:41:04] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2172.codfw.wmnet 77.48.192.10.in-addr.arpa 7.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[10:41:07] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2172.codfw.wmnet 77.48.192.10.in-addr.arpa 7.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[10:41:08] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2172
[10:41:17] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2172
[10:41:18] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2172
[10:41:47] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:42:03] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Replica SQL: s4 on db1245 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table wbc_entity_usage is corrupt: try to repair it on query. Default database: commonswiki. [Query snipped] Marostegui T381476 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:46:45] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2442 to wikikube-worker2016 - jayme@cumin2002"
[10:46:51] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2442 to wikikube-worker2016 - jayme@cumin2002"
[10:46:51] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:46:52] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2016
[10:46:53] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[10:47:07] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2016
[10:47:48] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2442 to wikikube-worker2016
[10:47:58] <wikibugs>	 ops-codfw, DC-Ops, serviceops: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381478 (JMeybohm) NEW
[10:48:31] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2016.codfw.wmnet with OS bookworm
[10:49:23] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:49:38] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-presto1005.eqiad.wmnet
[10:49:38] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2016.codfw.wmnet with OS bookworm
[10:50:05] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:52:29] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:52:29] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2171.codfw.wmnet 152.32.192.10.in-addr.arpa 2.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[10:52:32] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2171.codfw.wmnet 152.32.192.10.in-addr.arpa 2.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[10:52:33] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2171
[10:52:42] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2171
[10:52:42] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2171
[10:53:20] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2015
[10:53:27] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[10:54:07] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2016.codfw.wmnet with OS bookworm
[10:55:29] <wikibugs>	 (PS1) Gmodena: EventStreamConfig: add content_history streams. [mediawiki-config] - https://gerrit.wikimedia.org/r/1100417 (https://phabricator.wikimedia.org/T381322)
[10:55:44] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:57:00] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2015 - jayme@cumin2002"
[10:57:01] <wikibugs>	 (CR) Filippo Giunchedi: "There's a bunch of things to unpack I think, and I may be missing some context so please bear with me!" [puppet] - https://gerrit.wikimedia.org/r/1079531 (https://phabricator.wikimedia.org/T370506) (owner: Tiziano Fogli)
[10:57:06] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2015 - jayme@cumin2002"
[10:57:06] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:57:06] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2015.codfw.wmnet 149.32.192.10.in-addr.arpa 9.4.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[10:57:10] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2015.codfw.wmnet 149.32.192.10.in-addr.arpa 9.4.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[10:57:11] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2015
[10:57:17] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2015.codfw.wmnet wikikube-worker2016.codfw.wmnet wikikube-worker2171.codfw.wmnet wikikube-worker2172.codfw.wmnet on all recursors
[10:57:20] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2015.codfw.wmnet wikikube-worker2016.codfw.wmnet wikikube-worker2171.codfw.wmnet wikikube-worker2172.codfw.wmnet on all recursors
[10:57:23] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2015
[10:57:23] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2015
[10:58:05] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2016
[10:58:31] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[11:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1100)
[11:03:32] <vgutierrez>	 !log restarting haproxy on cp1107
[11:03:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:13] <wikibugs>	 (PS7) Aklapper: Redirect svn.wikimedia.org/doc properly [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109950) (owner: Dereckson)
[11:07:19] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2016 - jayme@cumin2002"
[11:07:24] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2016 - jayme@cumin2002"
[11:07:24] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:07:25] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2016.codfw.wmnet 151.32.192.10.in-addr.arpa 1.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[11:07:28] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2016.codfw.wmnet 151.32.192.10.in-addr.arpa 1.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[11:07:29] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2016
[11:07:39] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2016
[11:07:39] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2016
[11:07:44] <wikibugs>	 (PS1) Vgutierrez: Revert "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100419
[11:08:13] <wikibugs>	 (CR) Aklapper: "Attempted to rebase/amend. Also removed the generated file `modules/mediawiki/files/apache/sites/redirects.conf` from being included in th" [puppet] - https://gerrit.wikimedia.org/r/631888 (https://phabricator.wikimedia.org/T109950) (owner: Dereckson)
[11:10:39] <wikibugs>	 (PS2) Vgutierrez: Revert "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100419
[11:11:33] <wikibugs>	 (CR) Vgutierrez: [C:+2] Revert "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100419 (owner: Vgutierrez)
[11:11:48] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2171.codfw.wmnet with reason: host reimage
[11:13:17] <vgutierrez>	 !log disabling outbound bandwidth limits enforced by haproxy on the upload cluster (we are getting haproxy crashes)
[11:13:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:35] <vgutierrez>	 so convenient I'm the one on-call lol
[11:14:35] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2171.codfw.wmnet with reason: host reimage
[11:14:54] <wikibugs>	 (CR) Phuedx: [C:+1] "The configuration LGTM and will work. I can't speak to the values that you're collecting for this stream though." [mediawiki-config] - https://gerrit.wikimedia.org/r/1097499 (https://phabricator.wikimedia.org/T364460) (owner: Wangombe)
[11:15:08] <wikibugs>	 (CR) Ilias Sarantopoulos: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[11:15:11] <wikibugs>	 (PS2) Marostegui: installserver: Do not reimage es2041, es2042 [puppet] - https://gerrit.wikimedia.org/r/1100413
[11:15:39] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 226, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:19:13] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti2018:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:21:01] <wikibugs>	 (PS1) Gmodena: dse-k8s: rename mw-dumps helmfiles. [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[11:24:01] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[11:24:39] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[11:24:40] <wikibugs>	 (PS2) Gmodena: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[11:25:31] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 310, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:26:30] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2016.codfw.wmnet with reason: host reimage
[11:26:37] <wikibugs>	 (PS3) Gmodena: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[11:30:19] <wikibugs>	 (CR) Gmodena: "Some prep work to support the release of Dumps 2." [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[11:32:25] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2016.codfw.wmnet with reason: host reimage
[11:32:26] <wikibugs>	 (CR) Hnowlan: [C:+2] mediawiki: various mercurius fixes [deployment-charts] - https://gerrit.wikimedia.org/r/1100412 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[11:34:00] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] thanos/compactor: increase downsampling/compation concurrency (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100403 (https://phabricator.wikimedia.org/T381466) (owner: Tiziano Fogli)
[11:34:13] <wikibugs>	 (Merged) jenkins-bot: mediawiki: various mercurius fixes [deployment-charts] - https://gerrit.wikimedia.org/r/1100412 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[11:34:22] <wikibugs>	 (PS1) Vgutierrez: Revert^2 "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100423
[11:34:50] <wikibugs>	 (PS2) Vgutierrez: Revert^2 "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100423
[11:35:05] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1100423 (owner: Vgutierrez)
[11:35:12] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[11:35:14] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[11:36:46] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2171.codfw.wmnet with OS bookworm
[11:38:18] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2172.codfw.wmnet with OS bookworm
[11:39:02] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2172.codfw.wmnet with OS bookworm
[11:40:53] <wikibugs>	 (PS3) Vgutierrez: Revert^2 "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100423
[11:41:16] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1100423 (owner: Vgutierrez)
[11:41:34] <wikibugs>	 (CR) Muehlenhoff: [C:+2] ganeti2018: Update site.pp [puppet] - https://gerrit.wikimedia.org/r/1100414 (https://phabricator.wikimedia.org/T376594) (owner: Muehlenhoff)
[11:42:30] <wikibugs>	 (CR) Stevemunene: [C:+2] datahub: add datahub production index prefix [deployment-charts] - https://gerrit.wikimedia.org/r/1097372 (https://phabricator.wikimedia.org/T377814) (owner: Stevemunene)
[11:43:35] <wikibugs>	 (Merged) jenkins-bot: datahub: add datahub production index prefix [deployment-charts] - https://gerrit.wikimedia.org/r/1097372 (https://phabricator.wikimedia.org/T377814) (owner: Stevemunene)
[11:45:58] <wikibugs>	 (CR) Muehlenhoff: [C:+2] graphite: Restrict access to Envoy port [puppet] - https://gerrit.wikimedia.org/r/1100144 (owner: Muehlenhoff)
[11:46:17] <wikibugs>	 (CR) Marostegui: [C:+2] installserver: Do not reimage es2041, es2042 [puppet] - https://gerrit.wikimedia.org/r/1100413 (owner: Marostegui)
[11:47:57] <wikibugs>	 (PS1) Dreamy Jazz: Create a DB list for wikis with continuous MediaModeration scans [mediawiki-config] - https://gerrit.wikimedia.org/r/1100426 (https://phabricator.wikimedia.org/T355169)
[11:48:20] <wikibugs>	 (CR) Fabfur: [C:+1] "Looks that `filter` directive is now present on all hosts, LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1100423 (owner: Vgutierrez)
[11:48:22] <Dreamy_Jazz>	 jouncebot: nowandnext
[11:48:22] <jouncebot>	 For the next 0 hour(s) and 11 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1100)
[11:48:22] <jouncebot>	 In 0 hour(s) and 11 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1200)
[11:48:51] <wikibugs>	 (CR) Vgutierrez: [C:+2] Revert^2 "hiera: Extend bwlimit to upload cluster globally" [puppet] - https://gerrit.wikimedia.org/r/1100423 (owner: Vgutierrez)
[11:49:06] <wikibugs>	 (PS10) Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344)
[11:49:30] <wikibugs>	 (CR) Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[11:49:39] <vgutierrez>	 !log re-enabling outbound bandwidth limits enforced by haproxy on the upload cluster
[11:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:18] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on db1206 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.45 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:50:26] <wikibugs>	 (PS1) Dreamy Jazz: [WIP] Update MediaModeration module to run scans automatically [puppet] - https://gerrit.wikimedia.org/r/1100427 (https://phabricator.wikimedia.org/T355169)
[11:50:46] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: Klausman)
[11:50:56] <wikibugs>	 (PS2) Dreamy Jazz: Create a DB list for wikis with continuous MediaModeration scans [mediawiki-config] - https://gerrit.wikimedia.org/r/1100426 (https://phabricator.wikimedia.org/T355169)
[11:51:05] <wikibugs>	 (CR) CI reject: [V:-1] [WIP] Update MediaModeration module to run scans automatically [puppet] - https://gerrit.wikimedia.org/r/1100427 (https://phabricator.wikimedia.org/T355169) (owner: Dreamy Jazz)
[11:51:09] <wikibugs>	 (PS2) Dreamy Jazz: [WIP] Update MediaModeration module to run scans automatically [puppet] - https://gerrit.wikimedia.org/r/1100427 (https://phabricator.wikimedia.org/T355169)
[11:51:52] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100426 (https://phabricator.wikimedia.org/T355169) (owner: Dreamy Jazz)
[11:52:23] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2016.codfw.wmnet with OS bookworm
[11:52:36] <wikibugs>	 (Merged) jenkins-bot: Create a DB list for wikis with continuous MediaModeration scans [mediawiki-config] - https://gerrit.wikimedia.org/r/1100426 (https://phabricator.wikimedia.org/T355169) (owner: Dreamy Jazz)
[11:53:04] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1100426|Create a DB list for wikis with continuous MediaModeration scans (T355169)]]
[11:53:06] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[11:56:30] <wikibugs>	 (03PS1) 10Dreamy Jazz: Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169)
[11:58:02] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2172.codfw.wmnet with reason: host reimage
[11:59:12] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1100426|Create a DB list for wikis with continuous MediaModeration scans (T355169)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:59:14] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[11:59:23] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
[12:00:04] <jouncebot>	 mvolz: Your horoscope predicts another Services – Citoid / Zotero deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1200).
[12:01:05] <wikibugs>	 (03PS1) 10Stevemunene: datahub: Rebuild datahub for java updates [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100433 (https://phabricator.wikimedia.org/T377938)
[12:01:32] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
[12:01:45] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
[12:02:18] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2172.codfw.wmnet with reason: host reimage
[12:03:25] <wikibugs>	 (03CR) 10Dreamy Jazz: "Want to backport this so that the fix is ready for when puppet runs the scripts automatically." [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:03:36] <wikibugs>	 (03PS1) 10Dreamy Jazz: Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1100434 (https://phabricator.wikimedia.org/T355169)
[12:04:47] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:05:15] <wikibugs>	 (03CR) 10Dreamy Jazz: "recheck" [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:05:26] <Dreamy_Jazz>	 jouncebot: nowandnext
[12:05:26] <jouncebot>	 For the next 0 hour(s) and 54 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1200)
[12:05:26] <jouncebot>	 In 1 hour(s) and 54 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400)
[12:05:58] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:06:06] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100426|Create a DB list for wikis with continuous MediaModeration scans (T355169)]] (duration: 13m 02s)
[12:06:09] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[12:06:36] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/MediaModeration] (wmf/1.44.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1100434 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:09:21] <wikibugs>	 (03CR) 10Elukey: [C:03+1] ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: 10Klausman)
[12:12:27] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: 10Klausman)
[12:13:50] <wikibugs>	 (03PS11) 10Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344)
[12:14:16] <wikibugs>	 (03CR) 10Klausman: ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: 10Klausman)
[12:17:07] <wikibugs>	 (03CR) 10Klausman: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4634/co" [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: 10Klausman)
[12:17:57] <wikibugs>	 (03CR) 10Dreamy Jazz: "recheck" [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:18:13] <wikibugs>	 (03CR) 10Klausman: [V:03+1 C:03+2] ml-lab/gpu: Add environment file that sets correct paths for ROCm/hipcc [puppet] - 10https://gerrit.wikimedia.org/r/1100056 (https://phabricator.wikimedia.org/T371344) (owner: 10Klausman)
[12:22:30] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2172.codfw.wmnet with OS bookworm
[12:22:33] <wikibugs>	 (03CR) 10Dreamy Jazz: Ensure IP reveal buttons are not shown on Special:MassGlobalBlock (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: 10Tchanders)
[12:22:39] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] Ensure IP reveal buttons are not shown on Special:MassGlobalBlock [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: 10Tchanders)
[12:22:46] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: 10Tchanders)
[12:23:13] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] "I can deploy this in the upcoming window where I have other changes to deploy too." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: 10Tchanders)
[12:25:54] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] datahub: Rebuild datahub for java updates [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100433 (https://phabricator.wikimedia.org/T377938) (owner: 10Stevemunene)
[12:26:03] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:31:17] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1001.eqiad.wmnet - https://phabricator.wikimedia.org/T381487#10379330 (10brouberol)
[12:31:18] <wikibugs>	 (03CR) 10Btullis: [C:03+1] datahub: Rebuild datahub for java updates [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100433 (https://phabricator.wikimedia.org/T377938) (owner: 10Stevemunene)
[12:31:21] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381488 (10brouberol) 03NEW
[12:31:48] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1003.eqiad.wmnet - https://phabricator.wikimedia.org/T381489 (10brouberol) 03NEW
[12:32:13] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1004.eqiad.wmnet - https://phabricator.wikimedia.org/T381490 (10brouberol) 03NEW
[12:32:32] <moritzm>	 !log installing glib2.0 security updates
[12:32:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:46] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1005.eqiad.wmnet - https://phabricator.wikimedia.org/T381491 (10brouberol) 03NEW
[12:32:53] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1001.eqiad.wmnet - https://phabricator.wikimedia.org/T381487#10379389 (10brouberol)
[12:33:12] <wikibugs>	 (03CR) 10Stevemunene: [C:03+2] datahub: Rebuild datahub for java updates [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100433 (https://phabricator.wikimedia.org/T377938) (owner: 10Stevemunene)
[12:33:59] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[12:34:26] <wikibugs>	 (03Merged) 10jenkins-bot: datahub: Rebuild datahub for java updates [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100433 (https://phabricator.wikimedia.org/T377938) (owner: 10Stevemunene)
[12:35:05] <wikibugs>	 (03PS1) 10Dreamy Jazz: Stats: Move StatsFactory flush into emitBufferedStats [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100442 (https://phabricator.wikimedia.org/T380609)
[12:35:36] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100442 (https://phabricator.wikimedia.org/T380609) (owner: 10Dreamy Jazz)
[12:35:44] <Dreamy_Jazz>	 jouncebot: nowandnext
[12:35:44] <jouncebot>	 For the next 0 hour(s) and 24 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1200)
[12:35:44] <jouncebot>	 In 1 hour(s) and 24 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400)
[12:36:03] <Dreamy_Jazz>	 Going to start with gate-and-submit-wmf for some of the backports, as they will take a while to complete
[12:36:20] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+2] Stats: Move StatsFactory flush into emitBufferedStats [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100442 (https://phabricator.wikimedia.org/T380609) (owner: 10Dreamy Jazz)
[12:36:35] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+2] Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1100434 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:36:39] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+2] Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:36:57] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1001.eqiad.wmnet - https://phabricator.wikimedia.org/T381487#10379411 (10brouberol)
[12:37:08] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1002.eqiad.wmnet - https://phabricator.wikimedia.org/T381488#10379413 (10brouberol)
[12:37:19] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1003.eqiad.wmnet - https://phabricator.wikimedia.org/T381489#10379419 (10brouberol)
[12:37:22] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1004.eqiad.wmnet - https://phabricator.wikimedia.org/T381490#10379421 (10brouberol)
[12:37:27] <wikibugs>	 10ops-eqiad, 06DC-Ops, 10decommission-hardware: decommission an-presto1005.eqiad.wmnet - https://phabricator.wikimedia.org/T381491#10379423 (10brouberol)
[12:38:17] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[12:40:06] <hnowlan>	 !log imported debs for mercurius_1.0.2
[12:40:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:40] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[12:44:28] <hnowlan>	 jouncebot: nowandnext
[12:44:29] <jouncebot>	 For the next 0 hour(s) and 15 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1200)
[12:44:29] <jouncebot>	 In 1 hour(s) and 15 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400)
[12:47:08] <moritzm>	 !log uploaded mailman3 3.3.8-2~deb12u2+wmf1 T377045
[12:47:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:11] <stashbot>	 T377045: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045
[12:47:18] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[12:49:12] <wikibugs>	 (03PS4) 10Gmodena: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[12:49:28] <wikibugs>	 (03PS1) 10Mvolz: Enable wayback in config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100443 (https://phabricator.wikimedia.org/T369084)
[12:50:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-lab1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:52:11] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[12:52:54] <wikibugs>	 (03CR) 10Mvolz: [C:03+2] Enable wayback in config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100443 (https://phabricator.wikimedia.org/T369084) (owner: 10Mvolz)
[12:53:21] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on db1206 is OK: OK slave_sql_lag Replication lag: 4.07 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:53:59] <wikibugs>	 (03Merged) 10jenkins-bot: Enable wayback in config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100443 (https://phabricator.wikimedia.org/T369084) (owner: 10Mvolz)
[12:54:44] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1004,1009].eqiad.wmnet with reason: Hardware refresh
[12:54:59] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1004,1009].eqiad.wmnet with reason: Hardware refresh
[12:55:31] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:55:35] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:56:14] <wikibugs>	 (03PS1) 10DDesouza: miscweb(research-landing-page): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100444 (https://phabricator.wikimedia.org/T219903)
[12:56:33] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:57:17] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:57:20] <wikibugs>	 (03Merged) 10jenkins-bot: Stats: Move StatsFactory flush into emitBufferedStats [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100442 (https://phabricator.wikimedia.org/T380609) (owner: 10Dreamy Jazz)
[12:57:23] <wikibugs>	 (03Merged) 10jenkins-bot: Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1100434 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:57:25] <wikibugs>	 (03Merged) 10jenkins-bot: Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php [extensions/MediaModeration] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100430 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[12:58:26] <wikibugs>	 (03PS3) 10Hnowlan: mediawiki: add multi-job support to mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701)
[12:58:41] <Dreamy_Jazz>	 I'm going to start scap for these wmf backports as they have merged and it is nearly time for the window
[12:58:43] <wikibugs>	 06SRE, 06collaboration-services, 10Wikimedia-Mailing-lists: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045#10379487 (10MoritzMuehlenhoff) >>! In T377045#10377117, @Dzahn wrote: > Since we could not test if the service starts on list2001 (fails because...
[12:59:01] <wikibugs>	 (03CR) 10DDesouza: [C:03+2] miscweb(research-landing-page): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100444 (https://phabricator.wikimedia.org/T219903) (owner: 10DDesouza)
[12:59:12] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1100442|Stats: Move StatsFactory flush into emitBufferedStats (T380609)]], [[gerrit:1100434|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100430|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]]
[12:59:16] <stashbot>	 T380609: Maintenance scripts do not emit StatsLib metrics - https://phabricator.wikimedia.org/T380609
[12:59:17] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[12:59:55] <Dreamy_Jazz>	 Scap failed with:
[13:00:05] <Dreamy_Jazz>	 mergeMessageFileList.php generated PHP notices/warnings: Warning: socket_sendto(): unable to write to socket [101]: Network is unreachable in /srv/mediawiki-staging/php-1.44.0-wmf.6/includes/debug/logger/monolog/LegacyHandler.php on line 234
[13:00:32] <wikibugs>	 (03CR) 10DDesouza: [V:03+2 C:03+2] miscweb(research-landing-page): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100444 (https://phabricator.wikimedia.org/T219903) (owner: 10DDesouza)
[13:00:34] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb(research-landing-page): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100444 (https://phabricator.wikimedia.org/T219903) (owner: 10DDesouza)
[13:00:35] <hashar>	 aren't logs sent over UDP???
[13:00:37] <Dreamy_Jazz>	 Going to re-try scap
[13:00:51] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1100442|Stats: Move StatsFactory flush into emitBufferedStats (T380609)]], [[gerrit:1100434|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100430|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]]
[13:01:11] <Dreamy_Jazz>	 Failed again.
[13:01:44] <hashar>	 or well the patch you are deploying breaks the world
[13:02:00] <Dreamy_Jazz>	 Maybe
[13:02:11] <Dreamy_Jazz>	 I doubt it would be the MediaModeration patches
[13:02:22] <Dreamy_Jazz>	 Perhaps the core patch is suspect
[13:04:18] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[13:04:25] <hashar>	 who knows
[13:04:35] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[13:04:36] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[13:04:50] <hashar>	 but that logger code is a socket_sendto() guarded by a $this->useUdp() check
[13:04:56] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[13:04:57] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[13:05:02] <hashar>	 why it can't send to UDP... I have no clue
[13:05:11] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[13:05:23] <wikibugs>	 (03PS1) 10Effie Mouzeli: kafka-main: Replace kafka-main1004 with kafka-main1009 [puppet] - 10https://gerrit.wikimedia.org/r/1100447 (https://phabricator.wikimedia.org/T363214)
[13:05:38] <wikibugs>	 (03PS1) 10Jelto: Rename kubernetes1023 and kubernetes1024 [puppet] - 10https://gerrit.wikimedia.org/r/1100448 (https://phabricator.wikimedia.org/T377876)
[13:05:50] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[13:05:53] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[13:05:54] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[13:05:55] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[13:05:58] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[13:05:59] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[13:06:01] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[13:06:07] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[13:06:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1172 (T371742)', diff saved to https://phabricator.wikimedia.org/P71537 and previous config saved to /var/cache/conftool/dbconfig/20241204-130614-ladsgroup.json
[13:06:18] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[13:06:23] <Dreamy_Jazz>	 Given that I successfully backported a config change less than an hour ago, I'll try undoing the core change.
[13:06:33] <Dreamy_Jazz>	 To see if it was the core patch.
[13:07:03] <wikibugs>	 (03PS1) 10Dreamy Jazz: Revert "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100449
[13:07:21] <wikibugs>	 (03PS2) 10Dreamy Jazz: Revert "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100449
[13:07:28] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+2] Revert "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100449 (owner: 10Dreamy Jazz)
[13:08:54] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100449 (owner: 10Dreamy Jazz)
[13:09:40] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[13:09:43] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[13:09:44] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[13:09:47] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[13:09:49] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[13:09:51] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[13:11:02] <wikibugs>	 (03PS18) 10Arnaudb: mysql: add port number to MysqlClient [software/spicerack] - 10https://gerrit.wikimedia.org/r/1100114 (https://phabricator.wikimedia.org/T381086)
[13:12:46] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[13:12:49] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[13:12:51] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[13:12:54] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[13:12:55] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[13:12:57] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[13:14:17] <hashar>	 those helmfile `!log` should be cut somehow
[13:14:22] <hashar>	 that is rather spammy
[13:14:55] <urbanecm>	 hashar: and meaningless. i have no idea what was deployed there.
[13:15:09] <hashar>	 miscweb across all 3 namespaces
[13:15:41] <urbanecm>	 i meant what changed inside miscweb
[13:17:40] <jelto>	 the helm diff should tell you what you are changing/deploying. Miscweb is in one namespace + query service in another namespace.
[13:18:55] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1154.eqiad.wmnet with reason: Alter table
[13:18:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1154.eqiad.wmnet with reason: Alter table
[13:19:06] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Improve how we generate DNS entries from Netbox - https://phabricator.wikimedia.org/T362985#10379577 (10BBlack) Seems like a net win to me.  Reduces some error-prone process stuff and makes life simpler!
[13:19:32] <urbanecm>	 jelto: i know, but that is only shown when i'm the deployer, right? not when i'm looking at SAL and trying to figure out what changed from the logs.
[13:19:42] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Alter table
[13:19:45] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Alter table
[13:19:45] <urbanecm>	 or is the diff stored somewhere for later review?
[13:19:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Alter table
[13:19:52] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Alter table
[13:20:07] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Alter table
[13:20:10] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Alter table
[13:20:39] <jelto>	 that's right, the diff is only visible to the deployer. But the diff can contain secrets and credentials so it's not public
[13:20:48] <hashar>	 urbanecm: well you gotta check the deployment-charts repo, find out some image version got bumped, from there head to the repo defining it. You can do a diff of the image yes
[13:21:30] <wikibugs>	 (03PS1) 10Slyngshede: Password reset: use passlib for hashing [software/bitu] - 10https://gerrit.wikimedia.org/r/1100451
[13:22:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Improve how we generate DNS entries from Netbox - https://phabricator.wikimedia.org/T362985#10379599 (10BBlack) Also, probably the way to standardize this for sanity (avoiding ORIGIN mistakes on both ends) is to follow some simple rules that:  1. Every one of the new includ...
[13:23:17] <wikibugs>	 (03PS1) 10Effie Mouzeli: Update various kafka-main connection strings for kafka-main1009 Replacing kafka-main1004 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100452 (https://phabricator.wikimedia.org/T363214)
[13:27:45] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Stats: Move StatsFactory flush into emitBufferedStats" [core] (wmf/1.44.0-wmf.6) - 10https://gerrit.wikimedia.org/r/1100449 (owner: 10Dreamy Jazz)
[13:28:15] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1100442|Stats: Move StatsFactory flush into emitBufferedStats (T380609)]], [[gerrit:1100434|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100430|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100449|Revert "Stats: Move StatsFactory flush into emitBufferedStats"]]
[13:28:19] <stashbot>	 T380609: Maintenance scripts do not emit StatsLib metrics - https://phabricator.wikimedia.org/T380609
[13:28:19] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[13:29:01] <Dreamy_Jazz>	 Scap is working with the core backport reverted.
[13:30:15] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Rename kubernetes1023 and kubernetes1024 [puppet] - 10https://gerrit.wikimedia.org/r/1100448 (https://phabricator.wikimedia.org/T377876) (owner: 10Jelto)
[13:30:46] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1023-1024].eqiad.wmnet
[13:31:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1023-1024].eqiad.wmnet
[13:32:57] <wikibugs>	 (03CR) 10Jelto: [C:03+2] Rename kubernetes1023 and kubernetes1024 [puppet] - 10https://gerrit.wikimedia.org/r/1100448 (https://phabricator.wikimedia.org/T377876) (owner: 10Jelto)
[13:33:49] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1100442|Stats: Move StatsFactory flush into emitBufferedStats (T380609)]], [[gerrit:1100434|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100430|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100449|Revert "Stats: Move StatsFactory flush into emitBufferedStats"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:33:57] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
[13:33:58] <stashbot>	 T380609: Maintenance scripts do not emit StatsLib metrics - https://phabricator.wikimedia.org/T380609
[13:33:58] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[13:35:17] <logmsgbot>	 !log jayme@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2015.codfw.wmnet with OS bookworm
[13:35:46] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2015.codfw.wmnet with OS bookworm
[13:38:14] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Improve how we generate DNS entries from Netbox - https://phabricator.wikimedia.org/T362985#10379654 (10cmooney)
[13:39:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1023 to wikikube-worker1036
[13:39:10] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Improve how we generate DNS entries from Netbox - https://phabricator.wikimedia.org/T362985#10379658 (10cmooney)
[13:39:24] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:39:42] <wikibugs>	 SRE, Infrastructure-Foundations: Improve how we generate DNS entries from Netbox - https://phabricator.wikimedia.org/T362985#10379659 (cmooney) >>! In T362985#10379599, @BBlack wrote: > Also, probably the way to standardize this for sanity (avoiding ORIGIN mistakes on both ends) is to follow some simple...
[13:39:53] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:39:53] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:41:14] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2016,2171-2172].codfw.wmnet
[13:41:16] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2016,2171-2172].codfw.wmnet
[13:42:13] <icinga-wm>	 PROBLEM - BGP status on lsw1-c5-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:42:53] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100442|Stats: Move StatsFactory flush into emitBufferedStats (T380609)]], [[gerrit:1100434|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100430|Fix handling of 'last-checked' as 'never' in scanFilesInScanTable.php (T355169)]], [[gerrit:1100449|Revert "Stats: Move StatsFactory flush into emitBufferedStats"]] (duration: 14m 38s)
[13:42:57] <stashbot>	 T380609: Maintenance scripts do not emit StatsLib metrics - https://phabricator.wikimedia.org/T380609
[13:42:57] <stashbot>	 T355169: Run scanFilesInScanTable.php automatically on WMF wikis - https://phabricator.wikimedia.org/T355169
[13:43:10] <Dreamy_Jazz>	 abijeet: You here for the window?
[13:43:11] <wikibugs>	 (CR) Filippo Giunchedi: alerts: enable paging mariadb through prometheus (4 comments) [alerts] - https://gerrit.wikimedia.org/r/1100042 (https://phabricator.wikimedia.org/T381276) (owner: Arnaudb)
[13:43:57] <Dreamy_Jazz>	 I'm going to not deploy the core backport, as it appears to be broken on production.
[13:45:21] <wikibugs>	 (CR) Dreamy Jazz: [C:+2] Ensure IP reveal buttons are not shown on Special:MassGlobalBlock [mediawiki-config] - https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: Tchanders)
[13:45:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-lab1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:46:05] <wikibugs>	 (Merged) jenkins-bot: Ensure IP reveal buttons are not shown on Special:MassGlobalBlock [mediawiki-config] - https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: Tchanders)
[13:47:01] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1100150|Ensure IP reveal buttons are not shown on Special:MassGlobalBlock (T124607)]]
[13:47:08] <stashbot>	 T124607: Create a special page for mass global (un)block - https://phabricator.wikimedia.org/T124607
[13:47:24] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1023 to wikikube-worker1036 - jelto@cumin1002"
[13:47:34] <Dreamy_Jazz>	 jouncebot: nowandnext
[13:47:34] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 12 minute(s)
[13:47:34] <jouncebot>	 In 0 hour(s) and 12 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400)
[13:47:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1023 to wikikube-worker1036 - jelto@cumin1002"
[13:47:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:47:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1036
[13:47:56] <Dreamy_Jazz>	 Not the window yet. Got myself confused as to when it started.
[13:48:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1036
[13:48:59] <wikibugs>	 (CR) Xcollazo: [C:+1] "Cursory look from my side." [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[13:49:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1023 to wikikube-worker1036
[13:49:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubernetes1024:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes1024 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[13:50:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from kubernetes1024 to wikikube-worker1037
[13:50:09] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[2445-2447].codfw.wmnet
[13:50:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:51:56] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[2445-2447].codfw.wmnet
[13:52:46] <wikibugs>	 (PS2) Slyngshede: Password reset: use passlib for hashing [software/bitu] - https://gerrit.wikimedia.org/r/1100451 (https://phabricator.wikimedia.org/T381327)
[13:53:10] <logmsgbot>	 !log dreamyjazz@deploy2002 tchanders, dreamyjazz: Backport for [[gerrit:1100150|Ensure IP reveal buttons are not shown on Special:MassGlobalBlock (T124607)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:53:14] <stashbot>	 T124607: Create a special page for mass global (un)block - https://phabricator.wikimedia.org/T124607
[13:53:34] <logmsgbot>	 !log dreamyjazz@deploy2002 tchanders, dreamyjazz: Continuing with sync
[13:54:14] <wikibugs>	 (PS4) Alexandros Kosiaris: gateway-check: Make indentation consistent [puppet] - https://gerrit.wikimedia.org/r/1100111
[13:54:14] <wikibugs>	 (PS14) Alexandros Kosiaris: gateway-check: Support (and use) per wiki rules [puppet] - https://gerrit.wikimedia.org/r/1100112 (https://phabricator.wikimedia.org/T374683)
[13:54:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1024 to wikikube-worker1037 - jelto@cumin1002"
[13:54:18] <wikibugs>	 (CR) Alexandros Kosiaris: gateway-check: Support (and use) per wiki rules (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1100112 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[13:54:20] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [software/bitu] - https://gerrit.wikimedia.org/r/1100451 (https://phabricator.wikimedia.org/T381327) (owner: Slyngshede)
[13:54:28] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:54:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1024 to wikikube-worker1037 - jelto@cumin1002"
[13:54:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:54:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1037
[13:54:55] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2015.codfw.wmnet with reason: host reimage
[13:55:43] <wikibugs>	 (PS1) Muehlenhoff: maps: Allow disabling the installation of kartotherian [puppet] - https://gerrit.wikimedia.org/r/1100456
[13:55:43] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:55:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1037
[13:55:51] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:55:59] <wikibugs>	 (CR) Slyngshede: [C:+2] Password reset: use passlib for hashing [software/bitu] - https://gerrit.wikimedia.org/r/1100451 (https://phabricator.wikimedia.org/T381327) (owner: Slyngshede)
[13:56:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1024 to wikikube-worker1037
[13:56:31] <icinga-wm>	 PROBLEM - Host mw2445 is DOWN: PING CRITICAL - Packet loss = 100%
[13:57:17] <jayme>	 this is me
[13:57:59] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 4:00:00 on mw[2445-2447].codfw.wmnet with reason: reimage
[13:58:13] <wikibugs>	 (CR) Alexandros Kosiaris: [C:-1] kafka-main: Replace kafka-main1004 with kafka-main1009 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100447 (https://phabricator.wikimedia.org/T363214) (owner: Effie Mouzeli)
[13:58:18] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[2445-2447].codfw.wmnet with reason: reimage
[13:58:24] <icinga-wm>	 RECOVERY - Host mw2445 is UP: PING OK - Packet loss = 0%, RTA = 33.84 ms
[13:58:24] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2015.codfw.wmnet with reason: host reimage
[13:58:29] <wikibugs>	 (Merged) jenkins-bot: Password reset: use passlib for hashing [software/bitu] - https://gerrit.wikimedia.org/r/1100451 (https://phabricator.wikimedia.org/T381327) (owner: Slyngshede)
[13:59:00] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10379753 (Jclark-ctr)
[13:59:17] <wikibugs>	 (PS1) Giuseppe Lavagetto: profile::mariadb::core: alert on all replicas [puppet] - https://gerrit.wikimedia.org/r/1100457 (https://phabricator.wikimedia.org/T381276)
[13:59:34] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10379756 (Jclark-ctr)
[14:00:10] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100150|Ensure IP reveal buttons are not shown on Special:MassGlobalBlock (T124607)]] (duration: 13m 08s)
[14:00:14] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: Time to do the UTC afternoon backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400).
[14:00:15] <jouncebot>	 abijeet and Dreamy_Jazz: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:16] <stashbot>	 T124607: Create a special page for mass global (un)block - https://phabricator.wikimedia.org/T124607
[14:00:22] <Dreamy_Jazz>	 \o
[14:00:35] <Dreamy_Jazz>	 My backporting is now done
[14:00:38] <abijeet>	 o/
[14:00:53] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1043.eqiad.wmnet with OS bookworm
[14:00:54] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1044.eqiad.wmnet with OS bookworm
[14:00:56] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1045.eqiad.wmnet with OS bookworm
[14:00:59] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10379763 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1043.eqiad.wmnet with OS bookworm
[14:01:01] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10379764 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1044.eqiad.wmnet with OS bookworm
[14:01:05] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10379765 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1045.eqiad.wmnet with OS bookworm
[14:01:09] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2445.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:01:18] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2447.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:01:24] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.provision for host mw2446.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:01:37] <Dreamy_Jazz>	 I want to go for lunch. Would another deployer be able to deploy the remaining change?
[14:01:37] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1100456 (owner: Muehlenhoff)
[14:01:50] <wikibugs>	 (CR) Marostegui: [C:+1] profile::mariadb::core: alert on all replicas [puppet] - https://gerrit.wikimedia.org/r/1100457 (https://phabricator.wikimedia.org/T381276) (owner: Giuseppe Lavagetto)
[14:01:54] <wikibugs>	 (PS2) Effie Mouzeli: kafka-main: Replace kafka-main1004 with kafka-main1009 [puppet] - https://gerrit.wikimedia.org/r/1100447 (https://phabricator.wikimedia.org/T363214)
[14:02:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:02:54] <Lucas_WMDE>	 o/
[14:02:57] <Lucas_WMDE>	 I think I can deploy in a few minutes
[14:04:07] <abijeet>	 Cool, thanks!
[14:04:44] <abijeet>	 Just throwing it out there, I didn't get the sticker for breaking legalteam wiki
[14:04:55] <abijeet>	 I helped fix it too
[14:05:09] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1036.eqiad.wmnet wikikube-worker1037.eqiad.wmnet on all recursors
[14:05:13] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1036.eqiad.wmnet wikikube-worker1037.eqiad.wmnet on all recursors
[14:05:37] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] "Thanks both!" [puppet] - https://gerrit.wikimedia.org/r/1100112 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[14:05:45] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] gateway-check: Make indentation consistent [puppet] - https://gerrit.wikimedia.org/r/1100111 (owner: Alexandros Kosiaris)
[14:06:36] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:06:36] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:06:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-logging-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[14:07:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1036.eqiad.wmnet with OS bookworm
[14:07:37] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:07:46] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:07:48] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 226, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:07:52] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 310, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:08:25] <wikibugs>	 (CR) CI reject: [V:-1] Translate: Enable message group subscription for 6 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:08:40] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:09:23] <wikibugs>	 (CR) Abijeet Patro: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:09:51] <Lucas_WMDE>	 o_O
[14:10:00] <Lucas_WMDE>	 oh. just another DNS failure
[14:10:09] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "probably due to T374830" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:10:11] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] kafka-main: Replace kafka-main1004 with kafka-main1009 [puppet] - https://gerrit.wikimedia.org/r/1100447 (https://phabricator.wikimedia.org/T363214) (owner: Effie Mouzeli)
[14:10:13] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:10:18] <Lucas_WMDE>	 took me a second to see it
[14:10:53] <wikibugs>	 (Merged) jenkins-bot: Translate: Enable message group subscription for 6 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1100352 (https://phabricator.wikimedia.org/T372386) (owner: Abijeet Patro)
[14:11:20] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1100352|Translate: Enable message group subscription for 6 wikis (T372386)]]
[14:11:24] <stashbot>	 T372386: Enable message group subscription feature on Wikimedia wikis - https://phabricator.wikimedia.org/T372386
[14:13:47] <wikibugs>	 (CR) Alexandros Kosiaris: gateway-check: Support (and use) per wiki rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1100112 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[14:17:22] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 abi, lucaswerkmeister-wmde: Backport for [[gerrit:1100352|Translate: Enable message group subscription for 6 wikis (T372386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:17:25] <stashbot>	 T372386: Enable message group subscription feature on Wikimedia wikis - https://phabricator.wikimedia.org/T372386
[14:17:29] <Lucas_WMDE>	 abijeet: please test :)
[14:18:08] <abijeet>	 Lucas_WMDE, on it
[14:18:20] <icinga-wm>	 RECOVERY - BGP status on lsw1-c5-codfw.mgmt is OK: BGP OK - up: 8, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:18:27] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2015.codfw.wmnet with OS bookworm
[14:19:05] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
[14:20:05] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Discovery-Search, and 4 others: Q2:rack/setup/install wdqs102[567] - https://phabricator.wikimedia.org/T378030#10379859 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs1025.eqiad.wmnet with OS bullseye
[14:20:40] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.186 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:20:40] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53070 bytes in 0.194 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:20:42] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 08 Feb 2025 11:19:52 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:22:08] <abijeet>	 Lucas_WMDE, Looks OK on my end. Please proceed.
[14:22:56] <Lucas_WMDE>	 ok thanks!
[14:22:58] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 abi, lucaswerkmeister-wmde: Continuing with sync
[14:23:36] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1036.eqiad.wmnet with reason: host reimage
[14:23:46] <wikibugs>	 (CR) Xcollazo: [C:+1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/1100417 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[14:27:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1036.eqiad.wmnet with reason: host reimage
[14:28:00] <Krinkle>	 Dreamy_Jazz: is there a task for the scap problem where its internal maintenance scripts fail due to network being blocked / due to MW now using the network from a script that was previously presumed to be offline (= enabling statslib in maint = the patch)?
[14:28:06] <wikibugs>	 (PS1) Brouberol: mw-dump-rev-content-reconcile-enrich: rename namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322)
[14:28:15] <Krinkle>	 This will break scap again next week if not fixed before then
[14:28:33] <Dreamy_Jazz>	 I tagged the associated task as a train blocker for the next train
[14:28:41] <Dreamy_Jazz>	 I haven't filed a separate task for it.
[14:28:54] <Krinkle>	 Found it, thanks!
[14:29:32] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100352|Translate: Enable message group subscription for 6 wikis (T372386)]] (duration: 18m 12s)
[14:29:35] <stashbot>	 T372386: Enable message group subscription feature on Wikimedia wikis - https://phabricator.wikimedia.org/T372386
[14:30:08] <Lucas_WMDE>	 anything else to deploy?
[14:30:27] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to <wmf group> for <cpetrillo> - https://phabricator.wikimedia.org/T381464#10379896 (andrea.denisse) a:andrea.denisse
[14:31:12] <abijeet>	 Lucas_WMDE, thank you.
[14:31:18] <wikibugs>	 (PS3) Andrew Bogott: Remove ceph references to cloudcephosd100[1-3] [puppet] - https://gerrit.wikimedia.org/r/1098095 (https://phabricator.wikimedia.org/T380893)
[14:31:18] <wikibugs>	 (PS2) Andrew Bogott: Remove refs to cloudcephmon100[1-3] [puppet] - https://gerrit.wikimedia.org/r/1098096 (https://phabricator.wikimedia.org/T380893)
[14:31:18] <Lucas_WMDE>	 yw :)
[14:31:23] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[14:31:24] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1098095 (https://phabricator.wikimedia.org/T380893) (owner: Andrew Bogott)
[14:31:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:36] <wikibugs>	 ops-magru, DC-Ops, Infrastructure-Foundations, Traffic: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10379903 (ssingh) Open→Resolved a:ssingh
[14:32:52] <wikibugs>	 (PS5) Dreamy Jazz: Update MediaModeration module to run scans automatically [puppet] - https://gerrit.wikimedia.org/r/1100427 (https://phabricator.wikimedia.org/T355169)
[14:34:24] <Krinkle>	 Dreamy_Jazz: do you have access to a stack trace from that mergeMessageFileList.php warning?
[14:34:36] <Dreamy_Jazz>	 There was no stack trace printed to the console.
[14:34:49] <Dreamy_Jazz>	 Is there somewhere else it could have been printed to?
[14:35:06] <Krinkle>	 I see, yeah, there wouldn't be if it's plain php-cli stderr. MediaWiki obtains a trace when reporting them to Logstash.
[14:35:28] <Krinkle>	 But given it's run in an enforced-offline context while building the docker image, that would have been lost and/or disabled by configuration.
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:38:27] <wikibugs>	 (CR) Gmodena: mw-dump-rev-content-reconcile-enrich: rename namespaces (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322) (owner: Brouberol)
[14:40:01] <wikibugs>	 (PS6) Fabfur: cache:haproxy: longer capture buffers for relevant headers [puppet] - https://gerrit.wikimedia.org/r/1100113 (https://phabricator.wikimedia.org/T370668)
[14:40:16] <wikibugs>	 (CR) Brouberol: mw-dump-rev-content-reconcile-enrich: rename namespaces (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322) (owner: Brouberol)
[14:40:41] <wikibugs>	 (CR) Andrew Bogott: [C:+2] Remove ceph references to cloudcephosd100[1-3] [puppet] - https://gerrit.wikimedia.org/r/1098095 (https://phabricator.wikimedia.org/T380893) (owner: Andrew Bogott)
[14:40:55] <wikibugs>	 (CR) Muehlenhoff: "Good catch! Some questions and comments inline" [puppet] - https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490) (owner: JHathaway)
[14:40:58] <wikibugs>	 (PS1) Hnowlan: php8.1: rebuild to pick up mercurius images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1100462 (https://phabricator.wikimedia.org/T371701)
[14:42:56] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] php8.1: rebuild to pick up mercurius images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1100462 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[14:43:16] <wikibugs>	 (CR) Kamila Součková: [C:+1] php8.1: rebuild to pick up mercurius images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1100462 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[14:44:03] <wikibugs>	 (CR) Hnowlan: [C:+1] Update various kafka-main connection strings for kafka-main1009 Replacing kafka-main1004 [deployment-charts] - https://gerrit.wikimedia.org/r/1100452 (https://phabricator.wikimedia.org/T363214) (owner: Effie Mouzeli)
[14:44:31] <TheresNoTime>	 jouncebot: nowandnext
[14:44:31] <jouncebot>	 For the next 0 hour(s) and 15 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400)
[14:44:31] <jouncebot>	 In 0 hour(s) and 15 minute(s): Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1500)
[14:44:36] <wikibugs>	 (CR) Hnowlan: [V:+2 C:+2] php8.1: rebuild to pick up mercurius images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1100462 (https://phabricator.wikimedia.org/T371701) (owner: Hnowlan)
[14:45:25] <wikibugs>	 (CR) Dreamy Jazz: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1100427 (https://phabricator.wikimedia.org/T355169) (owner: Dreamy Jazz)
[14:46:06] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1025.eqiad.wmnet with reason: host reimage
[14:46:26] <TheresNoTime>	 hihi (cc Lucas_WMDE) — I intend to deploy 1090502, any issues?
[14:46:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1036.eqiad.wmnet with OS bookworm
[14:46:51] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T371742)', diff saved to https://phabricator.wikimedia.org/P71540 and previous config saved to /var/cache/conftool/dbconfig/20241204-144651-ladsgroup.json
[14:46:54] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[14:46:55] <wikibugs>	 (CR) Brouberol: mw-dump-rev-content-reconcile-enrich: rename namespaces (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322) (owner: Brouberol)
[14:47:00] <wikibugs>	 (CR) Brouberol: dse-k8s-services: rename mw-dumps helmfiles. (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322) (owner: Gmodena)
[14:47:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1037.eqiad.wmnet with OS bookworm
[14:47:29] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1090502 (https://phabricator.wikimedia.org/T373634) (owner: Srishakatux)
[14:48:12] <wikibugs>	 (Merged) jenkins-bot: Add new namespaces to hsb wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1090502 (https://phabricator.wikimedia.org/T373634) (owner: Srishakatux)
[14:48:45] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1090502|Add new namespaces to hsb wiktionary (T373634)]]
[14:48:48] <stashbot>	 T373634: Add new namespaces to hsb.wiktionary.org - https://phabricator.wikimedia.org/T373634
[14:49:54] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1025.eqiad.wmnet with reason: host reimage
[14:50:11] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2445.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:50:17] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2447.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:50:20] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2446.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[14:51:25] <logmsgbot>	 !log samtar@deploy2002 samtar, srishakatux: Backport for [[gerrit:1090502|Add new namespaces to hsb wiktionary (T373634)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:51:34] <wikibugs>	 (03CR) 10Muehlenhoff: php: Allow provisioning MediaWiki with PHP 8.1 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1085471 (https://phabricator.wikimedia.org/T378752) (owner: 10BryanDavis)
[14:51:46] <Lucas_WMDE>	 TheresNoTime: I saw it late but feel free to go ahead :)
[14:52:20] <logmsgbot>	 !log samtar@deploy2002 samtar, srishakatux: Continuing with sync
[14:52:33] <Lucas_WMDE>	 (also now I’m imagining us sending around “intend to deploy” announcement emails like browsers do for “intend to ship” lol)
[14:53:35] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] "LGTM, but this will need a manual rebase." [puppet] - 10https://gerrit.wikimedia.org/r/1084247 (owner: 10Scott French)
[14:54:57] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2015.codfw.wmnet
[14:54:59] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2015.codfw.wmnet
[14:56:28] <hnowlan>	 jouncebot: nowandnext
[14:56:28] <jouncebot>	 For the next 0 hour(s) and 3 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1400)
[14:56:28] <jouncebot>	 In 0 hour(s) and 3 minute(s): Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1500)
[14:56:53] <wikibugs>	 (03PS1) 10Bking: cumin: add aliases for net-new wdqs services [puppet] - 10https://gerrit.wikimedia.org/r/1100465 (https://phabricator.wikimedia.org/T376150)
[14:57:02] * TheresNoTime is almost done deploying
[14:59:01] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1090502|Add new namespaces to hsb wiktionary (T373634)]] (duration: 10m 16s)
[14:59:04] <stashbot>	 T373634: Add new namespaces to hsb.wiktionary.org - https://phabricator.wikimedia.org/T373634
[15:00:04] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1500)
[15:00:25] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100465 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[15:00:59] <wikibugs>	 (03PS7) 10Fabfur: cache:haproxy: longer capture buffers for relevant headers [puppet] - 10https://gerrit.wikimedia.org/r/1100113 (https://phabricator.wikimedia.org/T370668)
[15:01:20] <TheresNoTime>	 !log '[samtar@deploy2002 ~]$ mwscript-k8s --comment="T373634" -f -- namespaceDupes.php --wiki hsbwiktionary --fix' for T373634
[15:01:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:01:47] * TheresNoTime done
[15:01:55] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s4 on db1245 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[15:01:58] <Lucas_WMDE>	 custom shell prompt spotted 👀
[15:01:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P71541 and previous config saved to /var/cache/conftool/dbconfig/20241204-150157-ladsgroup.json
[15:02:12] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] cumin: add aliases for net-new wdqs services [puppet] - 10https://gerrit.wikimedia.org/r/1100465 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[15:02:50] <wikibugs>	 (03CR) 10Bking: [C:03+2] cumin: add aliases for net-new wdqs services [puppet] - 10https://gerrit.wikimedia.org/r/1100465 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[15:03:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1037.eqiad.wmnet with reason: host reimage
[15:03:47] <wikibugs>	 (03PS1) 10JMeybohm: Rename mw244[5-7] to wikikube-worker217[3-5] [puppet] - 10https://gerrit.wikimedia.org/r/1100467 (https://phabricator.wikimedia.org/T377877)
[15:06:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1037.eqiad.wmnet with reason: host reimage
[15:06:45] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] cache:haproxy: longer capture buffers for relevant headers [puppet] - 10https://gerrit.wikimedia.org/r/1100113 (https://phabricator.wikimedia.org/T370668) (owner: 10Fabfur)
[15:06:51] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
[15:07:30] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm!" [puppet] - 10https://gerrit.wikimedia.org/r/1100467 (https://phabricator.wikimedia.org/T377877) (owner: 10JMeybohm)
[15:08:23] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Rename mw244[5-7] to wikikube-worker217[3-5] [puppet] - 10https://gerrit.wikimedia.org/r/1100467 (https://phabricator.wikimedia.org/T377877) (owner: 10JMeybohm)
[15:08:59] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] cache:haproxy: longer capture buffers for relevant headers [puppet] - 10https://gerrit.wikimedia.org/r/1100113 (https://phabricator.wikimedia.org/T370668) (owner: 10Fabfur)
[15:09:10] <wikibugs>	 (03CR) 10BBlack: [C:03+1] Update geo-maps file's US section [dns] - 10https://gerrit.wikimedia.org/r/1097521 (owner: 10CDobbins)
[15:10:09] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2445 to wikikube-worker2173
[15:10:20] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[15:10:23] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2446 to wikikube-worker2174
[15:10:28] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.rename from mw2447 to wikikube-worker2175
[15:13:04] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:13:05] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:14:39] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] kafka-main: Replace kafka-main1004 with kafka-main1009 [puppet] - 10https://gerrit.wikimedia.org/r/1100447 (https://phabricator.wikimedia.org/T363214) (owner: 10Effie Mouzeli)
[15:16:06] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[15:17:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P71542 and previous config saved to /var/cache/conftool/dbconfig/20241204-151705-ladsgroup.json
[15:17:21] <hnowlan>	 I'm gonna do a quick scap sync-world to rebuild the 8.1 production images if there are no objections
[15:18:00] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2445 to wikikube-worker2173 - jayme@cumin2002"
[15:18:28] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2445 to wikikube-worker2173 - jayme@cumin2002"
[15:18:28] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:18:29] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2173
[15:19:13] <wikibugs>	 (03PS1) 10CDanis: tunnelencabulator: add upload-lb support [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100470
[15:20:22] <wikibugs>	 (03PS2) 10CDanis: tunnelencabulator: add upload-lb support [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100470
[15:20:23] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+1] tunnelencabulator: add upload-lb support [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100470 (owner: 10CDanis)
[15:20:27] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2173
[15:20:48] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad
[15:20:52] <wikibugs>	 (03PS1) 10Marostegui: Revert "db1167: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1100471
[15:21:04] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+1] tunnelencabulator: add upload-lb support [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100470 (owner: 10CDanis)
[15:21:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P71543 and previous config saved to /var/cache/conftool/dbconfig/20241204-152105-root.json
[15:21:08] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2445 to wikikube-worker2173
[15:21:25] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1045.eqiad.wmnet with OS bookworm
[15:21:25] <wikibugs>	 (03CR) 10CDanis: [V:03+2 C:03+2] tunnelencabulator: add upload-lb support [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100470 (owner: 10CDanis)
[15:21:31] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10380106 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1045.eqiad.wmnet with OS bookworm ex...
[15:21:34] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2446 to wikikube-worker2174 - jayme@cumin2002"
[15:21:39] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2446 to wikikube-worker2174 - jayme@cumin2002"
[15:21:40] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:21:41] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2174
[15:21:48] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert "db1167: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1100471 (owner: 10Marostegui)
[15:22:02] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[15:22:06] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2174
[15:22:47] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2446 to wikikube-worker2174
[15:23:24] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10380111 (10JMeybohm)
[15:24:08] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381478#10380114 (10JMeybohm)
[15:24:26] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:24:27] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2175
[15:24:42] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2175
[15:24:45] <logmsgbot>	 !log hnowlan@deploy2002 Started scap sync-world: Rebuild and deploy to pick up new php8.1 base
[15:25:22] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2447 to wikikube-worker2175
[15:26:19] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2173.codfw.wmnet with OS bookworm
[15:26:30] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2173
[15:26:43] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2173.codfw.wmnet wikikube-worker2174.codfw.wmnet wikikube-worker2175.codfw.wmnet on all recursors
[15:26:46] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2173.codfw.wmnet wikikube-worker2174.codfw.wmnet wikikube-worker2175.codfw.wmnet on all recursors
[15:26:47] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[15:26:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1037.eqiad.wmnet with OS bookworm
[15:27:37] <jelto>	 !log homer 'cr*eqiad*' commit 'T377876'
[15:27:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:40] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[15:27:40] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2174.codfw.wmnet with OS bookworm
[15:27:45] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+2] profile::mariadb::core: alert on all replicas [puppet] - 10https://gerrit.wikimedia.org/r/1100457 (https://phabricator.wikimedia.org/T381276) (owner: 10Giuseppe Lavagetto)
[15:28:26] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2175.codfw.wmnet with OS bookworm
[15:30:25] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2173 - jayme@cumin2002"
[15:30:31] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2173 - jayme@cumin2002"
[15:30:31] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:30:31] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2173.codfw.wmnet 78.48.192.10.in-addr.arpa 8.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:30:35] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2173.codfw.wmnet 78.48.192.10.in-addr.arpa 8.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:30:37] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2173
[15:30:49] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad
[15:31:26] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2173
[15:31:27] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2173
[15:31:39] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2174
[15:31:47] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[15:32:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T371742)', diff saved to https://phabricator.wikimedia.org/P71544 and previous config saved to /var/cache/conftool/dbconfig/20241204-153212-ladsgroup.json
[15:32:14] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[15:32:15] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[15:32:27] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[15:32:35] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1177 (T371742)', diff saved to https://phabricator.wikimedia.org/P71545 and previous config saved to /var/cache/conftool/dbconfig/20241204-153234-ladsgroup.json
[15:36:01] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2174 - jayme@cumin2002"
[15:36:07] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2174 - jayme@cumin2002"
[15:36:07] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:36:08] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2174.codfw.wmnet 79.48.192.10.in-addr.arpa 9.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:36:11] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2174.codfw.wmnet 79.48.192.10.in-addr.arpa 9.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:36:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P71546 and previous config saved to /var/cache/conftool/dbconfig/20241204-153611-root.json
[15:36:12] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2174
[15:36:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 24.59% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:36:39] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2174
[15:36:39] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2174
[15:37:15] <jinxer-wm>	 FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:37:26] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2175
[15:39:18] <wikibugs>	 (03PS5) 10Gmodena: dse-k8s-services: rename mw-dumps helmfiles. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100420 (https://phabricator.wikimedia.org/T381322)
[15:39:29] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.netbox
[15:41:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 24.59% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:41:33] <wikibugs>	 (03CR) 10Volans: "post-merge -1, doesn't work" [puppet] - 10https://gerrit.wikimedia.org/r/1100465 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[15:41:33] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1036-1037].eqiad.wmnet
[15:41:35] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1036-1037].eqiad.wmnet
[15:42:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:43:18] <wikibugs>	 (03Abandoned) 10Ahmon Dancy: bootstrap-scap-target.sh: Temp hard code scap version [puppet] - 10https://gerrit.wikimedia.org/r/1100204 (https://phabricator.wikimedia.org/T380772) (owner: 10Ahmon Dancy)
[15:45:53] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2175 - jayme@cumin2002"
[15:45:59] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2175 - jayme@cumin2002"
[15:45:59] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:45:59] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2175.codfw.wmnet 80.48.192.10.in-addr.arpa 0.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:46:03] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2175.codfw.wmnet 80.48.192.10.in-addr.arpa 0.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:46:04] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2175
[15:47:46] <wikibugs>	 10ops-eqiad, 06collaboration-services, 06DC-Ops, 10Prod-Kubernetes, and 2 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381504 (10Jelto) 03NEW
[15:49:20] <wikibugs>	 (03CR) 10JHathaway: puppet 7: fix facter.conf location (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[15:50:12] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on mw2440 - https://phabricator.wikimedia.org/T381469#10380226 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm T 381478 - renamed server to wikikube-worker2015 T 358489 - probably false alert from this ticket.
[15:50:33] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2173.codfw.wmnet with reason: host reimage
[15:50:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on mw2444 - https://phabricator.wikimedia.org/T381472#10380230 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm T 381478 - renamed server to wikikube-worker2172 T 358489 - probably false alert from this ticket.
[15:51:04] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2175
[15:51:04] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2175
[15:51:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P71548 and previous config saved to /var/cache/conftool/dbconfig/20241204-155116-root.json
[15:51:26] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] "Looks correct.  One optional suggestion." [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772) (owner: 10Jaime Nuche)
[15:52:03] <wikibugs>	 (03PS2) 10Pcoombe: CSP for banner preview: allow remind me later SMS host [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1093401 (https://phabricator.wikimedia.org/T380232) (owner: 10Greg Grossmeier)
[15:54:20] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2173.codfw.wmnet with reason: host reimage
[15:54:48] <wikibugs>	 (03PS1) 10CDanis: tunnelencabulator: bump version [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100475
[15:54:57] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Not sure why in your PCC run cp7001 wasn't checked since you clearly specify it but" [puppet] - 10https://gerrit.wikimedia.org/r/1090814 (https://phabricator.wikimedia.org/T378578) (owner: 10Fabfur)
[15:54:57] <wikibugs>	 (03CR) 10CDanis: [V:03+2 C:03+2] tunnelencabulator: bump version [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/1100475 (owner: 10CDanis)
[15:55:33] <wikibugs>	 (03PS1) 10Muehlenhoff: yarn: Restrict access to Envoy port [puppet] - 10https://gerrit.wikimedia.org/r/1100476
[15:55:56] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2174.codfw.wmnet with reason: host reimage
[15:58:16] <wikibugs>	 (03CR) 10Bking: [C:03+2] opensearch: Add resource to define cross-cluster settings [puppet] - 10https://gerrit.wikimedia.org/r/1091326 (https://phabricator.wikimedia.org/T380752) (owner: 10Ebernhardson)
[15:58:24] <wikibugs>	 (03PS4) 10Jaime Nuche: bootstrap-scap-target.sh: handle multiple wheel versions [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772)
[15:58:25] <wikibugs>	 (03CR) 10Bking: opensearch: Add resource to define cross-cluster settings [puppet] - 10https://gerrit.wikimedia.org/r/1091326 (https://phabricator.wikimedia.org/T380752) (owner: 10Ebernhardson)
[15:58:32] <wikibugs>	 06SRE, 10Dumps 2.0, 10Dumps-Generation: Dumps generation cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#10380240 (10Marostegui) @xcollazo what's the status of this? We keep seeing issues with dumps. We just got the enwiki dumps replica lagged again while dumps were r...
[15:58:33] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100476 (owner: 10Muehlenhoff)
[15:58:56] <wikibugs>	 (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1090814 (https://phabricator.wikimedia.org/T378578) (owner: 10Fabfur)
[15:59:05] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772) (owner: 10Jaime Nuche)
[15:59:32] <wikibugs>	 (03CR) 10Gmodena: mw-dump-rev-content-reconcile-enrich: rename namespaces (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322) (owner: 10Brouberol)
[16:01:33] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2174.codfw.wmnet with reason: host reimage
[16:04:34] <wikibugs>	 (03CR) 10Jaime Nuche: bootstrap-scap-target.sh: handle multiple wheel versions (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772) (owner: 10Jaime Nuche)
[16:06:16] <wikibugs>	 (03CR) 10CDanis: [C:03+1] New ferm rule to permit HDFS data flows and mark as low-prio for qos [puppet] - 10https://gerrit.wikimedia.org/r/1100166 (https://phabricator.wikimedia.org/T381389) (owner: 10Cathal Mooney)
[16:06:16] <logmsgbot>	 !log hnowlan@deploy2002 Finished scap sync-world: Rebuild and deploy to pick up new php8.1 base (duration: 42m 17s)
[16:06:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P71549 and previous config saved to /var/cache/conftool/dbconfig/20241204-160622-root.json
[16:06:31] <wikibugs>	 (03CR) 10Jaime Nuche: "@cgoubert@wikimedia.org We need this change to fix bootstrapping of new Scap hosts. Could you help with merging? :)" [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772) (owner: 10Jaime Nuche)
[16:06:39] <wikibugs>	 (03CR) 10CDanis: [C:03+1] P:idp enable JMX exporter [puppet] - 10https://gerrit.wikimedia.org/r/1098023 (https://phabricator.wikimedia.org/T380402) (owner: 10Slyngshede)
[16:07:22] <wikibugs>	 (03CR) 10Bking: "Elasticsearch's docs warn that "there is no validation to block unsupported settings from the keystore and they can cause Elasticsearch to" [puppet] - 10https://gerrit.wikimedia.org/r/1091325 (https://phabricator.wikimedia.org/T380752) (owner: 10Ebernhardson)
[16:07:58] <wikibugs>	 (03CR) 10Bking: [C:03+1] opensearch: Add resource to define cross-cluster settings [puppet] - 10https://gerrit.wikimedia.org/r/1091326 (https://phabricator.wikimedia.org/T380752) (owner: 10Ebernhardson)
[16:08:06] <wikibugs>	 (03CR) 10SBassett: [C:03+1] "This incorporates the new connect-src directive limitation within the policy, which was suggested by acooper." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1093401 (https://phabricator.wikimedia.org/T380232) (owner: 10Greg Grossmeier)
[16:08:12] <wikibugs>	 (03CR) 10CDanis: [C:03+2] chart-renderer: scrape metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100202 (https://phabricator.wikimedia.org/T379687) (owner: 10CDanis)
[16:08:44] <claime>	 jnuche: is the change tested and ok to merge?
[16:09:14] <jnuche>	 claime: yeah, I tested it on our scap3-dev env
[16:09:21] <claime>	 ok cool
[16:09:25] <wikibugs>	 (03Merged) 10jenkins-bot: chart-renderer: scrape metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100202 (https://phabricator.wikimedia.org/T379687) (owner: 10CDanis)
[16:09:25] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] bootstrap-scap-target.sh: handle multiple wheel versions [puppet] - 10https://gerrit.wikimedia.org/r/1100407 (https://phabricator.wikimedia.org/T380772) (owner: 10Jaime Nuche)
[16:09:26] <wikibugs>	 (03PS1) 10CDanis: bump chart-renderer chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100479
[16:09:38] <wikibugs>	 (03CR) 10CDanis: [C:03+2] bump chart-renderer chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100479 (owner: 10CDanis)
[16:09:57] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2175.codfw.wmnet with reason: host reimage
[16:09:58] <jnuche>	 claime: thx!
[16:10:41] <claime>	 jnuche: merged, do you need me to run puppet on a specific server?
[16:10:42] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:10:48] <wikibugs>	 (03Merged) 10jenkins-bot: bump chart-renderer chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100479 (owner: 10CDanis)
[16:11:00] <joelyrookewmde>	 Hi all, I'm planning to run a maintenance script to add wikidata support for idwikivoyage as per T381083. Is that disruptive to your current activities/ should I do it another time?
[16:11:00] <stashbot>	 T381083: Add Wikidata support for idwikivoyage - https://phabricator.wikimedia.org/T381083
[16:11:53] <jnuche>	 claime: would be good to verify on `wdqs1027.eqiad.wmnet`
[16:12:08] <wikibugs>	 (03CR) 10Brouberol: mw-dump-rev-content-reconcile-enrich: rename namespaces (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322) (owner: 10Brouberol)
[16:12:26] <inflatador>	 claime jnuche what's up with wdqs1027? LMK if I can help
[16:12:36] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
[16:12:38] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1025.eqiad.wmnet with OS bullseye
[16:12:39] <claime>	 jnuche: puppet running
[16:12:52] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Discovery-Search, and 4 others: Q2:rack/setup/install wdqs102[567] - https://phabricator.wikimedia.org/T378030#10380301 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs1025.eqiad.wmnet with OS bullseye completed: - wdqs1025...
[16:13:15] <jnuche>	 inflatador: ryankemper ran into an issue with the Scap bootstrapping there last night
[16:13:20] <claime>	 jnuche: puppet run done, what's to be done afterwards?
[16:13:27] <wikibugs>	 (03PS2) 10Brouberol: mw-dump-rev-content-reconcile-enrich: rename namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322)
[16:13:29] <jnuche>	 he worked around it, but now we've merged an actual fix
[16:13:33] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2175.codfw.wmnet with reason: host reimage
[16:13:36] <wikibugs>	 (03CR) 10Brouberol: mw-dump-rev-content-reconcile-enrich: rename namespaces (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100460 (https://phabricator.wikimedia.org/T381322) (owner: 10Brouberol)
[16:13:52] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2173.codfw.wmnet with OS bookworm
[16:14:04] <jnuche>	 claime: if it didn't fail, that should be it :) you can run `scap version` to double-check Scap is healthy on the box
[16:14:17] <inflatador>	 jnuche ACK, I just reimaged wdqs1025 so we'll check it there later today. Probably be a few hrs though
[16:14:17] <claime>	 4.132.0
[16:14:18] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[16:14:21] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381478#10380295 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[16:14:27] <jnuche>	 claime: awesome, ty again!
[16:14:30] <claime>	 It didn't trigger anything, just changed the script, btw
[16:14:47] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[16:14:49] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[16:14:58] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[16:15:00] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[16:15:36] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[16:15:38] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[16:15:52] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[16:15:53] <logmsgbot>	 !log jiji@deploy2002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
[16:16:33] <logmsgbot>	 !log jiji@deploy2002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
[16:16:34] <jnuche>	 claime: hmm, that actually makes sense, is there an easy way to persuade Puppet to rerun the resource associated to the script?
[16:16:34] <logmsgbot>	 !log jiji@deploy2002 helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
[16:16:51] <logmsgbot>	 !log jiji@deploy2002 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
[16:16:52] <logmsgbot>	 !log jiji@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
[16:17:04] <claime>	 jnuche: let me check the code
[16:17:08] <logmsgbot>	 !log jiji@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
[16:17:10] <logmsgbot>	 !log jiji@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[16:17:47] <logmsgbot>	 !log jiji@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[16:17:49] <logmsgbot>	 !log jiji@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
[16:18:06] <wikibugs>	 (03CR) 10Bking: "Does it make sense to export any of these stats as Prometheus metrics?" [puppet] - 10https://gerrit.wikimedia.org/r/1091327 (https://phabricator.wikimedia.org/T380752) (owner: 10Ebernhardson)
[16:18:14] <wikibugs>	 (03CR) 10Fabfur: "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1090814 (https://phabricator.wikimedia.org/T378578) (owner: 10Fabfur)
[16:18:27] <logmsgbot>	 !log jiji@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[16:19:15] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] cumin: add aliases for net-new wdqs services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100465 (https://phabricator.wikimedia.org/T376150) (owner: 10Bking)
[16:19:41] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.remove-downtime for kafka-main1009.eqiad.wmnet
[16:19:42] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main1009.eqiad.wmnet
[16:20:47] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] Update various kafka-main connection strings for kafka-main1009 Replacing kafka-main1004 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100452 (https://phabricator.wikimedia.org/T363214) (owner: 10Effie Mouzeli)
[16:21:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P71550 and previous config saved to /var/cache/conftool/dbconfig/20241204-162127-root.json
[16:21:35] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2174.codfw.wmnet with OS bookworm
[16:21:39] <wikibugs>	 (03PS1) 10Hnowlan: jobqueue: temporarily toggle video job to test mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100481 (https://phabricator.wikimedia.org/T371701)
[16:21:48] <wikibugs>	 (03PS1) 10Clément Goubert: scap: Trigger bootstrap-scap-target.sh on script change [puppet] - 10https://gerrit.wikimedia.org/r/1100482 (https://phabricator.wikimedia.org/T380772)
[16:21:58] <claime>	 jnuche: ^ this should do the trick
[16:22:07] <wikibugs>	 (03CR) 10Andy Cooper: [C:03+1] CSP for banner preview: allow remind me later SMS host [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1093401 (https://phabricator.wikimedia.org/T380232) (owner: 10Greg Grossmeier)
[16:22:22] <wikibugs>	 (03Merged) 10jenkins-bot: Update various kafka-main connection strings for kafka-main1009 Replacing kafka-main1004 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100452 (https://phabricator.wikimedia.org/T363214) (owner: 10Effie Mouzeli)
[16:22:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Schema change
[16:22:48] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Schema change
[16:23:05] <claime>	 this will trigger the exec every time the subscribed resource (the script) changes
[16:24:21] <claime>	 jnuche: the subscribe on the file resource for the symlink is maybe unnecessary
[16:25:04] <jnuche>	 claime: hmm, not sure if we want to add that behavior, merging a faulty bootstrap script in the future could wipe out Scap from all targets
[16:25:16] <jnuche>	 instead of just reimaged machines
[16:25:22] <wikibugs>	 (03PS4) 10JHathaway: puppet 7: fix facter.conf location [puppet] - 10https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490)
[16:25:52] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:26:20] <claime>	 jnuche: then there's not really a way to do it except running the script manually
[16:26:44] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:26:53] <jnuche>	 claime: doing that on a different box as we speak
[16:26:58] <claime>	 ack
[16:27:12] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.decommission for hosts cloudcephmon1001.eqiad.wmnet
[16:27:27] <wikibugs>	 (03Abandoned) 10Clément Goubert: scap: Trigger bootstrap-scap-target.sh on script change [puppet] - 10https://gerrit.wikimedia.org/r/1100482 (https://phabricator.wikimedia.org/T380772) (owner: 10Clément Goubert)
[16:27:38] <wikibugs>	 (03PS1) 10Kamila Součková: Rename mw149[1-6] to wikikube-worker10[38-42] [puppet] - 10https://gerrit.wikimedia.org/r/1100483 (https://phabricator.wikimedia.org/T377876)
[16:28:34] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1093401 (https://phabricator.wikimedia.org/T380232) (owner: 10Greg Grossmeier)
[16:29:01] <jnuche>	 claime: running `sudo -su scap /usr/local/bin/bootstrap-scap-target.sh deployment /var/lib/scap` after the script updated on the machine worked :)
[16:29:13] <claime>	 cool :)
[16:29:13] <jnuche>	 jnuche@releases2003:~$ scap version
[16:29:13] <jnuche>	 4.132.0
[16:29:24] <jnuche>	 claime: I think we're good, thanks a lot again
[16:29:27] <claime>	 np
[16:32:44] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2175.codfw.wmnet with OS bookworm
[16:32:54] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
[16:33:17] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
[16:33:21] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
[16:33:21] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet 7: fix facter.conf location [puppet] - 10https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:33:38] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
[16:33:39] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
[16:33:48] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/eventgate-main: apply
[16:34:06] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
[16:34:07] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
[16:34:43] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
[16:34:48] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.dns.netbox
[16:34:55] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
[16:35:19] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet 7: fix facter.conf location (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1100161 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:35:21] <logmsgbot>	 !log jayme@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2173-2175].codfw.wmnet
[16:35:29] <logmsgbot>	 !log jayme@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2173-2175].codfw.wmnet
[16:35:50] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 304, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:36:14] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/eventstreams: apply
[16:36:33] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/eventstreams: apply
[16:36:35] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/eventstreams: apply
[16:36:38] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[16:36:49] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[16:36:53] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/changeprop: apply
[16:37:10] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
[16:37:11] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop: apply
[16:37:12] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop: apply
[16:37:44] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
[16:38:34] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephmon1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
[16:38:53] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephmon1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
[16:38:53] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:38:54] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon1001.eqiad.wmnet
[16:39:04] <wikibugs>	 (03PS1) 10Hnowlan: mediawiki: correct mercurius command-args config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100486 (https://phabricator.wikimedia.org/T371701)
[16:40:22] <wikibugs>	 (03CR) 10Scott French: [C:03+1] mediawiki: correct mercurius command-args config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100486 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:41:58] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mediawiki: correct mercurius command-args config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100486 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:43:02] <wikibugs>	 (03CR) 10Scott French: [C:03+1] jobqueue: temporarily toggle video job to test mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100481 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:43:16] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] jobqueue: temporarily toggle video job to test mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100481 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:43:25] <wikibugs>	 (03CR) 10Hnowlan: mediawiki: correct mercurius command-args config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100486 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:44:19] <wikibugs>	 (03Merged) 10jenkins-bot: jobqueue: temporarily toggle video job to test mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100481 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:45:29] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mediawiki: correct mercurius command-args config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100486 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:45:34] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
[16:46:24] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
[16:46:38] <wikibugs>	 (03PS12) 10BryanDavis: php: Allow provisioning MediaWiki with PHP 8.1 [puppet] - 10https://gerrit.wikimedia.org/r/1085471 (https://phabricator.wikimedia.org/T378752)
[16:47:25] <wikibugs>	 (03PS1) 10Cathal Mooney: Increase the number of gnmic worker and writer threads [puppet] - 10https://gerrit.wikimedia.org/r/1100488 (https://phabricator.wikimedia.org/T369384)
[16:47:25] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: correct mercurius command-args config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100486 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[16:48:31] <wikibugs>	 (03CR) 10BryanDavis: php: Allow provisioning MediaWiki with PHP 8.1 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1085471 (https://phabricator.wikimedia.org/T378752) (owner: 10BryanDavis)
[16:49:58] <wikibugs>	 (03PS2) 10Cathal Mooney: Increase the number of gnmic worker and writer threads [puppet] - 10https://gerrit.wikimedia.org/r/1100488 (https://phabricator.wikimedia.org/T369384)
[16:51:06] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.decommission for hosts cloudcephmon1002.eqiad.wmnet
[16:55:10] <wikibugs>	 (03PS1) 10JHathaway: facter: fix facter conf location [puppet] - 10https://gerrit.wikimedia.org/r/1100490 (https://phabricator.wikimedia.org/T330490)
[16:55:32] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100490 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:56:00] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.dns.netbox
[16:57:39] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Doh, ofc!" [puppet] - 10https://gerrit.wikimedia.org/r/1100490 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:59:17] <wikibugs>	 (03PS2) 10JHathaway: facter: fix facter conf location [puppet] - 10https://gerrit.wikimedia.org/r/1100490 (https://phabricator.wikimedia.org/T330490)
[16:59:23] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100490 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[16:59:36] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephmon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
[16:59:42] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100491
[16:59:52] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephmon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
[16:59:52] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:59:53] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon1002.eqiad.wmnet
[17:00:10] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.hosts.decommission for hosts cloudcephmon1003.eqiad.wmnet
[17:03:02] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] facter: fix facter conf location [puppet] - 10https://gerrit.wikimedia.org/r/1100490 (https://phabricator.wikimedia.org/T330490) (owner: 10JHathaway)
[17:04:38] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.dns.netbox
[17:08:50] <logmsgbot>	 !log andrew@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephmon1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
[17:10:01] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephmon1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
[17:10:01] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:10:02] <logmsgbot>	 !log andrew@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon1003.eqiad.wmnet
[17:10:22] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] Remove refs to cloudcephmon100[1-3] [puppet] - 10https://gerrit.wikimedia.org/r/1098096 (https://phabricator.wikimedia.org/T380893) (owner: 10Andrew Bogott)
[17:13:21] <wikibugs>	 10ops-eqiad, 06cloud-services-team, 06DC-Ops, 10decommission-hardware, 13Patch-For-Review: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10380484 (10Andrew)
[17:14:39] <cdanis>	 !incidents
[17:14:39] <sirenbot>	 5508 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (asw2-b-eqiad.mgmt.eqiad.wmnet)
[17:14:40] <sirenbot>	 5507 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[17:14:40] <sirenbot>	 5506 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (asw2-b-eqiad.mgmt.eqiad.wmnet)
[17:14:40] <sirenbot>	 5505 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (asw2-b-eqiad.mgmt.eqiad.wmnet)
[17:15:15] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 220, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:15:17] <wikibugs>	 (03PS1) 10Btullis: [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729)
[17:15:31] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T371742)', diff saved to https://phabricator.wikimedia.org/P71551 and previous config saved to /var/cache/conftool/dbconfig/20241204-171530-ladsgroup.json
[17:15:34] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[17:15:55] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729) (owner: 10Btullis)
[17:17:22] <wikibugs>	 (03PS2) 10Btullis: [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729)
[17:18:29] <wikibugs>	 (03PS3) 10Btullis: [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729)
[17:19:15] <wikibugs>	 (03CR) 10Btullis: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4637/co" [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729) (owner: 10Btullis)
[17:20:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729) (owner: 10Btullis)
[17:22:24] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: gateway-check: avoid mutation of gateway_config [puppet] - 10https://gerrit.wikimedia.org/r/1100474 (owner: 10Scott French)
[17:22:24] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] "Good catch. This didn't have a very large time window of biting, but it could end up being confusing to some cases. Thanks for the patch!" [puppet] - 10https://gerrit.wikimedia.org/r/1100474 (owner: 10Scott French)
[17:22:27] <wikibugs>	 (03PS4) 10Btullis: [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729)
[17:22:45] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: ganeti2042 seems to have a broken CPU? (new Supermicro node) - https://phabricator.wikimedia.org/T378358#10380540 (10Jhancock.wm) heads up UPS didn't deliver yesterday. still waiting.
[17:22:51] <wikibugs>	 (03CR) 10Btullis: [dumps] Increase the lbzip2 thread count for large wikis [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729) (owner: 10Btullis)
[17:23:28] <wikibugs>	 (03CR) 10Btullis: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4638/co" [puppet] - 10https://gerrit.wikimedia.org/r/1100498 (https://phabricator.wikimedia.org/T380729) (owner: 10Btullis)
[17:23:50] <wikibugs>	 (03CR) 10Andrew Bogott: "Here is the full output from one node:" [puppet] - 10https://gerrit.wikimedia.org/r/1099748 (https://phabricator.wikimedia.org/T381293) (owner: 10Andrew Bogott)
[17:25:28] <wikibugs>	 (03CR) 10Scott French: "Thanks for the review!" [puppet] - 10https://gerrit.wikimedia.org/r/1100474 (owner: 10Scott French)
[17:25:31] <wikibugs>	 (03CR) 10Scott French: [C:03+2] gateway-check: avoid mutation of gateway_config [puppet] - 10https://gerrit.wikimedia.org/r/1100474 (owner: 10Scott French)
[17:25:59] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:26:13] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:29:52] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.204 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:30:06] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53069 bytes in 0.099 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:30:38] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P71553 and previous config saved to /var/cache/conftool/dbconfig/20241204-173037-ladsgroup.json
[17:31:16] <wikibugs>	 (03PS1) 10Clare Ming: Metrics Platform Instrument/Experiment Configurator: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100502 (https://phabricator.wikimedia.org/T379247)
[17:34:24] <wikibugs>	 (03CR) 10Scott French: "Thanks for the review, Alexandros!" [puppet] - 10https://gerrit.wikimedia.org/r/1084247 (owner: 10Scott French)
[17:34:25] <wikibugs>	 (03PS1) 10Clare Ming: Metrics Platform Instrument/Experiment Configurator: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100504 (https://phabricator.wikimedia.org/T379247)
[17:37:06] <wikibugs>	 (03CR) 10Santiago Faci: [C:03+2] Metrics Platform Instrument/Experiment Configurator: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100504 (https://phabricator.wikimedia.org/T379247) (owner: 10Clare Ming)
[17:37:10] <wikibugs>	 (03CR) 10Santiago Faci: [C:03+2] Metrics Platform Instrument/Experiment Configurator: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100502 (https://phabricator.wikimedia.org/T379247) (owner: 10Clare Ming)
[17:38:08] <wikibugs>	 (03Merged) 10jenkins-bot: Metrics Platform Instrument/Experiment Configurator: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100504 (https://phabricator.wikimedia.org/T379247) (owner: 10Clare Ming)
[17:38:16] <wikibugs>	 (03Merged) 10jenkins-bot: Metrics Platform Instrument/Experiment Configurator: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100502 (https://phabricator.wikimedia.org/T379247) (owner: 10Clare Ming)
[17:42:02] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+1] "LGTM, thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/1100488 (https://phabricator.wikimedia.org/T369384) (owner: 10Cathal Mooney)
[17:45:46] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P71554 and previous config saved to /var/cache/conftool/dbconfig/20241204-174544-ladsgroup.json
[17:47:25] <wikibugs>	 (03PS5) 10Scott French: mediawiki: support for service.deployment: none [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081449 (https://phabricator.wikimedia.org/T377040)
[17:47:25] <wikibugs>	 (03PS5) 10Scott French: mw-api-int: add migration release [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081450 (https://phabricator.wikimedia.org/T377040)
[17:47:25] <wikibugs>	 (03PS5) 10Scott French: mw-api-int: remove "migration" release values overrides [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081452 (https://phabricator.wikimedia.org/T377040)
[17:47:25] <wikibugs>	 (03PS5) 10Scott French: mediawiki: add remaining migration releases [deployment-charts] - 10https://gerrit.wikimedia.org/r/1082863 (https://phabricator.wikimedia.org/T377040)
[17:47:26] <wikibugs>	 (03PS5) 10Scott French: mediawiki: remove migration release overrides [deployment-charts] - 10https://gerrit.wikimedia.org/r/1082864 (https://phabricator.wikimedia.org/T377040)
[17:49:09] <wikibugs>	 (03CR) 10Scott French: "Thanks for the re-review! Just rebased and re-bumped the chart version. I'll merge this during the upcoming infra window." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081449 (https://phabricator.wikimedia.org/T377040) (owner: 10Scott French)
[17:50:04] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@6e3ee14]: Regular analytics weekly train [analytics/refinery@6e3ee14b]
[17:50:30] <bd808>	 !log Moved SAL fediverse posts to https://wikimedia.social/@sal. Many thanks to botsin.space for providing hosting for so long.
[17:50:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:10] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@6e3ee14]: Regular analytics weekly train [analytics/refinery@6e3ee14b] (duration: 02m 05s)
[17:54:07] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@6e3ee14] (thin): Regular analytics weekly train THIN [analytics/refinery@6e3ee14b]
[17:54:28] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:54:44] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@6e3ee14] (thin): Regular analytics weekly train THIN [analytics/refinery@6e3ee14b] (duration: 00m 37s)
[17:54:49] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@6e3ee14] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6e3ee14b]
[17:55:20] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@6e3ee14] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6e3ee14b] (duration: 00m 31s)
[17:56:15] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] Increase the number of gnmic worker and writer threads [puppet] - 10https://gerrit.wikimedia.org/r/1100488 (https://phabricator.wikimedia.org/T369384) (owner: 10Cathal Mooney)
[18:00:04] <jouncebot>	 swfrench-wmf: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki infrastructure (UTC late). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1800).
[18:00:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T371742)', diff saved to https://phabricator.wikimedia.org/P71555 and previous config saved to /var/cache/conftool/dbconfig/20241204-180052-ladsgroup.json
[18:00:54] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[18:00:56] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[18:01:08] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[18:01:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1178 (T371742)', diff saved to https://phabricator.wikimedia.org/P71556 and previous config saved to /var/cache/conftool/dbconfig/20241204-180114-ladsgroup.json
[18:01:22] <swfrench-wmf>	 here a bit earlier than expected, and will start work shortly
[18:01:57] <wikibugs>	 (03CR) 10Scott French: [C:03+2] mediawiki: support for service.deployment: none [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081449 (https://phabricator.wikimedia.org/T377040) (owner: 10Scott French)
[18:04:01] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: support for service.deployment: none [deployment-charts] - 10https://gerrit.wikimedia.org/r/1081449 (https://phabricator.wikimedia.org/T377040) (owner: 10Scott French)
[18:04:16] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[18:04:37] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[18:06:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-logging-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[18:09:30] <wikibugs>	 (03CR) 10Máté Szabó: [C:03+1] Update MediaModeration module to run scans automatically [puppet] - 10https://gerrit.wikimedia.org/r/1100427 (https://phabricator.wikimedia.org/T355169) (owner: 10Dreamy Jazz)
[18:11:24] <logmsgbot>	 !log swfrench@deploy2002 Started scap sync-world: Deployment to clear noop chart diff from 1081449 - T377040
[18:11:30] <stashbot>	 T377040: Turn up PHP 8.1-flavored k8s deployments for all MediaWiki services - https://phabricator.wikimedia.org/T377040
[18:13:31] <logmsgbot>	 !log swfrench@deploy2002 Finished scap sync-world: Deployment to clear noop chart diff from 1081449 - T377040 (duration: 02m 07s)
[18:13:55] <wikibugs>	 (03PS1) 10Dbrant: push-notifications: Add proxy env vars. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100513 (https://phabricator.wikimedia.org/T379647)
[18:15:13] <swfrench-wmf>	 all done on my end
[18:15:41] <wikibugs>	 (03PS1) 10CDanis: app/generic copypatch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100514
[18:15:41] <wikibugs>	 (03PS1) 10CDanis: app/generic: add support for a metricsPort [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100515
[18:15:41] <wikibugs>	 (03PS1) 10CDanis: chart-renderer: use the metrics port [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100516 (https://phabricator.wikimedia.org/T379687)
[18:22:03] <wikibugs>	 (03PS2) 10Dbrant: push-notifications: Add proxy env vars. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100513 (https://phabricator.wikimedia.org/T379647)
[18:24:54] <wikibugs>	 (03PS3) 10CDanis: push-notifications: New release & proxy env vars. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100513 (https://phabricator.wikimedia.org/T379647) (owner: 10Dbrant)
[18:24:57] <wikibugs>	 (03CR) 10CDanis: [C:03+2] push-notifications: New release & proxy env vars. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100513 (https://phabricator.wikimedia.org/T379647) (owner: 10Dbrant)
[18:25:59] <wikibugs>	 (03Merged) 10jenkins-bot: push-notifications: New release & proxy env vars. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100513 (https://phabricator.wikimedia.org/T379647) (owner: 10Dbrant)
[18:30:22] <logmsgbot>	 !log dbrant@deploy2002 helmfile [staging] START helmfile.d/services/push-notifications: apply
[18:30:56] <logmsgbot>	 !log dbrant@deploy2002 helmfile [staging] DONE helmfile.d/services/push-notifications: apply
[18:31:19] <wikibugs>	 (03CR) 10Scott French: "Thanks, Alexandros!" [puppet] - 10https://gerrit.wikimedia.org/r/1100112 (https://phabricator.wikimedia.org/T374683) (owner: 10Alexandros Kosiaris)
[18:33:55] <logmsgbot>	 !log dbrant@deploy2002 helmfile [eqiad] START helmfile.d/services/push-notifications: apply
[18:34:41] <logmsgbot>	 !log dbrant@deploy2002 helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
[18:35:04] <logmsgbot>	 !log dbrant@deploy2002 helmfile [codfw] START helmfile.d/services/push-notifications: apply
[18:35:43] <logmsgbot>	 !log dbrant@deploy2002 helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
[18:39:08] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: pybal pools for graph split [puppet] - 10https://gerrit.wikimedia.org/r/1097541 (https://phabricator.wikimedia.org/T379330) (owner: 10Ryan Kemper)
[18:46:19] <logmsgbot>	 !log ryankemper@cumin2002 conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-internal-scholarly,service=wdqs-scholarly
[18:46:31] <logmsgbot>	 !log ryankemper@cumin2002 conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-internal-main,service=wdqs-main
[18:47:52] <ryankemper>	 !log T379330 `wdqs-internal-main` and `wdqs-internal-scholarly` pools created
[18:47:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:55] <stashbot>	 T379330: Create pybal pools for wdqs-internal-main and wdqs-internal-scholarly - https://phabricator.wikimedia.org/T379330
[18:47:58] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
[18:48:11] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
[18:50:00] <hashar>	 jouncebot: nowandnext
[18:50:00] <jouncebot>	 For the next 0 hour(s) and 9 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1800)
[18:50:01] <jouncebot>	 In 0 hour(s) and 9 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1900)
[18:50:34] <wikibugs>	 (03PS5) 10Ryan Kemper: wdqs-internal: add A & PTR records for graph split [dns] - 10https://gerrit.wikimedia.org/r/1100010 (https://phabricator.wikimedia.org/T379334)
[18:50:34] <wikibugs>	 (03PS5) 10Ryan Kemper: wdqs-internal: add graph split disc DNS records [dns] - 10https://gerrit.wikimedia.org/r/1100165 (https://phabricator.wikimedia.org/T379334) (owner: 10Bking)
[18:52:12] <ryankemper>	 !log T379334 Creating A and PTR records for `wdqs-internal-main` and `wdqs-internal-scholarly` VIPs [merging https://gerrit.wikimedia.org/r/c/operations/dns/+/1100010/ & running authdns update after]
[18:52:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:15] <stashbot>	 T379334: Create DNS records for wdqs-internal-main and wdqs-internal-scholarly - https://phabricator.wikimedia.org/T379334
[18:52:19] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: add A & PTR records for graph split [dns] - 10https://gerrit.wikimedia.org/r/1100010 (https://phabricator.wikimedia.org/T379334) (owner: 10Ryan Kemper)
[18:54:24] <wikibugs>	 (03PS7) 10Kamila Součková: [WIP, DNM] create sre.k8s.roll-reimage-nodes [cookbooks] - 10https://gerrit.wikimedia.org/r/1094494 (https://phabricator.wikimedia.org/T377857)
[18:55:03] <ryankemper>	 !log T379334 Successfully ran `sudo authdns-update` on `dns1004`
[18:55:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:52] <_Gerges>	 Hi, does the T381445 task need community consensus?
[18:56:52] <stashbot>	 T381445: Add "Noto Sans Arabic" Font - https://phabricator.wikimedia.org/T381445
[18:57:52] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.dns.wipe-cache wdqs-internal-main.svc.eqiad.wmnet on all recursors
[18:57:56] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-internal-main.svc.eqiad.wmnet on all recursors
[18:58:02] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:58:05] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:58:24] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:58:53] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[19:00:04] <jouncebot>	 jeena and hashar: Deploy window MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T1900)
[19:02:05] <ryankemper>	 !log T380555 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094061 to establish initial service definitions for `wdqs-internal-main` and `wdqs-internal-scholarly`
[19:02:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:08] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[19:02:11] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: Add graph split svcs to catalog [puppet] - 10https://gerrit.wikimedia.org/r/1094061 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:02:19] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "LGTM, with maybe two stale comments." [puppet] - 10https://gerrit.wikimedia.org/r/1100483 (https://phabricator.wikimedia.org/T377876) (owner: 10Kamila Součková)
[19:02:32] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[19:03:02] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[19:04:50] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "https://puppet-compiler.wmflabs.org/output/1094069/4639/wdqs2018.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1094069 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:05:13] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: add envoy config for graph split [puppet] - 10https://gerrit.wikimedia.org/r/1097542 (https://phabricator.wikimedia.org/T379333) (owner: 10Ryan Kemper)
[19:09:18] <ryankemper>	 !log T379333 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1097542 to establish envoy on `A:wdqs-internal-main` and `A:wdqs-internal-scholarly`; running puppet on `wdqs2018` to test change
[19:09:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:21] <stashbot>	 T379333: Create envoy config for wdqs-internal-main and wdqs-internal-scholarly - https://phabricator.wikimedia.org/T379333
[19:13:12] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-Services, 06DC-Ops, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10381052 (10VRiley-WMF) Understood, I will close this and ask for a replacement!
[19:13:24] <wikibugs>	 10ops-eqiad, 06SRE, 10Cloud-Services, 06DC-Ops, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10381053 (10VRiley-WMF) 05Open→03Resolved
[19:13:41] <jinxer-wm>	 FIRING: [4x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_wdqs-internal-main.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[19:14:34] <sukhe>	 oh ok
[19:14:34] <wikibugs>	 (03PS2) 10Kamila Součková: Rename mw149[1-6] to wikikube-worker10[38-42] [puppet] - 10https://gerrit.wikimedia.org/r/1100483 (https://phabricator.wikimedia.org/T377876)
[19:14:35] <sukhe>	 let's see
[19:15:53] <wikibugs>	 (03CR) 10Kamila Součková: Rename mw149[1-6] to wikikube-worker10[38-42] (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1100483 (https://phabricator.wikimedia.org/T377876) (owner: 10Kamila Součková)
[19:16:31] <ryankemper>	 !log T380555 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094069 to enable `lvs::realserver`
[19:16:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:34] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[19:16:41] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: configure lvs IPs for backends [puppet] - 10https://gerrit.wikimedia.org/r/1094069 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:17:20] <wikibugs>	 (03CR) 10Scott French: [C:03+1] Rename mw149[1-6] to wikikube-worker10[38-42] [puppet] - 10https://gerrit.wikimedia.org/r/1100483 (https://phabricator.wikimedia.org/T377876) (owner: 10Kamila Součková)
[19:19:15] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[1491-1496].eqiad.wmnet
[19:19:54] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:20:26] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@1f94312]: Regular analytics weekly train - HOTFIX [analytics/refinery@1f94312a]
[19:21:12] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+2] Rename mw149[1-6] to wikikube-worker10[38-42] [puppet] - 10https://gerrit.wikimedia.org/r/1100483 (https://phabricator.wikimedia.org/T377876) (owner: 10Kamila Součková)
[19:22:47] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[1491-1496].eqiad.wmnet
[19:23:43] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@1f94312]: Regular analytics weekly train - HOTFIX [analytics/refinery@1f94312a] (duration: 03m 17s)
[19:23:57] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@1f94312] (thin): Regular analytics weekly train THIN - HOTFIX [analytics/refinery@1f94312a]
[19:24:28] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@1f94312] (thin): Regular analytics weekly train THIN - HOTFIX [analytics/refinery@1f94312a] (duration: 00m 30s)
[19:24:39] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@1f94312] (hadoop-test): Regular analytics weekly train TEST - HOTFIX [analytics/refinery@1f94312a]
[19:25:03] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1491 to wikikube-worker1038
[19:25:06] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@1f94312] (hadoop-test): Regular analytics weekly train TEST - HOTFIX [analytics/refinery@1f94312a] (duration: 00m 26s)
[19:25:23] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[19:26:21] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1492 to wikikube-worker1039
[19:26:54] <icinga-wm>	 PROBLEM - BGP status on lsw1-f3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:27:16] <icinga-wm>	 PROBLEM - BGP status on lsw1-f2-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:27:18] <wikibugs>	 (03CR) 10RLazarus: [C:03+1] app/generic copypatch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100514 (owner: 10CDanis)
[19:27:22] <icinga-wm>	 PROBLEM - BGP status on lsw1-e3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:27:23] <wikibugs>	 (03CR) 10RLazarus: [C:03+1] app/generic: add support for a metricsPort [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100515 (owner: 10CDanis)
[19:27:47] <wikibugs>	 (03CR) 10CDanis: [C:03+2] app/generic copypatch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100514 (owner: 10CDanis)
[19:27:54] <wikibugs>	 (03CR) 10CDanis: [C:03+2] app/generic: add support for a metricsPort [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100515 (owner: 10CDanis)
[19:28:47] <wikibugs>	 (03Merged) 10jenkins-bot: app/generic copypatch [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100514 (owner: 10CDanis)
[19:29:06] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1491 to wikikube-worker1038 - kamila@cumin1002"
[19:29:14] <wikibugs>	 (03Merged) 10jenkins-bot: app/generic: add support for a metricsPort [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100515 (owner: 10CDanis)
[19:29:36] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1491 to wikikube-worker1038 - kamila@cumin1002"
[19:29:36] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:29:37] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1038
[19:29:42] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[19:29:43] <wikibugs>	 (03CR) 10CDanis: [C:03+2] chart-renderer: use the metrics port [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100516 (https://phabricator.wikimedia.org/T379687) (owner: 10CDanis)
[19:29:48] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1038
[19:30:26] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1491 to wikikube-worker1038
[19:30:33] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1493 to wikikube-worker1040
[19:31:12] <wikibugs>	 (03Merged) 10jenkins-bot: chart-renderer: use the metrics port [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100516 (https://phabricator.wikimedia.org/T379687) (owner: 10CDanis)
[19:33:16] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1492 to wikikube-worker1039 - kamila@cumin1002"
[19:34:03] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1492 to wikikube-worker1039 - kamila@cumin1002"
[19:34:03] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:34:04] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1039
[19:34:20] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1039
[19:34:39] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[19:34:40] <jinxer-wm>	 FIRING: [3x] KubernetesRsyslogDown: rsyslog on mw1494:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:34:59] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1492 to wikikube-worker1039
[19:35:10] <logmsgbot>	 !log joal@deploy2002 Started deploy [airflow-dags/analytics@df2cac9]: Regular analytics weekly train [airflow-dags/analytics@df2cac98]
[19:35:48] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] START helmfile.d/services/chart-renderer: apply
[19:36:23] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
[19:37:13] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1494 to wikikube-worker1041
[19:38:07] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1493 to wikikube-worker1040 - kamila@cumin1002"
[19:38:40] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1493 to wikikube-worker1040 - kamila@cumin1002"
[19:38:40] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:38:41] <jinxer-wm>	 FIRING: [8x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_wdqs-internal-main.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[19:38:46] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1040
[19:38:49] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[19:38:58] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1040
[19:39:06] <logmsgbot>	 !log joal@deploy2002 Finished deploy [airflow-dags/analytics@df2cac9]: Regular analytics weekly train [airflow-dags/analytics@df2cac98] (duration: 03m 55s)
[19:39:37] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1493 to wikikube-worker1040
[19:39:44] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:39:46] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:40:11] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
[19:40:14] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+1] wdqs-internal: configure lvs IPs for backends [puppet] - 10https://gerrit.wikimedia.org/r/1094069 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:40:27] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Don't merge this yet." [puppet] - 10https://gerrit.wikimedia.org/r/1094069 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:40:29] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1495 to wikikube-worker1042
[19:40:55] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
[19:42:52] <wikibugs>	 (03PS1) 10Ryan Kemper: wdqs-internal: fix graph split conftool svc [puppet] - 10https://gerrit.wikimedia.org/r/1100524 (https://phabricator.wikimedia.org/T380555)
[19:42:53] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1494 to wikikube-worker1041 - kamila@cumin1002"
[19:43:19] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1494 to wikikube-worker1041 - kamila@cumin1002"
[19:43:20] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:43:20] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1041
[19:43:20] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Nice find! Should work." [puppet] - 10https://gerrit.wikimedia.org/r/1100524 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:43:30] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1041
[19:43:33] <wikibugs>	 (03CR) 10Bking: [C:03+1] "Copied votes on follow-up patch sets have been updated:" [puppet] - 10https://gerrit.wikimedia.org/r/1100524 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:43:33] <wikibugs>	 (03PS2) 10Ryan Kemper: wdqs-internal: fix graph split conftool svc [puppet] - 10https://gerrit.wikimedia.org/r/1100524 (https://phabricator.wikimedia.org/T380555)
[19:43:50] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[19:44:09] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1494 to wikikube-worker1041
[19:44:18] <kamila_>	 whose toes am I stepping on with netbox changes, and do you mind?
[19:45:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T371742)', diff saved to https://phabricator.wikimedia.org/P71558 and previous config saved to /var/cache/conftool/dbconfig/20241204-194459-ladsgroup.json
[19:45:03] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[19:45:16] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1496 to wikikube-worker1043
[19:45:48] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:46:27] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: fix graph split conftool svc [puppet] - 10https://gerrit.wikimedia.org/r/1100524 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:47:44] <wikibugs>	 (03PS7) 10Ryan Kemper: wdqs-internal: configure lvs IPs for backends [puppet] - 10https://gerrit.wikimedia.org/r/1094069 (https://phabricator.wikimedia.org/T380555)
[19:47:44] <wikibugs>	 (03PS6) 10Ryan Kemper: wdqs-internal: configure graphsplit load balancers [puppet] - 10https://gerrit.wikimedia.org/r/1094070 (https://phabricator.wikimedia.org/T380555)
[19:47:44] <wikibugs>	 (03PS6) 10Ryan Kemper: wdqs-internal: bring graph split into production [puppet] - 10https://gerrit.wikimedia.org/r/1094074 (https://phabricator.wikimedia.org/T380555)
[19:49:02] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1495 to wikikube-worker1042 - kamila@cumin1002"
[19:49:22] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1495 to wikikube-worker1042 - kamila@cumin1002"
[19:49:22] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:49:22] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1042
[19:49:23] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[19:49:33] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1042
[19:50:12] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1495 to wikikube-worker1042
[19:51:48] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: configure lvs IPs for backends [puppet] - 10https://gerrit.wikimedia.org/r/1094069 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[19:52:43] <sukhe>	 !log sudo cumin "O:config_master" "run-puppet-agent"
[19:52:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:58] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1496 to wikikube-worker1043 - kamila@cumin1002"
[19:53:03] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1496 to wikikube-worker1043 - kamila@cumin1002"
[19:53:03] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:53:03] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1043
[19:53:07] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] START helmfile.d/services/chart-renderer: apply
[19:53:13] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1043
[19:53:36] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
[19:53:41] <jinxer-wm>	 RESOLVED: [8x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_wdqs-internal-main.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[19:53:52] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1496 to wikikube-worker1043
[19:55:01] <ryankemper>	 !log T380555 Proceeding to step 5 of new lvs service process. Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094069 to enable lvs::realserver functionality
[19:55:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:04] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[19:55:41] <ryankemper>	 !log T380555 Running puppet on `wdqs2018`
[19:55:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:52] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1038.eqiad.wmnet on all recursors
[19:55:55] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1038.eqiad.wmnet on all recursors
[19:57:01] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1038.eqiad.wmnet wikikube-worker1039.eqiad.wmnet wikikube-worker1040.eqiad.wmnet wikikube-worker1041.eqiad.wmnet wikikube-worker1042.eqiad.wmnet wikikube-worker1043.eqiad.wmnet on all recursors
[19:57:04] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1038.eqiad.wmnet wikikube-worker1039.eqiad.wmnet wikikube-worker1040.eqiad.wmnet wikikube-worker1041.eqiad.wmnet wikikube-worker1042.eqiad.wmnet wikikube-worker1043.eqiad.wmnet on all recursors
[19:58:39] <wikibugs>	 (03PS1) 10Clare Ming: Metrics Platform Instrument/Experiment Configurator: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100527
[19:58:50] <wikibugs>	 (03CR) 10BryanDavis: [C:03+1] "Cherry-pick updated on deployment-puppetserver-1.deployment-prep.eqiad1.wikimedia.cloud and puppet run forced on deployment-mediawiki81.de" [puppet] - 10https://gerrit.wikimedia.org/r/1085471 (https://phabricator.wikimedia.org/T378752) (owner: 10BryanDavis)
[20:00:04] <wikibugs>	 (03PS1) 10Clare Ming: Metrics Platform Instrument/Experiment Configurator: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100528
[20:00:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71559 and previous config saved to /var/cache/conftool/dbconfig/20241204-200006-ladsgroup.json
[20:01:03] <wikibugs>	 (03PS1) 10CDanis: app/generic: metricsPort: add to NetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100529
[20:01:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] app/generic: metricsPort: add to NetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100529 (owner: 10CDanis)
[20:03:33] <wikibugs>	 (03PS2) 10CDanis: app/generic: metricsPort: add to NetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100529
[20:04:51] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "T377876 - kamila@cumin1002"
[20:04:54] <stashbot>	 T377876: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876
[20:04:56] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "T377876 - kamila@cumin1002"
[20:05:04] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: MediaWiki: Ensure nice 404 instead of php-fpm 404 on auth domain [puppet] - 10https://gerrit.wikimedia.org/r/1100530 (https://phabricator.wikimedia.org/T380551)
[20:05:06] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: MediaWiki: Define wikimedia.org portal on beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/1100531 (https://phabricator.wikimedia.org/T173887)
[20:05:08] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: MediaWiki: Redirect auth domain root to wikimedia.org portal [puppet] - 10https://gerrit.wikimedia.org/r/1100532 (https://phabricator.wikimedia.org/T380551)
[20:05:10] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: MediaWiki: Remove duplicate ErrorDocument 404 from beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/1100533
[20:05:10] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: MediaWiki: Only proxy existing .php files, otherwise return nice 404 [puppet] - 10https://gerrit.wikimedia.org/r/1100534 (https://phabricator.wikimedia.org/T380551)
[20:05:32] <wikibugs>	 (03CR) 10CDanis: [C:03+2] app/generic: metricsPort: add to NetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100529 (owner: 10CDanis)
[20:07:01] <wikibugs>	 (03Merged) 10jenkins-bot: app/generic: metricsPort: add to NetworkPolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100529 (owner: 10CDanis)
[20:07:30] <ryankemper>	 !log T380555 Disabling puppet on lvs hosts in preparation for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094070 which will move `wdqs-internal-[main,scholarly]` from `service_setup` to `lvs_setup`
[20:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:34] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[20:08:25] <ryankemper>	 !log T380555 ran `ryankemper@cumin2002:~$ sudo -E cumin 'lvs*' 'disable-puppet T380555'`
[20:08:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:59] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1038.eqiad.wmnet with OS bookworm
[20:09:33] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] START helmfile.d/services/chart-renderer: apply
[20:09:51] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
[20:10:06] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1039.eqiad.wmnet with OS bookworm
[20:10:48] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: configure graphsplit load balancers [puppet] - 10https://gerrit.wikimedia.org/r/1094070 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[20:12:38] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
[20:12:44] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
[20:12:51] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] START helmfile.d/services/chart-renderer: apply
[20:12:55] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
[20:15:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71560 and previous config saved to /var/cache/conftool/dbconfig/20241204-201513-ladsgroup.json
[20:16:10] <wikibugs>	 (03PS1) 10Dbrant: push-notifications: Add no_proxy: localhost, for making API calls. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100535 (https://phabricator.wikimedia.org/T379647)
[20:16:51] <wikibugs>	 (03PS1) 10Andrew Bogott: codfw1dev cinder backups: change lifespan to 2 days [puppet] - 10https://gerrit.wikimedia.org/r/1100536
[20:17:05] <ryankemper>	 !log T380555 Beginning lvs rolling restarts. first up `A:lvs-secondary-codfw`
[20:17:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:08] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[20:17:33] <ryankemper>	 !log T380555 `sudo -E cumin 'A:lvs-secondary-codfw' 'run-puppet-agent --force'`
[20:17:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:41] <wikibugs>	 (03CR) 10CDanis: [C:03+2] push-notifications: Add no_proxy: localhost, for making API calls. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100535 (https://phabricator.wikimedia.org/T379647) (owner: 10Dbrant)
[20:17:57] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] codfw1dev cinder backups: change lifespan to 2 days [puppet] - 10https://gerrit.wikimedia.org/r/1100536 (owner: 10Andrew Bogott)
[20:18:42] <ryankemper>	 !log T380555 `sudo cookbook sre.loadbalancer.restart-pybal 'A:lvs-secondary-codfw' --reason 'rolling out new wdqs-internal-[main,scholarly] services'`
[20:18:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:50] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] phabricator weekly changes email: Sort newcomers by claim date [puppet] - 10https://gerrit.wikimedia.org/r/1092205 (owner: 10Aklapper)
[20:19:04] <wikibugs>	 (03Merged) 10jenkins-bot: push-notifications: Add no_proxy: localhost, for making API calls. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100535 (https://phabricator.wikimedia.org/T379647) (owner: 10Dbrant)
[20:20:46] <logmsgbot>	 !log dbrant@deploy2002 helmfile [staging] START helmfile.d/services/push-notifications: apply
[20:20:50] <logmsgbot>	 !log dbrant@deploy2002 helmfile [staging] DONE helmfile.d/services/push-notifications: apply
[20:20:58] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "tested query but did not send a test mail, want one?" [puppet] - 10https://gerrit.wikimedia.org/r/1092205 (owner: 10Aklapper)
[20:20:59] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
[20:21:28] <logmsgbot>	 !log dbrant@deploy2002 helmfile [eqiad] START helmfile.d/services/push-notifications: apply
[20:21:29] <ryankemper>	 !log T380555 `sudo cookbook sre.loadbalancer.restart-pybal --query 'A:lvs-secondary-codfw' --reason 'rolling out new wdqs-internal-[main,scholarly] services' restart_daemons`
[20:21:29] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
[20:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:20] <logmsgbot>	 !log dbrant@deploy2002 helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
[20:22:25] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1020 is CRITICAL: CRITICAL: 117 connections established with conf1007.eqiad.wmnet:4001 (min=119) https://wikitech.wikimedia.org/wiki/PyBal
[20:22:30] <sukhe>	 that's OK
[20:22:39] <logmsgbot>	 !log dbrant@deploy2002 helmfile [codfw] START helmfile.d/services/push-notifications: apply
[20:23:07] <logmsgbot>	 !log dbrant@deploy2002 helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
[20:23:15] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
[20:23:37] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64600/IPv4: OpenSent - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:23:43] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64600/IPv4: OpenSent - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:23:56] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: MediaWiki: Ensure nice 404 instead of php-fpm 404 on auth domain [puppet] - 10https://gerrit.wikimedia.org/r/1100530 (https://phabricator.wikimedia.org/T380551)
[20:24:42] <ryankemper>	 !log T380555 hosts happily pooled and `sudo ipvsadm -L -n` shows `10.2.1.93` and `10.2.1.94` as expected, proceeding to `A:lvs-low-traffic-codfw`
[20:24:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:45] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[20:25:12] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1038.eqiad.wmnet with reason: host reimage
[20:25:43] <logmsgbot>	 !log sukhe@cumin1002 END (ERROR) - Cookbook sre.loadbalancer.restart-pybal (exit_code=97) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
[20:25:48] <sukhe>	 pybal looks unhappy on lvs1020
[20:26:30] <sukhe>	 ok
[20:26:32] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1039.eqiad.wmnet with reason: host reimage
[20:26:33] <sukhe>	 restarted 
[20:28:11] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
[20:28:11] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
[20:28:16] <ryankemper>	 !log T380555 ran `sudo -E cumin 'A:lvs-low-traffic-codfw' 'run-puppet-agent --force'`
[20:28:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:26] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
[20:28:28] <ryankemper>	 !log T380555 `sudo cookbook sre.loadbalancer.restart-pybal --query 'A:lvs-low-traffic-codfw' --reason 'rolling out new wdqs-internal-[main,scholarly] services' restart_daemons`
[20:28:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:37] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1020 is OK: OK: 119 connections established with conf1007.eqiad.wmnet:4001 (min=119) https://wikitech.wikimedia.org/wiki/PyBal
[20:28:41] <sukhe>	 ~cool
[20:28:45] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1038.eqiad.wmnet with reason: host reimage
[20:28:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q1:rack/setup/install fransc1001 - https://phabricator.wikimedia.org/T367814#10381305 (10cmooney) I'd hope we could avoid a lot of manual work and get this server set up using the new automation we are trying to build for Fundraising servers (see T37955...
[20:28:56] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
[20:30:14] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Cherry-picked on the beta cluster following these instructions: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code" [puppet] - 10https://gerrit.wikimedia.org/r/1100530 (https://phabricator.wikimedia.org/T380551) (owner: 10Bartosz Dziewoński)
[20:30:21] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T371742)', diff saved to https://phabricator.wikimedia.org/P71561 and previous config saved to /var/cache/conftool/dbconfig/20241204-203021-ladsgroup.json
[20:30:23] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[20:30:33] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[20:30:36] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[20:30:43] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1192 (T371742)', diff saved to https://phabricator.wikimedia.org/P71562 and previous config saved to /var/cache/conftool/dbconfig/20241204-203043-ladsgroup.json
[20:31:49] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: MediaWiki: Define wikimedia.org portal on beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/1100531 (https://phabricator.wikimedia.org/T173887)
[20:32:04] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
[20:32:21] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1039.eqiad.wmnet with reason: host reimage
[20:33:45] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
[20:36:00] <wikibugs>	 (03CR) 10Scott French: "Thanks, Hugh!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1099752 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan)
[20:37:52] <ryankemper>	 !log T380555 hosts happily pooled (except that `lvs2013` aka `A:lvs-low-traffic-codfw` cannot talk to `wdqs2026`) and `sudo ipvsadm -L -n` shows `10.2.1.93` and `10.2.1.94` as expected, codfw all done
[20:37:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:55] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[20:39:04] <wikibugs>	 (03PS3) 10Bartosz Dziewoński: MediaWiki: Define wikimedia.org portal on beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/1100531 (https://phabricator.wikimedia.org/T173887)
[20:39:06] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1100531 (https://phabricator.wikimedia.org/T173887) (owner: 10Bartosz Dziewoński)
[20:41:36] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Cherry-picked on the beta cluster following these instructions: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code" [puppet] - 10https://gerrit.wikimedia.org/r/1100531 (https://phabricator.wikimedia.org/T173887) (owner: 10Bartosz Dziewoński)
[20:41:56] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: MediaWiki: Redirect auth domain root to wikimedia.org portal [puppet] - 10https://gerrit.wikimedia.org/r/1100532 (https://phabricator.wikimedia.org/T380551)
[20:44:15] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Cherry-picked on the beta cluster following these instructions: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code" [puppet] - 10https://gerrit.wikimedia.org/r/1100532 (https://phabricator.wikimedia.org/T380551) (owner: 10Bartosz Dziewoński)
[20:44:24] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: MediaWiki: Remove duplicate ErrorDocument 404 from beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/1100533
[20:45:52] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Cherry-picked on the beta cluster following these instructions: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code" [puppet] - 10https://gerrit.wikimedia.org/r/1100533 (owner: 10Bartosz Dziewoński)
[20:46:01] <wikibugs>	 (03PS2) 10Bartosz Dziewoński: MediaWiki: Only proxy existing .php files, otherwise return nice 404 [puppet] - 10https://gerrit.wikimedia.org/r/1100534 (https://phabricator.wikimedia.org/T380551)
[20:46:27] <wikibugs>	 (03CR) 10Santiago Faci: [C:03+2] Metrics Platform Instrument/Experiment Configurator: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100528 (owner: 10Clare Ming)
[20:46:30] <wikibugs>	 (03CR) 10Santiago Faci: [C:03+2] Metrics Platform Instrument/Experiment Configurator: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100527 (owner: 10Clare Ming)
[20:47:30] <wikibugs>	 (03Merged) 10jenkins-bot: Metrics Platform Instrument/Experiment Configurator: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100528 (owner: 10Clare Ming)
[20:47:37] <wikibugs>	 (03Merged) 10jenkins-bot: Metrics Platform Instrument/Experiment Configurator: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1100527 (owner: 10Clare Ming)
[20:47:46] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1038.eqiad.wmnet with OS bookworm
[20:49:19] <wikibugs>	 (03PS1) 10Cathal Mooney: lvs2013: correct parent port for private1-b2-codfw vlan2029 int [puppet] - 10https://gerrit.wikimedia.org/r/1100540 (https://phabricator.wikimedia.org/T352784)
[20:49:57] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[20:50:15] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[20:51:10] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] lvs2013: correct parent port for private1-b2-codfw vlan2029 int [puppet] - 10https://gerrit.wikimedia.org/r/1100540 (https://phabricator.wikimedia.org/T352784) (owner: 10Cathal Mooney)
[20:51:25] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1039.eqiad.wmnet with OS bookworm
[20:52:02] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+1] lvs2013: correct parent port for private1-b2-codfw vlan2029 int [puppet] - 10https://gerrit.wikimedia.org/r/1100540 (https://phabricator.wikimedia.org/T352784) (owner: 10Cathal Mooney)
[20:54:27] <wikibugs>	 (03PS3) 10Ottomata: mediawiki.org/beacon/event/index.php - use EventLoggingLegacyConverter::submitEvent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1063222 (https://phabricator.wikimedia.org/T353817)
[20:54:50] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] lvs2013: correct parent port for private1-b2-codfw vlan2029 int [puppet] - 10https://gerrit.wikimedia.org/r/1100540 (https://phabricator.wikimedia.org/T352784) (owner: 10Cathal Mooney)
[20:56:54] <wikibugs>	 (03PS1) 10Bvibber: Enable Chart extension on several pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100544 (https://phabricator.wikimedia.org/T381436)
[20:57:11] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@7ba91e1]: Regular analytics weekly train - HOTFIX 2 [analytics/refinery@7ba91e13]
[20:57:35] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100544 (https://phabricator.wikimedia.org/T381436) (owner: 10Bvibber)
[20:59:00] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@7ba91e1]: Regular analytics weekly train - HOTFIX 2 [analytics/refinery@7ba91e13] (duration: 01m 48s)
[20:59:06] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
[20:59:19] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
[20:59:45] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@7ba91e1] (thin): Regular analytics weekly train THIN - HOTFIX 2 [analytics/refinery@7ba91e13]
[21:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: Time to snap out of that daydream and deploy UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T2100).
[21:00:04] <jouncebot>	 greg-g and bvibber: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:11] <bvibber>	 o/ here :D
[21:00:17] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@7ba91e1] (thin): Regular analytics weekly train THIN - HOTFIX 2 [analytics/refinery@7ba91e13] (duration: 00m 31s)
[21:00:39] <logmsgbot>	 !log joal@deploy2002 Started deploy [analytics/refinery@7ba91e1] (hadoop-test): Regular analytics weekly train TEST - HOTFIX 2 [analytics/refinery@7ba91e13]
[21:01:08] <logmsgbot>	 !log joal@deploy2002 Finished deploy [analytics/refinery@7ba91e1] (hadoop-test): Regular analytics weekly train TEST - HOTFIX 2 [analytics/refinery@7ba91e13] (duration: 00m 29s)
[21:01:25] <cjming>	 is a deployer needed?
[21:02:21] <bvibber>	 i can do mine myself in a pinch except i'm in a meeting right now :D
[21:02:29] <bvibber>	 so that'd be welcome <3
[21:02:40] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 (Connection refused) https://wikitech.wikimedia.org/wiki/PyBal
[21:02:40] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs2013 is CRITICAL: CRITICAL: 0 connections established with conf2004.codfw.wmnet:4001 (min=85) https://wikitech.wikimedia.org/wiki/PyBal
[21:02:48] <sukhe>	 yes
[21:03:05] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] prometheus: restart statsd-exporter on config change [puppet] - 10https://gerrit.wikimedia.org/r/1099822 (https://phabricator.wikimedia.org/T355837) (owner: 10Cwhite)
[21:03:14] <cjming>	 no worries
[21:03:18] <icinga-wm>	 PROBLEM - pybal on lvs2013 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[21:03:18] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting shortly
[21:03:22] <icinga-wm>	 PROBLEM - BGP status on lsw1-c2-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64600/IPv4: Active - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:03:31] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting shortly
[21:03:32] <cjming>	 greg-g: you around?  otherwise i'll start with Brooke's patch
[21:04:24] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100544 (https://phabricator.wikimedia.org/T381436) (owner: 10Bvibber)
[21:05:02] <greg-g>	 cjming: sorry! yes
[21:05:04] <wikibugs>	 (03Merged) 10jenkins-bot: Enable Chart extension on several pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100544 (https://phabricator.wikimedia.org/T381436) (owner: 10Bvibber)
[21:05:26] <greg-g>	 sorry for being late, happy to wait my turn :)
[21:05:33] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1100544|Enable Chart extension on several pilot wikis (T381436 T381312)]]
[21:05:37] <stashbot>	 T381436: Enable Chart extension on mediawiki.org - https://phabricator.wikimedia.org/T381436
[21:05:38] <stashbot>	 T381312: Enable Charts extension on Swedish, Italian, Hebrew Wikipedia - https://phabricator.wikimedia.org/T381312
[21:05:46] <ryankemper>	 !log T380555 Moving `wdqs-internal-[main,scholarly]` services into prod by merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094074
[21:05:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:49] <stashbot>	 T380555: Enable LVS for wdqs-internal-[main,scholarly] - https://phabricator.wikimedia.org/T380555
[21:05:53] <cjming>	 no worries! window should go quick with just config patches in the queue
[21:05:55] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: bring graph split into production [puppet] - 10https://gerrit.wikimedia.org/r/1094074 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[21:06:16] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] webperf: set statsv.py --statsd to statsd.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/1099720 (https://phabricator.wikimedia.org/T355837) (owner: 10Krinkle)
[21:08:12] <jeena>	 cjming: please ping me when backports are done. I missed the train deployment window 🤦‍♀️
[21:08:27] <cjming>	 jeena: ack - will do
[21:09:02] <ryankemper>	 !log T380555 Rolling out prod change => `ryankemper@cumin2002:~$ sudo cumin -b 8 'A:dnsbox' 'run-puppet-agent'`
[21:09:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:43] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Remove defunct lvs cross-dc links in Netbox (lvs2011 & lvs2013) - https://phabricator.wikimedia.org/T381533 (10cmooney) 03NEW p:05Triage→03Low
[21:12:17] <cjming>	 bvibber: on mwdebug - testable?
[21:12:21] <bvibber>	 lemme test
[21:12:48] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Cherry-picked on the beta cluster following these instructions. I think it works? I was a bit confused for a while, since it didn't seem t" [puppet] - 10https://gerrit.wikimedia.org/r/1100534 (https://phabricator.wikimedia.org/T380551) (owner: 10Bartosz Dziewoński)
[21:13:00] <logmsgbot>	 !log cjming@deploy2002 cjming, bvibber: Backport for [[gerrit:1100544|Enable Chart extension on several pilot wikis (T381436 T381312)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:13:06] <stashbot>	 T381436: Enable Chart extension on mediawiki.org - https://phabricator.wikimedia.org/T381436
[21:13:07] <stashbot>	 T381312: Enable Charts extension on Swedish, Italian, Hebrew Wikipedia - https://phabricator.wikimedia.org/T381312
[21:13:18] <bvibber>	 cjming: looks good
[21:13:56] <wikibugs>	 (03CR) 10Tchanders: Ensure IP reveal buttons are not shown on Special:MassGlobalBlock (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1100150 (https://phabricator.wikimedia.org/T124607) (owner: 10Tchanders)
[21:14:12] <wikibugs>	 (03CR) 10Bartosz Dziewoński: "Anyway, this one is more complex than the rest of the stack, and it will affect production and not just the beta cluster, so careful revie" [puppet] - 10https://gerrit.wikimedia.org/r/1100534 (https://phabricator.wikimedia.org/T380551) (owner: 10Bartosz Dziewoński)
[21:14:18] <logmsgbot>	 !log cjming@deploy2002 cjming, bvibber: Continuing with sync
[21:14:37] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] "See here for fix patch we needed to ship to bring service.yaml state into alignment with what we'd had in conftool-data in the previous pa" [puppet] - 10https://gerrit.wikimedia.org/r/1094061 (https://phabricator.wikimedia.org/T380555) (owner: 10Ryan Kemper)
[21:14:46] <wikibugs>	 (03PS3) 10Pcoombe: CSP for banner preview: allow remind me later SMS host [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1093401 (https://phabricator.wikimedia.org/T380232) (owner: 10Greg Grossmeier)
[21:15:56] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] webperf: disable statsd-exporter relaying flag [puppet] - 10https://gerrit.wikimedia.org/r/1099796 (https://phabricator.wikimedia.org/T355837) (owner: 10Cwhite)
[21:17:30] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1045.eqiad.wmnet with OS bookworm
[21:17:42] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381503 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1045.eqiad.wmnet with OS bookworm
[21:18:10] <logmsgbot>	 !log ryankemper@cumin2002 conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-main
[21:18:17] <logmsgbot>	 !log ryankemper@cumin2002 conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly
[21:19:09] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] wdqs-internal: add graph split disc DNS records [dns] - 10https://gerrit.wikimedia.org/r/1100165 (https://phabricator.wikimedia.org/T379334) (owner: 10Bking)
[21:19:40] <greg-g>	 cjming: ready when you are
[21:19:48] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] "Forgot to write in commit message but this was step 9 (the final step) of the lvs add a new service process" [dns] - 10https://gerrit.wikimedia.org/r/1100165 (https://phabricator.wikimedia.org/T379334) (owner: 10Bking)
[21:20:17] <cjming>	 greg-g: just waiting for bvibber's patch to finish syncing - any minute now
[21:20:29] <greg-g>	 ah, wasn't sure, coolio
[21:20:48] <greg-g>	 (just saw the rebase so thought you were ready ready ;) )
[21:21:06] <ryankemper>	 !log T379334 Final step (step 9) of spinning up these new services; merged https://gerrit.wikimedia.org/r/c/operations/dns/+/1100165/, next up is the authdns update
[21:21:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:10] <stashbot>	 T379334: Create DNS records for wdqs-internal-main and wdqs-internal-scholarly - https://phabricator.wikimedia.org/T379334
[21:22:03] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
[21:23:02] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1100544|Enable Chart extension on several pilot wikis (T381436 T381312)]] (duration: 17m 29s)
[21:23:07] <stashbot>	 T381436: Enable Chart extension on mediawiki.org - https://phabricator.wikimedia.org/T381436
[21:23:08] <stashbot>	 T381312: Enable Charts extension on Swedish, Italian, Hebrew Wikipedia - https://phabricator.wikimedia.org/T381312
[21:23:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1093401 (https://phabricator.wikimedia.org/T380232) (owner: 10Greg Grossmeier)
[21:23:26] <cjming>	 bvibber: should be live!
[21:23:32] <bvibber>	 cjming: thx!
[21:23:38] <cjming>	 yw
[21:23:53] <ryankemper>	 !log T379334 `ryankemper@dns1004:~$ sudo -i authdns-update` completed
[21:24:35] <greg-g>	 highfive to bvibber for being swat window buddies
[21:24:39] <greg-g>	 :)
[21:24:59] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
[21:25:07] <bvibber>	 greg-g: \o
[21:25:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:20] <icinga-wm>	 PROBLEM - Host lvs2013 is DOWN: PING CRITICAL - Packet loss = 100%
[21:25:40] <icinga-wm>	 RECOVERY - Host lvs2013 is UP: PING OK - Packet loss = 0%, RTA = 33.29 ms
[21:25:40] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 (Connection refused) https://wikitech.wikimedia.org/wiki/PyBal
[21:25:47] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1093401|CSP for banner preview: allow remind me later SMS host (T380232)]]
[21:25:50] <stashbot>	 T380232: Add app.goacoustic.com to wikipedia.org Content Security Policy (CSP) - https://phabricator.wikimedia.org/T380232
[21:26:06] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.dns.wipe-cache wdqs-internal-main.discovery.wmnet on all recursors
[21:26:10] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-internal-main.discovery.wmnet on all recursors
[21:26:14] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.dns.wipe-cache wdqs-internal-scholarly.discovery.wmnet on all recursors
[21:26:17] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-internal-scholarly.discovery.wmnet on all recursors
[21:26:20] <icinga-wm>	 PROBLEM - pybal on lvs2013 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[21:26:48] <greg-g>	 these pybal errors ok?
[21:27:09] <sukhe>	 yes please
[21:27:23] <sukhe>	 that host is drained, we are bringing it back up and should go away
[21:27:40] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs2013 is CRITICAL: CRITICAL: 0 connections established with conf2004.codfw.wmnet:4001 (min=85) https://wikitech.wikimedia.org/wiki/PyBal
[21:28:05] <greg-g>	 sukhe: cool, so OK to proceed with deploys?
[21:28:42] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[21:28:55] <sukhe>	 greg-g: please do, it's up now (and should not affect it regardless of that)
[21:29:00] <greg-g>	 cool
[21:29:03] <sukhe>	 thanks for checking
[21:29:20] <icinga-wm>	 RECOVERY - pybal on lvs2013 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[21:29:24] <icinga-wm>	 RECOVERY - BGP status on lsw1-c2-codfw.mgmt is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:31:29] <cjming>	 greg-g: up on test servers if testable
[21:31:58] <greg-g>	 cjming: testing
[21:32:01] <logmsgbot>	 !log cjming@deploy2002 cjming, gjg: Backport for [[gerrit:1093401|CSP for banner preview: allow remind me later SMS host (T380232)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:32:04] <stashbot>	 T380232: Add app.goacoustic.com to wikipedia.org Content Security Policy (CSP) - https://phabricator.wikimedia.org/T380232
[21:32:20] <greg-g>	 k8s-mwdebug?
[21:32:40] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs2013 is OK: OK: 85 connections established with conf2004.codfw.wmnet:4001 (min=85) https://wikitech.wikimedia.org/wiki/PyBal
[21:32:43] <cjming>	 mwdebug - yes
[21:34:13] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:34:23] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1086.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[21:34:58] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be1086.eqiad.wmnet with OS bullseye
[21:35:06] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10381561 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host ms-be1086.eqiad.wmnet with OS bullseye
[21:35:29] <greg-g>	 hmm, not sure, I'm still getting the CSP policy violation error, but not sure if that's because of how things are setup on mwdebug and csp
[21:36:00] <cjming>	 do you want to abort or continue?
[21:36:54] <greg-g>	 can you continue and I'll get the security team to review the state? the worst case is that we just didn't open it far enough
[21:37:04] <cjming>	 sure thing
[21:37:08] <logmsgbot>	 !log cjming@deploy2002 cjming, gjg: Continuing with sync
[21:37:09] <greg-g>	 ie: if anything it just means we're still locked down too much
[21:40:18] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[21:43:27] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1093401|CSP for banner preview: allow remind me later SMS host (T380232)]] (duration: 17m 39s)
[21:43:30] <stashbot>	 T380232: Add app.goacoustic.com to wikipedia.org Content Security Policy (CSP) - https://phabricator.wikimedia.org/T380232
[21:43:53] <cjming>	 greg-g: should be live :)
[21:43:54] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be - jclark@cumin1002"
[21:43:58] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be - jclark@cumin1002"
[21:43:58] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:44:44] <greg-g>	 cjming: thanks! I'll follow-up with security and fundraising on this. All good for now!
[21:45:00] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host es1045
[21:45:28] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1086.eqiad.wmnet with reason: host reimage
[21:46:06] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1045
[21:46:29] <cjming>	 great - closing window then
[21:46:32] <cjming>	 !log end of UTC late backport window
[21:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:39] <cjming>	 jeena: all yours
[21:46:45] <jeena>	 thank you
[21:47:42] <wikibugs>	 (PS1) TrainBranchBot: group1 to 1.44.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1100547 (https://phabricator.wikimedia.org/T375665)
[21:47:44] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group1 to 1.44.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1100547 (https://phabricator.wikimedia.org/T375665) (owner: TrainBranchBot)
[21:48:22] <wikibugs>	 (Merged) jenkins-bot: group1 to 1.44.0-wmf.6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1100547 (https://phabricator.wikimedia.org/T375665) (owner: TrainBranchBot)
[21:49:06] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1086.eqiad.wmnet with reason: host reimage
[21:54:28] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:54:32] <greg-g>	 cjming: just to say it, my test case was old, got a new banner and it worked, all good!
[21:57:35] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1044.eqiad.wmnet with OS bookworm
[21:57:41] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1043.eqiad.wmnet with OS bookworm
[21:57:50] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381630 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1044.eqiad.wmnet with OS bookworm ex...
[21:57:53] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381631 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1043.eqiad.wmnet with OS bookworm ex...
[21:59:34] <logmsgbot>	 !log jhuneidi@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.6  refs T375665
[21:59:37] <stashbot>	 T375665: 1.44.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T375665
[22:00:04] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241204T2200)
[22:03:01] <wikibugs>	 (PS2) Thcipriani: Reinstate the banner for the developer survey [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1100163 (owner: Hashar)
[22:03:01] <wikibugs>	 (CR) Thcipriani: "Got you the privacy link, I'll get the survey link Soon™ Thank you for this ❤️" [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1100163 (owner: Hashar)
[22:05:23] <wikibugs>	 (Abandoned) Thcipriani: Add a banner for the 2024 developer survey [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1100162 (https://phabricator.wikimedia.org/T351109) (owner: Thcipriani)
[22:06:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-logging-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[22:10:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T371742)', diff saved to https://phabricator.wikimedia.org/P71563 and previous config saved to /var/cache/conftool/dbconfig/20241204-221001-ladsgroup.json
[22:10:05] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[22:11:44] <wikibugs>	 (PS1) Eevans: cassandra: configurations merged from upstream 4.1.7 [puppet] - https://gerrit.wikimedia.org/r/1100549 (https://phabricator.wikimedia.org/T380420)
[22:12:01] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[22:12:18] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[22:12:19] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1086.eqiad.wmnet with OS bullseye
[22:12:32] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10381664 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host ms-be1086.eqiad.wmnet with OS bullseye complete...
[22:13:02] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10381665 (Jclark-ctr)
[22:13:26] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1040.eqiad.wmnet with OS bookworm
[22:13:46] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1041.eqiad.wmnet with OS bookworm
[22:16:48] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1044.eqiad.wmnet with OS bookworm
[22:17:00] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381670 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1044.eqiad.wmnet with OS bookworm
[22:18:09] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be1085.eqiad.wmnet with OS bullseye
[22:18:18] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10381671 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host ms-be1085.eqiad.wmnet with OS bullseye
[22:25:09] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71564 and previous config saved to /var/cache/conftool/dbconfig/20241204-222509-ladsgroup.json
[22:26:22] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1043.eqiad.wmnet with OS bookworm
[22:26:34] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381680 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1043.eqiad.wmnet with OS bookworm
[22:29:25] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1040.eqiad.wmnet with reason: host reimage
[22:30:00] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1041.eqiad.wmnet with reason: host reimage
[22:32:14] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1040.eqiad.wmnet with reason: host reimage
[22:33:13] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1044.eqiad.wmnet with reason: host reimage
[22:34:19] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1042.eqiad.wmnet with OS bookworm
[22:34:44] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1043.eqiad.wmnet with OS bookworm
[22:35:57] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1041.eqiad.wmnet with reason: host reimage
[22:37:44] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1045.eqiad.wmnet with OS bookworm
[22:37:50] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381705 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1045.eqiad.wmnet with OS bookworm ex...
[22:38:21] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1044.eqiad.wmnet with reason: host reimage
[22:40:16] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71565 and previous config saved to /var/cache/conftool/dbconfig/20241204-224016-ladsgroup.json
[22:45:55] <wikibugs>	 SRE, collaboration-services, Wikimedia-Mailing-lists: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045#10381717 (Dzahn) Cool! I tested the downgrade and upgrade with APT as well on lists2001. Worked both ways.
[22:50:26] <wikibugs>	 (CR) Cwhite: [C:+2] webperf: set statsd exporter timer type to histogram (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1099821 (https://phabricator.wikimedia.org/T355837) (owner: Cwhite)
[22:50:37] <wikibugs>	 (PS3) Cwhite: webperf: set statsd exporter timer type to histogram [puppet] - https://gerrit.wikimedia.org/r/1099821 (https://phabricator.wikimedia.org/T355837)
[22:50:37] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1042.eqiad.wmnet with reason: host reimage
[22:50:38] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1043.eqiad.wmnet with reason: host reimage
[22:51:32] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1040.eqiad.wmnet with OS bookworm
[22:54:29] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1042.eqiad.wmnet with reason: host reimage
[22:54:55] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1041.eqiad.wmnet with OS bookworm
[22:55:23] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T371742)', diff saved to https://phabricator.wikimedia.org/P71566 and previous config saved to /var/cache/conftool/dbconfig/20241204-225523-ladsgroup.json
[22:55:25] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[22:55:26] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[22:55:39] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[22:55:46] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1203 (T371742)', diff saved to https://phabricator.wikimedia.org/P71567 and previous config saved to /var/cache/conftool/dbconfig/20241204-225545-ladsgroup.json
[22:56:31] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[22:57:42] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1043.eqiad.wmnet with reason: host reimage
[22:58:54] <icinga-wm>	 RECOVERY - BGP status on lsw1-e3-eqiad.mgmt is OK: BGP OK - up: 26, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:04:27] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: load-dcatap-weekly.service on wdqs2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:04:34] <jinxer-wm>	 FIRING: [14x] ProbeDown: Service wdqs1026:443 has failed probes (http_wdqs_internal_sparql_endpoint_search_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[23:06:12] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Puppet (Puppet 7.0): Backport facter to bullseye - https://phabricator.wikimedia.org/T381538 (jhathaway) NEW
[23:06:25] <ryankemper>	 ^dcatap alerts are from stale systemd units that need to be cleaned up. the probedown on wdqs1026 i’ll investigate when back near computer
[23:06:26] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Puppet (Puppet 7.0): Backport facter to bullseye - https://phabricator.wikimedia.org/T381538#10381764 (jhathaway) p:Triage→Low
[23:08:51] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Puppet (Puppet 7.0): Backport facter to bullseye - https://phabricator.wikimedia.org/T381538#10381775 (jhathaway)
[23:10:00] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1043.eqiad.wmnet with OS bookworm
[23:10:06] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381781 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1043.eqiad.wmnet with OS bookworm ex...
[23:13:32] <icinga-wm>	 RECOVERY - BGP status on lsw1-f2-eqiad.mgmt is OK: BGP OK - up: 12, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:13:46] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1042.eqiad.wmnet with OS bookworm
[23:16:26] <icinga-wm>	 RECOVERY - BGP status on lsw1-f3-eqiad.mgmt is OK: BGP OK - up: 22, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:16:36] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1043.eqiad.wmnet with OS bookworm
[23:20:54] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[23:21:35] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[23:26:36] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1085.eqiad.wmnet with OS bullseye
[23:26:44] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10381819 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host ms-be1085.eqiad.wmnet with OS bullseye executed...
[23:32:15] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[23:32:44] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[23:32:50] <wikibugs>	 (CR) Cwhite: [V:+2 C:+2] webperf: set statsd exporter timer type to histogram [puppet] - https://gerrit.wikimedia.org/r/1099821 (https://phabricator.wikimedia.org/T355837) (owner: Cwhite)
[23:35:07] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[23:35:36] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[23:39:42] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[23:40:36] <logmsgbot>	 !log amastilovic@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[23:42:29] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:42:31] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1044.eqiad.wmnet with OS bookworm
[23:42:39] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381845 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host es1044.eqiad.wmnet with OS bookworm co...
[23:43:25] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381848 (Jclark-ctr)
[23:43:40] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be1085.eqiad.wmnet with OS bullseye
[23:43:54] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be10{83-91} - https://phabricator.wikimedia.org/T371389#10381849 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host ms-be1085.eqiad.wmnet with OS bullseye
[23:47:04] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host es1043.eqiad.wmnet with OS bookworm
[23:47:13] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Automations, and 2 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10381853 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host es1043.eqiad.wmnet with OS bookworm
[23:54:36] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1085.eqiad.wmnet with reason: host reimage
[23:57:37] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1085.eqiad.wmnet with reason: host reimage
[23:59:55] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 23:00:00 on 8 hosts with reason: T376150 non-prod hosts
[23:59:58] <stashbot>	 T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly - https://phabricator.wikimedia.org/T376150