[00:08:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: man-db.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:18:17] <icinga-wm>	 PROBLEM - SSH on bast7001 is CRITICAL: Server answer: Exceeded MaxStartups https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:19:17] <icinga-wm>	 RECOVERY - SSH on bast7001 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:38:26] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1091924
[00:38:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1091924 (owner: TrainBranchBot)
[01:08:27] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1091925
[01:08:27] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1091925 (owner: TrainBranchBot)
[01:13:45] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1091924 (owner: TrainBranchBot)
[01:41:13] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1091925 (owner: TrainBranchBot)
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:51] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.009e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[04:08:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: man-db.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:29:33] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:30:13] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:52:04] <wikibugs>	 (PS1) KartikMistry: Enable the Contribute menu in 2nd group of Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300)
[05:52:53] <wikibugs>	 (CR) CI reject: [V:-1] Enable the Contribute menu in 2nd group of Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300) (owner: KartikMistry)
[05:55:47] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 18 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300) (owner: KartikMistry)
[06:12:02] <wikibugs>	 (PS2) KartikMistry: Enable the Contribute menu in 2nd group of Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300)
[06:12:41] <wikibugs>	 (CR) CI reject: [V:-1] Enable the Contribute menu in 2nd group of Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300) (owner: KartikMistry)
[06:14:21] <wikibugs>	 (PS3) KartikMistry: Enable the Contribute menu in 2nd group of Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300)
[06:18:20] <kart_>	 Doing quick installation of MinT on eqiad..
[06:19:03] <kart_>	 err. deployment :)
[06:19:19] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
[06:28:50] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
[06:31:17] <kart_>	 !log Updated MinT to 2024-10-16-065051-production on eqiad
[06:31:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:49] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[07:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:22:29] <wikibugs>	 (CR) JMeybohm: [C:+1] wikikube-staging: put kubestage2003 and 2004 into production [puppet] - https://gerrit.wikimedia.org/r/1091783 (https://phabricator.wikimedia.org/T377011) (owner: Jasmine)
[07:35:01] <wikibugs>	 ops-codfw, SRE, DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144 (MoritzMuehlenhoff) NEW
[07:46:05] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
[07:46:07] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
[07:46:10] <stashbot>	 T373037: Make ParserCache more like a ring - https://phabricator.wikimedia.org/T373037
[07:46:13] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
[07:46:16] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
[07:46:17] <stashbot>	 T378068: pc1017 crashed - https://phabricator.wikimedia.org/T378068
[07:47:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
[07:48:14] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10330008 (ops-monitoring-bot) Draining ganeti1020.eqiad.wmnet of running VMs
[07:50:17] <wikibugs>	 (PS2) Stevemunene: airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440)
[07:50:17] <wikibugs>	 (PS2) Stevemunene: airflow-analytics-product: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1091200 (https://phabricator.wikimedia.org/T378440)
[07:51:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
[07:52:14] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
[07:52:25] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10330017 (ops-monitoring-bot) Draining ganeti1020.eqiad.wmnet of running VMs
[07:54:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
[07:56:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
[07:56:22] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10330021 (ops-monitoring-bot) Draining ganeti1021.eqiad.wmnet of running VMs
[07:57:39] <wikibugs>	 (PS1) Stevemunene: airflow-analytics-product: create user kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1092180 (https://phabricator.wikimedia.org/T378440)
[07:57:41] <wikibugs>	 (PS1) Stevemunene: airflow-analytics-product: create OIDC config [puppet] - https://gerrit.wikimedia.org/r/1092181 (https://phabricator.wikimedia.org/T378440)
[07:57:42] <wikibugs>	 (PS1) Stevemunene: airflow-analytics-product: create ATS mapping and caching config [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440)
[07:59:25] <wikibugs>	 (CR) Joal: [C:+1] "Thank you for the investigation and findings @btullis" [puppet] - https://gerrit.wikimedia.org/r/1090842 (https://phabricator.wikimedia.org/T376118) (owner: Btullis)
[07:59:27] <wikibugs>	 (CR) Stevemunene: airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[08:00:04] * Hamishcz says hi
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: #bothumor I � Unicode. All rise for UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T0800).
[08:00:05] <jouncebot>	 Hamishcz and kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:01:02] <kart_>	 here
[08:01:36] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
[08:01:46] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
[08:02:03] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10330027 (ops-monitoring-bot) Draining ganeti1021.eqiad.wmnet of running VMs
[08:03:33] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
[08:05:14] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
[08:05:24] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10330028 (ops-monitoring-bot) Draining ganeti1021.eqiad.wmnet of running VMs
[08:06:23] <kart_>	 Hamishcz: Do you need help in deployment?
[08:07:04] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
[08:07:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
[08:07:42] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10330029 (ops-monitoring-bot) Draining ganeti1021.eqiad.wmnet of running VMs
[08:08:09] <Hamishcz>	 kart_: what kind of help, for example?
[08:08:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: man-db.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:12:41] <Hamishcz>	 kart_: maybe you mean you can help me deploy my patch?
[08:15:23] <wikibugs>	 (CR) Arnaudb: sre.mysql.sanitize-wiki: sanitize wiki cookbook (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) (owner: Arnaudb)
[08:17:31] <kart_>	 Hamishcz: yes. Do you want me to deploy?
[08:17:46] <Hamishcz>	 ah yes, appreciate
[08:17:50] <kart_>	 :)
[08:18:32] <Hamishcz>	 :) I'm sorry I misunderstood at first
[08:18:50] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kartik@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1091912 (https://phabricator.wikimedia.org/T375054) (owner: Hamish)
[08:19:31] <wikibugs>	 (Merged) jenkins-bot: bjnwikiquote: Add local logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1091912 (https://phabricator.wikimedia.org/T375054) (owner: Hamish)
[08:20:12] <logmsgbot>	 !log kartik@deploy2002 Started scap sync-world: Backport for [[gerrit:1091912|bjnwikiquote: Add local logo (T375054)]]
[08:20:16] <stashbot>	 T375054: Requesting logo change for bjn.wikiquote.org - https://phabricator.wikimedia.org/T375054
[08:20:29] <wikibugs>	 (PS1) Slyngshede: Version 0.2.0. [software/bitu-ldap] - https://gerrit.wikimedia.org/r/1092184
[08:29:35] <icinga-wm>	 PROBLEM - BGP status on cr2-drmrs is CRITICAL: BGP CRITICAL - AS5511/IPv6: Connect - Orange https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:30:34] <Hamishcz>	 confirmed good on debug server
[08:30:48] <logmsgbot>	 !log kartik@deploy2002 kartik, hamishz: Backport for [[gerrit:1091912|bjnwikiquote: Add local logo (T375054)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:30:52] <stashbot>	 T375054: Requesting logo change for bjn.wikiquote.org - https://phabricator.wikimedia.org/T375054
[08:31:09] <kart_>	 Hamishcz: nice! 
[08:31:15] <kart_>	 Hamishcz: going ahead..
[08:31:19] <logmsgbot>	 !log kartik@deploy2002 kartik, hamishz: Continuing with sync
[08:33:42] <wikibugs>	 (PS2) Slyngshede: Version 0.1.0. [software/bitu-ldap] - https://gerrit.wikimedia.org/r/1092184
[08:37:00] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM. At this point we can stop building bitu-ldap for buster, it's still installed on mwmaint*, but no longer used since the functionalit" [software/bitu-ldap] - https://gerrit.wikimedia.org/r/1092184 (owner: Slyngshede)
[08:37:56] <wikibugs>	 (CR) Slyngshede: [C:+2] Version 0.1.0. [software/bitu-ldap] - https://gerrit.wikimedia.org/r/1092184 (owner: Slyngshede)
[08:38:31] <wikibugs>	 (PS3) Elukey: docker_registry_ha: limit /v2/_catalog to internal IPs [puppet] - https://gerrit.wikimedia.org/r/1091597 (https://phabricator.wikimedia.org/T378618)
[08:39:34] <wikibugs>	 (Merged) jenkins-bot: Version 0.1.0. [software/bitu-ldap] - https://gerrit.wikimedia.org/r/1092184 (owner: Slyngshede)
[08:40:02] <wikibugs>	 (PS4) Elukey: docker_registry_ha: limit /v2/_catalog to internal IPs [puppet] - https://gerrit.wikimedia.org/r/1091597 (https://phabricator.wikimedia.org/T378618)
[08:40:33] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add two new Airflow LDAP groups to be considered for offboarding [puppet] - https://gerrit.wikimedia.org/r/1091735 (https://phabricator.wikimedia.org/T375729) (owner: Muehlenhoff)
[08:40:51] <wikibugs>	 (CR) Elukey: docker_registry_ha: limit /v2/_catalog to internal IPs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1091597 (https://phabricator.wikimedia.org/T378618) (owner: Elukey)
[08:43:07] <logmsgbot>	 !log kartik@deploy2002 Finished scap sync-world: Backport for [[gerrit:1091912|bjnwikiquote: Add local logo (T375054)]] (duration: 22m 55s)
[08:43:11] <stashbot>	 T375054: Requesting logo change for bjn.wikiquote.org - https://phabricator.wikimedia.org/T375054
[08:44:13] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on registry1004.eqiad.wmnet with reason: testing
[08:44:19] <kart_>	 Hamishcz: Done!
[08:44:25] <kart_>	 I'm going with my patch..
[08:44:27] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on registry1004.eqiad.wmnet with reason: testing
[08:44:52] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kartik@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300) (owner: KartikMistry)
[08:45:04] <Hamishcz>	 but I cannot load the logo from my end, why?
[08:45:06] <Hamishcz>	 https://bjn.wikiquote.org/wiki/Laman_Tatambaian
[08:45:36] <wikibugs>	 (Merged) jenkins-bot: Enable the Contribute menu in 2nd group of Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1091932 (https://phabricator.wikimedia.org/T375300) (owner: KartikMistry)
[08:45:52] <logmsgbot>	 !log kartik@deploy2002 Started scap sync-world: Backport for [[gerrit:1091932|Enable the Contribute menu in 2nd group of Wikis (T375300)]]
[08:45:56] <stashbot>	 T375300: Enable the Contribute menu in 2nd group of wikis where translation experience is available on mobile - https://phabricator.wikimedia.org/T375300
[08:49:10] <wikibugs>	 (PS5) Elukey: docker_registry_ha: limit /v2/_catalog to internal IPs [puppet] - https://gerrit.wikimedia.org/r/1091597 (https://phabricator.wikimedia.org/T378618)
[08:49:35] <logmsgbot>	 !log kartik@deploy2002 kartik: Backport for [[gerrit:1091932|Enable the Contribute menu in 2nd group of Wikis (T375300)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:50:10] <Hamishcz>	 ah good now
[08:50:15] <Hamishcz>	 maybe a cache problem
[08:50:21] <Hamishcz>	 kart_: thanks!
[08:52:54] <wikibugs>	 (PS1) Muehlenhoff: Add one more Airflow LDAP group to be considered for offboarding [puppet] - https://gerrit.wikimedia.org/r/1092186 (https://phabricator.wikimedia.org/T375729)
[08:53:01] <logmsgbot>	 !log kartik@deploy2002 kartik: Continuing with sync
[08:55:22] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 40850
[08:55:47] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40850
[08:57:10] <wikibugs>	 (PS6) Elukey: docker_registry_ha: limit /v2/_catalog to internal IPs [puppet] - https://gerrit.wikimedia.org/r/1091597 (https://phabricator.wikimedia.org/T378618)
[08:57:37] <logmsgbot>	 !log kartik@deploy2002 Finished scap sync-world: Backport for [[gerrit:1091932|Enable the Contribute menu in 2nd group of Wikis (T375300)]] (duration: 11m 45s)
[08:57:41] <stashbot>	 T375300: Enable the Contribute menu in 2nd group of wikis where translation experience is available on mobile - https://phabricator.wikimedia.org/T375300
[08:59:01] <wikibugs>	 (CR) Elukey: docker_registry_ha: limit /v2/_catalog to internal IPs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1091597 (https://phabricator.wikimedia.org/T378618) (owner: Elukey)
[09:05:09] <wikibugs>	 (CR) Stevemunene: [C:+1] "lgtm!" [puppet] - https://gerrit.wikimedia.org/r/1092186 (https://phabricator.wikimedia.org/T375729) (owner: Muehlenhoff)
[09:12:13] <icinga-wm>	 PROBLEM - Router interfaces on cr2-drmrs is CRITICAL: CRITICAL: host 185.15.58.129, interfaces up: 60, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:15:45] <wikibugs>	 (CR) Elukey: [C:+1] Drop Python support for 3.7, 3.8, add 3.11 (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/1029209 (owner: Volans)
[09:17:04] <wikibugs>	 (CR) Vgutierrez: [C:-1] trafficserver: remove inbound TLS and related settings (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1091748 (owner: Ssingh)
[09:17:59] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[09:18:13] <icinga-wm>	 RECOVERY - Router interfaces on cr2-drmrs is OK: OK: host 185.15.58.129, interfaces up: 61, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:18:18] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[09:18:18] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good, two typos inline" [software/bitu] - https://gerrit.wikimedia.org/r/1090852 (owner: Slyngshede)
[09:24:35] <moritzm>	 !log installing openssl security updates
[09:24:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:12] <wikibugs>	 (CR) Volans: [V:+2 C:+2] "Force merging the next CR in the series fixes mypy" [software/cumin] - https://gerrit.wikimedia.org/r/1029209 (owner: Volans)
[09:26:24] <wikibugs>	 (CR) Volans: [C:+2] Use importlib.metadata instead of pkg_resources [software/cumin] - https://gerrit.wikimedia.org/r/1029210 (owner: Volans)
[09:34:35] <icinga-wm>	 RECOVERY - BGP status on cr2-drmrs is OK: BGP OK - up: 114, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:34:53] <wikibugs>	 (PS3) DCausse: rdf-streaming-updater: bump to 0.3.150 [deployment-charts] - https://gerrit.wikimedia.org/r/1091306 (https://phabricator.wikimedia.org/T376598)
[09:34:53] <wikibugs>	 (PS1) DCausse: rdf-streaming-updater: produce rdf_change v2 events [deployment-charts] - https://gerrit.wikimedia.org/r/1092191 (https://phabricator.wikimedia.org/T374919)
[09:35:06] <dcausse>	 jouncebot: nowandnext
[09:35:06] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 24 minute(s)
[09:35:06] <jouncebot>	 In 1 hour(s) and 24 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1100)
[09:35:12] <wikibugs>	 (CR) Nikerabbit: [C:+1] Add new namespaces to hsb wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1090502 (https://phabricator.wikimedia.org/T373634) (owner: Srishakatux)
[09:35:59] <wikibugs>	 (CR) DCausse: [C:-1] "needs Ife016662f5fde835c21457ef457b567d9be61d2a to be fully deployed everywhere" [deployment-charts] - https://gerrit.wikimedia.org/r/1092191 (https://phabricator.wikimedia.org/T374919) (owner: DCausse)
[09:42:09] <wikibugs>	 (Merged) jenkins-bot: Use importlib.metadata instead of pkg_resources [software/cumin] - https://gerrit.wikimedia.org/r/1029210 (owner: Volans)
[09:42:28] <moritzm>	 !log restarting nginx on acmechief hosts to pick up openssl updates
[09:42:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:54] <wikibugs>	 (CR) Volans: [C:+2] Add support for Python 3.12 [software/cumin] - https://gerrit.wikimedia.org/r/1090504 (owner: Volans)
[09:43:22] <wikibugs>	 (PS2) Slyngshede: Prevalidation of permissions [software/bitu] - https://gerrit.wikimedia.org/r/1090852
[09:44:59] <wikibugs>	 (CR) DCausse: [C:+2] rdf-streaming-updater: bump to 0.3.150 [deployment-charts] - https://gerrit.wikimedia.org/r/1091306 (https://phabricator.wikimedia.org/T376598) (owner: DCausse)
[09:45:00] <wikibugs>	 (PS1) Jelto: wikidata-query-gui: add querybuilder releases [deployment-charts] - https://gerrit.wikimedia.org/r/1092192 (https://phabricator.wikimedia.org/T350793)
[09:46:16] <wikibugs>	 (Merged) jenkins-bot: rdf-streaming-updater: bump to 0.3.150 [deployment-charts] - https://gerrit.wikimedia.org/r/1091306 (https://phabricator.wikimedia.org/T376598) (owner: DCausse)
[09:47:33] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[09:47:59] <logmsgbot>	 !log dcausse@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[09:48:11] <wikibugs>	 (PS2) Arnaudb: sre.switchdc.databases: use mysql native methods [cookbooks] - https://gerrit.wikimedia.org/r/1087860 (owner: Volans)
[09:48:25] <wikibugs>	 (CR) Brouberol: [C:+1] airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:49:00] <wikibugs>	 (CR) Brouberol: [C:+1] airflow-analytics-product: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1091200 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:49:10] <wikibugs>	 (CR) Brouberol: [C:+1] airflow-analytics-product: create user kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1092180 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:49:25] <wikibugs>	 (CR) Brouberol: [C:+1] airflow-analytics-product: create OIDC config [puppet] - https://gerrit.wikimedia.org/r/1092181 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:50:02] <wikibugs>	 (CR) Brouberol: [C:-1] "You're missing the caching config in `hieradata/role/common/cache/text.yaml`" [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:50:09] <wikibugs>	 (CR) Btullis: [C:+1] airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:50:47] <wikibugs>	 (CR) Btullis: [C:+1] airflow-analytics-product: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1091200 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:51:07] <wikibugs>	 (CR) Btullis: [C:+1] airflow-analytics-product: create user kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1092180 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:51:10] <wikibugs>	 (PS1) Elukey: redfish: add response logging for request() [software/spicerack] - https://gerrit.wikimedia.org/r/1092193
[09:51:26] <wikibugs>	 (CR) Btullis: [C:+1] airflow-analytics-product: create OIDC config [puppet] - https://gerrit.wikimedia.org/r/1092181 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[09:51:36] <wikibugs>	 (PS2) Elukey: redfish: add response logging for request() [software/spicerack] - https://gerrit.wikimedia.org/r/1092193
[09:53:36] <wikibugs>	 (CR) CI reject: [V:-1] sre.switchdc.databases: use mysql native methods [cookbooks] - https://gerrit.wikimedia.org/r/1087860 (owner: Volans)
[09:54:38] <wikibugs>	 (CR) Slyngshede: [C:+2] Prevalidation of permissions (2 comments) [software/bitu] - https://gerrit.wikimedia.org/r/1090852 (owner: Slyngshede)
[09:55:18] <wikibugs>	 (PS1) Btullis: Add spark version 3.5.3 to production images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035)
[09:57:08] <wikibugs>	 (Merged) jenkins-bot: Prevalidation of permissions [software/bitu] - https://gerrit.wikimedia.org/r/1090852 (owner: Slyngshede)
[09:57:53] <wikibugs>	 (PS2) Stevemunene: airflow-analytics-product: create ATS mapping and caching config [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440)
[09:58:04] <wikibugs>	 (Merged) jenkins-bot: Add support for Python 3.12 [software/cumin] - https://gerrit.wikimedia.org/r/1090504 (owner: Volans)
[09:58:07] <wikibugs>	 (CR) Volans: [C:-1] "Makes sense, needs a tweak because of old requests on bullseye." [software/spicerack] - https://gerrit.wikimedia.org/r/1092193 (owner: Elukey)
[09:58:17] <wikibugs>	 (CR) Volans: [C:+2] Integration tests: use linuxserver/openssh-server [software/cumin] - https://gerrit.wikimedia.org/r/1090505 (owner: Volans)
[09:59:52] <wikibugs>	 (PS1) Muehlenhoff: Enable profile::auto_restarts::service for hiddenparma [puppet] - https://gerrit.wikimedia.org/r/1092195 (https://phabricator.wikimedia.org/T135991)
[10:02:31] <wikibugs>	 (CR) JMeybohm: [C:+1] wikidata-query-gui: add querybuilder releases [deployment-charts] - https://gerrit.wikimedia.org/r/1092192 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[10:03:12] <wikibugs>	 (PS3) Elukey: redfish: add response logging for request() [software/spicerack] - https://gerrit.wikimedia.org/r/1092193
[10:03:35] <wikibugs>	 (CR) Elukey: redfish: add response logging for request() (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1092193 (owner: Elukey)
[10:07:52] <wikibugs>	 (PS1) Muehlenhoff: Add Cumin alias for liberica [puppet] - https://gerrit.wikimedia.org/r/1092196
[10:10:29] <wikibugs>	 (PS3) Stevemunene: airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440)
[10:10:29] <wikibugs>	 (PS3) Stevemunene: airflow-analytics-product: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1091200 (https://phabricator.wikimedia.org/T378440)
[10:10:29] <wikibugs>	 (PS1) Stevemunene: airflow-analytics-product: define namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1092197 (https://phabricator.wikimedia.org/T378443)
[10:11:05] <wikibugs>	 (CR) Vgutierrez: [C:-1] "please do not merge this till the applayer endpoint is ready:" [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[10:13:16] <wikibugs>	 (CR) Vgutierrez: [C:+1] "thx!" [puppet] - https://gerrit.wikimedia.org/r/1092196 (owner: Muehlenhoff)
[10:13:20] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:13:31] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:14:06] <wikibugs>	 (CR) CI reject: [V:-1] redfish: add response logging for request() [software/spicerack] - https://gerrit.wikimedia.org/r/1092193 (owner: Elukey)
[10:14:14] <wikibugs>	 (Merged) jenkins-bot: Integration tests: use linuxserver/openssh-server [software/cumin] - https://gerrit.wikimedia.org/r/1090505 (owner: Volans)
[10:14:46] <logmsgbot>	 !log fabfur@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo
[10:14:54] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:14:56] <fabfur>	 !log upgrade haproxy on cp-ulsfo (T379891)
[10:14:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:00] <stashbot>	 T379891: Upgrade haproxy to 2.8.12 on cp hosts - https://phabricator.wikimedia.org/T379891
[10:15:05] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:16:57] <wikibugs>	 (CR) Brouberol: [C:+1] airflow-analytics-product: define namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1092197 (https://phabricator.wikimedia.org/T378443) (owner: Stevemunene)
[10:17:48] <wikibugs>	 SRE, collaboration-services, Infrastructure-Foundations, Mail, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10330345 (eoghan) a:eoghan
[10:21:27] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] "I don’t fully understand it, but IMHO it’s fine to try this out and revert if needed." [deployment-charts] - https://gerrit.wikimedia.org/r/1092192 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[10:22:50] <wikibugs>	 (PS1) Volans: doc: don't fail on warning on readthedocs [software/cumin] - https://gerrit.wikimedia.org/r/1092199
[10:25:44] <wikibugs>	 (CR) Elukey: [C:+1] doc: don't fail on warning on readthedocs [software/cumin] - https://gerrit.wikimedia.org/r/1092199 (owner: Volans)
[10:26:17] <wikibugs>	 (CR) Brouberol: Add spark version 3.5.3 to production images (2 comments) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[10:27:20] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:27:31] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:27:59] <wikibugs>	 (CR) Jelto: [C:+2] wikidata-query-gui: add querybuilder releases [deployment-charts] - https://gerrit.wikimedia.org/r/1092192 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[10:29:21] <wikibugs>	 (Merged) jenkins-bot: wikidata-query-gui: add querybuilder releases [deployment-charts] - https://gerrit.wikimedia.org/r/1092192 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[10:33:02] <wikibugs>	 (CR) Elukey: redfish: add response logging for request() (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1092193 (owner: Elukey)
[10:33:16] <wikibugs>	 (CR) Brouberol: Add spark version 3.5.3 to production images (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[10:35:15] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:35:35] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:36:53] <wikibugs>	 (CR) Btullis: Add spark version 3.5.3 to production images (2 comments) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[10:37:18] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:38:58] <wikibugs>	 (CR) Btullis: Add spark version 3.5.3 to production images (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[10:39:42] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:41:16] <logmsgbot>	 !log dcausse@deploy2002 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[10:41:26] <logmsgbot>	 !log dcausse@deploy2002 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[10:41:38] <wikibugs>	 (CR) Volans: "LGTM, just run `tox -e py3-format` to fix CI" [software/spicerack] - https://gerrit.wikimedia.org/r/1092193 (owner: Elukey)
[10:41:48] <wikibugs>	 (CR) Volans: [C:+2] doc: don't fail on warning on readthedocs [software/cumin] - https://gerrit.wikimedia.org/r/1092199 (owner: Volans)
[10:43:12] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:43:23] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:43:31] <wikibugs>	 (CR) FNegri: [C:+2] "TIL! I didn't know about `keep_firing_for`, it looks like it's mostly designed for flapping alerts, I wonder if setting it to "24h" could " [alerts] - https://gerrit.wikimedia.org/r/1088585 (https://phabricator.wikimedia.org/T379378) (owner: FNegri)
[10:45:10] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:45:20] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:46:31] <logmsgbot>	 !log dcausse@deploy2002 helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
[10:46:45] <logmsgbot>	 !log dcausse@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
[10:47:03] <wikibugs>	 (CR) Brouberol: Add spark version 3.5.3 to production images (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[10:49:50] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:50:00] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:50:18] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:50:28] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[10:55:50] <wikibugs>	 (CR) Vgutierrez: haproxykafka: working on TLS client authentication to kafka (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1090915 (https://phabricator.wikimedia.org/T379776) (owner: Fabfur)
[10:56:11] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10330511 (elukey) My bad, I misremembered that we got the firmware for config J from Supermicro already (somehow I thought it was for the ganeti nodes,...
[10:57:47] <wikibugs>	 (Merged) jenkins-bot: doc: don't fail on warning on readthedocs [software/cumin] - https://gerrit.wikimedia.org/r/1092199 (owner: Volans)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1100)
[11:04:38] <wikibugs>	 (PS1) Aklapper: phabricator weekly changes email: Sort newcomers by claim date [puppet] - https://gerrit.wikimedia.org/r/1092205
[11:04:48] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): Revert "Allow other input and changes to trigger searchsuggestions to update" [core] (wmf/1.44.0-wmf.3) - https://gerrit.wikimedia.org/r/1091605 (https://phabricator.wikimedia.org/T379983) (owner: Samtar)
[11:09:47] <wikibugs>	 (PS6) Fabfur: haproxykafka: working on TLS client authentication to kafka [puppet] - https://gerrit.wikimedia.org/r/1090915 (https://phabricator.wikimedia.org/T379776)
[11:12:54] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1090915 (https://phabricator.wikimedia.org/T379776) (owner: Fabfur)
[11:14:21] <wikibugs>	 (PS2) Btullis: Add spark version 3.5.3 to production images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035)
[11:14:54] <wikibugs>	 (CR) Btullis: Add spark version 3.5.3 to production images (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[11:16:05] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:18:05] <wikibugs>	 (CR) Fabfur: haproxykafka: working on TLS client authentication to kafka (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1090915 (https://phabricator.wikimedia.org/T379776) (owner: Fabfur)
[11:18:42] <wikibugs>	 (CR) Stevemunene: [C:+2] airflow-analytics-product: create user kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1092180 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[11:20:03] <wikibugs>	 (CR) Stevemunene: [C:+2] airflow-analytics-product: define namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1092197 (https://phabricator.wikimedia.org/T378443) (owner: Stevemunene)
[11:21:16] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:21:21] <wikibugs>	 (CR) Btullis: [V:+1 C:+2] Enable deletion of unused segments on the druid-analytics cluster [puppet] - https://gerrit.wikimedia.org/r/1090842 (https://phabricator.wikimedia.org/T376118) (owner: Btullis)
[11:23:15] <wikibugs>	 (PS2) Aqu: EventStreamConfig: Enable Hive Ingestion for most streams [mediawiki-config] - https://gerrit.wikimedia.org/r/1089967 (https://phabricator.wikimedia.org/T369845) (owner: TChin)
[11:23:51] <wikibugs>	 (Merged) jenkins-bot: airflow-analytics-product: define namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1092197 (https://phabricator.wikimedia.org/T378443) (owner: Stevemunene)
[11:24:32] <wikibugs>	 (CR) Aqu: [C:+1] "I've activated canary events for some streams." [mediawiki-config] - https://gerrit.wikimedia.org/r/1089967 (https://phabricator.wikimedia.org/T369845) (owner: TChin)
[11:25:19] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:25:30] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:25:51] <wikibugs>	 (CR) Stevemunene: [C:+2] airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[11:30:00] <wikibugs>	 (Merged) jenkins-bot: airflow-analytics-product: register namespace in ceph-csi and cloudnative-pg operator configs [deployment-charts] - https://gerrit.wikimedia.org/r/1091199 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[11:33:19] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
[11:36:23] <wikibugs>	 (CR) Slyngshede: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1092181 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[11:38:16] <wikibugs>	 (CR) Stevemunene: [C:+2] airflow-analytics-product: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1091200 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[11:38:30] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add Cumin alias for liberica [puppet] - https://gerrit.wikimedia.org/r/1092196 (owner: Muehlenhoff)
[11:39:02] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add one more Airflow LDAP group to be considered for offboarding [puppet] - https://gerrit.wikimedia.org/r/1092186 (https://phabricator.wikimedia.org/T375729) (owner: Muehlenhoff)
[11:39:27] <wikibugs>	 (Merged) jenkins-bot: airflow-analytics-product: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1091200 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[11:40:59] <urbanecm>	 !log mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; T378983)
[11:41:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:03] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:41:04] <stashbot>	 T378983: Add Link recommendation are not being processed by CirrusSearch (November 2024) - https://phabricator.wikimedia.org/T378983
[11:41:27] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: T380131 - table corruption
[11:41:30] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: T380131 - table corruption
[11:41:31] <stashbot>	 T380131: Corrupt index on db2216 - https://phabricator.wikimedia.org/T380131
[11:41:32] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:41:52] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s1 #page on db2216 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[11:43:11] <wikibugs>	 (PS1) Btullis: Add the thirdparty/bigtop15 component to bookworm [puppet] - https://gerrit.wikimedia.org/r/1092210 (https://phabricator.wikimedia.org/T378954)
[11:43:40] <kart_>	 OK to deploy ml-service ie recommendation-api?
[11:44:00] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4543/console" [puppet] - https://gerrit.wikimedia.org/r/1092210 (https://phabricator.wikimedia.org/T378954) (owner: Btullis)
[11:45:47] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:45:59] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:47:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
[11:54:57] <kart_>	 I'll wait till current window is over..
[11:56:05] <elukey>	 jouncebot: next
[11:56:05] <jouncebot>	 In 2 hour(s) and 3 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1400)
[11:58:22] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:58:55] <logmsgbot>	 !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[11:59:14] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[11:59:40] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[12:00:38] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[12:02:14] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
[12:03:06] <wikibugs>	 (CR) Stevemunene: [C:+2] airflow-analytics-product: create OIDC config [puppet] - https://gerrit.wikimedia.org/r/1092181 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[12:06:59] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10330858 (elukey) Ok I found the issue, I asked Jenn to turn off IPv6 last week for the BMC network to test if that was the issue, but it was before upg...
[12:07:09] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be1005 - https://phabricator.wikimedia.org/T370453#10330860 (elukey) @Jclark-ctr I updated the firmware to the correct one, but I'd need the BMC label password in pvt when you are in the DC (it is needed...
[12:08:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: man-db.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:08:36] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
[12:08:45] <kart_>	 elukey: I'll be deploying recommendation-api-ng now..
[12:09:10] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
[12:10:02] <logmsgbot>	 !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[12:10:38] <wikibugs>	 (CR) KartikMistry: [C:+2] Update recommendation api to 2024-11-13-183159-production [deployment-charts] - https://gerrit.wikimedia.org/r/1089964 (https://phabricator.wikimedia.org/T379592) (owner: KartikMistry)
[12:11:44] <wikibugs>	 (Merged) jenkins-bot: Update recommendation api to 2024-11-13-183159-production [deployment-charts] - https://gerrit.wikimedia.org/r/1089964 (https://phabricator.wikimedia.org/T379592) (owner: KartikMistry)
[12:11:48] <elukey>	 kart_: yes yes go ahead!
[12:11:59] <kart_>	 Thanks!
[12:12:07] <elukey>	 I think there is no policy for it, just ping the ml-team on their chan for notification
[12:12:22] <kart_>	 sure. noted!
[12:12:26] <klausman>	 ty :)
[12:13:06] <logmsgbot>	 !log kartik@deploy2002 helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[12:13:23] <logmsgbot>	 !log fabfur@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo
[12:14:33] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
[12:15:46] <logmsgbot>	 !log stevemunene@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
[12:17:12] <wikibugs>	 SRE, collaboration-services: gitlab runners don't have the apt.wikimedia.org key - https://phabricator.wikimedia.org/T380164#10330906 (MatthewVernon)
[12:19:31] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
[12:19:50] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1092223
[12:21:33] <logmsgbot>	 !log btullis@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1018.eqiad.wmnet with OS bullseye
[12:22:07] <logmsgbot>	 !log kartik@deploy2002 helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[12:22:11] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
[12:24:24] <logmsgbot>	 !log kartik@deploy2002 helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[12:29:55] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2024.11.09 - 2024.11.29): an-presto1018.eqiad.wmnet: DRAC is down - https://phabricator.wikimedia.org/T378854#10330937 (BTullis) Open→Resolved I think that this is fixed now. I'm able to reimage an-presto1018 and connect to a SOL session, so...
[12:32:47] <wikibugs>	 (CR) Stevemunene: "endpoint is ready" [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[12:36:17] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.netbox
[12:36:19] <wikibugs>	 ops-eqiad, DC-Ops, serviceops: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10330969 (Clement_Goubert) p:Triage→Medium a:Jclark-ctr
[12:36:19] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.pool db2150 slowly with 10 steps - slow repool db2150 T380117
[12:36:23] <stashbot>	 T380117: Corrupt index on db2150 - https://phabricator.wikimedia.org/T380117
[12:37:19] <kart_>	 !log Updated recommendation api to 2024-11-13-183159-production (T379592, T379037)
[12:37:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:23] <stashbot>	 T379592: Unable to deploy new version of recommendation-api to production due to connectivity issues - https://phabricator.wikimedia.org/T379592
[12:37:23] <stashbot>	 T379037: Implement batching for collections data - https://phabricator.wikimedia.org/T379037
[12:38:26] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2024.11.09 - 2024.11.29): an-presto1018.eqiad.wmnet: DRAC is down - https://phabricator.wikimedia.org/T378854#10330973 (BTullis) Maybe I spoke too soon. I've had this error twice now, suggesting a failure to pull the boot image with TFTP, or similar....
[12:38:28] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:38:45] <logmsgbot>	 !log btullis@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1018.eqiad.wmnet with OS bullseye
[12:39:16] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
[12:40:58] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2024.11.09 - 2024.11.29): an-presto1018.eqiad.wmnet: DRAC is down - https://phabricator.wikimedia.org/T378854#10330996 (BTullis) Trying the reimage again with the note from https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_docum...
[12:42:21] <wikibugs>	 (PS1) Jelto: wikidata-query-gui: update readiness_probe for querybuilder [deployment-charts] - https://gerrit.wikimedia.org/r/1092232 (https://phabricator.wikimedia.org/T350793)
[12:48:02] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: nova: fullstack: use git clone instead of direct fetch [puppet] - https://gerrit.wikimedia.org/r/1092233 (https://phabricator.wikimedia.org/T379356)
[12:48:23] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1092233 (https://phabricator.wikimedia.org/T379356) (owner: Arturo Borrero Gonzalez)
[12:49:41] <mvolz>	 jouncebot: nowandnext
[12:49:41] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 10 minute(s)
[12:49:41] <jouncebot>	 In 1 hour(s) and 10 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1400)
[12:50:16] <mvolz>	 Anyone mind if I use the open window to do a deploy on k8s? 
[12:53:33] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C:+2] openstack: nova: fullstack: use git clone instead of direct fetch [puppet] - https://gerrit.wikimedia.org/r/1092233 (https://phabricator.wikimedia.org/T379356) (owner: Arturo Borrero Gonzalez)
[12:54:03] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
[12:55:00] <wikibugs>	 (CR) Brouberol: [C:+1] Add spark version 3.5.3 to production images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[12:55:56] <wikibugs>	 (CR) Jelto: [C:+2] wikidata-query-gui: update readiness_probe for querybuilder [deployment-charts] - https://gerrit.wikimedia.org/r/1092232 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[12:56:52] <wikibugs>	 (CR) Brouberol: [C:+1] "Very good news!" [puppet] - https://gerrit.wikimedia.org/r/1092210 (https://phabricator.wikimedia.org/T378954) (owner: Btullis)
[12:56:54] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
[12:57:04] <wikibugs>	 (Merged) jenkins-bot: wikidata-query-gui: update readiness_probe for querybuilder [deployment-charts] - https://gerrit.wikimedia.org/r/1092232 (https://phabricator.wikimedia.org/T350793) (owner: Jelto)
[12:57:13] <wikibugs>	 (CR) Brouberol: [C:+1] airflow-analytics-product: create ATS mapping and caching config [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[12:58:33] <wikibugs>	 (CR) Brouberol: [V:+1 C:+2] airflow: define the webserver.base_url configuration [puppet] - https://gerrit.wikimedia.org/r/1091654 (https://phabricator.wikimedia.org/T379267) (owner: Brouberol)
[13:00:01] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1092210 (https://phabricator.wikimedia.org/T378954) (owner: Btullis)
[13:01:58] <moritzm>	 !log removing ganeti1021 from active Ganeti nodes T378921
[13:02:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:02] <stashbot>	 T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921
[13:03:07] <wikibugs>	 (CR) Btullis: [V:+2 C:+2] Add spark version 3.5.3 to production images [docker-images/production-images] - https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: Btullis)
[13:03:49] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:04:00] <wikibugs>	 (CR) Btullis: [V:+1 C:+2] Add the thirdparty/bigtop15 component to bookworm [puppet] - https://gerrit.wikimedia.org/r/1092210 (https://phabricator.wikimedia.org/T378954) (owner: Btullis)
[13:04:10] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:04:26] <wikibugs>	 (PS1) Muehlenhoff: Update site.pp [puppet] - https://gerrit.wikimedia.org/r/1092236
[13:05:13] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1021 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[13:05:13] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1021 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 112 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[13:06:27] <wikibugs>	 (CR) Stevemunene: [C:+2] airflow-analytics-product: create ATS mapping and caching config [puppet] - https://gerrit.wikimedia.org/r/1092182 (https://phabricator.wikimedia.org/T378440) (owner: Stevemunene)
[13:07:04] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti1021:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:07:29] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1092236 (owner: Muehlenhoff)
[13:13:33] <wikibugs>	 SRE-OnFire, SRE Observability: Harden corto systemd service - https://phabricator.wikimedia.org/T372437#10331077 (lmata)
[13:16:01] <urbanecm>	 !log mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; T378983)
[13:16:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:06] <stashbot>	 T378983: Add Link recommendation are not being processed by CirrusSearch (November 2024) - https://phabricator.wikimedia.org/T378983
[13:20:06] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bullseye
[13:24:33] <wikibugs>	 (PS1) Effie Mouzeli: memcached: add mc-gp100[4-6] gutter servers [puppet] - https://gerrit.wikimedia.org/r/1092243 (https://phabricator.wikimedia.org/T377033)
[13:25:40] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:25:56] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:25:58] <wikibugs>	 (CR) Stevemunene: [C:+1] "Copied votes on follow-up patch sets have been updated:" [puppet] - https://gerrit.wikimedia.org/r/1091654 (https://phabricator.wikimedia.org/T379267) (owner: Brouberol)
[13:26:17] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2024.11.09 - 2024.11.29): an-presto1018.eqiad.wmnet: DRAC is down - https://phabricator.wikimedia.org/T378854#10331151 (BTullis) That worked, so we're all good.
[13:26:26] <topranks>	 !log stopping netbox service on netbox-next test server to restore new database backup from production 
[13:26:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:47] <wikibugs>	 (CR) Effie Mouzeli: [C:+1] chromium-render: Add cli flag to avoid flooding with crashpad processes [deployment-charts] - https://gerrit.wikimedia.org/r/1088271 (https://phabricator.wikimedia.org/T376438) (owner: Jgiannelos)
[13:26:53] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: nova: fullstack: file link depends on git clone [puppet] - https://gerrit.wikimedia.org/r/1092244 (https://phabricator.wikimedia.org/T379356)
[13:27:27] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
[13:27:46] <wikibugs>	 (03PS2) 10Effie Mouzeli: memcached: add mc-gp100[4-6] gutter servers [puppet] - 10https://gerrit.wikimedia.org/r/1092243 (https://phabricator.wikimedia.org/T377033)
[13:27:50] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1092244 (https://phabricator.wikimedia.org/T379356) (owner: 10Arturo Borrero Gonzalez)
[13:27:57] <wikibugs>	 (03CR) 10Effie Mouzeli: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1092243 (https://phabricator.wikimedia.org/T377033) (owner: 10Effie Mouzeli)
[13:28:25] <wikibugs>	 (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1089701 (owner: 10PipelineBot)
[13:28:48] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:28:57] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:29:26] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1089701 (owner: 10PipelineBot)
[13:30:39] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[13:31:04] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[13:31:17] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:31:27] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:31:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Update site.pp [puppet] - 10https://gerrit.wikimedia.org/r/1092236 (owner: 10Muehlenhoff)
[13:33:34] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[13:34:07] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[13:34:22] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C:03+2] openstack: nova: fullstack: file link depends on git clone [puppet] - 10https://gerrit.wikimedia.org/r/1092244 (https://phabricator.wikimedia.org/T379356) (owner: 10Arturo Borrero Gonzalez)
[13:34:44] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[13:35:09] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[13:35:26] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:35:34] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:37:23] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:37:31] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:39:30] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 18 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082726 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[13:39:46] <wikibugs>	 (03PS1) 10Jelto: wikidata-query-gui: fix namespace typo in gateway and service name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092247 (https://phabricator.wikimedia.org/T350793)
[13:40:39] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: nova: fullstack: subscribe service to git clone [puppet] - 10https://gerrit.wikimedia.org/r/1092248 (https://phabricator.wikimedia.org/T379356)
[13:40:58] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1092248 (https://phabricator.wikimedia.org/T379356) (owner: 10Arturo Borrero Gonzalez)
[13:42:13] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] wikidata-query-gui: fix namespace typo in gateway and service name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092247 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto)
[13:42:20] <wikibugs>	 (03CR) 10Jelto: [C:03+2] wikidata-query-gui: fix namespace typo in gateway and service name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092247 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto)
[13:42:42] <wikibugs>	 (03PS1) 10Muehlenhoff: Update site.pp [puppet] - 10https://gerrit.wikimedia.org/r/1092250 (https://phabricator.wikimedia.org/T378921)
[13:43:43] <wikibugs>	 (03Merged) 10jenkins-bot: wikidata-query-gui: fix namespace typo in gateway and service name [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092247 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto)
[13:44:55] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] debug.json: add support for mwdebug-next [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1076848 (https://phabricator.wikimedia.org/T372605) (owner: 10Scott French)
[13:45:26] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C:03+2] openstack: nova: fullstack: subscribe service to git clone [puppet] - 10https://gerrit.wikimedia.org/r/1092248 (https://phabricator.wikimedia.org/T379356) (owner: 10Arturo Borrero Gonzalez)
[13:45:45] <wikibugs>	 (03PS1) 10Ssingh: Revert^2 "cp7001: temporarily set check_min_fe_mem to true" [puppet] - 10https://gerrit.wikimedia.org/r/1092252
[13:46:36] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] memcached: add mc-gp100[4-6] gutter servers [puppet] - 10https://gerrit.wikimedia.org/r/1092243 (https://phabricator.wikimedia.org/T377033) (owner: 10Effie Mouzeli)
[13:46:48] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] memcached: add mc-gp100[4-6] gutter servers [puppet] - 10https://gerrit.wikimedia.org/r/1092243 (https://phabricator.wikimedia.org/T377033) (owner: 10Effie Mouzeli)
[13:46:54] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[13:46:57] <logmsgbot>	 !log jelto@deploy2002 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[13:47:51] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Revert^2 "cp7001: temporarily set check_min_fe_mem to true" [puppet] - 10https://gerrit.wikimedia.org/r/1092252 (owner: 10Ssingh)
[13:47:57] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Upgrade IDPs to CAS 6.6/Bullseye and enable webauthn - https://phabricator.wikimedia.org/T305518#10331270 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This old task can be closed, the update to CAS 6.6 was resolved with T311235 and th...
[13:48:18] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082726 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[13:48:28] <logmsgbot>	 !log jelto@deploy2002 helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
[13:49:09] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti1021:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:49:24] <logmsgbot>	 !log jelto@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
[13:49:35] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082726 (https://phabricator.wikimedia.org/T364460) (owner: 10Wangombe)
[13:49:52] <logmsgbot>	 !log jelto@deploy2002 helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
[13:50:32] <logmsgbot>	 !log jelto@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
[13:54:05] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Adapt WMF theming for webauthn - https://phabricator.wikimedia.org/T380172 (10MoritzMuehlenhoff) 03NEW
[13:54:18] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Adapt WMF theming for webauthn - https://phabricator.wikimedia.org/T380172#10331304 (10MoritzMuehlenhoff) p:05Triage→03Medium
[13:56:33] <wikibugs>	 (03PS2) 10Ssingh: trafficserver: remove inbound TLS and related settings [puppet] - 10https://gerrit.wikimedia.org/r/1091748
[13:57:11] <wikibugs>	 (03CR) 10Ssingh: trafficserver: remove inbound TLS and related settings (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1091748 (owner: 10Ssingh)
[13:58:14] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4544/co" [puppet] - 10https://gerrit.wikimedia.org/r/1091748 (owner: 10Ssingh)
[13:58:15] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select data store for webauthn devices - https://phabricator.wikimedia.org/T380173 (10MoritzMuehlenhoff) 03NEW
[13:58:20] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select data store for webauthn devices - https://phabricator.wikimedia.org/T380173#10331325 (10MoritzMuehlenhoff) p:05Triage→03Medium
[14:02:05] <wikibugs>	 (03PS5) 10Volans: mysql: remove unused module [software/spicerack] - 10https://gerrit.wikimedia.org/r/1087855
[14:02:05] <wikibugs>	 (03PS5) 10Volans: mysql_legacy: rename to mysql [software/spicerack] - 10https://gerrit.wikimedia.org/r/1087856
[14:02:06] <wikibugs>	 (03PS2) 10Volans: mysql: make fetch_one_row return always a dict [software/spicerack] - 10https://gerrit.wikimedia.org/r/1091278
[14:02:06] <wikibugs>	 (03PS1) 10Volans: mysql_legacy: improve DRY-RUN support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092253
[14:04:36] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
[14:09:01] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
[14:09:17] <wikibugs>	 (03PS3) 10Volans: sre.switchdc.databases: use mysql native methods [cookbooks] - 10https://gerrit.wikimedia.org/r/1087860
[14:09:18] <wikibugs>	 (03PS2) 10Volans: Adapt to new Spicerack API renaming mysql_legacy [cookbooks] - 10https://gerrit.wikimedia.org/r/1087861
[14:10:44] <wikibugs>	 (03CR) 10Xcollazo: "I didn't see Iceberg being put in the `/jars` folder of this Spark distribution?" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1092194 (https://phabricator.wikimedia.org/T380035) (owner: 10Btullis)
[14:11:03] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
[14:11:07] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:11:31] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:13:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: confd_prometheus_metrics.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:15:10] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Adapt to new Spicerack API renaming mysql_legacy [cookbooks] - 10https://gerrit.wikimedia.org/r/1087861 (owner: 10Volans)
[14:15:21] <claime>	 bgp issues are probably me putting a k8s node into failed
[14:15:33] <jinxer-wm>	 FIRING: KubernetesCalicoDown: wikikube-worker1306.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=wikikube-worker1306.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[14:15:33] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.switchdc.databases: use mysql native methods [cookbooks] - 10https://gerrit.wikimedia.org/r/1087860 (owner: 10Volans)
[14:15:55] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] "found a typo, otherwise lgtm!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092253 (owner: 10Volans)
[14:16:34] <claime>	 !log running homer 'cr*-eqiad' 'T379454'
[14:16:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:39] <stashbot>	 T379454: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454
[14:18:25] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: confd_prometheus_metrics.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:18:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1306:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1306 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:19:17] <wikibugs>	 (03PS4) 10Effie Mouzeli: mediawiki: Add mwcron feature [deployment-charts] - 10https://gerrit.wikimedia.org/r/1076746 (https://phabricator.wikimedia.org/T341555) (owner: 10Clément Goubert)
[14:20:31] <wikibugs>	 (03CR) 10Effie Mouzeli: "missing chart bump" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1076746 (https://phabricator.wikimedia.org/T341555) (owner: 10Clément Goubert)
[14:27:12] <wikibugs>	 (03PS1) 10KartikMistry: Enable the Contribute menu in 3rd group of Wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092257 (https://phabricator.wikimedia.org/T375301)
[14:27:40] <wikibugs>	 (03PS1) 10Peter Fischer: CirrusSearch: enable offloading weighted tags via EventBus for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092258 (https://phabricator.wikimedia.org/T378983)
[14:27:56] <Lucas_WMDE>	 jouncebot: now
[14:27:56] <jouncebot>	 For the next 0 hour(s) and 32 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1400)
[14:28:04] <Lucas_WMDE>	 did it not announce the beginning of the backport window? o_O
[14:28:25] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: confd_prometheus_metrics.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:28:25] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Enable the Contribute menu in 3rd group of Wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092257 (https://phabricator.wikimedia.org/T375301) (owner: 10KartikMistry)
[14:28:32] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1305-1312].eqiad.wmnet
[14:28:45] <Lucas_WMDE>	 anyway… if it’s okay with everyone else, I’d quite like to deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1091605 (cc TheresNoTime, Jdlrobson)
[14:29:02] <Lucas_WMDE>	 I could reproduce the issue, so I’d be comfortable testing it myself
[14:29:04] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092258 (https://phabricator.wikimedia.org/T378983) (owner: 10Peter Fischer)
[14:29:06] <wikibugs>	 (03PS2) 10Volans: mysql_legacy: improve DRY-RUN support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092253
[14:29:06] <wikibugs>	 (03PS6) 10Volans: mysql: remove unused module [software/spicerack] - 10https://gerrit.wikimedia.org/r/1087855
[14:29:06] <wikibugs>	 (03PS6) 10Volans: mysql_legacy: rename to mysql [software/spicerack] - 10https://gerrit.wikimedia.org/r/1087856
[14:29:07] <wikibugs>	 (03PS3) 10Volans: mysql: make fetch_one_row return always a dict [software/spicerack] - 10https://gerrit.wikimedia.org/r/1091278
[14:29:07] <Lucas_WMDE>	 and it sounds like users are getting antsy about it
[14:30:00] <wikibugs>	 (03PS1) 10Sbisson: Unified dashboard: Add UI for page collection recommendations [extensions/ContentTranslation] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092259 (https://phabricator.wikimedia.org/T368718)
[14:30:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1091605 (https://phabricator.wikimedia.org/T379983) (owner: 10Samtar)
[14:30:29] <Lucas_WMDE>	 I’ll go ahead and start the scap, there’s plenty of time during gate-and-submit if anyone wants to stop me :)
[14:31:11] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select optin method for webauthn - https://phabricator.wikimedia.org/T380178 (10MoritzMuehlenhoff) 03NEW
[14:31:21] <Lucas_WMDE>	 jouncebot: next
[14:31:21] <jouncebot>	 In 1 hour(s) and 58 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1630)
[14:31:26] <Lucas_WMDE>	 ok, good, there’s a break after this window
[14:31:33] <Lucas_WMDE>	 because the gate-and-submit might not finish in time otherwise :|
[14:32:15] <wikibugs>	 (03PS2) 10KartikMistry: Enable the Contribute menu in 3rd group of Wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092257 (https://phabricator.wikimedia.org/T375301)
[14:32:21] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1305-1312].eqiad.wmnet
[14:32:51] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Select opt-in method for webauthn - https://phabricator.wikimedia.org/T380178#10331471 (10MoritzMuehlenhoff) p:05Triage→03Medium
[14:32:55] * Lucas_WMDE peeks at jouncebot’s logs
[14:33:50] <Lucas_WMDE>	 well, it says “Deploy timer kicked. Attempting to notify.”
[14:33:53] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Evaluate supported for trusted devices - https://phabricator.wikimedia.org/T380179 (10MoritzMuehlenhoff) 03NEW
[14:33:54] <Lucas_WMDE>	 at 14:00 UTC
[14:35:54] <kart_>	 I also want to deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/1092259 :D
[14:36:08] <Lucas_WMDE>	 oh dear
[14:36:17] <Lucas_WMDE>	 i18n changes make for a very slow backport :/
[14:36:32] <kart_>	 Yeah, but it is some unbreak change :/
[14:36:41] <kart_>	 'UBN' :D
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:48] <Lucas_WMDE>	 > Medium
[14:36:49] <Lucas_WMDE>	 o_O
[14:37:09] * Lucas_WMDE looks at CI time of other CX changes
[14:37:37] <Lucas_WMDE>	 18 minutes
[14:37:45] <Lucas_WMDE>	 I don’t think we can fit that in before the core backport, then
[14:38:03] <Lucas_WMDE>	 I guess we can still do it out-of-window before the portals update…
[14:38:19] <kart_>	 I can wait, no issue. Dinner on the desk!
[14:38:48] <kart_>	 We need to check if portal updates are happening.
[14:40:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10331504 (10Jhancock.wm)
[14:40:57] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180 (10MoritzMuehlenhoff) 03NEW
[14:41:27] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896#10331515 (10Jhancock.wm)
[14:42:37] <Lucas_WMDE>	 filed T380181 for jouncebot’s issue FTR
[14:42:38] <stashbot>	 T380181: jouncebot did not announce 2024-11-18 UTC afternoon backport window for no apparent reason - https://phabricator.wikimedia.org/T380181
[14:43:54] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 #page on db2216 is OK: OK slave_sql_lag Replication lag: 5.17 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:44:09] <XioNoX>	 welcome back db2216
[14:47:56] <wikibugs>	 (03CR) 10Volans: [C:03+2] mysql_legacy: improve DRY-RUN support (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092253 (owner: 10Volans)
[14:49:37] <wikibugs>	 10ops-eqiad, 06DC-Ops: Inbound interface errors - https://phabricator.wikimedia.org/T380182 (10phaultfinder) 03NEW
[14:51:03] <kart_>	 Lucas_WMDE: we need to wait till core change is merged, right? Can I do +2 to my change after that or wait till deployment is over?
[14:51:22] <Lucas_WMDE>	 kart_: you can +2 it once the deployment for the core change has properly started, I’d say
[14:51:37] <wikibugs>	 (03CR) 10Urbanecm: [C:04-1] CirrusSearch: enable offloading weighted tags via EventBus for testwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092258 (https://phabricator.wikimedia.org/T378983) (owner: 10Peter Fischer)
[14:51:38] <Lucas_WMDE>	 if you +2 it now there’s a risk it’ll merge before the core change, and then get included in that deployment, which we don’t want
[14:51:42] <wikibugs>	 (03PS4) 10Elukey: redfish: add response logging for request() [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092193
[14:52:07] <wikibugs>	 (03CR) 10Elukey: redfish: add response logging for request() (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092193 (owner: 10Elukey)
[14:52:17] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2150 slowly with 10 steps - slow repool db2150 T380117
[14:52:20] <stashbot>	 T380117: Corrupt index on db2150 - https://phabricator.wikimedia.org/T380117
[14:52:53] <kart_>	 Right Lucas_WMDE
[14:53:10] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091197 (https://phabricator.wikimedia.org/T354939) (owner: 10Urbanecm)
[14:53:29] <kart_>	 Lucas_WMDE: Please ping me when it starts.. I would `git fetch dinner` meanwhile..
[14:53:45] <Lucas_WMDE>	 okay :)
[14:54:38] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T379668#10331550 (10phaultfinder)
[14:56:14] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM, possible idea inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092193 (owner: 10Elukey)
[14:56:43] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.pool db2216 slowly with 10 steps - slow motion repool T380131
[14:56:46] <stashbot>	 T380131: Corrupt index on db2216 - https://phabricator.wikimedia.org/T380131
[14:56:47] <logmsgbot>	 !log arnaudb@cumin1002 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2216 slowly with 10 steps - slow motion repool T380131
[14:57:52] <wikibugs>	 (03Merged) 10jenkins-bot: mysql_legacy: improve DRY-RUN support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092253 (owner: 10Volans)
[14:59:47] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'manual repool commit', diff saved to https://phabricator.wikimedia.org/P71076 and previous config saved to /var/cache/conftool/dbconfig/20241118-145946-arnaudb.json
[15:00:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71077 and previous config saved to /var/cache/conftool/dbconfig/20241118-150020-arnaudb.json
[15:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:03:13] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Allow other input and changes to trigger searchsuggestions to update" [core] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1091605 (https://phabricator.wikimedia.org/T379983) (owner: 10Samtar)
[15:03:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: confd_prometheus_metrics.service on wikikube-worker1306:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:03:30] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1091605|Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]]
[15:03:35] <stashbot>	 T379983: RangeError: Maximum call stack size exceeded in mediawiki.searchSuggest - https://phabricator.wikimedia.org/T379983
[15:05:32] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Evaluate supported for trusted devices - https://phabricator.wikimedia.org/T380179#10331597 (10MoritzMuehlenhoff) p:05Triage→03Medium
[15:05:44] <wikibugs>	 06SRE, 10CAS-SSO, 06Infrastructure-Foundations: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180#10331598 (10MoritzMuehlenhoff) p:05Triage→03Medium
[15:06:20] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "+2ing ahead of deployment" [extensions/ContentTranslation] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092259 (https://phabricator.wikimedia.org/T368718) (owner: 10Sbisson)
[15:06:24] <Lucas_WMDE>	 kart_: ^ fyi
[15:06:32] <Lucas_WMDE>	 (core deployment still ongoing)
[15:06:38] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 samtar, lucaswerkmeister-wmde: Backport for [[gerrit:1091605|Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:06:42] <Lucas_WMDE>	 testing…
[15:06:44] <kart_>	 Thanks!
[15:07:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 samtar, lucaswerkmeister-wmde: Continuing with sync
[15:07:10] <Lucas_WMDE>	 yup, fixes the weird search arrow key issue at least
[15:09:46] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T379668#10331608 (10phaultfinder)
[15:11:45] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1091605|Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] (duration: 08m 14s)
[15:11:57] <stashbot>	 T379983: RangeError: Maximum call stack size exceeded in mediawiki.searchSuggest - https://phabricator.wikimedia.org/T379983
[15:13:27] <wikibugs>	 (03CR) 10TChin: [C:03+2] EventStreamConfig: Enable Hive Ingestion for most streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1089967 (https://phabricator.wikimedia.org/T369845) (owner: 10TChin)
[15:14:55] <wikibugs>	 (03Merged) 10jenkins-bot: EventStreamConfig: Enable Hive Ingestion for most streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1089967 (https://phabricator.wikimedia.org/T369845) (owner: 10TChin)
[15:16:41] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/ContentTranslation] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092259 (https://phabricator.wikimedia.org/T368718) (owner: 10Sbisson)
[15:17:30] <Lucas_WMDE>	 kart_: ^ fyi
[15:17:43] <Lucas_WMDE>	 scap backport is running now (and waiting for the merge)
[15:18:31] <Lucas_WMDE>	 though I don’t know what happens to TChin’s config change above…
[15:19:04] <Lucas_WMDE>	 (I don’t see any other scap locks being held, at least)
[15:21:37] <kart_>	 Nice!
[15:22:25] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] "Oops, didn't realise I hadn't +1ed." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1088271 (https://phabricator.wikimedia.org/T376438) (owner: 10Jgiannelos)
[15:23:00] <Lucas_WMDE>	 tchin: is it okay to deploy your EventStreamConfig change?
[15:23:14] <Lucas_WMDE>	 because IIUC, it will be included in my ongoing backport (unless it gets reverted in the meantime)
[15:24:11] <wikibugs>	 (03PS1) 10Hnowlan: team-sre: add thumbor alert for pods with high error rates [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559)
[15:26:45] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm
[15:26:55] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "Note: if I’m not mistaken, I’m about to deploy this as part of the backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1089967 (https://phabricator.wikimedia.org/T369845) (owner: 10TChin)
[15:27:50] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm
[15:28:39] <Lucas_WMDE>	 CI almost finished, apparently
[15:28:47] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] team-sre: add thumbor alert for pods with high error rates [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:29:04] <wikibugs>	 (03Merged) 10jenkins-bot: Unified dashboard: Add UI for page collection recommendations [extensions/ContentTranslation] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092259 (https://phabricator.wikimedia.org/T368718) (owner: 10Sbisson)
[15:29:27] <Lucas_WMDE>	 “The following are unexpected commits pulled from origin for /srv/mediawiki-staging”
[15:29:28] <Lucas_WMDE>	 there it is
[15:29:39] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm
[15:29:55] <Lucas_WMDE>	 I guess I’ll go ahead with that in a moment if I don’t hear anything else
[15:30:15] <kart_>	 ah.
[15:30:39] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm
[15:30:59] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1092259|Unified dashboard: Add UI for page collection recommendations (T368718)]]
[15:31:13] <stashbot>	 T368718: Community-defined Translation Collections: Single selection mode UI - https://phabricator.wikimedia.org/T368718
[15:31:17] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
[15:31:53] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm
[15:33:18] <wikibugs>	 (03CR) 10Scott French: team-sre: add thumbor alert for pods with high error rates (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:33:19] <icinga-wm>	 PROBLEM - BGP status on lsw1-e5-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Connect - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:34:17] <icinga-wm>	 PROBLEM - BGP status on lsw1-e6-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Connect - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:34:17] <icinga-wm>	 PROBLEM - BGP status on lsw1-e7-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv4: Connect - kubernetes-eqiad, AS64601/IPv6: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:35:15] <icinga-wm>	 PROBLEM - BGP status on lsw1-f5-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:36:02] <claime>	 that's my reimages, no worries
[15:36:13] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm
[15:36:51] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm
[15:39:19] <icinga-wm>	 PROBLEM - BGP status on lsw1-f6-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:39:45] <stephanebisson>	 Hi, I can test gerrit:1092259 when it's on a test server, let me know
[15:39:51] <icinga-wm>	 PROBLEM - BGP status on lsw1-f7-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:40:16] <kart_>	 Lucas_WMDE: still syncing to mwdebugs? :/
[15:40:36] <Lucas_WMDE>	 yup
[15:40:50] <Lucas_WMDE>	 i18n changes mean a big image diff, IIUC
[15:40:52] <wikibugs>	 (03PS1) 10Ssingh: Revert^3 "cp7001: temporarily set check_min_fe_mem to true" [puppet] - 10https://gerrit.wikimedia.org/r/1092267
[15:41:20] <wikibugs>	 (03PS2) 10Hnowlan: team-sre: add thumbor alert for pods with high error rates [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559)
[15:41:33] <wikibugs>	 (03CR) 10Hnowlan: team-sre: add thumbor alert for pods with high error rates (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:41:45] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Revert^3 "cp7001: temporarily set check_min_fe_mem to true" [puppet] - 10https://gerrit.wikimedia.org/r/1092267 (owner: 10Ssingh)
[15:41:54] <wikibugs>	 06SRE, 10Observability-Alerting, 06Traffic: PuppetFailure alert is not being fired for host(s) where agent has failed - https://phabricator.wikimedia.org/T379807#10331748 (10ssingh) 05Open→03Resolved a:03ssingh ` 10:38:48 < jinxer-wm> FIRING: PuppetZeroResources: Puppet has failed generate resource...
[15:42:50] <wikibugs>	 (03CR) 10Effie Mouzeli: team-sre: add thumbor alert for pods with high error rates (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:42:59] <wikibugs>	 06SRE, 10Bitu, 06Infrastructure-Foundations: Allow to provide links for Bitu permissions - https://phabricator.wikimedia.org/T379926#10331754 (10SLyngshede-WMF) p:05Triage→03Low
[15:45:17] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
[15:45:37] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 sbisson, lucaswerkmeister-wmde: Backport for [[gerrit:1092259|Unified dashboard: Add UI for page collection recommendations (T368718)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:45:43] <stashbot>	 T368718: Community-defined Translation Collections: Single selection mode UI - https://phabricator.wikimedia.org/T368718
[15:45:51] <Lucas_WMDE>	 kart_ / stephanebisson: please test :)
[15:46:19] <stephanebisson>	 Which server do I pick in the browser extension?
[15:46:25] <wikibugs>	 (03CR) 10Elukey: [C:03+2] redfish: add response logging for request() (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092193 (owner: 10Elukey)
[15:46:39] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
[15:47:09] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] debug.json: add support for mwdebug-next [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1076848 (https://phabricator.wikimedia.org/T372605) (owner: 10Scott French)
[15:47:14] <kart_>	 stephanebisson: You can pick any of mwdebug1001/1002/2001/2002
[15:47:50] <stephanebisson>	 kart_ Lucas_WMDE Working fine AFAICT
[15:47:57] <claime>	 you should be testing on k8s actually
[15:48:10] <Lucas_WMDE>	 yeah, k8s-mwdebug is the one to pick most of the time
[15:48:15] <claime>	 mwdebugs are going away in the near-ish future
[15:48:31] <Lucas_WMDE>	 (though changes still get deployed to them at the moment, so you *can* also test there IIUC)
[15:48:32] <claime>	 and 100% of client-facing prod is on k8s
[15:48:39] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
[15:48:39] <wikibugs>	 (03CR) 10Scott French: [C:03+1] team-sre: add thumbor alert for pods with high error rates (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:48:51] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10331780 (10cmooney) p:05Triage→03Medium
[15:48:51] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
[15:48:53] <claime>	 yes, they are still scap targets and get the new code, so it will work
[15:49:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 sbisson, lucaswerkmeister-wmde: Continuing with sync
[15:49:08] <Lucas_WMDE>	 anyway, I’ll continue
[15:49:11] <kart_>	 stephanebisson: cool!
[15:49:11] <Lucas_WMDE>	 it’ll take long enough
[15:49:11] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] team-sre: add thumbor alert for pods with high error rates [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:49:13] <Lucas_WMDE>	 jouncebot: next
[15:49:13] <jouncebot>	 In 0 hour(s) and 40 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1630)
[15:49:18] <wikibugs>	 (03CR) 10TChin: [C:03+2] "That's perfectly fine, thanks!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1089967 (https://phabricator.wikimedia.org/T369845) (owner: 10TChin)
[15:49:20] <Lucas_WMDE>	 ah ok, it’s at half past not at the full hour
[15:49:22] <Lucas_WMDE>	 should finish in time then
[15:49:53] <kart_>	 claime: That's new info :) Please let wikitech-l know also!
[15:49:53] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
[15:50:36] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
[15:51:00] <wikibugs>	 (03CR) 10Hnowlan: team-sre: add thumbor alert for pods with high error rates (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:51:10] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] team-sre: add thumbor alert for pods with high error rates [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:51:27] <claime>	 kart_: I could have sworn we'd sent out an email about mwdebug targets but apparently not since we went to 1% of global traffic...
[15:51:41] <Lucas_WMDE>	 https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/2DXHPFD22DUO2EWNL6AVMYF74VPDBYQM/ was the most recent relevant email I found
[15:51:46] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
[15:51:51] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
[15:51:52] <Lucas_WMDE>	 https://wikitech.wikimedia.org/wiki/WikimediaDebug#Available_backends also sounds more outdated than I realized :/
[15:52:18] <claime>	 About the end of mwdebugs: we are not really at the announcement stage yet, but it'll come, and we'll make an announcement in due time
[15:52:21] <wikibugs>	 (03Merged) 10jenkins-bot: team-sre: add thumbor alert for pods with high error rates [alerts] - 10https://gerrit.wikimedia.org/r/1092265 (https://phabricator.wikimedia.org/T379559) (owner: 10Hnowlan)
[15:52:24] <MatmaRex>	 hi folks, i have some puppet config and MW config changes - all beta-only - that would ideally be deployed around the same time: https://gerrit.wikimedia.org/r/q/bug:T379811 is that possible, or should i just schedule them for their separate windows?
[15:52:25] <stashbot>	 T379811: Update URL structure for SUL3 shared domain - https://phabricator.wikimedia.org/T379811
[15:52:35] <kart_>	 claime: Thanks!
[15:52:57] <dancy>	 MatmaRex: `scap backport` should handle beta-only changes efficiently
[15:53:00] <claime>	 there will be an announcement for mwdebug-next soon, I think we can group all mwdebug target info in there
[15:53:48] <claime>	 Lucas_WMDE: I'll update the available backend section after my meeting, thanks for pointing it out
[15:54:01] <Lucas_WMDE>	 sounds good, thanks!
[15:54:31] <Lucas_WMDE>	 (I’d try it myself but I have no idea if the other WikimediaDebug features work on k8s by now or not, so happy to leave that to you)
[15:54:33] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
[15:54:42] <Lucas_WMDE>	 kart_: deployment is ongoing fyi (53% rn)
[15:55:11] <MatmaRex>	 dancy: puppet too?
[15:55:13] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
[15:55:42] <dancy>	 MatmaRex: Ah, didn't realize you were referencing puppet changes. Disregard. :-)
[15:56:02] <kart_>	 Lucas_WMDE: noted!
[15:56:02] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
[15:56:25] <claime>	 Lucas_WMDE: xhgui, excimer etc. worj
[15:56:29] <claime>	 work*
[15:56:37] <Lucas_WMDE>	 nice
[15:56:38] <claime>	 I have to check the verbose logging one
[15:57:57] <wikibugs>	 (03Merged) 10jenkins-bot: redfish: add response logging for request() [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092193 (owner: 10Elukey)
[15:58:06] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
[15:58:16] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1092259|Unified dashboard: Add UI for page collection recommendations (T368718)]] (duration: 27m 17s)
[15:58:19] <stashbot>	 T368718: Community-defined Translation Collections: Single selection mode UI - https://phabricator.wikimedia.org/T368718
[15:58:23] * Lucas_WMDE done deploying
[15:58:35] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[15:58:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:04] <kart_>	 Thanks a lot Lucas_WMDE!
[16:01:17] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
[16:01:41] <wikibugs>	 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading: Unable to obtain exclusive write permission. Someone else is doing something with this file. - https://phabricator.wikimedia.org/T379234#10331888 (10Aklapper)
[16:03:49] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10331908 (10RobH)
[16:04:35] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
[16:06:53] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
[16:07:08] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
[16:07:25] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v8.16.2 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092278
[16:08:35] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm
[16:10:04] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1092250 (https://phabricator.wikimedia.org/T378921) (owner: 10Muehlenhoff)
[16:10:36] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm
[16:11:13] <wikibugs>	 (03CR) 10Volans: [C:03+2] CHANGELOG: add changelogs for release v8.16.2 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1092278 (owner: 10Volans)
[16:11:31] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
[16:12:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Update site.pp [puppet] - 10https://gerrit.wikimedia.org/r/1092250 (https://phabricator.wikimedia.org/T378921) (owner: 10Muehlenhoff)
[16:12:42] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:13:20] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
[16:13:25] <icinga-wm>	 RECOVERY - BGP status on lsw1-e7-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:14:02] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
[16:16:01] <icinga-wm>	 RECOVERY - BGP status on lsw1-f7-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:16:25] <icinga-wm>	 RECOVERY - BGP status on lsw1-e6-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:16:53] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm
[16:17:42] <jinxer-wm>	 RESOLVED: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:18:07] <icinga-wm>	 RECOVERY - BGP status on lsw1-e5-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:18:23] <icinga-wm>	 RECOVERY - BGP status on lsw1-f5-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:18:41] <icinga-wm>	 RECOVERY - BGP status on cr1-eqiad is OK: BGP OK - up: 670, down: 7, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:18:45] <icinga-wm>	 RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 713, down: 13, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:18:46] <wikibugs>	 (03CR) 10Bking: "Re: link to conversation, it's in #wikimedia-k8s-sig IRC channel. Exact quote: " IIRC gets you two things basically, ProbeDown alerts for " [puppet] - 10https://gerrit.wikimedia.org/r/1090977 (https://phabricator.wikimedia.org/T365659) (owner: 10Bking)
[16:18:48] <wikibugs>	 (03PS1) 10Effie Mouzeli: memcached: add mc-gp100[4-6] gutter servers to pool [puppet] - 10https://gerrit.wikimedia.org/r/1092280 (https://phabricator.wikimedia.org/T377033)
[16:19:01] <icinga-wm>	 PROBLEM - BGP status on lsw1-f7-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:19:19] <icinga-wm>	 RECOVERY - Disk space on wikikube-worker1306 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=wikikube-worker1306&var-datasource=eqiad+prometheus/ops
[16:19:36] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm
[16:19:51] <wikibugs>	 (03PS1) 10Volans: Upstream release v8.16.2 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/1092281
[16:20:01] <icinga-wm>	 RECOVERY - BGP status on lsw1-f7-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:20:03] <wikibugs>	 (03CR) 10Volans: [C:03+2] Upstream release v8.16.2 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/1092281 (owner: 10Volans)
[16:22:23] <icinga-wm>	 PROBLEM - BGP status on lsw1-f5-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:22:25] <icinga-wm>	 RECOVERY - BGP status on lsw1-f6-eqiad.mgmt is OK: BGP OK - up: 16, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:22:44] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm
[16:23:18] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for dbrant - https://phabricator.wikimedia.org/T379678#10332035 (10Seddon) Approved
[16:23:25] <icinga-wm>	 RECOVERY - BGP status on lsw1-f5-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:25:18] <wikibugs>	 (03PS1) 10Effie Mouzeli: memcached: add mc-gp200[4-6] gutter servers [puppet] - 10https://gerrit.wikimedia.org/r/1092282 (https://phabricator.wikimedia.org/T377033)
[16:25:57] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm
[16:26:27] <icinga-wm>	 PROBLEM - BGP status on lsw1-f6-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64601/IPv6: Connect - kubernetes-eqiad, AS64601/IPv4: Connect - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:27:27] <icinga-wm>	 RECOVERY - BGP status on lsw1-f6-eqiad.mgmt is OK: BGP OK - up: 16, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:27:28] <wikibugs>	 (03PS1) 10Muehlenhoff: Add ferm macro/nftables set for aux pods like for other k8s installations [puppet] - 10https://gerrit.wikimedia.org/r/1092283
[16:28:51] <wikibugs>	 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading: Unable to obtain exclusive write permission. Someone else is doing something with this file. - https://phabricator.wikimedia.org/T379234#10332105 (10MatthewVernon) I'm afraid we don't keep swift logs far enough back to 7th November, so I can't provide an...
[16:30:05] <jouncebot>	 jan_drewniak: Time to snap out of that daydream and deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1630).
[16:30:15] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144#10332108 (10Papaul) maps-test2001 - ganeti2009 maps-test2002 - ganeti2010 maps-test2003 - ganeti2013 maps-test2004 - ganeti2014 maps-test2005 - ganeti2015 maps-test2001 - g...
[16:30:20] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm
[16:34:23] <volans>	 !log uploaded spicerack_8.16.2 to apt.wikimedia.org bullseye-wikimedia
[16:34:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:37] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet
[16:34:40] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet
[16:35:30] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] Add ferm macro/nftables set for aux pods like for other k8s installations [puppet] - 10https://gerrit.wikimedia.org/r/1092283 (owner: 10Muehlenhoff)
[16:37:49] <wikibugs>	 (03CR) 10CDanis: [C:03+1] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1092283 (owner: 10Muehlenhoff)
[16:38:48] <volans>	 !log installing spicerack v8.16.2 on cumin2002
[16:38:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:42] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[16:50:54] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10332241 (10RobH) a:05RobH→03None
[16:50:58] <volans>	 !log installing spicerack v8.16.2 on cumin1002
[16:51:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:20] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10332236 (10RobH) 05Open→03Resolved a:03RobH @wiki_willy: I just wanted to notify you of this task's resolution and you'll see the N...
[16:54:27] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on db1246 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:54:41] <icinga-wm>	 PROBLEM - MariaDB read only s2 on db1246 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:55:09] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002"
[16:55:43] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002"
[16:55:43] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:55:50] <wikibugs>	 (03PS1) 10Effie Mouzeli: memcached: add mc-gp200[4-6] gutter servers to pool [puppet] - 10https://gerrit.wikimedia.org/r/1092290 (https://phabricator.wikimedia.org/T377033)
[16:57:24] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10332303 (10Jhancock.wm)
[16:58:33] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10332308 (10Jhancock.wm) 2163 is being a pain. gonna take a closer look today. failed during imaging but didn't catch the error.
[17:00:30] <wikibugs>	 (03PS1) 10Dreamy Jazz: [Beta] Re-enable IP masking on beta metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092292 (https://phabricator.wikimedia.org/T379108)
[17:01:30] <Dreamy_Jazz>	 jouncebot: nowandnext
[17:01:31] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 58 minute(s)
[17:01:31] <jouncebot>	 In 0 hour(s) and 58 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1800)
[17:01:31] <jouncebot>	 In 0 hour(s) and 58 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1800)
[17:01:41] <Dreamy_Jazz>	 Going to do a beta only deploy now if that's okay
[17:02:08] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092292 (https://phabricator.wikimedia.org/T379108) (owner: 10Dreamy Jazz)
[17:02:57] <wikibugs>	 (03Merged) 10jenkins-bot: [Beta] Re-enable IP masking on beta metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092292 (https://phabricator.wikimedia.org/T379108) (owner: 10Dreamy Jazz)
[17:09:01] <icinga-wm>	 PROBLEM - SSH on bast7001 is CRITICAL: Server answer: Exceeded MaxStartups https://wikitech.wikimedia.org/wiki/SSH/monitoring
[17:10:01] <icinga-wm>	 RECOVERY - SSH on bast7001 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[17:21:45] <wikibugs>	 (03CR) 10Ebernhardson: [C:03+2] cirrus: Drop labtestwiki exclude [deployment-charts] - 10https://gerrit.wikimedia.org/r/1091589 (https://phabricator.wikimedia.org/T378260) (owner: 10Majavah)
[17:22:51] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus: Drop labtestwiki exclude [deployment-charts] - 10https://gerrit.wikimedia.org/r/1091589 (https://phabricator.wikimedia.org/T378260) (owner: 10Majavah)
[17:23:29] <wikibugs>	 06SRE, 06Editing-team, 10MediaWiki-Debug-Logger, 10observability, and 4 others: Flow internal error on frwiki not in logstash - https://phabricator.wikimedia.org/T371586#10332629 (10Urbanecm_WMF)
[17:24:39] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[17:25:49] <logmsgbot>	 !log xcollazo@deploy2002 Started deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. T368755.
[17:25:52] <stashbot>	 T368755: Python job that reads from wmf_dumps.wikitext_inconsistent_row and produced reconciliation events. - https://phabricator.wikimedia.org/T368755
[17:27:59] <logmsgbot>	 !log xcollazo@deploy2002 Finished deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. T368755. (duration: 02m 10s)
[17:30:30] <wikibugs>	 (03PS1) 10Urbanecm: [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380161)
[17:30:34] <wikibugs>	 (03PS2) 10Urbanecm: [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204)
[17:31:31] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:31:39] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:31:43] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204) (owner: 10Urbanecm)
[17:31:56] <wikibugs>	 (03PS1) 10Jdlrobson: Promote Vector 2022 as default on 3 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092296 (https://phabricator.wikimedia.org/T379765)
[17:33:15] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:34:11] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 08 Feb 2025 11:19:52 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:34:31] <wikibugs>	 (03PS3) 10Urbanecm: [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204)
[17:34:48] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[17:37:15] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:37:31] <wikibugs>	 (03PS1) 10Urbanecm: Create no-link-recommendation variant [extensions/GrowthExperiments] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092300 (https://phabricator.wikimedia.org/T377787)
[17:37:42] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: lsw-d[18]-codfw missing console port info in netbox - https://phabricator.wikimedia.org/T376917#10332761 (10Jhancock.wm) 05Open→03Resolved
[17:41:10] <wikibugs>	 (03CR) 10Urbanecm: [C:04-1] CirrusSearch: enable offloading weighted tags via EventBus for testwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092258 (https://phabricator.wikimedia.org/T378983) (owner: 10Peter Fischer)
[17:41:19] <wikibugs>	 (03PS2) 10Urbanecm: CirrusSearch: enable offloading weighted tags via EventBus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092258 (https://phabricator.wikimedia.org/T378983) (owner: 10Peter Fischer)
[17:41:27] <wikibugs>	 (03CR) 10Urbanecm: CirrusSearch: enable offloading weighted tags via EventBus (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092258 (https://phabricator.wikimedia.org/T378983) (owner: 10Peter Fischer)
[17:43:05] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 08 Feb 2025 11:19:52 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:43:23] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 52922 bytes in 0.113 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:43:33] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.179 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:47:37] <icinga-wm>	 PROBLEM - Checks that the local airflow scheduler for airflow @analytics is working properly on an-launcher1002 is CRITICAL: CRITICAL: /usr/bin/env PYTHONPATH=/srv/deployment/airflow-dags/analytics AIRFLOW_HOME=/srv/airflow-analytics /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-launcher1002.eqiad.wmnet did not succeed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[17:49:37] <icinga-wm>	 RECOVERY - Checks that the local airflow scheduler for airflow @analytics is working properly on an-launcher1002 is OK: OK: /usr/bin/env PYTHONPATH=/srv/deployment/airflow-dags/analytics AIRFLOW_HOME=/srv/airflow-analytics /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-launcher1002.eqiad.wmnet succeeded https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[17:50:28] <wikibugs>	 06SRE, 10envoy, 06serviceops, 06Traffic: Upgrade Envoy to >= 1.24 - https://phabricator.wikimedia.org/T380211 (10JMeybohm) 03NEW
[17:50:41] <wikibugs>	 06SRE, 10envoy, 06serviceops, 06Traffic: Upgrade Envoy to >= 1.24 - https://phabricator.wikimedia.org/T380211#10332849 (10JMeybohm)
[17:50:49] <wikibugs>	 06SRE, 10envoy, 06serviceops, 06Traffic, 13Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324#10332850 (10JMeybohm)
[17:51:07] <wikibugs>	 (03PS1) 10Bvibber: Use WAN cache for JsonConfig remote fetch cache [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746)
[17:51:35] <wikibugs>	 06SRE, 10envoy, 06serviceops, 07Kubernetes, 07Service-Architecture: Upgrade envoy configuration to use the v3 API - https://phabricator.wikimedia.org/T265880#10332798 (10JMeybohm) 05Open→03Resolved a:03JMeybohm I believe this is done https://gerrit.wikimedia.org/r/c/operations/puppet/+/754460
[17:51:40] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 18 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746) (owner: 10Bvibber)
[17:52:21] <wikibugs>	 (03PS2) 10Jdlrobson: Promote Vector 2022 as default on 3 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092296 (https://phabricator.wikimedia.org/T379765)
[17:53:28] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
[17:53:34] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10332867 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bullseye
[17:54:37] <wikibugs>	 (03CR) 10Stoyofuku-wmf: [C:03+1] "This looks correct to me!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092296 (https://phabricator.wikimedia.org/T379765) (owner: 10Jdlrobson)
[17:57:36] <wikibugs>	 (03PS1) 10Papaul: Add test maps nodes to site.pp and preseed.yaml file [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144)
[17:58:14] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add test maps nodes to site.pp and preseed.yaml file [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144) (owner: 10Papaul)
[17:59:07] <wikibugs>	 (03PS3) 10Bking: dse-k8s-services: introduce Blunderbuss config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1091827 (https://phabricator.wikimedia.org/T371994)
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1800)
[18:00:05] <jouncebot>	 ryankemper: That opportune time for a Wikidata Query Service weekly deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1800).
[18:00:41] <wikibugs>	 (03CR) 10CI reject: [V:04-1] dse-k8s-services: introduce Blunderbuss config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1091827 (https://phabricator.wikimedia.org/T371994) (owner: 10Bking)
[18:01:58] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:02:04] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092300 (https://phabricator.wikimedia.org/T377787) (owner: 10Urbanecm)
[18:02:14] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204) (owner: 10Urbanecm)
[18:03:09] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:03:38] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:04:11] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:08:45] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:09:21] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:09:27] <wikibugs>	 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10332939 (10eoghan) We had a quick chat with ITS today where they disabled the change that caused the routing to change, an...
[18:11:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Use WAN cache for JsonConfig remote fetch cache [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746) (owner: 10Bvibber)
[18:12:32] <logmsgbot>	 !log jhathaway@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
[18:12:39] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10332960 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bullseye ex...
[18:13:47] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:14:08] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:15:24] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:15:28] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:16:22] <wikibugs>	 (03CR) 10Bvibber: "recheck" [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746) (owner: 10Bvibber)
[18:17:11] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[18:24:14] <wikibugs>	 (03PS1) 10Scott French: mw-debug: remove replicas override on -next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092309 (https://phabricator.wikimedia.org/T372604)
[18:26:34] <wikibugs>	 (03PS1) 10Bking: dse-k8s: raise quota for blunderbuss [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092311 (https://phabricator.wikimedia.org/T371994)
[18:27:47] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[18:35:06] <wikibugs>	 (03PS2) 10Scott French: scap: add mw-debug "next" testservers check [puppet] - 10https://gerrit.wikimedia.org/r/1087984 (https://phabricator.wikimedia.org/T372604)
[18:36:49] <wikibugs>	 (03PS2) 10Papaul: Add test maps nodes to site.pp and preseed.yaml file [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144)
[18:37:36] <swfrench-wmf>	 jouncebot: nowandnext
[18:37:36] <jouncebot>	 For the next 0 hour(s) and 22 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T1800)
[18:37:36] <jouncebot>	 In 2 hour(s) and 22 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T2100)
[18:39:29] <swfrench-wmf>	 FYI, I'm going to make a minor helmfile-only change to the mw-debug "next" deployments
[18:39:39] <wikibugs>	 (03CR) 10Scott French: [C:03+2] mw-debug: remove replicas override on -next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092309 (https://phabricator.wikimedia.org/T372604) (owner: 10Scott French)
[18:40:02] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[18:40:24] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] scap: add mw-debug "next" testservers check [puppet] - 10https://gerrit.wikimedia.org/r/1087984 (https://phabricator.wikimedia.org/T372604) (owner: 10Scott French)
[18:40:49] <wikibugs>	 (03Merged) 10jenkins-bot: mw-debug: remove replicas override on -next [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092309 (https://phabricator.wikimedia.org/T372604) (owner: 10Scott French)
[18:41:58] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[18:43:32] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[18:45:07] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[18:46:05] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[18:50:43] <wikibugs>	 (03CR) 10Bking: [C:03+2] "self-merging, as this does not affect production services" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092311 (https://phabricator.wikimedia.org/T371994) (owner: 10Bking)
[18:54:12] <wikibugs>	 (03Merged) 10jenkins-bot: dse-k8s: raise quota for blunderbuss [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092311 (https://phabricator.wikimedia.org/T371994) (owner: 10Bking)
[18:55:52] <wikibugs>	 (03CR) 10Aleksandar Mastilovic: [C:03+1] "LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092311 (https://phabricator.wikimedia.org/T371994) (owner: 10Bking)
[18:56:59] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[18:57:44] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[18:58:06] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[19:00:05] <wikibugs>	 10ops-eqiad, 06DC-Ops, 06serviceops: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10333210 (10Jclark-ctr) Opened ticket with Dell  Advised of i/o errors on sda  and uploaded tsr report   ` [Sat Nov  9 08:53:19 2024] blk_update_request: I/O error, dev sda, sector 0 op 0x1:(...
[19:01:03] <wikibugs>	 10ops-eqiad, 06DC-Ops, 06serviceops: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10333216 (10Jclark-ctr) Confirmed: Service Request 201149035
[19:06:22] <swfrench-wmf>	 FYI, unless there are any objections, I'll be making a second mw-debug related change that will require a noop scap deployment. this will happen in 5-10 minutes.
[19:07:06] <wikibugs>	 (03CR) 10Scott French: [C:03+2] scap: add mw-debug "next" testservers check [puppet] - 10https://gerrit.wikimedia.org/r/1087984 (https://phabricator.wikimedia.org/T372604) (owner: 10Scott French)
[19:08:17] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[19:15:20] <swfrench-wmf>	 moving ahead with the noop scap deployment to test
[19:15:38] <logmsgbot>	 !log swfrench@deploy2002 Started scap sync-world: Test deployment after adding mwdebug-next check command - T372604
[19:15:42] <stashbot>	 T372604: Turn up PHP 8.1-flavored mw-debug k8s deployment - https://phabricator.wikimedia.org/T372604
[19:15:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
[19:17:10] <logmsgbot>	 !log swfrench@deploy2002 Finished scap sync-world: Test deployment after adding mwdebug-next check command - T372604 (duration: 01m 31s)
[19:17:35] <swfrench-wmf>	 all done on my end
[19:17:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10333264 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2163.codfw.wmnet with OS bookworm
[19:18:28] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:18:30] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:18:32] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:21:43] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:22:35] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.194 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:22:59] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:28:45] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:29:59] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:32:29] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2037']
[19:33:05] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2113']
[19:33:51] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2037']
[19:33:52] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2113']
[19:34:08] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
[19:35:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host restbase2037.codfw.wmnet with OS bullseye
[19:35:20] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host elastic2113.codfw.wmnet with OS bullseye
[19:35:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10333378 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host elastic2113.co...
[19:35:56] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896#10333377 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host restbase2037.codfw.wmnet with OS bullseye
[19:36:35] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye
[19:36:45] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10333390 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host elastic2112.co...
[19:37:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
[19:42:04] <wikibugs>	 (03PS1) 10Ssingh: P:hardware::check: add profile to check HW configuration [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724)
[19:42:51] <wikibugs>	 (03PS1) 10D3r1ck01: [SUL3] varnish: Split frontend cache on `sul3OptIn` cookie [puppet] - 10https://gerrit.wikimedia.org/r/1092323 (https://phabricator.wikimedia.org/T375788)
[19:44:37] <wikibugs>	 (03PS2) 10Ssingh: P:hardware::check: add profile to check HW configuration [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724)
[19:45:44] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4546/console" [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724) (owner: 10Ssingh)
[19:46:27] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "Updating hiera with incorrect data to fail PCC." [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724) (owner: 10Ssingh)
[19:46:55] <wikibugs>	 (03PS3) 10Ssingh: P:hardware::check: add profile to check HW configuration [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724)
[19:48:06] <wikibugs>	 (03CR) 10Ssingh: "Error: Could not call 'find' on 'catalog': Evaluation Error: Error while evaluating a Function Call, HW config check error: cpu_core_count" [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724) (owner: 10Ssingh)
[19:48:58] <wikibugs>	 (03PS4) 10Ssingh: P:hardware::check: add profile to check HW configuration [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724)
[19:50:00] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4548/console" [puppet] - 10https://gerrit.wikimedia.org/r/1092324 (https://phabricator.wikimedia.org/T378724) (owner: 10Ssingh)
[19:51:06] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage
[19:51:17] <wikibugs>	 (03CR) 10Muehlenhoff: Add test maps nodes to site.pp and preseed.yaml file (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144) (owner: 10Papaul)
[19:52:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage
[19:54:24] <logmsgbot>	 !log ebernhardson@deploy2002 Started deploy [airflow-dags/search@594d3b5]: T377153 Release glent 0.3.5
[19:54:30] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage
[19:54:48] <stashbot>	 T377153: Migrate Glent to Gitlab for publication of artifacts - https://phabricator.wikimedia.org/T377153
[19:54:52] <logmsgbot>	 !log ebernhardson@deploy2002 Finished deploy [airflow-dags/search@594d3b5]: T377153 Release glent 0.3.5 (duration: 00m 27s)
[19:55:50] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[19:56:49] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[19:56:50] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm
[19:56:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10333502 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2163.codfw.wmnet with OS bookworm completed: - wi...
[19:57:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage
[19:57:52] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage
[19:58:33] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10333508 (10Jhancock.wm)
[19:58:42] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10333509 (10Jhancock.wm) @Clement_Goubert last batch done!
[20:00:56] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage
[20:03:02] <wikibugs>	 (03PS3) 10Papaul: Add test maps nodes to site.pp and preseed.yaml file [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144)
[20:04:03] <wikibugs>	 10ops-codfw, 06DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T380228 (10phaultfinder) 03NEW
[20:07:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144) (owner: 10Papaul)
[20:11:04] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:11:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T379668#10333568 (10phaultfinder)
[20:12:17] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:12:18] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2113.codfw.wmnet with OS bullseye
[20:12:30] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10333569 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host elastic2113.codfw....
[20:14:49] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:18:56] <wikibugs>	 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10333580 (10jhathaway) >>! In T380009#10332939, @eoghan wrote: > We had a quick chat with ITS today where they disabled the...
[20:19:24] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:19:25] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2112.codfw.wmnet with OS bullseye
[20:19:34] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10333581 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host elastic2112.codfw....
[20:19:43] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:19:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10333582 (10Jhancock.wm)
[20:20:24] <wikibugs>	 06SRE-OnFire, 10Incident Tooling, 13Patch-For-Review: corto: failure to create google doc should not be fatal - https://phabricator.wikimedia.org/T379858#10333583 (10Eevans) Done ([[ https://gitlab.wikimedia.org/repos/sre/corto/-/commit/4c0104f0581b6db91b9d379163abcc50b504d20d | 4c0104f ]]).
[20:20:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10333584 (10Jhancock.wm) need to double check the mgmt port on 2110
[20:23:20] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[20:23:21] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2037.codfw.wmnet with OS bullseye
[20:23:23] <wikibugs>	 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10333590 (10revi) >>! In T380009#10332939, @eoghan wrote: > We had a quick chat with ITS today where they disabled the chan...
[20:23:27] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896#10333591 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host restbase2037.codfw.wmnet with OS bullseye completed: - restbase203...
[20:23:48] <wikibugs>	 06SRE-OnFire, 10Incident Tooling, 13Patch-For-Review: corto: failure to create google doc should not be fatal - https://phabricator.wikimedia.org/T379858#10333585 (10Eevans) 05Open→03Resolved a:03Eevans
[20:23:52] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896#10333592 (10Jhancock.wm)
[20:25:35] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q2:rack/setup/install restbase203[6-8] - https://phabricator.wikimedia.org/T377896#10333593 (10Jhancock.wm) 05Open→03Resolved @Eevans this is complete!
[20:25:39] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10333608 (10Jhancock.wm) 05Open→03Resolved
[20:26:34] <wikibugs>	 (03CR) 10Papaul: [C:03+2] Add test maps nodes to site.pp and preseed.yaml file (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1092305 (https://phabricator.wikimedia.org/T380144) (owner: 10Papaul)
[20:29:07] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
[20:29:15] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333634 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bullseye
[20:30:07] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 30015952 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:31:07] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[20:33:07] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[20:37:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002"
[20:37:17] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002"
[20:37:17] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:37:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:39:09] <logmsgbot>	 !log jhathaway@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
[20:39:14] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333674 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bullseye ex...
[20:39:32] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:39:34] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:39:36] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:39:38] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:39:40] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:39:41] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:42:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:49:47] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
[20:49:54] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10333742 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host maps-test2001.codfw.wmnet with OS bookworm
[20:49:58] <jhathaway>	 !log disabling auto-reboot on re-imaging for debugging
[20:50:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:06] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
[20:51:13] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333743 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bullseye
[20:51:58] <logmsgbot>	 !log jhathaway@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
[20:52:07] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333744 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bullseye ex...
[20:52:41] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
[20:52:56] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333753 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor My software never has bugs. It just develops random features. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T2100).
[21:00:05] <jouncebot>	 MatmaRex and bvibber: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:16] <bvibber>	 o/
[21:00:20] <MatmaRex>	 hi
[21:01:15] <MatmaRex>	 dear deployer: my patches should all go out together, affect the beta cluster only, and can't be tested (because they depend on a puppet patch to function correctly, which is scheduled for the next window tomorrow). thanks
[21:01:39] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:01:52] <logmsgbot>	 !log jhathaway@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
[21:02:29] <bvibber>	 my patch is cleanup for multi-dc caching so can't be tested on debug servers :D
[21:03:13] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333768 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm ex...
[21:03:38] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
[21:04:27] * TheresNoTime can't deploy this evening ^^ hopefully another deployer appears shortly
[21:04:28] <bvibber>	 hm, actually i might, there's separate servers for each
[21:04:34] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333770 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm
[21:04:34] <bvibber>	 <3
[21:10:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:10:51] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:10:58] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:11:08] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:14:54] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:15:11] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2041']
[21:15:21] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2041']
[21:15:33] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2042']
[21:15:44] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2042']
[21:16:29] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
[21:16:34] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144#10333798 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host maps-test2002.codfw.wmnet with OS bookworm
[21:16:49] <MatmaRex>	 bvibber: if you're deploying, could you do my patches afterwards too? i don't have access
[21:17:00] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
[21:17:01] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm
[21:17:03] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm
[21:17:05] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm
[21:17:07] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm
[21:17:12] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10333799 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm
[21:17:14] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10333800 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2042.codfw.wmnet with OS bookworm
[21:17:14] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm
[21:17:17] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10333801 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2043.codfw.wmnet with OS bookworm
[21:17:18] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10333802 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2044.codfw.wmnet with OS bookworm
[21:17:20] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] trafficserver: remove inbound TLS and related settings [puppet] - 10https://gerrit.wikimedia.org/r/1091748 (owner: 10Ssingh)
[21:17:26] <MatmaRex>	 (or is anyone else deploying today's window?)
[21:17:26] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10333803 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2045.codfw.wmnet with OS bookworm
[21:17:32] <wikibugs>	 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: VRTS e-mail address unreachable / e-mail routing issue - https://phabricator.wikimedia.org/T380009#10333786 (10eoghan) @jhathaway It was a rule set up to change the envelope-to of a mail from a given source. When we disabl...
[21:17:42] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10333804 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2046.codfw.wmnet with OS bookworm
[21:18:32] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
[21:19:08] <bvibber>	 (not sure we have a deployer)
[21:20:56] <bvibber>	 No I don't have all the rights to deploy consistently
[21:20:59] <icinga-wm>	 RECOVERY - Disk space on an-launcher1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[21:21:41] <bvibber>	 I also need a deployer :)
[21:21:56] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
[21:22:23] <TheresNoTime>	 might be worth pinging RoanKattouw urbanecm cjming and kindrobot again :)
[21:22:36] <urbanecm>	 let's deploy then
[21:22:39] <urbanecm>	 hi bvibber 
[21:22:53] <bvibber>	 :D
[21:22:54] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Use WAN cache for JsonConfig remote fetch cache [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746) (owner: 10Bvibber)
[21:22:56] <bvibber>	 \o/
[21:23:02] <bvibber>	 thx urbanecm 
[21:23:26] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Rename everything referring to "SSO domain" to use "shared domain" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091839 (https://phabricator.wikimedia.org/T379811) (owner: 10Bartosz Dziewoński)
[21:23:28] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Rename shared domain sso.wikimedia.org to auth.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091841 (https://phabricator.wikimedia.org/T379811) (owner: 10Bartosz Dziewoński)
[21:23:29] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Use DB name rather than server name in shared domain path prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091842 (https://phabricator.wikimedia.org/T379811) (owner: 10Bartosz Dziewoński)
[21:23:46] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Create no-link-recommendation variant [extensions/GrowthExperiments] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092300 (https://phabricator.wikimedia.org/T377787) (owner: 10Urbanecm)
[21:23:48] <urbanecm>	 and since i'm deploying anyway...
[21:24:19] <wikibugs>	 (03Merged) 10jenkins-bot: Rename everything referring to "SSO domain" to use "shared domain" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091839 (https://phabricator.wikimedia.org/T379811) (owner: 10Bartosz Dziewoński)
[21:24:21] <wikibugs>	 (03Merged) 10jenkins-bot: Rename shared domain sso.wikimedia.org to auth.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091841 (https://phabricator.wikimedia.org/T379811) (owner: 10Bartosz Dziewoński)
[21:24:24] <wikibugs>	 (03Merged) 10jenkins-bot: Use DB name rather than server name in shared domain path prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091842 (https://phabricator.wikimedia.org/T379811) (owner: 10Bartosz Dziewoński)
[21:26:00] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1091839|Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841|Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842|Use DB name rather than server name in shared domain path prefix (T379811)]]
[21:26:10] <stashbot>	 T379811: Update URL structure for SUL3 shared domain - https://phabricator.wikimedia.org/T379811
[21:26:38] <urbanecm>	 bvibber: btw, you do seem to have all the rights to deploy?
[21:26:58] <MatmaRex>	 (thanks urbanecm)
[21:26:58] <bvibber>	 urbanecm: last i checked i couldn't +2 into some stuff i needed. that might've been config patches
[21:27:03] <bvibber>	 and i might be wrong hah
[21:27:06] <bvibber>	 might've gotten fixed
[21:27:24] <urbanecm>	 bvibber: you shouldn't _need_ to +2 manually. if you run `scap backport XXXXX`, the bot will +2 for you
[21:27:25] <bvibber>	 in which case i just need to read up to make sure i know how to do mediawiki deploys right as well as service deploys :D
[21:27:29] <bvibber>	 ok
[21:27:33] <bvibber>	 nice
[21:27:37] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect - HE, AS6939/IPv6: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:27:48] <bvibber>	 ok so that's good news that'll save me time >:-)
[21:27:51] <urbanecm>	 (i manually +2 to speed things up, as CI can run while i deploy something else, but that's just to do things in parallel)
[21:28:03] <bvibber>	 but next time i try it i'll want someone hovering over my shoulder in case i fuck it up hehe
[21:29:18] <urbanecm>	 !log Add bvibber to wmf-deployment Gerrit group (existing deployer)
[21:29:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:23] <bvibber>	 i guess trying to escape old permissions never works. the perm bits keep coming back ;)
[21:29:33] <bvibber>	 thx
[21:29:45] <urbanecm>	 bvibber: i just made them in sync, if you want them revoked, can be done ig :D
[21:29:58] <bvibber>	 hehe
[21:30:33] <logmsgbot>	 !log urbanecm@deploy2002 matmarex, urbanecm: Backport for [[gerrit:1091839|Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841|Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842|Use DB name rather than server name in shared domain path prefix (T379811)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:30:44] <urbanecm>	 MatmaRex: i assume nothing to test? :)
[21:31:03] <MatmaRex>	 urbanecm: yep. will test tomorrow when i can get the puppet patch deployed
[21:31:09] <logmsgbot>	 !log urbanecm@deploy2002 matmarex, urbanecm: Continuing with sync
[21:31:11] <urbanecm>	 ack, proceeding
[21:31:43] <icinga-wm>	 PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[21:33:09] <wikibugs>	 (03PS4) 10Urbanecm: [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204)
[21:33:21] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204) (owner: 10Urbanecm)
[21:34:04] <wikibugs>	 (03Merged) 10jenkins-bot: [GrowthExperiments] testwiki: Enable no-link-recommendation experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092295 (https://phabricator.wikimedia.org/T380204) (owner: 10Urbanecm)
[21:36:45] <icinga-wm>	 RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 229.23 ms
[21:36:54] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1091839|Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841|Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842|Use DB name rather than server name in shared domain path prefix (T379811)]] (duration: 10m 54s)
[21:37:10] <stashbot>	 T379811: Update URL structure for SUL3 shared domain - https://phabricator.wikimedia.org/T379811
[21:38:01] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746) (owner: 10Bvibber)
[21:38:01] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092300 (https://phabricator.wikimedia.org/T377787) (owner: 10Urbanecm)
[21:38:07] <bvibber>	 \o/
[21:39:21] <urbanecm>	 just the ci, just the ci...
[21:40:07] <wikibugs>	 (03PS2) 10Gergő Tisza: Add 'auth' wiki tag when using the shared login domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091922 (https://phabricator.wikimedia.org/T373737)
[21:40:49] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add 'auth' wiki tag when using the shared login domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091922 (https://phabricator.wikimedia.org/T373737) (owner: 10Gergő Tisza)
[21:42:03] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[21:43:46] <wikibugs>	 (03Merged) 10jenkins-bot: Use WAN cache for JsonConfig remote fetch cache [extensions/JsonConfig] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092304 (https://phabricator.wikimedia.org/T374746) (owner: 10Bvibber)
[21:43:54] <urbanecm>	 here we go
[21:43:58] <bvibber>	 yay
[21:44:35] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect - HE, AS6939/IPv4: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:46:20] <wikibugs>	 (03Merged) 10jenkins-bot: Create no-link-recommendation variant [extensions/GrowthExperiments] (wmf/1.44.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1092300 (https://phabricator.wikimedia.org/T377787) (owner: 10Urbanecm)
[21:46:41] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1092304|Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300|Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295|[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]]
[21:46:44] <urbanecm>	 okay, now it goes through
[21:46:47] <stashbot>	 T374746: Cache invalidation based on usage tracking of Data: pages - https://phabricator.wikimedia.org/T374746
[21:46:48] <stashbot>	 T377787: Add Link (structured): Introduce the no-link-recommendation variant - https://phabricator.wikimedia.org/T377787
[21:46:48] <stashbot>	 T380204: Deploy Add Link to a proportion of test.wikipedia.org users - https://phabricator.wikimedia.org/T380204
[21:48:27] <effie>	 !log upload prometheus-mcrouter-exporter_0.4.0+git20241118-1~wmf1 - T380212
[21:48:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:30] <stashbot>	 T380212: Package prometheus-mcrouter-exporter v0.4.0 - https://phabricator.wikimedia.org/T380212
[21:52:21] <wikibugs>	 (03PS1) 10Gergő Tisza: Use 'auth' rather than 'sso' as cookie prefix on the auth domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092333 (https://phabricator.wikimedia.org/T379811)
[21:52:47] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm, bvibber: Backport for [[gerrit:1092304|Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300|Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295|[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:52:51] <urbanecm>	 finally
[21:52:54] <stashbot>	 T374746: Cache invalidation based on usage tracking of Data: pages - https://phabricator.wikimedia.org/T374746
[21:52:54] <stashbot>	 T377787: Add Link (structured): Introduce the no-link-recommendation variant - https://phabricator.wikimedia.org/T377787
[21:52:54] <stashbot>	 T380204: Deploy Add Link to a proportion of test.wikipedia.org users - https://phabricator.wikimedia.org/T380204
[21:52:54] <urbanecm>	 bvibber: can you test?
[21:52:56] <bvibber>	 woot! testing
[21:54:04] <bvibber>	 urbanecm: working :D
[21:54:05] <bvibber>	 thx
[21:54:10] <urbanecm>	 yay!
[21:54:11] <urbanecm>	 good news
[21:54:13] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm, bvibber: Continuing with sync
[21:54:15] <urbanecm>	 proceeding
[21:57:08] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for dbrant - https://phabricator.wikimedia.org/T379678#10333937 (10thcipriani) >>! In T379678#10318095, @herron wrote: > * @thcipriani could you please leave a comment of approval for deployment?  Reason for access makes sense to me, approved!
[21:58:51] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1092304|Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300|Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295|[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] (duration: 12m 10s)
[21:59:03] <stashbot>	 T374746: Cache invalidation based on usage tracking of Data: pages - https://phabricator.wikimedia.org/T374746
[21:59:03] <stashbot>	 T377787: Add Link (structured): Introduce the no-link-recommendation variant - https://phabricator.wikimedia.org/T377787
[21:59:03] <stashbot>	 T380204: Deploy Add Link to a proportion of test.wikipedia.org users - https://phabricator.wikimedia.org/T380204
[21:59:32] <urbanecm>	 bvibber: okay, should be live
[21:59:35] <urbanecm>	 anything else?
[22:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: I, the Bot under the Fountain, call upon thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241118T2200).
[22:00:20] <wikibugs>	 (03CR) 10Bartosz Dziewoński: [C:03+1] Use 'auth' rather than 'sso' as cookie prefix on the auth domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092333 (https://phabricator.wikimedia.org/T379811) (owner: 10Gergő Tisza)
[22:02:26] <wikibugs>	 (03CR) 10Bking: [C:03+1] ryankemper: add timestamps to bash history [puppet] - 10https://gerrit.wikimedia.org/r/1083925 (owner: 10Ryan Kemper)
[22:02:48] <wikibugs>	 (03PS1) 10Gergő Tisza: Disable various extensions when using the shared login domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092334 (https://phabricator.wikimedia.org/T373737)
[22:03:07] <bvibber>	 urbanecm: that's all from me
[22:03:14] <urbanecm>	 sounds good!
[22:03:15] <bvibber>	 thanks!
[22:03:35] <wikibugs>	 (03PS3) 10Gergő Tisza: Add 'auth' wiki tag when using the shared login domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091922 (https://phabricator.wikimedia.org/T373737)
[22:04:16] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add 'auth' wiki tag when using the shared login domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1091922 (https://phabricator.wikimedia.org/T373737) (owner: 10Gergő Tisza)
[22:07:15] <wikibugs>	 (03PS1) 10Urbanecm: [GrowthExperiments] testwiki: Only enable Add Link for new accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092336 (https://phabricator.wikimedia.org/T380204)
[22:08:02] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] [GrowthExperiments] testwiki: Only enable Add Link for new accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092336 (https://phabricator.wikimedia.org/T380204) (owner: 10Urbanecm)
[22:08:46] <wikibugs>	 (03Merged) 10jenkins-bot: [GrowthExperiments] testwiki: Only enable Add Link for new accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092336 (https://phabricator.wikimedia.org/T380204) (owner: 10Urbanecm)
[22:09:19] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1092336|[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]]
[22:09:22] <stashbot>	 T380204: Deploy Add Link to a proportion of test.wikipedia.org users - https://phabricator.wikimedia.org/T380204
[22:13:22] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm: Backport for [[gerrit:1092336|[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:13:55] <logmsgbot>	 !log urbanecm@deploy2002 urbanecm: Continuing with sync
[22:18:34] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1092336|[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] (duration: 09m 14s)
[22:18:37] <stashbot>	 T380204: Deploy Add Link to a proportion of test.wikipedia.org users - https://phabricator.wikimedia.org/T380204
[22:22:35] <logmsgbot>	 !log jhathaway@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm
[22:22:45] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10333992 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm ex...
[22:29:28] <wikibugs>	 (03PS1) 10Effie Mouzeli: prometheus-mcrouter-exporter: update to v0.4.0 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1092338 (https://phabricator.wikimedia.org/T380212)
[22:37:26] <logmsgbot>	 !log bking@deploy2002 Started deploy [wdqs/wdqs@9927a5a]: 0.3.150
[22:37:48] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2042.codfw.wmnet with OS bookworm
[22:37:56] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10334038 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2042.codfw.wmnet with OS bookworm executed...
[22:41:07] <wikibugs>	 (03PS1) 10Aleksandar Mastilovic: All the necessary changes and missing files to make helm linter happy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092339
[22:41:58] <wikibugs>	 (03CR) 10CI reject: [V:04-1] All the necessary changes and missing files to make helm linter happy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092339 (owner: 10Aleksandar Mastilovic)
[22:47:21] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
[22:47:29] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10334071 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm
[22:49:01] <wikibugs>	 (03PS1) 10Aleksandar Mastilovic: Fixing an improper merge of values.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/1092340
[22:49:26] <logmsgbot>	 !log bking@deploy2002 Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 11m 59s)
[22:50:45] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[22:50:46] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
[22:50:55] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10334085 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host maps-test2001.codfw.wmnet with OS bookworm compl...
[22:52:45] <tzatziki>	 !log removing 10 files for legal compliance
[22:52:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:16] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] ryankemper: add timestamps to bash history [puppet] - 10https://gerrit.wikimedia.org/r/1083925 (owner: 10Ryan Kemper)
[22:53:55] <wikibugs>	 (03PS1) 10C. Scott Ananian: Enable experimental Parsoid fragment support on labs and test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1092341 (https://phabricator.wikimedia.org/T374661)
[22:54:34] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
[22:54:47] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10334089 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm executed...
[22:54:50] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2043.codfw.wmnet with OS bookworm
[22:54:59] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10334090 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2043.codfw.wmnet with OS bookworm executed...
[22:54:59] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2046.codfw.wmnet with OS bookworm
[22:55:10] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10334091 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2046.codfw.wmnet with OS bookworm executed...
[22:55:15] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2044.codfw.wmnet with OS bookworm
[22:55:21] <logmsgbot>	 !log jhathaway@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
[22:55:26] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10334092 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2044.codfw.wmnet with OS bookworm executed...
[22:55:28] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10334093 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm ex...
[22:55:37] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2045.codfw.wmnet with OS bookworm
[22:55:49] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10334094 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2045.codfw.wmnet with OS bookworm executed...
[22:56:01] <wikibugs>	 (CR) Subramanya Sastry: [C:+1] Enable experimental Parsoid fragment support on labs and test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1092341 (https://phabricator.wikimedia.org/T374661) (owner: C. Scott Ananian)
[22:57:13] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
[22:57:22] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10334100 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm
[22:58:38] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 19 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1092296 (https://phabricator.wikimedia.org/T379765) (owner: Jdlrobson)
[22:59:08] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
[22:59:12] <wikibugs>	 ops-eqiad, SRE, Data-Platform, DC-Ops: Q2:rack/setup/install kafka-jumbo10[16-18] - https://phabricator.wikimedia.org/T377874#10334105 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
[23:00:26] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.dns.netbox
[23:00:37] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
[23:00:40] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
[23:00:42] <wikibugs>	 ops-eqiad, SRE, Data-Platform, DC-Ops: Q2:rack/setup/install kafka-jumbo10[16-18] - https://phabricator.wikimedia.org/T377874#10334109 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
[23:00:46] <wikibugs>	 ops-eqiad, SRE, Data-Platform, DC-Ops: Q2:rack/setup/install kafka-jumbo10[16-18] - https://phabricator.wikimedia.org/T377874#10334110 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
[23:01:34] <logmsgbot>	 !log jhathaway@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
[23:01:41] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10334113 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm ex...
[23:02:26] <wikibugs>	 (PS11) Ryan Kemper: wdqs: create wdqs-internal-[main,scholarly] roles [puppet] - https://gerrit.wikimedia.org/r/1088210 (https://phabricator.wikimedia.org/T379329)
[23:02:52] <wikibugs>	 (PS5) Ryan Kemper: wdqs: new pybal pools for internal graph split [puppet] - https://gerrit.wikimedia.org/r/1088383 (https://phabricator.wikimedia.org/T379330)
[23:02:52] <wikibugs>	 (PS4) Ryan Kemper: wdqs-internal: add envoy config for graph split [puppet] - https://gerrit.wikimedia.org/r/1091340 (https://phabricator.wikimedia.org/T379333)
[23:03:38] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1088210 (https://phabricator.wikimedia.org/T379329) (owner: Ryan Kemper)
[23:03:54] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
[23:03:57] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
[23:04:02] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
[23:04:02] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10334117 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host thanos-be2005.codfw.wmnet with OS bookworm
[23:04:05] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
[23:04:05] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:04:10] <wikibugs>	 ops-codfw, SRE, DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144#10334118 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host maps-test2003.codfw.wmnet with OS bookworm
[23:05:33] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
[23:06:08] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.dns.netbox
[23:08:13] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
[23:09:10] <wikibugs>	 (PS1) Cwhite: logstash: upgrade phatality version to 2.7.0.1 [puppet] - https://gerrit.wikimedia.org/r/1092343 (https://phabricator.wikimedia.org/T342476)
[23:09:33] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
[23:09:38] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
[23:09:38] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:12:49] <tzatziki>	 !log removing 2 files for legal compliance
[23:12:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:12:51] <icinga-wm>	 PROBLEM - rt.wikimedia.org requires authentication on moscovium is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[23:14:45] <icinga-wm>	 RECOVERY - rt.wikimedia.org requires authentication on moscovium is OK: HTTP OK: Status line output matched HTTP/1.1 302 - 537 bytes in 3.180 second response time https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[23:15:31] <logmsgbot>	 !log jhathaway@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
[23:19:10] <logmsgbot>	 !log jhathaway@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
[23:20:22] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install thanos-be2005 - https://phabricator.wikimedia.org/T370452#10334153 (jhathaway) @elukey, unfortunately I observed the same double d-i installer issue with thanos-be2005. Grub's installer does not throw any errro...
[23:25:08] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
[23:25:40] <wikibugs>	 ops-codfw, SRE, DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144#10334174 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host maps-test2004.codfw.wmnet with OS bookworm
[23:26:00] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
[23:26:37] <tzatziki>	 !log removing 1 file for legal compliance
[23:26:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:58] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[23:28:48] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[23:28:49] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
[23:28:59] <wikibugs>	 ops-codfw, SRE, DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144#10334190 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host maps-test2002.codfw.wmnet with OS bookworm completed: - maps-test2...
[23:29:57] <wikibugs>	 (PS4) Bking: dse-k8s-services: introduce Blunderbuss config [deployment-charts] - https://gerrit.wikimedia.org/r/1091827 (https://phabricator.wikimedia.org/T371994)
[23:31:49] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
[23:32:05] <tzatziki>	 !log removing 1 file for legal compliance
[23:32:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:11] <wikibugs>	 (PS1) Eevans: restbase: commission restbase203[6-8] [puppet] - https://gerrit.wikimedia.org/r/1092345 (https://phabricator.wikimedia.org/T380236)
[23:46:07] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
[23:46:13] <wikibugs>	 ops-codfw, SRE, DC-Ops: Set up six decommissioned nodes as temporary maps-test cluster - https://phabricator.wikimedia.org/T380144#10334202 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host maps-test2005.codfw.wmnet with OS bookworm
[23:48:01] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
[23:50:49] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
[23:51:16] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"