[00:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0000)
[00:01:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 957.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:03:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:06:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 947.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:10:12] <wikibugs>	 (PS1) Zabe: Prepare sylwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122267 (https://phabricator.wikimedia.org/T386441)
[00:10:14] <wikibugs>	 (PS1) Zabe: Activate sylwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122268 (https://phabricator.wikimedia.org/T386441)
[00:12:05] <wikibugs>	 (CR) Zabe: [C:+2] Prepare sylwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122267 (https://phabricator.wikimedia.org/T386441) (owner: Zabe)
[00:13:29] <wikibugs>	 (Merged) jenkins-bot: Prepare sylwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122267 (https://phabricator.wikimedia.org/T386441) (owner: Zabe)
[00:13:51] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1122267|Prepare sylwiki (T386441)]]
[00:13:54] <stashbot>	 T386441: Create Wikipedia Sylheti - https://phabricator.wikimedia.org/T386441
[00:16:28] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:1122267|Prepare sylwiki (T386441)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[00:16:39] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[00:23:17] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122267|Prepare sylwiki (T386441)]] (duration: 09m 25s)
[00:23:21] <stashbot>	 T386441: Create Wikipedia Sylheti - https://phabricator.wikimedia.org/T386441
[00:25:06] <wikibugs>	 (CR) Zabe: [C:+2] Activate sylwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122268 (https://phabricator.wikimedia.org/T386441) (owner: Zabe)
[00:25:52] <wikibugs>	 (Merged) jenkins-bot: Activate sylwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122268 (https://phabricator.wikimedia.org/T386441) (owner: Zabe)
[00:26:45] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1122268|Activate sylwiki (T386441)]]
[00:29:26] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:1122268|Activate sylwiki (T386441)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[00:29:30] <stashbot>	 T386441: Create Wikipedia Sylheti - https://phabricator.wikimedia.org/T386441
[00:29:51] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[00:33:14] <wikibugs>	 (PS5) Scott French: php8.1: use pcre2 backport [docker-images/production-images] - https://gerrit.wikimedia.org/r/1120588 (https://phabricator.wikimedia.org/T386006)
[00:33:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:36:34] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122268|Activate sylwiki (T386441)]] (duration: 09m 48s)
[00:36:38] <stashbot>	 T386441: Create Wikipedia Sylheti - https://phabricator.wikimedia.org/T386441
[00:38:14] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1122270
[00:38:14] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1122270 (owner: TrainBranchBot)
[00:39:19] <wikibugs>	 (PS1) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1122271 (https://phabricator.wikimedia.org/T386441)
[00:39:21] <wikibugs>	 (CR) Zabe: [C:+2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1122271 (https://phabricator.wikimedia.org/T386441) (owner: Zabe)
[00:40:43] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1122271 (https://phabricator.wikimedia.org/T386441) (owner: Zabe)
[00:41:10] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1122271|Update interwiki cache (T386441)]]
[00:42:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.079s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:43:51] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:1122271|Update interwiki cache (T386441)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[00:43:55] <stashbot>	 T386441: Create Wikipedia Sylheti - https://phabricator.wikimedia.org/T386441
[00:44:26] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[00:45:25] <wikibugs>	 (CR) Scott French: [C:+1] "Nice!" [puppet] - https://gerrit.wikimedia.org/r/1122259 (https://phabricator.wikimedia.org/T378429) (owner: RLazarus)
[00:47:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.091s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:48:53] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1122270 (owner: TrainBranchBot)
[00:51:02] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122271|Update interwiki cache (T386441)]] (duration: 09m 51s)
[00:51:05] <stashbot>	 T386441: Create Wikipedia Sylheti - https://phabricator.wikimedia.org/T386441
[00:54:08] <wikibugs>	 (CR) RLazarus: [C:+2] deployment_server: Add mw-script-restricted config to mwscript-k8s [puppet] - https://gerrit.wikimedia.org/r/1122259 (https://phabricator.wikimedia.org/T378429) (owner: RLazarus)
[01:06:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.005s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[01:08:34] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1122273
[01:08:34] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1122273 (owner: TrainBranchBot)
[01:16:16] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.046s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[01:24:41] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10577714 (phaultfinder)
[01:29:37] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1122273 (owner: TrainBranchBot)
[01:51:13] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[01:53:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[01:54:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2013-4 to codfw - jhancock@cumin2002"
[01:54:51] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2013-4 to codfw - jhancock@cumin2002"
[01:54:52] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[01:55:28] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host backup2013
[01:55:34] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host backup2014
[01:55:40] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2013
[01:55:43] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2014
[01:56:44] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[01:56:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:07:31] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:08:37] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.44.0-wmf.18 [core] (wmf/1.44.0-wmf.18) - https://gerrit.wikimedia.org/r/1122276 (https://phabricator.wikimedia.org/T382369)
[02:08:39] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.44.0-wmf.18 [core] (wmf/1.44.0-wmf.18) - https://gerrit.wikimedia.org/r/1122276 (https://phabricator.wikimedia.org/T382369) (owner: TrainBranchBot)
[02:13:37] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:14:09] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:15:11] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup201[34] - https://phabricator.wikimedia.org/T384973#10577756 (Jhancock.wm)
[02:15:39] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:17:48] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:19:20] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.44.0-wmf.18 [core] (wmf/1.44.0-wmf.18) - https://gerrit.wikimedia.org/r/1122276 (https://phabricator.wikimedia.org/T382369) (owner: TrainBranchBot)
[02:23:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:25:49] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:28:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.181s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:33:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.128s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:34:16] <wikibugs>	 (PS2) Huji: New alias for Project namespace on Persian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1122278 (https://phabricator.wikimedia.org/T387185)
[02:35:34] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:35:36] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:38:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.085s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:42:45] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:42:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[02:43:16] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.005s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:43:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:44:10] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2014']
[02:44:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2013']
[02:44:22] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2013']
[02:44:24] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2014']
[02:45:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
[02:45:14] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host backup2014.codfw.wmnet with OS bookworm
[02:45:25] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup201[34] - https://phabricator.wikimedia.org/T384973#10577765 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2013.codfw.wmnet with OS bookworm
[02:45:25] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup201[34] - https://phabricator.wikimedia.org/T384973#10577766 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2014.codfw.wmnet with OS bookworm
[02:56:56] <wikibugs>	 (PS1) Pppery: Add various settings for new wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464)
[02:57:22] <wikibugs>	 (PS2) Pppery: Add various settings for new wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464)
[02:58:33] <wikibugs>	 (PS3) Pppery: Add various settings for new wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464)
[02:58:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate grafana-labs.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0300)
[03:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:13:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:28:50] <wikibugs>	 (PS4) Pppery: Add various settings for new wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464)
[03:52:22] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0400)
[04:02:47] <wikibugs>	 (PS1) TrainBranchBot: testwikis to 1.44.0-wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122280 (https://phabricator.wikimedia.org/T382369)
[04:02:48] <wikibugs>	 (CR) TrainBranchBot: [C:+2] testwikis to 1.44.0-wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122280 (https://phabricator.wikimedia.org/T382369) (owner: TrainBranchBot)
[04:03:37] <wikibugs>	 (Merged) jenkins-bot: testwikis to 1.44.0-wmf.18 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122280 (https://phabricator.wikimedia.org/T382369) (owner: TrainBranchBot)
[04:04:05] <logmsgbot>	 !log mwpresync@deploy2002 Started scap sync-world: testwikis to 1.44.0-wmf.18  refs T382369
[04:04:09] <stashbot>	 T382369: 1.44.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T382369
[04:05:30] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2013.codfw.wmnet with OS bookworm
[04:05:35] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2014.codfw.wmnet with OS bookworm
[04:05:36] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup201[34] - https://phabricator.wikimedia.org/T384973#10577831 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2013.codfw.wmnet with OS bookworm executed with errors: - backu...
[04:05:41] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup201[34] - https://phabricator.wikimedia.org/T384973#10577832 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2014.codfw.wmnet with OS bookworm executed with errors: - backu...
[04:18:19] <icinga-wm>	 PROBLEM - Disk space on deploy2002 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/dabe007b656e0142cc64c917886173235ebfd151464d06b58cb88e4e1ea40743/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=deploy2002&var-datasource=codfw+prometheus/ops
[04:38:19] <icinga-wm>	 RECOVERY - Disk space on deploy2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=deploy2002&var-datasource=codfw+prometheus/ops
[04:51:29] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap sync-world: testwikis to 1.44.0-wmf.18  refs T382369 (duration: 47m 24s)
[04:51:33] <stashbot>	 T382369: 1.44.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T382369
[05:00:05] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0500)
[05:03:02] <logmsgbot>	 !log mwpresync@deploy2002 Pruned MediaWiki: 1.44.0-wmf.15 (duration: 02m 59s)
[05:19:16] <wikibugs>	 (PS5) Pppery: Add various settings for new wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464)
[05:53:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:08:37] <marostegui>	 !log Sanitize sylwiki T386463
[06:08:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:41] <stashbot>	 T386463: Prepare and check storage layer for sylwiki - https://phabricator.wikimedia.org/T386463
[06:11:53] <wikibugs>	 (PS1) Marostegui: s2-pager: Remove from repo [software] - https://gerrit.wikimedia.org/r/1122288
[06:13:26] <wikibugs>	 (CR) Marostegui: [C:+2] s2-pager: Remove from repo [software] - https://gerrit.wikimedia.org/r/1122288 (owner: Marostegui)
[06:13:52] <wikibugs>	 (Merged) jenkins-bot: s2-pager: Remove from repo [software] - https://gerrit.wikimedia.org/r/1122288 (owner: Marostegui)
[06:22:51] <wikibugs>	 (CR) Marostegui: clone.py: Add helper functions for later use (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1120213 (https://phabricator.wikimedia.org/T387023) (owner: Federico Ceratto)
[06:23:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:32:28] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db2243 - https://phabricator.wikimedia.org/T382425#10577942 (Marostegui) Thank you!
[06:36:37] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[06:51:29] <wikibugs>	 (PS2) Anzx: sylwiki: add logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1122473 (https://phabricator.wikimedia.org/T386464)
[06:52:55] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 25 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122473 (https://phabricator.wikimedia.org/T386464) (owner: Anzx)
[06:58:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate grafana-labs.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0700)
[07:00:05] <jouncebot>	 marostegui, Amir1, and federico3: May I have your attention please! Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0700)
[07:16:59] <wikibugs>	 (PS2) Anzx: lift of IP cap for UCLA Library event - 3/5/2025 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122478 (https://phabricator.wikimedia.org/T387181)
[07:17:16] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 25 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122478 (https://phabricator.wikimedia.org/T387181) (owner: Anzx)
[07:28:11] <wikibugs>	 (CR) Anzx: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1120152 (https://phabricator.wikimedia.org/T386622) (owner: LD)
[07:34:51] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3500 MB (3% inode=98%): /tmp 3500 MB (3% inode=98%): /var/tmp 3500 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[07:52:22] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:57:43] <wikibugs>	 (PS15) Vgutierrez: sre.loadbalancer: Add migrate-service-ipip cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020)
[07:57:55] <wikibugs>	 (CR) Vgutierrez: sre.loadbalancer: Add migrate-service-ipip cookbook (6 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020) (owner: Vgutierrez)
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T0800).
[08:00:05] <jouncebot>	 LD and anzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:11] <anzx>	 o/
[08:00:41] <awight>	 I can deploy today
[08:02:01] <wikibugs>	 (CR) Vgutierrez: [C:-1] hiera: send haproxy silent-drop logs to benthos (cp-upload_ulsfo) (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:02:36] <awight>	 LD: patch looks good but are you around?  Or does someone else involved in the frwiki campaigns enablement want to stand in?
[08:03:17] <awight>	 anzx: I can start with your logo patch
[08:03:22] <anzx>	 ok
[08:04:30] <wikibugs>	 (PS4) Fabfur: hiera: send haproxy silent-drop logs to benthos (cp-upload_ulsfo) [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332)
[08:04:31] <awight>	 the text seems to disappear on a dark background, jfyi.  maybe test in dark mode once it's on the test server
[08:04:39] <wikibugs>	 (CR) Fabfur: hiera: send haproxy silent-drop logs to benthos (cp-upload_ulsfo) (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:05:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by awight@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122473 (https://phabricator.wikimedia.org/T386464) (owner: Anzx)
[08:05:52] <wikibugs>	 (Merged) jenkins-bot: sylwiki: add logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1122473 (https://phabricator.wikimedia.org/T386464) (owner: Anzx)
[08:06:50] <logmsgbot>	 !log awight@deploy2002 Started scap sync-world: Backport for [[gerrit:1122473|sylwiki: add logo (T386464)]]
[08:06:54] <stashbot>	 T386464: Post-creation work for sylwiki - https://phabricator.wikimedia.org/T386464
[08:08:09] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:11:12] <awight>	 (I don't know if scap is updating logos.php automatically?)
[08:13:12] <awight>	 Ah this was already included in the patch, sorry for the noise...
[08:13:14] <wikibugs>	 (PS5) Fabfur: hiera: send haproxy silent-drop logs to benthos (cp-upload_ulsfo) [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332)
[08:13:27] <logmsgbot>	 !log awight@deploy2002 anzx, awight: Backport for [[gerrit:1122473|sylwiki: add logo (T386464)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:13:30] <anzx>	 awight: checking 
[08:13:31] <stashbot>	 T386464: Post-creation work for sylwiki - https://phabricator.wikimedia.org/T386464
[08:13:53] <awight>	 anzx: Looks good to me, also works in dark mode
[08:14:08] <anzx>	 awight: looks good
[08:14:12] <awight>	 ack
[08:14:12] <wikibugs>	 (CR) Jelto: [C:+2] aptrepo: update gitlab-runner Suite to bullseye [puppet] - https://gerrit.wikimedia.org/r/1119718 (https://phabricator.wikimedia.org/T386297) (owner: Jelto)
[08:14:16] <logmsgbot>	 !log awight@deploy2002 anzx, awight: Continuing with sync
[08:14:51] <icinga-wm>	 PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 3232 MB (3% inode=98%): /tmp 3232 MB (3% inode=98%): /var/tmp 3232 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops
[08:20:01] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 36547328 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[08:21:01] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 2320728 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[08:22:46] <logmsgbot>	 !log awight@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122473|sylwiki: add logo (T386464)]] (duration: 15m 56s)
[08:22:50] <stashbot>	 T386464: Post-creation work for sylwiki - https://phabricator.wikimedia.org/T386464
[08:25:47] <awight>	 anzx: Deploying the throttle exception now
[08:27:17] <anzx>	 awight: no need for testing on throttle change
[08:27:24] <awight>	 ack, thanks
[08:27:35] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by awight@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122478 (https://phabricator.wikimedia.org/T387181) (owner: Anzx)
[08:28:25] <wikibugs>	 (Merged) jenkins-bot: lift of IP cap for UCLA Library event - 3/5/2025 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122478 (https://phabricator.wikimedia.org/T387181) (owner: Anzx)
[08:28:52] <logmsgbot>	 !log awight@deploy2002 Started scap sync-world: Backport for [[gerrit:1122478|lift of IP cap for UCLA Library event - 3/5/2025 (T387181)]]
[08:28:56] <stashbot>	 T387181: Requesting temporary lift of IP cap for UCLA Library event - 3/5/2025 - https://phabricator.wikimedia.org/T387181
[08:30:50] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:33:19] <logmsgbot>	 !log awight@deploy2002 awight, anzx: Backport for [[gerrit:1122478|lift of IP cap for UCLA Library event - 3/5/2025 (T387181)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:33:33] <wikibugs>	 (PS1) Brouberol: airflow-test-k8s: temporarily mimic airflow-analytics [deployment-charts] - https://gerrit.wikimedia.org/r/1122514 (https://phabricator.wikimedia.org/T386282)
[08:33:38] <logmsgbot>	 !log awight@deploy2002 awight, anzx: Continuing with sync
[08:34:35] <wikibugs>	 (PS2) Brouberol: airflow-test-k8s: temporarily mimic airflow-analytics [deployment-charts] - https://gerrit.wikimedia.org/r/1122514 (https://phabricator.wikimedia.org/T386282)
[08:40:36] <logmsgbot>	 !log awight@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122478|lift of IP cap for UCLA Library event - 3/5/2025 (T387181)]] (duration: 11m 43s)
[08:40:40] <stashbot>	 T387181: Requesting temporary lift of IP cap for UCLA Library event - 3/5/2025 - https://phabricator.wikimedia.org/T387181
[08:40:44] <anzx>	 awight: thanks for deploying
[08:40:55] <awight>	 gladly!
[08:41:21] <awight>	 !log UTC morning backport finished
[08:41:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:58] <wikibugs>	 (PS1) Volans: service_catalog: allow to refresh from disk [software/spicerack] - https://gerrit.wikimedia.org/r/1122516
[08:42:24] <wikibugs>	 (CR) Vgutierrez: [C:+1] hiera: send haproxy silent-drop logs to benthos (cp-upload_ulsfo) [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:42:49] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: send haproxy silent-drop logs to benthos (cp-upload_ulsfo) [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:42:55] <wikibugs>	 (CR) Fabfur: [C:+2] "tnx!" [puppet] - https://gerrit.wikimedia.org/r/1122157 (https://phabricator.wikimedia.org/T329332) (owner: Fabfur)
[08:51:09] <wikibugs>	 (CR) Volans: "Nice addition! Some comments/replies inline" [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020) (owner: Vgutierrez)
[08:54:17] <wikibugs>	 (PS1) Jelto: sre.gitlab.upgrade: add a prompt before backups on replica [cookbooks] - https://gerrit.wikimedia.org/r/1122520
[09:03:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:06:34] <wikibugs>	 (CR) Jelto: [V:+1] "`" [cookbooks] - https://gerrit.wikimedia.org/r/1122520 (owner: Jelto)
[09:09:51] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/weight=5; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
[09:10:23] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: maintenance
[09:10:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P73521 and previous config saved to /var/cache/conftool/dbconfig/20250225-091025-marostegui.json
[09:11:02] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for db1158.eqiad.wmnet
[09:14:55] <suzannewoodWMDE2>	 Hi! We want to run a maintenance script to add wikidata support for a new language wikipedia. Let us know if this is a bad time, otherwise we will proceed (#wikidata-for-wikimedia-projects at WMDE) https://phabricator.wikimedia.org/T386468
[09:15:57] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Idle - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:16:01] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:17:42] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1158.eqiad.wmnet
[09:19:01] <logmsgbot>	 !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Index rebuild
[09:24:41] <wikibugs>	 (CR) Elukey: "Checked hostnames and IP ranges, LGTM. I left a couple of comments related to the service.yaml changes, lemme know :)" [puppet] - https://gerrit.wikimedia.org/r/1122170 (https://phabricator.wikimedia.org/T381417) (owner: Herron)
[09:24:48] <wikibugs>	 ops-magru, SRE, Infrastructure-Foundations, netops: cr2-magru errors on xe-0/1/0 (EdgeUno Transit) - https://phabricator.wikimedia.org/T387006#10578164 (ayounsi) Open→Resolved a:ayounsi No more errors.
[09:26:00] <wikibugs>	 (CR) Elukey: [C:+1] service_catalog: allow to refresh from disk [software/spicerack] - https://gerrit.wikimedia.org/r/1122516 (owner: Volans)
[09:26:14] <wikibugs>	 (CR) Elukey: [C:+2] knative-serving: fix drop capabilities [deployment-charts] - https://gerrit.wikimedia.org/r/1122129 (https://phabricator.wikimedia.org/T369493) (owner: Elukey)
[09:27:24] <wikibugs>	 (CR) Volans: [C:+2] service_catalog: allow to refresh from disk [software/spicerack] - https://gerrit.wikimedia.org/r/1122516 (owner: Volans)
[09:27:56] <suzannewoodWMDE2>	 !log suzannewood@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
[09:27:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:13] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/weight=5; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
[09:30:42] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:33:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:36:20] <wikibugs>	 (PS1) Cathal Mooney: WMF-Plugin: Potential clean-up of b-end circuit finding logic [software/homer/deploy] - https://gerrit.wikimedia.org/r/1122524 (https://phabricator.wikimedia.org/T310577)
[09:36:27] <wikibugs>	 (CR) CI reject: [V:-1] WMF-Plugin: Potential clean-up of b-end circuit finding logic [software/homer/deploy] - https://gerrit.wikimedia.org/r/1122524 (https://phabricator.wikimedia.org/T310577) (owner: Cathal Mooney)
[09:37:33] <wikibugs>	 (Merged) jenkins-bot: service_catalog: allow to refresh from disk [software/spicerack] - https://gerrit.wikimedia.org/r/1122516 (owner: Volans)
[09:44:44] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v9.1.2 [software/spicerack] - https://gerrit.wikimedia.org/r/1122525
[09:45:22] <wikibugs>	 (CR) Volans: [C:+2] CHANGELOG: add changelogs for release v9.1.2 [software/spicerack] - https://gerrit.wikimedia.org/r/1122525 (owner: Volans)
[09:46:55] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:47:55] <icinga-wm>	 RECOVERY - BFD status on cr2-eqdfw is OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:49:15] <wikibugs>	 (CR) Effie Mouzeli: [C:+1] "nice catch!" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1120588 (https://phabricator.wikimedia.org/T386006) (owner: Scott French)
[09:50:09] <wikibugs>	 SRE, Infrastructure-Foundations, netops, observability, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10578273 (JMeybohm) >>! In T384731#10566953, @fgiunchedi wrote: >>>! In T384731#10563685, @ayounsi wrote: >>  >>...
[09:50:18] <suzannewoodWMDE2>	 !log Finished populateSitesTable for [sylwiki] https://phabricator.wikimedia.org/T386468
[09:50:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:59] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/994164 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[09:51:48] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] P:firewall absent check_conntrack script. [puppet] - https://gerrit.wikimedia.org/r/1087379 (https://phabricator.wikimedia.org/T374827) (owner: Slyngshede)
[09:53:23] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] P:systemd::timesyncd absent monitoring, handled by AlertManager [puppet] - https://gerrit.wikimedia.org/r/994172 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[09:55:48] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v9.1.2 [software/spicerack] - https://gerrit.wikimedia.org/r/1122525 (owner: Volans)
[09:57:28] <wikibugs>	 (PS1) Volans: Upstream release v9.1.2 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1122529
[09:57:40] <wikibugs>	 (CR) Volans: [C:+2] Upstream release v9.1.2 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1122529 (owner: Volans)
[10:00:08] <suzannewoodWMDE2>	 We are finished with the maintenance scripts
[10:01:07] <wikibugs>	 (PS1) Marostegui: Revert "x1: Change format to STATEMENT" [puppet] - https://gerrit.wikimedia.org/r/1122530
[10:02:23] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "x1: Change format to STATEMENT" [puppet] - https://gerrit.wikimedia.org/r/1122530 (owner: Marostegui)
[10:03:34] <marostegui>	 !log Move x1 back to RBR T385645
[10:03:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:37] <stashbot>	 T385645: Drop event_variant column from echo_event - https://phabricator.wikimedia.org/T385645
[10:05:41] <wikibugs>	 (CR) Stevemunene: [C:+1] airflow-test-k8s: temporarily mimic airflow-analytics [deployment-charts] - https://gerrit.wikimedia.org/r/1122514 (https://phabricator.wikimedia.org/T386282) (owner: Brouberol)
[10:08:15] <wikibugs>	 (Merged) jenkins-bot: Upstream release v9.1.2 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1122529 (owner: Volans)
[10:10:05] <wikibugs>	 (PS1) Marostegui: db-production.php: Disable writes on es6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122533 (https://phabricator.wikimedia.org/T376905)
[10:19:16] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: maintenance
[10:19:28] <wikibugs>	 (CR) Cathal Mooney: "Overall looks good to me.  As we talked about on irc I think there are further improvements we can make with this as the starting point.  " [software/homer/deploy] - https://gerrit.wikimedia.org/r/1094284 (https://phabricator.wikimedia.org/T310577) (owner: Ayounsi)
[10:19:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P73525 and previous config saved to /var/cache/conftool/dbconfig/20250225-101956-marostegui.json
[10:20:20] <marostegui>	 !log Upgrade db1169 to 10.6.21 T385678
[10:20:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:25] <stashbot>	 T385678: Compile and package MariaDB 10.11.11 and MariaDB 10.6.21 - https://phabricator.wikimedia.org/T385678
[10:22:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P73526 and previous config saved to /var/cache/conftool/dbconfig/20250225-102159-marostegui.json
[10:22:20] <marostegui>	 !log Upgrade db1177 to 10.6.21 T385678
[10:22:23] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1177.eqiad.wmnet with reason: maintenance
[10:22:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73527 and previous config saved to /var/cache/conftool/dbconfig/20250225-102422-root.json
[10:25:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73528 and previous config saved to /var/cache/conftool/dbconfig/20250225-102522-root.json
[10:28:38] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] php8.1: use pcre2 backport [docker-images/production-images] - https://gerrit.wikimedia.org/r/1120588 (https://phabricator.wikimedia.org/T386006) (owner: Scott French)
[10:35:24] <wikibugs>	 (CR) Brouberol: [C:+2] airflow-test-k8s: temporarily mimic airflow-analytics [deployment-charts] - https://gerrit.wikimedia.org/r/1122514 (https://phabricator.wikimedia.org/T386282) (owner: Brouberol)
[10:37:07] <wikibugs>	 (CR) Ayounsi: [C:+1] "That's great !!" [software/homer/deploy] - https://gerrit.wikimedia.org/r/1122524 (https://phabricator.wikimedia.org/T310577) (owner: Cathal Mooney)
[10:38:26] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[10:38:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es1031 to es3 master', diff saved to https://phabricator.wikimedia.org/P73529 and previous config saved to /var/cache/conftool/dbconfig/20250225-103849-root.json
[10:39:16] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[10:39:23] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on es1034.eqiad.wmnet with reason: maintenance
[10:39:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73530 and previous config saved to /var/cache/conftool/dbconfig/20250225-103928-root.json
[10:39:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1034', diff saved to https://phabricator.wikimedia.org/P73531 and previous config saved to /var/cache/conftool/dbconfig/20250225-103945-marostegui.json
[10:39:51] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for es1034.eqiad.wmnet
[10:40:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73532 and previous config saved to /var/cache/conftool/dbconfig/20250225-104028-root.json
[10:42:06] <wikibugs>	 (CR) Stevemunene: [C:+1] analytics/html: update readme for MW history dump [puppet] - https://gerrit.wikimedia.org/r/1102848 (https://phabricator.wikimedia.org/T381390) (owner: Milimetric)
[10:47:54] <hnowlan>	 jouncebot: nowandnext
[10:47:54] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 12 minute(s)
[10:47:54] <jouncebot>	 In 0 hour(s) and 12 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1100)
[10:48:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repool es1034', diff saved to https://phabricator.wikimedia.org/P73533 and previous config saved to /var/cache/conftool/dbconfig/20250225-104840-marostegui.json
[10:48:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es1034.eqiad.wmnet
[10:49:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es1034 to es3 master', diff saved to https://phabricator.wikimedia.org/P73534 and previous config saved to /var/cache/conftool/dbconfig/20250225-104908-root.json
[10:54:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73535 and previous config saved to /var/cache/conftool/dbconfig/20250225-105433-root.json
[10:54:58] <wikibugs>	 (CR) Hnowlan: [C:+2] trafficserver: use mobileapps directly for hewiki APIs [puppet] - https://gerrit.wikimedia.org/r/1117508 (https://phabricator.wikimedia.org/T372746) (owner: Hnowlan)
[10:55:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73536 and previous config saved to /var/cache/conftool/dbconfig/20250225-105534-root.json
[10:57:22] <jinxer-wm>	 RESOLVED: [5x] SystemdUnitFailed: user@0.service on testreduce1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1100)
[11:02:37] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Gaps in gNMI network statistics in eqiad - https://phabricator.wikimedia.org/T386807#10578486 (cmooney) Open→Resolved Gonna close this one at this point.  All has been ok in eqiad and codfw since the increase in thread count last week - gaps are no...
[11:05:42] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.add-wiki for database satwiktionary (T386634)
[11:05:45] <stashbot>	 T386634: [wikireplicas] Create views for new wiki satwiktionary - https://phabricator.wikimedia.org/T386634
[11:08:04] <wikibugs>	 (CR) Effie Mouzeli: [V:+2 C:+2] php8.1: use pcre2 backport [docker-images/production-images] - https://gerrit.wikimedia.org/r/1120588 (https://phabricator.wikimedia.org/T386006) (owner: Scott French)
[11:08:20] <wikibugs>	 (PS16) Vgutierrez: sre.loadbalancer: Add migrate-service-ipip cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020)
[11:08:32] <hnowlan>	 !log Switched hewiki mobileapps APIs to rest-gateway, removing restbase from path 
[11:08:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:09] <wikibugs>	 (PS17) Vgutierrez: sre.loadbalancer: Add migrate-service-ipip cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020)
[11:09:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73537 and previous config saved to /var/cache/conftool/dbconfig/20250225-110938-root.json
[11:09:51] <wikibugs>	 (CR) Vgutierrez: sre.loadbalancer: Add migrate-service-ipip cookbook (11 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020) (owner: Vgutierrez)
[11:10:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73538 and previous config saved to /var/cache/conftool/dbconfig/20250225-111039-root.json
[11:13:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:14:30] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] shellbox-media: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1116838 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:14:46] <wikibugs>	 (PS1) Hnowlan: trafficserver: roll restbaseless citoid out to group0 wikis [puppet] - https://gerrit.wikimedia.org/r/1122542 (https://phabricator.wikimedia.org/T361576)
[11:15:38] <wikibugs>	 (Merged) jenkins-bot: shellbox-media: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1116838 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:15:42] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:16:16] <wikibugs>	 (CR) Volans: [C:+1] "Great, LGTM." [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020) (owner: Vgutierrez)
[11:16:41] <wikibugs>	 (PS1) Cathal Mooney: Rename text interface state values returned by GNMI to ints [puppet] - https://gerrit.wikimedia.org/r/1122543 (https://phabricator.wikimedia.org/T372457)
[11:17:39] <wikibugs>	 (CR) Elukey: "I may miss something related to the containerd migration, but in theory this recipe is not needed. We have currently this layout:" [puppet] - https://gerrit.wikimedia.org/r/1121335 (https://phabricator.wikimedia.org/T386900) (owner: Stevemunene)
[11:18:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1035', diff saved to https://phabricator.wikimedia.org/P73539 and previous config saved to /var/cache/conftool/dbconfig/20250225-111805-marostegui.json
[11:18:24] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for es1035.eqiad.wmnet
[11:18:28] <wikibugs>	 (CR) Vgutierrez: [C:+2] sre.loadbalancer: Add migrate-service-ipip cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020) (owner: Vgutierrez)
[11:20:13] <wikibugs>	 (PS1) Marostegui: Revert^2 "x1: Change format to STATEMENT" [puppet] - https://gerrit.wikimedia.org/r/1122544
[11:20:42] <jinxer-wm>	 RESOLVED: [3x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:21:42] <wikibugs>	 (PS1) Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858)
[11:22:02] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[11:22:16] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[11:22:21] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[11:22:54] <wikibugs>	 (PS1) Vgutierrez: prometheus: Collect MSS metrics every minute [puppet] - https://gerrit.wikimedia.org/r/1122546
[11:22:57] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[11:23:14] <wikibugs>	 (PS2) Vgutierrez: prometheus: Collect MSS metrics every minute [puppet] - https://gerrit.wikimedia.org/r/1122546
[11:23:46] <wikibugs>	 (PS2) Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858)
[11:24:25] <wikibugs>	 (CR) Vgutierrez: alerts: add alert for ferm_mss_cfg Prometheus metric (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204) (owner: CDobbins)
[11:24:43] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1122546 (owner: Vgutierrez)
[11:24:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73540 and previous config saved to /var/cache/conftool/dbconfig/20250225-112447-root.json
[11:25:34] <wikibugs>	 (Merged) jenkins-bot: sre.loadbalancer: Add migrate-service-ipip cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1122152 (https://phabricator.wikimedia.org/T373020) (owner: Vgutierrez)
[11:25:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73541 and previous config saved to /var/cache/conftool/dbconfig/20250225-112545-root.json
[11:25:46] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es1035.eqiad.wmnet
[11:26:37] <wikibugs>	 (PS1) Effie Mouzeli: shellbox-timeline: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1122547 (https://phabricator.wikimedia.org/T377038)
[11:29:33] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on es1035.eqiad.wmnet with reason: maintenance
[11:29:58] <wikibugs>	 (CR) Fabfur: [C:+1] "Ok to me, using "minutely" is much more readable than old cron syntax btw" [puppet] - https://gerrit.wikimedia.org/r/1122546 (owner: Vgutierrez)
[11:30:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73542 and previous config saved to /var/cache/conftool/dbconfig/20250225-113029-root.json
[11:30:56] <logmsgbot>	 !log jiji@deploy2002 Started scap sync-world: T386006 - use pcre2 backport in php8.1 images
[11:31:00] <stashbot>	 T386006: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006
[11:31:34] <wikibugs>	 (CR) Vgutierrez: [C:+2] prometheus: Collect MSS metrics every minute [puppet] - https://gerrit.wikimedia.org/r/1122546 (owner: Vgutierrez)
[11:31:35] <logmsgbot>	 !log fnegri@cumin1002 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database satwiktionary (T386634)
[11:31:39] <stashbot>	 T386634: [wikireplicas] Create views for new wiki satwiktionary - https://phabricator.wikimedia.org/T386634
[11:32:11] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.add-wiki for database sylwiki (T386467)
[11:32:15] <stashbot>	 T386467: [wikireplicas] Create views for new wiki sylwiki - https://phabricator.wikimedia.org/T386467
[11:32:18] <wikibugs>	 (CR) Marostegui: [C:+2] Revert^2 "x1: Change format to STATEMENT" [puppet] - https://gerrit.wikimedia.org/r/1122544 (owner: Marostegui)
[11:32:23] <logmsgbot>	 !log fnegri@cumin1002 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database sylwiki (T386467)
[11:35:29] <wikibugs>	 (CR) Effie Mouzeli: "self +2ing this, as it is similar to If2296418565caa0ad58f4dd612d009c44ad4dd07" [deployment-charts] - https://gerrit.wikimedia.org/r/1122547 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:35:40] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] shellbox-timeline: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1122547 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:36:08] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[11:36:11] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[11:37:20] <wikibugs>	 (Merged) jenkins-bot: shellbox-timeline: 1 replica on 8.1 for each DC [deployment-charts] - https://gerrit.wikimedia.org/r/1122547 (https://phabricator.wikimedia.org/T377038) (owner: Effie Mouzeli)
[11:37:34] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[11:37:38] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[11:39:04] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[11:39:28] <marostegui>	 !log Deploy schema change on x1 db1179 eqiad dbmaint T385645
[11:39:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:31] <stashbot>	 T385645: Drop event_variant column from echo_event - https://phabricator.wikimedia.org/T385645
[11:39:41] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[11:41:35] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[11:41:52] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[11:43:21] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:43:54] <effie>	 !jouncebot next
[11:43:54] <wm-bot>	 a Python reminder bot for deployments. see https://wikitech.wikimedia.org/wiki/Tool:Jouncebot
[11:44:02] <effie>	 jouncebot: next
[11:44:02] <jouncebot>	 In 1 hour(s) and 15 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1300)
[11:45:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1035 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73543 and previous config saved to /var/cache/conftool/dbconfig/20250225-114534-root.json
[11:45:44] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: maintenance
[11:51:52] <wikibugs>	 (PS2) Hnowlan: trafficserver: roll restbaseless citoid out to group0 wikis [puppet] - https://gerrit.wikimedia.org/r/1122542 (https://phabricator.wikimedia.org/T361576)
[11:55:27] <logmsgbot>	 !log jiji@deploy2002 Finished scap sync-world: T386006 - use pcre2 backport in php8.1 images (duration: 25m 34s)
[11:55:31] <stashbot>	 T386006: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006
[12:00:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1035 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73544 and previous config saved to /var/cache/conftool/dbconfig/20250225-120040-root.json
[12:03:45] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-Uploading, Unstewarded-production-error, Wikimedia-production-error: An unknown error occurred in storage backend "local-swift-eqiad" - https://phabricator.wikimedia.org/T341007#10578627 (Yann) I got again this error while trying to upload a big PNG file....
[12:04:31] <wikibugs>	 (PS1) Fabfur: workaround for T256098 [debs/benthos] - https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098)
[12:06:28] <LD>	 Lucas_WMDE do you think we can deploy 1120152 on the next window (14utc?)
[12:09:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P73545 and previous config saved to /var/cache/conftool/dbconfig/20250225-120953-marostegui.json
[12:14:00] <LD>	 awight_ sorry about this morning, I had a conference call
[12:15:12] <wikibugs>	 (CR) Arnaudb: [C:+1] sre.gitlab.upgrade: add a prompt before backups on replica [cookbooks] - https://gerrit.wikimedia.org/r/1122520 (owner: Jelto)
[12:15:31] <wikibugs>	 (PS1) Hnowlan: mw-api-ext, mw-web: right-size clusters [deployment-charts] - https://gerrit.wikimedia.org/r/1122561 (https://phabricator.wikimedia.org/T380858)
[12:15:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73546 and previous config saved to /var/cache/conftool/dbconfig/20250225-121545-root.json
[12:16:27] <wikibugs>	 (PS1) Slyngshede: Ensure that the LDAP user is parsed as an Entry object. [software/bitu] - https://gerrit.wikimedia.org/r/1122562 (https://phabricator.wikimedia.org/T385947)
[12:17:00] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 25 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1120152 (https://phabricator.wikimedia.org/T386622) (owner: LD)
[12:17:30] <wikibugs>	 (PS2) Slyngshede: Ensure that the LDAP user is parsed as an Entry object. [software/bitu] - https://gerrit.wikimedia.org/r/1122562 (https://phabricator.wikimedia.org/T385947)
[12:18:32] <wikibugs>	 (PS1) Clément Goubert: mediawiki: CronJob name as Job label [deployment-charts] - https://gerrit.wikimedia.org/r/1122563 (https://phabricator.wikimedia.org/T385709)
[12:20:12] <wikibugs>	 ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10578641 (cmooney) >>! In T385217#10572967, @cmooney wrote: > DC-Ops folks Nokia recommend trying to interrupt the grub bootlo...
[12:20:18] <wikibugs>	 (PS1) Hnowlan: mw-jobrunner: scale down [deployment-charts] - https://gerrit.wikimedia.org/r/1122565
[12:20:43] <wikibugs>	 (PS2) Clément Goubert: mediawiki: CronJob name as Job label [deployment-charts] - https://gerrit.wikimedia.org/r/1122563 (https://phabricator.wikimedia.org/T385709)
[12:30:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73547 and previous config saved to /var/cache/conftool/dbconfig/20250225-123050-root.json
[12:37:09] <wikibugs>	 (Abandoned) Andrew Bogott: cloud-vps: increase # of attempts with dns resolving [puppet] - https://gerrit.wikimedia.org/r/1105945 (https://phabricator.wikimedia.org/T374830) (owner: Andrew Bogott)
[12:37:54] <wikibugs>	 (CR) Ladsgroup: [C:+1] db-production.php: Disable writes on es6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122533 (https://phabricator.wikimedia.org/T376905) (owner: Marostegui)
[12:41:30] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-Uploading, Unstewarded-production-error, Wikimedia-production-error: An unknown error occurred in storage backend "local-swift-eqiad" - https://phabricator.wikimedia.org/T341007#10578688 (MatthewVernon) I'm not seeing any elevated errors from swift today.
[12:48:20] <marostegui>	 jouncebot: next
[12:48:21] <jouncebot>	 In 0 hour(s) and 11 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1300)
[12:48:38] <wikibugs>	 (CR) Marostegui: [C:+2] db-production.php: Disable writes on es6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122533 (https://phabricator.wikimedia.org/T376905) (owner: Marostegui)
[12:49:26] <wikibugs>	 (Merged) jenkins-bot: db-production.php: Disable writes on es6 [mediawiki-config] - https://gerrit.wikimedia.org/r/1122533 (https://phabricator.wikimedia.org/T376905) (owner: Marostegui)
[12:50:13] <logmsgbot>	 !log marostegui@deploy2002 Started scap sync-world: Backport for [[gerrit:1122533|db-production.php: Disable writes on es6 (T376905)]]
[12:50:58] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote es2037 to es6 master [puppet] - https://gerrit.wikimedia.org/r/1122568 (https://phabricator.wikimedia.org/T387211)
[12:51:02] <wikibugs>	 (PS1) Gerrit maintenance bot: wmnet: Update es6-master alias [dns] - https://gerrit.wikimedia.org/r/1122569 (https://phabricator.wikimedia.org/T387211)
[12:52:37] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T387211
[12:52:41] <stashbot>	 T387211: Switchover es6 master (es2035 -> es2037) - https://phabricator.wikimedia.org/T387211
[12:53:06] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:1122533|db-production.php: Disable writes on es6 (T376905)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[12:53:18] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[12:56:39] <wikibugs>	 (PS3) Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858)
[12:58:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73548 and previous config saved to /var/cache/conftool/dbconfig/20250225-125813-root.json
[12:58:33] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/weight=5; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
[12:58:46] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/weight=5; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
[12:59:19] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/weight=10; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
[12:59:20] <wikibugs>	 (PS1) Marostegui: Revert "db-production.php: Disable writes on es6" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122570
[12:59:26] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/weight=10; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
[12:59:49] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=inactive; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
[13:00:04] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
[13:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1300)
[13:00:11] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=inactive; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
[13:00:14] <logmsgbot>	 !log marostegui@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122533|db-production.php: Disable writes on es6 (T376905)]] (duration: 10m 01s)
[13:00:49] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=inactive; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
[13:01:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set es2037 with weight 0 T387211', diff saved to https://phabricator.wikimedia.org/P73549 and previous config saved to /var/cache/conftool/dbconfig/20250225-130138-root.json
[13:01:42] <stashbot>	 T387211: Switchover es6 master (es2035 -> es2037) - https://phabricator.wikimedia.org/T387211
[13:02:22] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote es2037 to es6 master [puppet] - https://gerrit.wikimedia.org/r/1122568 (https://phabricator.wikimedia.org/T387211) (owner: Gerrit maintenance bot)
[13:03:21] <marostegui>	 !log Starting es6 codfw failover from es2035 to es2037 - T387211
[13:03:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:47] <wikibugs>	 (PS2) Jelto: Build helm3.17 with new upstream version [debs/helm3] - https://gerrit.wikimedia.org/r/1115388 (https://phabricator.wikimedia.org/T341984)
[13:03:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es2037 to es6 primary T387211', diff saved to https://phabricator.wikimedia.org/P73550 and previous config saved to /var/cache/conftool/dbconfig/20250225-130348-root.json
[13:04:25] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update es6-master alias [dns] - https://gerrit.wikimedia.org/r/1122569 (https://phabricator.wikimedia.org/T387211) (owner: Gerrit maintenance bot)
[13:04:38] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[13:06:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2035 T387211', diff saved to https://phabricator.wikimedia.org/P73551 and previous config saved to /var/cache/conftool/dbconfig/20250225-130619-root.json
[13:06:35] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[13:06:58] <wikibugs>	 SRE, Maps, Traffic: Allow Wikimedia Maps usage on schoolwiki.in - https://phabricator.wikimedia.org/T383210#10578780 (Gnoeee) >>! In T383210#10575269, @ssingh wrote: > @Gnoeee: This has been rolled out and should now be live. Please feel free to re-open this task if there are any issues. Thank yo...
[13:07:00] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for es2035.codfw.wmnet
[13:07:52] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db-production.php: Disable writes on es6" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122570 (owner: Marostegui)
[13:08:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73552 and previous config saved to /var/cache/conftool/dbconfig/20250225-130821-root.json
[13:08:25] <wikibugs>	 (PS3) Slyngshede: Ensure that the LDAP user is parsed as an Entry object. [software/bitu] - https://gerrit.wikimedia.org/r/1122562 (https://phabricator.wikimedia.org/T385947)
[13:08:46] <wikibugs>	 (Merged) jenkins-bot: Revert "db-production.php: Disable writes on es6" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122570 (owner: Marostegui)
[13:09:22] <logmsgbot>	 !log marostegui@deploy2002 Started scap sync-world: Backport for [[gerrit:1122570|Revert "db-production.php: Disable writes on es6"]]
[13:09:53] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, decommission-hardware: decommission cloudgw100[12] - https://phabricator.wikimedia.org/T386810#10578798 (Andrew) >>! In T386810#10577268, @VRiley-WMF wrote: > Unracked and removed the following servers. However, the script has failed and returning...
[13:11:04] <wikibugs>	 (CR) Mvolz: [C:+1] trafficserver: roll restbaseless citoid out to group0 wikis [puppet] - https://gerrit.wikimedia.org/r/1122542 (https://phabricator.wikimedia.org/T361576) (owner: Hnowlan)
[13:13:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73553 and previous config saved to /var/cache/conftool/dbconfig/20250225-131318-root.json
[13:13:31] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2035.codfw.wmnet
[13:15:44] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:1122570|Revert "db-production.php: Disable writes on es6"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:15:54] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[13:17:21] <wikibugs>	 (CR) David Caro: "LGTM, though I'm quite unfamiliar with gnmic so might want input from someone else too. The syntax looks ok (matching the other `event-str" [puppet] - https://gerrit.wikimedia.org/r/1122543 (https://phabricator.wikimedia.org/T372457) (owner: Cathal Mooney)
[13:22:45] <logmsgbot>	 !log marostegui@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122570|Revert "db-production.php: Disable writes on es6"]] (duration: 13m 23s)
[13:23:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73554 and previous config saved to /var/cache/conftool/dbconfig/20250225-132326-root.json
[13:28:01] <wikibugs>	 (PS1) Vgutierrez: lvs_realserver: Wait at least for two consecutive MSS errors [alerts] - https://gerrit.wikimedia.org/r/1122575
[13:28:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73555 and previous config saved to /var/cache/conftool/dbconfig/20250225-132823-root.json
[13:30:04] <wikibugs>	 (CR) Vgutierrez: "I'd rather be consistent and include FB ranges and friends" [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: Elukey)
[13:32:51] <wikibugs>	 (PS3) Jelto: Build helm3.17 with new upstream version [debs/helm3] - https://gerrit.wikimedia.org/r/1115388 (https://phabricator.wikimedia.org/T341984)
[13:33:44] <wikibugs>	 (PS1) Giuseppe Lavagetto: When executing cli scripts, wait for the service mesh [mediawiki-config] - https://gerrit.wikimedia.org/r/1122578 (https://phabricator.wikimedia.org/T387208)
[13:33:54] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db1236 to s7 master [puppet] - https://gerrit.wikimedia.org/r/1122579 (https://phabricator.wikimedia.org/T387216)
[13:35:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set db1236 with weight 0 T387216', diff saved to https://phabricator.wikimedia.org/P73556 and previous config saved to /var/cache/conftool/dbconfig/20250225-133500-marostegui.json
[13:35:04] <stashbot>	 T387216: Switchover s7 master (db1181 -> db1236) - https://phabricator.wikimedia.org/T387216
[13:35:14] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T387216
[13:35:20] <wikibugs>	 (CR) JMeybohm: mediawiki: introduce feature flags (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1116639 (owner: Giuseppe Lavagetto)
[13:36:40] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote db1236 to s7 master [puppet] - https://gerrit.wikimedia.org/r/1122579 (https://phabricator.wikimedia.org/T387216) (owner: Gerrit maintenance bot)
[13:37:22] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:38:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73557 and previous config saved to /var/cache/conftool/dbconfig/20250225-133831-root.json
[13:39:31] <wikibugs>	 (PS1) Effie Mouzeli: trafficserver: re-enable cookie-enrolled traffic to 8.1 [puppet] - https://gerrit.wikimedia.org/r/1122584 (https://phabricator.wikimedia.org/T383845)
[13:39:34] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:40:59] <wikibugs>	 (PS1) Effie Mouzeli: Re-enable cookie-based enrollment in 8.1 at 50% [mediawiki-config] - https://gerrit.wikimedia.org/r/1122585 (https://phabricator.wikimedia.org/T385395)
[13:41:24] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1122242 (https://phabricator.wikimedia.org/T386969) (owner: Eevans)
[13:42:32] <marostegui>	 !log Starting s7 eqiad failover from db1181 to db1236 - T387216
[13:42:32] <wikibugs>	 (PS2) Effie Mouzeli: trafficserver: re-enable cookie-enrolled traffic to 8.1 [puppet] - https://gerrit.wikimedia.org/r/1122584 (https://phabricator.wikimedia.org/T383845)
[13:42:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:35] <stashbot>	 T387216: Switchover s7 master (db1181 -> db1236) - https://phabricator.wikimedia.org/T387216
[13:42:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db1236 to s7 primary T387216', diff saved to https://phabricator.wikimedia.org/P73558 and previous config saved to /var/cache/conftool/dbconfig/20250225-134256-marostegui.json
[13:43:21] <wikibugs>	 (CR) MVernon: [C:+1] restbase: upgrade cluster to 'dev' (Cassandra 4.1.8) [puppet] - https://gerrit.wikimedia.org/r/1122242 (https://phabricator.wikimedia.org/T386969) (owner: Eevans)
[13:43:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73559 and previous config saved to /var/cache/conftool/dbconfig/20250225-134328-root.json
[13:43:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1181 T387216', diff saved to https://phabricator.wikimedia.org/P73560 and previous config saved to /var/cache/conftool/dbconfig/20250225-134349-marostegui.json
[13:44:20] <wikibugs>	 (PS4) Jelto: Build helm3.17 with new upstream version [debs/helm3] - https://gerrit.wikimedia.org/r/1115388 (https://phabricator.wikimedia.org/T341984)
[13:45:32] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for db1181.eqiad.wmnet
[13:45:58] <wikibugs>	 SRE, Infrastructure-Foundations, netops: gNMIc connection not working for cloudsw2-d5-eqiad - https://phabricator.wikimedia.org/T387018#10578920 (cmooney) >>! In T387018#10574426, @ayounsi wrote: > Enabling traceoptions shows a `no shared cipher` error on the switch : > ` > Feb 24 09:33:58 ssl_transp...
[13:46:30] <wikibugs>	 (CR) Eevans: [C:+2] restbase: upgrade cluster to 'dev' (Cassandra 4.1.8) [puppet] - https://gerrit.wikimedia.org/r/1122242 (https://phabricator.wikimedia.org/T386969) (owner: Eevans)
[13:48:43] <wikibugs>	 (CR) JMeybohm: [C:+1] mediawiki-common: introduce chart [deployment-charts] - https://gerrit.wikimedia.org/r/1117547 (owner: Giuseppe Lavagetto)
[13:48:52] <wikibugs>	 (CR) JMeybohm: Add a mediawiki-common release to mw-script (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1117548 (owner: Giuseppe Lavagetto)
[13:51:15] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[13:52:11] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1181.eqiad.wmnet
[13:52:55] <wikibugs>	 (PS1) Effie Mouzeli: mw-(api-int|jobrunner|parsoid): resume php8.1 rollout [deployment-charts] - https://gerrit.wikimedia.org/r/1122587 (https://phabricator.wikimedia.org/T383845)
[13:53:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73561 and previous config saved to /var/cache/conftool/dbconfig/20250225-135336-root.json
[13:57:53] <wikibugs>	 (CR) Ayounsi: [C:+1] Rename text interface state values returned by GNMI to ints [puppet] - https://gerrit.wikimedia.org/r/1122543 (https://phabricator.wikimedia.org/T372457) (owner: Cathal Mooney)
[13:58:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73562 and previous config saved to /var/cache/conftool/dbconfig/20250225-135834-root.json
[13:59:15] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Rebuild index
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor My software never has bugs. It just develops random features. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1400).
[14:00:05] <jouncebot>	 LD: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:37] <LD>	 (y)
[14:01:55] <wikibugs>	 (CR) Kamila Součková: [C:+1] mediawiki: CronJob name as Job label [deployment-charts] - https://gerrit.wikimedia.org/r/1122563 (https://phabricator.wikimedia.org/T385709) (owner: Clément Goubert)
[14:03:50] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Rename text interface state values returned by GNMI to ints [puppet] - https://gerrit.wikimedia.org/r/1122543 (https://phabricator.wikimedia.org/T372457) (owner: Cathal Mooney)
[14:04:19] <wikibugs>	 (CR) Ayounsi: [C:+1] Rename YAML var "evpn_bgp" to "switch_ibgp" [homer/public] - https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088) (owner: Cathal Mooney)
[14:06:45] <wikibugs>	 (CR) Ayounsi: [C:-1] Rename YAML var "evpn_bgp" to "switch_ibgp" (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088) (owner: Cathal Mooney)
[14:07:45] <Amir1>	 !log drop module_deps table in all of s5 (T385997)
[14:07:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:50] <stashbot>	 T385997: Drop module_deps table in WMF prod - https://phabricator.wikimedia.org/T385997
[14:08:04] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: add 90pct to envoy recording rules [puppet] - https://gerrit.wikimedia.org/r/1122589 (https://phabricator.wikimedia.org/T385693)
[14:08:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73563 and previous config saved to /var/cache/conftool/dbconfig/20250225-140841-root.json
[14:08:42] <Daimona>	 Hi LD! I'm not a deployer and therefore can't deploy your patch, but I'm here if you need any help.
[14:08:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73564 and previous config saved to /var/cache/conftool/dbconfig/20250225-140856-root.json
[14:09:25] <wikibugs>	 (PS1) Brouberol: airflow-analytics-product: migrate the scheduler and the DB to Kubernetes [deployment-charts] - https://gerrit.wikimedia.org/r/1122591 (https://phabricator.wikimedia.org/T380623)
[14:10:30] <LD>	 Hey Daimona, you'll probably need to review the commit once more: the last patchset reset the review, I just resolved the merge conflict <https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1120152> lmao
[14:10:36] <wikibugs>	 (PS1) Brouberol: airflow-analytics-product: disable and remove the airflow systemd services [puppet] - https://gerrit.wikimedia.org/r/1122592 (https://phabricator.wikimedia.org/T380623)
[14:11:12] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] prometheus: add 90pct to envoy recording rules [puppet] - https://gerrit.wikimedia.org/r/1122589 (https://phabricator.wikimedia.org/T385693) (owner: Filippo Giunchedi)
[14:11:25] <wikibugs>	 (CR) Brouberol: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4978/co" [puppet] - https://gerrit.wikimedia.org/r/1122592 (https://phabricator.wikimedia.org/T380623) (owner: Brouberol)
[14:11:57] <wikibugs>	 (CR) Daimona Eaytoy: [C:+1] frwiki: Enable the CampaignEvents extension [mediawiki-config] - https://gerrit.wikimedia.org/r/1120152 (https://phabricator.wikimedia.org/T386622) (owner: LD)
[14:12:09] <Daimona>	 Of course, +1ed
[14:12:17] <wikibugs>	 (Abandoned) Ssingh: geo-maps: put eqiad at lowest priority for T380858 [dns] - https://gerrit.wikimedia.org/r/1113205 (https://phabricator.wikimedia.org/T380858) (owner: Ssingh)
[14:12:35] <LD>	 thanks
[14:12:53] <wikibugs>	 (PS2) Brouberol: airflow-analytics-product: migrate the scheduler and the DB to Kubernetes [deployment-charts] - https://gerrit.wikimedia.org/r/1122591 (https://phabricator.wikimedia.org/T380623)
[14:14:39] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, see inline though" [debs/benthos] - https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098) (owner: Fabfur)
[14:14:49] <wikibugs>	 (CR) Ssingh: [C:+1] lvs_realserver: Wait at least for two consecutive MSS errors [alerts] - https://gerrit.wikimedia.org/r/1122575 (owner: Vgutierrez)
[14:14:57] <wikibugs>	 (CR) Kamila Součková: [C:+1] mw-jobrunner: scale down [deployment-charts] - https://gerrit.wikimedia.org/r/1122565 (owner: Hnowlan)
[14:14:59] <Daimona>	 The question is though: are there any deployers around?
[14:15:24] <LD>	 I don't think so, unfortunately
[14:15:27] <wikibugs>	 (CR) Ssingh: "+1 ^" [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: Elukey)
[14:15:30] <wikibugs>	 (CR) Fabfur: [C:+1] lvs_realserver: Wait at least for two consecutive MSS errors [alerts] - https://gerrit.wikimedia.org/r/1122575 (owner: Vgutierrez)
[14:15:54] <wikibugs>	 (CR) Vgutierrez: [C:+2] lvs_realserver: Wait at least for two consecutive MSS errors [alerts] - https://gerrit.wikimedia.org/r/1122575 (owner: Vgutierrez)
[14:16:03] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "Thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122578 (https://phabricator.wikimedia.org/T387208) (owner: Giuseppe Lavagetto)
[14:16:50] <kamila_>	 LD, Daimona: I can deploy if nobody else pops up
[14:17:47] <Daimona>	 Thank you Kamila, that would be very much appreciated!
[14:21:19] <kamila_>	 ok, on it :-)
[14:21:29] <LD>	 thanks :)
[14:23:51] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kamila@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1120152 (https://phabricator.wikimedia.org/T386622) (owner: LD)
[14:24:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73565 and previous config saved to /var/cache/conftool/dbconfig/20250225-142402-root.json
[14:24:25] <wikibugs>	 SRE, Infrastructure-Foundations, netops, observability, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10579051 (fgiunchedi) >>! In T384731#10578273, @JMeybohm wrote: >>>! In T384731#10566953, @fgiunchedi wrote: >>>>...
[14:24:31] <jinxer-wm>	 FIRING: [2x] Emergency syslog message: Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Emergency syslog message   - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[14:24:37] <wikibugs>	 (Merged) jenkins-bot: frwiki: Enable the CampaignEvents extension [mediawiki-config] - https://gerrit.wikimedia.org/r/1120152 (https://phabricator.wikimedia.org/T386622) (owner: LD)
[14:25:04] <logmsgbot>	 !log kamila@deploy2002 Started scap sync-world: Backport for [[gerrit:1120152|frwiki: Enable the CampaignEvents extension (T386622)]]
[14:25:36] <wikibugs>	 (PS4) Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858)
[14:26:45] <wikibugs>	 (CR) Elukey: "should be fixed :)" [dns] - https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: Elukey)
[14:29:31] <jinxer-wm>	 RESOLVED: [2x] Emergency syslog message: Device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet recovered from Emergency syslog message   - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[14:31:23] <LD>	 kamila_ thanks, looks like it's merged. I'm still a bit confused tho, when does it go live?
[14:31:25] <logmsgbot>	 !log kamila@deploy2002 kamila, wpld: Backport for [[gerrit:1120152|frwiki: Enable the CampaignEvents extension (T386622)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:31:33] <wikibugs>	 (PS1) Vgutierrez: site,hiera: Reimage lvs7007 as liberica LB [puppet] - https://gerrit.wikimedia.org/r/1122597 (https://phabricator.wikimedia.org/T384477)
[14:31:37] <LD>	 nvm it's still pending
[14:31:39] <wikibugs>	 (PS3) Cathal Mooney: Rename YAML var "evpn_bgp" to "switch_ibgp" [homer/public] - https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088)
[14:31:53] <wikibugs>	 (PS2) Vgutierrez: site,hiera: Reimage lvs7003 as liberica LB [puppet] - https://gerrit.wikimedia.org/r/1122597 (https://phabricator.wikimedia.org/T384477)
[14:31:57] <kamila_>	 LD: just got deployed to mwdebug, but I think you can't test this there, is that correct?
[14:31:58] <claime>	 LD: it's been synced to the test servers so you can test with XWD
[14:32:03] <claime>	 (if you can)
[14:32:36] <wikibugs>	 (CR) Cathal Mooney: "Thanks, I set it back the way it was in latest patchset.  That template still evpn-specific so it should keep that name for now, may refac" [homer/public] - https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088) (owner: Cathal Mooney)
[14:32:40] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1122597 (https://phabricator.wikimedia.org/T384477) (owner: Vgutierrez)
[14:32:59] <wikibugs>	 (CR) Cathal Mooney: Rename YAML var "evpn_bgp" to "switch_ibgp" (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088) (owner: Cathal Mooney)
[14:33:22] <LD>	 well, as it's frwiki config I thought no test was needed, and I cannot test with XWD anyway
[14:33:29] <logmsgbot>	 !log kamila@deploy2002 kamila, wpld: Continuing with sync
[14:33:44] <kamila_>	 LD: ok, continuing with full deployment then
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:42] <wikibugs>	 (03CR) 10Ssingh: WIP geo-maps: deprioritize eqiad to depool traffic from it (033 comments) [dns] - 10https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: 10Elukey)
[14:37:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1036', diff saved to https://phabricator.wikimedia.org/P73566 and previous config saved to /var/cache/conftool/dbconfig/20250225-143749-root.json
[14:37:59] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for es1036.eqiad.wmnet
[14:39:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73567 and previous config saved to /var/cache/conftool/dbconfig/20250225-143908-root.json
[14:40:42] <logmsgbot>	 !log kamila@deploy2002 Finished scap sync-world: Backport for [[gerrit:1120152|frwiki: Enable the CampaignEvents extension (T386622)]] (duration: 15m 38s)
[14:40:46] <stashbot>	 T386622: Release CampaignEvents extension to French Wikipedia - https://phabricator.wikimedia.org/T386622
[14:40:58] <wikibugs>	 (03PS7) 10Herron: aux-k8s-ctrl codfw: apply role [puppet] - 10https://gerrit.wikimedia.org/r/1122170 (https://phabricator.wikimedia.org/T381417)
[14:40:59] <kamila_>	 LD: and done
[14:41:14] <LD>	 thanks, LGTM
[14:41:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2219', diff saved to https://phabricator.wikimedia.org/P73568 and previous config saved to /var/cache/conftool/dbconfig/20250225-144137-root.json
[14:41:46] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for db2219.codfw.wmnet
[14:42:19] <kamila_>	 \o/
[14:42:20] <wikibugs>	 (03PS5) 10Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - 10https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858)
[14:42:39] <wikibugs>	 (03CR) 10Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it (033 comments) [dns] - 10https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: 10Elukey)
[14:42:46] <wikibugs>	 (03PS6) 10Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - 10https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858)
[14:43:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1186', diff saved to https://phabricator.wikimedia.org/P73569 and previous config saved to /var/cache/conftool/dbconfig/20250225-144341-root.json
[14:43:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.mysql.upgrade for db1186.eqiad.wmnet
[14:44:16] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es1036.eqiad.wmnet
[14:45:59] <wikibugs>	 (03PS8) 10Herron: aux-k8s-ctrl codfw: apply role [puppet] - 10https://gerrit.wikimedia.org/r/1122170 (https://phabricator.wikimedia.org/T381417)
[14:46:59] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2219.codfw.wmnet
[14:47:02] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Looks good! Verified based on magru and existing ulsfo config." [puppet] - 10https://gerrit.wikimedia.org/r/1122597 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[14:47:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73570 and previous config saved to /var/cache/conftool/dbconfig/20250225-144722-root.json
[14:47:53] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] site,hiera: Reimage lvs7003 as liberica LB [puppet] - 10https://gerrit.wikimedia.org/r/1122597 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[14:48:16] <wikibugs>	 (03CR) 10Ssingh: WIP geo-maps: deprioritize eqiad to depool traffic from it (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: 10Elukey)
[14:48:43] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 10observability, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10579181 (10ayounsi) >> And what happens if peer_descr is missing or empty ? > good question, in that case the inst...
[14:49:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2219 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73571 and previous config saved to /var/cache/conftool/dbconfig/20250225-144945-root.json
[14:49:54] <wikibugs>	 (03CR) 10Federico Ceratto: [C:03+2] pool.py: Add basic typing to allow mypy checks [cookbooks] - 10https://gerrit.wikimedia.org/r/1122099 (https://phabricator.wikimedia.org/T383760) (owner: 10Federico Ceratto)
[14:50:28] <wikibugs>	 (03CR) 10Federico Ceratto: [V:03+2 C:03+2] pool.py: Add basic typing to allow mypy checks [cookbooks] - 10https://gerrit.wikimedia.org/r/1122099 (https://phabricator.wikimedia.org/T383760) (owner: 10Federico Ceratto)
[14:50:35] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bookworm
[14:51:13] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1186.eqiad.wmnet
[14:51:48] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Index rebuild
[14:53:21] <hnowlan>	 jouncebot: nowandnext
[14:53:21] <jouncebot>	 For the next 0 hour(s) and 6 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1400)
[14:53:21] <jouncebot>	 In 1 hour(s) and 6 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1600)
[14:54:02] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] mw-jobrunner: scale down [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122565 (owner: 10Hnowlan)
[14:54:07] <wikibugs>	 (03PS1) 10Fabfur: benthos: fix schema name [puppet] - 10https://gerrit.wikimedia.org/r/1122603 (https://phabricator.wikimedia.org/T329332)
[14:54:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73572 and previous config saved to /var/cache/conftool/dbconfig/20250225-145413-root.json
[14:54:33] <icinga-wm>	 PROBLEM - BGP status on asw1-b3-magru.mgmt is CRITICAL: BGP CRITICAL - AS64600/IPv4: Connect - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:56:07] <wikibugs>	 (03Merged) 10jenkins-bot: mw-jobrunner: scale down [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122565 (owner: 10Hnowlan)
[14:57:20] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+1] benthos: fix schema name [puppet] - 10https://gerrit.wikimedia.org/r/1122603 (https://phabricator.wikimedia.org/T329332) (owner: 10Fabfur)
[14:58:19] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] benthos: fix schema name [puppet] - 10https://gerrit.wikimedia.org/r/1122603 (https://phabricator.wikimedia.org/T329332) (owner: 10Fabfur)
[14:58:48] <wikibugs>	 (03CR) 10Federico Ceratto: clone.py: Add helper functions for later use (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1120213 (https://phabricator.wikimedia.org/T387023) (owner: 10Federico Ceratto)
[14:58:54] <wikibugs>	 (03PS1) 10Scott French: php8.1: rebuild to pick up newer php8.1 packages [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006)
[15:02:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73573 and previous config saved to /var/cache/conftool/dbconfig/20250225-150228-root.json
[15:03:16] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[15:03:57] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[15:04:10] <wikibugs>	 (03CR) 10Ayounsi: [C:03+1] "lgtm!" [homer/public] - 10https://gerrit.wikimedia.org/r/1122208 (https://phabricator.wikimedia.org/T371088) (owner: 10Cathal Mooney)
[15:04:27] <wikibugs>	 (03PS1) 10Elukey: services: Increase capacity and specs for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122605 (https://phabricator.wikimedia.org/T386926)
[15:04:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2219 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73574 and previous config saved to /var/cache/conftool/dbconfig/20250225-150450-root.json
[15:06:11] <wikibugs>	 (03CR) 10Clément Goubert: "F" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122605 (https://phabricator.wikimedia.org/T386926) (owner: 10Elukey)
[15:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:07:45] <wikibugs>	 (03CR) 10CI reject: [V:04-1] services: Increase capacity and specs for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122605 (https://phabricator.wikimedia.org/T386926) (owner: 10Elukey)
[15:08:18] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] "I think this looks good, it is hard to test this without a real wiki creation. But for the next one (we had one last night, what a pity) w" [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) (owner: 10Arnaudb)
[15:08:25] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
[15:08:31] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
[15:08:48] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
[15:09:01] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
[15:09:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73575 and previous config saved to /var/cache/conftool/dbconfig/20250225-150919-root.json
[15:11:48] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: mwscript: do not run mesh checks when running in a loop [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122606 (https://phabricator.wikimedia.org/T387208)
[15:12:48] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
[15:14:35] <wikibugs>	 (03PS2) 10Elukey: services: Increase capacity and specs for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122605 (https://phabricator.wikimedia.org/T386926)
[15:15:08] <wikibugs>	 (03CR) 10Elukey: services: Increase capacity and specs for Kartotherian (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122605 (https://phabricator.wikimedia.org/T386926) (owner: 10Elukey)
[15:15:22] <wikibugs>	 (03PS62) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204)
[15:15:42] <wikibugs>	 (03CR) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204) (owner: 10CDobbins)
[15:16:37] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
[15:16:56] <wikibugs>	 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading, 07Unstewarded-production-error, 07Wikimedia-production-error: An unknown error occurred in storage backend "local-swift-eqiad" - https://phabricator.wikimedia.org/T341007#10579276 (10MatthewVernon) I looked for the first of those two files (`sudo cum...
[15:16:57] <wikibugs>	 (03CR) 10CI reject: [V:04-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204) (owner: 10CDobbins)
[15:17:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73576 and previous config saved to /var/cache/conftool/dbconfig/20250225-151733-root.json
[15:18:02] <wikibugs>	 (03PS9) 10Herron: aux-k8s-ctrl codfw: apply role [puppet] - 10https://gerrit.wikimedia.org/r/1122170 (https://phabricator.wikimedia.org/T381417)
[15:18:02] <wikibugs>	 (03CR) 10Herron: [V:03+1] "Thx for the review! Please see a few replies inline" [puppet] - 10https://gerrit.wikimedia.org/r/1122170 (https://phabricator.wikimedia.org/T381417) (owner: 10Herron)
[15:18:49] <wikibugs>	 (03PS63) 10CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204)
[15:19:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2219 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73577 and previous config saved to /var/cache/conftool/dbconfig/20250225-151956-root.json
[15:21:25] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote es2039 to es7 master [puppet] - 10https://gerrit.wikimedia.org/r/1122607 (https://phabricator.wikimedia.org/T387224)
[15:21:29] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: wmnet: Update es7-master alias [dns] - 10https://gerrit.wikimedia.org/r/1122608 (https://phabricator.wikimedia.org/T387224)
[15:21:41] <wikibugs>	 (03PS1) 10Marostegui: db-production.php: Disable writes on es7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122609 (https://phabricator.wikimedia.org/T387224)
[15:21:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job liberica in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:23:47] <jinxer-wm>	 FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-analytics-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[15:25:32] <ottomata>	 brouberol: ^^ is this something DPE - SRE could help with, or who usually responds to cert expiries?
[15:26:36] <icinga-wm>	 RECOVERY - BGP status on asw1-b3-magru.mgmt is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:27:15] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: When executing cli scripts, wait for the service mesh (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122578 (https://phabricator.wikimedia.org/T387208) (owner: 10Giuseppe Lavagetto)
[15:27:28] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: When executing cli scripts, wait for the service mesh [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122578 (https://phabricator.wikimedia.org/T387208)
[15:27:38] <wikibugs>	 (03PS1) 10Volans: setup.py: revert conftool dependency [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122610
[15:28:51] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thanks for catching this!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122610 (owner: 10Volans)
[15:29:38] <wikibugs>	 (03CR) 10Volans: [C:03+2] setup.py: revert conftool dependency [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122610 (owner: 10Volans)
[15:31:04] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2013']
[15:31:17] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2013']
[15:31:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job liberica in ops@magru - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:32:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73578 and previous config saved to /var/cache/conftool/dbconfig/20250225-153239-root.json
[15:33:36] <icinga-wm>	 PROBLEM - BGP status on asw1-b3-magru.mgmt is CRITICAL: BGP CRITICAL - AS64600/IPv4: Active - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:34:06] <vgutierrez>	 that's expected (lvs7003 being reimaged)
[15:34:10] <wikibugs>	 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading, 07Unstewarded-production-error, 07Wikimedia-production-error: An unknown error occurred in storage backend "local-swift-eqiad" - https://phabricator.wikimedia.org/T341007#10579425 (10Yann) It worked after I disabled https://commons.wikimedia.org/w/in...
[15:34:36] <icinga-wm>	 RECOVERY - BGP status on asw1-b3-magru.mgmt is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:35:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2219 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73579 and previous config saved to /var/cache/conftool/dbconfig/20250225-153501-root.json
[15:35:06] <wikibugs>	 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading, 07Unstewarded-production-error, 07Wikimedia-production-error: An unknown error occurred in storage backend "local-swift-eqiad" - https://phabricator.wikimedia.org/T341007#10579426 (10Yann) >>! In T341007#10579276, @MatthewVernon wrote: > I looked for...
[15:37:12] <wikibugs>	 (03CR) 10Federico Ceratto: [V:03+2] sre.mysql.sanitize-wiki: sanitize wiki cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) (owner: 10Arnaudb)
[15:37:35] <wikibugs>	 (03CR) 10Federico Ceratto: [V:03+2 C:03+2] sre.mysql.sanitize-wiki: sanitize wiki cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) (owner: 10Arnaudb)
[15:40:12] <wikibugs>	 (03Merged) 10jenkins-bot: setup.py: revert conftool dependency [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122610 (owner: 10Volans)
[15:47:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73580 and previous config saved to /var/cache/conftool/dbconfig/20250225-154744-root.json
[15:47:56] <swfrench-wmf>	 !log reprepro include php8.1_8.1.31-1+wmf11u4 into component/php81 - T386006
[15:47:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:59] <stashbot>	 T386006: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006
[15:48:40] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v9.1.3 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122614
[15:48:56] <wikibugs>	 (03CR) 10Volans: [C:03+2] CHANGELOG: add changelogs for release v9.1.3 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122614 (owner: 10Volans)
[15:49:51] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Fix NIC name for liberica@magru [puppet] - 10https://gerrit.wikimedia.org/r/1122615 (https://phabricator.wikimedia.org/T384477)
[15:50:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2219 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73581 and previous config saved to /var/cache/conftool/dbconfig/20250225-155006-root.json
[15:50:08] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] hiera: Fix NIC name for liberica@magru [puppet] - 10https://gerrit.wikimedia.org/r/1122615 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[15:50:24] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] hiera: Fix NIC name for liberica@magru [puppet] - 10https://gerrit.wikimedia.org/r/1122615 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[15:50:33] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+1] db-production.php: Disable writes on es7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122609 (https://phabricator.wikimedia.org/T387224) (owner: 10Marostegui)
[15:50:50] <marostegui>	 jouncebot: next
[15:50:50] <jouncebot>	 In 0 hour(s) and 9 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1600)
[15:51:14] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db-production.php: Disable writes on es7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122609 (https://phabricator.wikimedia.org/T387224) (owner: 10Marostegui)
[15:51:31] <swfrench-wmf>	 !log reprepro include php-apcu_5.1.23-1+wmf11u4 into component/php81 - T386006
[15:51:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:39] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T387224
[15:51:42] <stashbot>	 T387224: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T387224
[15:51:55] <wikibugs>	 (03Merged) 10jenkins-bot: db-production.php: Disable writes on es7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122609 (https://phabricator.wikimedia.org/T387224) (owner: 10Marostegui)
[15:52:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set es2039 with weight 0 T387224', diff saved to https://phabricator.wikimedia.org/P73582 and previous config saved to /var/cache/conftool/dbconfig/20250225-155229-marostegui.json
[15:53:04] <logmsgbot>	 !log marostegui@deploy2002 Started scap sync-world: Backport for [[gerrit:1122609|db-production.php: Disable writes on es7 (T387224)]]
[15:56:56] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7003.magru.wmnet with OS bookworm
[15:57:21] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "let's get this one merged cause it's becoming more important now that we are migrating low traffic services to IPIP encapsulation. Nice jo" [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204) (owner: 10CDobbins)
[15:57:43] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:1122609|db-production.php: Disable writes on es7 (T387224)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:57:47] <stashbot>	 T387224: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T387224
[15:57:47] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[15:58:03] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup201[34] - https://phabricator.wikimedia.org/T384973#10579615 (10Jhancock.wm) neither server will pxe. pxe is set, config on switches is correct. neither nic will come up. could be firmware issue again? roped papaul in this via irc
[15:58:30] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[15:58:39] <wikibugs>	 (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v9.1.3 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1122614 (owner: 10Volans)
[16:00:05] <jouncebot>	 jelto, arnoldokoth, and mutante: #bothumor My software never has bugs. It just develops random features. Rise for SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1600).
[16:00:13] <wikibugs>	 (03PS1) 10Volans: Upstream release v9.1.3 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/1122616
[16:00:27] <wikibugs>	 (03CR) 10Volans: [C:03+2] Upstream release v9.1.3 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/1122616 (owner: 10Volans)
[16:00:34] <Lucas_WMDE>	 LD: sorry, I was busy and didn’t look at IRC at all today
[16:00:38] <Lucas_WMDE>	 thanks kamila_ for deploying \o/
[16:00:54] <kamila_>	 sure :-)
[16:02:01] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, and 2 others: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10579629 (10Andrew) @MoritzMuehlenhoff ping, is ganeti1044 ready to be moved?
[16:04:13] <logmsgbot>	 !log marostegui@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122609|db-production.php: Disable writes on es7 (T387224)]] (duration: 11m 09s)
[16:04:17] <stashbot>	 T387224: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T387224
[16:05:14] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Promote es2039 to es7 master [puppet] - 10https://gerrit.wikimedia.org/r/1122607 (https://phabricator.wikimedia.org/T387224) (owner: 10Gerrit maintenance bot)
[16:06:30] <marostegui>	 !log Starting es7 codfw failover from es2038 to es2039 - T387224
[16:06:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es2039 to es7 primary T387224', diff saved to https://phabricator.wikimedia.org/P73583 and previous config saved to /var/cache/conftool/dbconfig/20250225-160659-marostegui.json
[16:09:01] <wikibugs>	 (03PS1) 10ZhaoFJx: cowikimedia: Change the logo v2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122622 (https://phabricator.wikimedia.org/T386872)
[16:09:40] <XioNoX>	 !log set bgp to true on lvs6002 - T380469
[16:09:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:44] <stashbot>	 T380469: eqiad/esams/drmrs LVS: use Netbox BGP flag - https://phabricator.wikimedia.org/T380469
[16:10:23] <wikibugs>	 (03Merged) 10jenkins-bot: Upstream release v9.1.3 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/1122616 (owner: 10Volans)
[16:10:52] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Reimage lvs7002 as liberica LB [puppet] - 10https://gerrit.wikimedia.org/r/1122623 (https://phabricator.wikimedia.org/T384477)
[16:10:56] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Reimage lvs7001 as liberica LB [puppet] - 10https://gerrit.wikimedia.org/r/1122624 (https://phabricator.wikimedia.org/T384477)
[16:11:23] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] trafficserver: roll restbaseless citoid out to group0 wikis [puppet] - 10https://gerrit.wikimedia.org/r/1122542 (https://phabricator.wikimedia.org/T361576) (owner: 10Hnowlan)
[16:11:39] <vgutierrez>	 XioNoX: hmmm that's interesting.. we don't have that feature on liberica (turning off BGP entirely)
[16:11:52] <vgutierrez>	 XioNoX: is that something that we could need?
[16:12:24] <XioNoX>	 vgutierrez: you tell me :)
[16:12:28] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T387224
[16:12:33] <stashbot>	 T387224: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T387224
[16:13:09] <XioNoX>	 vgutierrez: I don't think it's needed, or that we've ever needed that on pybal
[16:13:16] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1122623 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[16:13:20] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1122624 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[16:13:34] <wikibugs>	 (03PS1) 10Marostegui: Revert "mariadb: Promote es2039 to es7 master" [puppet] - 10https://gerrit.wikimedia.org/r/1122625
[16:14:13] <vgutierrez>	 XioNoX: ohhh I misread your last !log entry... I was thinking of bgp: true|false pybal config setting
[16:14:35] <wikibugs>	 10SRE-swift-storage, 06Commons, 10MediaWiki-Uploading, 07Unstewarded-production-error, 07Wikimedia-production-error: An unknown error occurred in storage backend "local-swift-eqiad" - https://phabricator.wikimedia.org/T341007#10579684 (10MatthewVernon) 05Open→03Resolved a:03MatthewVernon >>! In...
[16:14:44] <logmsgbot>	 !log hashar@deploy2002 Started deploy [integration/docroot@50f623d]: build: make Phan stricter
[16:14:50] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert "mariadb: Promote es2039 to es7 master" [puppet] - 10https://gerrit.wikimedia.org/r/1122625 (owner: 10Marostegui)
[16:14:54] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [integration/docroot@50f623d]: build: make Phan stricter (duration: 00m 10s)
[16:16:41] <XioNoX>	 !log set bgp to true on lvs6001 - T380469
[16:16:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:45] <stashbot>	 T380469: eqiad/esams/drmrs LVS: use Netbox BGP flag - https://phabricator.wikimedia.org/T380469
[16:16:53] <volans>	 !log uploaded spicerack_9.1.3 to apt.wikimedia.org bullseye-wikimedia
[16:16:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:02] <hnowlan>	 !log route citoid via rest-gateway (and not restbase) for most group0 wikis 
[16:17:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:08] <XioNoX>	 !log set bgp to true on lvs6003 - T380469
[16:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set es2038 with weight 0 T387224', diff saved to https://phabricator.wikimedia.org/P73584 and previous config saved to /var/cache/conftool/dbconfig/20250225-161823-marostegui.json
[16:18:27] <stashbot>	 T387224: Switchover es7 master (es2038 -> es2039) - https://phabricator.wikimedia.org/T387224
[16:20:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es2038 to es7 primary T387224', diff saved to https://phabricator.wikimedia.org/P73586 and previous config saved to /var/cache/conftool/dbconfig/20250225-162001-marostegui.json
[16:21:12] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-production.php: Disable writes on es7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122626
[16:21:20] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 25 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464) (owner: 10Pppery)
[16:21:29] <XioNoX>	 !log set bgp to true on esams LVS - T380469
[16:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:24] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert "db-production.php: Disable writes on es7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122626 (owner: 10Marostegui)
[16:23:18] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-production.php: Disable writes on es7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122626 (owner: 10Marostegui)
[16:23:52] <logmsgbot>	 !log marostegui@deploy2002 Started scap sync-world: Backport for [[gerrit:1122626|Revert "db-production.php: Disable writes on es7"]]
[16:26:39] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thank you, effie! I can merge and deploy this during my day today." [puppet] - 10https://gerrit.wikimedia.org/r/1122584 (https://phabricator.wikimedia.org/T383845) (owner: 10Effie Mouzeli)
[16:29:00] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:1122626|Revert "db-production.php: Disable writes on es7"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[16:29:39] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[16:30:30] <XioNoX>	 !log set bgp to true on eqiad LVS - T380469
[16:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:36] <stashbot>	 T380469: eqiad/esams/drmrs LVS: use Netbox BGP flag - https://phabricator.wikimedia.org/T380469
[16:30:43] <icinga-wm>	 PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[16:31:05] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Looks good, checked asw1-b4-magru gateway." [puppet] - 10https://gerrit.wikimedia.org/r/1122623 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[16:31:24] <wikibugs>	 (03CR) 10Scott French: "Thanks, effie! Agreed that we should be able to get back to where we were quite quickly, since the risk of surprises from the PCRE2 upgrad" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122585 (https://phabricator.wikimedia.org/T385395) (owner: 10Effie Mouzeli)
[16:33:46] <wikibugs>	 (03PS1) 10Elukey: profile::dns::auth::discovery-map: prefer codfw over eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1122627 (https://phabricator.wikimedia.org/T380858)
[16:33:55] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] hiera: Reimage lvs7001 as liberica LB [puppet] - 10https://gerrit.wikimedia.org/r/1122624 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[16:34:32] <wikibugs>	 (03CR) 10Vgutierrez: [C:04-2] "do not merge till 2025-02-26" [puppet] - 10https://gerrit.wikimedia.org/r/1122623 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[16:34:37] <wikibugs>	 (03CR) 10Vgutierrez: [C:04-2] "do not merge till 2025-02-26" [puppet] - 10https://gerrit.wikimedia.org/r/1122624 (https://phabricator.wikimedia.org/T384477) (owner: 10Vgutierrez)
[16:34:40] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10579751 (10phaultfinder)
[16:34:57] <icinga-wm>	 PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1002 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[16:35:37] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10579756 (10cmooney) Myself and Jenn went on a call with Brooke, Saju and some of the other Nokia technical folks.  They couldn'...
[16:35:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2039', diff saved to https://phabricator.wikimedia.org/P73587 and previous config saved to /var/cache/conftool/dbconfig/20250225-163543-marostegui.json
[16:35:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73588 and previous config saved to /var/cache/conftool/dbconfig/20250225-163556-root.json
[16:36:13] <logmsgbot>	 !log marostegui@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122626|Revert "db-production.php: Disable writes on es7"]] (duration: 12m 20s)
[16:36:20] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thank you, effie!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122587 (https://phabricator.wikimedia.org/T383845) (owner: 10Effie Mouzeli)
[16:39:57] <icinga-wm>	 RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[16:40:20] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] services: Increase capacity and specs for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122605 (https://phabricator.wikimedia.org/T386926) (owner: 10Elukey)
[16:40:43] <icinga-wm>	 RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[16:43:02] <wikibugs>	 (03PS2) 10Fabfur: workaround for T256098 [debs/benthos] - 10https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098)
[16:43:05] <wikibugs>	 (03CR) 10Fabfur: workaround for T256098 (031 comment) [debs/benthos] - 10https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098) (owner: 10Fabfur)
[16:45:26] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Looks good. To be extra sure, we can of course quickly verify the intended output once eqiad is depooled and this patch is merged in." [puppet] - 10https://gerrit.wikimedia.org/r/1122627 (https://phabricator.wikimedia.org/T380858) (owner: 10Elukey)
[16:45:59] <logmsgbot>	 !log volans@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: test
[16:47:15] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] "nits: I would suggest that this deserves a version of 8.1.34-1-s2 and mention the pcre2 version we have built against, given we are using " [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006) (owner: 10Scott French)
[16:48:41] <wikibugs>	 (03PS6) 10Jelto: Build helm3.17 with new upstream version [debs/helm3] - 10https://gerrit.wikimedia.org/r/1115388 (https://phabricator.wikimedia.org/T341984)
[16:48:47] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.099e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[16:53:38] <wikibugs>	 (03CR) 10Vgutierrez: workaround for T256098 (031 comment) [debs/benthos] - 10https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098) (owner: 10Fabfur)
[16:54:07] <vgutierrez>	 ^^ is that expected?
[16:54:45] <wikibugs>	 (03CR) 10Jelto: "one question in-line to @mmuhlenhoff@wikimedia.org regarding packages from backports." [debs/helm3] - 10https://gerrit.wikimedia.org/r/1115388 (https://phabricator.wikimedia.org/T341984) (owner: 10Jelto)
[16:56:11] <wikibugs>	 (03PS3) 10Fabfur: workaround for T256098 [debs/benthos] - 10https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098)
[16:58:00] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[17:00:04] <jouncebot>	 jhathaway and rzl: #bothumor I � Unicode. All rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:00:54] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] workaround for T256098 [debs/benthos] - 10https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098) (owner: 10Fabfur)
[17:01:48] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[17:02:31] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for backup1013 - jclark@cumin1002"
[17:02:38] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for backup1013 - jclark@cumin1002"
[17:02:38] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:03:10] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[17:03:36] <wikibugs>	 (03Abandoned) 10Elukey: WIP geo-maps: deprioritize eqiad to depool traffic from it [dns] - 10https://gerrit.wikimedia.org/r/1122545 (https://phabricator.wikimedia.org/T380858) (owner: 10Elukey)
[17:03:48] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host backup1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:04:03] <logmsgbot>	 !log dzahn@cumin1002 DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on vrts2002.codfw.wmnet with reason: znuny upgrade
[17:04:09] <logmsgbot>	 !log dzahn@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on vrts2002.codfw.wmnet with reason: znuny upgrade
[17:05:00] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thanks, Hugh! These numbers look good, projecting from the last ~ week of usage." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122561 (https://phabricator.wikimedia.org/T380858) (owner: 10Hnowlan)
[17:09:21] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for backup1014 - jclark@cumin1002"
[17:09:26] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for backup1014 - jclark@cumin1002"
[17:09:26] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:10:01] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host backup1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:11:35] <logmsgbot>	 !log dzahn@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on vrts1003.eqiad.wmnet with reason: znuny upgrade
[17:14:19] <wikibugs>	 (03PS1) 10Ssingh: wikimedia-dns.org: add test TYPE65 record [dns] - 10https://gerrit.wikimedia.org/r/1122630
[17:14:21] <fabfur>	 !log temp disabling puppet on cp4050 to test benthos configuration (T329332)
[17:14:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-jobrunner/canary at codfw: 9.375% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-jobrunner&var-container_name=All&var-release=canary - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:15:58] <wikibugs>	 (03PS2) 10Scott French: php8.1: rebuild to pick up newer php8.1 packages [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006)
[17:16:13] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] wikimedia-dns.org: add test TYPE65 record [dns] - 10https://gerrit.wikimedia.org/r/1122630 (owner: 10Ssingh)
[17:16:22] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[17:16:59] <wikibugs>	 (03PS3) 10Scott French: php8.1: rebuild to pick up newer php8.1 packages [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006)
[17:17:19] <wikibugs>	 (03CR) 10Scott French: "Thanks for the review!" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006) (owner: 10Scott French)
[17:18:15] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] mw-api-ext, mw-web: right-size clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122561 (https://phabricator.wikimedia.org/T380858) (owner: 10Hnowlan)
[17:18:17] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[17:18:42] <wikibugs>	 (03CR) 10Scott French: [V:03+2] "Verified the expected packages are installed when building locally." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006) (owner: 10Scott French)
[17:18:57] <swfrench-wmf>	 jouncebot: nowandnext
[17:18:57] <jouncebot>	 For the next 0 hour(s) and 41 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1700)
[17:18:58] <jouncebot>	 In 0 hour(s) and 41 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1800)
[17:19:04] <wikibugs>	 (03PS1) 10Ssingh: Revert "wikimedia-dns.org: add test TYPE65 record" [dns] - 10https://gerrit.wikimedia.org/r/1122631
[17:19:59] <swfrench-wmf>	 since it does not appear that there are any puppet patches for today, I'd like to deploy in order to pick up a newer php 8.1 base image
[17:20:02] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:20:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-jobrunner/canary at codfw: 6.25% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-jobrunner&var-container_name=All&var-release=canary - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:22:01] <wikibugs>	 (03CR) 10Scott French: [V:03+2 C:03+2] php8.1: rebuild to pick up newer php8.1 packages [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1122604 (https://phabricator.wikimedia.org/T386006) (owner: 10Scott French)
[17:22:39] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Revert "wikimedia-dns.org: add test TYPE65 record" [dns] - 10https://gerrit.wikimedia.org/r/1122631 (owner: 10Ssingh)
[17:22:46] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[17:23:47] <jinxer-wm>	 FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-analytics-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[17:24:01] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:24:30] <brouberol>	 ottomata: TBH I'm not 100% sure. I'm out until tomorrow atm
[17:25:11] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[17:26:38] <wikibugs>	 (03PS1) 10Bernard Wang: Deploy Search AB test to french wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122633
[17:27:51] <swfrench-wmf>	 !log built 8.1.34-1-s2 php8.1 production images - T386006
[17:27:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:55] <stashbot>	 T386006: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006
[17:29:23] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1013.eqiad.wmnet with OS bookworm
[17:29:24] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[17:29:36] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10579902 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1013.eqiad.wmnet with OS bookworm
[17:29:37] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10579904 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[17:30:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10579912 (10phaultfinder)
[17:32:44] <logmsgbot>	 !log swfrench@deploy2002 Started scap sync-world: Use php packages built against pcre2 backport - T386006
[17:37:29] <logmsgbot>	 !log jhathaway@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
[17:37:34] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[17:40:23] <wikibugs>	 (03PS1) 10Ssingh: wikimedia-dns.org: add test TYPE65 record (take two) [dns] - 10https://gerrit.wikimedia.org/r/1122635
[17:41:50] <wikibugs>	 (03PS1) 10Elukey: kserve-inference: remove the need for the kserve container's securityContext [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122636 (https://phabricator.wikimedia.org/T369493)
[17:42:38] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] wikimedia-dns.org: add test TYPE65 record (take two) [dns] - 10https://gerrit.wikimedia.org/r/1122635 (owner: 10Ssingh)
[17:42:42] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[17:43:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] kserve-inference: remove the need for the kserve container's securityContext [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122636 (https://phabricator.wikimedia.org/T369493) (owner: 10Elukey)
[17:44:39] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[17:45:27] <wikibugs>	 (03CR) 10JMeybohm: Build helm3.17 with new upstream version (031 comment) [debs/helm3] - 10https://gerrit.wikimedia.org/r/1115388 (https://phabricator.wikimedia.org/T341984) (owner: 10Jelto)
[17:47:03] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup1013.eqiad.wmnet with reason: host reimage
[17:47:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73590 and previous config saved to /var/cache/conftool/dbconfig/20250225-174737-root.json
[17:48:03] <Amir1>	 jouncebot: nowandnex
[17:48:04] <Amir1>	 jouncebot: nowandnext
[17:48:05] <jouncebot>	 For the next 0 hour(s) and 11 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1700)
[17:48:05] <jouncebot>	 In 0 hour(s) and 11 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1800)
[17:48:48] <wikibugs>	 (03PS1) 10DLynch: DiscussionTools: enable thanking comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122638 (https://phabricator.wikimedia.org/T366095)
[17:49:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-jobrunner/canary at codfw: 6.25% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-jobrunner&var-container_name=All&var-release=canary - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:49:22] <wikibugs>	 (03PS3) 10Ladsgroup: Remove special-casing of CentralAuth for labswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1118561 (https://phabricator.wikimedia.org/T161859)
[17:49:32] <wikibugs>	 (03PS4) 10Ladsgroup: Remove special-casing of CentralAuth for labswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1118561 (https://phabricator.wikimedia.org/T161859)
[17:50:04] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Remove special-casing of CentralAuth for labswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1118561 (https://phabricator.wikimedia.org/T161859) (owner: 10Ladsgroup)
[17:50:26] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1013.eqiad.wmnet with reason: host reimage
[17:50:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1118561 (https://phabricator.wikimedia.org/T161859) (owner: 10Ladsgroup)
[17:50:42] <swfrench-wmf>	 Amir1: I have a deployment in progress, but you should be good to go once it completes (prod update in flight)
[17:50:45] <wikibugs>	 (03Merged) 10jenkins-bot: Remove special-casing of CentralAuth for labswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1118561 (https://phabricator.wikimedia.org/T161859) (owner: 10Ladsgroup)
[17:50:57] <Amir1>	 no worries
[17:51:08] <swfrench-wmf>	 i.e., once you hit the locked part, your backport will stop :)
[17:51:19] <swfrench-wmf>	 (until mine completes, which should be soon)
[17:51:26] <Amir1>	 no worries
[17:53:20] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[17:54:18] <wikibugs>	 (03PS1) 10Ssingh: Revert "wikimedia-dns.org: add test TYPE65 record (take two)" [dns] - 10https://gerrit.wikimedia.org/r/1122639
[17:56:16] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Revert "wikimedia-dns.org: add test TYPE65 record (take two)" [dns] - 10https://gerrit.wikimedia.org/r/1122639 (owner: 10Ssingh)
[17:56:27] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[17:56:46] <logmsgbot>	 !log swfrench@deploy2002 Finished scap sync-world: Use php packages built against pcre2 backport - T386006 (duration: 26m 35s)
[17:56:51] <stashbot>	 T386006: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006
[17:57:10] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1118561|Remove special-casing of CentralAuth for labswiki (T161859)]]
[17:57:14] <stashbot>	 T161859: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859
[17:58:17] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "lgtm - per https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html" [puppet] - 10https://gerrit.wikimedia.org/r/1112011 (https://phabricator.wikimedia.org/T323754) (owner: 10Hashar)
[17:58:24] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[17:59:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-jobrunner/canary at codfw: 9.375% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-jobrunner&var-container_name=All&var-release=canary - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[18:00:05] <jouncebot>	 swfrench-wmf: I, the Bot under the Fountain, call upon thee, The Deployer, to do MediaWiki infrastructure (UTC late) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1800).
[18:00:06] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:1118561|Remove special-casing of CentralAuth for labswiki (T161859)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[18:00:20] <swfrench-wmf>	 o/
[18:00:27] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[18:00:50] <swfrench-wmf>	 I'm done with my deployment, and thus will not need to use the infra window
[18:02:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73591 and previous config saved to /var/cache/conftool/dbconfig/20250225-180242-root.json
[18:05:41] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Allow users to sign up on Wikitech (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077048 (https://phabricator.wikimedia.org/T377074) (owner: 10Majavah)
[18:06:26] <wikibugs>	 (03Merged) 10jenkins-bot: Allow users to sign up on Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077048 (https://phabricator.wikimedia.org/T377074) (owner: 10Majavah)
[18:07:10] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1118561|Remove special-casing of CentralAuth for labswiki (T161859)]] (duration: 09m 59s)
[18:07:14] <stashbot>	 T161859: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859
[18:08:05] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1077048|Allow users to sign up on Wikitech (T377074)]]
[18:08:08] <stashbot>	 T377074: Re-enable account creation on Wikitech - https://phabricator.wikimedia.org/T377074
[18:11:49] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[18:12:07] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[18:12:08] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1013.eqiad.wmnet with OS bookworm
[18:12:18] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580058 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1013.eqiad.wmnet with OS bookworm completed: - backup1013 (**PASS...
[18:13:48] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580063 (10Jclark-ctr)
[18:14:32] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, taavi: Backport for [[gerrit:1077048|Allow users to sign up on Wikitech (T377074)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[18:14:36] <stashbot>	 T377074: Re-enable account creation on Wikitech - https://phabricator.wikimedia.org/T377074
[18:15:33] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, taavi: Continuing with sync
[18:17:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73592 and previous config saved to /var/cache/conftool/dbconfig/20250225-181747-root.json
[18:19:42] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10580080 (10phaultfinder)
[18:19:43] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10580081 (10cmooney) Ok all devices are back online and reachable via SSH, all running SR Linux v24.7.2.  Tomorrow I'll try to f...
[18:21:24] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1014.eqiad.wmnet with OS bookworm
[18:21:29] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580096 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm executed with errors: - backup1...
[18:22:10] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1077048|Allow users to sign up on Wikitech (T377074)]] (duration: 14m 05s)
[18:22:14] <stashbot>	 T377074: Re-enable account creation on Wikitech - https://phabricator.wikimedia.org/T377074
[18:22:28] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580102 (10Jclark-ctr) @papaul i am having issues with backup1014 failing grub install on sdb do you have any recommendations
[18:22:41] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[18:22:47] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580104 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[18:23:20] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[18:29:30] <wikibugs>	 (03CR) 10CDobbins: [C:03+2] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204) (owner: 10CDobbins)
[18:30:43] <wikibugs>	 (03Merged) 10jenkins-bot: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - 10https://gerrit.wikimedia.org/r/1110843 (https://phabricator.wikimedia.org/T367204) (owner: 10CDobbins)
[18:31:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73593 and previous config saved to /var/cache/conftool/dbconfig/20250225-183134-root.json
[18:32:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73594 and previous config saved to /var/cache/conftool/dbconfig/20250225-183252-root.json
[18:36:50] <fabfur>	 !log re-enabled puppet on cp4050 (T329332)
[18:36:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:31] <wikibugs>	 (03PS1) 10Fabfur: benthos: fix header capitalization and stricter timeouts [puppet] - 10https://gerrit.wikimedia.org/r/1122644 (https://phabricator.wikimedia.org/T329332)
[18:46:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73595 and previous config saved to /var/cache/conftool/dbconfig/20250225-184640-root.json
[18:47:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73596 and previous config saved to /var/cache/conftool/dbconfig/20250225-184758-root.json
[18:48:45] <wikibugs>	 (03PS2) 10Fabfur: benthos: fix header capitalization and stricter timeouts [puppet] - 10https://gerrit.wikimedia.org/r/1122644 (https://phabricator.wikimedia.org/T329332)
[18:52:02] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] benthos: fix header capitalization and stricter timeouts [puppet] - 10https://gerrit.wikimedia.org/r/1122644 (https://phabricator.wikimedia.org/T329332) (owner: 10Fabfur)
[18:55:16] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] benthos: fix header capitalization and stricter timeouts [puppet] - 10https://gerrit.wikimedia.org/r/1122644 (https://phabricator.wikimedia.org/T329332) (owner: 10Fabfur)
[19:00:05] <jouncebot>	 dduvall and andre: Your horoscope predicts another MediaWiki train - Utc-7+Utc-0 Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T1900).
[19:01:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73597 and previous config saved to /var/cache/conftool/dbconfig/20250225-190145-root.json
[19:06:41] <wikibugs>	 (03PS1) 10Ssingh: Revert^2 "wikimedia-dns.org: add test TYPE65 record (take two)" [dns] - 10https://gerrit.wikimedia.org/r/1122645
[19:07:20] <icinga-wm>	 PROBLEM - Host cp4047 is DOWN: PING CRITICAL - Packet loss = 100%
[19:07:33] <sukhe>	 huh
[19:08:05] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=--reason,service=(cdn|ats-be)
[19:08:06] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=host down,service=(cdn|ats-be)
[19:08:09] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=--reason,service=(cdn|ats-be)
[19:08:10] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=host down,service=(cdn|ats-be)
[19:08:11] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=cp4047.ulsfo.wmnet,service=(cdn|ats-be)
[19:08:55] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Revert^2 "wikimedia-dns.org: add test TYPE65 record (take two)" [dns] - 10https://gerrit.wikimedia.org/r/1122645 (owner: 10Ssingh)
[19:09:02] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[19:09:20] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[19:10:16] <icinga-wm>	 RECOVERY - Host cp4047 is UP: PING OK - Packet loss = 0%, RTA = 71.25 ms
[19:11:00] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[19:16:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73598 and previous config saved to /var/cache/conftool/dbconfig/20250225-191650-root.json
[19:18:15] <wikibugs>	 10ops-ulsfo, 06DC-Ops: cp4047 flapped (host went down) - https://phabricator.wikimedia.org/T387238 (10ssingh) 03NEW
[19:20:17] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1014.eqiad.wmnet with OS bookworm
[19:20:25] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580336 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm executed with errors: - backup1...
[19:20:37] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[19:20:47] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580337 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[19:31:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73599 and previous config saved to /var/cache/conftool/dbconfig/20250225-193155-root.json
[19:35:27] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] wmnet: Update es7-master alias [dns] - 10https://gerrit.wikimedia.org/r/1122608 (https://phabricator.wikimedia.org/T387224) (owner: 10Gerrit maintenance bot)
[19:36:51] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 to 1.44.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122647 (https://phabricator.wikimedia.org/T382369)
[19:36:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] group0 to 1.44.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122647 (https://phabricator.wikimedia.org/T382369) (owner: 10TrainBranchBot)
[19:37:35] <wikibugs>	 (03Merged) 10jenkins-bot: group0 to 1.44.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122647 (https://phabricator.wikimedia.org/T382369) (owner: 10TrainBranchBot)
[19:38:10] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1014.eqiad.wmnet with OS bookworm
[19:38:22] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580368 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm executed with errors: - backup1...
[19:38:35] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[19:38:43] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580369 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[19:41:03] <wikibugs>	 (03PS1) 10Ssingh: wikimedia-dns.org: add test TYPE65 record (take three, in proper format) [dns] - 10https://gerrit.wikimedia.org/r/1122649
[19:42:45] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] wikimedia-dns.org: add test TYPE65 record (take three, in proper format) [dns] - 10https://gerrit.wikimedia.org/r/1122649 (owner: 10Ssingh)
[19:42:53] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[19:44:53] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[19:50:32] <logmsgbot>	 !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.18  refs T382369
[19:50:36] <stashbot>	 T382369: 1.44.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T382369
[19:52:25] <wikibugs>	 (03PS1) 10Ssingh: Revert "wikimedia-dns.org: add test TYPE65 record (take three, in proper format)" [dns] - 10https://gerrit.wikimedia.org/r/1122651
[19:54:29] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Revert "wikimedia-dns.org: add test TYPE65 record (take three, in proper format)" [dns] - 10https://gerrit.wikimedia.org/r/1122651 (owner: 10Ssingh)
[19:54:44] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[19:55:00] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[19:55:40] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[19:55:49] <wikibugs>	 (03CR) 10Eevans: [C:03+2] ml-cache: upgrade cluster to 'dev' (Cassandra 4.1.8) [puppet] - 10https://gerrit.wikimedia.org/r/1122243 (https://phabricator.wikimedia.org/T386969) (owner: 10Eevans)
[19:56:32] <wikibugs>	 (03PS1) 10Ssingh: wikimedia-dns.org: remove TYPE65 record [dns] - 10https://gerrit.wikimedia.org/r/1122653
[19:57:37] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[19:58:27] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] wikimedia-dns.org: remove TYPE65 record [dns] - 10https://gerrit.wikimedia.org/r/1122653 (owner: 10Ssingh)
[19:59:02] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[19:59:33] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[20:01:01] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[20:04:50] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[20:15:41] <wikibugs>	 (03PS1) 10Scott French: Re-enroll 5% of client sessions in PHP 8.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122655 (https://phabricator.wikimedia.org/T383845)
[20:15:41] <wikibugs>	 (03CR) 10Scott French: "Thanks for prepping the other patches!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122655 (https://phabricator.wikimedia.org/T383845) (owner: 10Scott French)
[20:17:18] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[20:19:16] <wikibugs>	 (03CR) 10Scott French: "Idb67a57b5541af9c4584d5ea6e1b9fec661ac432 proposes to start with 5%, which would then be followed soon after by this one. As noted there, " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122585 (https://phabricator.wikimedia.org/T385395) (owner: 10Effie Mouzeli)
[20:23:04] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Actually, I'm going to hold until shortly before I move the enrollment fraction forward again, given the likely presence of broken clients" [puppet] - 10https://gerrit.wikimedia.org/r/1122584 (https://phabricator.wikimedia.org/T383845) (owner: 10Effie Mouzeli)
[20:23:07] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[20:25:20] <logmsgbot>	 !log jhathaway@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
[20:25:26] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[20:30:44] <wikibugs>	 (03PS1) 10Bking: elastic: enable perf governor, remove unused host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860)
[20:31:02] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860) (owner: 10Bking)
[20:31:10] <wikibugs>	 (03CR) 10CI reject: [V:04-1] elastic: enable perf governor, remove unused host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860) (owner: 10Bking)
[20:34:44] <wikibugs>	 (03PS1) 10Ladsgroup: Remove more wikitech specific stuff [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122662
[20:41:20] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
[20:42:27] <wikibugs>	 (03PS2) 10Bking: elastic: enable perf governor, remove unused host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860)
[20:42:38] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860) (owner: 10Bking)
[20:44:41] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10580556 (10phaultfinder)
[20:48:03] <wikibugs>	 (03PS3) 10Bking: elastic: enable perf governor, remove unused host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860)
[20:48:37] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860) (owner: 10Bking)
[20:50:57] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1014.eqiad.wmnet with OS bookworm
[20:51:08] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580570 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm executed with errors: - backup1...
[20:52:09] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860) (owner: 10Bking)
[20:53:20] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:53:28] <wikibugs>	 (03PS4) 10Bking: elastic: enable perf governor, remove unused host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860)
[20:57:48] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, February 25 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122622 (https://phabricator.wikimedia.org/T386872) (owner: 10ZhaoFJx)
[20:59:45] <wikibugs>	 (03PS1) 10RLazarus: deployment_server: Pass kubeConfig in helmfile state values [puppet] - 10https://gerrit.wikimedia.org/r/1122666 (https://phabricator.wikimedia.org/T378429)
[20:59:48] <wikibugs>	 (03CR) 10Bking: [C:03+2] elastic: enable perf governor, remove unused host hieradata [puppet] - 10https://gerrit.wikimedia.org/r/1122660 (https://phabricator.wikimedia.org/T386860) (owner: 10Bking)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T2100).
[21:00:05] <jouncebot>	 Pppery and ZhaoFJx: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:09] <Pppery>	 here
[21:00:19] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[21:00:27] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580621 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[21:01:11] <ZhaoFJx>	 here
[21:02:46] <wikibugs>	 (03PS2) 10Ladsgroup: Remove more wikitech specific stuff [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122662
[21:04:51] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Add various settings for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464) (owner: 10Pppery)
[21:05:40] <wikibugs>	 (03Merged) 10jenkins-bot: Add various settings for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122279 (https://phabricator.wikimedia.org/T386464) (owner: 10Pppery)
[21:06:23] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1122279|Add various settings for new wikis (T386464 T386631)]]
[21:06:28] <stashbot>	 T386464: Post-creation work for sylwiki - https://phabricator.wikimedia.org/T386464
[21:06:29] <stashbot>	 T386631: Post-creation work for satwiktionary - https://phabricator.wikimedia.org/T386631
[21:08:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.274s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:10:30] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1122668
[21:11:13] <logmsgbot>	 !log ladsgroup@deploy2002 pppery, ladsgroup: Backport for [[gerrit:1122279|Add various settings for new wikis (T386464 T386631)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:11:19] <Pppery>	 looking
[21:11:27] <Amir1>	 Thanks
[21:13:16] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.085s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:14:41] <Pppery>	 Checked a few things, seems to work
[21:14:43] <Pppery>	 so proceed
[21:14:46] <Amir1>	 thanks
[21:14:48] <logmsgbot>	 !log ladsgroup@deploy2002 pppery, ladsgroup: Continuing with sync
[21:16:16] <wikibugs>	 (03CR) 10Simon04: "I'd like to learn more about this secret. 😊" [puppet] - 10https://gerrit.wikimedia.org/r/1080357 (https://phabricator.wikimedia.org/T318285) (owner: 10Simon04)
[21:16:38] <wikibugs>	 (03CR) 10Fabfur: workaround for T256098 (031 comment) [debs/benthos] - 10https://gerrit.wikimedia.org/r/1122557 (https://phabricator.wikimedia.org/T256098) (owner: 10Fabfur)
[21:19:59] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Good find and thanks for the cleanup! I totally overlooked the `helmfile` invocation when reviewing Icd8437d6a68d928c04abe1b8ed23bbc95a59d" [puppet] - 10https://gerrit.wikimedia.org/r/1122666 (https://phabricator.wikimedia.org/T378429) (owner: 10RLazarus)
[21:21:15] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.225s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:21:21] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122279|Add various settings for new wikis (T386464 T386631)]] (duration: 14m 58s)
[21:21:26] <stashbot>	 T386464: Post-creation work for sylwiki - https://phabricator.wikimedia.org/T386464
[21:21:27] <stashbot>	 T386631: Post-creation work for satwiktionary - https://phabricator.wikimedia.org/T386631
[21:22:31] <ZhaoFJx>	 Could a deployer take a look at patch 1122622? Thanks in advance :)
[21:23:20] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:23:47] <jinxer-wm>	 FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventgate-analytics-external.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[21:23:48] <Amir1>	 ZhaoFJx: about to do that
[21:24:04] <ZhaoFJx>	 thanks a lot
[21:24:25] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122622 (https://phabricator.wikimedia.org/T386872) (owner: 10ZhaoFJx)
[21:25:07] <wikibugs>	 (03Merged) 10jenkins-bot: cowikimedia: Change the logo v2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122622 (https://phabricator.wikimedia.org/T386872) (owner: 10ZhaoFJx)
[21:25:38] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1122622|cowikimedia: Change the logo v2 (T386872)]]
[21:25:42] <stashbot>	 T386872: Requesting logo change for co.wikimedia.org - https://phabricator.wikimedia.org/T386872
[21:25:56] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, 10decommission-hardware: decommission cloudgw100[12] - https://phabricator.wikimedia.org/T386810#10580766 (10VRiley-WMF) After trying to rerun it again, I keep getting this error {F58493864} (screenshot attached)   @cmooney would you have an idea what m...
[21:26:15] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.092s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:27:43] <wikibugs>	 (03CR) 10RLazarus: [C:03+2] deployment_server: Pass kubeConfig in helmfile state values [puppet] - 10https://gerrit.wikimedia.org/r/1122666 (https://phabricator.wikimedia.org/T378429) (owner: 10RLazarus)
[21:28:33] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, zhaofjx: Backport for [[gerrit:1122622|cowikimedia: Change the logo v2 (T386872)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:28:47] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1014.eqiad.wmnet with OS bookworm
[21:28:53] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580781 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm executed with errors: - backup1...
[21:28:56] <ZhaoFJx>	 checking
[21:29:15] <ZhaoFJx>	 Amir1all good
[21:29:20] <ZhaoFJx>	 Amir1 all good
[21:29:20] <volans>	 !log upgraded spicerack on the cumin hosts to v9.1.3
[21:29:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:21] <Amir1>	 thanks
[21:30:23] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup, zhaofjx: Continuing with sync
[21:31:15] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.133s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:36:50] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122622|cowikimedia: Change the logo v2 (T386872)]] (duration: 11m 12s)
[21:36:54] <stashbot>	 T386872: Requesting logo change for co.wikimedia.org - https://phabricator.wikimedia.org/T386872
[21:40:01] <wikibugs>	 (03PS1) 10Kimberly Sarabia: Add config for donate banner to be enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122671 (https://phabricator.wikimedia.org/T386767)
[21:41:15] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.125s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:41:31] <ZhaoFJx>	 Amir1 thank you for the deployment
[21:42:21] <wikibugs>	 (03PS2) 10Kimberly Sarabia: Add config for donate banner to be enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122671 (https://phabricator.wikimedia.org/T386767)
[21:43:20] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1065 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:46:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.093s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:46:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1115:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1115 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:49:10] <Amir1>	 :)
[21:50:29] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122662 (owner: 10Ladsgroup)
[21:51:21] <wikibugs>	 (03Merged) 10jenkins-bot: Remove more wikitech specific stuff [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122662 (owner: 10Ladsgroup)
[21:51:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1115:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1115 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:51:51] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1122662|Remove more wikitech specific stuff]]
[21:54:05] <wikibugs>	 10ops-codfw, 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.02.10 - 2025.02.28): Enable CPU performance governor on Relforge, Cloudelastic, and Elasticsearch hosts - https://phabricator.wikimedia.org/T386860#10580861 (10bking) Unfortunately, I just now remembered that the Performance governor on...
[21:56:20] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:1122662|Remove more wikitech specific stuff]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:56:22] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[21:59:57] <ZhaoFJx>	 Amir1 sorry to bother you again, but could you help purge the caches for the logo on cowikimedia?
[21:59:58] <ZhaoFJx>	 It looks like https://co.wikimedia.org/static/images/project-logos/cowikimedia.png is still displaying the old version, even though it looks fine on the testserver or when I add ?purge after the URL
[22:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250225T2200)
[22:00:22] <toyofuku>	 We will be using the deploy window today!
[22:00:25] <Amir1>	 ZhaoFJx: is that the only url
[22:00:36] <Amir1>	 toyofuku: we are almost done, give us a sec
[22:01:07] <toyofuku>	 Sounds good
[22:01:11] <ZhaoFJx>	 Amir1 not sure, I only know it's a file under /static
[22:01:26] <ZhaoFJx>	 there is a guide on https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers#Purging
[22:02:53] <Amir1>	 ZhaoFJx: done, I also did it with mobile domain, just in case
[22:02:57] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122662|Remove more wikitech specific stuff]] (duration: 11m 06s)
[22:03:09] <Amir1>	 also my wikitech stuff is now deployed
[22:04:23] <ZhaoFJx>	 Amir1 still the old image on my side somehow
[22:04:27] <A_smart_kitten>	 ZhaoFJx: i think your local machine might be caching the old version - i had to force-refresh the logo file you linked before it updated to the new version for me
[22:04:44] <logmsgbot>	 !log jhathaway@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
[22:04:47] <Amir1>	 Yeah ^
[22:04:48] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[22:04:54] <Amir1>	 ctrl + shift + r
[22:05:58] <wikibugs>	 (03CR) 10Jdlrobson: [C:04-1] Add config for donate banner to be enabled (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122671 (https://phabricator.wikimedia.org/T386767) (owner: 10Kimberly Sarabia)
[22:07:09] <ZhaoFJx>	 It doesn't work, sadly... But if the image works fine on both your ends, then it's a problem with my PC, I guess
[22:07:38] <wikibugs>	 (03CR) 10Jdrewniak: [C:03+1] Deploy Search AB test to french wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122633 (owner: 10Bernard Wang)
[22:09:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.159s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:09:34] <toyofuku>	 Amir1: am I still waiting?
[22:12:24] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[22:12:38] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580899 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[22:13:20] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1065 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[22:14:07] <toyofuku>	 Ordinarily I would wait for an explicit handoff, but since we're a bit pressed for time and I don't see an in-progress deploy, we're gonna get started
[22:14:10] <toyofuku>	 yolo
[22:14:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.167s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:15:23] <Amir1>	 toyofuku: oh I'm so sorry, after 12 hours of work, I forgot to hand over
[22:15:33] <toyofuku>	 No worries at all!!!
[22:15:40] <Amir1>	 and went for dinner
[22:15:45] <toyofuku>	 Please go rest if you can; 12 hours of work sounds like approx. 4 too many
[22:16:01] <toyofuku>	 12 too many if it were up to me 😪
[22:16:41] <Amir1>	 yeah, ttyl!
[22:16:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by toyofuku@deploy2002 using scap backport" [extensions/MobileFrontend] (wmf/1.44.0-wmf.17) - 10https://gerrit.wikimedia.org/r/1122254 (https://phabricator.wikimedia.org/T386735) (owner: 10Jdlrobson)
[22:16:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by toyofuku@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122633 (owner: 10Bernard Wang)
[22:17:40] <wikibugs>	 (03Merged) 10jenkins-bot: Deploy Search AB test to french wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1122633 (owner: 10Bernard Wang)
[22:19:18] <toyofuku>	 While we're waiting for `gate-and-submit-wmf` - I'm listening to EoO by Bad Bunny off his latest album
[22:23:24] <toyofuku>	 Now Chimbita by Feid off Inter Shibuya
[22:25:45] <toyofuku>	 Crush ft Jorja Smith by AJ Tracey
[22:26:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 992.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:27:08] <wikibugs>	 (PS1) Ryan Kemper: wdqs: Create DNS entry for one full graph host [dns] - https://gerrit.wikimedia.org/r/1122676 (https://phabricator.wikimedia.org/T384422)
[22:28:22] <wikibugs>	 (Merged) jenkins-bot: Update ext.MobileFrontend.searchOverlay.empty hook to fire after ext.MobileFrontend.searchOverlay.open [extensions/MobileFrontend] (wmf/1.44.0-wmf.17) - https://gerrit.wikimedia.org/r/1122254 (https://phabricator.wikimedia.org/T386735) (owner: Jdlrobson)
[22:28:52] <logmsgbot>	 !log toyofuku@deploy2002 Started scap sync-world: Backport for [[gerrit:1122254|Update ext.MobileFrontend.searchOverlay.empty hook to fire after ext.MobileFrontend.searchOverlay.open (T386735)]], [[gerrit:1122633|Deploy Search AB test to french wiki]]
[22:28:56] <stashbot>	 T386735: Show empty search recommendation event is missing funnel data - https://phabricator.wikimedia.org/T386735
[22:28:58] <toyofuku>	 Perfect timing
[22:29:56] <wikibugs>	 (PS2) Ryan Kemper: wdqs: add routing for legacy full graph host [puppet] - https://gerrit.wikimedia.org/r/1121726 (https://phabricator.wikimedia.org/T384422)
[22:31:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.213s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:31:50] <logmsgbot>	 !log toyofuku@deploy2002 bwang, toyofuku, jdlrobson: Backport for [[gerrit:1122254|Update ext.MobileFrontend.searchOverlay.empty hook to fire after ext.MobileFrontend.searchOverlay.open (T386735)]], [[gerrit:1122633|Deploy Search AB test to french wiki]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:32:00] <toyofuku>	 coordinating testing via slack - brb
[22:35:43] <toyofuku>	 continuing to hold
[22:36:16] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.101s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:38:36] <wikibugs>	 (PS1) Bernard Wang: Deploy Search AB test to french wiki including eventstreams [mediawiki-config] - https://gerrit.wikimedia.org/r/1122677
[22:41:09] <wikibugs>	 (PS1) Ryan Kemper: wdqs: create new ui for wdqs legacy full [deployment-charts] - https://gerrit.wikimedia.org/r/1122678 (https://phabricator.wikimedia.org/T384422)
[22:41:20] <toyofuku>	 We caught something on test servers (shoutout test servers!!)
[22:41:35] <toyofuku>	 Will likely be proceeding with the deploy, followed by another deploy to fix what we caught
[22:41:39] <inflatador>	 {◕ ◡ ◕}
[22:41:49] <toyofuku>	 Will keep the void updated as much as possible
[22:42:33] <toyofuku>	 While we wait, I'm listening to this really weird song: https://open.spotify.com/track/0MxPT9xJ89g4j0IleXXWwY
[22:42:40] <toyofuku>	 Wouldn't say I necessarily recommend it but it's cute
[22:42:57] <wikibugs>	 (PS2) Ryan Kemper: wdqs: create new ui for wdqs legacy full [deployment-charts] - https://gerrit.wikimedia.org/r/1122678 (https://phabricator.wikimedia.org/T384422)
[22:43:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.491s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:43:31] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on backup1013 - https://phabricator.wikimedia.org/T387252 (ops-monitoring-bot) NEW
[22:43:32] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1013.eqiad.wmnet with OS bookworm
[22:43:38] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10580965 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1013.eqiad.wmnet with OS bookworm
[22:44:06] <wikibugs>	 (CR) Ryan Kemper: "I *think* this is all that's required to set up a new UI, although this change feels a little too easy so there very well could be somethi" [deployment-charts] - https://gerrit.wikimedia.org/r/1122678 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[22:44:16] <wikibugs>	 (CR) Bking: [C:+1] "LGTM, probably want someone from serviceops-collab to confirm though." [deployment-charts] - https://gerrit.wikimedia.org/r/1122678 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[22:45:35] <toyofuku>	 We're proceeding
[22:45:38] <logmsgbot>	 !log toyofuku@deploy2002 bwang, toyofuku, jdlrobson: Continuing with sync
[22:46:14] <toyofuku>	 This song also by ay3demi is kind of a bop: https://open.spotify.com/track/515UNMgW9krZGvvVnQ8XuD
[22:47:39] <wikibugs>	 (CR) Jdrewniak: [C:+1] Deploy Search AB test to french wiki including eventstreams [mediawiki-config] - https://gerrit.wikimedia.org/r/1122677 (owner: Bernard Wang)
[22:50:53] <wikibugs>	 (PS1) Bernard Wang: Deploy Search AB test to everywhere but English wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122680
[22:51:33] <toyofuku>	 As I mentioned, we'll likely be doing another deploy after this one finishes
[22:52:02] <toyofuku>	 Hopefully that's okay since nothing appears to be scheduled after this, but yell at me if it's not pls
[22:52:19] <logmsgbot>	 !log toyofuku@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122254|Update ext.MobileFrontend.searchOverlay.empty hook to fire after ext.MobileFrontend.searchOverlay.open (T386735)]], [[gerrit:1122633|Deploy Search AB test to french wiki]] (duration: 23m 26s)
[22:52:23] <stashbot>	 T386735: Show empty search recommendation event is missing funnel data - https://phabricator.wikimedia.org/T386735
[22:52:52] <wikibugs>	 (PS2) Bernard Wang: Deploy Search AB test to everywhere but English wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122680 (https://phabricator.wikimedia.org/T386849)
[22:54:00] <toyofuku>	 First deploy done, second deploy starting soon
[22:54:58] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by toyofuku@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1122677 (owner: Bernard Wang)
[22:55:05] <toyofuku>	 Second deploy starting NOW
[22:55:38] <wikibugs>	 (Merged) jenkins-bot: Deploy Search AB test to french wiki including eventstreams [mediawiki-config] - https://gerrit.wikimedia.org/r/1122677 (owner: Bernard Wang)
[22:56:05] <logmsgbot>	 !log toyofuku@deploy2002 Started scap sync-world: Backport for [[gerrit:1122677|Deploy Search AB test to french wiki including eventstreams]]
[22:59:00] <logmsgbot>	 !log toyofuku@deploy2002 toyofuku, bwang: Backport for [[gerrit:1122677|Deploy Search AB test to french wiki including eventstreams]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:59:39] <wikibugs>	 (CR) Bking: [C:+1] wdqs: Create DNS entry for one full graph host [dns] - https://gerrit.wikimedia.org/r/1122676 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[22:59:47] <toyofuku>	 Once again coordinating testing via slack
[22:59:58] <wikibugs>	 (CR) Bking: [C:+1] wdqs: add routing for legacy full graph host [puppet] - https://gerrit.wikimedia.org/r/1121726 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[23:00:02] <toyofuku>	 will keep all zero of you updated
[23:00:05] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on backup1013.eqiad.wmnet with reason: host reimage
[23:01:17] <wikibugs>	 (CR) Bking: wdqs: add routing for legacy full graph host [puppet] - https://gerrit.wikimedia.org/r/1121726 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[23:03:21] <wikibugs>	 (CR) Bking: wdqs: add routing for legacy full graph host (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1121726 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[23:03:50] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1013.eqiad.wmnet with reason: host reimage
[23:04:42] <toyofuku>	 We're still in the middle of a deploy and still testing on test servers, coordinated via slack
[23:06:00] <wikibugs>	 (CR) Ryan Kemper: [C:-1] "Putting a -1 until I/we figure out the cert provisioning" [puppet] - https://gerrit.wikimedia.org/r/1121726 (https://phabricator.wikimedia.org/T384422) (owner: Ryan Kemper)
[23:08:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.256s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:08:33] <thcipriani>	 toyofuku: thanks for keeping channel up-to-date
[23:08:56] <toyofuku>	 🫡🫡
[23:10:35] <toyofuku>	 We're proceeding!
[23:10:38] <logmsgbot>	 !log toyofuku@deploy2002 toyofuku, bwang: Continuing with sync
[23:13:16] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.364s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:14:25] <wikibugs>	 ops-codfw, ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.02.10 - 2025.02.28): Enable CPU performance governor on Relforge, Cloudelastic, and Elasticsearch hosts - https://phabricator.wikimedia.org/T386860#10581045 (Jclark-ctr) those are racked in d4 ,f5 should not have any problems with pow...
[23:14:44] <wikibugs>	 (PS1) Bartosz Dziewoński: Remove unused config variable $wgJsonConfigInterwikiPrefix [mediawiki-config] - https://gerrit.wikimedia.org/r/1122683
[23:15:28] <wikibugs>	 (PS1) Bernard Wang: Deploy Search AB test to everywhere but English wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122684
[23:16:23] <wikibugs>	 (Abandoned) Bernard Wang: Deploy Search AB test to everywhere but English wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122680 (https://phabricator.wikimedia.org/T386849) (owner: Bernard Wang)
[23:16:34] <wikibugs>	 (PS2) Bernard Wang: Deploy Search AB test to everywhere but English wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1122684 (https://phabricator.wikimedia.org/T386849)
[23:16:50] <logmsgbot>	 !log toyofuku@deploy2002 Finished scap sync-world: Backport for [[gerrit:1122677|Deploy Search AB test to french wiki including eventstreams]] (duration: 20m 44s)
[23:17:17] <toyofuku>	 Apologies for running a bit over, but we should be done now!
[23:17:19] <toyofuku>	 Thanks all
[23:18:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 933.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:21:02] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1013.eqiad.wmnet with OS bookworm
[23:21:16] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10581067 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1013.eqiad.wmnet with OS bookworm completed: - backup1013 (**WARN...
[23:22:10] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10581071 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm executed with errors: - backup1...
[23:22:27] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host backup1014.eqiad.wmnet with OS bookworm
[23:22:40] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: Q3:rack/setup/install backup101[34] - https://phabricator.wikimedia.org/T384977#10581072 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host backup1014.eqiad.wmnet with OS bookworm
[23:26:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.11s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:27:18] <logmsgbot>	 !log jhathaway@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
[23:27:22] <stashbot>	 T381919: Supermicro: unable to set boot order after using Redfish to boot once - https://phabricator.wikimedia.org/T381919
[23:31:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.11s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:32:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.087s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:37:41] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on ms-be2088 is CRITICAL: CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T387257 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[23:37:45] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on ms-be2088 - https://phabricator.wikimedia.org/T387257 (ops-monitoring-bot) NEW
[23:41:31] <jinxer-wm>	 FIRING: [3x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.311s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:46:31] <jinxer-wm>	 FIRING: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.063s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:47:00] <wikibugs>	 (PS1) Cwhite: site: clean up logstash102[6789] configs [puppet] - https://gerrit.wikimedia.org/r/1122691 (https://phabricator.wikimedia.org/T383287)
[23:47:16] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.063s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:49:33] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash1029.eqiad.wmnet
[23:52:12] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash1028.eqiad.wmnet
[23:53:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1151:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1151 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:53:54] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.hosts.decommission for hosts logstash1027.eqiad.wmnet
[23:54:31] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.183s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:56:17] <logmsgbot>	 !log cwhite@cumin2002 START - Cookbook sre.dns.netbox
[23:58:45] <jinxer-wm>	 RESOLVED: [2x] MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/canary (k8s) 1.229s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:59:46] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid/main (k8s) 1.171s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded