[00:01:38] (PS1) Urbanecm: Babel: Remove config that is now in community configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/1115518 (https://phabricator.wikimedia.org/T385239)
[00:04:28] (PS1) Urbanecm: Babel: Do not use a wmg variable for BabelDefaultLevel [mediawiki-config] - https://gerrit.wikimedia.org/r/1115520 (https://phabricator.wikimedia.org/T119117)
[00:04:40] !log pt1979@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
[00:05:16] !log pt1979@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
[00:05:16] !log pt1979@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1252.eqiad.wmnet with OS bookworm
[00:05:21] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510761 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1002 for host db1252.eqiad.wmnet with OS bookworm completed: - db1252 (**PASS**) - Rem...
[00:05:47] (CR) Urbanecm: [C:-2] "let's wait a bit, in case we need to rollback the deployment" [mediawiki-config] - https://gerrit.wikimedia.org/r/1115518 (https://phabricator.wikimedia.org/T385239) (owner: Urbanecm)
[00:06:15] !log pt1979@cumin1002 START - Cookbook sre.hosts.reimage for host db1253.eqiad.wmnet with OS bookworm
[00:06:22] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510772 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1002 for host db1253.eqiad.wmnet with OS bookworm
[00:07:05] !log pt1979@cumin1002 START - Cookbook sre.hosts.provision for host db1254.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[00:09:21] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510776 (Papaul)
[00:16:40] FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1146:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1146 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:21:53] !log pt1979@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage
[00:22:58] (PS2) Dwisehaupt: Add aws validation key for lp.email ssl cert generation [dns] - https://gerrit.wikimedia.org/r/1115494 (https://phabricator.wikimedia.org/T384931)
[00:23:43] (CR) Dwisehaupt: Add aws validation key for lp.email ssl cert generation (1 comment) [dns] - https://gerrit.wikimedia.org/r/1115494 (https://phabricator.wikimedia.org/T384931) (owner: Dwisehaupt)
[00:25:22] !log pt1979@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage
[00:30:51] !log pt1979@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1254.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[00:32:53] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510851 (Papaul)
[00:33:54] !log pt1979@cumin1002 START - Cookbook sre.hosts.reimage for host db1254.eqiad.wmnet with OS bookworm
[00:34:03] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510852 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1002 for host db1254.eqiad.wmnet with OS bookworm
[00:38:08] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1115540
[00:38:08] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1115540 (owner: TrainBranchBot)
[00:42:41] !log pt1979@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
[00:42:59] !log pt1979@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
[00:43:00] !log pt1979@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS bookworm
[00:43:13] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510857 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1002 for host db1253.eqiad.wmnet with OS bookworm completed: - db1253 (**PASS**) - Rem...
[00:44:14] (PS1) Zabe: hooks: Stop using deprecated DatabaseBlock method [extensions/TorBlock] (wmf/1.44.0-wmf.14) - https://gerrit.wikimedia.org/r/1115542
[00:45:01] (CR) Zabe: [C:+2] hooks: Stop using deprecated DatabaseBlock method [extensions/TorBlock] (wmf/1.44.0-wmf.14) - https://gerrit.wikimedia.org/r/1115542 (owner: Zabe)
[00:45:08] (CR) Zabe: [C:+2] Restrict editing on ruwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1115146 (https://phabricator.wikimedia.org/T382805) (owner: Zabe)
[00:45:58] (Merged) jenkins-bot: Restrict editing on ruwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1115146 (https://phabricator.wikimedia.org/T382805) (owner: Zabe)
[00:46:40] RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1146:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1146 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:49:31] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1115540 (owner: TrainBranchBot)
[00:49:31] FIRING: [2x] CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic1071-production-search-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[00:50:33] !log pt1979@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db1254.eqiad.wmnet with reason: host reimage
[00:50:53] (Merged) jenkins-bot: hooks: Stop using deprecated DatabaseBlock method [extensions/TorBlock] (wmf/1.44.0-wmf.14) - https://gerrit.wikimedia.org/r/1115542 (owner: Zabe)
[00:51:35] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1115542|hooks: Stop using deprecated DatabaseBlock method]], [[gerrit:1115146|Restrict editing on ruwikimedia (T382805)]]
[00:51:40] T382805: WMRU wiki (ru.wikimedia.org) restricting access - https://phabricator.wikimedia.org/T382805
[00:53:28] FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:53:29] !log pt1979@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: host reimage
[00:54:21] !log zabe@deploy2002 zabe: Backport for [[gerrit:1115542|hooks: Stop using deprecated DatabaseBlock method]], [[gerrit:1115146|Restrict editing on ruwikimedia (T382805)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[00:56:19] !log zabe@deploy2002 zabe: Continuing with sync
[00:56:56] FIRING: CalicoTyphaDown: Too few (0) calico-typha replicas running - https://wikitech.wikimedia.org/wiki/Calico#Typha" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoTyphaDown
[00:56:57] FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[01:02:43] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1115542|hooks: Stop using deprecated DatabaseBlock method]], [[gerrit:1115146|Restrict editing on ruwikimedia (T382805)]] (duration: 11m 08s)
[01:02:48] T382805: WMRU wiki (ru.wikimedia.org) restricting access - https://phabricator.wikimedia.org/T382805
[01:04:38] (CR) BCornwall: [C:+1] Add aws validation key for lp.email ssl cert generation [dns] - https://gerrit.wikimedia.org/r/1115494 (https://phabricator.wikimedia.org/T384931) (owner: Dwisehaupt)
[01:08:39] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1115543
[01:09:30] (CR) Andrew Bogott: [C:+2] Revert "nova: temporary mod of live_migration rules" [puppet] - https://gerrit.wikimedia.org/r/1115507 (owner: Andrew Bogott)
[01:11:03] !log pt1979@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
[01:12:05] !log pt1979@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
[01:12:06] !log pt1979@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1254.eqiad.wmnet with OS bookworm
[01:12:13] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510891 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1002 for host db1254.eqiad.wmnet with OS bookworm completed: - db1254 (**PASS**) - Rem...
[01:13:55] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510895 (Papaul)
[01:14:32] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10510897 (Papaul) Open→Resolved a:Papaul @Marostegui this is complete
[01:19:59] ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q1:rack/setup/install franio100[1-3] - https://phabricator.wikimedia.org/T367820#10510915 (Papaul) @VRiley-WMF these servers are already racked what left to do is just connect the mgmt cable and 2 DAC cables using the 10G nics first nic to the first s...
[01:26:29] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1115543 (owner: TrainBranchBot)
[01:46:28] PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/b6389289f85d9d4e7027f3c9d1b815b3cb57b6919da6a314d706f2fd76e433ea/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[01:50:56] (PS1) Zabe: manage-dblist: strtolower() visibility input [mediawiki-config] - https://gerrit.wikimedia.org/r/1115558
[01:52:20] (CR) Zabe: [C:+2] manage-dblist: strtolower() visibility input [mediawiki-config] - https://gerrit.wikimedia.org/r/1115558 (owner: Zabe)
[01:53:39] (Merged) jenkins-bot: manage-dblist: strtolower() visibility input [mediawiki-config] - https://gerrit.wikimedia.org/r/1115558 (owner: Zabe)
[02:06:28] RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[02:15:26] (PS1) Zabe: Prepare kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115567 (https://phabricator.wikimedia.org/T385181)
[02:22:06] (CR) Zabe: [C:+2] Prepare kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115567 (https://phabricator.wikimedia.org/T385181) (owner: Zabe)
[02:22:52] (Merged) jenkins-bot: Prepare kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115567 (https://phabricator.wikimedia.org/T385181) (owner: Zabe)
[02:23:15] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1115567|Prepare kncwiki (T385181)]]
[02:23:21] T385181: Create Wikipedia Central Kanuri - https://phabricator.wikimedia.org/T385181
[02:25:56] !log zabe@deploy2002 zabe: Backport for [[gerrit:1115567|Prepare kncwiki (T385181)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[02:26:21] !log zabe@deploy2002 zabe: Continuing with sync
[02:32:25] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1115567|Prepare kncwiki (T385181)]] (duration: 09m 10s)
[02:32:30] T385181: Create Wikipedia Central Kanuri - https://phabricator.wikimedia.org/T385181
[02:34:08] (PS1) Zabe: Activate kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115577 (https://phabricator.wikimedia.org/T385181)
[02:35:35] (CR) Zabe: [C:+2] Activate kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115577 (https://phabricator.wikimedia.org/T385181) (owner: Zabe)
[02:36:42] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:37:38] (Merged) jenkins-bot: Activate kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115577 (https://phabricator.wikimedia.org/T385181) (owner: Zabe)
[02:37:58] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1115577|Activate kncwiki (T385181)]]
[02:38:03] T385181: Create Wikipedia Central Kanuri - https://phabricator.wikimedia.org/T385181
[02:40:43] !log zabe@deploy2002 zabe: Backport for [[gerrit:1115577|Activate kncwiki (T385181)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[02:42:33] !log zabe@deploy2002 zabe: Continuing with sync
[02:49:00] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1115577|Activate kncwiki (T385181)]] (duration: 11m 02s)
[02:49:06] T385181: Create Wikipedia Central Kanuri - https://phabricator.wikimedia.org/T385181
[02:49:41] (PS1) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1115582
[02:49:41] (CR) Zabe: [C:+2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1115582 (owner: Zabe)
[02:50:40] (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1115582 (owner: Zabe)
[03:01:42] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:04:50] (PS1) Zabe: configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185)
[03:05:35] (CR) CI reject: [V:-1] configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185) (owner: Zabe)
[03:06:45] (PS2) Zabe: configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185)
[03:07:26] (CR) CI reject: [V:-1] configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185) (owner: Zabe)
[03:08:01] (PS3) Zabe: configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185)
[03:10:15] (CR) Zabe: [C:+2] configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185) (owner: Zabe)
[03:10:58] (Merged) jenkins-bot: configure kncwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115590 (https://phabricator.wikimedia.org/T385185) (owner: Zabe)
[03:11:34] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1115582|Update interwiki cache]], [[gerrit:1115590|configure kncwiki (T385185)]]
[03:11:40] T385185: Post-creation work for kncwiki - https://phabricator.wikimedia.org/T385185
[03:14:20] !log zabe@deploy2002 zabe: Backport for [[gerrit:1115582|Update interwiki cache]], [[gerrit:1115590|configure kncwiki (T385185)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[03:15:40] !log zabe@deploy2002 zabe: Continuing with sync
[03:22:03] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1115582|Update interwiki cache]], [[gerrit:1115590|configure kncwiki (T385185)]] (duration: 10m 28s)
[03:22:08] T385185: Post-creation work for kncwiki - https://phabricator.wikimedia.org/T385185
[03:34:26] ops-eqiad, DC-Ops: ManagementSSHDown - https://phabricator.wikimedia.org/T385251 (phaultfinder) NEW
[03:35:20] PROBLEM - Disk space on grafana2001 is CRITICAL: DISK CRITICAL - free space: / 239MiB (1% inode=36%): /tmp 239MiB (1% inode=36%): /var/tmp 239MiB (1% inode=36%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops
[03:55:20] RECOVERY - Disk space on grafana2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops
[04:50:54] PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS1299/IPv6: Connect - Arelion, AS1299/IPv4: Connect - Arelion https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[04:51:50] PROBLEM - Host mr1-ulsfo.oob is DOWN: PING CRITICAL - Packet loss = 100%
[04:52:07] FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic1096-production-search-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[04:53:28] FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:56:52] RECOVERY - Host mr1-ulsfo.oob is UP: PING OK - Packet loss = 0%, RTA = 77.25 ms
[04:56:56] FIRING: CalicoTyphaDown: Too few (0) calico-typha replicas running - https://wikitech.wikimedia.org/wiki/Calico#Typha" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoTyphaDown
[04:56:58] FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[05:47:17] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[06:08:02] (PS1) Marostegui: check_private_data_report: Add Federico's email. [puppet] - https://gerrit.wikimedia.org/r/1115654
[06:10:01] !log Deploy schema change on s5 codfw with replication enabled dbmaint T385183
[06:10:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:06] T385183: Run the T383561 ALTERs on wikifunctionswiki - https://phabricator.wikimedia.org/T385183
[06:10:49] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
[06:30:23] ops-eqiad, SRE, Data-Persistence, DC-Ops: Q2:rack/setup/install db125[0-4] - https://phabricator.wikimedia.org/T380083#10511085 (Marostegui) Thank you!
[06:31:18] (PS1) KartikMistry: Update Recommendation API to 2025-01-17-163931-production [deployment-charts] - https://gerrit.wikimedia.org/r/1115668 (https://phabricator.wikimedia.org/T381888)
[06:38:34] (PS1) Marostegui: Revert "db2172: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1115675
[06:39:18] (CR) Marostegui: [C:+2] Revert "db2172: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1115675 (owner: Marostegui)
[07:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250131T0700)
[07:24:11] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1115444 (https://phabricator.wikimedia.org/T350794) (owner: AOkoth)
[07:30:53] !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS bookworm
[07:30:58] SRE, Ganeti, Infrastructure-Foundations: Update remaining Ganeti servers in codfw to Bookworm - https://phabricator.wikimedia.org/T382508#10511134 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2029.codfw.wmnet with OS bookworm
[07:35:20] PROBLEM - Disk space on grafana2001 is CRITICAL: DISK CRITICAL - free space: / 237MiB (1% inode=36%): /tmp 237MiB (1% inode=36%): /var/tmp 237MiB (1% inode=36%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops
[07:46:29] (CR) AOkoth: [C:+2] os_updates: open up rsync to staging kube pods [puppet] - https://gerrit.wikimedia.org/r/1115444 (https://phabricator.wikimedia.org/T350794) (owner: AOkoth)
[07:52:32] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
[07:55:20] RECOVERY - Disk space on grafana2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops
[07:56:20] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250131T0800)
[08:01:29] (CR) Jelto: [C:+1] kubernetes-publish-sa-cert: Don't fail when no certs in etcd [puppet] - https://gerrit.wikimedia.org/r/1115433 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[08:11:28] SRE, Infrastructure-Foundations, netops, observability, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10511163 (fgiunchedi) Yes we would be able to [[ https://prometheus.io/docs/alerting/latest/configuration/#inhibi...
[08:14:43] (CR) Brouberol: [V:+1 C:+2] Update the airflow deb version to downgrade confluent-kafka [puppet] - https://gerrit.wikimedia.org/r/1115515 (owner: Brouberol)
[08:16:15] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS bookworm
[08:16:25] SRE, Ganeti, Infrastructure-Foundations: Update remaining Ganeti servers in codfw to Bookworm - https://phabricator.wikimedia.org/T382508#10511166 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2029.codfw.wmnet with OS bookworm completed: - ganeti202...
[08:16:42] (CR) Elukey: "I rechecked the knative PR and it relates to https://github.com/knative/serving/pull/13398, that is probably worth to backport as well.." [docker-images/production-images] - https://gerrit.wikimedia.org/r/1115394 (https://phabricator.wikimedia.org/T369493) (owner: Elukey)
[08:22:47] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
[08:31:12] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
[08:33:42] (PS3) Elukey: knative: backport patches from 1.8.x release [docker-images/production-images] - https://gerrit.wikimedia.org/r/1115394 (https://phabricator.wikimedia.org/T369493)
[08:34:04] (CR) KartikMistry: [C:+2] Update Recommendation API to 2025-01-17-163931-production [deployment-charts] - https://gerrit.wikimedia.org/r/1115668 (https://phabricator.wikimedia.org/T381888) (owner: KartikMistry)
[08:34:05] (CR) Elukey: "Added the extra patch, rebuilt the images, they look ok!" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1115394 (https://phabricator.wikimedia.org/T369493) (owner: Elukey)
[08:34:59] !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to cluster codfw and group A
[08:35:13] (Merged) jenkins-bot: Update Recommendation API to 2025-01-17-163931-production [deployment-charts] - https://gerrit.wikimedia.org/r/1115668 (https://phabricator.wikimedia.org/T381888) (owner: KartikMistry)
[08:36:07] (CR) Filippo Giunchedi: [C:+2] Update webrequest_sampled_live turnilo config [puppet] - https://gerrit.wikimedia.org/r/1115457 (https://phabricator.wikimedia.org/T383900) (owner: Joal)
[08:36:08] (CR) Fabfur: benthos: send data to eventgate too (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1115113 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[08:37:00] !log kartik@deploy2002 helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[08:37:14] (PS5) Fabfur: benthos: send data to eventgate too [puppet] - https://gerrit.wikimedia.org/r/1115113 (https://phabricator.wikimedia.org/T383392)
[08:37:47] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2029.codfw.wmnet to cluster codfw and group A
[08:40:01] (CR) JMeybohm: [C:+1] "Yeah, I think it's the correct way to ensure the mw instance gets to see the 'right' host." [puppet] - https://gerrit.wikimedia.org/r/1115416 (https://phabricator.wikimedia.org/T384329) (owner: Brouberol)
[08:41:53] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1115113 (https://phabricator.wikimedia.org/T383392) (owner: Fabfur)
[08:42:59] (CR) Brouberol: [C:+2] "Thanks to the both of you!" [puppet] - https://gerrit.wikimedia.org/r/1115416 (https://phabricator.wikimedia.org/T384329) (owner: Brouberol)
[08:43:01] SRE, Data-Engineering, Observability-Logging, Patch-For-Review: Add x-analytics nocookie=1 and x-tls-sess to webrequest-sampled-live stream - https://phabricator.wikimedia.org/T383900#10511213 (fgiunchedi) Open→Resolved a:fgiunchedi This is done! Fields show up in the kafka topic...
[08:43:46] (CR) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (owner: JMeybohm)
[08:50:58] SRE, Ganeti, Infrastructure-Foundations: Update remaining Ganeti servers in codfw to Bookworm - https://phabricator.wikimedia.org/T382508#10511228 (MoritzMuehlenhoff)
[08:51:43] SRE, Ganeti, Infrastructure-Foundations: Update remaining Ganeti servers in codfw to Bookworm - https://phabricator.wikimedia.org/T382508#10511229 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff All Ganeti nodes in codfw have been upgraded to Bookworm (and also migrated to nftab...
[08:52:07] FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic1096-production-search-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[08:52:48] !log rebalance codfw/A following OS updates T382508
[08:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:53] T382508: Update remaining Ganeti servers in codfw to Bookworm - https://phabricator.wikimedia.org/T382508
[08:53:28] FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[08:56:56] FIRING: CalicoTyphaDown: Too few (0) calico-typha replicas running - https://wikitech.wikimedia.org/wiki/Calico#Typha" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoTyphaDown
[08:56:58] FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[09:05:30] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[09:05:35] (PS1) Jelto: trafficserver: switch query-main and query service to wikikube [puppet] - https://gerrit.wikimedia.org/r/1115766 (https://phabricator.wikimedia.org/T350793)
[09:05:53] (PS1) JMeybohm: Add interative.ask_yesno [software/pywmflib] - https://gerrit.wikimedia.org/r/1115767
[09:06:10] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[09:10:55] (CR) CI reject: [V:-1] Add interative.ask_yesno [software/pywmflib] - https://gerrit.wikimedia.org/r/1115767 (owner: JMeybohm)
[09:13:54] PROBLEM - BFD status on cr2-magru is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:14:08] PROBLEM - BFD status on cr2-eqdfw is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:14:54] RECOVERY - BFD status on cr2-magru is OK: UP: 3 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:14:54] (PS1) Jcrespo: dbbackups: Decommission db2139 [puppet] - https://gerrit.wikimedia.org/r/1115774 (https://phabricator.wikimedia.org/T383971)
[09:15:08] RECOVERY - BFD status on cr2-eqdfw is OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:16:39] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[09:17:16] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[09:20:29] (PS3) JMeybohm: k8s.wipe-cluster: Improvements for k8s 1.31 upgrade [cookbooks] - https://gerrit.wikimedia.org/r/1115380 (https://phabricator.wikimedia.org/T341984)
[09:21:49] (PS1) Brouberol: envoy: adjust the noc config to use both SNI and a custom HTTP host header [puppet] - https://gerrit.wikimedia.org/r/1115776 (https://phabricator.wikimedia.org/T384329)
[09:24:35] (PS2) Kevin Bazira: ml-services: update article-country staging config [deployment-charts] - https://gerrit.wikimedia.org/r/1115359 (https://phabricator.wikimedia.org/T382295)
[09:24:41] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10511260 (phaultfinder)
[09:26:21] (CR) CI reject: [V:-1] k8s.wipe-cluster: Improvements for k8s 1.31 upgrade [cookbooks] - https://gerrit.wikimedia.org/r/1115380 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[09:27:49] (PS1) Brouberol: airflow-analytics: replace the mw-misc by the noc listener [deployment-charts] - https://gerrit.wikimedia.org/r/1115777 (https://phabricator.wikimedia.org/T384329)
[09:28:55] (CR) Urbanecm: [C:+1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/1113997 (https://phabricator.wikimedia.org/T325990) (owner: Sergio Gimeno)
[09:29:50] (PS8) JMeybohm: k8s::client: Allow for install of all kubectl versions [puppet] - https://gerrit.wikimedia.org/r/1114970 (https://phabricator.wikimedia.org/T341984)
[09:29:50] (PS1) JMeybohm: kubernetes::master: Update kubectl alternative entry [puppet] - https://gerrit.wikimedia.org/r/1115778 (https://phabricator.wikimedia.org/T341984)
[09:32:56] (PS1) Slyngshede: Switch to CAS 7.1 - Test [dns] - https://gerrit.wikimedia.org/r/1115780
[09:35:49] (CR) Slyngshede: [C:+2] Switch to CAS 7.1 - Test [dns] - https://gerrit.wikimedia.org/r/1115780 (owner: Slyngshede)
[09:35:57] !log slyngshede@dns1004 START - running authdns-update
[09:37:49] !log slyngshede@dns1004 END - running authdns-update
[09:38:00] (PS2) Jcrespo: dbbackups: Decommission db2139 [puppet] - https://gerrit.wikimedia.org/r/1115774 (https://phabricator.wikimedia.org/T383971)
[09:38:01] (PS1) Jelto: wikidata-query-gui: bump query-gui image version [deployment-charts] - https://gerrit.wikimedia.org/r/1115786 (https://phabricator.wikimedia.org/T350793)
[09:39:03] (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1115778 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[09:39:37] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10511284 (phaultfinder)
[09:40:13] (PS2) Brouberol: airflow-analytics: replace the mw-misc by the noc listener [deployment-charts] - https://gerrit.wikimedia.org/r/1115777 (https://phabricator.wikimedia.org/T384329)
[09:40:17] !log jynus@cumin1002 START - Cookbook sre.hosts.decommission for hosts db2139.codfw.wmnet
[09:40:26] (CR) Jcrespo: [C:+2] dbbackups: Decommission db2139 [puppet] - https://gerrit.wikimedia.org/r/1115774 (https://phabricator.wikimedia.org/T383971) (owner: Jcrespo)
[09:40:57] (PS2) JMeybohm: Add interative.ask_yesno [software/pywmflib] - https://gerrit.wikimedia.org/r/1115767
[09:41:45] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[09:41:52] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1163 (T384592)', diff saved to https://phabricator.wikimedia.org/P72897 and previous config saved to /var/cache/conftool/dbconfig/20250131-094152-marostegui.json
[09:41:57] T384592: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592
[09:43:03] (CR) Jelto: [C:+1] "lgtm!" [puppet] - https://gerrit.wikimedia.org/r/1115778 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[09:47:54] (CR) Gkyziridis: [C:+1] "LGTM!" [deployment-charts] - https://gerrit.wikimedia.org/r/1115359 (https://phabricator.wikimedia.org/T382295) (owner: Kevin Bazira)
[09:49:14] (PS1) Urbanecm: [Growth] Increase minimum tasks per topic to 2000 for eswiki, frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115791 (https://phabricator.wikimedia.org/T378527)
[09:49:18] (CR) JMeybohm: [C:+2] kubernetes::master: Update kubectl alternative entry [puppet] - https://gerrit.wikimedia.org/r/1115778 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[09:49:22] (CR) JMeybohm: [C:+2] k8s::client: Allow for install of all kubectl versions [puppet] - https://gerrit.wikimedia.org/r/1114970 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[09:51:00] (PS2) Urbanecm: [Growth] Increase minimum tasks per topic to 2000 for eswiki, frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1115791 (https://phabricator.wikimedia.org/T378527)
[09:51:58] (CR) Urbanecm: [C:+1] "this has happened. let's deploy this on Monday."
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1113984 (https://phabricator.wikimedia.org/T383714) (owner: 10Cyndywikime) [09:53:32] (03CR) 10JMeybohm: [C:03+2] kubernetes-publish-sa-cert: Don't fail when no certs in etcd [puppet] - 10https://gerrit.wikimedia.org/r/1115433 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm) [09:54:08] (03CR) 10Urbanecm: [C:03+1] beta: increase growth tasks lookahead size (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1113997 (https://phabricator.wikimedia.org/T325990) (owner: 10Sergio Gimeno) [09:54:10] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 03 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1113997 (https://phabricator.wikimedia.org/T325990) (owner: 10Sergio Gimeno) [09:54:24] (03CR) 10Urbanecm: beta: increase growth tasks lookahead size [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1113997 (https://phabricator.wikimedia.org/T325990) (owner: 10Sergio Gimeno) [10:03:56] !log jynus@cumin1002 START - Cookbook sre.dns.netbox [10:06:50] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1246', diff saved to https://phabricator.wikimedia.org/P72898 and previous config saved to /var/cache/conftool/dbconfig/20250131-100650-marostegui.json [10:06:59] !log root@cumin1002 START - Cookbook sre.mysql.upgrade for db1246.eqiad.wmnet [10:08:06] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2238', diff saved to https://phabricator.wikimedia.org/P72899 and previous config saved to /var/cache/conftool/dbconfig/20250131-100806-marostegui.json [10:08:16] !log root@cumin1002 START - Cookbook sre.mysql.upgrade for db2238.codfw.wmnet [10:09:25] (03PS1) 10JMeybohm: Revert "kubernetes::master: Update kubectl alternative entry" [puppet] - 10https://gerrit.wikimedia.org/r/1115798 [10:09:27] (03PS1) 10JMeybohm: Revert "k8s::client: Allow for install of all 
kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 [10:09:27] (03CR) 10AOkoth: [C:03+1] wikidata-query-gui: bump query-gui image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1115786 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto) [10:09:56] (03Abandoned) 10JMeybohm: Revert "kubernetes::master: Update kubectl alternative entry" [puppet] - 10https://gerrit.wikimedia.org/r/1115798 (owner: 10JMeybohm) [10:11:18] (03CR) 10JMeybohm: "Reverting for now as it makes puppet fail on deploy hosts." [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (owner: 10JMeybohm) [10:11:30] 06SRE, 10MW-on-K8s, 06serviceops: mwscript-k8s does not support short maintenance script names - https://phabricator.wikimedia.org/T385238#10511338 (10Lucas_Werkmeister_WMDE) →14Duplicate dup:03T380533 [10:11:53] (03CR) 10CI reject: [V:04-1] Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (owner: 10JMeybohm) [10:12:22] (03PS2) 10JMeybohm: Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 [10:12:40] !log root@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1246.eqiad.wmnet [10:12:43] (03PS1) 10Slyngshede: C:apereo_cas Specify encryption algorithms for CAS 7.1 [puppet] - 10https://gerrit.wikimedia.org/r/1115801 (https://phabricator.wikimedia.org/T372892) [10:13:12] (03PS3) 10JMeybohm: Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) [10:13:23] !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1246.eqiad.wmnet with reason: Index rebuild [10:13:45] !log root@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2238.codfw.wmnet [10:14:07] !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 
db2238.codfw.wmnet with reason: Index rebuild [10:15:28] (03CR) 10CI reject: [V:04-1] Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm) [10:15:56] !log jynus@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2139.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002" [10:16:19] !log jynus@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2139.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002" [10:16:19] !log jynus@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [10:16:20] !log jynus@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2139.codfw.wmnet [10:17:13] (03PS4) 10JMeybohm: Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) [10:19:38] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4913/co" [puppet] - 10https://gerrit.wikimedia.org/r/1115801 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede) [10:19:53] 10ops-codfw, 10Data-Persistence-Backup, 10database-backups, 06DBA, 06DC-Ops: decommission db2139 - https://phabricator.wikimedia.org/T383971#10511368 (10jcrespo) a:05jcrespo→03None [10:20:06] 10ops-codfw, 10Data-Persistence-Backup, 10database-backups, 06DBA, 06DC-Ops: decommission db2139 - https://phabricator.wikimedia.org/T383971#10511372 (10jcrespo) [10:20:19] 10ops-codfw, 10Data-Persistence-Backup, 10database-backups, 06DBA, 06DC-Ops: decommission db2139 - https://phabricator.wikimedia.org/T383971#10511373 
(10jcrespo) 05In progress→03Open [10:20:27] 10ops-codfw, 10Data-Persistence-Backup, 10database-backups, 06DBA, 06DC-Ops: decommission db2139 - https://phabricator.wikimedia.org/T383971#10511375 (10jcrespo) p:05Medium→03Triage [10:20:48] (03CR) 10Clément Goubert: [C:03+1] envoy: adjust the noc config to use both SNI and a custom HTTP host header [puppet] - 10https://gerrit.wikimedia.org/r/1115776 (https://phabricator.wikimedia.org/T384329) (owner: 10Brouberol) [10:21:39] (03CR) 10Brouberol: [C:03+2] envoy: adjust the noc config to use both SNI and a custom HTTP host header [puppet] - 10https://gerrit.wikimedia.org/r/1115776 (https://phabricator.wikimedia.org/T384329) (owner: 10Brouberol) [10:24:36] 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10511384 (10phaultfinder) [10:25:36] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4914/co" [puppet] - 10https://gerrit.wikimedia.org/r/1115801 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede) [10:25:37] (03CR) 10JMeybohm: [C:03+2] Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm) [10:26:06] (03PS5) 10JMeybohm: Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) [10:28:23] (03CR) 10Federico Ceratto: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1115654 (owner: 10Marostegui) [10:28:24] (03CR) 10Jelto: [C:03+1] Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm) [10:30:44] (03CR) 10JMeybohm: [C:03+2] Revert "k8s::client: Allow for install of all kubectl versions" [puppet] - 
10https://gerrit.wikimedia.org/r/1115799 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm) [10:31:37] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4915/co" [puppet] - 10https://gerrit.wikimedia.org/r/1115801 (https://phabricator.wikimedia.org/T372892) (owner: 10Slyngshede) [10:32:07] (03PS1) 10JMeybohm: Revert^2 "k8s::client: Allow for install of all kubectl versions" [puppet] - 10https://gerrit.wikimedia.org/r/1115803 [10:33:37] 06SRE, 10Deployments, 10MW-on-K8s, 06serviceops: mwscript-k8s gives Invalid value: "