[00:02:17] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:02:55] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:06:49] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:09:31] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:09:41] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:14:05] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:18:45] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:19:39] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2016 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[00:20:51] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:21:51] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[00:23:07] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:24:09] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:29:35] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:31:51] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:36:43] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:40:01] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: (6) Blazegraph instance wdqs1004:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[00:50:17] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:50:27] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:52:03] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:52:35] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:54:11] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:56:21] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:00:51] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:08:07] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:10:13] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:12:29] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:14:57] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:16:45] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:19:19] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:23:51] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:28:07] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:34:53] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:37:11] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:40:22] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:09] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:43:57] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:44:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[01:44:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:44:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[01:44:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:45:22] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:48:49] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:48:57] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:50:41] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:50:47] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:51:05] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:57:13] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:57:29] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:57:37] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:59:47] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:01:29] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:01:49] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:03:42] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:06:13] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:08:15] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:10:45] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[02:12:29] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:14:55] <wikibugs>	 (PS1) Ladsgroup: Add add_lu_attachment_method_T305300.py [software/schema-changes] - https://gerrit.wikimedia.org/r/776462 (https://phabricator.wikimedia.org/T305300)
[02:16:31] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:16:51] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:17:08] <wikibugs>	 (PS2) Ladsgroup: Add add_lu_attachment_method_T305300.py [software/schema-changes] - https://gerrit.wikimedia.org/r/776462 (https://phabricator.wikimedia.org/T305300)
[02:18:39] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:23:13] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:29:51] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:31:33] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:31:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[02:31:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:31:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
[02:32:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:33:47] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:34:09] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:36:31] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:40:37] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:42:53] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:43:21] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:45:39] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:50:07] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:52:25] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:52:29] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:59:19] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:03:49] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:08:27] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:14:55] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:15:11] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:17:33] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:19:53] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:20:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[03:20:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:20:07] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[03:20:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:21:39] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:26:39] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:27:07] <icinga-wm>	 PROBLEM - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[03:29:17] <icinga-wm>	 RECOVERY - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 196 bytes in 1.018 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[03:33:33] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 691 bytes in 3.735 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:35:19] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:37:59] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:42:31] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:44:25] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:47:05] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:49:19] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:51:11] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:53:29] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[03:56:11] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:02:37] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:04:55] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:05:15] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:05:21] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:06:06] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (Aitolkyn) - https://phabricator.wikimedia.org/T305299 (Aitolkyn)
[04:09:35] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:11:43] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:11:51] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:12:01] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:12:09] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:14:19] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:15:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[04:15:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:15:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[04:15:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:15:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24023 and previous config saved to /var/cache/conftool/dbconfig/20220404-041545-ladsgroup.json
[04:15:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:15:47] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[04:18:41] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:20:49] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:20:55] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:27:39] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:28:05] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:29:05] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:32:13] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:34:31] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:40:16] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: (6) Blazegraph instance wdqs1004:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[04:43:35] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:45:41] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:45:59] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:47:03] <wikibugs>	 (PS2) KartikMistry: Enable Content and Section Translation for Persian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/775829 (https://phabricator.wikimedia.org/T296475)
[04:50:07] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:54:33] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:54:43] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:56:09] <wikibugs>	 (CR) Marostegui: [C: +1] Add add_lu_attachment_method_T305300.py [software/schema-changes] - https://gerrit.wikimedia.org/r/776462 (https://phabricator.wikimedia.org/T305300) (owner: Ladsgroup)
[04:56:34] <wikibugs>	 (CR) Marostegui: [C: +1] dbtools: Drop unused control-mariadb files [software] - https://gerrit.wikimedia.org/r/776235 (owner: Ladsgroup)
[04:57:05] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:59:01] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:03:37] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:04:22] <wikibugs>	 (CR) Ladsgroup: [C: +2] dbtools: Drop unused control-mariadb files [software] - https://gerrit.wikimedia.org/r/776235 (owner: Ladsgroup)
[05:04:31] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add add_lu_attachment_method_T305300.py [software/schema-changes] - https://gerrit.wikimedia.org/r/776462 (https://phabricator.wikimedia.org/T305300) (owner: Ladsgroup)
[05:05:24] <wikibugs>	 (Merged) jenkins-bot: dbtools: Drop unused control-mariadb files [software] - https://gerrit.wikimedia.org/r/776235 (owner: Ladsgroup)
[05:05:26] <wikibugs>	 (Merged) jenkins-bot: Add add_lu_attachment_method_T305300.py [software/schema-changes] - https://gerrit.wikimedia.org/r/776462 (https://phabricator.wikimedia.org/T305300) (owner: Ladsgroup)
[05:08:29] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:10:27] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:11:36] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db1130.eqiad.wmnet with OS bullseye
[05:11:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:12:41] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:15:19] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:17:37] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:19:39] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:19:55] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:20:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24024 and previous config saved to /var/cache/conftool/dbconfig/20220404-052026-ladsgroup.json
[05:20:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:20:30] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[05:20:48] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
[05:20:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:07] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
[05:25:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:10] <wikibugs>	 (PS1) Marostegui: Revert "db1130: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/776335
[05:35:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24025 and previous config saved to /var/cache/conftool/dbconfig/20220404-053531-ladsgroup.json
[05:35:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:37] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:35:47] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:36:59] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:39:12] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1130.eqiad.wmnet with OS bullseye
[05:39:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:15] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:45:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:50:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24026 and previous config saved to /var/cache/conftool/dbconfig/20220404-055037-ladsgroup.json
[05:50:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:53] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:52:23] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:52:41] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:54:39] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:54:47] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:58:07] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:58:15] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:59:25] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:02:17] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:03:33] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:05:13] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:05:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24027 and previous config saved to /var/cache/conftool/dbconfig/20220404-060542-ladsgroup.json
[06:05:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[06:05:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[06:05:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:47] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[06:05:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:07:07] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:08:15] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:08:50] <wikibugs>	 (PS7) Urbanecm: GrowthExperiments: Add mailing list question for eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[06:09:27] <wikibugs>	 SRE, Observability-Metrics: Include apache_exporter in puppet module httpd (was: apache) - https://phabricator.wikimedia.org/T187434 (fgiunchedi)
[06:09:33] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:09:56] <wikibugs>	 SRE, serviceops, Developer Productivity, Performance-Team (Radar), Release-Engineering-Team (Radar): Debug hosts sometimes Fatal error:  "The UdpSocket to 127.0.0.1:10514 has been closed" - https://phabricator.wikimedia.org/T214734 (fgiunchedi)
[06:10:45] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:13:23] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:13:36] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:15:17] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:15:29] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:17:41] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:18:35] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:21:41] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:23:07] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:23:49] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:24:06] <wikibugs>	 (CR) Urbanecm: [C: +1] "overall LGTM, but I don't understand why the config is not removed" [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[06:26:55] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:27:36] <RhinosF1>	 jouncebot: next
[06:27:36] <jouncebot>	 In 0 hour(s) and 32 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T0700)
[06:27:43] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:28:22] <wikibugs>	 (CR) Urbanecm: "actually, a q inline" [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[06:29:51] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:34:51] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:37:03] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:38:33] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:44:45] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:45:33] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:47:47] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:50:29] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:54:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[06:54:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[06:54:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:25] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:57:50] <wikibugs>	 SRE, Developer Productivity, Performance-Team (Radar), Release-Engineering-Team (Radar): Debug hosts sometimes Fatal error:  "The UdpSocket to 127.0.0.1:10514 has been closed" - https://phabricator.wikimedia.org/T214734 (Joe) Removing serviceops as this is not actually a production issue and is l...
[07:00:05] <jouncebot>	 Amir1, awight, Urbanecm, and taavi: (Dis)respected human, time to deploy UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T0700). Please do the needful.
[07:00:05] <jouncebot>	 kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:08] <taavi>	 o/
[07:00:28] <taavi>	 kart_: do you want to self-service?
[07:00:49] <kart_>	 taavi: Sure.
[07:00:53] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:01:03] <taavi>	 cool, just let me know when you're done
[07:01:09] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:01:39] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:01:43] <RhinosF1>	 taavi: i'm an idiot and put patches on last week
[07:01:45] <RhinosF1>	 i'm here
[07:01:57] <taavi>	 ok, can you move them to the correct window?
[07:02:16] <RhinosF1>	 taavi: done
[07:02:16] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Enable profile::auto_restarts::service for prometheus-atlas-exporter [puppet] - https://gerrit.wikimedia.org/r/775861 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[07:02:18] <wikibugs>	 (CR) KartikMistry: [C: +2] Enable Content and Section Translation for Persian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/775829 (https://phabricator.wikimedia.org/T296475) (owner: KartikMistry)
[07:02:37] <taavi>	 thanks! I'll deploy those after kart_ is done
[07:02:45] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:03:12] <wikibugs>	 (Merged) jenkins-bot: Enable Content and Section Translation for Persian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/775829 (https://phabricator.wikimedia.org/T296475) (owner: KartikMistry)
[07:03:16] <wikibugs>	 (CR) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[07:03:26] <wikibugs>	 (CR) Elukey: [C: +1] Use *.k8s-staging.discovery.wmnet for staging certificates [deployment-charts] - https://gerrit.wikimedia.org/r/776162 (https://phabricator.wikimedia.org/T300740) (owner: JMeybohm)
[07:03:39] <RhinosF1>	 ty!
[07:04:49] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:05:03] <wikibugs>	 (CR) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[07:06:50] <wikibugs>	 (PS8) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240)
[07:06:52] <wikibugs>	 (PS2) Kosta Harlan: GrowthExperiments: Start mailing list campaign on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/775951 (https://phabricator.wikimedia.org/T303240)
[07:06:59] <wikibugs>	 (CR) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[07:07:06] <wikibugs>	 (CR) Elukey: [C: +1] Use *.k8s-staging.discovery.wmnet for staging Ingress [deployment-charts] - https://gerrit.wikimedia.org/r/776163 (https://phabricator.wikimedia.org/T300740) (owner: JMeybohm)
[07:07:11] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 47967 bytes in 6.459 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:07:16] <wikibugs>	 (PS1) Muehlenhoff: Remove LDAP access for jrobell [puppet] - https://gerrit.wikimedia.org/r/776678
[07:07:51] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 14 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:08:25] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove LDAP access for jrobell [puppet] - https://gerrit.wikimedia.org/r/776678 (owner: Muehlenhoff)
[07:08:33] <logmsgbot>	 !log kartik@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:775829|Enable Content and Section Translation for Persian Wikipedia (T296475)]] (duration: 00m 51s)
[07:08:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:36] <stashbot>	 T296475: Enable Content and Section Translation for Persian Wikipedia - https://phabricator.wikimedia.org/T296475
[07:08:52] <kart_>	 taavi: I'm done.
[07:09:09] <taavi>	 thanks!
[07:10:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:10:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:36] <wikibugs>	 (PS3) Majavah: Revert "fawiki: Set new year celebration" [mediawiki-config] - https://gerrit.wikimedia.org/r/776329 (https://phabricator.wikimedia.org/T304314) (owner: RhinosF1)
[07:10:37] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:10:41] <wikibugs>	 (CR) Majavah: [C: +2] Revert "fawiki: Set new year celebration" [mediawiki-config] - https://gerrit.wikimedia.org/r/776329 (https://phabricator.wikimedia.org/T304314) (owner: RhinosF1)
[07:10:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:10:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:10:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:25] <wikibugs>	 (Merged) jenkins-bot: Revert "fawiki: Set new year celebration" [mediawiki-config] - https://gerrit.wikimedia.org/r/776329 (https://phabricator.wikimedia.org/T304314) (owner: RhinosF1)
[07:11:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:11:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:04] <taavi>	 RhinosF1: can you test the first one on mwdebug1001 please?
[07:12:24] <RhinosF1>	 taavi: lgtm
[07:12:34] <RhinosF1>	 throttle is noop so can't be tested
[07:12:52] <RhinosF1>	 fawiki old vector has gone
[07:13:02] <RhinosF1>	 back to old
[07:13:05] <taavi>	 ok, syncing
[07:13:32] <wikibugs>	 (PS3) Majavah: Revert "fawiki: Set celebration logo for new vector" [mediawiki-config] - https://gerrit.wikimedia.org/r/776330 (https://phabricator.wikimedia.org/T304314) (owner: RhinosF1)
[07:13:51] <logmsgbot>	 !log taavi@deploy1002 Synchronized wmf-config/logos.php: Config: [[gerrit:776329|Revert "fawiki: Set new year celebration" (T304314)]] (duration: 00m 51s)
[07:13:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:53] <stashbot>	 T304314: Requesting temporary logo change for fa.wikipedia.org - https://phabricator.wikimedia.org/T304314
[07:14:37] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:14:47] <logmsgbot>	 !log taavi@deploy1002 Synchronized logos/config.yaml: Config: [[gerrit:776329|Revert "fawiki: Set new year celebration" (T304314)]] (duration: 00m 50s)
[07:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:20] <wikibugs>	 (CR) Majavah: [C: +2] Revert "fawiki: Set celebration logo for new vector" [mediawiki-config] - https://gerrit.wikimedia.org/r/776330 (https://phabricator.wikimedia.org/T304314) (owner: RhinosF1)
[07:15:41] <logmsgbot>	 !log taavi@deploy1002 Synchronized static/images/project-logos: Config: [[gerrit:776329|Revert "fawiki: Set new year celebration" (T304314)]] (duration: 00m 50s)
[07:15:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:04] <wikibugs>	 (Merged) jenkins-bot: Revert "fawiki: Set celebration logo for new vector" [mediawiki-config] - https://gerrit.wikimedia.org/r/776330 (https://phabricator.wikimedia.org/T304314) (owner: RhinosF1)
[07:16:32] <wikibugs>	 (PS9) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240)
[07:16:34] <wikibugs>	 (PS3) Kosta Harlan: GrowthExperiments: Start mailing list campaign on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/775951 (https://phabricator.wikimedia.org/T303240)
[07:16:35] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:16:48] <taavi>	 RhinosF1: second one available for testing too
[07:16:53] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:17:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:17:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:05] <RhinosF1>	 taavi: lgtm
[07:18:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:18:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:04] <logmsgbot>	 !log taavi@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:776330|Revert "fawiki: Set celebration logo for new vector" (T304314)]] (duration: 00m 50s)
[07:18:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:33] <wikibugs>	 (PS1) JMeybohm: Include latest ingress helper update into miscweb [deployment-charts] - https://gerrit.wikimedia.org/r/776720
[07:18:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:18:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:14] <logmsgbot>	 !log taavi@deploy1002 Synchronized static/images/mobile/copyright/: Config: [[gerrit:776330|Revert "fawiki: Set celebration logo for new vector" (T304314)]] (duration: 00m 49s)
[07:19:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:16] <wikibugs>	 (PS3) Majavah: throttle: removed expired rule [mediawiki-config] - https://gerrit.wikimedia.org/r/776332 (https://phabricator.wikimedia.org/T304836) (owner: RhinosF1)
[07:19:16] <stashbot>	 T304314: Requesting temporary logo change for fa.wikipedia.org - https://phabricator.wikimedia.org/T304314
[07:19:31] <wikibugs>	 (CR) Majavah: [C: +2] throttle: removed expired rule [mediawiki-config] - https://gerrit.wikimedia.org/r/776332 (https://phabricator.wikimedia.org/T304836) (owner: RhinosF1)
[07:19:37] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 10 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:19:52] <wikibugs>	 (CR) JMeybohm: [C: +2] Use *.k8s-staging.discovery.wmnet for staging certificates [deployment-charts] - https://gerrit.wikimedia.org/r/776162 (https://phabricator.wikimedia.org/T300740) (owner: JMeybohm)
[07:20:03] <wikibugs>	 (CR) JMeybohm: [C: +2] Use *.k8s-staging.discovery.wmnet for staging Ingress [deployment-charts] - https://gerrit.wikimedia.org/r/776163 (https://phabricator.wikimedia.org/T300740) (owner: JMeybohm)
[07:20:12] <wikibugs>	 (Merged) jenkins-bot: throttle: removed expired rule [mediawiki-config] - https://gerrit.wikimedia.org/r/776332 (https://phabricator.wikimedia.org/T304836) (owner: RhinosF1)
[07:21:03] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:21:29] <logmsgbot>	 !log taavi@deploy1002 Synchronized wmf-config/throttle.php: Config: [[gerrit:776332|throttle: removed expired rule (T304836)]] (duration: 00m 49s)
[07:21:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:31] <stashbot>	 T304836: IP throttle lift request for Czech Wikigap 2022 in Brno - https://phabricator.wikimedia.org/T304836
[07:21:46] <taavi>	 ok, that should be all unless someone has a last-minute patch
[07:22:05] <RhinosF1>	 thanks taavi 
[07:22:45] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:23:40] <taavi>	 !log UTC morning deployments done
[07:23:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:23:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:23:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:16] <wikibugs>	 (Merged) jenkins-bot: Use *.k8s-staging.discovery.wmnet for staging certificates [deployment-charts] - https://gerrit.wikimedia.org/r/776162 (https://phabricator.wikimedia.org/T300740) (owner: JMeybohm)
[07:24:18] <wikibugs>	 (Merged) jenkins-bot: Use *.k8s-staging.discovery.wmnet for staging Ingress [deployment-charts] - https://gerrit.wikimedia.org/r/776163 (https://phabricator.wikimedia.org/T300740) (owner: JMeybohm)
[07:26:17] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:28:32] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:28:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:34] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:14] <wikibugs>	 SRE, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (JMeybohm)
[07:30:19] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:30:47] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:32:01] <wikibugs>	 (CR) JMeybohm: [C: +2] Include latest ingress helper update into miscweb [deployment-charts] - https://gerrit.wikimedia.org/r/776720 (owner: JMeybohm)
[07:32:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:32:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:55] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:34:49] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 15 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:35:17] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:35:57] <wikibugs>	 (Merged) jenkins-bot: Include latest ingress helper update into miscweb [deployment-charts] - https://gerrit.wikimedia.org/r/776720 (owner: JMeybohm)
[07:38:21] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[07:38:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:03] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:39:04] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[07:39:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:11] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[07:39:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:41] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[07:39:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:47] <icinga-wm>	 PROBLEM - BFD status on cr2-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:40:41] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 5/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:41:17] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:41:33] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:41:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[07:41:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
[07:41:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
[07:41:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
[07:42:01] <icinga-wm>	 RECOVERY - BFD status on cr2-eqsin is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:42:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:46] <logmsgbot>	 !log jayme@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[07:42:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:02] <logmsgbot>	 !log jayme@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[07:43:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:15] <logmsgbot>	 !log jayme@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[07:43:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:26] <logmsgbot>	 !log jayme@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[07:43:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:46] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] START helmfile.d/services/miscweb: apply
[07:43:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:21] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[07:44:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:05] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:46:18] <wikibugs>	 (PS2) Hashar: ci: docker system prune on ci::master [puppet] - https://gerrit.wikimedia.org/r/773784
[07:46:22] <wikibugs>	 (CR) Hashar: "Typo fixed!" [puppet] - https://gerrit.wikimedia.org/r/773784 (owner: Hashar)
[07:47:13] <wikibugs>	 (CR) Hashar: "That follows "docker: move pruning to new profile docker::prune" https://gerrit.wikimedia.org/r/c/operations/puppet/+/773641/" [puppet] - https://gerrit.wikimedia.org/r/773784 (owner: Hashar)
[07:49:01] <wikibugs>	 (PS3) Giuseppe Lavagetto: cache::base: add check to netpmapper modification [puppet] - https://gerrit.wikimedia.org/r/773451 (https://phabricator.wikimedia.org/T302471)
[07:49:08] <wikibugs>	 (PS1) Volans: interactive: catch Ctrl+c / Ctrl+d on ask_input() [software/pywmflib] - https://gerrit.wikimedia.org/r/776852
[07:49:10] <wikibugs>	 (PS1) Volans: prometheus: add support for other instances [software/pywmflib] - https://gerrit.wikimedia.org/r/776853
[07:49:12] <wikibugs>	 (PS1) Volans: prometheus: add support for Thanos [software/pywmflib] - https://gerrit.wikimedia.org/r/776854
[07:49:38] <wikibugs>	 (CR) Giuseppe Lavagetto: cache::base: add check to netpmapper modification (1 comment) [puppet] - https://gerrit.wikimedia.org/r/773451 (https://phabricator.wikimedia.org/T302471) (owner: Giuseppe Lavagetto)
[07:49:43] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:54:23] <jayme>	 !log imported scap 4.6.0 to stretch-/buster-/bullseye-wikimedia - T305250
[07:54:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:25] <stashbot>	 T305250: Deploy Scap version 4.6.0 - https://phabricator.wikimedia.org/T305250
[07:57:05] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:57:17] <icinga-wm>	 PROBLEM - BFD status on cr4-ulsfo is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:59:31] <icinga-wm>	 RECOVERY - BFD status on cr4-ulsfo is OK: OK: UP: 16 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:00:59] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:01:33] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:01:46] <logmsgbot>	 !log jayme@deploy1002 Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
[08:01:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:00] <logmsgbot>	 !log jayme@deploy1002 Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
[08:02:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:26] <wikibugs>	 (PS1) Giuseppe Lavagetto: Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855
[08:04:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855 (owner: Giuseppe Lavagetto)
[08:05:25] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/4 UP : 5 v2 P2P interfaces vs. 4 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:12:51] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/2 UP : 3 v2 P2P interfaces vs. 2 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:14:29] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:15:18] <XioNoX>	 I downtimed the singtel alerts for 2 more days
[08:15:36] <XioNoX>	 they say the circuit is fixed, icinga disagree
[08:16:12] <wikibugs>	 (PS1) Elukey: Add a namespace selector to helmfile_istio-proxy's config [deployment-charts] - https://gerrit.wikimedia.org/r/776856 (https://phabricator.wikimedia.org/T297612)
[08:18:15] <wikibugs>	 (PS1) DCausse: wdqs: tune jvmquake settings (take 2) [puppet] - https://gerrit.wikimedia.org/r/776857 (https://phabricator.wikimedia.org/T293862)
[08:18:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] wdqs: tune jvmquake settings (take 2) [puppet] - https://gerrit.wikimedia.org/r/776857 (https://phabricator.wikimedia.org/T293862) (owner: DCausse)
[08:19:06] <mmandere>	 !log depool cp5003 for reimage - T290005
[08:19:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:10] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[08:19:31] <urbanecm>	 jouncebot: nowandnext
[08:19:32] <jouncebot>	 No deployments scheduled for the next 4 hour(s) and 40 minute(s)
[08:19:32] <jouncebot>	 In 4 hour(s) and 40 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T1300)
[08:19:35] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[08:19:44] <wikibugs>	 (PS1) Urbanecm: Revert "cswiki: Add celebration logo for 500k" [mediawiki-config] - https://gerrit.wikimedia.org/r/776858
[08:19:54] <wikibugs>	 (CR) Urbanecm: [C: +2] Revert "cswiki: Add celebration logo for 500k" [mediawiki-config] - https://gerrit.wikimedia.org/r/776858 (owner: Urbanecm)
[08:20:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add a namespace selector to helmfile_istio-proxy's config [deployment-charts] - https://gerrit.wikimedia.org/r/776856 (https://phabricator.wikimedia.org/T297612) (owner: Elukey)
[08:20:28] <wikibugs>	 (PS2) DCausse: wdqs: tune jvmquake settings (take 2) [puppet] - https://gerrit.wikimedia.org/r/776857 (https://phabricator.wikimedia.org/T293862)
[08:20:35] <wikibugs>	 (Merged) jenkins-bot: Revert "cswiki: Add celebration logo for 500k" [mediawiki-config] - https://gerrit.wikimedia.org/r/776858 (owner: Urbanecm)
[08:21:05] <wikibugs>	 (PS2) MMandere: site: Reimage cp5003 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/775327 (https://phabricator.wikimedia.org/T290005)
[08:23:23] <wikibugs>	 (CR) Ayounsi: [C: +1] "Can't review the unit test but lgtm otherwise." [software/pywmflib] - https://gerrit.wikimedia.org/r/776852 (owner: Volans)
[08:23:32] <wikibugs>	 (PS2) Elukey: Add a namespace selector to helmfile_istio-proxy's config [deployment-charts] - https://gerrit.wikimedia.org/r/776856 (https://phabricator.wikimedia.org/T297612)
[08:23:55] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (1/3) (duration: 00m 51s)
[08:23:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:59] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp5003 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/775327 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[08:24:17] <wikibugs>	 (CR) Vgutierrez: [C: +1] cache::base: add check to netpmapper modification [puppet] - https://gerrit.wikimedia.org/r/773451 (https://phabricator.wikimedia.org/T302471) (owner: Giuseppe Lavagetto)
[08:24:32] <wikibugs>	 (CR) Vgutierrez: [C: +1] "typos aside, LGTM" [puppet] - https://gerrit.wikimedia.org/r/773454 (owner: Giuseppe Lavagetto)
[08:24:45] <wikibugs>	 (CR) Vgutierrez: [C: +1] varnish::frontend: remove normalization for parameter [puppet] - https://gerrit.wikimedia.org/r/773455 (owner: Giuseppe Lavagetto)
[08:24:46] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized static/images/project-logos/: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (2/3) (duration: 00m 50s)
[08:24:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:36] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized logos/config.yaml: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (3/3) (duration: 00m 50s)
[08:25:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:50] * urbanecm done
[08:27:45] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[08:27:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:09] <wikibugs>	 (CR) Elukey: [C: +2] Add a namespace selector to helmfile_istio-proxy's config [deployment-charts] - https://gerrit.wikimedia.org/r/776856 (https://phabricator.wikimedia.org/T297612) (owner: Elukey)
[08:28:30] <wikibugs>	 (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[08:28:42] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp5003.eqsin.wmnet with OS buster
[08:28:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[08:30:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[08:30:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:30:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:27] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:30:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24029 and previous config saved to /var/cache/conftool/dbconfig/20220404-083031-ladsgroup.json
[08:30:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:34] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[08:31:18] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp5003.eqsin.wmnet with OS buster
[08:31:30] <wikibugs>	 (PS2) MMandere: site: Reimage cp6008 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/775328 (https://phabricator.wikimedia.org/T290005)
[08:31:36] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[08:31:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:38] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[08:31:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[08:31:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[08:31:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:45] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
[08:31:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:49] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
[08:31:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:09] <wikibugs>	 (PS1) Muehlenhoff: Add library hint for zlib [puppet] - https://gerrit.wikimedia.org/r/776860
[08:34:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:34:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:28] <wikibugs>	 (PS2) Jakob: Use wgRestAPIAdditionalRouteFiles for WB REST API [mediawiki-config] - https://gerrit.wikimedia.org/r/774901
[08:35:30] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[08:35:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:04] <wikibugs>	 (CR) Ayounsi: [C: +1] prometheus: add support for other instances (1 comment) [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[08:36:18] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add library hint for zlib [puppet] - https://gerrit.wikimedia.org/r/776860 (owner: Muehlenhoff)
[08:37:29] <mmandere>	 !log depool cp6008 for reimage - T290005
[08:37:29] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[08:37:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:31] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[08:37:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:34] <moritzm>	 !log installing flac security updates
[08:37:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:39] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[08:39:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:41] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp6008 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/775328 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[08:41:49] <wikibugs>	 (CR) Volans: "replied to comment" [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[08:42:05] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] sre: add ProbeDown paging alert for enabled services [alerts] - https://gerrit.wikimedia.org/r/773747 (https://phabricator.wikimedia.org/T291946) (owner: Filippo Giunchedi)
[08:42:10] <wikibugs>	 (PS3) Filippo Giunchedi: sre: add ProbeDown paging alert for enabled services [alerts] - https://gerrit.wikimedia.org/r/773747 (https://phabricator.wikimedia.org/T291946)
[08:42:14] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
[08:42:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:21] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp6008.drmrs.wmnet with OS buster
[08:43:14] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[08:43:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:37] <wikibugs>	 (CR) Volans: [C: +2] interactive: catch Ctrl+c / Ctrl+d on ask_input() [software/pywmflib] - https://gerrit.wikimedia.org/r/776852 (owner: Volans)
[08:45:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P24030 and previous config saved to /var/cache/conftool/dbconfig/20220404-084523-root.json
[08:45:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:31] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1130: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/776335 (owner: Marostegui)
[08:47:05] <wikibugs>	 (Merged) jenkins-bot: interactive: catch Ctrl+c / Ctrl+d on ask_input() [software/pywmflib] - https://gerrit.wikimedia.org/r/776852 (owner: Volans)
[08:55:41] <wikibugs>	 (CR) David Caro: [C: +2] wmcs-backups: exclude integration-castor04, that vm has no disk image [puppet] - https://gerrit.wikimedia.org/r/774854 (https://phabricator.wikimedia.org/T304916) (owner: David Caro)
[08:55:45] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
[08:55:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:06] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[08:56:22] <moritzm>	 !log installing glibc updates from buster 10.12 point release
[08:56:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:19] <wikibugs>	 (CR) Jcrespo: [C: +2] dbbackups: Start backing up orchestrator & rename section db_inventory [puppet] - https://gerrit.wikimedia.org/r/776169 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[08:58:23] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM, nice!" [software/pywmflib] - https://gerrit.wikimedia.org/r/776854 (owner: Volans)
[08:59:16] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
[08:59:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:51] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
[08:59:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:35] <wikibugs>	 (CR) Jcrespo: "I will create a guide at https://wikitech.wikimedia.org/wiki/SRE/Data_Persistence/Backups/User_guides to document the procedure, as this i" [puppet] - https://gerrit.wikimedia.org/r/776169 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[09:01:37] <wikibugs>	 (PS2) Volans: prometheus: add support for other instances [software/pywmflib] - https://gerrit.wikimedia.org/r/776853
[09:01:39] <wikibugs>	 (PS2) Volans: prometheus: add support for Thanos [software/pywmflib] - https://gerrit.wikimedia.org/r/776854
[09:01:56] <wikibugs>	 (CR) Volans: "addressed comment" [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[09:03:20] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
[09:03:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:56] <elukey>	 7
[09:11:17] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] prometheus: add support for other instances [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[09:11:28] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] prometheus: add support for Thanos [software/pywmflib] - https://gerrit.wikimedia.org/r/776854 (owner: Volans)
[09:11:46] <wikibugs>	 (CR) Volans: [C: +2] prometheus: add support for other instances [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[09:12:04] <moritzm>	 !log installing openssl updates from Buster 10.12 point release
[09:12:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:22] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job cache_haproxy_tls in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:15:26] <wikibugs>	 (Merged) jenkins-bot: prometheus: add support for other instances [software/pywmflib] - https://gerrit.wikimedia.org/r/776853 (owner: Volans)
[09:16:00] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[09:16:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:54] <icinga-wm>	 PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-uk.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:17:41] <wikibugs>	 (CR) Volans: [C: +2] prometheus: add support for Thanos [software/pywmflib] - https://gerrit.wikimedia.org/r/776854 (owner: Volans)
[09:17:50] <godog>	 jelto: re: jobunavailable above, I see gitlab and gitlab-runner failing, known/expected ?
[09:20:18] <wikibugs>	 (Merged) jenkins-bot: prometheus: add support for Thanos [software/pywmflib] - https://gerrit.wikimedia.org/r/776854 (owner: Volans)
[09:20:22] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job cache_haproxy_tls in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:23:08] <wikibugs>	 (PS1) Ayounsi: Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863
[09:23:19] <jelto>	 godog: yes "expected". It's only about the two GitLab Runners which are not used publicly. Fix is in review https://gerrit.wikimedia.org/r/c/operations/puppet/+/775821
[09:24:02] <wikibugs>	 (CR) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[09:24:17] <wikibugs>	 (CR) jerkins-bot: [V: -1] Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863 (owner: Ayounsi)
[09:24:27] <wikibugs>	 (PS2) Ayounsi: Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863
[09:25:33] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5003.eqsin.wmnet with OS buster
[09:25:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:41] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp5003.eqsin.wmnet with OS buster com...
[09:26:27] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[09:26:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:47] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
[09:26:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:56] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp6008.drmrs.wmnet with OS buster com...
[09:26:57] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
[09:26:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:27] <wikibugs>	 (PS3) Ayounsi: Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863
[09:28:10] <wikibugs>	 (PS4) Ayounsi: Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863
[09:29:06] <godog>	 jelto: ack, thanks
[09:29:06] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
[09:29:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:56] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863 (owner: Ayounsi)
[09:30:11] <wikibugs>	 (CR) Ayounsi: [C: +2] Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863 (owner: Ayounsi)
[09:30:49] <wikibugs>	 (Merged) jenkins-bot: Network report: warning only for "no-mon" interfaces [software/netbox-extras] - https://gerrit.wikimedia.org/r/776863 (owner: Ayounsi)
[09:31:43] <moritzm>	 !log rolling restart of FPM/Apache on mw canaries to pick up updated zlib/glibc/openssl/libxml
[09:31:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:27] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Peter) Hi @vgutierrez the performance team continuously runs synthetic tests where we test the performance of a couple of Wikipedia p...
[09:35:00] <wikibugs>	 (PS1) MMandere: site: Reimage cp3054 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776865 (https://phabricator.wikimedia.org/T290005)
[09:35:02] <wikibugs>	 (PS1) MMandere: site: Reimage cp4028 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776866 (https://phabricator.wikimedia.org/T290005)
[09:35:04] <wikibugs>	 (PS1) MMandere: site: Reimage cp3055 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776867 (https://phabricator.wikimedia.org/T290005)
[09:35:06] <wikibugs>	 (PS1) MMandere: site: Reimage cp4022 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776868 (https://phabricator.wikimedia.org/T290005)
[09:35:08] <wikibugs>	 (PS1) MMandere: site: Reimage cp5008 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776869 (https://phabricator.wikimedia.org/T290005)
[09:35:10] <wikibugs>	 (PS1) MMandere: site: Reimage cp6015 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776870 (https://phabricator.wikimedia.org/T290005)
[09:35:12] <wikibugs>	 (PS1) MMandere: site: Reimage cp5002 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776871 (https://phabricator.wikimedia.org/T290005)
[09:35:15] <wikibugs>	 (PS1) MMandere: site: Reimage cp6007 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776872 (https://phabricator.wikimedia.org/T290005)
[09:40:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24031 and previous config saved to /var/cache/conftool/dbconfig/20220404-094053-ladsgroup.json
[09:40:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:57] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez) >>! In T290005#7828034, @Peter wrote: > Hi @vgutierrez the performance team continuously runs synthetic tests where we te...
[09:40:57] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[09:43:21] <wikibugs>	 SRE, Analytics-Radar, observability: Set up cross DC topic mirroring for Kafka logging clusters - https://phabricator.wikimedia.org/T276972 (fgiunchedi) >>! In T276972#7824672, @Ottomata wrote: > In https://phabricator.wikimedia.org/T304373#7823916 @fgiunchedi wrote >> to clarify my position on T2769...
[09:44:51] <mmandere>	 !log pool cp5003 with HAProxy as TLS termination layer - T290005
[09:44:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:54] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[09:45:06] <wikibugs>	 (CR) Silvan Heintze: [C: +1] Use wgRestAPIAdditionalRouteFiles for WB REST API [mediawiki-config] - https://gerrit.wikimedia.org/r/774901 (owner: Jakob)
[09:47:12] <moritzm>	 !log installing zlib security updates
[09:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:24] <wikibugs>	 (CR) Ollie Shotton: [C: +1] Use wgRestAPIAdditionalRouteFiles for WB REST API [mediawiki-config] - https://gerrit.wikimedia.org/r/774901 (owner: Jakob)
[09:48:28] <wikibugs>	 (PS2) Giuseppe Lavagetto: Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855
[09:48:32] <wikibugs>	 (PS1) Giuseppe Lavagetto: Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875
[09:48:56] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
[09:48:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:12] <wikibugs>	 (PS1) Elukey: Change the Calico's pod IP subnet for ml-serve-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/776876 (https://phabricator.wikimedia.org/T304673)
[09:49:14] <wikibugs>	 (PS1) Elukey: Change the Calico's pod IP subnet for ml-serve-codfw [deployment-charts] - https://gerrit.wikimedia.org/r/776877 (https://phabricator.wikimedia.org/T304673)
[09:49:37] <wikibugs>	 (CR) Btullis: [C: +2] Add helm charts and a helmfile configuration for datahub [deployment-charts] - https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[09:50:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875 (owner: Giuseppe Lavagetto)
[09:50:44] <wikibugs>	 (CR) Btullis: [C: +2] Add helm charts and a helmfile configuration for datahub (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[09:50:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855 (owner: Giuseppe Lavagetto)
[09:50:52] <wikibugs>	 (CR) Btullis: [V: +2 C: +2] Add helm charts and a helmfile configuration for datahub [deployment-charts] - https://gerrit.wikimedia.org/r/764375 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[09:51:20] <mmandere>	 !log pool cp6008 with HAProxy as TLS termination layer - T290005
[09:51:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:24] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[09:52:09] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-druid1005.eqiad.wmnet
[09:52:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:44] <icinga-wm>	 PROBLEM - Host gitlab.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[09:54:00] <wikibugs>	 (PS1) Alexandros Kosiaris: Split watchrat URLs by need of proxy usage [puppet] - https://gerrit.wikimedia.org/r/776878 (https://phabricator.wikimedia.org/T303803)
[09:54:06] <icinga-wm>	 RECOVERY - Host gitlab.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[09:54:57] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[09:54:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:22] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:55:46] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
[09:55:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24032 and previous config saved to /var/cache/conftool/dbconfig/20220404-095558-ladsgroup.json
[09:56:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:01] <wikibugs>	 (PS1) Elukey: role::ml_k8s::master: change the svc eqiad IP subnet [puppet] - https://gerrit.wikimedia.org/r/776879 (https://phabricator.wikimedia.org/T304673)
[09:57:03] <wikibugs>	 (PS1) Elukey: role::ml_k8s::master: change the codfw svc IP range [puppet] - https://gerrit.wikimedia.org/r/776880 (https://phabricator.wikimedia.org/T304673)
[09:58:21] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1005.eqiad.wmnet
[09:58:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:22] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:00:56] <jelto>	 ^gitlab alerts expected due to maintenance
[10:02:15] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v1.2.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/776883
[10:02:26] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp3054 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776865 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:03:19] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp4028 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776866 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:03:50] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp3055 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776867 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:04:27] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp4022 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776868 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:05:45] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp5008 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776869 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:06:10] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp6015 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776870 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:06:39] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp5002 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776871 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:07:08] <wikibugs>	 (CR) Vgutierrez: [C: +1] site: Reimage cp6007 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776872 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:07:21] <wikibugs>	 (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v1.2.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/776883 (owner: Volans)
[10:08:44] <moritzm>	 !log installing icu bugfix updates from buster 10.12 point release
[10:08:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:55] <wikibugs>	 (CR) Klausman: [C: +1] Change the Calico's pod IP subnet for ml-serve-codfw [deployment-charts] - https://gerrit.wikimedia.org/r/776877 (https://phabricator.wikimedia.org/T304673) (owner: Elukey)
[10:09:57] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-druid1004.eqiad.wmnet
[10:09:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:22] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v1.2.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/776883 (owner: Volans)
[10:10:30] <wikibugs>	 (PS1) Ayounsi: PuppetDB report: more explicit error messages [software/netbox-extras] - https://gerrit.wikimedia.org/r/776888
[10:10:45] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:10:45] <wikibugs>	 (CR) Klausman: [C: +1] Change the Calico's pod IP subnet for ml-serve-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/776876 (https://phabricator.wikimedia.org/T304673) (owner: Elukey)
[10:11:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24033 and previous config saved to /var/cache/conftool/dbconfig/20220404-101104-ladsgroup.json
[10:11:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:08] <wikibugs>	 (CR) Klausman: [C: +1] role::ml_k8s::master: change the svc eqiad IP subnet [puppet] - https://gerrit.wikimedia.org/r/776879 (https://phabricator.wikimedia.org/T304673) (owner: Elukey)
[10:11:27] <wikibugs>	 (CR) Klausman: [C: +1] role::ml_k8s::master: change the codfw svc IP range [puppet] - https://gerrit.wikimedia.org/r/776880 (https://phabricator.wikimedia.org/T304673) (owner: Elukey)
[10:12:06] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/netbox-extras] - https://gerrit.wikimedia.org/r/776888 (owner: Ayounsi)
[10:12:34] <wikibugs>	 (CR) Ayounsi: [C: +2] PuppetDB report: more explicit error messages [software/netbox-extras] - https://gerrit.wikimedia.org/r/776888 (owner: Ayounsi)
[10:12:38] <icinga-wm>	 PROBLEM - Host gitlab.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[10:13:04] <icinga-wm>	 PROBLEM - Host gitlab1001 is DOWN: PING CRITICAL - Packet loss = 100%
[10:13:46] <wikibugs>	 (Merged) jenkins-bot: PuppetDB report: more explicit error messages [software/netbox-extras] - https://gerrit.wikimedia.org/r/776888 (owner: Ayounsi)
[10:14:41] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1004.eqiad.wmnet
[10:14:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:47] <wikibugs>	 (PS1) Vgutierrez: traffic: Add HAProxyEdgeTrafficDrop [alerts] - https://gerrit.wikimedia.org/r/776890 (https://phabricator.wikimedia.org/T290005)
[10:14:50] <icinga-wm>	 RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:15:16] <icinga-wm>	 RECOVERY - Host gitlab1001 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms
[10:15:22] <jinxer-wm>	 (JobUnavailable) firing: (9) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:17:25] <wikibugs>	 (PS1) Ayounsi: Fix typo [software/netbox-extras] - https://gerrit.wikimedia.org/r/776891
[10:19:13] <wikibugs>	 (CR) Kosta Harlan: GrowthExperiments: Add mailing list question for eswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/773204 (https://phabricator.wikimedia.org/T303240) (owner: Kosta Harlan)
[10:19:39] <wikibugs>	 (PS1) Marostegui: switchover-tmpl.sh: Add prerequisites link and calendar invite [software] - https://gerrit.wikimedia.org/r/776892 (https://phabricator.wikimedia.org/T303605)
[10:20:36] <icinga-wm>	 PROBLEM - SSH on gitlab1001 is CRITICAL: connect to address 208.80.154.6 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:21:11] <wikibugs>	 (PS1) JMeybohm: Move datahub secrets into the right subchart YAML structure [labs/private] - https://gerrit.wikimedia.org/r/776893
[10:21:18] <wikibugs>	 (PS1) Volans: Upstream release v1.2.0 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/776895
[10:21:34] <wikibugs>	 (PS1) Btullis: Remove the egress rules from datahub-fronted to mysql [deployment-charts] - https://gerrit.wikimedia.org/r/776896 (https://phabricator.wikimedia.org/T301454)
[10:21:45] <wikibugs>	 (CR) JMeybohm: [V: +2 C: +2] Move datahub secrets into the right subchart YAML structure [labs/private] - https://gerrit.wikimedia.org/r/776893 (owner: JMeybohm)
[10:23:54] <wikibugs>	 (CR) MMandere: [C: +1] "LGTM" [alerts] - https://gerrit.wikimedia.org/r/776890 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[10:24:16] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[10:25:23] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v1.2.0 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/776895 (owner: Volans)
[10:26:04] <moritzm>	 !log installing libxml2 security updates
[10:26:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24034 and previous config saved to /var/cache/conftool/dbconfig/20220404-102609-ladsgroup.json
[10:26:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[10:26:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[10:26:12] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[10:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24035 and previous config saved to /var/cache/conftool/dbconfig/20220404-102616-ladsgroup.json
[10:26:17] <wikibugs>	 (PS2) Btullis: Remove the references to mysql_password from datahub-frontend [deployment-charts] - https://gerrit.wikimedia.org/r/776896 (https://phabricator.wikimedia.org/T301454)
[10:26:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:28] <wikibugs>	 (PS3) Btullis: Remove the MySQL specific details from datahub-frontend [deployment-charts] - https://gerrit.wikimedia.org/r/776896 (https://phabricator.wikimedia.org/T301454)
[10:28:11] <wikibugs>	 (Merged) jenkins-bot: Upstream release v1.2.0 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/776895 (owner: Volans)
[10:29:13] <icinga-wm>	 RECOVERY - Host gitlab.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[10:30:22] <jinxer-wm>	 (JobUnavailable) firing: (9) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:31:54] <wikibugs>	 (CR) Btullis: [C: +2] Remove the MySQL specific details from datahub-frontend [deployment-charts] - https://gerrit.wikimedia.org/r/776896 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[10:32:24] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-druid1003.eqiad.wmnet
[10:32:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:21] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/netbox-extras] - https://gerrit.wikimedia.org/r/776891 (owner: Ayounsi)
[10:38:12] <volans>	 !log uploaded python3-wmflib_1.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
[10:38:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:15] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1003.eqiad.wmnet
[10:39:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:37] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] vrts: rename module files and classes [puppet] - https://gerrit.wikimedia.org/r/776237 (https://phabricator.wikimedia.org/T293942) (owner: AOkoth)
[10:40:43] <wikibugs>	 (CR) Ayounsi: [C: +2] Fix typo [software/netbox-extras] - https://gerrit.wikimedia.org/r/776891 (owner: Ayounsi)
[10:42:19] <icinga-wm>	 RECOVERY - SSH on gitlab1001 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:42:42] <wikibugs>	 (Merged) jenkins-bot: Fix typo [software/netbox-extras] - https://gerrit.wikimedia.org/r/776891 (owner: Ayounsi)
[10:51:26] <wikibugs>	 (PS1) Amire80: Update feed URL for wikimedia.no blog [puppet] - https://gerrit.wikimedia.org/r/776905
[10:52:01] <wikibugs>	 (PS1) Btullis: Increment the chart version and allow version range matching [deployment-charts] - https://gerrit.wikimedia.org/r/776906 (https://phabricator.wikimedia.org/T301454)
[10:53:09] <wikibugs>	 (CR) Jon Harald Søby: [C: +1] Update feed URL for wikimedia.no blog [puppet] - https://gerrit.wikimedia.org/r/776905 (owner: Amire80)
[10:53:48] <wikibugs>	 (CR) Ayounsi: [C: +1] "Niiiiiiice!" [puppet] - https://gerrit.wikimedia.org/r/776878 (https://phabricator.wikimedia.org/T303803) (owner: Alexandros Kosiaris)
[10:53:49] <mmandere>	 !log depool cp3054 for reimage - T290005
[10:53:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:52] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[10:56:34] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp3054 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776865 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[10:58:19] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2026 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:59:46] <wikibugs>	 (CR) JMeybohm: [C: +1] Increment the chart version and allow version range matching [deployment-charts] - https://gerrit.wikimedia.org/r/776906 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[11:00:29] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[11:04:58] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS buster
[11:04:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:08] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp3054.esams.wmnet with OS buster
[11:05:20] <James_F>	 jouncebot: next
[11:05:20] <jouncebot>	 In 1 hour(s) and 54 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T1300)
[11:07:12] <wikibugs>	 (CR) Btullis: [C: +2] Increment the chart version and allow version range matching [deployment-charts] - https://gerrit.wikimedia.org/r/776906 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[11:07:19] <moritzm>	 !log installing cups security updates on buster (client side tools/libs)
[11:07:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:01] <wikibugs>	 (PS1) Btullis: Remove the statsv source from the VarnishkafkaNoMessages alert [alerts] - https://gerrit.wikimedia.org/r/776912 (https://phabricator.wikimedia.org/T300246)
[11:09:15] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[11:09:16] <logmsgbot>	 !log jforrester@deploy1002 Started deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block
[11:09:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:24] <logmsgbot>	 !log jforrester@deploy1002 Finished deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block (duration: 00m 08s)
[11:09:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:43] <volans>	 !log deploying python3-wmflib 1.2.0 fleet-wide
[11:11:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:50] <mmandere>	 !log depool cp4028 for reimage - T290005
[11:12:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:52] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[11:15:25] <wikibugs>	 (PS1) Btullis: Remove an-test-coord* from the Hive JVM heap memory alerts [alerts] - https://gerrit.wikimedia.org/r/776919 (https://phabricator.wikimedia.org/T293399)
[11:15:27] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp4028 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776866 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[11:18:15] <wikibugs>	 (PS1) Muehlenhoff: Add mdadm processes to filter list for debdeploy [puppet] - https://gerrit.wikimedia.org/r/776929 (https://phabricator.wikimedia.org/T135991)
[11:18:24] <wikibugs>	 (PS2) Btullis: Remove test hosts from the JVM heap memory alerts [alerts] - https://gerrit.wikimedia.org/r/776919 (https://phabricator.wikimedia.org/T293399)
[11:18:42] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:18:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:49] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp4028.ulsfo.wmnet with OS buster
[11:20:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:58] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp4028.ulsfo.wmnet with OS buster
[11:23:27] <wikibugs>	 (PS3) Giuseppe Lavagetto: Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855
[11:23:29] <wikibugs>	 (PS2) Giuseppe Lavagetto: Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875
[11:25:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24036 and previous config saved to /var/cache/conftool/dbconfig/20220404-112506-ladsgroup.json
[11:25:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[11:25:37] <wikibugs>	 (CR) jerkins-bot: [V: -1] Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875 (owner: Giuseppe Lavagetto)
[11:25:46] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[11:25:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855 (owner: Giuseppe Lavagetto)
[11:27:39] <moritzm>	 !log installing jbig2dec security updates
[11:27:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:21] <icinga-wm>	 PROBLEM - SSH on aqs1007.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:33:39] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
[11:33:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:59] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[11:34:18] <moritzm>	 !log installing zziplib security updates
[11:34:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:09] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
[11:37:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:34] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
[11:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:35] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[11:39:44] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[11:39:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24037 and previous config saved to /var/cache/conftool/dbconfig/20220404-114011-ladsgroup.json
[11:40:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:59] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
[11:41:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:47:32] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:47:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:28] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:50:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:47] <wikibugs>	 (PS2) Jcrespo: check: Read list of valid sections/valid backup jobs from a file [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315)
[11:55:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24038 and previous config saved to /var/cache/conftool/dbconfig/20220404-115516-ladsgroup.json
[11:55:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:56:40] <wikibugs>	 (PS3) Jcrespo: check: Read list of valid sections/valid backup jobs from a file [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315)
[12:01:11] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS buster
[12:01:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:14] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[12:01:19] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp3054.esams.wmnet with OS buster com...
[12:01:34] <logmsgbot>	 !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[12:01:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:56] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4028.ulsfo.wmnet with OS buster
[12:04:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:05] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4028.ulsfo.wmnet with OS buster com...
[12:05:14] <mmandere>	 !log pool cp3054 with HAProxy as TLS termination layer - T290005
[12:05:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:17] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:10:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24039 and previous config saved to /var/cache/conftool/dbconfig/20220404-121022-ladsgroup.json
[12:10:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[12:10:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[12:10:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[12:10:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24040 and previous config saved to /var/cache/conftool/dbconfig/20220404-121030-ladsgroup.json
[12:10:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:09] <mmandere>	 !log pool cp4028 with HAProxy as TLS termination layer - T290005
[12:11:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:12] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:18:21] <moritzm>	 !log installing expat updates (followups to earlier security fixes, no security impact by itself)
[12:18:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:23] <wikibugs>	 (PS3) JMeybohm: Move miscweb back to state production [puppet] - https://gerrit.wikimedia.org/r/774917 (https://phabricator.wikimedia.org/T290966)
[12:26:47] <ottomata>	 !log deleting empty typo topics from kafka main-codfw: codfw.mediawiki.page_delete, codfw.mediawiki.page_move, codfw.mediawiki.page_restore, codfw.mediawiki.revision_create (found while working on T241178)
[12:26:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:23] <ottomata>	 !log deleting empty typo topics from kafka main-eqiad: eqiad.mediawiki.page-edit (found while working on T241178)
[12:31:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:35] <mmandere>	 !log depool cp3055 for reimage - T290005
[12:31:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:37] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:34:02] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp3055 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776867 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[12:35:49] <ottomata>	 !log removing retention.ms override from eventstreams publicly exposed topics in kafka main-eqiad and main-codfw - T241178
[12:35:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:27] <wikibugs>	 (Abandoned) Hashar: scap: automatize plugins handling [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/723992 (owner: Hashar)
[12:36:30] <wikibugs>	 (Abandoned) Hashar: scap: automatize plugins handling [software/gerrit] (deploy/wmf/stable-3.3) - https://gerrit.wikimedia.org/r/709975 (owner: Hashar)
[12:36:38] <wikibugs>	 (Abandoned) Hashar: mwdeploy user is provided by LDAP on WMCS [puppet] - https://gerrit.wikimedia.org/r/699427 (https://phabricator.wikimedia.org/T73480) (owner: Hashar)
[12:38:43] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS buster
[12:38:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:52] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp3055.esams.wmnet with OS buster
[12:42:29] <mmandere>	 !log depool cp4022 for reimage - T290005
[12:42:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:32] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:43:35] <moritzm>	 !log installing gmp security updates
[12:43:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:21] <Func>	 jouncebot: next
[12:45:21] <jouncebot>	 In 0 hour(s) and 14 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T1300)
[12:45:40] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[12:45:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:23] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp4022 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/776868 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[12:48:53] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp4022.ulsfo.wmnet with OS buster
[12:48:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:03] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp4022.ulsfo.wmnet with OS buster
[12:49:11] <wikibugs>	 (CR) Jelto: [V: +1] "This change is ready for review." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/776230 (https://phabricator.wikimedia.org/T274463) (owner: Jelto)
[12:49:31] <wikibugs>	 (PS3) Func: Add logo variants for zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/775416 (https://phabricator.wikimedia.org/T273578)
[12:49:58] <wikibugs>	 (CR) Marostegui: [C: +2] switchover-tmpl.sh: Add prerequisites link and calendar invite [software] - https://gerrit.wikimedia.org/r/776892 (https://phabricator.wikimedia.org/T303605) (owner: Marostegui)
[12:50:35] <wikibugs>	 (Abandoned) Hashar: gerrit: move CI result table to a tab [puppet] - https://gerrit.wikimedia.org/r/756685 (owner: Hashar)
[12:52:57] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
[12:52:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
[12:53:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:47] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
[12:53:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:12] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.12 point update - https://phabricator.wikimedia.org/T304546 (MoritzMuehlenhoff)
[12:57:10] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
[12:57:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:01] <wikibugs>	 (CR) Hokwelum: [C: +1] "Ariel and I tested on the deployment-prep and it looks good" [dumps] - https://gerrit.wikimedia.org/r/767477 (https://phabricator.wikimedia.org/T138208) (owner: Ladsgroup)
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T1300).
[13:00:04] <jouncebot>	 duesen, Lucas_WMDE, and Func: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:21] * urbanecm waves
[13:00:52] <urbanecm>	 duesen: do you want to self-deploy?
[13:01:00] <taavi>	 o/
[13:01:23] <urbanecm>	 hey taavi
[13:01:58] <urbanecm>	 Func: hello, are you around?
[13:02:03] <Func>	 yes
[13:02:30] <duesen>	 o/
[13:02:34] <urbanecm>	 Func: okay, let's start with you then :). Can your patch be tested?
[13:02:52] <Func>	 Nothing to test, this just wants to make sure we can see the outcome of change 773936 after the next train.
[13:03:00] <urbanecm>	 okay
[13:03:05] <wikibugs>	 (CR) Urbanecm: [C: +2] Add logo variants for zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/775416 (https://phabricator.wikimedia.org/T273578) (owner: Func)
[13:03:39] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[13:03:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:42] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
[13:03:44] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
[13:03:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:51] <wikibugs>	 (Merged) jenkins-bot: Add logo variants for zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/775416 (https://phabricator.wikimedia.org/T273578) (owner: Func)
[13:04:03] <duesen>	 urbanecm: I'd like to give it a go, yes. I have only done a config deploy once before though
[13:04:41] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
[13:04:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:08] <urbanecm>	 duesen: i can guide you :)
[13:05:25] <urbanecm>	 i need to finish Func's patch first though
[13:05:29] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 7ebad8ffa1826ed3429cd822d388807270cfe341: Add logo variants for zhwiki (T273578) (duration: 00m 51s)
[13:05:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:31] <stashbot>	 T273578: Wikis with language variants must override logo in Common.css - https://phabricator.wikimedia.org/T273578
[13:05:31] <duesen>	 urbanecm: cool! But Func is going first, right?
[13:05:44] <urbanecm>	 duesen: correct. i didn't see a re from you, so went with their patch instead :)
[13:05:44] <duesen>	 (the bot chatter in here is really distracting)
[13:05:46] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
[13:05:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:56] <urbanecm>	 (and I don't see Lucas, so I'll be skipping their patch)
[13:06:01] <urbanecm>	 Func: your patch is live now.
[13:06:16] <Func>	 ok, thanks!
[13:06:29] <urbanecm>	 duesen: it also provides useful info though :))
[13:06:37] <urbanecm>	 duesen: feel free to deploy your patch now
[13:06:46] <wikibugs>	 (CR) Elukey: [C: +1] "Built all the images locally and verified that the patch was applied correctly." [docker-images/production-images] - https://gerrit.wikimedia.org/r/775277 (https://phabricator.wikimedia.org/T304092) (owner: JMeybohm)
[13:06:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:06:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:59] <urbanecm>	 https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers is the docs
[13:07:45] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
[13:07:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24041 and previous config saved to /var/cache/conftool/dbconfig/20220404-130751-ladsgroup.json
[13:07:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:54] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:08:30] <duesen>	 urbanecm: sorry, irccloud doesn't do proper notifications
[13:08:35] <duesen>	 i'll get started on my patch now
[13:08:50] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[13:08:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:05] <urbanecm>	 duesen: okay. let me know if i can help.
[13:09:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:09:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:09:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:08] <wikibugs>	 (CR) Daniel Kinzler: [C: +2] "deploying now" [mediawiki-config] - https://gerrit.wikimedia.org/r/776164 (https://phabricator.wikimedia.org/T305176) (owner: Daniel Kinzler)
[13:10:40] <duesen>	 urbanecm: I keep forgetting to allocate time for the patch to actually merge. luckily, for config, that should be quick
[13:10:47] <urbanecm>	 yup
[13:10:49] <wikibugs>	 (PS1) Volans: spicerack: add wmflib.prometheus.Thanos support [software/spicerack] - https://gerrit.wikimedia.org/r/776946
[13:10:51] <wikibugs>	 (Merged) jenkins-bot: Always set MW_USE_CONFIG_SCHEMA. [mediawiki-config] - https://gerrit.wikimedia.org/r/776164 (https://phabricator.wikimedia.org/T305176) (owner: Daniel Kinzler)
[13:10:56] <urbanecm>	 here you go :)
[13:11:07] <duesen>	 excellent
[13:11:08] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
[13:11:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:36] <duesen>	 urbanecm: git log is clean
[13:11:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:11:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:06] <urbanecm>	 duesen: is that a question or just a mere statement?
[13:12:33] <duesen>	 urbanecm: just a statement :)
[13:12:39] <urbanecm>	 ok, wasn't sure:)
[13:12:40] <duesen>	 urbanecm: pulling to mwdebug1001 now
[13:12:45] <urbanecm>	 ack
[13:13:55] <duesen>	 urbanecm: i'm now waiting for the stats to show in grafana. Last time, it took a couple of minutes
[13:14:03] <urbanecm>	 ack
[13:15:30] <wikibugs>	 (CR) Alexandros Kosiaris: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34673/console" [puppet] - https://gerrit.wikimedia.org/r/776878 (https://phabricator.wikimedia.org/T303803) (owner: Alexandros Kosiaris)
[13:15:36] <duesen>	 there we go, looking good
[13:15:51] <urbanecm>	 great
[13:16:14] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[13:16:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:38] <duesen>	 urbanecm: so... I just scap sync-file?
[13:16:41] <urbanecm>	 yes
[13:16:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:16:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:59] <duesen>	 Amir1: thank you so much for writing deployment-commands!
[13:17:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
[13:17:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:58] <Amir1>	 duesen: ^^ as I said, I'm lazy
[13:18:05] <logmsgbot>	 !log daniel@deploy1002 Synchronized multiversion/defines.php: Config: [[gerrit:776164|Always set MW_USE_CONFIG_SCHEMA. (T305176)]] (duration: 00m 50s)
[13:18:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:08] <stashbot>	 T305176: Make loading defaults from the config schema the default - https://phabricator.wikimedia.org/T305176
[13:18:30] <duesen>	 urbanecm: so, did I break wikipedia?
[13:18:41] <urbanecm>	 it still loads for me :)
[13:19:21] <urbanecm>	 and if you did, canaries would've told you, probably :)
[13:19:30] <urbanecm>	 duesen: anything else to deploy from you?
[13:19:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:19:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:19:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:20:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:35] <duesen>	 urbanecm: nope, all good, thank you! I just confirmed the stats, all requests seem to be using the new code now
[13:20:40] <urbanecm>	 excellent
[13:20:49] <urbanecm>	 in that case, we're done, as i still don't see Lucas
[13:20:57] <urbanecm>	 !log UTC afternoon B&C window done
[13:20:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24042 and previous config saved to /var/cache/conftool/dbconfig/20220404-132256-ladsgroup.json
[13:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
[13:23:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:58] <wikibugs>	 (CR) Ayounsi: [C: +1] spicerack: add wmflib.prometheus.Thanos support [software/spicerack] - https://gerrit.wikimedia.org/r/776946 (owner: Volans)
[13:26:21] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: [WIP] Requesting access to deployment group for TThoabala - https://phabricator.wikimedia.org/T303398 (herron) Stalled→Invalid >>! In T303398#7825617, @Dzahn wrote: > Is it ok if we close this ticket and you just reopen it again once he is back?  Go...
[13:26:27] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[13:26:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:44] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Access for new Data Platform Dev: Thomas Chin - https://phabricator.wikimedia.org/T305193 (herron)
[13:27:47] <wikibugs>	 (PS2) Herron: admin: add tchin to groups platform-engineering and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/775954 (https://phabricator.wikimedia.org/T305193)
[13:29:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
[13:29:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:06] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4022.ulsfo.wmnet with OS buster
[13:31:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:14] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4022.ulsfo.wmnet with OS buster com...
[13:31:31] <wikibugs>	 ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (Andrew)
[13:32:04] <wikibugs>	 (CR) Func: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/747913 (https://phabricator.wikimedia.org/T298308) (owner: Winston Sung)
[13:32:32] <wikibugs>	 ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (Andrew) Here is the last thing I see before a blank screen and then grub:  {F35037959}
[13:34:00] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS buster
[13:34:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
[13:34:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:07] <icinga-wm>	 RECOVERY - SSH on aqs1007.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:34:09] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp3055.esams.wmnet with OS buster com...
[13:34:51] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] spicerack: add wmflib.prometheus.Thanos support [software/spicerack] - https://gerrit.wikimedia.org/r/776946 (owner: Volans)
[13:35:31] <wikibugs>	 (CR) Volans: [C: +2] spicerack: add wmflib.prometheus.Thanos support [software/spicerack] - https://gerrit.wikimedia.org/r/776946 (owner: Volans)
[13:35:50] <mmandere>	 !log pool cp4022 with HAProxy as TLS termination layer - T290005
[13:35:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:53] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[13:36:06] <wikibugs>	 (PS1) Ssingh: trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275)
[13:36:58] <wikibugs>	 (PS1) Ssingh: haproxy: use Requires= in haproxy-mtail@tls.service [puppet] - https://gerrit.wikimedia.org/r/776949 (https://phabricator.wikimedia.org/T305275)
[13:37:37] <wikibugs>	 (CR) jerkins-bot: [V: -1] trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[13:37:41] <duesen>	 urbanecm: is there a good way to put a bashrc on the deployment/maintenance/debug hosts? 
[13:37:55] <duesen>	 I mean, i can copy one around, but I was hoping there was a nicer way
[13:38:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24043 and previous config saved to /var/cache/conftool/dbconfig/20220404-133801-ladsgroup.json
[13:38:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:14] <urbanecm>	 duesen: a puppet patch. under `modules/admin/files/home/<username>`
[13:38:26] <urbanecm>	 whatever you put there will be automatically propagated to all hosts you have access to
[13:38:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] haproxy: use Requires= in haproxy-mtail@tls.service [puppet] - https://gerrit.wikimedia.org/r/776949 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[13:38:46] <urbanecm>	 duesen: https://github.com/wikimedia/puppet/tree/production/modules/admin/files/home/urbanecm are my dotfiles :)
[13:38:56] <urbanecm>	 (you'll need to get a friendly SRE to merge it though)
[13:41:10] <wikibugs>	 (PS25) Winston Sung: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" [mediawiki-config] - https://gerrit.wikimedia.org/r/747913 (https://phabricator.wikimedia.org/T286291)
[13:41:29] <wikibugs>	 (CR) Winston Sung: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/747913 (https://phabricator.wikimedia.org/T286291) (owner: Winston Sung)
[13:41:32] <urbanecm>	 duesen: is that what you were looking for?
[13:41:34] <wikibugs>	 (PS26) Winston Sung: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" [mediawiki-config] - https://gerrit.wikimedia.org/r/747913 (https://phabricator.wikimedia.org/T286291)
[13:41:43] <wikibugs>	 (PS2) Ssingh: haproxy: use Requires= in haproxy-mtail@tls.service [puppet] - https://gerrit.wikimedia.org/r/776949 (https://phabricator.wikimedia.org/T305275)
[13:42:00] <wikibugs>	 (PS6) Winston Sung: Rearrange zh namespace names and namespace aliases [mediawiki-config] - https://gerrit.wikimedia.org/r/776031
[13:42:15] <wikibugs>	 (PS3) Herron: admin: add tchin to groups platform-engineering and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/775954 (https://phabricator.wikimedia.org/T305193)
[13:42:37] <wikibugs>	 (CR) JMeybohm: [V: +2 C: +2] Add controller_sync_error_count metric [docker-images/production-images] - https://gerrit.wikimedia.org/r/775277 (https://phabricator.wikimedia.org/T304092) (owner: JMeybohm)
[13:43:00] <wikibugs>	 (Merged) jenkins-bot: spicerack: add wmflib.prometheus.Thanos support [software/spicerack] - https://gerrit.wikimedia.org/r/776946 (owner: Volans)
[13:44:08] <wikibugs>	 (PS2) Ssingh: trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275)
[13:44:16] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
[13:44:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:28] <wikibugs>	 (CR) Func: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/747913 (https://phabricator.wikimedia.org/T286291) (owner: Winston Sung)
[13:44:51] <mmandere>	 !log pool cp3055 with HAProxy as TLS termination layer - T290005
[13:44:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:54] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[13:45:15] <wikibugs>	 (CR) Ssingh: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34675/console" [puppet] - https://gerrit.wikimedia.org/r/776949 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[13:47:22] <wikibugs>	 (PS1) Btullis: Apply kafka broker templates correctly in staging [deployment-charts] - https://gerrit.wikimedia.org/r/776950 (https://phabricator.wikimedia.org/T301454)
[13:50:13] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
[13:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24044 and previous config saved to /var/cache/conftool/dbconfig/20220404-135307-ladsgroup.json
[13:53:08] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[13:53:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[13:53:10] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[13:53:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24045 and previous config saved to /var/cache/conftool/dbconfig/20220404-135314-ladsgroup.json
[13:53:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:41] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v2.4.0 [software/spicerack] - https://gerrit.wikimedia.org/r/776952
[13:54:10] <wikibugs>	 (CR) Kormat: check: Read list of valid sections/valid backup jobs from a file (1 comment) [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[13:54:31] <wikibugs>	 (PS3) Ssingh: certspotter: switch to a local CT logs list [puppet] - https://gerrit.wikimedia.org/r/776217 (https://phabricator.wikimedia.org/T204993)
[13:57:24] <wikibugs>	 (CR) JMeybohm: [C: +1] Apply kafka broker templates correctly in staging [deployment-charts] - https://gerrit.wikimedia.org/r/776950 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[13:57:39] <wikibugs>	 (CR) Btullis: [C: +2] Apply kafka broker templates correctly in staging [deployment-charts] - https://gerrit.wikimedia.org/r/776950 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[13:58:15] <mmandere>	 !log depool cp5008 for reimage - T290005
[13:58:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:19] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[13:59:49] <wikibugs>	 (PS1) JMeybohm: Update cert-manager in staging to 1.5.4-3 [deployment-charts] - https://gerrit.wikimedia.org/r/776953 (https://phabricator.wikimedia.org/T304092)
[14:01:37] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[14:01:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:04] <wikibugs>	 (CR) Jcrespo: check: Read list of valid sections/valid backup jobs from a file (1 comment) [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[14:02:50] <wikibugs>	 (CR) Vgutierrez: [C: -2] "this  isn't the right approach to fix the issue. This check reads data from a fifo-log-demux instance and returns OK if it's able to do so" [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[14:02:54] <wikibugs>	 (PS1) Btullis: Define the DATHUB_SECRET value [deployment-charts] - https://gerrit.wikimedia.org/r/776954 (https://phabricator.wikimedia.org/T301454)
[14:02:55] <icinga-wm>	 RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:03:18] <wikibugs>	 (CR) Bking: [C: +1] "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/764830 (https://phabricator.wikimedia.org/T302330) (owner: Addshore)
[14:05:51] <wikibugs>	 (PS4) Jcrespo: check: Read list of valid sections/valid backup jobs from a file [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315)
[14:06:22] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp5008 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776869 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[14:06:56] <wikibugs>	 (CR) Jcrespo: check: Read list of valid sections/valid backup jobs from a file (1 comment) [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[14:07:26] <wikibugs>	 (CR) Kormat: [C: +1] check: Read list of valid sections/valid backup jobs from a file [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[14:07:48] <wikibugs>	 (CR) Vgutierrez: [C: +1] "Thanks for working on this. I've tried to tell systemd that by adding the "After=haproxy-mtail@%i.socket" stanza on the haproxy-mtail@.sys" [puppet] - https://gerrit.wikimedia.org/r/776949 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[14:08:00] <wikibugs>	 (CR) Jcrespo: "Thank you a lot for the review!" [software/wmfbackups] - https://gerrit.wikimedia.org/r/776171 (https://phabricator.wikimedia.org/T301315) (owner: Jcrespo)
[14:08:37] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp5008.eqsin.wmnet with OS buster
[14:08:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:46] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp5008.eqsin.wmnet with OS buster
[14:08:56] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/775954 (https://phabricator.wikimedia.org/T305193) (owner: Herron)
[14:10:45] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[14:11:43] <wikibugs>	 (PS4) Herron: admin: add tchin to groups platform-engineering and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/775954 (https://phabricator.wikimedia.org/T305193)
[14:12:41] <wikibugs>	 (CR) JMeybohm: [C: +1] Define the DATHUB_SECRET value [deployment-charts] - https://gerrit.wikimedia.org/r/776954 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[14:13:06] <wikibugs>	 (PS3) Bking: Revert "Temp remove codfw from wikidata updateQueryServiceLag check" [puppet] - https://gerrit.wikimedia.org/r/764830 (https://phabricator.wikimedia.org/T302330) (owner: Addshore)
[14:13:38] <wikibugs>	 (CR) Herron: [C: +2] admin: add tchin to groups platform-engineering and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/775954 (https://phabricator.wikimedia.org/T305193) (owner: Herron)
[14:16:05] <mmandere>	 !log depool cp6015 for reimage - T290005
[14:16:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:09] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[14:16:58] <wikibugs>	 (CR) Bking: [C: +2] Revert "Temp remove codfw from wikidata updateQueryServiceLag check" [puppet] - https://gerrit.wikimedia.org/r/764830 (https://phabricator.wikimedia.org/T302330) (owner: Addshore)
[14:17:49] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Access for new Data Platform Dev: Thomas Chin - https://phabricator.wikimedia.org/T305193 (herron) Open→Resolved a:herron Hi @tchin, the requested access has now been provisioned and will be fully deployed within 30 minutes (as puppet runs compl...
[14:18:45] <wikibugs>	 (CR) MMandere: [C: +2] site: Reimage cp6015 as cache::text_haproxy [puppet] - https://gerrit.wikimedia.org/r/776870 (https://phabricator.wikimedia.org/T290005) (owner: MMandere)
[14:24:39] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
[14:24:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:49] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp6015.drmrs.wmnet with OS buster
[14:26:26] <wikibugs>	 (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v2.4.0 [software/spicerack] - https://gerrit.wikimedia.org/r/776952 (owner: Volans)
[14:28:03] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
[14:28:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:30:54] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
[14:30:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:31] <wikibugs>	 (CR) Ssingh: [V: +1 C: +2] haproxy: use Requires= in haproxy-mtail@tls.service [puppet] - https://gerrit.wikimedia.org/r/776949 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[14:33:26] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
[14:33:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:43] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v2.4.0 [software/spicerack] - https://gerrit.wikimedia.org/r/776952 (owner: Volans)
[14:34:11] <wikibugs>	 (PS4) Giuseppe Lavagetto: Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855
[14:34:13] <wikibugs>	 (PS3) Giuseppe Lavagetto: Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875
[14:36:27] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Add log command [software/conftool] - https://gerrit.wikimedia.org/r/776855 (owner: Giuseppe Lavagetto)
[14:36:45] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875 (owner: Giuseppe Lavagetto)
[14:36:55] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
[14:36:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:18] <herron>	 !log rebooting alert2001
[14:37:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:25] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (Aitolkyn) - https://phabricator.wikimedia.org/T305299 (MoritzMuehlenhoff) @diego: We also need the estimated end date of the internship (you'll be contacted two weeks before it expires whether to extend access or not)....
[14:37:58] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[14:37:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:59] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (MoritzMuehlenhoff) @diego: We also need the estimated end date of the internship (you'll be contacted two weeks before it expires whether to extend access or not...
[14:38:08] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host cloudvirt-wdqs1001.eqiad.wmnet with OS bu...
[14:38:11] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (Aitolkyn) - https://phabricator.wikimedia.org/T305299 (MoritzMuehlenhoff) p:Triage→Medium
[14:38:17] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (MoritzMuehlenhoff) p:Triage→Medium
[14:38:35] <wikibugs>	 (Merged) jenkins-bot: Version bump [software/conftool] - https://gerrit.wikimedia.org/r/776875 (owner: Giuseppe Lavagetto)
[14:38:48] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (odimitrijevic) Approved!
[14:39:25] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (Aitolkyn) - https://phabricator.wikimedia.org/T305299 (diego) > @diego: We also need the estimated end date of the internship (you'll be contacted two weeks before it expires whether to extend access or not).  Internship...
[14:40:01] <icinga-wm>	 PROBLEM - Check systemd state on releases2002 is CRITICAL: CRITICAL - degraded: The following units failed: ifup@ens6.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:40:17] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (diego) >>! In T305298#7828668, @MoritzMuehlenhoff wrote: > @diego: We also need the estimated end date of the internship (you'll be contacted two weeks before it...
[14:41:09] <wikibugs>	 (PS1) Volans: Upstream release v2.4.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/776957
[14:42:34] <logmsgbot>	 !log mmandere@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
[14:42:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:50] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for TheresNoTime - https://phabricator.wikimedia.org/T302231 (MoritzMuehlenhoff) Open→Stalled I'm setting this task to Stalled until @TheresNoTime or @thcipriani think it's ready to revisit.
[14:44:03] <wikibugs>	 (CR) DCausse: [C: +1] cirrus: Migrate popularity_score configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/775965 (owner: Ebernhardson)
[14:44:17] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
[14:44:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:19] <logmsgbot>	 !log mmandere@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
[14:44:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:00] <wikibugs>	 (CR) Vgutierrez: [C: +1] "LGTM but this adds manual work that should be documented somewhere (wikitech?). Mainly how we should stay up to date regarding available C" [puppet] - https://gerrit.wikimedia.org/r/776217 (https://phabricator.wikimedia.org/T204993) (owner: Ssingh)
[14:45:37] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for TheresNoTime - https://phabricator.wikimedia.org/T302231 (TheresNoTime) Thank you 🙂 I'll see how T305191 goes!
[14:46:16] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1001 for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullse...
[14:46:24] <wikibugs>	 (CR) Ssingh: certspotter: switch to a local CT logs list (1 comment) [puppet] - https://gerrit.wikimedia.org/r/776217 (https://phabricator.wikimedia.org/T204993) (owner: Ssingh)
[14:48:33] <wikibugs>	 (CR) Ssingh: [C: +2] certspotter: switch to a local CT logs list [puppet] - https://gerrit.wikimedia.org/r/776217 (https://phabricator.wikimedia.org/T204993) (owner: Ssingh)
[14:51:34] <wikibugs>	 (CR) Ssingh: trafficserver: update Icinga check in check_trafficserver_log_fifo.py (1 comment) [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[14:51:59] <icinga-wm>	 PROBLEM - Host cp5008 is DOWN: PING CRITICAL - Packet loss = 100%
[14:52:09] <icinga-wm>	 PROBLEM - Keyholder SSH agent on alert1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
[14:53:09] <icinga-wm>	 RECOVERY - Host cp5008 is UP: PING OK - Packet loss = 0%, RTA = 224.98 ms
[14:53:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24047 and previous config saved to /var/cache/conftool/dbconfig/20220404-145323-ladsgroup.json
[14:53:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:27] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[14:53:33] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (Aitolkyn) - https://phabricator.wikimedia.org/T305299 (Ottomata) Approved.
[14:53:36] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (Ottomata) Approved.
[14:54:09] <icinga-wm>	 RECOVERY - Keyholder SSH agent on alert1001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
[14:55:38] <logmsgbot>	 !log herron@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
[14:55:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:36] <herron>	 ^it actually looks fine, alert1001 is a special case where other hosts are checked from it
[15:03:28] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5008.eqsin.wmnet with OS buster
[15:03:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:30] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v2.4.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/776957 (owner: Volans)
[15:03:37] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp5008.eqsin.wmnet with OS buster com...
[15:05:13] <duesen>	 urbanecm: sorry, got distracted. Yea, that looks great!
[15:05:41] <mmandere>	 !log pool cp5008 with HAProxy as TLS termination layer - T290005
[15:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:46] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[15:06:14] <Lucas_WMDE>	 is it okay if I deploy a beta config change now?
[15:06:33] <Lucas_WMDE>	 (I’d added it to the UTC afternoon backport window, but didn’t open my laptop until after the window started, so I missed stashbot’s reminder)
[15:06:59] <wikibugs>	 SRE, Datasets-Archiving, Datasets-General-or-Unknown, Dumps-Generation: Image tarball dumps on your.org are not being generated - https://phabricator.wikimedia.org/T53001 (Mitar) I think all media files should be made available through IPFS. Then it would be easy to host a copy of files, or contr...
[15:07:25] <logmsgbot>	 !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
[15:07:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:34] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp6015.drmrs.wmnet with OS buster com...
[15:08:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24048 and previous config saved to /var/cache/conftool/dbconfig/20220404-150828-ladsgroup.json
[15:08:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:01] <wikibugs>	 (CR) Volans: "question inline" [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[15:11:07] <wikibugs>	 (Merged) jenkins-bot: Upstream release v2.4.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/776957 (owner: Volans)
[15:13:56] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): Use "unexpectedUnconnectedPage" page prop on Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/774847
[15:14:08] <Lucas_WMDE>	 ^ I’ll quickly deploy this unless someone yells at me to stop :)
[15:15:07] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Use "unexpectedUnconnectedPage" page prop on Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/774847 (owner: Lucas Werkmeister (WMDE))
[15:15:49] <wikibugs>	 (Merged) jenkins-bot: Use "unexpectedUnconnectedPage" page prop on Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/774847 (owner: Lucas Werkmeister (WMDE))
[15:16:27] <Lucas_WMDE>	 ooh, `scap pull` output looks different
[15:17:13] <mmandere>	 !log pool cp6015 with HAProxy as TLS termination layer - T290005
[15:17:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:17] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[15:17:53] <wikibugs>	 (PS1) Volans: Upstream release v2.4.0 (take 2) [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/776961
[15:18:07] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:774847|Use "unexpectedUnconnectedPage" page prop on Beta]] (production no-op) (duration: 00m 50s)
[15:18:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:29] <icinga-wm>	 PROBLEM - Check systemd state on releases2002 is CRITICAL: CRITICAL - degraded: The following units failed: ifup@ens6.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:22:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[15:22:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:32] <wikibugs>	 SRE-OnFire, Wikidata, wdwb-tech, Discovery-Search (Current work), and 3 others: Only generate maxlag from pooled query service servers. - https://phabricator.wikimedia.org/T238751 (MPhamWMF)
[15:22:53] <icinga-wm>	 RECOVERY - Check systemd state on releases2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:23:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
[15:23:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24049 and previous config saved to /var/cache/conftool/dbconfig/20220404-152333-ladsgroup.json
[15:23:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:21] <icinga-wm>	 PROBLEM - SSH on wtp1045.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:24:44] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[15:24:45] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[15:24:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:08] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
[15:27:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[15:27:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:40] <wikibugs>	 (PS4) Andrew Bogott: dynamicproxy: cleanup remaining x-novaproxy-edit-dns users [puppet] - https://gerrit.wikimedia.org/r/771406 (https://phabricator.wikimedia.org/T295246) (owner: Majavah)
[15:27:58] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-cinder-volume-backup: use created_at rather than modified_at for purging [puppet] - https://gerrit.wikimedia.org/r/775997 (owner: Andrew Bogott)
[15:28:19] <moritzm>	 !log remove stray debmonitor-server/cumin installs (cleanup of 548425ba5833089e5ad6025890a6db87fbe718b8)
[15:28:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:05] <wikibugs>	 (CR) Andrew Bogott: [C: +2] dynamicproxy: cleanup remaining x-novaproxy-edit-dns users [puppet] - https://gerrit.wikimedia.org/r/771406 (https://phabricator.wikimedia.org/T295246) (owner: Majavah)
[15:30:04] <jouncebot>	 jan_drewniak: My dear minions, it's time we take the moon! Just kidding. Time for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T1530).
[15:31:05] <wikibugs>	 (PS3) Ssingh: trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275)
[15:31:11] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v2.4.0 (take 2) [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/776961 (owner: Volans)
[15:31:44] <wikibugs>	 (CR) Ssingh: trafficserver: update Icinga check in check_trafficserver_log_fifo.py (1 comment) [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[15:34:09] <wikibugs>	 (CR) Vgutierrez: [C: +1] "looking good, just a small nitpick but LGTM" [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[15:34:59] <wikibugs>	 (PS4) Ssingh: trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275)
[15:37:50] <wikibugs>	 (CR) Vgutierrez: [C: +1] trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[15:38:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24050 and previous config saved to /var/cache/conftool/dbconfig/20220404-153839-ladsgroup.json
[15:38:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[15:38:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[15:38:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:46] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[15:38:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24051 and previous config saved to /var/cache/conftool/dbconfig/20220404-153846-ladsgroup.json
[15:38:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:48] <wikibugs>	 (Merged) jenkins-bot: Upstream release v2.4.0 (take 2) [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/776961 (owner: Volans)
[15:38:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:12] <wikibugs>	 (CR) Ssingh: [C: +2] trafficserver: update Icinga check in check_trafficserver_log_fifo.py [puppet] - https://gerrit.wikimedia.org/r/776948 (https://phabricator.wikimedia.org/T305275) (owner: Ssingh)
[15:44:50] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[15:44:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:00] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host cloudvirt-wdqs1001.eqiad.wmnet with OS bu...
[15:50:24] <wikibugs>	 (PS1) Volans: sre.SREBatchBase: allow to customize grace sleep [cookbooks] - https://gerrit.wikimedia.org/r/776965
[15:50:26] <wikibugs>	 (PS1) Volans: sre.cdn.roll-restart-varnish: override grace sleep [cookbooks] - https://gerrit.wikimedia.org/r/776966
[15:54:26] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
[15:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:24] <wikibugs>	 (PS2) Jcrespo: dbbackups: Monitor db_inventory rather than zarcillo section [puppet] - https://gerrit.wikimedia.org/r/776170 (https://phabricator.wikimedia.org/T301315)
[15:55:26] <wikibugs>	 (PS1) Jcrespo: dbbackups: Setup a valid_sections.txt config for db backup checks [puppet] - https://gerrit.wikimedia.org/r/776969 (https://phabricator.wikimedia.org/T301315)
[15:56:58] <icinga-wm>	 RECOVERY - Check systemd state on gitlab2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:58:04] <icinga-wm>	 RECOVERY - DPKG on idp-test1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[15:58:34] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
[15:58:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:15] <logmsgbot>	 !log bblack@cumin1001 START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
[16:00:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:05] <logmsgbot>	 !log bblack@cumin1001 END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
[16:02:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:43] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:05:29] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
[16:05:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:34] <wikibugs>	 (PS2) Jcrespo: dbbackups: Setup a valid_sections.txt config for db backup checks [puppet] - https://gerrit.wikimedia.org/r/776969 (https://phabricator.wikimedia.org/T301315)
[16:08:21] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
[16:08:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:51] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
[16:08:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:53] <volans>	 !log uploaded spicerack_2.4.0 to apt.wikimedia.org bullseye-wikimedia
[16:09:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:14] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
[16:10:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:39] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
[16:11:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:24] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
[16:14:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:05] <icinga-wm>	 RECOVERY - Check systemd state on snapshot1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:18:51] <icinga-wm>	 PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-releng-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:26:11] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
[16:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:04] <icinga-wm>	 PROBLEM - SSH on db2090.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:31:49] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[16:31:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:57] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1001 for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullse...
[16:34:42] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
[16:34:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24052 and previous config saved to /var/cache/conftool/dbconfig/20220404-164144-ladsgroup.json
[16:41:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:48] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[16:50:53] <taavi>	 !log mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Brand" "Brand/Archive" "Majavah" --reason '[[:phab:T305387]]' # T305387
[16:50:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:56] <stashbot>	 T305387: Move the "Brand" portal page on Meta-Wiki - https://phabricator.wikimedia.org/T305387
[16:51:03] <wikibugs>	 (PS1) JMeybohm: Update cert-manager to 1.5.4-3 [deployment-charts] - https://gerrit.wikimedia.org/r/776971 (https://phabricator.wikimedia.org/T304092)
[16:56:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24053 and previous config saved to /var/cache/conftool/dbconfig/20220404-165649-ladsgroup.json
[16:56:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:02] <wikibugs>	 (CR) Dzahn: [C: +2] Update feed URL for wikimedia.no blog [puppet] - https://gerrit.wikimedia.org/r/776905 (owner: Amire80)
[17:00:04] <jouncebot>	 ryankemper: Dear deployers, time to do the Wikidata Query Service weekly deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T1700).
[17:00:16] <wikibugs>	 (PS1) Ayounsi: uRPF: add DHCP exception [homer/public] - https://gerrit.wikimedia.org/r/776973 (https://phabricator.wikimedia.org/T285461)
[17:00:41] <wikibugs>	 (CR) JMeybohm: [C: +2] Update cert-manager in staging to 1.5.4-3 [deployment-charts] - https://gerrit.wikimedia.org/r/776953 (https://phabricator.wikimedia.org/T304092) (owner: JMeybohm)
[17:02:33] <wikibugs>	 (CR) Ayounsi: "From the doc: https://www.juniper.net/documentation/us/en/software/junos/security-services/topics/topic-map/interfaces-configuring-unicast" [homer/public] - https://gerrit.wikimedia.org/r/776973 (https://phabricator.wikimedia.org/T285461) (owner: Ayounsi)
[17:05:22] <wikibugs>	 (Merged) jenkins-bot: Update cert-manager in staging to 1.5.4-3 [deployment-charts] - https://gerrit.wikimedia.org/r/776953 (https://phabricator.wikimedia.org/T304092) (owner: JMeybohm)
[17:06:51] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
[17:06:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:50] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[17:08:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:42] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[17:09:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:37] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[17:10:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:30] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[17:11:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24054 and previous config saved to /var/cache/conftool/dbconfig/20220404-171154-ladsgroup.json
[17:11:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:20] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
[17:15:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:07] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[17:16:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:47] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
[17:17:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:12] <wikibugs>	 SRE, ops-codfw, ops-eqiad, DC-Ops, Infrastructure-Foundations: Management interface SSH icinga alerts - https://phabricator.wikimedia.org/T304289 (Papaul) |Hostname|Old version|New version| |db2083.mgmt| |db2086.mgmt| |db2090.mgmt| |kubernetes2001.mgmt| |ms-fe2008.mgmt| |mw2252.mgmt| |mw2254...
[17:20:30] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (wiki_willy) a:Cmjohnson→Jclark-ctr
[17:20:57] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (wiki_willy) @Jclark-ctr - just following up Cathal's last comment  >>! In T292095#7801403, @cmooney wrote: > @Jclark-ctr I'm not getting any...
[17:21:10] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (Kanban): PXE boot failures on cloudvirt-wdqs100[1-3] - https://phabricator.wikimedia.org/T305368 (ayounsi) The bug was introduced with this change: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/775279/  The following one sho...
[17:23:58] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
[17:23:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:02] <wikibugs>	 (CR) Ayounsi: [C: +2] "Tested manually for https://phabricator.wikimedia.org/T305368 and confirmed working as expected." [homer/public] - https://gerrit.wikimedia.org/r/776973 (https://phabricator.wikimedia.org/T285461) (owner: Ayounsi)
[17:24:40] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
[17:24:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:50] <wikibugs>	 (Merged) jenkins-bot: uRPF: add DHCP exception [homer/public] - https://gerrit.wikimedia.org/r/776973 (https://phabricator.wikimedia.org/T285461) (owner: Ayounsi)
[17:25:19] <wikibugs>	 (PS1) RLazarus: httpbb: Delete the git::clone and install via deb package [puppet] - https://gerrit.wikimedia.org/r/776977 (https://phabricator.wikimedia.org/T299705)
[17:25:41] <XioNoX>	 !log push urpf DHCP exception to all core routers with urpf configured - T285461
[17:25:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:45] <stashbot>	 T285461: Review filtering for cloud-hosts on CR routers eqiad - https://phabricator.wikimedia.org/T285461
[17:26:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24055 and previous config saved to /var/cache/conftool/dbconfig/20220404-172659-ladsgroup.json
[17:27:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[17:27:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
[17:27:03] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[17:27:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24056 and previous config saved to /var/cache/conftool/dbconfig/20220404-172707-ladsgroup.json
[17:27:09] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
[17:27:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:30] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] docker: move pruning to new profile docker::prune [puppet] - https://gerrit.wikimedia.org/r/773641 (https://phabricator.wikimedia.org/T304644) (owner: Razzi)
[17:29:53] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] ci: docker system prune on ci::master [puppet] - https://gerrit.wikimedia.org/r/773784 (owner: Hashar)
[17:30:33] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
[17:30:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:10] <wikibugs>	 (CR) RLazarus: "Note that PCC fails, but I think only because of the private Hiera lookup:" [puppet] - https://gerrit.wikimedia.org/r/776977 (https://phabricator.wikimedia.org/T299705) (owner: RLazarus)
[17:34:30] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
[17:34:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:42:57] <icinga-wm>	 PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-tails-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:45:13] <wikibugs>	 (CR) Dzahn: [C: +1] gitlab: move backups to /mnt/gitlab-backup [puppet] - https://gerrit.wikimedia.org/r/776230 (https://phabricator.wikimedia.org/T274463) (owner: Jelto)
[17:52:39] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
[17:52:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:19] <icinga-wm>	 PROBLEM - SSH on aqs1007.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:08:06] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
[18:08:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:10] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[18:18:15] <icinga-wm>	 PROBLEM - Host mc2031.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:21:45] <icinga-wm>	 PROBLEM - SSH on wtp1025.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:25:19] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
[18:25:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:23] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
[18:25:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:28] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
[18:25:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:09] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
[18:26:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:29] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
[18:26:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:13] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (paramita_das) - https://phabricator.wikimedia.org/T305298 (KFrancis) Hi all, I am confirming as Paramita Das is a contractor with the WMF, an NDA is already on file.  Please proceed with the access request.
[18:31:15] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:32:24] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
[18:32:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24057 and previous config saved to /var/cache/conftool/dbconfig/20220404-183227-ladsgroup.json
[18:32:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:31] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[18:33:20] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (Aitolkyn) - https://phabricator.wikimedia.org/T305299 (KFrancis) @MoritzMuehlenhoff I am confirming the contractor NDA is already on file.  Please proceed with the access request.
[18:34:13] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
[18:34:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:29] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
[18:34:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:53] <wikibugs>	 SRE, Traffic: Resolve issues with cp hosts and the reboot-single cookbook - https://phabricator.wikimedia.org/T305275 (ssingh) Open→Resolved We have tested the above two changes with six cp host reboots and there are no concerns, confirming that this issue has been fixed.  Thanks to everyone for...
[18:36:24] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
[18:36:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:52] <wikibugs>	 (PS1) Herron: wip [puppet] - https://gerrit.wikimedia.org/r/776982
[18:37:53] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:38:15] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
[18:38:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:25] <wikibugs>	 (PS2) Herron: wip [puppet] - https://gerrit.wikimedia.org/r/776982
[18:38:42] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage2001.codfw.wmnet
[18:38:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:28] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
[18:39:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:44] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
[18:45:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:25] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
[18:46:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:55] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
[18:46:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:22] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
[18:47:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24058 and previous config saved to /var/cache/conftool/dbconfig/20220404-184733-ladsgroup.json
[18:47:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:37] <wikibugs>	 SRE, Generated Data Platform, Image-Suggestions, serviceops, and 2 others: New Service Request Generated Datasets: Image Suggestions Service - https://phabricator.wikimedia.org/T304891 (CBogen)
[18:50:10] <wikibugs>	 (PS3) Herron: wip [puppet] - https://gerrit.wikimedia.org/r/776982
[18:50:19] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (Jclark-ctr) the two junipers are up now.   @cmooney
[18:51:27] <wikibugs>	 (CR) Krinkle: [C: +1] "Thanks. LGTM. Good to go." [mediawiki-config] - https://gerrit.wikimedia.org/r/776257 (https://phabricator.wikimedia.org/T45956) (owner: Zabe)
[18:51:43] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
[18:51:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:52] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
[18:52:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:38] <wikibugs>	 (PS4) Herron: wip [puppet] - https://gerrit.wikimedia.org/r/776982
[18:55:50] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
[18:55:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:16] <wikibugs>	 (CR) Herron: "https://puppet-compiler.wmflabs.org/pcc-worker1001/34684/" [puppet] - https://gerrit.wikimedia.org/r/776982 (owner: Herron)
[18:59:17] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
[18:59:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:47] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
[19:01:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:33] <wikibugs>	 SRE, SRE Observability: apifeatureusage hosts hanging on shutdown - https://phabricator.wikimedia.org/T305403 (herron) p:Triage→Medium
[19:02:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24059 and previous config saved to /var/cache/conftool/dbconfig/20220404-190238-ladsgroup.json
[19:02:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:50] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
[19:02:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:05] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5005.eqsin.wmnet
[19:06:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:10] <wikibugs>	 SRE, Generated Data Platform, Image-Suggestions, serviceops, Service-deployment-requests: Setup Initial Image Suggestion Service CI and k8s params/stubs - https://phabricator.wikimedia.org/T305154 (CBogen)
[19:06:11] <icinga-wm>	 PROBLEM - Check systemd state on cp5005 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter@frontend.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:06:42] <wikibugs>	 SRE, Generated Data Platform, Image-Suggestions, serviceops, and 2 others: Blubber setup for Image Suggestions Service - https://phabricator.wikimedia.org/T305155 (CBogen)
[19:07:27] <icinga-wm>	 RECOVERY - SSH on aqs1007.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:09:36] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
[19:09:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:58] <wikibugs>	 (PS5) Herron: logstash: set unit TimeoutStopSec of 2 minutes [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403)
[19:10:15] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
[19:10:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:32] <wikibugs>	 (PS6) Herron: logstash: set unit TimeoutStopSec of 2 minutes [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403)
[19:11:54] <wikibugs>	 (PS2) Volans: sre.SREBatchBase: additional customizations [cookbooks] - https://gerrit.wikimedia.org/r/776965
[19:11:56] <wikibugs>	 (PS2) Volans: sre.cdn.roll-restart-varnish: improvements [cookbooks] - https://gerrit.wikimedia.org/r/776966
[19:16:01] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
[19:16:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:48] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=varnish-fe
[19:16:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:57] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-be
[19:16:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:59] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-tls
[19:17:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24060 and previous config saved to /var/cache/conftool/dbconfig/20220404-191743-ladsgroup.json
[19:17:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[19:17:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[19:17:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:47] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[19:17:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24061 and previous config saved to /var/cache/conftool/dbconfig/20220404-191750-ladsgroup.json
[19:17:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:21:21] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[19:21:23] <icinga-wm>	 RECOVERY - Check systemd state on mirror1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:22:47] <icinga-wm>	 RECOVERY - SSH on wtp1025.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:22:48] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
[19:22:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:31] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[19:26:37] <icinga-wm>	 RECOVERY - SSH on wtp1045.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:29:09] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
[19:29:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:27] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt-wdqs1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:33:09] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
[19:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:33:41] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt-wdqs1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 andrew bogott work in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:33:42] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt-wdqs1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 andrew bogott work in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:33:42] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt-wdqs1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 andrew bogott work in progress https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:35:06] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
[19:35:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:31] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
[19:35:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:57] <icinga-wm>	 RECOVERY - Check systemd state on cp5005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:37:32] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
[19:37:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:37] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
[19:38:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:24] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
[19:39:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:23] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
[19:42:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:33] <icinga-wm>	 PROBLEM - Check systemd state on cp5005 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter@frontend.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:43:38] <wikibugs>	 (CR) Cwhite: "As evidenced by this, it appears we haven't found the root cause of T275405.  I propose we undo those changes and do something like this i" [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403) (owner: Herron)
[19:43:59] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
[19:44:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:43] <icinga-wm>	 RECOVERY - Check systemd state on cp5005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:46:37] <wikibugs>	 SRE, Observability-Logging, Patch-For-Review, SRE Observability (FY2021/2022-Q3): apifeatureusage hosts hanging on shutdown - https://phabricator.wikimedia.org/T305403 (colewhite)
[19:46:49] <wikibugs>	 (PS7) Herron: logstash: set unit TimeoutStopSec of 2 minutes [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403)
[19:47:20] <wikibugs>	 SRE, Observability-Logging, Patch-For-Review, SRE Observability (FY2021/2022-Q3): apifeatureusage hosts hanging on shutdown - https://phabricator.wikimedia.org/T305403 (colewhite)
[19:48:41] <wikibugs>	 (PS8) Herron: logstash: set unit TimeoutStopSec of 2 minutes [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403)
[19:50:10] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
[19:50:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:10] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
[19:51:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:24] <wikibugs>	 SRE, Observability-Logging, Patch-For-Review, SRE Observability (FY2021/2022-Q3): apifeatureusage hosts hanging on shutdown - https://phabricator.wikimedia.org/T305403 (herron)
[19:56:56] <wikibugs>	 (PS1) RLazarus: slo: Set a custom description for the Varnish dashboard [grafana-grizzly] - https://gerrit.wikimedia.org/r/776992 (https://phabricator.wikimedia.org/T302842)
[19:56:57] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
[19:56:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:18] <wikibugs>	 (CR) Herron: "PCC seems confused at the moment, but looking at the full diffs this should do the right thing https://puppet-compiler.wmflabs.org/pcc-wor" [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403) (owner: Herron)
[19:58:39] <wikibugs>	 (CR) Herron: logstash: set unit TimeoutStopSec of 2 minutes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403) (owner: Herron)
[20:00:04] <jouncebot>	 RoanKattouw and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T2000).
[20:00:04] <jouncebot>	 AGueyte: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:26] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
[20:00:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:27] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp3060.esams.wmnet
[20:00:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:39] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
[20:00:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:02:42] * urbanecm is around, but he doesn't see AGueyte
[20:02:43] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (4) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[20:03:05] <Tran>	 She's joining
[20:03:08] <Tran>	 sorry for the delay
[20:03:26] <urbanecm>	 Tran: okay, I'll wait
[20:03:34] <AnaisGueyte>	 hello 
[20:03:43] <urbanecm>	 hello AnaisGueyte! I can deploy today
[20:04:00] <AnaisGueyte>	 Great, thanks! It's my first deploy :)
[20:04:38] <urbanecm>	 okay, good to know! Feel free to ask if there's anything unclear -- the process can be confusing at first. No question is stupid :)
[20:05:01] <urbanecm>	 AnaisGueyte: do you have https://wikitech.wikimedia.org/wiki/WikimediaDebug#Browser_usage installed for testing the change?
[20:05:38] <AnaisGueyte>	 Yes!
[20:05:52] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
[20:05:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:06:05] <urbanecm>	 okay, great!
[20:06:11] <wikibugs>	 (03PS2) 10Urbanecm: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/774904 (https://phabricator.wikimedia.org/T296469) (owner: 10Tchanders)
[20:06:16] <urbanecm>	 I'll fetch the patch to the debug server now
[20:06:29] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Remove wgWMEIPAddressCopyActionEnabled from Beta and production config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/774904 (https://phabricator.wikimedia.org/T296469) (owner: 10Tchanders)
[20:06:40] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403) (owner: 10Herron)
[20:07:19] <cjming>	 urbanecm: I have one backport in the hopper -- as soon as https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/776226 merges, ok if I do that to wmf.5 after you're done?
[20:07:28] <wikibugs>	 (03Merged) 10jenkins-bot: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/774904 (https://phabricator.wikimedia.org/T296469) (owner: 10Tchanders)
[20:07:32] <wikibugs>	 (03CR) 10Herron: logstash: set unit TimeoutStopSec of 2 minutes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/776982 (https://phabricator.wikimedia.org/T305403) (owner: 10Herron)
[20:07:57] <urbanecm>	 cjming: sure thing. Will you want to self-serve?
[20:08:40] <urbanecm>	 AnaisGueyte: your patch is at mwdebug1001. Can you have a look, please?
[20:08:47] <AnaisGueyte>	 yes, thanks!
[20:08:55] <cjming>	 urbanecm: ah nvm - we want to test on beta cluster first so I'll schedule it for backport tomorrow instead
[20:09:03] <urbanecm>	 cjming: sounds good.
[20:09:08] <urbanecm>	 AnaisGueyte: let me know how it goes :)
[20:10:06] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
[20:10:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:57] <wikibugs>	 (03CR) 10Herron: [C: 03+1] "LGTM!" [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/776992 (https://phabricator.wikimedia.org/T302842) (owner: 10RLazarus)
[20:11:32] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
[20:11:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24062 and previous config saved to /var/cache/conftool/dbconfig/20220404-201409-ladsgroup.json
[20:14:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:14:12] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:14:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:14:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:13] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
[20:15:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:15:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:12] <AnaisGueyte>	 Hi @urbanecm. do you know if there's any reason the events log would not be fired on test wiki?
[20:16:36] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
[20:16:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:16:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:56] <urbanecm>	 AnaisGueyte: I'm confused. I thought the patch was meant to disable the instrumentation on all wikis?
[20:17:48] <AnaisGueyte>	 It does, but it's not firing on the non-test server either
[20:18:01] <urbanecm>	 ah
[20:18:16] <urbanecm>	 not off the top of my head
[20:18:18] <urbanecm>	 but i can check
[20:19:46] <urbanecm>	 it does fire outside of a debug server
[20:20:12] <urbanecm>	 i see a req to https://test.wikipedia.org/beacon/statsv?MediaWiki.ipinfo_address_copy.special_contributions=1c&MediaWiki.ipinfo_address_copy_by_wiki.testwiki.special_contributions=1c when i copy an address
[20:20:14] <wikibugs>	 (03CR) 10RLazarus: [V: 03+2 C: 03+2] slo: Set a custom description for the Varnish dashboard [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/776992 (https://phabricator.wikimedia.org/T302842) (owner: 10RLazarus)
[20:21:00] <urbanecm>	 AnaisGueyte: what do you observe?
[20:21:15] <urbanecm>	 (I hope my understanding of the instrumented action is correct)
[20:21:31] <AnaisGueyte>	 I see it now but it appears very delayed. Is that something I should expect?
[20:21:46] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
[20:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:07] <urbanecm>	 AnaisGueyte: okay, great. it can take a few seconds (AFAIK it tries to save some resources by batching events)
[20:23:54] <AnaisGueyte>	 Great, I wasn't expecting the delay, testing again! Thank you
[20:25:45] <urbanecm>	 no problem
[20:26:55] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
[20:26:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:02] <AnaisGueyte>	 Good, I don't see the event being fired on the test server, it appears to be successful, thank you @urbanecm
[20:28:17] <urbanecm>	 AnaisGueyte: that's great news :)
[20:28:19] <urbanecm>	 I'm deploying the change now
[20:29:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24063 and previous config saved to /var/cache/conftool/dbconfig/20220404-202914-ladsgroup.json
[20:29:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:29] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 8c81de9c732adef4537226ec6a7023fef40f3396: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config (T296469) (duration: 00m 51s)
[20:29:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:32] <stashbot>	 T296469: Log when a user copies an IP address - https://phabricator.wikimedia.org/T296469
[20:29:39] <urbanecm>	 AnaisGueyte: should be live now. 
[20:29:44] <urbanecm>	 anything else i can do for you today?
[20:29:44] <AnaisGueyte>	 Thank you!
[20:29:54] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
[20:29:55] <AnaisGueyte>	 Nope, that was a great first experience! Thanks!
[20:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:04] <urbanecm>	 happy to help! Talk to you later AnaisGueyte 
[20:30:10] <urbanecm>	 !log UTC late B&C window completed
[20:30:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:31:05] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
[20:31:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:02] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
[20:32:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:42] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
[20:37:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:11] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:40:25] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
[20:40:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:57] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
[20:40:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24064 and previous config saved to /var/cache/conftool/dbconfig/20220404-204419-ladsgroup.json
[20:44:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24065 and previous config saved to /var/cache/conftool/dbconfig/20220404-205924-ladsgroup.json
[20:59:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[20:59:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[20:59:29] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[20:59:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24066 and previous config saved to /var/cache/conftool/dbconfig/20220404-205932-ladsgroup.json
[20:59:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:41] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, 10Release-Engineering-Team (Radar): Need a service account on deploy servers - https://phabricator.wikimedia.org/T303857 (10thcipriani)
[21:00:04] <jouncebot>	 Reedy and sbassett: Dear deployers, time to do the Weekly Security deployment window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220404T2100).
[21:02:04] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
[21:02:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:27] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
[21:02:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:33] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] geoip::maxmind: rename the update timers, don't use 'legacy' term [puppet] - 10https://gerrit.wikimedia.org/r/773845 (https://phabricator.wikimedia.org/T303464) (owner: 10Dzahn)
[21:05:55] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
[21:05:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:42] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
[21:09:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:55] <wikibugs>	 (03CR) 10Dzahn: "[puppetmaster2003:~] $ sudo systemctl status geoip_update" [puppet] - 10https://gerrit.wikimedia.org/r/773845 (https://phabricator.wikimedia.org/T303464) (owner: 10Dzahn)
[21:11:30] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
[21:11:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:17] <mutante>	 !log puppetmaster1001/puppetmaster2003 - geoip / maxmind database update timers renamed. 'geoip_update_legacy' became 'geoip_update_main', 'geoip_update' became 'geoip_update_ipinfo'. Not using the confusing 'legacy' term anymore as was suggested as part of (T303464)
[21:14:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:20] <stashbot>	 T303464: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464
[21:14:48] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
[21:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:18:16] <wikibugs>	 (03CR) 10Dzahn: "only runs on puppetmaster1001, the active one, but works. confirmed. manually started etc" [puppet] - 10https://gerrit.wikimedia.org/r/773845 (https://phabricator.wikimedia.org/T303464) (owner: 10Dzahn)
[21:50:14] <wikibugs>	 (03PS1) 10Bking: elastic: don't wait for green on first node in cluster [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570)
[21:52:15] <wikibugs>	 (03PS2) 10Ryan Kemper: elastic: don't wait for green on first node [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: 10Bking)
[21:53:47] <wikibugs>	 (03PS3) 10Ryan Kemper: elastic: don't wait for green on first node [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: 10Bking)
[22:00:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elastic: don't wait for green on first node [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: 10Bking)
[22:03:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json
[22:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:18] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:12:25] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[22:18:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json
[22:18:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:47] <icinga-wm>	 PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[22:28:21] <icinga-wm>	 ACKNOWLEDGEMENT - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project andrew bogott investigating https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[22:31:07] <icinga-wm>	 RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 3 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[22:32:30] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:33:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json
[22:33:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:46] <jinxer-wm>	 (BlazegraphJvmQuakeWarnGC) firing: Blazegraph instance wdqs2003:9100 is entering a GC death spiral - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphJvmQuakeWarnGC
[22:41:44] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10RobH)
[22:42:20] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10RobH)
[22:42:27] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10RobH)
[22:48:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json
[22:48:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[22:48:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[22:48:32] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[22:48:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json
[22:48:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:22] <wikibugs>	 (03PS4) 10Ryan Kemper: elastic: don't wait for green on first node [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: 10Bking)
[22:59:33] <wikibugs>	 (03PS5) 10Ryan Kemper: elastic: don't wait for green on first node [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: 10Bking)
[23:01:27] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[23:03:37] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[23:05:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elastic: don't wait for green on first node [software/spicerack] - 10https://gerrit.wikimedia.org/r/776999 (https://phabricator.wikimedia.org/T304570) (owner: 10Bking)
[23:35:43] <icinga-wm>	 RECOVERY - SSH on db2090.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:36:28] <wikibugs>	 (03PS4) 10Dzahn: aptrepo: import gitlab-runner package for bullseye [puppet] - 10https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659)
[23:37:43] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] aptrepo: import gitlab-runner package for bullseye [puppet] - 10https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659) (owner: 10Dzahn)
[23:45:33] <wikibugs>	 (03CR) 10Dzahn: "[apt1001:~] $ sudo -E reprepro --component thirdparty/gitlab-runner checkupdate bullseye-wikimedia" [puppet] - 10https://gerrit.wikimedia.org/r/767604 (https://phabricator.wikimedia.org/T297659) (owner: 10Dzahn)
[23:48:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json
[23:48:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:48:55] <stashbot>	 T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched, user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T298565
[23:50:14] <wikibugs>	 10SRE-OnFire (FY2021/2022-Q3), 10WMF-NDA: non-wikimedia.org domain names for status page - https://phabricator.wikimedia.org/T293504 (10CDanis)
[23:51:38] <mutante>	 !log apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold  --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659)
[23:51:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:41] <stashbot>	 T297659: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659