[00:00:24] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[00:06:56] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1068886 (owner: TrainBranchBot)
[00:09:29] <jinxer-wm>	 FIRING: KubernetesDeploymentUnavailableReplicas: ...
[00:09:29] <jinxer-wm>	 Deployment k8s-controller-sidecars in sidecar-controller at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=sidecar-controller&var-deployment=k8s-controller-sidecars - ...
[00:09:29] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[00:09:50] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68253 and previous config saved to /var/cache/conftool/dbconfig/20240830-000950-ladsgroup.json
[00:09:55] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[00:10:52] <swfrench-wmf>	 ^ the KubernetesDeploymentUnavailableReplicas alert is a known issue; a patch is available to fix it
[00:13:32] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68254 and previous config saved to /var/cache/conftool/dbconfig/20240830-001331-ladsgroup.json
[00:13:34] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
[00:13:36] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[00:13:47] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
[00:13:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68255 and previous config saved to /var/cache/conftool/dbconfig/20240830-001353-ladsgroup.json
[00:15:13] <wikibugs>	 (CR) Cwhite: [C:+2] site: add insetup configs for new logging-hd hosts [puppet] - https://gerrit.wikimedia.org/r/1062758 (https://phabricator.wikimedia.org/T372511) (owner: Cwhite)
[00:20:30] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 918.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:20:40] <wikibugs>	 (PS4) Krinkle: Do not log failed autocreations on closed wikis as diagnostic errors [mediawiki-config] - https://gerrit.wikimedia.org/r/1068879 (https://phabricator.wikimedia.org/T373650) (owner: Zabe)
[00:20:47] <wikibugs>	 (CR) Krinkle: [C:+1] Do not log failed autocreations on closed wikis as diagnostic errors [mediawiki-config] - https://gerrit.wikimedia.org/r/1068879 (https://phabricator.wikimedia.org/T373650) (owner: Zabe)
[00:22:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68258 and previous config saved to /var/cache/conftool/dbconfig/20240830-002239-ladsgroup.json
[00:22:44] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[00:23:36] <wikibugs>	 (PS1) Scott French: sre.switchdc.mediawiki: migrate to the class API [cookbooks] - https://gerrit.wikimedia.org/r/1068896
[00:23:36] <wikibugs>	 (PS1) Scott French: sre.switchdc.mediawiki: add --task-id argument [cookbooks] - https://gerrit.wikimedia.org/r/1068897
[00:23:36] <wikibugs>	 (PS1) Scott French: sre.switchdc.mediawiki: use admin reason in puppet disable [cookbooks] - https://gerrit.wikimedia.org/r/1068898
[00:23:36] <wikibugs>	 (PS1) Scott French: sre.switchdc.mediawiki: record RO start/end in task [cookbooks] - https://gerrit.wikimedia.org/r/1068899
[00:24:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P68259 and previous config saved to /var/cache/conftool/dbconfig/20240830-002457-ladsgroup.json
[00:25:30] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 815.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:37:47] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P68260 and previous config saved to /var/cache/conftool/dbconfig/20240830-003746-ladsgroup.json
[00:40:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P68261 and previous config saved to /var/cache/conftool/dbconfig/20240830-004004-ladsgroup.json
[00:49:41] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[00:52:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P68262 and previous config saved to /var/cache/conftool/dbconfig/20240830-005254-ladsgroup.json
[00:55:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68263 and previous config saved to /var/cache/conftool/dbconfig/20240830-005512-ladsgroup.json
[00:55:14] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
[00:55:16] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[00:55:27] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
[00:55:34] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68264 and previous config saved to /var/cache/conftool/dbconfig/20240830-005534-ladsgroup.json
[01:08:01] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68265 and previous config saved to /var/cache/conftool/dbconfig/20240830-010801-ladsgroup.json
[01:08:03] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
[01:08:06] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[01:08:17] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
[01:08:24] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68266 and previous config saved to /var/cache/conftool/dbconfig/20240830-010823-ladsgroup.json
[01:16:27] <wikibugs>	 SRE, Data-Persistence, serviceops, Datacenter-Switchover: Migrate sre.switchdc.mediawiki to spicerack class API - https://phabricator.wikimedia.org/T328908#10105107 (Scott_French) a:Scott_French
[01:17:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68267 and previous config saved to /var/cache/conftool/dbconfig/20240830-011721-ladsgroup.json
[01:17:27] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[01:20:45] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68268 and previous config saved to /var/cache/conftool/dbconfig/20240830-012044-ladsgroup.json
[01:20:49] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[01:32:29] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P68269 and previous config saved to /var/cache/conftool/dbconfig/20240830-013229-ladsgroup.json
[01:35:52] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P68270 and previous config saved to /var/cache/conftool/dbconfig/20240830-013551-ladsgroup.json
[01:47:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P68271 and previous config saved to /var/cache/conftool/dbconfig/20240830-014736-ladsgroup.json
[01:50:59] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P68272 and previous config saved to /var/cache/conftool/dbconfig/20240830-015059-ladsgroup.json
[02:02:44] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68273 and previous config saved to /var/cache/conftool/dbconfig/20240830-020243-ladsgroup.json
[02:02:45] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
[02:02:48] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[02:02:59] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
[02:03:06] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68274 and previous config saved to /var/cache/conftool/dbconfig/20240830-020305-ladsgroup.json
[02:06:06] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68275 and previous config saved to /var/cache/conftool/dbconfig/20240830-020606-ladsgroup.json
[02:06:11] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[02:12:26] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68276 and previous config saved to /var/cache/conftool/dbconfig/20240830-021225-ladsgroup.json
[02:12:30] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[02:18:11] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[02:27:33] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P68277 and previous config saved to /var/cache/conftool/dbconfig/20240830-022732-ladsgroup.json
[02:36:28] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:42:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P68278 and previous config saved to /var/cache/conftool/dbconfig/20240830-024239-ladsgroup.json
[02:57:47] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68279 and previous config saved to /var/cache/conftool/dbconfig/20240830-025747-ladsgroup.json
[02:57:49] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2189.codfw.wmnet with reason: Maintenance
[02:57:55] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[02:58:02] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2189.codfw.wmnet with reason: Maintenance
[02:58:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68280 and previous config saved to /var/cache/conftool/dbconfig/20240830-025809-ladsgroup.json
[03:01:28] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68281 and previous config saved to /var/cache/conftool/dbconfig/20240830-030602-ladsgroup.json
[03:06:07] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[03:21:09] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P68282 and previous config saved to /var/cache/conftool/dbconfig/20240830-032109-ladsgroup.json
[03:26:30] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-wikifunctions at eqiad: 6.25% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[03:31:30] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-wikifunctions at eqiad: 0% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[03:36:17] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P68283 and previous config saved to /var/cache/conftool/dbconfig/20240830-033616-ladsgroup.json
[03:51:24] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68284 and previous config saved to /var/cache/conftool/dbconfig/20240830-035123-ladsgroup.json
[03:51:26] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
[03:51:28] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[03:51:38] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
[04:00:24] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[04:00:36] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2207.codfw.wmnet with reason: Maintenance
[04:00:49] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2207.codfw.wmnet with reason: Maintenance
[04:00:56] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68285 and previous config saved to /var/cache/conftool/dbconfig/20240830-040055-ladsgroup.json
[04:01:00] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[04:09:29] <jinxer-wm>	 FIRING: KubernetesDeploymentUnavailableReplicas: ...
[04:09:29] <jinxer-wm>	 Deployment k8s-controller-sidecars in sidecar-controller at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=sidecar-controller&var-deployment=k8s-controller-sidecars - ...
[04:09:29] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[04:09:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68286 and previous config saved to /var/cache/conftool/dbconfig/20240830-040957-ladsgroup.json
[04:10:03] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[04:25:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68287 and previous config saved to /var/cache/conftool/dbconfig/20240830-042505-ladsgroup.json
[04:40:13] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68288 and previous config saved to /var/cache/conftool/dbconfig/20240830-044012-ladsgroup.json
[04:49:41] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[04:55:20] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68289 and previous config saved to /var/cache/conftool/dbconfig/20240830-045519-ladsgroup.json
[04:55:25] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[05:34:40] <logmsgbot>	 !log kcvelaga@deploy1003 Started deploy [airflow-dags/analytics_product@0321fda]: (no justification provided)
[05:35:13] <logmsgbot>	 !log kcvelaga@deploy1003 Finished deploy [airflow-dags/analytics_product@0321fda]: (no justification provided) (duration: 00m 32s)
[05:45:59] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 28533472 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[05:46:59] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 48056 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240830T0600)
[06:04:36] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:09:36] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:18:11] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[06:34:51] <wikibugs>	 (Abandoned) Arnaudb: debug: printing results when return object count > 1 [software/conftool] - https://gerrit.wikimedia.org/r/971437 (https://phabricator.wikimedia.org/T350656) (owner: Arnaudb)
[06:40:59] <wikibugs>	 (CR) Slavina Stefanova: [C:+1] P:toolforge::bastion: Re-install joe [puppet] - https://gerrit.wikimedia.org/r/1059451 (https://phabricator.wikimedia.org/T371556) (owner: Majavah)
[06:55:15] <wikibugs>	 (CR) JMeybohm: [C:+1] "Nice find, thanks!" [deployment-charts] - https://gerrit.wikimedia.org/r/1068869 (https://phabricator.wikimedia.org/T362978) (owner: Scott French)
[06:56:26] <wikibugs>	 (CR) JMeybohm: [C:+2] global_config: Add pki::multirootca IPs to external-services [puppet] - https://gerrit.wikimedia.org/r/1068754 (https://phabricator.wikimedia.org/T337928) (owner: JMeybohm)
[06:59:42] <wikibugs>	 (CR) JMeybohm: [C:+1] sre.k8s.pool-depool-node: Check calico and fix phab [cookbooks] - https://gerrit.wikimedia.org/r/1068007 (owner: Clément Goubert)
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240830T0700)
[07:08:53] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[07:09:00] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[07:09:01] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[07:09:24] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[07:09:25] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[07:09:36] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[07:09:38] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[07:10:08] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[07:10:10] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
[07:10:22] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
[07:10:24] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
[07:10:56] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
[07:10:57] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
[07:11:30] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
[07:11:31] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[07:11:46] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[07:11:47] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
[07:11:57] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[07:22:27] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 52965
[07:22:53] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52965
[07:24:06] <wikibugs>	 (PS1) Marostegui: installserver: Do not format db2234, db2236, db2237 [puppet] - https://gerrit.wikimedia.org/r/1069106
[07:27:04] <wikibugs>	 (CR) Marostegui: [C:+2] installserver: Do not format db2234, db2236, db2237 [puppet] - https://gerrit.wikimedia.org/r/1069106 (owner: Marostegui)
[07:32:12] <wikibugs>	 (PS2) Alexandros Kosiaris: Rename mw229[567] to wikikube-worker205[234] [puppet] - https://gerrit.wikimedia.org/r/1068833 (https://phabricator.wikimedia.org/T372878)
[07:35:10] <wikibugs>	 (Abandoned) Alexandros Kosiaris: Rename mw229[567] to wikikube-worker205[123] [puppet] - https://gerrit.wikimedia.org/r/1068824 (https://phabricator.wikimedia.org/T372878) (owner: Alexandros Kosiaris)
[07:35:24] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] Rename mw229[567] to wikikube-worker205[234] [puppet] - https://gerrit.wikimedia.org/r/1068833 (https://phabricator.wikimedia.org/T372878) (owner: Alexandros Kosiaris)
[07:36:39] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2295 to wikikube-worker2052
[07:36:56] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[07:37:57] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] k8s-controller-sidecars: adopt securityContext [deployment-charts] - https://gerrit.wikimedia.org/r/1068869 (https://phabricator.wikimedia.org/T362978) (owner: Scott French)
[07:39:51] <icinga-wm>	 PROBLEM - SSH on wdqs1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:41:27] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:42:11] <wikibugs>	 (PS1) Slyngshede: data.yaml: Extend expiry date for account. [puppet] - https://gerrit.wikimedia.org/r/1069113
[07:45:30] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-wikifunctions at eqiad: 0% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[07:45:33] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:46:20] <wikibugs>	 (CR) Slyngshede: [C:+2] data.yaml Update email address. [puppet] - https://gerrit.wikimedia.org/r/1068673 (owner: Slyngshede)
[07:48:31] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-wikifunctions (k8s) 4.582s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[07:50:30] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-wikifunctions at eqiad: 0% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[07:53:31] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-wikifunctions (k8s) 4.589s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[07:54:40] <wikibugs>	 (PS1) Slyngshede: cloudweb2002-dev: Add dummy secrets from IDP on cloudweb2002-dev. [labs/private] - https://gerrit.wikimedia.org/r/1069114
[07:56:59] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 443, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:00:18] <wikibugs>	 (PS2) Slyngshede: cloudweb2002-dev: Add dummy secrets for IDP on cloudweb2002-dev. [labs/private] - https://gerrit.wikimedia.org/r/1069114
[08:00:24] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[08:04:01] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 525, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:05:31] <icinga-wm>	 PROBLEM - SSH on wdqs1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:07:07] <icinga-wm>	 PROBLEM - SSH on wdqs2024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:09:29] <jinxer-wm>	 FIRING: KubernetesDeploymentUnavailableReplicas: ...
[08:09:29] <jinxer-wm>	 Deployment k8s-controller-sidecars in sidecar-controller at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=sidecar-controller&var-deployment=k8s-controller-sidecars - ...
[08:09:29] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[08:18:49] <icinga-wm>	 PROBLEM - SSH on wdqs1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:21:22] <wikibugs>	 (PS3) Slyngshede: R:codfw1dev:cloudweb [puppet] - https://gerrit.wikimedia.org/r/1068786
[08:21:40] <wikibugs>	 (CR) CI reject: [V:-1] R:codfw1dev:cloudweb [puppet] - https://gerrit.wikimedia.org/r/1068786 (owner: Slyngshede)
[08:23:12] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2295 to wikikube-worker2052 - akosiaris@cumin1002"
[08:23:42] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2295 to wikikube-worker2052 - akosiaris@cumin1002"
[08:23:42] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:23:43] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2052
[08:24:09] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2052
[08:24:38] <wikibugs>	 (PS2) Klausman: manifests: move new GPU hosts in eqiad from insetup to worker role [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432)
[08:24:47] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2295 to wikikube-worker2052
[08:25:03] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105411 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by akosiaris@cumin1002 from mw2295 to...
[08:26:16] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2296 to wikikube-worker2053
[08:26:37] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[08:27:40] <wikibugs>	 (CR) Klausman: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3788/co" [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432) (owner: Klausman)
[08:28:23] <wikibugs>	 (PS2) Dreamy Jazz: Remove wgCheckUserPurgeOldClientHintsData [mediawiki-config] - https://gerrit.wikimedia.org/r/1064458 (https://phabricator.wikimedia.org/T359560)
[08:28:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:29:03] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1064458 (https://phabricator.wikimedia.org/T359560) (owner: Dreamy Jazz)
[08:29:31] <icinga-wm>	 PROBLEM - SSH on wdqs1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:29:54] <jinxer-wm>	 FIRING: [2x] CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[08:33:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:35:38] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2296 to wikikube-worker2053 - akosiaris@cumin1002"
[08:36:00] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2296 to wikikube-worker2053 - akosiaris@cumin1002"
[08:36:00] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:36:00] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2053
[08:36:16] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2053
[08:36:55] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2296 to wikikube-worker2053
[08:37:04] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105431 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by akosiaris@cumin1002 from mw2296 to wikikube-worker2053 c...
[08:37:12] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2297 to wikikube-worker2054
[08:37:29] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[08:42:04] <wikibugs>	 (CR) Elukey: [C:+1] data.yaml: Extend expiry date for account. [puppet] - https://gerrit.wikimedia.org/r/1069113 (owner: Slyngshede)
[08:42:28] <wikibugs>	 (CR) Elukey: [C:+1] cloudweb2002-dev: Add dummy secrets for IDP on cloudweb2002-dev. [labs/private] - https://gerrit.wikimedia.org/r/1069114 (owner: Slyngshede)
[08:43:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:43:49] <wikibugs>	 (PS1) Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[08:46:10] <wikibugs>	 (CR) Elukey: [C:+2] services: update Thumbor's Docker image [deployment-charts] - https://gerrit.wikimedia.org/r/1068819 (https://phabricator.wikimedia.org/T373618) (owner: Elukey)
[08:47:42] <logmsgbot>	 !log jnuche@deploy1003 Started deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided)
[08:48:02] <logmsgbot>	 !log jnuche@deploy1003 Finished deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided) (duration: 00m 20s)
[08:48:40] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:48:41] <wikibugs>	 (PS3) Arnaudb: mysql: replication lag monitoring threshold and severity change [alerts] - https://gerrit.wikimedia.org/r/1053689 (https://phabricator.wikimedia.org/T367278)
[08:49:06] <wikibugs>	 (CR) Arnaudb: mysql: replication lag monitoring threshold and severity change (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1053689 (https://phabricator.wikimedia.org/T367278) (owner: Arnaudb)
[08:49:28] <wikibugs>	 (PS2) Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[08:49:41] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[08:49:47] <wikibugs>	 (CR) Tiziano Fogli: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: Tiziano Fogli)
[08:50:05] <logmsgbot>	 !log jnuche@deploy1003 Started deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided)
[08:50:21] <logmsgbot>	 !log elukey@deploy1003 helmfile [staging] START helmfile.d/services/thumbor: sync
[08:50:24] <logmsgbot>	 !log elukey@deploy1003 helmfile [staging] DONE helmfile.d/services/thumbor: sync
[08:50:46] <logmsgbot>	 !log jnuche@deploy1003 Finished deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided) (duration: 00m 41s)
[08:51:10] <jinxer-wm>	 FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:52:43] <wikibugs>	 (CR) CI reject: [V:-1] ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: Tiziano Fogli)
[08:52:55] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2297 to wikikube-worker2054 - akosiaris@cumin1002"
[08:53:40] <jinxer-wm>	 RESOLVED: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:55:11] <jinxer-wm>	 FIRING: [2x] RdfStreamingUpdaterFlinkProcessingLatencyIsHigh: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[08:55:25] <icinga-wm>	 RECOVERY - SSH on wdqs1023 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:55:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2297 to wikikube-worker2054 - akosiaris@cumin1002"
[08:55:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:55:29] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2054
[08:55:39] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2054
[08:56:18] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2297 to wikikube-worker2054
[08:56:26] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105470 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by akosiaris@cumin1002 from mw2297 to wikikube-worker2054 c...
[08:58:03] <wikibugs>	 (CR) Klausman: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3789/co" [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432) (owner: Klausman)
[08:58:40] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:59:20] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2052.codfw.wmnet with OS bullseye
[08:59:30] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2052
[08:59:31] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105475 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host wikikube-worker2052.co...
[08:59:34] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[09:00:11] <jinxer-wm>	 RESOLVED: [2x] RdfStreamingUpdaterFlinkProcessingLatencyIsHigh: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[09:00:28] <icinga-wm>	 RECOVERY - SSH on wdqs2024 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:02:23] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2053.codfw.wmnet with OS bullseye
[09:02:34] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105492 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host wikikube-worker2053.co...
[09:02:42] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2052 - akosiaris@cumin1002"
[09:02:46] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2052 - akosiaris@cumin1002"
[09:02:47] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:02:47] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2052.codfw.wmnet 165.0.192.10.in-addr.arpa 5.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:02:50] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2052.codfw.wmnet 165.0.192.10.in-addr.arpa 5.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:02:50] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2052
[09:03:02] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2052
[09:03:03] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2052
[09:03:03] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2053
[09:03:13] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[09:03:38] <icinga-wm>	 PROBLEM - SSH on wdqs2024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:03:40] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:04:00] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2054.codfw.wmnet with OS bullseye
[09:04:09] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105509 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host wikikube-worker2054.co...
[09:05:20] <icinga-wm>	 PROBLEM - Host mw2295 is DOWN: PING CRITICAL - Packet loss = 100%
[09:05:48] <icinga-wm>	 RECOVERY - SSH on wdqs1022 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:06:08] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:06:08] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:06:23] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2053 - akosiaris@cumin1002"
[09:06:27] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2053 - akosiaris@cumin1002"
[09:06:27] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:06:27] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2053.codfw.wmnet 166.0.192.10.in-addr.arpa 6.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:06:30] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2053.codfw.wmnet 166.0.192.10.in-addr.arpa 6.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:06:31] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2053
[09:06:42] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2053
[09:06:42] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2053
[09:07:11] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2054
[09:07:17] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[09:07:30] <icinga-wm>	 RECOVERY - SSH on wdqs2024 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:08:40] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:10:23] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2054 - akosiaris@cumin1002"
[09:10:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2054 - akosiaris@cumin1002"
[09:10:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:10:28] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2054.codfw.wmnet 167.0.192.10.in-addr.arpa 7.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:10:31] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2054.codfw.wmnet 167.0.192.10.in-addr.arpa 7.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[09:10:32] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2054
[09:10:42] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2054
[09:10:42] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2054
[09:12:04] <wikibugs>	 (CR) JMeybohm: sre.k8s.renumber-node: vlan, IP change k8s workers (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1067989 (owner: Clément Goubert)
[09:12:44] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3790/console" [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432) (owner: Klausman)
[09:13:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:13:42] <icinga-wm>	 PROBLEM - SSH on wdqs2024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:18:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:19:30] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage
[09:21:21] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/thumbor: sync
[09:22:39] <icinga-wm>	 RECOVERY - SSH on wdqs2024 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:22:44] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage
[09:22:55] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2053.codfw.wmnet with reason: host reimage
[09:23:09] <icinga-wm>	 RECOVERY - BFD status on cr1-esams is OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:23:40] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:25:31] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2053.codfw.wmnet with reason: host reimage
[09:27:02] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2054.codfw.wmnet with reason: host reimage
[09:27:02] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/thumbor: sync
[09:31:46] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2054.codfw.wmnet with reason: host reimage
[09:33:45] <wikibugs>	 (CR) JMeybohm: [C:+2] Update cfss-issuer charts to v0.4.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1068028 (https://phabricator.wikimedia.org/T337928) (owner: JMeybohm)
[09:34:18] <wikibugs>	 (CR) Klausman: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3791/co" [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432) (owner: Klausman)
[09:34:48] <icinga-wm>	 RECOVERY - SSH on wdqs1024 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:35:02] <wikibugs>	 (CR) JMeybohm: [C:+1] jaeger: add securityContext configuration [deployment-charts] - https://gerrit.wikimedia.org/r/1068034 (https://phabricator.wikimedia.org/T369491) (owner: Elukey)
[09:37:24] <wikibugs>	 (Merged) jenkins-bot: Update cfss-issuer charts to v0.4.0 [deployment-charts] - https://gerrit.wikimedia.org/r/1068028 (https://phabricator.wikimedia.org/T337928) (owner: JMeybohm)
[09:38:03] <wikibugs>	 (CR) JMeybohm: [C:+2] "It does. We're using that for generic http probes (in service::catalog) already." [deployment-charts] - https://gerrit.wikimedia.org/r/1066718 (https://phabricator.wikimedia.org/T373192) (owner: JMeybohm)
[09:39:06] <logmsgbot>	 !log elukey@deploy1003 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[09:39:13] <wikibugs>	 (Merged) jenkins-bot: eventgate: Offer readinessProbe that does not test kafka [deployment-charts] - https://gerrit.wikimedia.org/r/1066718 (https://phabricator.wikimedia.org/T373192) (owner: JMeybohm)
[09:42:26] <wikibugs>	 (PS1) Elukey: services: update Proton's Docker image [deployment-charts] - https://gerrit.wikimedia.org/r/1069123 (https://phabricator.wikimedia.org/T373665)
[09:42:36] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[09:42:45] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2052.codfw.wmnet with OS bullseye
[09:42:53] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105549 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2052.codfw....
[09:43:13] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[09:43:34] <logmsgbot>	 !log elukey@deploy1003 helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
[09:44:34] <wikibugs>	 (CR) Elukey: [C:+2] services: update Proton's Docker image [deployment-charts] - https://gerrit.wikimedia.org/r/1069123 (https://phabricator.wikimedia.org/T373665) (owner: Elukey)
[09:44:36] <icinga-wm>	 RECOVERY - SSH on wdqs1021 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:45:36] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2053.codfw.wmnet with OS bullseye
[09:45:48] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105552 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2053.codfw....
[09:47:18] <wikibugs>	 (CR) Clément Goubert: sre.k8s.renumber-node: vlan, IP change k8s workers (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1067989 (owner: Clément Goubert)
[09:48:01] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks D7 & D8 from asw to lsw - https://phabricator.wikimedia.org/T373105#10105554 (Jelto)
[09:48:14] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks D5 & D6 from asw to lsw - https://phabricator.wikimedia.org/T373104#10105555 (Jelto)
[09:48:14] <wikibugs>	 (CR) Clément Goubert: [C:+2] sre.k8s.pool-depool-node: Check calico and fix phab [cookbooks] - https://gerrit.wikimedia.org/r/1068007 (owner: Clément Goubert)
[09:48:24] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10105557 (Jelto)
[09:48:32] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10105559 (Jelto)
[09:48:40] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:48:45] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10105560 (Jelto)
[09:49:44] <wikibugs>	 (PS3) Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[09:51:03] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2054.codfw.wmnet with OS bullseye
[09:51:13] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10105563 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2054.codfw....
[09:52:34] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to <ENTER RESOURCE NAME> for <ENTER YOUR USERNAME> - https://phabricator.wikimedia.org/T373666 (zoe) NEW
[09:52:54] <wikibugs>	 (CR) CI reject: [V:-1] ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: Tiziano Fogli)
[09:52:55] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for zoe - https://phabricator.wikimedia.org/T373666#10105577 (zoe)
[09:55:40] <logmsgbot>	 !log elukey@deploy1003 helmfile [staging] START helmfile.d/services/proton: sync
[09:56:34] <logmsgbot>	 !log elukey@deploy1003 helmfile [staging] DONE helmfile.d/services/proton: sync
[09:56:47] <wikibugs>	 (PS4) Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[09:58:15] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/proton: sync
[09:59:23] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/proton: sync
[10:01:09] <wikibugs>	 (Merged) jenkins-bot: sre.k8s.pool-depool-node: Check calico and fix phab [cookbooks] - https://gerrit.wikimedia.org/r/1068007 (owner: Clément Goubert)
[10:01:31] <wikibugs>	 (CR) Tiziano Fogli: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: Tiziano Fogli)
[10:04:22] <logmsgbot>	 !log elukey@deploy1003 helmfile [eqiad] START helmfile.d/services/proton: sync
[10:06:08] <logmsgbot>	 !log elukey@deploy1003 helmfile [eqiad] DONE helmfile.d/services/proton: sync
[10:10:30] <wikibugs>	 (PS5) Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[10:11:53] <wikibugs>	 (CR) Brouberol: [C:+1] "Looks good, and the service exists" [deployment-charts] - https://gerrit.wikimedia.org/r/1068768 (https://phabricator.wikimedia.org/T359423) (owner: JMeybohm)
[10:15:07] <wikibugs>	 (CR) Slyngshede: [C:+2] data.yaml: Extend expiry date for account. [puppet] - https://gerrit.wikimedia.org/r/1069113 (owner: Slyngshede)
[10:16:06] <wikibugs>	 (CR) Slyngshede: [V:+2 C:+2] cloudweb2002-dev: Add dummy secrets for IDP on cloudweb2002-dev. [labs/private] - https://gerrit.wikimedia.org/r/1069114 (owner: Slyngshede)
[10:18:11] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[10:19:57] <wikibugs>	 (03PS4) 10Slyngshede: R:codfw1dev:cloudweb [puppet] - 10https://gerrit.wikimedia.org/r/1068786
[10:25:00] <wikibugs>	 (03CR) 10Jaime Nuche: "Following up on this. In the end we changed the `jenkins-deploy` deployment repo so that in the future only the change in puppet is necess" [puppet] - 10https://gerrit.wikimedia.org/r/884887 (https://phabricator.wikimedia.org/T323909) (owner: 10Jaime Nuche)
[10:27:02] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2052.codfw.wmnet
[10:27:02] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2052.codfw.wmnet
[10:27:05] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2053.codfw.wmnet
[10:27:05] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2053.codfw.wmnet
[10:27:08] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2054.codfw.wmnet
[10:27:09] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2054.codfw.wmnet
[10:28:16] <wikibugs>	 10ops-codfw, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Relabel codfw kubernetes nodes mw2295,mw2296,mw2297 - https://phabricator.wikimedia.org/T373669 (10akosiaris) 03NEW
[10:31:29] <wikibugs>	 (03PS5) 10Slyngshede: R:codfw1dev:cloudweb [puppet] - 10https://gerrit.wikimedia.org/r/1068786
[10:32:16] <wikibugs>	 (03PS6) 10Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[10:32:59] <wikibugs>	 (03CR) 10Tiziano Fogli: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: 10Tiziano Fogli)
[10:34:30] <wikibugs>	 (03PS6) 10Slyngshede: R:codfw1dev:cloudweb [puppet] - 10https://gerrit.wikimedia.org/r/1068786
[10:35:24] <wikibugs>	 (03CR) 10CI reject: [V:04-1] ripeatlas: move measurements checks to prom/alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: 10Tiziano Fogli)
[10:38:26] <wikibugs>	 (03PS7) 10Tiziano Fogli: ripeatlas: move measurements checks to prom/alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506)
[10:39:24] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: EX4600 does not support class-of-service 'port scheduling' - https://phabricator.wikimedia.org/T373594#10105707 (10cmooney) 05Open→03Resolved Updated config is applied on asw2-ulsfo since yesterday and not showing signs of problems....
[10:39:53] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Move sretest2002 primary uplink to asw-d4-codfw - https://phabricator.wikimedia.org/T370475#10105731 (10cmooney) This all is working fine thank you @Jhancock.wm Unless there is an issue I'll leave this task open for tidy-up when we a...
[10:44:00] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2381.codfw.wmnet
[10:44:32] <Emperor>	 !log restart swift-proxy on ms-fe2009 and ms-fe2014 T360913
[10:44:34] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2381.codfw.wmnet
[10:44:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:36] <stashbot>	 T360913: Swift proxy server misbehaviour (no longer calling `accept`?) - https://phabricator.wikimedia.org/T360913
[10:45:43] <wikibugs>	 (03PS1) 10Hnowlan: k8s: rename mw2381 to wikikube-worker2055 [puppet] - 10https://gerrit.wikimedia.org/r/1069144 (https://phabricator.wikimedia.org/T372878)
[10:52:31] <wikibugs>	 (03PS9) 10Ladsgroup: mediawiki: Add schema file and test for tables catalog [puppet] - 10https://gerrit.wikimedia.org/r/1068817 (https://phabricator.wikimedia.org/T363581)
[10:52:35] <wikibugs>	 (03CR) 10Ladsgroup: [V:03+2 C:03+2] mediawiki: Add schema file and test for tables catalog [puppet] - 10https://gerrit.wikimedia.org/r/1068817 (https://phabricator.wikimedia.org/T363581) (owner: 10Ladsgroup)
[10:53:42] <wikibugs>	 (03PS1) 10Jgiannelos: changeprop: Update references to latest beta restbase node [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069145 (https://phabricator.wikimedia.org/T370460)
[10:56:10] <wikibugs>	 (03PS1) 10Jgiannelos: Update references to latest beta restbase node [puppet] - 10https://gerrit.wikimedia.org/r/1069148 (https://phabricator.wikimedia.org/T370460)
[11:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240830T0700)
[11:00:05] <jouncebot>	 eoghan, jelto, arnoldokoth, and mutante: GitLab version upgrades (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240830T1100). Please do the needful.
[11:00:46] <wikibugs>	 (03Abandoned) 10Ladsgroup: [DNM] Test the table schema [puppet] - 10https://gerrit.wikimedia.org/r/1068818 (owner: 10Ladsgroup)
[11:03:14] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[11:03:27] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[11:03:35] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68290 and previous config saved to /var/cache/conftool/dbconfig/20240830-110334-ladsgroup.json
[11:03:39] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[11:03:49] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[11:04:02] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[11:04:04] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[11:04:19] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[11:04:27] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68291 and previous config saved to /var/cache/conftool/dbconfig/20240830-110426-ladsgroup.json
[11:04:31] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[11:11:44] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] k8s: rename mw2381 to wikikube-worker2055 [puppet] - 10https://gerrit.wikimedia.org/r/1069144 (https://phabricator.wikimedia.org/T372878) (owner: 10Hnowlan)
[11:22:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68292 and previous config saved to /var/cache/conftool/dbconfig/20240830-112159-ladsgroup.json
[11:22:04] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[11:22:39] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] k8s: rename mw2381 to wikikube-worker2055 [puppet] - 10https://gerrit.wikimedia.org/r/1069144 (https://phabricator.wikimedia.org/T372878) (owner: 10Hnowlan)
[11:24:55] <wikibugs>	 (03PS5) 10Clément Goubert: interactive: Ring the bell by default in ask_input [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1069136
[11:24:55] <wikibugs>	 (03CR) 10Clément Goubert: "CI failure seems unrelated" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1069136 (owner: 10Clément Goubert)
[11:28:47] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.rename from mw2381 to wikikube-worker2055
[11:29:06] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.dns.netbox
[11:34:14] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2381 to wikikube-worker2055 - hnowlan@cumin2002"
[11:35:08] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2381 to wikikube-worker2055 - hnowlan@cumin2002"
[11:35:09] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:35:10] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2055
[11:35:45] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68293 and previous config saved to /var/cache/conftool/dbconfig/20240830-113544-ladsgroup.json
[11:35:52] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[11:37:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P68294 and previous config saved to /var/cache/conftool/dbconfig/20240830-113706-ladsgroup.json
[11:39:42] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2055
[11:40:23] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2381 to wikikube-worker2055
[11:40:29] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106002 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin2002 from mw2381 to wikikube-worker2055 com...
[11:41:59] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2055.codfw.wmnet with OS bullseye
[11:42:09] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2055
[11:42:36] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106009 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker2055.codf...
[11:42:52] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[11:46:21] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2055 - hnowlan@cumin1002"
[11:46:25] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2055 - hnowlan@cumin1002"
[11:46:25] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:46:25] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2055.codfw.wmnet 44.0.192.10.in-addr.arpa 4.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[11:46:28] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2055.codfw.wmnet 44.0.192.10.in-addr.arpa 4.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[11:46:29] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2055
[11:46:44] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2055
[11:46:44] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2055
[11:50:00] <wikibugs>	 (03PS1) 10Hnowlan: k8s: rename mw2382 to wikikube-worker2056 [puppet] - 10https://gerrit.wikimedia.org/r/1069151 (https://phabricator.wikimedia.org/T372878)
[11:50:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P68295 and previous config saved to /var/cache/conftool/dbconfig/20240830-115052-ladsgroup.json
[11:52:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P68296 and previous config saved to /var/cache/conftool/dbconfig/20240830-115213-ladsgroup.json
[11:52:48] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] k8s: rename mw2382 to wikikube-worker2056 [puppet] - 10https://gerrit.wikimedia.org/r/1069151 (https://phabricator.wikimedia.org/T372878) (owner: 10Hnowlan)
[11:55:27] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] k8s: rename mw2382 to wikikube-worker2056 [puppet] - 10https://gerrit.wikimedia.org/r/1069151 (https://phabricator.wikimedia.org/T372878) (owner: 10Hnowlan)
[11:56:50] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2382.codfw.wmnet
[11:57:28] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2382.codfw.wmnet
[11:59:16] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 435, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:00:48] <wikibugs>	 (03CR) 10Slyngshede: [V:03+2 C:03+2] Fix syntax error [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1067988 (owner: 10Slyngshede)
[12:00:51] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.rename from mw2382 to wikikube-worker2056
[12:01:08] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[12:02:55] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2055.codfw.wmnet with reason: host reimage
[12:04:39] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2382 to wikikube-worker2056 - hnowlan@cumin1002"
[12:06:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P68297 and previous config saved to /var/cache/conftool/dbconfig/20240830-120559-ladsgroup.json
[12:06:11] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2055.codfw.wmnet with reason: host reimage
[12:07:21] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68298 and previous config saved to /var/cache/conftool/dbconfig/20240830-120720-ladsgroup.json
[12:07:23] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[12:07:25] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[12:07:36] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[12:07:43] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68299 and previous config saved to /var/cache/conftool/dbconfig/20240830-120742-ladsgroup.json
[12:08:43] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2382 to wikikube-worker2056 - hnowlan@cumin1002"
[12:08:44] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:08:44] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2056
[12:09:19] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2056
[12:09:29] <jinxer-wm>	 FIRING: KubernetesDeploymentUnavailableReplicas: ...
[12:09:29] <jinxer-wm>	 Deployment k8s-controller-sidecars in sidecar-controller at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=sidecar-controller&var-deployment=k8s-controller-sidecars - ...
[12:09:29] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[12:09:57] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2382 to wikikube-worker2056
[12:10:07] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106087 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2382 to w...
[12:11:50] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2055.codfw.wmnet on all recursors
[12:11:53] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2055.codfw.wmnet on all recursors
[12:12:20] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2056.codfw.wmnet on all recursors
[12:12:23] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2056.codfw.wmnet on all recursors
[12:13:00] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2056.codfw.wmnet with OS bullseye
[12:13:09] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2056
[12:13:17] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106095 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi...
[12:13:17] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[12:13:32] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 517, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:15:39] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2377.codfw.wmnet
[12:16:17] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2377.codfw.wmnet
[12:16:21] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2378.codfw.wmnet
[12:17:02] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2056 - hnowlan@cumin1002"
[12:17:02] <wikibugs>	 (03PS4) 10Arnaudb: mysql: replication lag monitoring threshold and severity change [alerts] - 10https://gerrit.wikimedia.org/r/1053689 (https://phabricator.wikimedia.org/T367278)
[12:17:03] <wikibugs>	 (03CR) 10Arnaudb: "thanks for the feedback, hopefully this PS covers everything" [alerts] - 10https://gerrit.wikimedia.org/r/1053689 (https://phabricator.wikimedia.org/T367278) (owner: 10Arnaudb)
[12:17:06] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2056 - hnowlan@cumin1002"
[12:17:07] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:17:07] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2056.codfw.wmnet 45.0.192.10.in-addr.arpa 5.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:17:10] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2056.codfw.wmnet 45.0.192.10.in-addr.arpa 5.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:17:10] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2056
[12:17:26] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2056
[12:17:26] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2056
[12:19:20] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Rename mw237[789] to wikikube-worker205[789] [puppet] - 10https://gerrit.wikimedia.org/r/1069164 (https://phabricator.wikimedia.org/T372878)
[12:19:31] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2378.codfw.wmnet
[12:19:35] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2379.codfw.wmnet
[12:20:09] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2379.codfw.wmnet
[12:20:18] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:20:30] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:21:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68300 and previous config saved to /var/cache/conftool/dbconfig/20240830-122106-ladsgroup.json
[12:21:09] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[12:21:11] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[12:21:32] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[12:21:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68301 and previous config saved to /var/cache/conftool/dbconfig/20240830-122139-ladsgroup.json
[12:24:44] <hnowlan>	 !log homer 'lsw1-a3-codfw*' commit
[12:24:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:31] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2055.codfw.wmnet with OS bullseye
[12:25:45] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106126 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku...
[12:26:34] <icinga-wm>	 PROBLEM - Host mw2296 is DOWN: PING CRITICAL - Packet loss = 100%
[12:27:46] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2055.codfw.wmnet
[12:27:48] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2055.codfw.wmnet
[12:27:52] <icinga-wm>	 PROBLEM - BGP status on lsw1-a3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:28:59] <wikibugs>	 10ops-codfw, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106131 (10hnowlan)
[12:29:54] <jinxer-wm>	 FIRING: [2x] CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[12:31:30] <wikibugs>	 (03PS1) 10Slyngshede: P:idp Reallow CAS 6.6 to be installed. [puppet] - 10https://gerrit.wikimedia.org/r/1069165
[12:32:38] <wikibugs>	 (03PS7) 10Slyngshede: R:codfw1dev:cloudweb [puppet] - 10https://gerrit.wikimedia.org/r/1068786
[12:33:41] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3795/co" [puppet] - 10https://gerrit.wikimedia.org/r/1068786 (owner: 10Slyngshede)
[12:33:56] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2056.codfw.wmnet with reason: host reimage
[12:35:51] <wikibugs>	 (03CR) 10JMeybohm: sre.k8s.renumber-node: vlan, IP change k8s workers (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1067989 (owner: 10Clément Goubert)
[12:36:44] <wikibugs>	 (03PS8) 10Slyngshede: R:codfw1dev:cloudweb Add CAS IDP installation. [puppet] - 10https://gerrit.wikimedia.org/r/1068786
[12:37:39] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2056.codfw.wmnet with reason: host reimage
[12:39:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+1] P:idp Reallow CAS 6.6 to be installed. [puppet] - 10https://gerrit.wikimedia.org/r/1069165 (owner: 10Slyngshede)
[12:39:34] <wikibugs>	 (03PS9) 10Slyngshede: R:codfw1dev:cloudweb Add CAS IDP installation. [puppet] - 10https://gerrit.wikimedia.org/r/1068786
[12:40:31] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3796/co" [puppet] - 10https://gerrit.wikimedia.org/r/1069165 (owner: 10Slyngshede)
[12:41:29] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for zoe - https://phabricator.wikimedia.org/T373666#10106174 (10ssingh)
[12:41:46] <wikibugs>	 (03PS1) 10Stevemunene: Update airflow-test-k8s image to include authlib [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069166 (https://phabricator.wikimedia.org/T368760)
[12:42:07] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM as long as traffic are happy with it!" [puppet] - 10https://gerrit.wikimedia.org/r/1006063 (https://phabricator.wikimedia.org/T358260) (owner: 10Cathal Mooney)
[12:42:35] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] Rename mw237[789] to wikikube-worker205[789] [puppet] - 10https://gerrit.wikimedia.org/r/1069164 (https://phabricator.wikimedia.org/T372878) (owner: 10Alexandros Kosiaris)
[12:42:45] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for zoe - https://phabricator.wikimedia.org/T373666#10106173 (10ssingh) @thcipriani: This requires your approval as well, in addition to @VPuffetMichel. Thanks!
[12:46:17] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68302 and previous config saved to /var/cache/conftool/dbconfig/20240830-124617-ladsgroup.json
[12:46:22] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[12:47:55] <wikibugs>	 (03PS10) 10Andrew Bogott: R:codfw1dev:cloudweb Add CAS IDP installation. [puppet] - 10https://gerrit.wikimedia.org/r/1068786 (owner: 10Slyngshede)
[12:47:57] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1068786 (owner: 10Slyngshede)
[12:49:41] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[12:52:00] <wikibugs>	 (03PS2) 10Slyngshede: P:idp Reallow CAS 6.6 to be installed. [puppet] - 10https://gerrit.wikimedia.org/r/1069165
[12:52:50] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3797/console" [puppet] - 10https://gerrit.wikimedia.org/r/1069165 (owner: 10Slyngshede)
[12:53:58] <icinga-wm>	 RECOVERY - BGP status on lsw1-a3-codfw.mgmt is OK: BGP OK - up: 36, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:54:27] <wikibugs>	 (03PS11) 10Andrew Bogott: R:codfw1dev:cloudweb Add CAS IDP installation. [puppet] - 10https://gerrit.wikimedia.org/r/1068786 (owner: 10Slyngshede)
[12:56:56] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2056.codfw.wmnet with OS bullseye
[12:57:07] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106208 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku...
[12:58:01] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3798/console" [puppet] - 10https://gerrit.wikimedia.org/r/1068786 (owner: 10Slyngshede)
[12:59:15] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2377 to wikikube-worker2057
[12:59:32] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:00:35] <wikibugs>	 (PS1) Ssingh: admin: add zoe to deployment (move from ldap_only_users) [puppet] - https://gerrit.wikimedia.org/r/1069175 (https://phabricator.wikimedia.org/T373666)
[13:01:25] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P68303 and previous config saved to /var/cache/conftool/dbconfig/20240830-130124-ladsgroup.json
[13:02:48] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2377 to wikikube-worker2057 - akosiaris@cumin1002"
[13:04:05] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2377 to wikikube-worker2057 - akosiaris@cumin1002"
[13:04:05] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:04:06] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2057
[13:04:17] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2057
[13:04:56] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2377 to wikikube-worker2057
[13:05:08] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106215 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by akosiaris@cumin1002 from mw2377 to...
[13:08:24] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: puppetserver1002 thrashing and requiring a power cycle as a result - https://phabricator.wikimedia.org/T373527#10106218 (elukey) After checking the JVM's [[ https://grafana-rw.wikimedia.org/d/e0f6afe3-2aea-483d-9f5e-55f0cba9207f/puppetserver?orgId=1&...
[13:13:31] <wikibugs>	 (PS1) Elukey: profile::puppetserver: set java_start_mem to 40g [puppet] - https://gerrit.wikimedia.org/r/1069185 (https://phabricator.wikimedia.org/T373527)
[13:14:31] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3799/co" [puppet] - https://gerrit.wikimedia.org/r/1069185 (https://phabricator.wikimedia.org/T373527) (owner: Elukey)
[13:15:44] <wikibugs>	 (PS1) JMeybohm: Make k8s/pool-depool-node work on control-planes as well [cookbooks] - https://gerrit.wikimedia.org/r/1069186 (https://phabricator.wikimedia.org/T372878)
[13:16:32] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P68304 and previous config saved to /var/cache/conftool/dbconfig/20240830-131631-ladsgroup.json
[13:18:36] <wikibugs>	 (PS2) JMeybohm: Make k8s/pool-depool-node work on control-planes as well [cookbooks] - https://gerrit.wikimedia.org/r/1069186 (https://phabricator.wikimedia.org/T372878)
[13:21:28] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node check for host wikikube-ctrl2003.codfw.wmnet
[13:21:28] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-ctrl2003.codfw.wmnet
[13:21:34] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2378 to wikikube-worker2058
[13:21:51] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:26:15] <wikibugs>	 (PS3) JMeybohm: Make k8s/pool-depool-node work on control-planes as well [cookbooks] - https://gerrit.wikimedia.org/r/1069186 (https://phabricator.wikimedia.org/T372878)
[13:26:41] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node check for host wikikube-ctrl2003.codfw.wmnet
[13:26:41] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-ctrl2003.codfw.wmnet
[13:27:04] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2001.codfw.wmnet
[13:27:04] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2001.codfw.wmnet
[13:27:16] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2378 to wikikube-worker2058 - akosiaris@cumin1002"
[13:27:51] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68305 and previous config saved to /var/cache/conftool/dbconfig/20240830-132750-ladsgroup.json
[13:27:55] <wikibugs>	 (CR) Bking: [C:+1] manifests: move new GPU hosts in eqiad from insetup to worker role [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432) (owner: Klausman)
[13:27:55] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[13:31:33] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2378 to wikikube-worker2058 - akosiaris@cumin1002"
[13:31:33] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:31:34] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2058
[13:31:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68306 and previous config saved to /var/cache/conftool/dbconfig/20240830-133139-ladsgroup.json
[13:31:41] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[13:31:45] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[13:31:54] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[13:32:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68307 and previous config saved to /var/cache/conftool/dbconfig/20240830-133201-ladsgroup.json
[13:32:38] <wikibugs>	 (CR) Ssingh: "Sounds like it's worth a shot, let me know if you want to merge today 😄" [puppet] - https://gerrit.wikimedia.org/r/1069185 (https://phabricator.wikimedia.org/T373527) (owner: Elukey)
[13:32:51] <wikibugs>	 (CR) Ssingh: [C:+1] profile::puppetserver: set java_start_mem to 40g [puppet] - https://gerrit.wikimedia.org/r/1069185 (https://phabricator.wikimedia.org/T373527) (owner: Elukey)
[13:33:01] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2058
[13:33:40] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2378 to wikikube-worker2058
[13:34:08] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2379 to wikikube-worker2059
[13:34:25] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:35:42] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106249 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by akosiaris@cumin1002 from mw2378 to...
[13:35:55] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2003.codfw.wmnet
[13:35:57] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2003.codfw.wmnet
[13:37:26] <wikibugs>	 SRE, Anti-Harassment, DBA: Error Unknown column  ipb_sitewide in field list on query - https://phabricator.wikimedia.org/T208462#10106248 (Lafeber) I upgraded to 1.42 from a very early version and got the same error. I ran the manual SQL that @DonPaolo mentioned (thank you!) and I presume it was...
[13:38:20] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
[13:38:30] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106259 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki...
[13:38:40] <logmsgbot>	 !log jayme@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
[13:38:52] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106264 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube...
[13:40:23] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2003.codfw.wmnet
[13:40:25] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2003.codfw.wmnet
[13:41:54] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2001.codfw.wmnet
[13:41:56] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2001.codfw.wmnet
[13:42:12] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2379 to wikikube-worker2059 - akosiaris@cumin1002"
[13:42:55] <wikibugs>	 (CR) Klausman: [V:+1 C:+2] manifests: move new GPU hosts in eqiad from insetup to worker role [puppet] - https://gerrit.wikimedia.org/r/1068657 (https://phabricator.wikimedia.org/T372432) (owner: Klausman)
[13:42:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P68308 and previous config saved to /var/cache/conftool/dbconfig/20240830-134257-ladsgroup.json
[13:43:01] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2001.codfw.wmnet
[13:43:03] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2001.codfw.wmnet
[13:43:37] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
[13:43:46] <logmsgbot>	 !log jayme@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
[13:43:48] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106271 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki...
[13:43:58] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106272 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube...
[13:45:21] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2379 to wikikube-worker2059 - akosiaris@cumin1002"
[13:45:21] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:45:22] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2059
[13:45:33] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2001.codfw.wmnet
[13:45:33] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2001.codfw.wmnet
[13:45:38] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2003.codfw.wmnet
[13:45:38] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2003.codfw.wmnet
[13:45:40] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2059
[13:46:19] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2379 to wikikube-worker2059
[13:46:31] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106283 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by akosiaris@cumin1002 from mw2379 to...
[13:46:44] <wikibugs>	 (CR) Brouberol: [C:+1] Update airflow-test-k8s image to include authlib [deployment-charts] - https://gerrit.wikimedia.org/r/1069166 (https://phabricator.wikimedia.org/T368760) (owner: Stevemunene)
[13:48:30] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[13:49:55] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68309 and previous config saved to /var/cache/conftool/dbconfig/20240830-134954-ladsgroup.json
[13:49:59] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[13:51:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: dragonfly-dfdaemon.service on ml-serve1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:52:08] <wikibugs>	 (CR) Slyngshede: [V:+1 C:+2] P:idp Reallow CAS 6.6 to be installed. [puppet] - https://gerrit.wikimedia.org/r/1069165 (owner: Slyngshede)
[13:52:33] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2057.codfw.wmnet with OS bullseye
[13:52:47] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106318 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host...
[13:52:47] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2057
[13:53:04] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2058.codfw.wmnet with OS bullseye
[13:53:18] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106321 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host...
[13:53:30] <jinxer-wm>	 RESOLVED: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[13:53:33] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2059.codfw.wmnet with OS bullseye
[13:53:36] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:53:37] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2059.codfw.wmnet with OS bullseye
[13:53:40] <icinga-wm>	 PROBLEM - BGP status on lsw1-f5-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64606/IPv4: Active - kubernetes-ml-eqiad, AS64606/IPv6: Connect - kubernetes-ml-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:53:44] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:53:47] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106324 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host...
[13:53:48] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106325 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wiki...
[13:54:34] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 52482 bytes in 0.066 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:55:35] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2059.codfw.wmnet with OS bullseye
[13:55:54] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106330 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host...
[13:56:40] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: dragonfly-dfdaemon.service on ml-serve1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:56:47] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2057 - akosiaris@cumin1002"
[13:56:52] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2057 - akosiaris@cumin1002"
[13:56:52] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:56:52] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2057.codfw.wmnet 40.0.192.10.in-addr.arpa 0.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:56:55] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2057.codfw.wmnet 40.0.192.10.in-addr.arpa 0.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:56:56] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2057
[13:58:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P68310 and previous config saved to /var/cache/conftool/dbconfig/20240830-135804-ladsgroup.json
[13:58:10] <wikibugs>	 (CR) Tiziano Fogli: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3800/co" [puppet] - https://gerrit.wikimedia.org/r/1069117 (https://phabricator.wikimedia.org/T370506) (owner: Tiziano Fogli)
[13:58:24] <wikibugs>	 (CR) Elukey: WIP: sre.hosts.provison: add BIOS/Mgmt-console support for Supermicro (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1037806 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[13:58:37] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2057
[13:58:37] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2057
[13:58:45] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2058
[13:59:04] <icinga-wm>	 PROBLEM - BGP status on lsw1-a3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:59:24] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3801/console" [puppet] - https://gerrit.wikimedia.org/r/1068786 (owner: Slyngshede)
[13:59:55] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on dse-k8s-worker1009:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=dse-k8s-worker1009 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:02:22] <wikibugs>	 (PS12) Elukey: WIP: sre.hosts.provison: add BIOS/Mgmt-console support for Supermicro [cookbooks] - https://gerrit.wikimedia.org/r/1037806 (https://phabricator.wikimedia.org/T365372)
[14:03:02] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[14:04:17] <wikibugs>	 (PS15) Clément Goubert: sre.k8s.renumber-node: vlan, IP change k8s workers [cookbooks] - https://gerrit.wikimedia.org/r/1067989
[14:05:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P68311 and previous config saved to /var/cache/conftool/dbconfig/20240830-140501-ladsgroup.json
[14:05:51] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2056.codfw.wmnet
[14:05:52] <wikibugs>	 (CR) Clément Goubert: sre.k8s.renumber-node: vlan, IP change k8s workers (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1067989 (owner: Clément Goubert)
[14:05:53] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2056.codfw.wmnet
[14:06:16] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2058 - akosiaris@cumin1002"
[14:06:21] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2058 - akosiaris@cumin1002"
[14:06:21] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:06:21] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2058.codfw.wmnet 41.0.192.10.in-addr.arpa 1.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:06:24] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2058.codfw.wmnet 41.0.192.10.in-addr.arpa 1.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:06:25] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2058
[14:06:34] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2058
[14:06:34] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2058
[14:07:28] <wikibugs>	 (CR) Slyngshede: [C:+1] "Looks good." [puppet] - https://gerrit.wikimedia.org/r/1069175 (https://phabricator.wikimedia.org/T373666) (owner: Ssingh)
[14:11:48] <wikibugs>	 (PS1) Hnowlan: k8s: rename mw238[345] to wikikube-worker206[012] [puppet] - https://gerrit.wikimedia.org/r/1069214 (https://phabricator.wikimedia.org/T372878)
[14:11:50] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2059.codfw.wmnet with reason: host reimage
[14:13:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68312 and previous config saved to /var/cache/conftool/dbconfig/20240830-141311-ladsgroup.json
[14:13:13] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[14:13:16] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[14:13:26] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[14:13:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: dragonfly-dfdaemon.service on ml-serve1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:14:41] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2059.codfw.wmnet with reason: host reimage
[14:15:06] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2057.codfw.wmnet with reason: host reimage
[14:16:33] <wikibugs>	 (CR) CI reject: [V:-1] WIP: sre.hosts.provison: add BIOS/Mgmt-console support for Supermicro [cookbooks] - https://gerrit.wikimedia.org/r/1037806 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[14:17:16] <wikibugs>	 (CR) Clément Goubert: [C:+1] k8s: rename mw238[345] to wikikube-worker206[012] [puppet] - https://gerrit.wikimedia.org/r/1069214 (https://phabricator.wikimedia.org/T372878) (owner: Hnowlan)
[14:17:17] <wikibugs>	 (PS1) Andrew Bogott: Fake secrets for idp redirect on cloudcontrols [labs/private] - https://gerrit.wikimedia.org/r/1069217
[14:17:36] <wikibugs>	 (PS2) Andrew Bogott: Fake secrets for idp redirect on cloudcontrols [labs/private] - https://gerrit.wikimedia.org/r/1069217
[14:18:11] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[14:18:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2057.codfw.wmnet with reason: host reimage
[14:18:42] <wikibugs>	 (CR) Andrew Bogott: [V:+2 C:+2] Fake secrets for idp redirect on cloudcontrols [labs/private] - https://gerrit.wikimedia.org/r/1069217 (owner: Andrew Bogott)
[14:19:55] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on dse-k8s-worker1009:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:20:09] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P68313 and previous config saved to /var/cache/conftool/dbconfig/20240830-142008-ladsgroup.json
[14:22:17] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2058.codfw.wmnet with reason: host reimage
[14:23:30] <wikibugs>	 (PS2) Clément Goubert: k8s: rename mw238[345] to wikikube-worker206[012] [puppet] - https://gerrit.wikimedia.org/r/1069214 (https://phabricator.wikimedia.org/T372878) (owner: Hnowlan)
[14:23:30] <wikibugs>	 (PS1) Clément Goubert: kubernetes: Rename last appserver in codfw [puppet] - https://gerrit.wikimedia.org/r/1069223 (https://phabricator.wikimedia.org/T351074)
[14:24:24] <wikibugs>	 (PS2) Elukey: dhcp: allow empty distro for DHCPConfMac and DHCPConfOpt82 [software/spicerack] - https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372)
[14:24:24] <wikibugs>	 (PS2) Elukey: doc: add intersphinx_timeout [software/spicerack] - https://gerrit.wikimedia.org/r/1060855 (https://phabricator.wikimedia.org/T367410)
[14:24:24] <wikibugs>	 (PS1) Elukey: tox: add config for jenkins [software/spicerack] - https://gerrit.wikimedia.org/r/1069224 (https://phabricator.wikimedia.org/T372485)
[14:24:37] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2058.codfw.wmnet with reason: host reimage
[14:24:55] <jinxer-wm>	 RESOLVED: [2x] KubernetesRsyslogDown: rsyslog on dse-k8s-worker1009:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:25:55] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on ml-serve1009:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=ml-serve1009 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:26:24] <wikibugs>	 (CR) Hnowlan: [C:+1] kubernetes: Rename last appserver in codfw [puppet] - https://gerrit.wikimedia.org/r/1069223 (https://phabricator.wikimedia.org/T351074) (owner: Clément Goubert)
[14:26:30] <wikibugs>	 (PS2) Nik Gkountas: admin: add new ssh key for ngkountas [puppet] - https://gerrit.wikimedia.org/r/1065216 (https://phabricator.wikimedia.org/T371372)
[14:27:17] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 429, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:28:00] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2383.codfw.wmnet
[14:28:39] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2383.codfw.wmnet
[14:28:45] <wikibugs>	 (PS1) Klausman: BGP peers: add lsw1-e5-eqiad and lsw1-f5-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432)
[14:29:38] <wikibugs>	 (CR) Clément Goubert: [C:+1] Make k8s/pool-depool-node work on control-planes as well [cookbooks] - https://gerrit.wikimedia.org/r/1069186 (https://phabricator.wikimedia.org/T372878) (owner: JMeybohm)
[14:30:09] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2384.codfw.wmnet
[14:30:42] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2384.codfw.wmnet
[14:30:56] <wikibugs>	 (03CR) 10Nik Gkountas: admin: add new ssh key for ngkountas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1065216 (https://phabricator.wikimedia.org/T371372) (owner: 10Nik Gkountas)
[14:31:08] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2385.codfw.wmnet
[14:31:41] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2385.codfw.wmnet
[14:32:09] <wikibugs>	 (03PS2) 10Andrew Bogott: keystone + oidc [puppet] - 10https://gerrit.wikimedia.org/r/1068877
[14:33:10] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: ngkountas user has same SSH key for cloud/prod - https://phabricator.wikimedia.org/T371372#10106462 (10ngkountas) @ssingh sorry for the repeated mistake. I uploaded a different one that is not listed in `idm.wikimedia.org` keys. Thank you!
[14:33:58] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2059.codfw.wmnet with OS bullseye
[14:34:14] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106463 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wiki...
[14:35:16] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68314 and previous config saved to /var/cache/conftool/dbconfig/20240830-143516-ladsgroup.json
[14:35:18] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[14:35:21] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[14:35:24] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] admin: add new ssh key for ngkountas [puppet] - 10https://gerrit.wikimedia.org/r/1065216 (https://phabricator.wikimedia.org/T371372) (owner: 10Nik Gkountas)
[14:35:31] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[14:35:38] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68315 and previous config saved to /var/cache/conftool/dbconfig/20240830-143537-ladsgroup.json
[14:36:12] <icinga-wm>	 RECOVERY - BGP status on lsw1-a3-codfw.mgmt is OK: BGP OK - up: 40, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:36:22] <wikibugs>	 (03PS3) 10Andrew Bogott: keystone + oidc [puppet] - 10https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590)
[14:36:28] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:46] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590) (owner: 10Andrew Bogott)
[14:37:15] <wikibugs>	 (03CR) 10Elukey: dhcp: allow empty distro for DHCPConfMac and DHCPConfOpt82 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372) (owner: 10Elukey)
[14:37:56] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] kubernetes: Rename last appserver in codfw [puppet] - 10https://gerrit.wikimedia.org/r/1069223 (https://phabricator.wikimedia.org/T351074) (owner: 10Clément Goubert)
[14:37:59] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2057.codfw.wmnet with OS bullseye
[14:38:06] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 511, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:38:11] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106472 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wiki...
[14:38:24] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] k8s: rename mw238[345] to wikikube-worker206[012] [puppet] - 10https://gerrit.wikimedia.org/r/1069214 (https://phabricator.wikimedia.org/T372878) (owner: 10Hnowlan)
[14:40:12] <icinga-wm>	 PROBLEM - BGP status on lsw1-a3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:40:24] <wikibugs>	 (03CR) 10Elukey: "One small nit!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432) (owner: 10Klausman)
[14:40:45] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.rename from mw2299 to wikikube-worker2063
[14:40:53] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.rename from mw2383 to wikikube-worker2060
[14:40:55] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on ml-serve1009:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=ml-serve1009 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[14:41:01] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.netbox
[14:41:30] <wikibugs>	 (03CR) 10Elukey: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372) (owner: 10Elukey)
[14:41:45] <wikibugs>	 (03PS2) 10Klausman: BGP peers: add lsw1-e5-eqiad and lsw1-f5-eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432)
[14:41:50] <wikibugs>	 (03CR) 10Klausman: BGP peers: add lsw1-e5-eqiad and lsw1-f5-eqiad (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432) (owner: 10Klausman)
[14:42:12] <icinga-wm>	 RECOVERY - BGP status on lsw1-a3-codfw.mgmt is OK: BGP OK - up: 40, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:42:52] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: ngkountas user has same SSH key for cloud/prod - https://phabricator.wikimedia.org/T371372#10106475 (10ssingh) 05Open→03Resolved a:03ssingh Thanks @ngkountas; key updated.
[14:43:40] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: dragonfly-dfdaemon.service on ml-serve1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:43:45] <wikibugs>	 (03CR) 10Bking: [C:03+2] Update airflow-test-k8s image to include authlib [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069166 (https://phabricator.wikimedia.org/T368760) (owner: 10Stevemunene)
[14:44:35] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.rename from mw2384 to wikikube-worker2061
[14:44:37] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.rename from mw2385 to wikikube-worker2062
[14:44:42] <wikibugs>	 (03Merged) 10jenkins-bot: Update airflow-test-k8s image to include authlib [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069166 (https://phabricator.wikimedia.org/T368760) (owner: 10Stevemunene)
[14:44:44] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2058.codfw.wmnet with OS bullseye
[14:45:02] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106482 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wiki...
[14:45:03] <wikibugs>	 (03PS1) 10Hashar: tox: only install flake8 when running flake8 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069226 (https://phabricator.wikimedia.org/T372485)
[14:46:09] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[14:46:27] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2299 to wikikube-worker2063 - cgoubert@cumin1002"
[14:47:34] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2299 to wikikube-worker2063 - cgoubert@cumin1002"
[14:47:35] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:47:43] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2063
[14:47:56] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2063
[14:48:04] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2299 to wikikube-worker2063
[14:48:16] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106484 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2299 to...
[14:48:58] <wikibugs>	 (03CR) 10Hashar: "That one largely speeds up tox flake8 environments :) The same should be done for `style` and `format`." [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069226 (https://phabricator.wikimedia.org/T372485) (owner: 10Hashar)
[14:49:11] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2063.codfw.wmnet with OS bullseye
[14:49:21] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2063
[14:49:27] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106485 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host w...
[14:49:33] <wikibugs>	 (03PS1) 10Hashar: tox: run less environments on CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069220 (https://phabricator.wikimedia.org/T372485)
[14:49:35] <wikibugs>	 (03CR) 10Scott French: "Thanks for the review, all!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1068869 (https://phabricator.wikimedia.org/T362978) (owner: 10Scott French)
[14:49:37] <wikibugs>	 (03CR) 10Scott French: [C:03+2] k8s-controller-sidecars: adopt securityContext [deployment-charts] - 10https://gerrit.wikimedia.org/r/1068869 (https://phabricator.wikimedia.org/T362978) (owner: 10Scott French)
[14:50:19] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2385 to wikikube-worker2062 - hnowlan@cumin1002"
[14:50:24] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2385 to wikikube-worker2062 - hnowlan@cumin1002"
[14:50:24] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:50:25] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2062
[14:50:32] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[14:50:47] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2062
[14:51:26] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2385 to wikikube-worker2062
[14:51:42] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106489 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2385 to w...
[14:53:08] <wikibugs>	 (03Merged) 10jenkins-bot: k8s-controller-sidecars: adopt securityContext [deployment-charts] - 10https://gerrit.wikimedia.org/r/1068869 (https://phabricator.wikimedia.org/T362978) (owner: 10Scott French)
[14:53:43] <wikibugs>	 (03CR) 10Elukey: [C:03+1] BGP peers: add lsw1-e5-eqiad and lsw1-f5-eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432) (owner: 10Klausman)
[14:53:58] <wikibugs>	 (03CR) 10Klausman: [C:03+2] BGP peers: add lsw1-e5-eqiad and lsw1-f5-eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432) (owner: 10Klausman)
[14:54:43] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68316 and previous config saved to /var/cache/conftool/dbconfig/20240830-145442-ladsgroup.json
[14:54:48] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[14:55:23] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to deployment for zoe - https://phabricator.wikimedia.org/T373666#10106494 (10thcipriani) >>! In T373666#10106172, @ssingh wrote: > @thcipriani: This requires your approval as well, in addition to @VPuffetMichel. Thanks!  Approved. More Cito...
[14:55:43] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069224 (https://phabricator.wikimedia.org/T372485) (owner: 10Elukey)
[14:56:06] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[14:56:17] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to deployment for zoe - https://phabricator.wikimedia.org/T373666#10106495 (10ssingh)
[14:56:24] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.netbox
[14:57:31] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2384 to wikikube-worker2061 - hnowlan@cumin1002"
[14:57:34] <wikibugs>	 (03Merged) 10jenkins-bot: BGP peers: add lsw1-e5-eqiad and lsw1-f5-eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/1069225 (https://phabricator.wikimedia.org/T372432) (owner: 10Klausman)
[14:57:35] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[14:57:36] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2384 to wikikube-worker2061 - hnowlan@cumin1002"
[14:57:36] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:57:37] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2061
[14:57:54] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[14:58:12] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2061
[14:58:15] <wikibugs>	 (03PS2) 10Elukey: tox: run less environments on CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069220 (https://phabricator.wikimedia.org/T372485) (owner: 10Hashar)
[14:58:25] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[14:58:38] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[14:58:42] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:58:42] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2063.codfw.wmnet 169.0.192.10.in-addr.arpa 9.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:58:45] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2063.codfw.wmnet 169.0.192.10.in-addr.arpa 9.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:58:46] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2063
[14:58:50] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2384 to wikikube-worker2061
[14:58:51] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[14:59:04] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106500 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2384 to w...
[14:59:14] <wikibugs>	 (03Abandoned) 10Elukey: tox: add config for jenkins [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069224 (https://phabricator.wikimedia.org/T372485) (owner: 10Elukey)
[15:00:26] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[15:00:26] <wikibugs>	 (03PS1) 10Clément Goubert: kubernetes: Rename last appserver in codfw [puppet] - 10https://gerrit.wikimedia.org/r/1069227 (https://phabricator.wikimedia.org/T351074)
[15:00:30] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2063
[15:00:30] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2063
[15:01:21] <wikibugs>	 (03PS2) 10Clément Goubert: kubernetes: Rename last appserver in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1069227 (https://phabricator.wikimedia.org/T351074)
[15:01:28] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:02:44] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:02:45] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2060
[15:04:38] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2060
[15:05:18] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2383 to wikikube-worker2060
[15:05:28] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] kubernetes: Rename last appserver in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1069227 (https://phabricator.wikimedia.org/T351074) (owner: 10Clément Goubert)
[15:05:29] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106513 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2383 to w...
[15:05:38] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] kubernetes: Rename last appserver in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1069227 (https://phabricator.wikimedia.org/T351074) (owner: 10Clément Goubert)
[15:06:05] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[15:06:52] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[15:07:13] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2062.codfw.wmnet with OS bullseye
[15:07:19] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2061.codfw.wmnet with OS bullseye
[15:07:19] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2060.codfw.wmnet with OS bullseye
[15:07:21] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[15:07:23] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2062
[15:07:24] <logmsgbot>	 !log hnowlan@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2060.codfw.wmnet with OS bullseye
[15:07:30] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106518 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi...
[15:07:33] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106520 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi...
[15:07:33] <logmsgbot>	 !log klausman@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
[15:07:34] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106519 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi...
[15:07:44] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106521 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku...
[15:07:51] <logmsgbot>	 !log klausman@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
[15:07:51] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2060.codfw.wmnet with OS bullseye
[15:07:59] <wikibugs>	 (03PS1) 10Thcipriani: Admin data matrix: show ldap_only_users, too [puppet] - 10https://gerrit.wikimedia.org/r/1069229
[15:08:01] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106522 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi...
[15:08:02] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[15:08:31] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[15:08:47] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.rename from mw1398 to wikikube-worker1033
[15:08:51] <icinga-wm>	 RECOVERY - BGP status on lsw1-f5-eqiad.mgmt is OK: BGP OK - up: 4, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:08:59] <jinxer-wm>	 RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[15:08:59] <jinxer-wm>	 Deployment k8s-controller-sidecars in sidecar-controller at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=codfw&var-cluster=k8s&var-namespace=sidecar-controller&var-deployment=k8s-controller-sidecars - ...
[15:08:59] <jinxer-wm>	 https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[15:09:50] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P68317 and previous config saved to /var/cache/conftool/dbconfig/20240830-150950-ladsgroup.json
[15:10:57] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[15:11:11] <wikibugs>	 (03CR) 10Elukey: [C:03+2] tox: run less environments on CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/1069220 (https://phabricator.wikimedia.org/T372485) (owner: 10Hashar)
[15:11:17] <icinga-wm>	 PROBLEM - BGP status on lsw1-a3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:11:21] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[15:11:28] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68318 and previous config saved to /var/cache/conftool/dbconfig/20240830-151128-ladsgroup.json
[15:11:33] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[15:11:38] <logmsgbot>	 !log klausman@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[15:11:43] <logmsgbot>	 !log klausman@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[15:12:17] <logmsgbot>	 !log klausman@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[15:12:37] <logmsgbot>	 !log klausman@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[15:13:07] <wikibugs>	 (03CR) 10Thcipriani: "Adding sukhe since the context of this one is this deployment access request: https://phabricator.wikimedia.org/T373666" [puppet] - 10https://gerrit.wikimedia.org/r/1069229 (owner: 10Thcipriani)
[15:13:38] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2062 - hnowlan@cumin1002"
[15:13:49] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.netbox
[15:15:06] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2062 - hnowlan@cumin1002"
[15:15:06] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:15:06] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2062.codfw.wmnet 48.0.192.10.in-addr.arpa 8.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:15:09] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2062.codfw.wmnet 48.0.192.10.in-addr.arpa 8.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:15:13] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2062
[15:15:41] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2062
[15:15:41] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2062
[15:16:09] <wikibugs>	 10ops-drmrs: determine cable ID for CRT-008647 - https://phabricator.wikimedia.org/T369951#10106552 (10RobH) 05Open→03Resolved There was no label so they slapped 'CRT-008647' on there since I had advised that was the ID of the circuit.  (I had that note from it being on there potentially during install)....
[15:16:15] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2061
[15:17:03] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1398 to wikikube-worker1033 - cgoubert@cumin1002"
[15:17:05] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2063.codfw.wmnet with reason: host reimage
[15:17:07] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1398 to wikikube-worker1033 - cgoubert@cumin1002"
[15:17:08] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:17:08] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.netbox
[15:17:08] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1033
[15:17:28] <wikibugs>	 (03PS1) 10Herron: grafana: set thanos as default datasource [puppet] - 10https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T371520)
[15:18:09] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:18:24] <wikibugs>	 (03PS2) 10Herron: grafana: set thanos as default datasource [puppet] - 10https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333)
[15:18:25] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:18:42] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1033
[15:18:51] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1398 to wikikube-worker1033
[15:19:02] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06serviceops, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106563 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw1398 to...
[15:19:21] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1033.eqiad.wmnet on all recursors
[15:19:24] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1033.eqiad.wmnet on all recursors
[15:19:40] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1033.eqiad.wmnet with OS bullseye
[15:19:54] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106565 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host w...
[15:20:04] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2063.codfw.wmnet with reason: host reimage
[15:20:21] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2061 - hnowlan@cumin1002"
[15:20:26] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2061 - hnowlan@cumin1002"
[15:20:26] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:20:26] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2061.codfw.wmnet 47.0.192.10.in-addr.arpa 7.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:20:29] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2061.codfw.wmnet 47.0.192.10.in-addr.arpa 7.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:20:29] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2061
[15:21:05] <wikibugs>	 (CR) CI reject: [V:-1] grafana: set thanos as default datasource [puppet] - https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333) (owner: Herron)
[15:21:10] <icinga-wm>	 PROBLEM - Host mw2385 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:06] <wikibugs>	 ops-eqiad, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T373696 (Clement_Goubert) NEW
[15:22:15] <wikibugs>	 (Merged) jenkins-bot: tox: run less environments on CI [software/spicerack] - https://gerrit.wikimedia.org/r/1069220 (https://phabricator.wikimedia.org/T372485) (owner: Hashar)
[15:22:23] <wikibugs>	 (PS3) Herron: grafana: set thanos as default datasource [puppet] - https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333)
[15:22:23] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2061
[15:22:23] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2061
[15:23:28] <wikibugs>	 (PS1) Scott French: kubernetes: re-name/IP kubernetes20(30|57) as wikikube-worker206[45] [puppet] - https://gerrit.wikimedia.org/r/1069231 (https://phabricator.wikimedia.org/T372878)
[15:23:42] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2060.codfw.wmnet with reason: host reimage
[15:23:50] <wikibugs>	 (CR) Elukey: [C:+2] dhcp: allow empty distro for DHCPConfMac and DHCPConfOpt82 [software/spicerack] - https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[15:23:51] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106587 (Clement_Goubert)
[15:24:20] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106591 (Clement_Goubert)
[15:24:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P68319 and previous config saved to /var/cache/conftool/dbconfig/20240830-152457-ladsgroup.json
[15:25:36] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes mw2295,mw2296,mw2297 - https://phabricator.wikimedia.org/T373669#10106589 (Clement_Goubert) →Duplicate dup:T373591
[15:26:25] <wikibugs>	 SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#10106599 (elukey) Unblocked!  Next steps:  1) Release a new version of Spicerack to include https://gerrit.wikimed...
[15:26:58] <wikibugs>	 (CR) Hnowlan: [C:+1] kubernetes: re-name/IP kubernetes20(30|57) as wikikube-worker206[45] [puppet] - https://gerrit.wikimedia.org/r/1069231 (https://phabricator.wikimedia.org/T372878) (owner: Scott French)
[15:27:14] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2060.codfw.wmnet with reason: host reimage
[15:28:18] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2030.codfw.wmnet
[15:28:21] <icinga-wm>	 PROBLEM - Host mw2384 is DOWN: PING CRITICAL - Packet loss = 100%
[15:28:51] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2030.codfw.wmnet
[15:29:07] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2057.codfw.wmnet
[15:29:43] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2057.codfw.wmnet
[15:30:08] <wikibugs>	 (CR) Scott French: [C:+2] kubernetes: re-name/IP kubernetes20(30|57) as wikikube-worker206[45] [puppet] - https://gerrit.wikimedia.org/r/1069231 (https://phabricator.wikimedia.org/T372878) (owner: Scott French)
[15:31:51] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2062.codfw.wmnet with reason: host reimage
[15:32:54] <wikibugs>	 (PS3) Elukey: dhcp: allow empty distro for DHCPConfMac and DHCPConfOpt82 [software/spicerack] - https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372)
[15:33:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_producer_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=producer - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnstable
[15:33:01] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] dhcp: allow empty distro for DHCPConfMac and DHCPConfOpt82 [software/spicerack] - https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[15:33:04] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.rename from kubernetes2030 to wikikube-worker2064
[15:33:12] <wikibugs>	 (CR) Ssingh: [C:+1] "Seems OK in theory but perhaps wait for someone from Infrastructure Foundations to review as well!" [puppet] - https://gerrit.wikimedia.org/r/1069229 (owner: Thcipriani)
[15:33:21] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1033.eqiad.wmnet with reason: host reimage
[15:33:35] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:34:15] <wikibugs>	 (PS3) Clément Goubert: sre.k8s.renumber-node: Handle renamed host [cookbooks] - https://gerrit.wikimedia.org/r/1068779
[15:35:30] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[15:35:36] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2062.codfw.wmnet with reason: host reimage
[15:37:03] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2030 to wikikube-worker2064 - swfrench@cumin2002"
[15:37:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68320 and previous config saved to /var/cache/conftool/dbconfig/20240830-153715-ladsgroup.json
[15:37:20] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[15:37:36] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2030 to wikikube-worker2064 - swfrench@cumin2002"
[15:37:36] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:37:37] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2064
[15:37:53] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2064
[15:38:00] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_producer_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=producer - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnstable
[15:38:13] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1033.eqiad.wmnet with reason: host reimage
[15:38:33] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2030 to wikikube-worker2064
[15:38:40] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2061.codfw.wmnet with reason: host reimage
[15:38:43] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106620 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by swfrench@cumin2002 from kubernetes...
[15:39:18] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.rename from kubernetes2057 to wikikube-worker2065
[15:39:26] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:40:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68322 and previous config saved to /var/cache/conftool/dbconfig/20240830-154004-ladsgroup.json
[15:40:07] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[15:40:09] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[15:40:30] <jinxer-wm>	 FIRING: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[15:40:30] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[15:40:32] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:40:48] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:40:50] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2063.codfw.wmnet with OS bullseye
[15:40:55] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68323 and previous config saved to /var/cache/conftool/dbconfig/20240830-154054-ladsgroup.json
[15:41:00] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106625 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikik...
[15:41:35] <claime>	 !log homer 'lsw1-a3-codfw*' commit 'T351074'
[15:41:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:39] <stashbot>	 T351074: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074
[15:41:43] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2061.codfw.wmnet with reason: host reimage
[15:42:54] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2057 to wikikube-worker2065 - swfrench@cumin2002"
[15:43:22] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2057 to wikikube-worker2065 - swfrench@cumin2002"
[15:43:23] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:43:24] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2065
[15:43:39] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2065
[15:44:15] <wikibugs>	 (Merged) jenkins-bot: dhcp: allow empty distro for DHCPConfMac and DHCPConfOpt82 [software/spicerack] - https://gerrit.wikimedia.org/r/1060854 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[15:44:19] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2057 to wikikube-worker2065
[15:44:33] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2064.codfw.wmnet wikikube-worker2065.codfw.wmnet on all recursors
[15:44:35] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106639 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by swfrench@cumin2002 from kubernetes...
[15:44:36] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2064.codfw.wmnet wikikube-worker2065.codfw.wmnet on all recursors
[15:45:30] <jinxer-wm>	 FIRING: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[15:45:36] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2064.codfw.wmnet with OS bullseye
[15:45:47] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2064
[15:45:49] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106649 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by swfrench@cumin2002 for host w...
[15:46:11] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:47:11] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2060.codfw.wmnet with OS bullseye
[15:47:28] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106650 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku...
[15:49:00] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2063.codfw.wmnet
[15:49:08] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2063.codfw.wmnet
[15:49:33] <claime>	 !log homer 'cr*eqiad*' commit 'T351074'
[15:49:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:38] <stashbot>	 T351074: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074
[15:49:52] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2064 - swfrench@cumin2002"
[15:49:58] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2064 - swfrench@cumin2002"
[15:49:58] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:49:58] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2064.codfw.wmnet 211.16.192.10.in-addr.arpa 1.1.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:50:01] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2064.codfw.wmnet 211.16.192.10.in-addr.arpa 1.1.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:50:02] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2064
[15:50:30] <jinxer-wm>	 RESOLVED: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[15:50:43] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2064
[15:50:44] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2064
[15:52:07] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2065.codfw.wmnet with OS bullseye
[15:52:18] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2065
[15:52:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P68325 and previous config saved to /var/cache/conftool/dbconfig/20240830-155222-ladsgroup.json
[15:52:23] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106657 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by swfrench@cumin2002 for host w...
[15:52:57] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:53:47] <hnowlan>	 !log homer 'lsw1-a3-codfw*' commit 
[15:53:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:03] <claime>	 hnowlan: probably won't show anything, I just ran it and it had the changes for 61, 62 and 63
[15:55:08] <hnowlan>	 ah cool
[15:55:12] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2062.codfw.wmnet with OS bullseye
[15:55:14] <hnowlan>	 yeah you're right
[15:55:21] <wikibugs>	 (CR) Eevans: [C:+2] Update references to latest beta restbase node [puppet] - https://gerrit.wikimedia.org/r/1069148 (https://phabricator.wikimedia.org/T370460) (owner: Jgiannelos)
[15:55:26] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106659 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku...
[15:55:28] <wikibugs>	 (CR) Andrea Denisse: "LGTM, I just think there's a small typo in the commit message." [puppet] - https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333) (owner: Herron)
[15:56:32] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2065 - swfrench@cumin2002"
[15:56:38] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2065 - swfrench@cumin2002"
[15:56:38] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:56:38] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2065.codfw.wmnet 235.16.192.10.in-addr.arpa 5.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:56:41] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2065.codfw.wmnet 235.16.192.10.in-addr.arpa 5.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:56:42] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2065
[15:56:58] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2065
[15:56:58] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2065
[15:57:28] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1033.eqiad.wmnet with OS bullseye
[15:57:40] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106664 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikik...
[15:58:42] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68326 and previous config saved to /var/cache/conftool/dbconfig/20240830-155842-ladsgroup.json
[15:58:47] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[16:01:17] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2061.codfw.wmnet with OS bullseye
[16:01:33] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106675 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku...
[16:02:39] <icinga-wm>	 RECOVERY - BGP status on lsw1-a3-codfw.mgmt is OK: BGP OK - up: 46, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:02:47] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2060.codfw.wmnet
[16:02:50] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2060.codfw.wmnet
[16:07:18] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2061.codfw.wmnet
[16:07:20] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2061.codfw.wmnet
[16:07:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P68328 and previous config saved to /var/cache/conftool/dbconfig/20240830-160729-ladsgroup.json
[16:07:30] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2062.codfw.wmnet
[16:07:31] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2062.codfw.wmnet
[16:08:28] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106681 (hnowlan)
[16:09:48] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2064.codfw.wmnet with reason: host reimage
[16:11:30] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 24.96% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:12:30] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2064.codfw.wmnet with reason: host reimage
[16:13:49] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P68329 and previous config saved to /var/cache/conftool/dbconfig/20240830-161349-ladsgroup.json
[16:13:59] <wikibugs>	 (CR) Scott French: [C:+1] cfssl-issuer: Add external-services support (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1068768 (https://phabricator.wikimedia.org/T359423) (owner: JMeybohm)
[16:15:48] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2065.codfw.wmnet with reason: host reimage
[16:16:30] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 23.59% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:19:32] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2065.codfw.wmnet with reason: host reimage
[16:21:56] <claime>	 !log flipping BGP flag to true in netbox for ml-serve-ctrl100[1-2],ml-serve100[1-4],dse-k8s-ctrl100[1-2],dse-k8s-worker100[1-4]
[16:21:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68330 and previous config saved to /var/cache/conftool/dbconfig/20240830-162236-ladsgroup.json
[16:22:39] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[16:22:41] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[16:22:52] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[16:22:55] <klausman>	 claime: wait, the flag was _false_ for ml-serve machines < 1009?
[16:22:59] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68331 and previous config saved to /var/cache/conftool/dbconfig/20240830-162258-ladsgroup.json
[16:23:07] <claime>	 klausman: for all these above yes
[16:23:13] <klausman>	 weeeird.
[16:23:20] <klausman>	 definitely should not be the case.
[16:23:21] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2057.codfw.wmnet
[16:23:23] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2057.codfw.wmnet
[16:23:27] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2058.codfw.wmnet
[16:23:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2058.codfw.wmnet
[16:23:30] <klausman>	 I'll have a look at NB logs
[16:23:32] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2059.codfw.wmnet
[16:23:34] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2059.codfw.wmnet
[16:23:36] <claime>	 klausman: what's weirder is that I don't see changelogs for it
[16:23:59] <wikibugs>	 ops-codfw, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Relabel codfw kubernetes nodes mw237[789] - https://phabricator.wikimedia.org/T373699 (akosiaris) NEW
[16:24:57] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106748 (Clement_Goubert)
[16:25:26] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106755 (Clement_Goubert)
[16:26:26] <wikibugs>	 ops-codfw, DC-Ops, Prod-Kubernetes, serviceops, Kubernetes: Relabel codfw kubernetes nodes mw237[789] - https://phabricator.wikimedia.org/T373699#10106753 (Clement_Goubert) →Duplicate dup:T373591
[16:26:41] <claime>	 !log homer 'cr*eqiad*' commit 'T351074, T372878, and fix ml-serve and dse-k8s bgp'
[16:26:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:46] <stashbot>	 T351074: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074
[16:26:47] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[16:28:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P68332 and previous config saved to /var/cache/conftool/dbconfig/20240830-162856-ladsgroup.json
[16:30:24] <jinxer-wm>	 FIRING: [2x] CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[16:32:48] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2064.codfw.wmnet with OS bullseye
[16:33:05] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106766 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by swfrench@cumin2002 for host wikik...
[16:36:17] <wikibugs>	 (PS4) Herron: grafana: set thanos as default datasource [puppet] - https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333)
[16:36:33] <wikibugs>	 (CR) Herron: grafana: set thanos as default datasource (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333) (owner: Herron)
[16:38:07] <wikibugs>	 (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1069230 (https://phabricator.wikimedia.org/T269333) (owner: Herron)
[16:39:33] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2065.codfw.wmnet with OS bullseye
[16:39:49] <wikibugs>	 SRE, Infrastructure-Foundations, netops, serviceops, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10106783 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by swfrench@cumin2002 for host wikik...
[16:39:57] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1033.eqiad.wmnet
[16:39:59] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1033.eqiad.wmnet
[16:40:31] <swfrench-wmf>	 !log running homer 'lsw1-b3-codfw*' commit 'T372878'
[16:40:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:36] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[16:42:24] <wikibugs>	 (CR) Andrew Bogott: Make cloudcephosd1039-1041 into ceph osd nodes (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1063892 (https://phabricator.wikimedia.org/T372814) (owner: Andrew Bogott)
[16:42:35] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2064.codfw.wmnet
[16:42:38] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2064.codfw.wmnet
[16:42:50] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2065.codfw.wmnet
[16:42:52] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2065.codfw.wmnet
[16:44:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68333 and previous config saved to /var/cache/conftool/dbconfig/20240830-164403-ladsgroup.json
[16:44:05] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[16:44:08] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[16:44:19] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[16:44:26] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68334 and previous config saved to /var/cache/conftool/dbconfig/20240830-164425-ladsgroup.json
[16:47:17] <wikibugs>	 ops-codfw, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T373591#10106812 (Scott_French)
[16:49:41] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[16:53:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68335 and previous config saved to /var/cache/conftool/dbconfig/20240830-165322-ladsgroup.json
[16:53:27] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[16:59:47] <swfrench-wmf>	 !log running homer 'cr*codfw*' commit 'T372878'
[16:59:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:52] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[17:00:05] <icinga-wm>	 PROBLEM - Host mw2377 is DOWN: PING CRITICAL - Packet loss = 100%
[17:00:13] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 421, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:02:39] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68336 and previous config saved to /var/cache/conftool/dbconfig/20240830-170238-ladsgroup.json
[17:02:43] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[17:06:25] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 503, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:07:50] <wikibugs>	 (CR) Dzahn: "ACK, and thank you :)" [puppet] - https://gerrit.wikimedia.org/r/884887 (https://phabricator.wikimedia.org/T323909) (owner: Jaime Nuche)
[17:08:12] <wikibugs>	 (PS1) Andrew Bogott: Move idc/oidc keystone secrets to a place where we can find them [labs/private] - https://gerrit.wikimedia.org/r/1069250
[17:08:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P68337 and previous config saved to /var/cache/conftool/dbconfig/20240830-170829-ladsgroup.json
[17:10:28] <wikibugs>	 (PS4) Andrew Bogott: keystone + oidc [puppet] - https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590)
[17:10:47] <wikibugs>	 (CR) Andrew Bogott: [V:+2 C:+2] Move idc/oidc keystone secrets to a place where we can find them [labs/private] - https://gerrit.wikimedia.org/r/1069250 (owner: Andrew Bogott)
[17:11:18] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590) (owner: Andrew Bogott)
[17:17:46] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P68338 and previous config saved to /var/cache/conftool/dbconfig/20240830-171745-ladsgroup.json
[17:19:13] <wikibugs>	 (CR) Scott French: [C:+1] "Nice! I see one or two things on the parent, but will comment there." [cookbooks] - https://gerrit.wikimedia.org/r/1068779 (owner: Clément Goubert)
[17:22:09] <icinga-wm>	 PROBLEM - Host mw2378 is DOWN: PING CRITICAL - Packet loss = 100%
[17:23:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P68339 and previous config saved to /var/cache/conftool/dbconfig/20240830-172336-ladsgroup.json
[17:28:02] <sukhe>	 ^ this seems to be a rename from mw2378 to wikikube-worker2058
[17:28:30] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-wikifunctions at eqiad: 6.25% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:28:49] <sukhe>	 and then maybe the check is against the older hostname, which is why it is failing, because the new host seems to be up
[17:30:38] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10106962 (Dzahn) The server `lists2001` mentioned here for Collaboration Services is standby and therefore ok to do anytime.
[17:32:37] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:32:43] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:32:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P68340 and previous config saved to /var/cache/conftool/dbconfig/20240830-173253-ladsgroup.json
[17:33:07] <jinxer-wm>	 FIRING: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:33:10] <sukhe>	 oohh
[17:33:22] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10106969 (Dzahn) The server `phab2002` mentioned here for Collaboration Services is standby and therefore ok to do anytime.
[17:33:30] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-wikifunctions (k8s) 4.941s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:34:25] <wikibugs>	 ops-codfw, SRE, collaboration-services, DC-Ops, and 2 others: Migrate servers in codfw racks D7 & D8 from asw to lsw - https://phabricator.wikimedia.org/T373105#10106972 (Dzahn) The server `gerrit2002` mentioned here for Collaboration Services is a replica, not the main host. It's somewhat in pro...
[17:35:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:35:53] <jinxer-wm>	 RESOLVED: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:38:30] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-wikifunctions at eqiad: 0% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:38:30] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-wikifunctions (k8s) 4.941s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:38:44] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68341 and previous config saved to /var/cache/conftool/dbconfig/20240830-173843-ladsgroup.json
[17:38:46] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
[17:38:49] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[17:38:59] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
[17:39:06] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68342 and previous config saved to /var/cache/conftool/dbconfig/20240830-173905-ladsgroup.json
[17:40:40] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:42:29] <wikibugs>	 (CR) Eevans: [C:+1] changeprop: Update references to latest beta restbase node [deployment-charts] - https://gerrit.wikimedia.org/r/1069145 (https://phabricator.wikimedia.org/T370460) (owner: Jgiannelos)
[17:44:25] <mutante>	 !log releases1003/2003 - sudo apt-get remove openjdk-11-* - Java 11 has been replaced by Java 17 - T359795
[17:44:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:30] <stashbot>	 T359795: Switch Jenkins instances from Java 11 to Java 17 - https://phabricator.wikimedia.org/T359795
[17:48:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68343 and previous config saved to /var/cache/conftool/dbconfig/20240830-174800-ladsgroup.json
[17:48:02] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[17:48:05] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[17:48:15] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[17:48:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68344 and previous config saved to /var/cache/conftool/dbconfig/20240830-174822-ladsgroup.json
[17:54:53] <wikibugs>	 (PS3) Msz2001: Enable EditCheck references on plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1069257 (https://phabricator.wikimedia.org/T373079)
[17:55:03] <wikibugs>	 (PS1) Physikerwelt: Remove redundandant setting of $wgDefaultUserOptions['math'] [mediawiki-config] - https://gerrit.wikimedia.org/r/1069258 (https://phabricator.wikimedia.org/T373703)
[17:55:36] <wikibugs>	 (CR) Scott French: "This is great! A couple of comments, but otherwise LGTM." [cookbooks] - https://gerrit.wikimedia.org/r/1067989 (owner: Clément Goubert)
[17:59:08] <wikibugs>	 (CR) JHathaway: [C:+1] profile::puppetserver: set java_start_mem to 40g [puppet] - https://gerrit.wikimedia.org/r/1069185 (https://phabricator.wikimedia.org/T373527) (owner: Elukey)
[18:01:22] <wikibugs>	 (CR) DLynch: [C:+1] Enable EditCheck references on plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1069257 (https://phabricator.wikimedia.org/T373079) (owner: Msz2001)
[18:07:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68345 and previous config saved to /var/cache/conftool/dbconfig/20240830-180757-ladsgroup.json
[18:08:02] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[18:08:44] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68346 and previous config saved to /var/cache/conftool/dbconfig/20240830-180843-ladsgroup.json
[18:08:48] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[18:18:11] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[18:18:35] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:19:17] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:20:15] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:20:31] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8923 bytes in 4.196 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:21:05] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 12 Oct 2024 12:50:00 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:21:07] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 52482 bytes in 0.087 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:23:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P68347 and previous config saved to /var/cache/conftool/dbconfig/20240830-182304-ladsgroup.json
[18:23:51] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P68348 and previous config saved to /var/cache/conftool/dbconfig/20240830-182350-ladsgroup.json
[18:32:37] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:32:43] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:33:18] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 02 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1069257 (https://phabricator.wikimedia.org/T373079) (owner: Msz2001)
[18:35:40] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:38:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P68349 and previous config saved to /var/cache/conftool/dbconfig/20240830-183812-ladsgroup.json
[18:38:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P68350 and previous config saved to /var/cache/conftool/dbconfig/20240830-183858-ladsgroup.json
[18:44:35] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: ...
[18:44:36] <jinxer-wm>	 CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to eqiad) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[18:49:35] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: ...
[18:49:35] <jinxer-wm>	 CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to eqiad) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[18:51:03] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet
[18:51:27] <wikibugs>	 ops-eqiad, SRE, Cassandra, DC-Ops: Degraded RAID on aqs1014 - https://phabricator.wikimedia.org/T362841#10107138 (ops-monitoring-bot) Host rebooted by eevans@cumin1002 with reason: None
[18:53:19] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68351 and previous config saved to /var/cache/conftool/dbconfig/20240830-185319-ladsgroup.json
[18:53:21] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
[18:53:24] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[18:53:34] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
[18:53:41] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68352 and previous config saved to /var/cache/conftool/dbconfig/20240830-185341-ladsgroup.json
[18:54:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68353 and previous config saved to /var/cache/conftool/dbconfig/20240830-185405-ladsgroup.json
[18:54:07] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[18:54:10] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[18:54:20] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[18:54:27] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68354 and previous config saved to /var/cache/conftool/dbconfig/20240830-185427-ladsgroup.json
[18:55:53] <jinxer-wm>	 FIRING: [4x] ProbeDown: Service aqs1014-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:59:07] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet
[19:00:53] <jinxer-wm>	 RESOLVED: [4x] ProbeDown: Service aqs1014-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:20:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68355 and previous config saved to /var/cache/conftool/dbconfig/20240830-192021-ladsgroup.json
[19:20:26] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[19:34:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68356 and previous config saved to /var/cache/conftool/dbconfig/20240830-193413-ladsgroup.json
[19:34:19] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[19:35:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P68357 and previous config saved to /var/cache/conftool/dbconfig/20240830-193528-ladsgroup.json
[19:43:58] <wikibugs>	 (PS5) Andrew Bogott: keystone + oidc [puppet] - https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590)
[19:43:58] <wikibugs>	 (PS1) Andrew Bogott: Keystone: make codfw1dev keystone APIs public [puppet] - https://gerrit.wikimedia.org/r/1069279 (https://phabricator.wikimedia.org/T359590)
[19:46:22] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1069279 (https://phabricator.wikimedia.org/T359590) (owner: Andrew Bogott)
[19:49:20] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P68358 and previous config saved to /var/cache/conftool/dbconfig/20240830-194919-ladsgroup.json
[19:49:43] <wikibugs>	 (PS2) Andrew Bogott: Keystone: make codfw1dev keystone APIs public [puppet] - https://gerrit.wikimedia.org/r/1069279 (https://phabricator.wikimedia.org/T359590)
[19:49:44] <wikibugs>	 (PS6) Andrew Bogott: keystone + oidc [puppet] - https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590)
[19:50:38] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P68359 and previous config saved to /var/cache/conftool/dbconfig/20240830-195037-ladsgroup.json
[19:52:07] <wikibugs>	 (PS3) Andrew Bogott: Keystone: make codfw1dev keystone APIs public [puppet] - https://gerrit.wikimedia.org/r/1069279 (https://phabricator.wikimedia.org/T359590)
[19:52:08] <wikibugs>	 (PS7) Andrew Bogott: keystone + oidc [puppet] - https://gerrit.wikimedia.org/r/1068877 (https://phabricator.wikimedia.org/T359590)
[19:52:09] <icinga-wm>	 PROBLEM - Disk space on grafana1002 is CRITICAL: DISK CRITICAL - free space: / 585MiB (3% inode=53%): /tmp 585MiB (3% inode=53%): /var/tmp 585MiB (3% inode=53%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana1002&var-datasource=eqiad+prometheus/ops
[19:52:21] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1069279 (https://phabricator.wikimedia.org/T359590) (owner: Andrew Bogott)
[19:55:53] <wikibugs>	 (CR) Andrew Bogott: [C:+2] Keystone: make codfw1dev keystone APIs public [puppet] - https://gerrit.wikimedia.org/r/1069279 (https://phabricator.wikimedia.org/T359590) (owner: Andrew Bogott)
[20:04:27] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P68361 and previous config saved to /var/cache/conftool/dbconfig/20240830-200427-ladsgroup.json
[20:05:45] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68362 and previous config saved to /var/cache/conftool/dbconfig/20240830-200544-ladsgroup.json
[20:05:46] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
[20:05:54] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[20:06:00] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
[20:06:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68363 and previous config saved to /var/cache/conftool/dbconfig/20240830-200606-ladsgroup.json
[20:10:29] <icinga-wm>	 RECOVERY - Host mw2295 is UP: PING WARNING - Packet loss = 33%, RTA = 0.27 ms
[20:11:19] <icinga-wm>	 PROBLEM - SSH on mw2295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[20:12:09] <icinga-wm>	 RECOVERY - Disk space on grafana1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana1002&var-datasource=eqiad+prometheus/ops
[20:16:53] <icinga-wm>	 PROBLEM - Host mw2295 is DOWN: PING CRITICAL - Packet loss = 100%
[20:19:35] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68364 and previous config saved to /var/cache/conftool/dbconfig/20240830-201934-ladsgroup.json
[20:19:36] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
[20:19:40] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[20:19:50] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
[20:19:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68365 and previous config saved to /var/cache/conftool/dbconfig/20240830-201956-ladsgroup.json
[20:30:24] <jinxer-wm>	 FIRING: [2x] CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[20:33:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-api-int_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:33:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:36:19] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:49:41] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[20:58:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:00:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68366 and previous config saved to /var/cache/conftool/dbconfig/20240830-210014-ladsgroup.json
[21:00:32] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[21:01:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:09:13] <wikibugs>	 (PS2) Bartosz Dziewoński: logging: Remove WhatFailureGroupHandler wrapper from handlers [mediawiki-config] - https://gerrit.wikimedia.org/r/1067364 (https://phabricator.wikimedia.org/T373444)
[21:10:28] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68367 and previous config saved to /var/cache/conftool/dbconfig/20240830-211028-ladsgroup.json
[21:10:33] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[21:15:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P68368 and previous config saved to /var/cache/conftool/dbconfig/20240830-211521-ladsgroup.json
[21:19:15] <wikibugs>	 (Abandoned) Bartosz Dziewoński: logging: Remove WhatFailureGroupHandler wrapper from handlers [mediawiki-config] - https://gerrit.wikimedia.org/r/1067364 (https://phabricator.wikimedia.org/T373444) (owner: Bartosz Dziewoński)
[21:25:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P68369 and previous config saved to /var/cache/conftool/dbconfig/20240830-212535-ladsgroup.json
[21:30:29] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P68370 and previous config saved to /var/cache/conftool/dbconfig/20240830-213028-ladsgroup.json
[21:33:40] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-api-int_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:36:19] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[21:39:54] <jinxer-wm>	 FIRING: [2x] CirrusSearchHighOldGCFrequency: Elasticsearch instance cloudelastic1005-cloudelastic-omega-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[21:40:43] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P68371 and previous config saved to /var/cache/conftool/dbconfig/20240830-214042-ladsgroup.json
[21:44:35] <icinga-wm>	 PROBLEM - NTP peers on dns1006 is CRITICAL: NTP CRITICAL: Offset 0.189255719 secs (CRITICAL) https://wikitech.wikimedia.org/wiki/NTP
[21:45:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68372 and previous config saved to /var/cache/conftool/dbconfig/20240830-214536-ladsgroup.json
[21:45:38] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
[21:45:43] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[21:45:51] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
[21:45:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68373 and previous config saved to /var/cache/conftool/dbconfig/20240830-214558-ladsgroup.json
[21:53:35] <icinga-wm>	 RECOVERY - NTP peers on dns1006 is OK: NTP OK: Offset 0.001082523 secs https://wikitech.wikimedia.org/wiki/NTP
[21:55:50] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68374 and previous config saved to /var/cache/conftool/dbconfig/20240830-215549-ladsgroup.json
[21:55:51] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
[21:55:54] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[21:56:04] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
[21:56:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:56:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68375 and previous config saved to /var/cache/conftool/dbconfig/20240830-215611-ladsgroup.json
[21:58:40] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:07:23] <wikibugs>	 (PS1) MusikAnimal: Remove $wgCodeMirrorRTL temporary feature flag [mediawiki-config] - https://gerrit.wikimedia.org/r/1069293 (https://phabricator.wikimedia.org/T170001)
[22:13:19] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68376 and previous config saved to /var/cache/conftool/dbconfig/20240830-221319-ladsgroup.json
[22:13:24] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[22:18:11] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[22:28:27] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P68377 and previous config saved to /var/cache/conftool/dbconfig/20240830-222826-ladsgroup.json
[22:31:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:31:43] <wikibugs>	 (PS1) Cwhite: loki: increase chunk flush interval [puppet] - https://gerrit.wikimedia.org/r/1069301 (https://phabricator.wikimedia.org/T335610)
[22:36:10] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:43:34] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P68378 and previous config saved to /var/cache/conftool/dbconfig/20240830-224333-ladsgroup.json
[22:56:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:58:41] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68379 and previous config saved to /var/cache/conftool/dbconfig/20240830-225840-ladsgroup.json
[22:58:42] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
[22:58:46] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[22:58:55] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
[22:59:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68380 and previous config saved to /var/cache/conftool/dbconfig/20240830-225902-ladsgroup.json
[23:01:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68381 and previous config saved to /var/cache/conftool/dbconfig/20240830-230059-ladsgroup.json
[23:01:09] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[23:01:10] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:03:41] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:05:39] <wikibugs>	 (PS1) Catrope: CodexModule: Fix double-flipping in RTL [core] (wmf/1.43.0-wmf.20) - https://gerrit.wikimedia.org/r/1069310 (https://phabricator.wikimedia.org/T373676)
[23:12:34] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 02 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.43.0-wmf.20) - https://gerrit.wikimedia.org/r/1069310 (https://phabricator.wikimedia.org/T373676) (owner: Catrope)
[23:16:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P68382 and previous config saved to /var/cache/conftool/dbconfig/20240830-231606-ladsgroup.json
[23:31:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:31:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P68383 and previous config saved to /var/cache/conftool/dbconfig/20240830-233113-ladsgroup.json
[23:36:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:37:42] <wikibugs>	 (PS1) Bartosz Dziewoński: Replace confusing uses of $wgDebugLogFile with $wmgExtraLogFile [mediawiki-config] - https://gerrit.wikimedia.org/r/1069320
[23:37:42] <wikibugs>	 (PS1) Bartosz Dziewoński: Remove labs settings for $wmgExtraLogFile that have no effect [mediawiki-config] - https://gerrit.wikimedia.org/r/1069321
[23:38:52] <wikibugs>	 (PS1) Dzahn: contint: add java jdk-17 packages in addition to jdk-11 [puppet] - https://gerrit.wikimedia.org/r/1069325 (https://phabricator.wikimedia.org/T359795)
[23:39:10] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1069326
[23:39:10] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1069326 (owner: TrainBranchBot)
[23:39:47] <wikibugs>	 (CR) Bartosz Dziewoński: "I'm just trying to understand what logging.php really does, and finding bizarre things. I wrote the commit message in a confident tone, bu" [mediawiki-config] - https://gerrit.wikimedia.org/r/1069320 (owner: Bartosz Dziewoński)
[23:40:03] <wikibugs>	 (CR) Bartosz Dziewoński: "I'm just trying to understand what logging.php really does, and finding bizarre things. I wrote the commit message in a confident tone, bu" [mediawiki-config] - https://gerrit.wikimedia.org/r/1069321 (owner: Bartosz Dziewoński)
[23:40:27] <wikibugs>	 (PS1) Dzahn: contint: switch java_home from jdk-11 to jdk-17 [puppet] - https://gerrit.wikimedia.org/r/1069327 (https://phabricator.wikimedia.org/T359795)
[23:42:04] <wikibugs>	 (PS1) Dzahn: contint: remove jdk-11 packages [puppet] - https://gerrit.wikimedia.org/r/1069328 (https://phabricator.wikimedia.org/T359795)
[23:42:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68384 and previous config saved to /var/cache/conftool/dbconfig/20240830-234257-ladsgroup.json
[23:43:02] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[23:44:41] <wikibugs>	 (CR) Dzahn: [V:+1] "https://puppet-compiler.wmflabs.org/output/1069325/3802/contint1002.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/1069325 (https://phabricator.wikimedia.org/T359795) (owner: Dzahn)
[23:46:09] <wikibugs>	 (PS1) Stoyofuku-wmf: Turn on donate link in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1069334 (https://phabricator.wikimedia.org/T372757)
[23:46:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68385 and previous config saved to /var/cache/conftool/dbconfig/20240830-234621-ladsgroup.json
[23:46:23] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
[23:46:28] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[23:46:36] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
[23:56:11] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:58:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P68386 and previous config saved to /var/cache/conftool/dbconfig/20240830-235804-ladsgroup.json