[00:02:49] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1382.eqiad.wmnet with reason: host reimage
[00:05:42] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1381.eqiad.wmnet with reason: host reimage
[00:09:15] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1379.eqiad.wmnet with reason: host reimage
[00:12:03] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1380.eqiad.wmnet with reason: host reimage
[00:12:03] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1375.eqiad.wmnet with reason: host reimage
[00:14:37] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:15:07] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:15:08] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1376.eqiad.wmnet with OS trixie
[00:15:15] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873879 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:15:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1248:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1248 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:16:09] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1383.eqiad.wmnet with OS trixie
[00:16:18] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873880 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclar...
[00:18:56] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:19:15] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:19:16] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1382.eqiad.wmnet with OS trixie
[00:19:24] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873881 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:19:52] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1384.eqiad.wmnet with OS trixie
[00:20:05] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclar...
[00:20:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1248:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1248 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:21:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:21:44] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1380.eqiad.wmnet with OS trixie
[00:21:57] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873883 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:22:16] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:22:37] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:22:38] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1381.eqiad.wmnet with OS trixie
[00:22:43] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873885 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:24:18] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1380.eqiad.wmnet with OS trixie
[00:24:27] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873886 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclar...
[00:25:26] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:26:10] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:26:11] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1379.eqiad.wmnet with OS trixie
[00:26:19] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873887 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:28:18] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1383.eqiad.wmnet with reason: host reimage
[00:29:37] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:30:01] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:30:02] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1375.eqiad.wmnet with OS trixie
[00:30:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873889 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:31:38] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1384.eqiad.wmnet with reason: host reimage
[00:33:44] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1383.eqiad.wmnet with reason: host reimage
[00:36:44] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1380.eqiad.wmnet with reason: host reimage
[00:37:54] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1384.eqiad.wmnet with reason: host reimage
[00:39:05] <wikibugs>	 (PS2) C. Scott Ananian: Increase Parsoid Read Views to 60% of enwiki mobile web traffic [mediawiki-config] - https://gerrit.wikimedia.org/r/1279453 (https://phabricator.wikimedia.org/T424880)
[00:39:21] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1279453 (https://phabricator.wikimedia.org/T424880) (owner: C. Scott Ananian)
[00:39:41] <wikibugs>	 (PS2) C. Scott Ananian: Increase Parsoid Read Views to 100% of enwiki mobile web traffic [mediawiki-config] - https://gerrit.wikimedia.org/r/1279454 (https://phabricator.wikimedia.org/T424880)
[00:39:52] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - https://gerrit.wikimedia.org/r/1279454 (https://phabricator.wikimedia.org/T424880) (owner: C. Scott Ananian)
[00:41:36] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1380.eqiad.wmnet with reason: host reimage
[00:49:50] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:50:07] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:50:08] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1383.eqiad.wmnet with OS trixie
[00:50:17] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873893 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:53:47] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:56:52] <logmsgbot>	 jclark@cumin1003 reimage (PID 2815818) is awaiting input
[00:57:01] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:57:02] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1384.eqiad.wmnet with OS trixie
[00:57:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873919 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:57:17] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:58:15] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[00:58:16] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1380.eqiad.wmnet with OS trixie
[00:58:27] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[00:59:07] <wikibugs>	 ops-eqiad, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11873922 (Jclark-ctr) These have finished wikikube-worker1375 wikikube-worker1376 wikik...
[01:09:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[01:09:52] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1279521
[01:09:52] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1279521 (owner: TrainBranchBot)
[01:21:56] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1279521 (owner: TrainBranchBot)
[02:01:42] <logmsgbot>	 !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:03:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:05:20] <jinxer-wm>	 FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 3d 11h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[02:08:09] <logmsgbot>	 !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 26s)
[02:09:20] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:19:42] <wikibugs>	 ops-eqiad, DC-Ops: Power Supply - PS1 Status - issue on wikikube-worker1376:9290 - https://phabricator.wikimedia.org/T424917 (phaultfinder) NEW
[02:34:20] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:42:24] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wikifunctions/python-evaluator on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[03:30:16] <wikibugs>	 (CR) Dragoniez: [C:+1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1279477 (https://phabricator.wikimedia.org/T424898) (owner: VadymTS1)
[03:36:16] <wikibugs>	 (CR) Dragoniez: "Should we check with the folks on the task before proceeding? Configuration changes are generally technically trivial, but that the task h" [mediawiki-config] - https://gerrit.wikimedia.org/r/1278382 (https://phabricator.wikimedia.org/T355445) (owner: VadymTS1)
[03:43:22] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, May 04 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1279477 (https://phabricator.wikimedia.org/T424898) (owner: VadymTS1)
[03:44:41] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11873981 (Papaul)
[03:46:42] <wikibugs>	 (CR) Dragoniez: [C:-1] mediawikiwiki: Changetags right only for bots and administrators in MediaWiki.org (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1278382 (https://phabricator.wikimedia.org/T355445) (owner: VadymTS1)
[04:08:19] <wikibugs>	 (PS2) Ryan Kemper: cumin: repurpose wdqs-public, add wdqs-internal [puppet] - https://gerrit.wikimedia.org/r/1278603 (https://phabricator.wikimedia.org/T415073)
[04:08:41] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1278603 (https://phabricator.wikimedia.org/T415073) (owner: Ryan Kemper)
[04:21:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:30:54] <wikibugs>	 (PS1) VadymTS1: Code bugs fixed [mediawiki-config] - https://gerrit.wikimedia.org/r/1279633
[04:35:45] <wikibugs>	 (Abandoned) VadymTS1: mediawikiwiki: Changetags right only for bots and administrators in MediaWiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/1278382 (https://phabricator.wikimedia.org/T355445) (owner: VadymTS1)
[04:35:48] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11873989 (Ladsgroup) >>! In T414805#11873042, @Nux wrote: >  > There are still loads of broken `MediaWiki:Common.css`. I'm usually...
[04:36:31] <wikibugs>	 (Abandoned) VadymTS1: Code bugs fixed [mediawiki-config] - https://gerrit.wikimedia.org/r/1279633 (owner: VadymTS1)
[04:39:31] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - https://gerrit.wikimedia.org/r/1274928 (https://phabricator.wikimedia.org/T423461) (owner: Codename Noreste)
[04:51:22] <wikibugs>	 (CR) VadymTS1: "I'm closed this change because this very old phab ticket, I don't want to take risk and have to make changes after 2 years" [mediawiki-config] - https://gerrit.wikimedia.org/r/1278382 (https://phabricator.wikimedia.org/T355445) (owner: VadymTS1)
[04:52:29] <wikibugs>	 (PS1) Marostegui: db2149: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1279649 (https://phabricator.wikimedia.org/T424792)
[04:53:01] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
[04:53:06] <wikibugs>	 (CR) Marostegui: [C:+2] db2149: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1279649 (https://phabricator.wikimedia.org/T424792) (owner: Marostegui)
[04:53:07] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
[04:53:25] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
[04:55:44] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
[04:57:25] <wikibugs>	 (PS1) Marostegui: mariadb: Decommission pc2012 [puppet] - https://gerrit.wikimedia.org/r/1279650 (https://phabricator.wikimedia.org/T424201)
[04:59:46] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.decommission for hosts pc2012.codfw.wmnet
[04:59:51] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Decommission pc2012 [puppet] - https://gerrit.wikimedia.org/r/1279650 (https://phabricator.wikimedia.org/T424201) (owner: Marostegui)
[05:04:27] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.dns.netbox
[05:09:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[05:09:34] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[05:09:50] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[05:09:50] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:09:51] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2012.codfw.wmnet
[05:10:38] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission pc2012.codfw.wmnet - https://phabricator.wikimedia.org/T424201#11874046 (Marostegui) a:Marostegui→None
[05:10:45] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission pc2012.codfw.wmnet - https://phabricator.wikimedia.org/T424201#11874050 (Marostegui) This is ready for DC-Ops
[05:11:57] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission pc2012.codfw.wmnet - https://phabricator.wikimedia.org/T424201#11874052 (Marostegui) a:Jhancock.wm
[05:14:23] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
[05:18:41] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
[05:19:58] <wikibugs>	 (PS1) Marostegui: mariadb: Decommission db2146 [puppet] - https://gerrit.wikimedia.org/r/1279661 (https://phabricator.wikimedia.org/T424189)
[05:20:22] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.decommission for hosts db2146.codfw.wmnet
[05:23:26] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Decommission db2146 [puppet] - https://gerrit.wikimedia.org/r/1279661 (https://phabricator.wikimedia.org/T424189) (owner: Marostegui)
[05:27:18] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.dns.netbox
[05:31:21] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2146.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[05:33:22] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2146.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[05:33:22] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:33:23] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2146.codfw.wmnet
[05:34:04] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2146.codfw.wmnet - https://phabricator.wikimedia.org/T424189#11874067 (Marostegui) a:Marostegui→Jhancock.wm
[05:34:11] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2146.codfw.wmnet - https://phabricator.wikimedia.org/T424189#11874072 (Marostegui) This is ready for DC-Ops
[05:34:35] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2146.codfw.wmnet - https://phabricator.wikimedia.org/T424189#11874074 (Marostegui)
[05:35:25] <wikibugs>	 (PS1) Marostegui: instances.yaml: Remove db2147 [puppet] - https://gerrit.wikimedia.org/r/1279674 (https://phabricator.wikimedia.org/T424226)
[05:35:37] <wikibugs>	 (PS1) Marostegui: Revert "db2149: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1279675
[05:36:02] <wikibugs>	 (CR) Marostegui: [C:+2] instances.yaml: Remove db2147 [puppet] - https://gerrit.wikimedia.org/r/1279674 (https://phabricator.wikimedia.org/T424226) (owner: Marostegui)
[05:36:32] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db2149: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1279675 (owner: Marostegui)
[05:37:13] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove db2147 from dbctl T424226', diff saved to https://phabricator.wikimedia.org/P92000 and previous config saved to /var/cache/conftool/dbconfig/20260430-053712-marostegui.json
[05:37:18] <stashbot>	 T424226: decommission db2147.codfw.wmnet - https://phabricator.wikimedia.org/T424226
[05:38:05] <wikibugs>	 (PS1) Marostegui: db2147: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1279679 (https://phabricator.wikimedia.org/T424226)
[05:38:48] <wikibugs>	 (CR) Marostegui: [C:+2] db2147: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1279679 (https://phabricator.wikimedia.org/T424226) (owner: Marostegui)
[05:40:55] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie
[05:47:06] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie
[05:47:48] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/Translate] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1279079 (https://phabricator.wikimedia.org/T424618) (owner: Abijeet Patro)
[06:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T0600)
[06:00:05] <jouncebot>	 marostegui, Amir1, and federico3: Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T0600). Please do the needful.
[06:03:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:05:20] <jinxer-wm>	 FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 3d 7h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[06:32:32] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie
[06:32:49] <wikibugs>	 (CR) Daniel Kinzler: [C:-1] "CR-1 to remind myself that I need to understand why the diffs generated by CI look so different from the git diffs." [deployment-charts] - https://gerrit.wikimedia.org/r/1272765 (https://phabricator.wikimedia.org/T413448) (owner: Daniel Kinzler)
[06:42:24] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wikifunctions/python-evaluator on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[06:50:24] <wikibugs>	 (PS1) Muehlenhoff: Extend access for sarmbrutser [puppet] - https://gerrit.wikimedia.org/r/1280055
[06:54:06] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Extend access for sarmbrutser [puppet] - https://gerrit.wikimedia.org/r/1280055 (owner: Muehlenhoff)
[07:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T0700).
[07:00:05] <jouncebot>	 phuedx and abijeet: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:01:34] <wikibugs>	 (CR) Slyngshede: [C:+1] admin: extend expiry_date for sarmbruster by 1 month [puppet] - https://gerrit.wikimedia.org/r/1279482 (https://phabricator.wikimedia.org/T424402) (owner: Dzahn)
[07:02:56] <abijeet>	 hello
[07:03:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host hcaptcha-proxy5003.wikimedia.org
[07:03:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:08:53] <logmsgbot>	 jmm@cumin2002 makevm (PID 1129140) is awaiting input
[07:15:41] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy5003.wikimedia.org - jmm@cumin2002"
[07:15:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy5003.wikimedia.org - jmm@cumin2002"
[07:15:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:15:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache hcaptcha-proxy5003.wikimedia.org on all recursors
[07:15:52] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha-proxy5003.wikimedia.org on all recursors
[07:16:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy5003.wikimedia.org - jmm@cumin2002"
[07:16:32] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy5003.wikimedia.org - jmm@cumin2002"
[07:16:34] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
[07:17:33] <kart_>	 jouncebot: now
[07:17:33] <jouncebot>	 For the next 0 hour(s) and 42 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T0700)
[07:17:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
[07:18:36] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Trixie 13.3 point update - https://phabricator.wikimedia.org/T414179#11874217 (10MoritzMuehlenhoff)
[07:18:47] <wikibugs>	 (03PS1) 10Brouberol: kafka-jumbo: set inter.broker.protocol to 3.7.0 [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527)
[07:19:38] <logmsgbot>	 jmm@cumin2002 makevm (PID 1129140) is awaiting input
[07:19:40] <kart_>	 phuedx: is your change deployed?
[07:19:44] <wikibugs>	 (03PS2) 10Brouberol: kafka-jumbo: set inter.broker.protocol to 3.7.0 [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527)
[07:20:30] <phuedx>	 kart_: Hey. Sorry. I was delayed
[07:21:08] <phuedx>	 abijeet, kart_: Can you deploy your patch?
[07:21:34] <wikibugs>	 (03CR) 10Brouberol: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527) (owner: 10Brouberol)
[07:21:48] <kart_>	 phuedx: I can deploy abijeet's change :) 
[07:22:10] <phuedx>	 Cool. Please do. I'll get ready to deploy mine :)
[07:22:13] <phuedx>	 Sorry for the delay, both
[07:22:35] <kart_>	 no problem.
[07:22:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kartik@deploy1003 using scap backport" [extensions/Translate] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1279079 (https://phabricator.wikimedia.org/T424618) (owner: 10Abijeet Patro)
[07:24:11] <abijeet>	 kart_, thanks
[07:25:10] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
[07:27:34] <wikibugs>	 (03PS3) 10Brouberol: kafka-jumbo: set inter.broker.protocol to 3.7 [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527)
[07:29:41] <wikibugs>	 (03CR) 10Brouberol: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527) (owner: 10Brouberol)
[07:31:03] <wikibugs>	 (03PS1) 10Muehlenhoff: d-i: Remove dhcpcd-base after installation completed [puppet] - 10https://gerrit.wikimedia.org/r/1280082 (https://phabricator.wikimedia.org/T414341)
[07:33:02] <wikibugs>	 (03PS1) 10MVernon: swift: restore 2 nodes to rings, drain 2 more for reimage [puppet] - 10https://gerrit.wikimedia.org/r/1280083 (https://phabricator.wikimedia.org/T354872)
[07:33:11] <wikibugs>	 (03Merged) 10jenkins-bot: Don't load general modules  as style modules [extensions/Translate] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1279079 (https://phabricator.wikimedia.org/T424618) (owner: 10Abijeet Patro)
[07:33:26] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
[07:35:57] <logmsgbot>	 !log kartik@deploy1003 Started scap sync-world: Backport for [[gerrit:1279079|Don't load general modules  as style modules (T424618)]]
[07:36:01] <stashbot>	 T424618: Increase in "Unexpected general module "ext.translate.special.XXXX in styles queue" Resourceloader errors - https://phabricator.wikimedia.org/T424618
[07:36:40] <wikibugs>	 (03CR) 10JavierMonton: [C:03+1] kafka-jumbo: set inter.broker.protocol to 3.7 [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527) (owner: 10Brouberol)
[07:37:44] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host hcaptcha-proxy5003.wikimedia.org with OS bookworm
[07:37:57] <logmsgbot>	 !log kartik@deploy1003 kartik, abi: Backport for [[gerrit:1279079|Don't load general modules  as style modules (T424618)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:37:59] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874232 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host hcaptcha-proxy5003.wikimedia.org with OS bookworm
[07:38:43] <kart_>	 abijeet: available for testing. Let me know.
[07:39:12] <abijeet>	 kart_, ok
[07:41:24] <abijeet>	 kart_, looks good.
[07:41:29] <kart_>	 cool
[07:41:35] <logmsgbot>	 !log kartik@deploy1003 kartik, abi: Continuing with deployment
[07:45:26] <logmsgbot>	 !log kartik@deploy1003 Finished scap sync-world: Backport for [[gerrit:1279079|Don't load general modules  as style modules (T424618)]] (duration: 09m 29s)
[07:45:31] <stashbot>	 T424618: Increase in "Unexpected general module "ext.translate.special.XXXX in styles queue" Resourceloader errors - https://phabricator.wikimedia.org/T424618
[07:47:10] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
[07:47:18] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2153 (T419961)', diff saved to https://phabricator.wikimedia.org/P92006 and previous config saved to /var/cache/conftool/dbconfig/20260430-074717-fceratto.json
[07:47:22] <kart_>	 phuedx: we're done.
[07:48:00] <abijeet>	 kart_, thanks
[07:48:27] <phuedx>	 Thanks
[07:48:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by phuedx@deploy1003 using scap backport" [extensions/TestKitchen] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1279476 (owner: 10Phuedx)
[07:54:31] <wikibugs>	 (03Merged) 10jenkins-bot: JS SDK: Remove compat deprecation warnings [extensions/TestKitchen] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1279476 (owner: 10Phuedx)
[07:54:37] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153 (T419961)', diff saved to https://phabricator.wikimedia.org/P92007 and previous config saved to /var/cache/conftool/dbconfig/20260430-075436-fceratto.json
[07:55:01] <logmsgbot>	 !log phuedx@deploy1003 Started scap sync-world: Backport for [[gerrit:1279476|JS SDK: Remove compat deprecation warnings]]
[07:56:51] <logmsgbot>	 !log phuedx@deploy1003 phuedx: Backport for [[gerrit:1279476|JS SDK: Remove compat deprecation warnings]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:57:46] <jinxer-wm>	 FIRING: [2x] GerritHAProxyBackendUnavailable: Gerrit backend is unavilable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable
[08:01:16] <logmsgbot>	 !log phuedx@deploy1003 phuedx: Continuing with deployment
[08:01:37] <phuedx>	 Checked on a group1 wiki that the deprecation warnings weren't coming through. LGTM
[08:02:46] <jinxer-wm>	 RESOLVED: [2x] GerritHAProxyBackendUnavailable: Gerrit backend is unavilable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable
[08:04:45] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P92008 and previous config saved to /var/cache/conftool/dbconfig/20260430-080444-fceratto.json
[08:05:14] <logmsgbot>	 !log phuedx@deploy1003 Finished scap sync-world: Backport for [[gerrit:1279476|JS SDK: Remove compat deprecation warnings]] (duration: 10m 13s)
[08:08:17] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[08:08:56] <phuedx>	 Cool. I think that's the window over
[08:08:58] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[08:09:21] <phuedx>	 !log UTC morning backport window finished
[08:09:24] <moritzm>	 !log installing rsync security updates
[08:09:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:29] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[08:10:00] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[08:14:05] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] d-i: Remove dhcpcd-base after installation completed [puppet] - 10https://gerrit.wikimedia.org/r/1280082 (https://phabricator.wikimedia.org/T414341) (owner: 10Muehlenhoff)
[08:14:53] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P92009 and previous config saved to /var/cache/conftool/dbconfig/20260430-081452-fceratto.json
[08:18:32] <moritzm>	 !log installing nginx security updates
[08:18:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:22:46] <wikibugs>	 (03CR) 10Hashar: [C:03+2] wm-checks-api: add tag for PostgreSQL jobs [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1266965 (owner: 10Hashar)
[08:23:35] <wikibugs>	 (03Merged) 10jenkins-bot: wm-checks-api: add tag for PostgreSQL jobs [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1266965 (owner: 10Hashar)
[08:23:38] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:04-1] "My understanding is that there are two issues at play here:" [puppet] - 10https://gerrit.wikimedia.org/r/1278524 (https://phabricator.wikimedia.org/T422646) (owner: 10Andrew Bogott)
[08:25:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153 (T419961)', diff saved to https://phabricator.wikimedia.org/P92010 and previous config saved to /var/cache/conftool/dbconfig/20260430-082501-fceratto.json
[08:25:22] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
[08:25:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2170 (T419961)', diff saved to https://phabricator.wikimedia.org/P92011 and previous config saved to /var/cache/conftool/dbconfig/20260430-082530-fceratto.json
[08:25:45] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha-proxy5003.wikimedia.org with reason: host reimage
[08:27:34] <logmsgbot>	 !log hashar@deploy1003 Started deploy [gerrit/gerrit@83b886a]: wm-checks-api: add tag for PostgreSQL jobs
[08:27:48] <logmsgbot>	 !log hashar@deploy1003 Finished deploy [gerrit/gerrit@83b886a]: wm-checks-api: add tag for PostgreSQL jobs (duration: 00m 14s)
[08:29:36] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha-proxy5003.wikimedia.org with reason: host reimage
[08:31:05] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11874323 (10Nux) >>! In T414805#11873989, @Ladsgroup wrote: >>>! In T414805#11873042, @Nux wrote: >>  >> There are still loads of bro...
[08:33:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170 (T419961)', diff saved to https://phabricator.wikimedia.org/P92012 and previous config saved to /var/cache/conftool/dbconfig/20260430-083313-fceratto.json
[08:33:18] <wikibugs>	 (03PS1) 10Bartosz Wójtowicz: kserve-inference: allow ingress on queue-proxy port 8013. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280202 (https://phabricator.wikimedia.org/T424049)
[08:35:02] <wikibugs>	 (03PS2) 10Bartosz Wójtowicz: kserve-inference: allow ingress on queue-proxy port 8013. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280202 (https://phabricator.wikimedia.org/T424049)
[08:42:19] <wikibugs>	 (03CR) 10Dpogorzelski: [C:03+1] kserve-inference: allow ingress on queue-proxy port 8013. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280202 (https://phabricator.wikimedia.org/T424049) (owner: 10Bartosz Wójtowicz)
[08:42:36] <wikibugs>	 (03CR) 10Bartosz Wójtowicz: [C:03+2] kserve-inference: allow ingress on queue-proxy port 8013. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280202 (https://phabricator.wikimedia.org/T424049) (owner: 10Bartosz Wójtowicz)
[08:43:22] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P92013 and previous config saved to /var/cache/conftool/dbconfig/20260430-084321-fceratto.json
[08:47:33] <wikibugs>	 (03Merged) 10jenkins-bot: kserve-inference: allow ingress on queue-proxy port 8013. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280202 (https://phabricator.wikimedia.org/T424049) (owner: 10Bartosz Wójtowicz)
[08:48:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha-proxy5003.wikimedia.org with OS bookworm
[08:48:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha-proxy5003.wikimedia.org
[08:48:54] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874354 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host hcaptcha-proxy5003.wikimedia.org with OS bookworm completed:...
[08:49:15] <logmsgbot>	 !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:49:41] <wikibugs>	 (03PS1) 10DCausse: cirrus-streaming-updater: bump to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280206 (https://phabricator.wikimedia.org/T424799)
[08:49:47] <dcausse>	 jouncebot: nowandnext
[08:49:47] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 10 minute(s)
[08:49:47] <jouncebot>	 In 1 hour(s) and 10 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1000)
[08:50:34] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host hcaptcha-proxy5004.wikimedia.org
[08:50:40] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha-proxy5004.wikimedia.org
[08:50:40] <jinxer-wm>	 FIRING: ProbeDown: Service etherpad1004:9001 has failed probes (http_etherpad_nodejs_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#etherpad1004:9001 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:51:01] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host hcaptcha-proxy5004.wikimedia.org
[08:51:03] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[08:52:56] <wikibugs>	 (03CR) 10DCausse: [C:03+2] cirrus-streaming-updater: bump to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280206 (https://phabricator.wikimedia.org/T424799) (owner: 10DCausse)
[08:53:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P92014 and previous config saved to /var/cache/conftool/dbconfig/20260430-085329-fceratto.json
[08:54:15] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:54:25] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:54:35] <logmsgbot>	 !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[08:54:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy5004.wikimedia.org - jmm@cumin2002"
[08:55:04] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy5004.wikimedia.org - jmm@cumin2002"
[08:55:04] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:55:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache hcaptcha-proxy5004.wikimedia.org on all recursors
[08:55:05] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus-streaming-updater: bump to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280206 (https://phabricator.wikimedia.org/T424799) (owner: 10DCausse)
[08:55:08] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha-proxy5004.wikimedia.org on all recursors
[08:55:40] <jinxer-wm>	 RESOLVED: ProbeDown: Service etherpad1004:9001 has failed probes (http_etherpad_nodejs_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#etherpad1004:9001 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:55:44] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy5004.wikimedia.org - jmm@cumin2002"
[08:55:49] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy5004.wikimedia.org - jmm@cumin2002"
[08:56:43] <logmsgbot>	 !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[08:56:58] <logmsgbot>	 !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[08:57:37] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host hcaptcha-proxy5004.wikimedia.org with OS bookworm
[08:57:47] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874378 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host hcaptcha-proxy5004.wikimedia.org with OS bookworm
[09:01:21] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:01:27] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:02:39] <wikibugs>	 (03CR) 10JavierMonton: [C:03+1] alerts: mw-page-html-feature-counts-change-enrich (032 comments) [alerts] - 10https://gerrit.wikimedia.org/r/1278559 (https://phabricator.wikimedia.org/T424224) (owner: 10AKhatun)
[09:03:31] <wikibugs>	 (03CR) 10Btullis: [C:03+1] kafka-jumbo: set inter.broker.protocol to 3.7 [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527) (owner: 10Brouberol)
[09:03:38] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170 (T419961)', diff saved to https://phabricator.wikimedia.org/P92015 and previous config saved to /var/cache/conftool/dbconfig/20260430-090337-fceratto.json
[09:03:39] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] kafka-jumbo: set inter.broker.protocol to 3.7 [puppet] - 10https://gerrit.wikimedia.org/r/1280078 (https://phabricator.wikimedia.org/T424527) (owner: 10Brouberol)
[09:04:00] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
[09:04:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2173 (T419961)', diff saved to https://phabricator.wikimedia.org/P92016 and previous config saved to /var/cache/conftool/dbconfig/20260430-090408-fceratto.json
[09:04:30] <wikibugs>	 (03PS1) 10Phuedx: mw.testKitchen.getExperiment() -> mw.testKitchen.compat.getExperiment() [extensions/ReportIncident] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280210 (https://phabricator.wikimedia.org/T419513)
[09:06:02] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/ReportIncident] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280210 (https://phabricator.wikimedia.org/T419513) (owner: 10Phuedx)
[09:09:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[09:11:48] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T419961)', diff saved to https://phabricator.wikimedia.org/P92017 and previous config saved to /var/cache/conftool/dbconfig/20260430-091147-fceratto.json
[09:12:56] <logmsgbot>	 !log brouberol@cumin1003 START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
[09:15:20] <logmsgbot>	 !log brouberol@cumin1003 END (ERROR) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=97) rolling restart_daemons on A:kafka-jumbo-eqiad
[09:17:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
[09:18:11] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:04-1] "I think it's just re-ordering. But it's a bit confusing. Would be good to know how to properly test these routes before rolling out." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1272765 (https://phabricator.wikimedia.org/T413448) (owner: 10Daniel Kinzler)
[09:21:54] <wikibugs>	 (03PS1) 10Sergio Gimeno: loggedOutWarning: instrument browser navigation and tab close [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280226 (https://phabricator.wikimedia.org/T421518)
[09:21:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P92018 and previous config saved to /var/cache/conftool/dbconfig/20260430-092154-fceratto.json
[09:22:06] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280226 (https://phabricator.wikimedia.org/T421518) (owner: 10Sergio Gimeno)
[09:24:28] <moritzm>	 !log temporarily remove ganeti4006 from the ulsfo02 Ganeti cluster in preparation of forthcoming switch maintenance in ulsfo T424686
[09:24:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:32] <stashbot>	 T424686: ulsfo switch work May 2026: Host reimaging - https://phabricator.wikimedia.org/T424686
[09:26:43] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti4006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[09:27:03] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti4006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[09:27:21] <moritzm>	 !log failover Ganeti master in ulsfo02 to ganeti4005 in preparation of forthcoming switch maintenance in ulsfo T424686
[09:27:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:10] <jinxer-wm>	 FIRING: [17x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:28:43] <icinga-wm>	 PROBLEM - ganeti-wconfd running on ganeti4008 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (gnt-masterd), command name ganeti-wconfd https://wikitech.wikimedia.org/wiki/Ganeti
[09:32:03] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P92019 and previous config saved to /var/cache/conftool/dbconfig/20260430-093202-fceratto.json
[09:35:35] <wikibugs>	 (03CR) 10Tiziano Fogli: logstash: add thanos-query-frontend filter (0313 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1275800 (https://phabricator.wikimedia.org/T423986) (owner: 10Tiziano Fogli)
[09:35:51] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "The change is ok, but I don't recognize that syntax for regexes on the description." [puppet] - 10https://gerrit.wikimedia.org/r/1280083 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon)
[09:42:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T419961)', diff saved to https://phabricator.wikimedia.org/P92020 and previous config saved to /var/cache/conftool/dbconfig/20260430-094210-fceratto.json
[09:42:32] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
[09:42:40] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2174 (T419961)', diff saved to https://phabricator.wikimedia.org/P92021 and previous config saved to /var/cache/conftool/dbconfig/20260430-094239-fceratto.json
[09:43:14] <wikibugs>	 (03PS5) 10Tiziano Fogli: rsyslog: forward thanos-query-frontend logs to kafka [puppet] - 10https://gerrit.wikimedia.org/r/1275799 (https://phabricator.wikimedia.org/T423986)
[09:43:14] <wikibugs>	 (03PS8) 10Tiziano Fogli: logstash: add thanos-query-frontend filter [puppet] - 10https://gerrit.wikimedia.org/r/1275800 (https://phabricator.wikimedia.org/T423986)
[09:43:48] <wikibugs>	 06SRE, 06collaboration-services, 10Wikimedia-Mailing-lists: Figure out plan for mailman IP situation - https://phabricator.wikimedia.org/T278495#11874459 (10ABran-WMF) 05Open→03Resolved Now that {T286066} is done, and the MX record has been updated:  ` ~ $ dig MX lists.wikimedia.org +short 10 lists10...
[09:47:23] <wikibugs>	 (03CR) 10Atsuko: [C:03+2] dse-k8s: adding more opensearch namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279423 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[09:50:02] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T419961)', diff saved to https://phabricator.wikimedia.org/P92022 and previous config saved to /var/cache/conftool/dbconfig/20260430-095000-fceratto.json
[09:50:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha-proxy5004.wikimedia.org with reason: host reimage
[09:50:33] <wikibugs>	 (03PS3) 10Atsuko: dse-k8s: adding more opensearch namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279423 (https://phabricator.wikimedia.org/T424248)
[09:50:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:04-1] "Something else that occurred to me re: 2, we could switch to match zookeeper_clusters on hostname rather than fqdn (or try both), though t" [puppet] - 10https://gerrit.wikimedia.org/r/1278524 (https://phabricator.wikimedia.org/T422646) (owner: 10Andrew Bogott)
[09:54:32] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha-proxy5004.wikimedia.org with reason: host reimage
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1000)
[10:00:10] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P92023 and previous config saved to /var/cache/conftool/dbconfig/20260430-100009-fceratto.json
[10:01:48] <wikibugs>	 (03PS2) 10MVernon: swift: restore 2 nodes to rings, drain 2 more for reimage [puppet] - 10https://gerrit.wikimedia.org/r/1280083 (https://phabricator.wikimedia.org/T354872)
[10:02:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 4.442% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[10:02:33] <wikibugs>	 (03CR) 10MVernon: [C:03+2] swift: restore 2 nodes to rings, drain 2 more for reimage [puppet] - 10https://gerrit.wikimedia.org/r/1280083 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon)
[10:05:20] <jinxer-wm>	 FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 3d 3h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[10:05:57] <wikibugs>	 10SRE-swift-storage, 10Ceph, 06Data-Persistence, 06DBA: Data persistance: Re-IP eqiad private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T421719#11874500 (10MatthewVernon)
[10:06:48] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations, 13Patch-For-Review: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11874504 (10MatthewVernon)
[10:07:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 9.435% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[10:10:18] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P92024 and previous config saved to /var/cache/conftool/dbconfig/20260430-101017-fceratto.json
[10:14:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha-proxy5004.wikimedia.org with OS bookworm
[10:14:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha-proxy5004.wikimedia.org
[10:14:55] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874553 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host hcaptcha-proxy5004.wikimedia.org with OS bookworm completed:...
[10:16:59] <wikibugs>	 (03CR) 10Atsuko: [C:03+2] "re-trigger" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279423 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[10:20:26] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T419961)', diff saved to https://phabricator.wikimedia.org/P92025 and previous config saved to /var/cache/conftool/dbconfig/20260430-102026-fceratto.json
[10:20:48] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
[10:20:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2176 (T419961)', diff saved to https://phabricator.wikimedia.org/P92026 and previous config saved to /var/cache/conftool/dbconfig/20260430-102055-fceratto.json
[10:24:52] <wikibugs>	 (03Merged) 10jenkins-bot: dse-k8s: adding more opensearch namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279423 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[10:28:10] <jinxer-wm>	 FIRING: [17x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:28:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T419961)', diff saved to https://phabricator.wikimedia.org/P92027 and previous config saved to /var/cache/conftool/dbconfig/20260430-102830-fceratto.json
[10:35:00] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
[10:36:01] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
[10:36:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host bast5005.wikimedia.org
[10:36:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[10:38:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P92028 and previous config saved to /var/cache/conftool/dbconfig/20260430-103838-fceratto.json
[10:39:28] <phuedx>	 !log Clearing stuck Test Kitchen experiment configs value from codfw local cluster cache
[10:39:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:24] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5005.wikimedia.org - jmm@cumin2002"
[10:40:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5005.wikimedia.org - jmm@cumin2002"
[10:40:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:40:44] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache bast5005.wikimedia.org on all recursors
[10:40:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5005.wikimedia.org on all recursors
[10:41:18] <wikibugs>	 (03PS1) 10Marostegui: db2205: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1280280
[10:41:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5005.wikimedia.org - jmm@cumin2002"
[10:41:29] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5005.wikimedia.org - jmm@cumin2002"
[10:41:47] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host bast5005.wikimedia.org with OS trixie
[10:41:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
[10:42:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
[10:42:03] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874617 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast5005.wikimedia.org with OS trixie
[10:42:24] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wikifunctions/python-evaluator on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:46:20] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db2205: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1280280 (owner: 10Marostegui)
[10:46:55] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2205.codfw.wmnet with reason: Reimage to Trixie
[10:47:01] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db2205: Reimage to Trixie
[10:47:18] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2205: Reimage to Trixie
[10:48:47] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P92030 and previous config saved to /var/cache/conftool/dbconfig/20260430-104846-fceratto.json
[10:48:54] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db2205.codfw.wmnet with OS trixie
[10:49:37] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove obsolete Hiera file [puppet] - 10https://gerrit.wikimedia.org/r/1273792 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff)
[10:52:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[10:53:47] <wikibugs>	 (03PS3) 10Muehlenhoff: http-sso-django-login: Switch to firewall::service and restrict access [puppet] - 10https://gerrit.wikimedia.org/r/1276526 (https://phabricator.wikimedia.org/T149804)
[10:58:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T419961)', diff saved to https://phabricator.wikimedia.org/P92031 and previous config saved to /var/cache/conftool/dbconfig/20260430-105854-fceratto.json
[10:59:17] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
[10:59:25] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2188 (T419961)', diff saved to https://phabricator.wikimedia.org/P92032 and previous config saved to /var/cache/conftool/dbconfig/20260430-105924-fceratto.json
[10:59:55] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[11:00:29] <wikibugs>	 (03CR) 10Muehlenhoff: http-sso-django-login: Switch to firewall::service and restrict access (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1276526 (https://phabricator.wikimedia.org/T149804) (owner: 10Muehlenhoff)
[11:00:51] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[11:01:17] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[11:02:10] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[11:06:41] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188 (T419961)', diff saved to https://phabricator.wikimedia.org/P92033 and previous config saved to /var/cache/conftool/dbconfig/20260430-110640-fceratto.json
[11:06:57] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
[11:10:27] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
[11:13:15] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q4:rack/setup/install ms-be209[7,8] - https://phabricator.wikimedia.org/T424892#11874684 (10MatthewVernon) a:05MatthewVernon→03None No changes needed for this - modules/install_server/files/autoinstall/scripts/partman_early_comman...
[11:15:42] <wikibugs>	 (03PS1) 10JMeybohm: Update rsyslog image to trixie and rsyslog 8.2504.0-1 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1280313 (https://phabricator.wikimedia.org/T418200)
[11:16:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P92034 and previous config saved to /var/cache/conftool/dbconfig/20260430-111648-fceratto.json
[11:18:23] <wikibugs>	 (03PS1) 10JMeybohm: Bump default rsyslog container version to 8.2504.0-1 [puppet] - 10https://gerrit.wikimedia.org/r/1280317 (https://phabricator.wikimedia.org/T418200)
[11:18:26] <wikibugs>	 (03CR) 10Elukey: [C:03+2] sre.hosts: fix ipmi() calls after spicerack 12.5.0 [cookbooks] - 10https://gerrit.wikimedia.org/r/1279379 (https://phabricator.wikimedia.org/T418929) (owner: 10Elukey)
[11:18:31] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.13 point update - https://phabricator.wikimedia.org/T414205#11874719 (10MoritzMuehlenhoff)
[11:19:22] <elukey>	 !log upgrade spicerack on cumin hosts to 12.5.0
[11:19:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:52] <moritzm>	 !log installing policykit-1 security updates
[11:20:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:56] <wikibugs>	 (03PS1) 10JMeybohm: Test updated rsyslog image on mw-experimental and mw-web canary [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280324 (https://phabricator.wikimedia.org/T418200)
[11:26:19] <wikibugs>	 (03PS1) 10Marostegui: Revert "db2205: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1280333
[11:26:37] <wikibugs>	 (03PS2) 10Elukey: sre.hosts.provision: add workaround for root user on X14 supermicros [cookbooks] - 10https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929)
[11:26:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P92035 and previous config saved to /var/cache/conftool/dbconfig/20260430-112656-fceratto.json
[11:27:07] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1039.eqiad.wmnet
[11:27:09] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1039.eqiad.wmnet
[11:27:11] <wikibugs>	 10ops-eqiad, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Multi-bit memory errors on wikikube-worker1039.eqiad.wmnet - https://phabricator.wikimedia.org/T424797#11874759 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin1003 pool for host wikikube-w...
[11:27:15] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert "db2205: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1280333 (owner: 10Marostegui)
[11:27:31] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.remove-downtime for wikikube-worker1039.eqiad.wmnet
[11:27:31] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker1039.eqiad.wmnet
[11:27:48] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:28:52] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:31:29] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Power Supply - PS1 Status - issue on wikikube-worker1376:9290 - https://phabricator.wikimedia.org/T424917#11874781 (10Jclark-ctr) 05Open→03Resolved a:03Jclark-ctr
[11:33:29] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2205.codfw.wmnet with OS trixie
[11:34:44] <wikibugs>	 (03CR) 10Blake: [C:03+1] Bump default rsyslog container version to 8.2504.0-1 [puppet] - 10https://gerrit.wikimedia.org/r/1280317 (https://phabricator.wikimedia.org/T418200) (owner: 10JMeybohm)
[11:35:29] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host wikikube-worker1377.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:35:35] <wikibugs>	 (03CR) 10Blake: "It doesn't look like there's anything in this CR explicitly updating the version to 8.2504.0-1, is that happening implicitly somehow?" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1280313 (https://phabricator.wikimedia.org/T418200) (owner: 10JMeybohm)
[11:35:39] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install db1265-db1290 - https://phabricator.wikimedia.org/T418909#11874800 (10VRiley-WMF)
[11:35:58] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:36:08] <wikibugs>	 (03CR) 10Blake: [C:03+1] Test updated rsyslog image on mw-experimental and mw-web canary [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280324 (https://phabricator.wikimedia.org/T418200) (owner: 10JMeybohm)
[11:36:14] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:37:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188 (T419961)', diff saved to https://phabricator.wikimedia.org/P92036 and previous config saved to /var/cache/conftool/dbconfig/20260430-113704-fceratto.json
[11:38:27] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host wikikube-worker1378.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:39:03] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[11:39:05] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db2205: after reimage to trixie
[11:39:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1159 (T419635)', diff saved to https://phabricator.wikimedia.org/P92038 and previous config saved to /var/cache/conftool/dbconfig/20260430-113910-fceratto.json
[11:39:16] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[11:39:41] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
[11:39:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92039 and previous config saved to /var/cache/conftool/dbconfig/20260430-113948-fceratto.json
[11:40:21] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T419635)', diff saved to https://phabricator.wikimedia.org/P92040 and previous config saved to /var/cache/conftool/dbconfig/20260430-114020-fceratto.json
[11:40:37] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on bast5005.wikimedia.org with reason: host reimage
[11:42:15] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1377.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:44:50] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1377.eqiad.wmnet with OS trixie
[11:45:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast5005.wikimedia.org with reason: host reimage
[11:45:03] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11874825 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclar...
[11:45:11] <wikibugs>	 (03PS1) 10Muehlenhoff: Assign the hcaptcha::proxy role to  hcaptcha-proxy5003/5004 [puppet] - 10https://gerrit.wikimedia.org/r/1280353 (https://phabricator.wikimedia.org/T421863)
[11:45:53] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874840 (10MoritzMuehlenhoff)
[11:46:03] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1378.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:46:26] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host wikikube-worker1378.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[11:47:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92041 and previous config saved to /var/cache/conftool/dbconfig/20260430-114703-fceratto.json
[11:47:11] <logmsgbot>	 !log jnuche@deploy1003 Started deploy [releng/jenkins-deploy@fb711fc] (releasing): Update backup releases Jenkins
[11:47:34] <logmsgbot>	 !log jnuche@deploy1003 Finished deploy [releng/jenkins-deploy@fb711fc] (releasing): Update backup releases Jenkins (duration: 00m 33s)
[11:49:33] <logmsgbot>	 !log jnuche@deploy1003 Started deploy [releng/jenkins-deploy@fb711fc] (releasing): Update production releases Jenkins
[11:50:22] <logmsgbot>	 !log jnuche@deploy1003 Finished deploy [releng/jenkins-deploy@fb711fc] (releasing): Update production releases Jenkins (duration: 01m 04s)
[11:50:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P92042 and previous config saved to /var/cache/conftool/dbconfig/20260430-115028-fceratto.json
[11:56:43] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1377.eqiad.wmnet with reason: host reimage
[11:57:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P92044 and previous config saved to /var/cache/conftool/dbconfig/20260430-115712-fceratto.json
[12:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1200)
[12:00:37] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P92045 and previous config saved to /var/cache/conftool/dbconfig/20260430-120036-fceratto.json
[12:00:44] <wikibugs>	 (03CR) 10Cathal Mooney: Add BGP peering from asw1-23 to core routers and mr1 (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/1279501 (https://phabricator.wikimedia.org/T408892) (owner: 10Papaul)
[12:00:53] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1377.eqiad.wmnet with reason: host reimage
[12:02:40] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.13 point update - https://phabricator.wikimedia.org/T414205#11874964 (10MoritzMuehlenhoff)
[12:03:00] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[12:04:04] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1378.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[12:04:53] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1378.eqiad.wmnet with OS trixie
[12:05:02] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11874988 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclar...
[12:05:19] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast5005.wikimedia.org with OS trixie
[12:05:19] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5005.wikimedia.org
[12:05:32] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11874989 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast5005.wikimedia.org with OS trixie completed: - bast5005...
[12:07:02] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11874991 (10neriah) >>! In T414805#11874323, @Nux wrote: > And that is on top of WMF staff already making interface edits harder by f...
[12:07:21] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P92046 and previous config saved to /var/cache/conftool/dbconfig/20260430-120720-fceratto.json
[12:07:21] <wikibugs>	 (03PS1) 10Urbanecm: ReassignMentees: Add logging information [extensions/GrowthExperiments] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280368 (https://phabricator.wikimedia.org/T418194)
[12:08:26] <wikibugs>	 (03PS1) 10Urbanecm: ReassignMentees: Add logging information [extensions/GrowthExperiments] (wmf/1.46.0-wmf.24) - 10https://gerrit.wikimedia.org/r/1280370 (https://phabricator.wikimedia.org/T418194)
[12:08:39] <logmsgbot>	 cmooney@cumin1003 netbox (PID 2991460) is awaiting input
[12:08:43] <urbanecm>	 jouncebot: nowandnext
[12:08:43] <jouncebot>	 For the next 0 hour(s) and 51 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1200)
[12:08:43] <jouncebot>	 In 0 hour(s) and 51 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1300)
[12:08:51] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] ReassignMentees: Add logging information [extensions/GrowthExperiments] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280368 (https://phabricator.wikimedia.org/T418194) (owner: 10Urbanecm)
[12:08:58] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] ReassignMentees: Add logging information [extensions/GrowthExperiments] (wmf/1.46.0-wmf.24) - 10https://gerrit.wikimedia.org/r/1280370 (https://phabricator.wikimedia.org/T418194) (owner: 10Urbanecm)
[12:09:38] <wikibugs>	 (03PS1) 10MVernon: swift: prep for ms-be11* [puppet] - 10https://gerrit.wikimedia.org/r/1280373 (https://phabricator.wikimedia.org/T424895)
[12:09:49] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11875018 (10MoritzMuehlenhoff)
[12:10:45] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T419635)', diff saved to https://phabricator.wikimedia.org/P92048 and previous config saved to /var/cache/conftool/dbconfig/20260430-121044-fceratto.json
[12:10:50] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[12:11:02] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[12:11:10] <moritzm>	 !log installing gdk-pixbuf security updates
[12:11:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:23] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:11:28] <wikibugs>	 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06Data-Persistence, and 2 others: Q4:rack/setup/install ms-be1098, ms-be1099, ms-be1100 - https://phabricator.wikimedia.org/T424895#11875047 (10MatthewVernon)
[12:11:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1161 (T419635)', diff saved to https://phabricator.wikimedia.org/P92049 and previous config saved to /var/cache/conftool/dbconfig/20260430-121130-fceratto.json
[12:11:52] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1003 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280368 (https://phabricator.wikimedia.org/T418194) (owner: 10Urbanecm)
[12:11:52] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1003 using scap backport" [extensions/GrowthExperiments] (wmf/1.46.0-wmf.24) - 10https://gerrit.wikimedia.org/r/1280370 (https://phabricator.wikimedia.org/T418194) (owner: 10Urbanecm)
[12:12:35] <wikibugs>	 (03CR) 10AikoChou: [C:03+1] "LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279385 (https://phabricator.wikimedia.org/T415892) (owner: 10Gkyziridis)
[12:13:41] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T419635)', diff saved to https://phabricator.wikimedia.org/P92050 and previous config saved to /var/cache/conftool/dbconfig/20260430-121340-fceratto.json
[12:16:04] <wikibugs>	 (03PS1) 10Muehlenhoff: Add durum5003/5004 [puppet] - 10https://gerrit.wikimedia.org/r/1280375 (https://phabricator.wikimedia.org/T421863)
[12:16:20] <wikibugs>	 (03CR) 10AikoChou: [C:03+1] "LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279388 (https://phabricator.wikimedia.org/T415892) (owner: 10Gkyziridis)
[12:17:15] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[12:17:29] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92051 and previous config saved to /var/cache/conftool/dbconfig/20260430-121728-fceratto.json
[12:17:51] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
[12:17:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2216 (T419961)', diff saved to https://phabricator.wikimedia.org/P92052 and previous config saved to /var/cache/conftool/dbconfig/20260430-121758-fceratto.json
[12:18:40] <wikibugs>	 (03Merged) 10jenkins-bot: ReassignMentees: Add logging information [extensions/GrowthExperiments] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280368 (https://phabricator.wikimedia.org/T418194) (owner: 10Urbanecm)
[12:18:46] <wikibugs>	 (03Merged) 10jenkins-bot: ReassignMentees: Add logging information [extensions/GrowthExperiments] (wmf/1.46.0-wmf.24) - 10https://gerrit.wikimedia.org/r/1280370 (https://phabricator.wikimedia.org/T418194) (owner: 10Urbanecm)
[12:19:15] <logmsgbot>	 !log urbanecm@deploy1003 Started scap sync-world: Backport for [[gerrit:1280368|ReassignMentees: Add logging information (T418194)]], [[gerrit:1280370|ReassignMentees: Add logging information (T418194)]]
[12:19:19] <stashbot>	 T418194: Mentors still having mentees after removing themselves - https://phabricator.wikimedia.org/T418194
[12:19:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to plain
[12:20:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to plain
[12:20:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 8.023% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[12:20:21] <logmsgbot>	 jclark@cumin1003 reimage (PID 2977448) is awaiting input
[12:21:06] <logmsgbot>	 !log urbanecm@deploy1003 urbanecm: Backport for [[gerrit:1280368|ReassignMentees: Add logging information (T418194)]], [[gerrit:1280370|ReassignMentees: Add logging information (T418194)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[12:21:40] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:21:53] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to plain
[12:22:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[12:23:37] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to plain
[12:23:45] <logmsgbot>	 !log urbanecm@deploy1003 urbanecm: Continuing with deployment
[12:23:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P92053 and previous config saved to /var/cache/conftool/dbconfig/20260430-122348-fceratto.json
[12:24:30] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2205: after reimage to trixie
[12:25:17] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2216 (T419961)', diff saved to https://phabricator.wikimedia.org/P92055 and previous config saved to /var/cache/conftool/dbconfig/20260430-122516-fceratto.json
[12:26:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-web releases routed via main (k8s) 1.442s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:26:16] <urbanecm>	 something's wrong with scap...
[12:28:12] <urbanecm>	 jhathaway: elukey: https://spiderpig.wikimedia.org/jobs/1862 says something about a missing values.yaml file, but i can't get the page with logs loaded...
[12:28:20] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11875160 (10A_smart_kitten) >>! In T414805#11873989, @Ladsgroup wrote: >>>! In T414805#11873042, @Nux wrote: >>  >> There are still l...
[12:28:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to plain
[12:28:31] <wikibugs>	 (03CR) 10ArielGlenn: rest gateway: rate limits for liftwing (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1272765 (https://phabricator.wikimedia.org/T413448) (owner: 10Daniel Kinzler)
[12:28:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to plain
[12:29:47] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correct typo with reverse for mr1-ulsfo address - cmooney@cumin1003"
[12:31:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[12:32:17] <wikibugs>	 (03CR) 10Fabfur: [C:03+1] "LGTM!" [dns] - 10https://gerrit.wikimedia.org/r/1279402 (https://phabricator.wikimedia.org/T424785) (owner: 10CDobbins)
[12:32:26] <elukey>	 urbanecm: o/
[12:32:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to plain
[12:32:53] <logmsgbot>	 cmooney@cumin1003 netbox (PID 2991460) is awaiting input
[12:32:55] <urbanecm>	 elukey: scap failed due to $reasons, and https://spiderpig.wikimedia.org/jobs/1862#log refuses to load :/
[12:33:50] <elukey>	 yeah same for me, my browser crashes
[12:33:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P92056 and previous config saved to /var/cache/conftool/dbconfig/20260430-123356-fceratto.json
[12:33:58] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correct typo with reverse for mr1-ulsfo address - cmooney@cumin1003"
[12:33:58] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:34:15] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] swift: prep for ms-be11* [puppet] - 10https://gerrit.wikimedia.org/r/1280373 (https://phabricator.wikimedia.org/T424895) (owner: 10MVernon)
[12:34:37] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to plain
[12:34:52] <urbanecm>	 not cool. on the mainpage, i can tell it to either retry or abort, but w/o seeing the logs, i have no clue what makes more sense :/
[12:35:25] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P92057 and previous config saved to /var/cache/conftool/dbconfig/20260430-123524-fceratto.json
[12:35:47] <wikibugs>	 (03PS3) 10Elukey: sre.hosts.provision: add workaround for root user on X14 supermicros [cookbooks] - 10https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929)
[12:35:53] <elukey>	 urbanecm: those logs should be somewhere, lemme check
[12:36:07] <urbanecm>	 i see something in https://logstash.wikimedia.org/goto/05b61171fa7033fc1a1fd7bcbe139de4
[12:36:13] <urbanecm>	 trying to get to the actual error
[12:36:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-web releases routed via main (k8s) 1.055s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:37:09] <urbanecm>	 Deployment of mw-cron-main-eqiad failed: Command '['helmfile', '-e', 'eqiad', '--selector', 'name=main', 'apply']' returned non-zero exit status 1.
[12:37:56] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[12:37:58] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1377.eqiad.wmnet with OS trixie
[12:38:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11875244 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[12:40:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 2.755% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[12:40:23] <elukey>	 urbanecm: I am a bit ignorant about spiderpig, maybe we could ping somebody from releng?
[12:40:26] <elukey>	 oh noes
[12:40:36] <elukey>	 is that your deployment?
[12:40:49] <urbanecm>	 depending on where it left things
[12:41:09] <wikibugs>	 (03CR) 10MVernon: [C:03+2] swift: prep for ms-be11* [puppet] - 10https://gerrit.wikimedia.org/r/1280373 (https://phabricator.wikimedia.org/T424895) (owner: 10MVernon)
[12:41:10] <wikibugs>	 (03PS1) 10STran: Fix incorrect source in back instrumentation [extensions/ReportIncident] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280386 (https://phabricator.wikimedia.org/T424075)
[12:41:15] <urbanecm>	 it could be, as it failed somewhere in the middle
[12:41:18] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/ReportIncident] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280386 (https://phabricator.wikimedia.org/T424075) (owner: 10STran)
[12:41:39] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+2] ml-services: Deploy the latest version of rr-multilingual model server on prod. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279388 (https://phabricator.wikimedia.org/T415892) (owner: 10Gkyziridis)
[12:41:49] <elukey>	 mmm not sure, the graph looks bad
[12:42:23] <wikibugs>	 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06Data-Persistence, and 2 others: Q4:rack/setup/install ms-be1098, ms-be1099, ms-be1100 - https://phabricator.wikimedia.org/T424895#11875274 (10MatthewVernon) a:05MatthewVernon→03None
[12:42:42] <urbanecm>	 it also predates my deployment by a few mins
[12:42:45] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-web releases routed via main (k8s) 1.517s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:43:42] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Deploy the latest version of rr-multilingual model server on prod. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279388 (https://phabricator.wikimedia.org/T415892) (owner: 10Gkyziridis)
[12:44:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T419635)', diff saved to https://phabricator.wikimedia.org/P92058 and previous config saved to /var/cache/conftool/dbconfig/20260430-124405-fceratto.json
[12:44:11] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[12:44:22] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
[12:44:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1185 (T419635)', diff saved to https://phabricator.wikimedia.org/P92059 and previous config saved to /var/cache/conftool/dbconfig/20260430-124429-fceratto.json
[12:45:33] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P92060 and previous config saved to /var/cache/conftool/dbconfig/20260430-124532-fceratto.json
[12:45:40] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T419635)', diff saved to https://phabricator.wikimedia.org/P92061 and previous config saved to /var/cache/conftool/dbconfig/20260430-124539-fceratto.json
[12:46:01] <wikibugs>	 (03CR) 10Bearloga: EventStreamConfig: remove ABST contextual attribute (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1270454 (https://phabricator.wikimedia.org/T422001) (owner: 10Bearloga)
[12:47:14] <wikibugs>	 (03PS2) 10Bearloga: EventStreamConfig: remove ABST contextual attribute [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1270454 (https://phabricator.wikimedia.org/T422001)
[12:47:47] <wikibugs>	 (03CR) 10Bearloga: EventStreamConfig: remove ABST contextual attribute (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1270454 (https://phabricator.wikimedia.org/T422001) (owner: 10Bearloga)
[12:48:01] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Data-Platform-SRE (2026-04-24 - 2026-05-15): Degraded RAID on an-worker1199 - https://phabricator.wikimedia.org/T424654#11875306 (10Jclark-ctr) Both drives have been swapped
[12:49:36] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to plain
[12:50:14] <wikibugs>	 (03CR) 10Phuedx: [C:03+1] EventStreamConfig: remove ABST contextual attribute [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1270454 (https://phabricator.wikimedia.org/T422001) (owner: 10Bearloga)
[12:50:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 2.755% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[12:50:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Add durum5003/5004 [puppet] - 10https://gerrit.wikimedia.org/r/1280375 (https://phabricator.wikimedia.org/T421863) (owner: 10Muehlenhoff)
[12:50:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to plain
[12:50:46] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[12:50:56] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[12:50:59] <wikibugs>	 (03CR) 10Cathal Mooney: Add BGP peering from asw1-23 to core routers and mr1 (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/1279501 (https://phabricator.wikimedia.org/T408892) (owner: 10Papaul)
[12:51:03] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] Add BGP peering from asw1-23 to core routers and mr1 [homer/public] - 10https://gerrit.wikimedia.org/r/1279501 (https://phabricator.wikimedia.org/T408892) (owner: 10Papaul)
[12:51:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[12:52:46] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-web releases routed via main (k8s) 801.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:53:21] <icinga-wm>	 PROBLEM - Bird Internet Routing Daemon on hcaptcha-proxy4003 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[12:54:21] <icinga-wm>	 RECOVERY - Bird Internet Routing Daemon on hcaptcha-proxy4003 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[12:55:41] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2216 (T419961)', diff saved to https://phabricator.wikimedia.org/P92062 and previous config saved to /var/cache/conftool/dbconfig/20260430-125540-fceratto.json
[12:55:48] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P92063 and previous config saved to /var/cache/conftool/dbconfig/20260430-125547-fceratto.json
[12:59:26] <wikibugs>	 (03CR) 10Dpogorzelski: [C:03+2] changeprop: Configure RevertRisk multilingual model on changeprop. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1279385 (https://phabricator.wikimedia.org/T415892) (owner: 10Gkyziridis)
[12:59:59] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [eqiad] START helmfile.d/services/changeprop: sync
[13:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1300)
[13:00:05] <jouncebot>	 cscott, phuedx, Sergi0, and Tran: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:13] <Tran>	 o/
[13:00:20] <sergi0>	 o/
[13:00:37] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
[13:00:38] <moritzm>	 !log temporarily remove ganeti4008 from the ulsfo02 Ganeti cluster in preparation of forthcoming switch maintenance in ulsfo T424686
[13:00:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:44] <stashbot>	 T424686: ulsfo switch work May 2026: Host reimaging - https://phabricator.wikimedia.org/T424686
[13:00:52] <wikibugs>	 06SRE: Please add Google Search Console domain verification for wikimediafoundation.org - https://phabricator.wikimedia.org/T424976 (10SCherukuwada) 03NEW
[13:00:57] <cscott>	 o/
[13:01:00] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [codfw] START helmfile.d/services/changeprop: sync
[13:01:25] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [codfw] DONE helmfile.d/services/changeprop: sync
[13:01:35] <Tran>	 phuedx's and my patches go together so I'll be deploying in his stead.
[13:01:35] <cscott>	 i can spiderpig my patch, and as it's a config patch it should be pretty fast.
[13:01:46] <cscott>	 looks like the rest of you have "real" backports
[13:02:31] <bearloga>	 cscott: spiderpig is currently stalled on urbanecm’s job (good luck, dude!)
[13:02:55] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti4008 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[13:03:15] <cscott>	 oh, are we in line behind urbanecm ?
[13:03:27] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti4008 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[13:03:35] <bearloga>	 It would appear so, yeah
[13:04:07] <wikibugs>	 (03PS3) 10STran: Add exposure for experiment instrumentation [extensions/ReportIncident] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280387 (https://phabricator.wikimedia.org/T424075)
[13:04:22] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/ReportIncident] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280387 (https://phabricator.wikimedia.org/T424075) (owner: 10STran)
[13:05:33] <urbanecm>	 cscott: there is an incident as well
[13:05:49] <cscott>	 oh, fun.
[13:05:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P92064 and previous config saved to /var/cache/conftool/dbconfig/20260430-130556-fceratto.json
[13:06:00] <urbanecm>	 bearloga: and that job is blocked on 'something wrong in logs and logs are inaccessible so i have no idea where to go'
[13:06:24] <cscott>	 yeah i found T424975 "Certain deployment logs cause Spiderpig to crash the browser": https://phabricator.wikimedia.org/T424975#11875286
[13:06:24] <stashbot>	 T424975: Certain deployment logs cause Spiderpig to crash the browser - https://phabricator.wikimedia.org/T424975
[13:06:36] <urbanecm>	 Yep
[13:06:37] <bearloga>	 urbanecm: I tried opening the logs and saw some of them but then the tab went unresponsive
[13:06:53] <cscott>	 anyway, i'll be here if/when things get rolling again, and if not I guess i can cross my fingers for the late backport window
[13:07:01] <bearloga>	 Sending positive thoughts your way
[13:07:03] <urbanecm>	 bearloga: exactly my problem
[13:07:32] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Review the most critical/popular Kafka clients before the Kafka upgrade - https://phabricator.wikimedia.org/T417031#11875360 (10brouberol) This has been handled out of band, and is no longer necessary to keep open now that we're performing the upgrade (or have done so, depe...
[13:07:37] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Review the most critical/popular Kafka clients before the Kafka upgrade - https://phabricator.wikimedia.org/T417031#11875362 (10brouberol) 05Open→03Resolved a:03brouberol
[13:07:42] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Add some kafka clients to the Kafka test cluster - https://phabricator.wikimedia.org/T417034#11875365 (10brouberol) 05Open→03Resolved a:03brouberol This has been handled out of band, and is no longer necessary to keep open now that we're performing the upgrade (or...
[13:07:53] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Add some kafka clients to the Kafka test cluster - https://phabricator.wikimedia.org/T417034#11875369 (10brouberol) a:05brouberol→03None
[13:08:00] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Review the most critical/popular Kafka clients before the Kafka upgrade - https://phabricator.wikimedia.org/T417031#11875370 (10brouberol) a:05brouberol→03None
[13:08:10] <jinxer-wm>	 FIRING: [17x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:09:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[13:09:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[13:12:25] <wikibugs>	 (03Abandoned) 10ZhaoFJx: arbcom_zhwiki: Add electionadmin group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1248954 (https://phabricator.wikimedia.org/T419309) (owner: 10ZhaoFJx)
[13:12:41] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[13:12:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1186 (T419961)', diff saved to https://phabricator.wikimedia.org/P92065 and previous config saved to /var/cache/conftool/dbconfig/20260430-131249-fceratto.json
[13:16:04] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T419635)', diff saved to https://phabricator.wikimedia.org/P92066 and previous config saved to /var/cache/conftool/dbconfig/20260430-131604-fceratto.json
[13:16:10] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[13:16:22] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
[13:16:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1200 (T419635)', diff saved to https://phabricator.wikimedia.org/P92067 and previous config saved to /var/cache/conftool/dbconfig/20260430-131629-fceratto.json
[13:17:40] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T419635)', diff saved to https://phabricator.wikimedia.org/P92068 and previous config saved to /var/cache/conftool/dbconfig/20260430-131739-fceratto.json
[13:20:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.66% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:21:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T419961)', diff saved to https://phabricator.wikimedia.org/P92069 and previous config saved to /var/cache/conftool/dbconfig/20260430-132114-fceratto.json
[13:21:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.76% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:24:10] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279477 (https://phabricator.wikimedia.org/T424898) (owner: 10VadymTS1)
[13:25:30] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.76% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:27:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P92070 and previous config saved to /var/cache/conftool/dbconfig/20260430-132747-fceratto.json
[13:28:10] <jinxer-wm>	 FIRING: [17x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:30:03] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1247186 (https://phabricator.wikimedia.org/T418815) (owner: 10MGChecker)
[13:31:23] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P92071 and previous config saved to /var/cache/conftool/dbconfig/20260430-133122-fceratto.json
[13:32:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "The patch looks good and this seems fine for the initial deployment. qlever seems like a rather straightforward C++ application with sensi" [puppet] - 10https://gerrit.wikimedia.org/r/1278479 (https://phabricator.wikimedia.org/T424340) (owner: 10Btullis)
[13:34:35] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:35:08] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:35:22] <wikibugs>	 (03PS1) 10Zabe: Add script to fix fr_deleted drifts [extensions/WikimediaMaintenance] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280417 (https://phabricator.wikimedia.org/T424553)
[13:37:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P92072 and previous config saved to /var/cache/conftool/dbconfig/20260430-133756-fceratto.json
[13:39:27] <wikibugs>	 (03PS1) 10Zabe: Start reading from new file tables on testwiki (2nd try) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280418 (https://phabricator.wikimedia.org/T416548)
[13:41:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P92073 and previous config saved to /var/cache/conftool/dbconfig/20260430-134130-fceratto.json
[13:44:07] <wikibugs>	 (03CR) 10Zabe: [C:04-2] "not yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280418 (https://phabricator.wikimedia.org/T416548) (owner: 10Zabe)
[13:45:33] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:45:49] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:48:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T419635)', diff saved to https://phabricator.wikimedia.org/P92074 and previous config saved to /var/cache/conftool/dbconfig/20260430-134804-fceratto.json
[13:48:11] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[13:48:22] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[13:48:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1207 (T419635)', diff saved to https://phabricator.wikimedia.org/P92075 and previous config saved to /var/cache/conftool/dbconfig/20260430-134829-fceratto.json
[13:50:31] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.13 point update - https://phabricator.wikimedia.org/T414205#11875571 (10MoritzMuehlenhoff)
[13:50:41] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T419635)', diff saved to https://phabricator.wikimedia.org/P92076 and previous config saved to /var/cache/conftool/dbconfig/20260430-135040-fceratto.json
[13:51:17] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Add packages.qlever.org to reprepro as thirdparty/qlever [puppet] - 10https://gerrit.wikimedia.org/r/1278479 (https://phabricator.wikimedia.org/T424340) (owner: 10Btullis)
[13:51:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T419961)', diff saved to https://phabricator.wikimedia.org/P92077 and previous config saved to /var/cache/conftool/dbconfig/20260430-135139-fceratto.json
[13:52:00] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[13:52:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1195 (T419961)', diff saved to https://phabricator.wikimedia.org/P92078 and previous config saved to /var/cache/conftool/dbconfig/20260430-135207-fceratto.json
[13:54:01] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.reimage for host kafka-logging2005.codfw.wmnet with OS trixie
[13:54:29] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.move-vlan for host kafka-logging2005
[13:57:32] <logmsgbot>	 herron@cumin1003 reimage (PID 3049150) is awaiting input
[14:00:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T419961)', diff saved to https://phabricator.wikimedia.org/P92079 and previous config saved to /var/cache/conftool/dbconfig/20260430-140030-fceratto.json
[14:00:48] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:00:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P92080 and previous config saved to /var/cache/conftool/dbconfig/20260430-140048-fceratto.json
[14:02:34] <wikibugs>	 (03PS1) 10Elukey: Add Wikifunctions' evaluator ingress endpoints to service.yaml [puppet] - 10https://gerrit.wikimedia.org/r/1280433 (https://phabricator.wikimedia.org/T424193)
[14:02:36] <wikibugs>	 (03PS1) 10Elukey: Turn Wikifunctions evaluator endpoints to production state [puppet] - 10https://gerrit.wikimedia.org/r/1280434 (https://phabricator.wikimedia.org/T424193)
[14:02:39] <wikibugs>	 (03PS1) 10Elukey: profile::services_proxy::envoy: add wikifunctions eval endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1280435 (https://phabricator.wikimedia.org/T424193)
[14:03:55] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: Q3:rack/setup/install db1265-db1290 - https://phabricator.wikimedia.org/T418909#11875647 (10VRiley-WMF)
[14:04:33] <wikibugs>	 (03PS3) 10Herron: kafka-logging2005: update IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/1280431 (https://phabricator.wikimedia.org/T421712)
[14:05:39] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.dns.netbox
[14:06:42] <wikibugs>	 (03PS1) 10Santiago Faci: Test Kitchen UI: Deploy v1.3.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280441
[14:08:30] <wikibugs>	 (03PS1) 10Gkyziridis: ml-services: Deploy hotfix revertrisk-multilingual on prod. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280442
[14:09:20] <wikibugs>	 (03CR) 10Trueg: "Our Gitlab pipeline (https://gitlab.wikimedia.org/repos/wikidata-platform/wdqs/wdqs-qlever/-/blob/main/.gitlab-ci.yml) already contains a " [puppet] - 10https://gerrit.wikimedia.org/r/1278479 (https://phabricator.wikimedia.org/T424340) (owner: 10Btullis)
[14:10:38] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P92081 and previous config saved to /var/cache/conftool/dbconfig/20260430-141038-fceratto.json
[14:10:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P92082 and previous config saved to /var/cache/conftool/dbconfig/20260430-141057-fceratto.json
[14:11:03] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2005 - herron@cumin1003"
[14:11:09] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2005 - herron@cumin1003"
[14:11:09] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:11:09] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.dns.wipe-cache kafka-logging2005.codfw.wmnet 85.48.192.10.in-addr.arpa 5.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:11:13] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2005.codfw.wmnet 85.48.192.10.in-addr.arpa 5.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:11:14] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
[14:12:21] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
[14:12:21] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2005
[14:12:25] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1236361 (https://phabricator.wikimedia.org/T416174) (owner: 10Seawolf35gerrit)
[14:12:40] <wikibugs>	 10ops-magru: Alert for device asw1-b4-magru.mgmt.magru.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T419298#11875684 (10phaultfinder)
[14:18:06] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11875703 (10daniel) >>! In T414805#11875160, @A_smart_kitten wrote: > One potential difference that comes to mind is that -- potentia...
[14:20:47] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P92083 and previous config saved to /var/cache/conftool/dbconfig/20260430-142046-fceratto.json
[14:21:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T419635)', diff saved to https://phabricator.wikimedia.org/P92084 and previous config saved to /var/cache/conftool/dbconfig/20260430-142105-fceratto.json
[14:21:10] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[14:21:11] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[14:21:51] <wikibugs>	 (03CR) 10AKhatun: [C:03+2] alerts: mw-page-html-feature-counts-change-enrich [alerts] - 10https://gerrit.wikimedia.org/r/1278559 (https://phabricator.wikimedia.org/T424224) (owner: 10AKhatun)
[14:22:42] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.reimage for host wikikube-worker1378.eqiad.wmnet with OS trixie
[14:22:50] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11875740 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclar...
[14:23:33] <wikibugs>	 (03Merged) 10jenkins-bot: alerts: mw-page-html-feature-counts-change-enrich [alerts] - 10https://gerrit.wikimedia.org/r/1278559 (https://phabricator.wikimedia.org/T424224) (owner: 10AKhatun)
[14:23:59] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:24:03] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster logging-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=logging-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[14:24:04] <dancy>	 I'm investigating the SpiderPig problem 
[14:24:54] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:25:13] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:25:44] <wikibugs>	 10SRE-SLO, 06ServiceOps new, 06Data-Platform-SRE (2026-04-24 - 2026-05-15), 07Essential-Work, and 2 others: IPoid: Define service level indicators and service level objectives - https://phabricator.wikimedia.org/T348935#11875752 (10Gehel) We also have some general documentation of availability expectation...
[14:25:49] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:26:08] <wikibugs>	 (03PS1) 10VadymTS1: [eswiktionary] Switch $wgSignatureValidation to 'disallow' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280449 (https://phabricator.wikimedia.org/T424983)
[14:26:32] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
[14:26:44] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1230 (T419635)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260430-142639-fceratto.json
[14:26:53] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[14:27:18] <cscott>	 dancy: if you could ping me when things are back to normal, i'd like to stage two patches with a couple hours gap between them for cache mitigation reasons, so i'd really like to get a patch deployed "a couple of hours before" the late backport window today
[14:27:29] <dancy>	 cscott: Will do.
[14:27:29] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Q3:rack/setup/install rdb201[34] - https://phabricator.wikimedia.org/T418922#11875764 (10Jclark-ctr) Talked to @Jhancock.wm same issues with imaging eqiad servers T418916
[14:27:33] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
[14:28:05] <cscott>	 dancy: thanks! good luck on your expedition to the bug caves
[14:28:26] <cscott>	 (or is it spiderpig sty?)
[14:28:47] <cscott>	 do spiderpigs live in a sty, like pigs, or a web, like spiders?
[14:28:48] <wikibugs>	 10SRE-SLO, 06ServiceOps new, 06Data-Platform-SRE (2026-04-24 - 2026-05-15), 07Essential-Work, and 2 others: IPoid: Define service level indicators and service level objectives - https://phabricator.wikimedia.org/T348935#11875768 (10MLechvien-WMF) Thanks all for the work on this! @kostajh as you were origin...
[14:30:05] <jouncebot>	 Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1430)
[14:30:46] <logmsgbot>	 !log urbanecm@deploy1003 Finished scap sync-world: Backport for [[gerrit:1280368|ReassignMentees: Add logging information (T418194)]], [[gerrit:1280370|ReassignMentees: Add logging information (T418194)]] (duration: 131m 31s)
[14:30:52] <stashbot>	 T418194: Mentors still having mentees after removing themselves - https://phabricator.wikimedia.org/T418194
[14:30:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T419961)', diff saved to https://phabricator.wikimedia.org/P92086 and previous config saved to /var/cache/conftool/dbconfig/20260430-143054-fceratto.json
[14:31:14] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.252.0" for 2 host(s)
[14:31:16] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[14:31:36] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[14:31:44] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1196 (T419961)', diff saved to https://phabricator.wikimedia.org/P92087 and previous config saved to /var/cache/conftool/dbconfig/20260430-143143-fceratto.json
[14:31:51] <icinga-wm>	 PROBLEM - Host kafka-logging2005 is DOWN: PING CRITICAL - Packet loss = 100%
[14:32:26] <herron>	 ^^ that's me reimaging
[14:33:00] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.252.0" completed for 2 hosts
[14:33:04] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Sounds good, the base libraries of gnutls are already universally installed anyway." [puppet] - 10https://gerrit.wikimedia.org/r/1279491 (https://phabricator.wikimedia.org/T424672) (owner: 10Bking)
[14:33:31] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
[14:34:33] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.255.0" for 2 host(s)
[14:34:36] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1378.eqiad.wmnet with reason: host reimage
[14:35:41] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280449 (https://phabricator.wikimedia.org/T424983) (owner: 10VadymTS1)
[14:36:14] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.255.0" completed for 2 hosts
[14:36:35] <dancy>	 cscott: Go ahead with your deployment.  
[14:36:52] <icinga-wm>	 RECOVERY - Host kafka-logging2005 is UP: PING OK - Packet loss = 0%, RTA = 31.60 ms
[14:37:18] <wikibugs>	 (03CR) 10Bking: [C:03+2] cumin: install gnutls-bin package [puppet] - 10https://gerrit.wikimedia.org/r/1279491 (https://phabricator.wikimedia.org/T424672) (owner: 10Bking)
[14:37:52] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1378.eqiad.wmnet with reason: host reimage
[14:38:28] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+2] ml-services: Deploy hotfix revertrisk-multilingual on prod. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280442 (owner: 10Gkyziridis)
[14:40:34] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Deploy hotfix revertrisk-multilingual on prod. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280442 (owner: 10Gkyziridis)
[14:40:44] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11875866 (10RobH) >>! In T408892#11873637, @Papaul wrote: > @RobH Remote hands instructions are ready @ https://docs.google.com/document/d/1EW6hxHCQjXPy1PXQWlu...
[14:41:06] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T419961)', diff saved to https://phabricator.wikimedia.org/P92088 and previous config saved to /var/cache/conftool/dbconfig/20260430-144105-fceratto.json
[14:41:18] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[14:41:25] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[14:41:50] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:42:25] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wikifunctions/python-evaluator on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[14:42:26] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[14:45:48] <moritzm>	 !log installing pdns security updates
[14:45:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:46] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[14:49:39] <wikibugs>	 (03CR) 10Phuedx: [C:03+1] Test Kitchen UI: Deploy v1.3.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280441 (owner: 10Santiago Faci)
[14:50:39] <cscott>	 dancy: thanks!
[14:50:49] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:51:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P92089 and previous config saved to /var/cache/conftool/dbconfig/20260430-145112-fceratto.json
[14:51:33] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279453 (https://phabricator.wikimedia.org/T424880) (owner: 10C. Scott Ananian)
[14:52:30] <wikibugs>	 (03Merged) 10jenkins-bot: Increase Parsoid Read Views to 60% of enwiki mobile web traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279453 (https://phabricator.wikimedia.org/T424880) (owner: 10C. Scott Ananian)
[14:53:00] <logmsgbot>	 !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1279453|Increase Parsoid Read Views to 60% of enwiki mobile web traffic (T424880)]]
[14:53:05] <stashbot>	 T424880: Parsoid Read Views to deploy 2026-04-29-2026-04-30 (enwiki mobile web) - https://phabricator.wikimedia.org/T424880
[14:53:36] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[14:54:04] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[14:54:55] <logmsgbot>	 !log cscott@deploy1003 cscott: Backport for [[gerrit:1279453|Increase Parsoid Read Views to 60% of enwiki mobile web traffic (T424880)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:55:34] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
[14:55:35] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1378.eqiad.wmnet with OS trixie
[14:55:42] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11875934 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cu...
[14:56:36] <wikibugs>	 (03CR) 10Santiago Faci: [C:03+2] Test Kitchen UI: Deploy v1.3.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280441 (owner: 10Santiago Faci)
[14:57:50] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] admin: extend expiry_date for sarmbruster by 1 month [puppet] - 10https://gerrit.wikimedia.org/r/1279482 (https://phabricator.wikimedia.org/T424402) (owner: 10Dzahn)
[14:58:09] <wikibugs>	 (03PS1) 10Eevans: linked-artifacts: deploy hoarde v1.2.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280469 (https://phabricator.wikimedia.org/T424545)
[14:58:35] <wikibugs>	 (03Merged) 10jenkins-bot: Test Kitchen UI: Deploy v1.3.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280441 (owner: 10Santiago Faci)
[14:58:42] <wikibugs>	 (03PS2) 10Dzahn: admin: extend expiry_date for sarmbruster by 1 month [puppet] - 10https://gerrit.wikimedia.org/r/1279482 (https://phabricator.wikimedia.org/T424402)
[14:59:05] <elukey>	 herron: o/ if you are reimaging a kafka 3.7 node to trixie, keep https://wikitech.wikimedia.org/wiki/Kafka/Administration#Upgrade_to_Debian_Trixie in mind
[14:59:34] <herron>	 elukey: thanks was just working on this part
[14:59:39] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[14:59:53] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:00:12] <wikibugs>	 (03CR) 10Dzahn: "rebased to nothing - because done in 8897a46aae1185bd" [puppet] - 10https://gerrit.wikimedia.org/r/1279482 (https://phabricator.wikimedia.org/T424402) (owner: 10Dzahn)
[15:00:25] <wikibugs>	 (03Abandoned) 10Dzahn: admin: extend expiry_date for sarmbruster by 1 month [puppet] - 10https://gerrit.wikimedia.org/r/1279482 (https://phabricator.wikimedia.org/T424402) (owner: 10Dzahn)
[15:01:12] <wikibugs>	 (03CR) 10Eevans: [C:03+2] linked-artifacts: deploy hoarde v1.2.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280469 (https://phabricator.wikimedia.org/T424545) (owner: 10Eevans)
[15:01:21] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P92091 and previous config saved to /var/cache/conftool/dbconfig/20260430-150120-fceratto.json
[15:01:30] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Extend wmde/nda LDAP access for Sarmbruster - https://phabricator.wikimedia.org/T424402#11875959 (10Dzahn) already done by Moritz with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1280055
[15:02:22] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: Extend wmde/nda LDAP access for Sarmbruster - https://phabricator.wikimedia.org/T424402#11875974 (10Dzahn) 05In progress→03Resolved a:03MoritzMuehlenhoff
[15:02:29] <logmsgbot>	 !log cscott@deploy1003 cscott: Continuing with deployment
[15:03:12] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[15:03:20] <wikibugs>	 (03Merged) 10jenkins-bot: linked-artifacts: deploy hoarde v1.2.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280469 (https://phabricator.wikimedia.org/T424545) (owner: 10Eevans)
[15:04:23] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] zuul: Upgrade to Zuul 14.2.0 [puppet] - 10https://gerrit.wikimedia.org/r/1279500 (https://phabricator.wikimedia.org/T424879) (owner: 10Dduvall)
[15:04:43] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[15:05:21] <logmsgbot>	 !log eevans@deploy1003 helmfile [staging] START helmfile.d/services/linked-artifacts: apply
[15:05:36] <logmsgbot>	 !log eevans@deploy1003 helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
[15:06:16] <logmsgbot>	 !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1279453|Increase Parsoid Read Views to 60% of enwiki mobile web traffic (T424880)]] (duration: 13m 15s)
[15:06:20] <stashbot>	 T424880: Parsoid Read Views to deploy 2026-04-29-2026-04-30 (enwiki mobile web) - https://phabricator.wikimedia.org/T424880
[15:06:34] <cscott>	 dancy: ok, i'm done now.  thanks!
[15:06:41] <logmsgbot>	 !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[15:06:49] <wikibugs>	 (03PS3) 10Herron: kafka-logging2005: use jdk 21 in trixie [puppet] - 10https://gerrit.wikimedia.org/r/1280467 (https://phabricator.wikimedia.org/T417001)
[15:07:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by bearloga@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1270454 (https://phabricator.wikimedia.org/T422001) (owner: 10Bearloga)
[15:08:33] <wikibugs>	 (03Merged) 10jenkins-bot: EventStreamConfig: remove ABST contextual attribute [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1270454 (https://phabricator.wikimedia.org/T422001) (owner: 10Bearloga)
[15:08:56] <logmsgbot>	 !log bearloga@deploy1003 Started scap sync-world: Backport for [[gerrit:1270454|EventStreamConfig: remove ABST contextual attribute (T422001)]]
[15:09:01] <stashbot>	 T422001: '.performer.active_browsing_session_token' should NOT be shorter than 20 characters - https://phabricator.wikimedia.org/T422001
[15:10:51] <logmsgbot>	 !log bearloga@deploy1003 bearloga: Backport for [[gerrit:1270454|EventStreamConfig: remove ABST contextual attribute (T422001)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:11:01] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2005.codfw.wmnet with OS trixie
[15:11:29] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T419961)', diff saved to https://phabricator.wikimedia.org/P92092 and previous config saved to /var/cache/conftool/dbconfig/20260430-151128-fceratto.json
[15:11:42] <wikibugs>	 (03PS1) 10PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280479
[15:11:49] <logmsgbot>	 !log bearloga@deploy1003 bearloga: Continuing with deployment
[15:11:50] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[15:11:58] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1206 (T419961)', diff saved to https://phabricator.wikimedia.org/P92093 and previous config saved to /var/cache/conftool/dbconfig/20260430-151157-fceratto.json
[15:14:03] <jinxer-wm>	 RESOLVED: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster logging-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=logging-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[15:16:22] <logmsgbot>	 !log bearloga@deploy1003 Finished scap sync-world: Backport for [[gerrit:1270454|EventStreamConfig: remove ABST contextual attribute (T422001)]] (duration: 07m 25s)
[15:16:28] <stashbot>	 T422001: '.performer.active_browsing_session_token' should NOT be shorter than 20 characters - https://phabricator.wikimedia.org/T422001
[15:20:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T419961)', diff saved to https://phabricator.wikimedia.org/P92094 and previous config saved to /var/cache/conftool/dbconfig/20260430-152011-fceratto.json
[15:20:15] <jinxer-wm>	 FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:20:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[15:22:12] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10observability, 13Patch-For-Review: Q4:rack/setup/install kafka-logging100[6-8] - https://phabricator.wikimedia.org/T418929#11876109 (10elukey) Deployed the spicerack changes, now I am testing https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1266257 to bypass the roo...
[15:25:15] <jinxer-wm>	 RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:25:20] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11876119 (10A_smart_kitten) >>! In T414805#11875703, @daniel wrote: > APIs are maintained as stable interfaces, their evolution is su...
[15:25:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[15:25:32] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Repurpose tools-k8s-ctrl[1001-1002],tools-k8s-worker[1001-1008] to wikikube-worker13{75-84} - https://phabricator.wikimedia.org/T423719#11876121 (10Jclark-ctr) 05Open→03Resolved
[15:28:24] <wikibugs>	 (03PS3) 10C. Scott Ananian: Increase Parsoid Read Views to 100% of enwiki mobile web traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279454 (https://phabricator.wikimedia.org/T424880)
[15:28:24] <wikibugs>	 (03PS1) 10C. Scott Ananian: Enable Parsoid postprocessing cache on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280491 (https://phabricator.wikimedia.org/T424880)
[15:29:48] <cscott>	 dancy: is the window still clear?  turns out i need a follow up to the patch I just deployed: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1280491
[15:30:00] <dancy>	 Yep.
[15:30:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P92095 and previous config saved to /var/cache/conftool/dbconfig/20260430-153019-fceratto.json
[15:30:29] <cscott>	 ok, i'm going to spiderpig that patch out if that's ok.
[15:30:39] <dancy>	 OK with me.
[15:31:46] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280491 (https://phabricator.wikimedia.org/T424880) (owner: 10C. Scott Ananian)
[15:33:05] <wikibugs>	 (03Merged) 10jenkins-bot: Enable Parsoid postprocessing cache on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280491 (https://phabricator.wikimedia.org/T424880) (owner: 10C. Scott Ananian)
[15:33:32] <logmsgbot>	 !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1280491|Enable Parsoid postprocessing cache on enwiki (T424880)]]
[15:33:39] <stashbot>	 T424880: Parsoid Read Views to deploy 2026-04-29-2026-04-30 (enwiki mobile web) - https://phabricator.wikimedia.org/T424880
[15:35:25] <logmsgbot>	 !log cscott@deploy1003 cscott: Backport for [[gerrit:1280491|Enable Parsoid postprocessing cache on enwiki (T424880)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:37:54] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11876203 (10RobH) >>! In T408892#11875866, @RobH wrote: >>>! In T408892#11873637, @Papaul wrote: >> @RobH Remote hands instructions are ready @ https://docs.go...
[15:37:58] <logmsgbot>	 !log cscott@deploy1003 cscott: Continuing with deployment
[15:40:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P92096 and previous config saved to /var/cache/conftool/dbconfig/20260430-154027-fceratto.json
[15:41:45] <logmsgbot>	 !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1280491|Enable Parsoid postprocessing cache on enwiki (T424880)]] (duration: 08m 13s)
[15:41:53] <stashbot>	 T424880: Parsoid Read Views to deploy 2026-04-29-2026-04-30 (enwiki mobile web) - https://phabricator.wikimedia.org/T424880
[15:41:59] <cscott>	 dancy: ok, done.  for real this time i hope.
[15:44:01] <wikibugs>	 06SRE, 06collaboration-services, 10Wikimedia-Mailing-lists: Figure out plan for mailman IP situation - https://phabricator.wikimedia.org/T278495#11876239 (10Ladsgroup) Amazing. Thank you!!! \o/
[15:44:31] <dancy>	 cscott: Thanks, and good luck!
[15:44:41] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.256.0" for 2 host(s)
[15:45:30] <cscott>	 the cache save rate is rising as page views transition, but so far nothing alarming.  🤞
[15:46:32] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.256.0" completed for 2 hosts
[15:50:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T419961)', diff saved to https://phabricator.wikimedia.org/P92098 and previous config saved to /var/cache/conftool/dbconfig/20260430-155034-fceratto.json
[15:50:55] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[15:51:03] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1218 (T419961)', diff saved to https://phabricator.wikimedia.org/P92099 and previous config saved to /var/cache/conftool/dbconfig/20260430-155102-fceratto.json
[16:00:05] <jouncebot>	 jhathaway and rzl: Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1600). Please do the needful.
[16:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:03:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T419961)', diff saved to https://phabricator.wikimedia.org/P92100 and previous config saved to /var/cache/conftool/dbconfig/20260430-160307-fceratto.json
[16:05:45] <wikibugs>	 (03PS1) 10AKhatun: stream: mw-page-html-feature-counts-change-enrich; increase source parallelism to 6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280508 (https://phabricator.wikimedia.org/T423920)
[16:06:24] <zabe>	 jouncebot: nowandnext
[16:06:24] <jouncebot>	 For the next 0 hour(s) and 53 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1600)
[16:06:24] <jouncebot>	 In 0 hour(s) and 53 minute(s): Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1700)
[16:06:24] <jouncebot>	 In 0 hour(s) and 53 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1700)
[16:09:20] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:11:27] <rzl>	 zabe: puppet window isn't in use today, as you could probably tell
[16:13:16] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P92103 and previous config saved to /var/cache/conftool/dbconfig/20260430-161315-fceratto.json
[16:13:50] <wikibugs>	 06SRE, 10observability: Observability: Re-IP codfw private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T422816#11876405 (10herron) Today I reimaged kafka-logging2005 with `--move-vlan` and afterwards the node is having trouble rejoining the cluster.  I'm seeing errors like...
[16:16:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:17:19] <zabe>	 thx
[16:18:07] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06ServiceOps new, 10ServiceOps-Upgrades-Hardware: Q3:rack/setup/install wikikube-worker23[57-74] - https://phabricator.wikimedia.org/T418925#11876423 (10Jhancock.wm)
[16:18:17] <wikibugs>	 (03PS1) 10Gkyziridis: ml-services: Roll back to the previous model revertrisk-multilingual. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280514
[16:20:50] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+2] ml-services: Roll back to the previous model revertrisk-multilingual. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280514 (owner: 10Gkyziridis)
[16:22:28] <wikibugs>	 (03CR) 10JavierMonton: [C:03+1] stream: mw-page-html-feature-counts-change-enrich; increase source parallelism to 6 (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280508 (https://phabricator.wikimedia.org/T423920) (owner: 10AKhatun)
[16:22:53] <wikibugs>	 (03PS1) 10Atsuko: dse-k8s: deploy additional opensearch clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280515 (https://phabricator.wikimedia.org/T424248)
[16:22:57] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Roll back to the previous model revertrisk-multilingual. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280514 (owner: 10Gkyziridis)
[16:23:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P92104 and previous config saved to /var/cache/conftool/dbconfig/20260430-162323-fceratto.json
[16:23:56] <wikibugs>	 10ops-ulsfo, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11876468 (10Papaul) Yes I can take care of that.
[16:24:20] <wikibugs>	 (03PS2) 10AKhatun: stream: mw-page-html-feature-counts-change-enrich; increase source parallelism to 6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280508 (https://phabricator.wikimedia.org/T423920)
[16:25:50] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[16:26:02] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[16:26:35] <wikibugs>	 (03CR) 10AKhatun: [C:03+2] stream: mw-page-html-feature-counts-change-enrich; increase source parallelism to 6 (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280508 (https://phabricator.wikimedia.org/T423920) (owner: 10AKhatun)
[16:28:35] <wikibugs>	 (03Merged) 10jenkins-bot: stream: mw-page-html-feature-counts-change-enrich; increase source parallelism to 6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280508 (https://phabricator.wikimedia.org/T423920) (owner: 10AKhatun)
[16:30:01] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:30:04] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:31:07] <wikibugs>	 (03CR) 10CDanis: fundraising_data_import maintenance script wrapper & timer (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1271028 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis)
[16:31:57] <wikibugs>	 (03PS1) 10Medelius: Suggestion Mode controlled experiment: limit exposure to newcomers [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280516 (https://phabricator.wikimedia.org/T422736)
[16:32:17] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280516 (https://phabricator.wikimedia.org/T422736) (owner: 10Medelius)
[16:33:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T419961)', diff saved to https://phabricator.wikimedia.org/P92105 and previous config saved to /var/cache/conftool/dbconfig/20260430-163332-fceratto.json
[16:33:52] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
[16:34:00] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1219 (T419961)', diff saved to https://phabricator.wikimedia.org/P92106 and previous config saved to /var/cache/conftool/dbconfig/20260430-163400-fceratto.json
[16:34:20] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:40:32] <wikibugs>	 (03PS2) 10Atsuko: dse-k8s: deploy additional opensearch clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280515 (https://phabricator.wikimedia.org/T424248)
[16:42:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1219 (T419961)', diff saved to https://phabricator.wikimedia.org/P92107 and previous config saved to /var/cache/conftool/dbconfig/20260430-164211-fceratto.json
[16:52:22] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P92108 and previous config saved to /var/cache/conftool/dbconfig/20260430-165221-fceratto.json
[16:52:42] <wikibugs>	 10ops-eqiad, 06DC-Ops: Power Supply - PS1 Status - issue on wikikube-worker1378:9290 - https://phabricator.wikimedia.org/T425015 (10phaultfinder) 03NEW
[17:00:05] <jouncebot>	 bd808: #bothumor I � Unicode. All rise for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1700).
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1700)
[17:02:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P92109 and previous config saved to /var/cache/conftool/dbconfig/20260430-170229-fceratto.json
[17:03:04] <wikibugs>	 (03PS1) 10Jasmine: sophroid: define nodePort to utilze custom load balancers, [0] [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748)
[17:04:41] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.257.0" for 2 host(s)
[17:04:51] <wikibugs>	 10ops-eqiad, 06DC-Ops: Power Supply - PS1 Status - issue on wikikube-worker1378:9290 - https://phabricator.wikimedia.org/T425015#11876581 (10Jclark-ctr) a:03Jclark-ctr
[17:06:32] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.257.0" completed for 2 hosts
[17:09:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[17:10:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[17:10:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:12:38] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1219 (T419961)', diff saved to https://phabricator.wikimedia.org/P92110 and previous config saved to /var/cache/conftool/dbconfig/20260430-171237-fceratto.json
[17:12:59] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
[17:13:07] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1232 (T419961)', diff saved to https://phabricator.wikimedia.org/P92111 and previous config saved to /var/cache/conftool/dbconfig/20260430-171306-fceratto.json
[17:13:26] <logmsgbot>	 !log gengh@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[17:13:34] <logmsgbot>	 !log gengh@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[17:14:34] <logmsgbot>	 !log gengh@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: sync
[17:14:38] <logmsgbot>	 !log gengh@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
[17:15:23] <wikibugs>	 (03PS3) 10Dduvall: zuul: create profile for new zuul-launcher replacing nodepool [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[17:15:32] <wikibugs>	 (03PS2) 10Jasmine: sophroid: define nodePort to utilze custom load balancers, [0] [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748)
[17:15:45] <wikibugs>	 (03PS4) 10Dduvall: zuul: create profile for new zuul-launcher replacing nodepool [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[17:17:00] <wikibugs>	 (03CR) 10Dduvall: "Sorry, Daniel. I messed up in the task description. It's actually zuul-launcher, not zuul-builder. I renamed everything. I think this also" [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[17:19:57] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence, 10MediaViewer, 10Thumbor, and 6 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11876609 (10Nux) >>! In T414805#11875703, @daniel wrote: >>>! In T414805#11875160, @A_smart_kitten wrote: >> While it may be true tha...
[17:21:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T419961)', diff saved to https://phabricator.wikimedia.org/P92112 and previous config saved to /var/cache/conftool/dbconfig/20260430-172119-fceratto.json
[17:22:10] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release wikifunctions/python-evaluator on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=wikifunctions - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[17:22:21] <logmsgbot>	 !log gengh@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[17:22:39] <logmsgbot>	 !log gengh@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[17:23:32] <jinxer-wm>	 FIRING: Outbound discards: Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[17:23:50] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10GitLab (CI & Job Runners), 13Patch-For-Review, 06Release-Engineering-Team (Priority Backlog 📥): Update default GitLab runner image to a base image without mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423971#11876616 (10dancy) 05Open→03Resolved...
[17:28:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:31:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P92113 and previous config saved to /var/cache/conftool/dbconfig/20260430-173127-fceratto.json
[17:34:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[17:41:36] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P92114 and previous config saved to /var/cache/conftool/dbconfig/20260430-174135-fceratto.json
[17:43:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUns
[17:51:44] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T419961)', diff saved to https://phabricator.wikimedia.org/P92115 and previous config saved to /var/cache/conftool/dbconfig/20260430-175143-fceratto.json
[17:52:04] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
[17:52:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1234 (T419961)', diff saved to https://phabricator.wikimedia.org/P92116 and previous config saved to /var/cache/conftool/dbconfig/20260430-175211-fceratto.json
[17:57:10] <wikibugs>	 (03PS5) 10Dduvall: zuul: create profile for new zuul-launcher replacing nodepool [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[17:57:38] <wikibugs>	 (03CR) 10Dduvall: "Added kubeconfig for zuul-launcher and a new connection section to zuul.conf." [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[17:57:47] <wikibugs>	 (03CR) 10CDobbins: [C:03+2] wikimedia.org: Add TXT verification for Claude [dns] - 10https://gerrit.wikimedia.org/r/1279402 (https://phabricator.wikimedia.org/T424785) (owner: 10CDobbins)
[18:00:05] <jouncebot>	 jeena and dduvall: That opportune time for a MediaWiki train - Utc-7 Version deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T1800).
[18:00:37] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1234 (T419961)', diff saved to https://phabricator.wikimedia.org/P92117 and previous config saved to /var/cache/conftool/dbconfig/20260430-180036-fceratto.json
[18:04:14] <logmsgbot>	 !log cdobbins@dns1005 START - running authdns-update
[18:05:53] <logmsgbot>	 !log cdobbins@dns1005 END - running authdns-update
[18:07:33] <wikibugs>	 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: [Update DNS Record Request] - wikimedia.org - Add TXT verification for Anthropic - https://phabricator.wikimedia.org/T424785#11876724 (10CDobbins) 05Open→03In progress p:05Triage→03Medium
[18:08:31] <wikibugs>	 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: [Update DNS Record Request] - wikimedia.org - Add TXT verification for Anthropic - https://phabricator.wikimedia.org/T424785#11876731 (10CDobbins) I just updated our DNS records, @bcampbell. Let me know if there's any unexpected behavior or if I can close the ti...
[18:08:35] <wikibugs>	 (03PS1) 10TrainBranchBot: group2 to 1.46.0-wmf.26 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280551 (https://phabricator.wikimedia.org/T423877)
[18:08:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Initiated by jhuneidi@deploy1003" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280551 (https://phabricator.wikimedia.org/T423877) (owner: 10TrainBranchBot)
[18:08:50] <wikibugs>	 (03PS1) 10Medelius: Abandon the editor survey: update edit count restriction [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280552 (https://phabricator.wikimedia.org/T422931)
[18:09:02] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280552 (https://phabricator.wikimedia.org/T422931) (owner: 10Medelius)
[18:09:33] <wikibugs>	 (03Merged) 10jenkins-bot: group2 to 1.46.0-wmf.26 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280551 (https://phabricator.wikimedia.org/T423877) (owner: 10TrainBranchBot)
[18:10:45] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P92118 and previous config saved to /var/cache/conftool/dbconfig/20260430-181044-fceratto.json
[18:15:13] <logmsgbot>	 !log jhuneidi@deploy1003 rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.26  refs T423877
[18:15:17] <stashbot>	 T423877: 1.46.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T423877
[18:20:25] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Abandon the editor survey: update edit count restriction [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280552 (https://phabricator.wikimedia.org/T422931) (owner: 10Medelius)
[18:20:53] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P92119 and previous config saved to /var/cache/conftool/dbconfig/20260430-182052-fceratto.json
[18:28:06] <wikibugs>	 (03CR) 10Medelius: "recheck" [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280552 (https://phabricator.wikimedia.org/T422931) (owner: 10Medelius)
[18:28:06] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[18:28:11] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:29:51] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[18:30:13] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and 208.80.153.221 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:31:02] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1234 (T419961)', diff saved to https://phabricator.wikimedia.org/P92120 and previous config saved to /var/cache/conftool/dbconfig/20260430-183100-fceratto.json
[18:31:23] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
[18:31:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1235 (T419961)', diff saved to https://phabricator.wikimedia.org/P92121 and previous config saved to /var/cache/conftool/dbconfig/20260430-183130-fceratto.json
[18:35:13] <jinxer-wm>	 RESOLVED: [3x] BFDdown: BFD session down between cr1-codfw and 208.80.153.220 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[18:44:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1235 (T419961)', diff saved to https://phabricator.wikimedia.org/P92122 and previous config saved to /var/cache/conftool/dbconfig/20260430-184439-fceratto.json
[18:44:51] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[18:54:51] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P92123 and previous config saved to /var/cache/conftool/dbconfig/20260430-185451-fceratto.json
[19:04:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P92124 and previous config saved to /var/cache/conftool/dbconfig/20260430-190459-fceratto.json
[19:12:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10observability, 13Patch-For-Review: Q4:rack/setup/install kafka-logging100[6-8] - https://phabricator.wikimedia.org/T418929#11876957 (10elukey) @Jclark-ctr if you have time could you please check the status of the 1007's BMC? Like if you are able to access the WebUI somehow...
[19:15:07] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1235 (T419961)', diff saved to https://phabricator.wikimedia.org/P92125 and previous config saved to /var/cache/conftool/dbconfig/20260430-191507-fceratto.json
[19:15:20] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1251.eqiad.wmnet with reason: Maintenance
[19:15:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1251 (T419961)', diff saved to https://phabricator.wikimedia.org/P92126 and previous config saved to /var/cache/conftool/dbconfig/20260430-191527-fceratto.json
[19:23:13] <wikibugs>	 (03PS3) 10Jasmine: sophroid: define nodePort to utilze custom load balancers [0] [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748)
[19:24:07] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1251 (T419961)', diff saved to https://phabricator.wikimedia.org/P92127 and previous config saved to /var/cache/conftool/dbconfig/20260430-192407-fceratto.json
[19:34:15] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P92128 and previous config saved to /var/cache/conftool/dbconfig/20260430-193415-fceratto.json
[19:34:57] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thanks, Jasmine!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748) (owner: 10Jasmine)
[19:41:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:44:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P92129 and previous config saved to /var/cache/conftool/dbconfig/20260430-194423-fceratto.json
[19:54:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1251 (T419961)', diff saved to https://phabricator.wikimedia.org/P92130 and previous config saved to /var/cache/conftool/dbconfig/20260430-195431-fceratto.json
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: May I have your attention please! UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T2000)
[20:00:05] <jouncebot>	 cscott, VadymTS1, and cmede: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:20] <cmede>	 o/
[20:00:24] <cscott>	 o/
[20:00:56] <cscott>	 I'm going to spiderpig mine straight out of the gate here, since I'd like to be able to watch the cache stats during the duration of the backport window, in case I need to dial things back.
[20:01:20] <VadymTS1>	 I'm here
[20:01:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279454 (https://phabricator.wikimedia.org/T424880) (owner: 10C. Scott Ananian)
[20:02:32] <wikibugs>	 (03Merged) 10jenkins-bot: Increase Parsoid Read Views to 100% of enwiki mobile web traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279454 (https://phabricator.wikimedia.org/T424880) (owner: 10C. Scott Ananian)
[20:02:49] <logmsgbot>	 !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1279454|Increase Parsoid Read Views to 100% of enwiki mobile web traffic (T424880)]]
[20:02:54] <stashbot>	 T424880: Parsoid Read Views to deploy 2026-04-29-2026-04-30 (enwiki mobile web) - https://phabricator.wikimedia.org/T424880
[20:04:32] <logmsgbot>	 !log cscott@deploy1003 cscott: Backport for [[gerrit:1279454|Increase Parsoid Read Views to 100% of enwiki mobile web traffic (T424880)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:06:02] <logmsgbot>	 !log cscott@deploy1003 cscott: Continuing with deployment
[20:06:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[20:09:20] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:09:50] <logmsgbot>	 !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1279454|Increase Parsoid Read Views to 100% of enwiki mobile web traffic (T424880)]] (duration: 07m 01s)
[20:09:55] <stashbot>	 T424880: Parsoid Read Views to deploy 2026-04-29-2026-04-30 (enwiki mobile web) - https://phabricator.wikimedia.org/T424880
[20:10:05] <wikibugs>	 (03CR) 10Jasmine: [C:03+2] sophroid: define nodePort to utilze custom load balancers [0] [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748) (owner: 10Jasmine)
[20:10:13] <wikibugs>	 (03CR) 10RLazarus: [C:03+1] sophroid: define nodePort to utilze custom load balancers [0] [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748) (owner: 10Jasmine)
[20:10:14] <cscott>	 ok, i'm done
[20:10:14] <cscott>	 1
[20:10:28] <kostajh>	 hi, I’d like to add something to the window
[20:10:28] <cscott>	 VadymTS1: over to you
[20:10:58] <VadymTS1>	 ok
[20:11:02] <VadymTS1>	 let's start
[20:11:11] <wikibugs>	 (03PS1) 10Kosta Harlan: hCaptcha: Label load and execute duration metrics with outcome [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280656 (https://phabricator.wikimedia.org/T421204)
[20:11:23] <wikibugs>	 (03PS1) 10Kosta Harlan: hCaptcha: Reduce default MAX_LOAD_ATTEMPTS from 10 to 6 [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280657 (https://phabricator.wikimedia.org/T421204)
[20:11:26] <cscott>	 kostajh: schedule-deployment on gerrit works up until the window closes, i believe. :)
[20:11:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[20:11:55] <kostajh>	 yep, will add it
[20:12:04] <cscott>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: who's the deployer for this window?
[20:12:07] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280657 (https://phabricator.wikimedia.org/T421204) (owner: 10Kosta Harlan)
[20:12:08] <wikibugs>	 (03Merged) 10jenkins-bot: sophroid: define nodePort to utilze custom load balancers [0] [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280521 (https://phabricator.wikimedia.org/T418748) (owner: 10Jasmine)
[20:12:23] <jeena>	 I can deploy
[20:12:32] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, April 30 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280656 (https://phabricator.wikimedia.org/T421204) (owner: 10Kosta Harlan)
[20:12:39] <VadymTS1>	 thanks jeena
[20:12:41] <cscott>	 i've finished my config patch, we're up to VadymTS1 in the corner
[20:12:42] <kostajh>	 my patches can be synced together 
[20:12:46] <cscott>	 *order, not corner :)
[20:12:54] <kostajh>	 and they do not need to be verified either
[20:13:16] <RoanKattouw>	 Thanks jeena. Sorry I'm at the hackathon and heading to bed 
[20:13:27] <TheresNoTime>	 You have a deployer now?
[20:14:07] <jeena>	 TheresNoTime: yeah all god
[20:14:10] <jeena>	 good*
[20:14:22] <TheresNoTime>	 cool :)
[20:15:00] <jeena>	 VadymTS1: is it fine to deploy all your changes together?
[20:15:33] <VadymTS1>	 I think yes
[20:15:41] <VadymTS1>	 I don't see problems
[20:17:14] <VadymTS1>	 It's not like it's forbidden
[20:18:31] <jeena>	 yes of course, just making sure!
[20:18:37] <jeena>	 I will proceed now
[20:20:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1274928 (https://phabricator.wikimedia.org/T423461) (owner: 10Codename Noreste)
[20:20:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279477 (https://phabricator.wikimedia.org/T424898) (owner: 10VadymTS1)
[20:20:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1247186 (https://phabricator.wikimedia.org/T418815) (owner: 10MGChecker)
[20:20:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1236361 (https://phabricator.wikimedia.org/T416174) (owner: 10Seawolf35gerrit)
[20:20:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280449 (https://phabricator.wikimedia.org/T424983) (owner: 10VadymTS1)
[20:21:19] <wikibugs>	 (03Merged) 10jenkins-bot: ukwiki: Remove the patroller user group and adjust various user rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1274928 (https://phabricator.wikimedia.org/T423461) (owner: 10Codename Noreste)
[20:21:22] <wikibugs>	 (03Merged) 10jenkins-bot: nlwiki: Modify autoconfirmed requirements for nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1279477 (https://phabricator.wikimedia.org/T424898) (owner: 10VadymTS1)
[20:21:26] <wikibugs>	 (03Merged) 10jenkins-bot: dewiki: Add abusefilter group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1247186 (https://phabricator.wikimedia.org/T418815) (owner: 10MGChecker)
[20:21:30] <wikibugs>	 (03Merged) 10jenkins-bot: Add map domains for ruwiki to the list of externallinks-excluded domains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1236361 (https://phabricator.wikimedia.org/T416174) (owner: 10Seawolf35gerrit)
[20:21:33] <wikibugs>	 (03Merged) 10jenkins-bot: [eswiktionary] Switch $wgSignatureValidation to 'disallow' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1280449 (https://phabricator.wikimedia.org/T424983) (owner: 10VadymTS1)
[20:21:47] <logmsgbot>	 !log jhuneidi@deploy1003 Started scap sync-world: Backport for [[gerrit:1274928|ukwiki: Remove the patroller user group and adjust various user rights (T423461)]], [[gerrit:1279477|nlwiki: Modify autoconfirmed requirements for nlwiki (T424898)]], [[gerrit:1247186|dewiki: Add abusefilter group (T418815)]], [[gerrit:1236361|Add map domains for ruwiki to the list of externallinks-excluded domains (T416174)]], [[gerrit:128044
[20:21:47] <logmsgbot>	 9|[eswiktionary] Switch $wgSignatureValidation to 'disallow' (T424983)]]
[20:21:57] <stashbot>	 T423461: Turn off patrolling in ukwiki - https://phabricator.wikimedia.org/T423461
[20:21:57] <stashbot>	 T424898: Modify autoconfirmed requirements for nlwiki - https://phabricator.wikimedia.org/T424898
[20:21:57] <stashbot>	 T418815: Add abusefilter group to dewiki - https://phabricator.wikimedia.org/T418815
[20:21:58] <stashbot>	 T416174: Add map domains for ruwiki to the list of externallinks-excluded domains (wgExternalLinksIgnoreDomains) - https://phabricator.wikimedia.org/T416174
[20:21:58] <stashbot>	 T424983: Set $wgSignatureValidation to 'disallow' on Spanish Wiktionary - https://phabricator.wikimedia.org/T424983
[20:23:30] <logmsgbot>	 !log jhuneidi@deploy1003 vadymts1, seawolf35gerrit, jhuneidi, codenamenoreste, mgchecker: Backport for [[gerrit:1274928|ukwiki: Remove the patroller user group and adjust various user rights (T423461)]], [[gerrit:1279477|nlwiki: Modify autoconfirmed requirements for nlwiki (T424898)]], [[gerrit:1247186|dewiki: Add abusefilter group (T418815)]], [[gerrit:1236361|Add map domains for ruwiki to the list of externallinks-exclu
[20:23:30] <logmsgbot>	 ded domains (T416174)]], [[gerrit:1280449|[eswiktionary] Switch $wgSignatureValidation to 'disallow' (T424983)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:23:40] <VadymTS1>	 checking
[20:27:54] <VadymTS1>	 wait a minute, I have bad internet
[20:28:47] <jeena>	 no problem
[20:29:49] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[20:30:21] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10observability, 13Patch-For-Review: Q4:rack/setup/install kafka-logging100[6-8] - https://phabricator.wikimedia.org/T418929#11877160 (10Jclark-ctr) hooked crashcart up to 1007 bmc is set to dhcp and is not picking up any address.
[20:30:33] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[20:31:56] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[20:32:27] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[20:32:27] <VadymTS1>	 jeena All good
[20:32:44] <jeena>	 Thanks!
[20:32:49] <logmsgbot>	 !log jhuneidi@deploy1003 vadymts1, seawolf35gerrit, jhuneidi, codenamenoreste, mgchecker: Continuing with deployment
[20:35:40] <logmsgbot>	 !log jasmine@deploy1003 helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply
[20:36:40] <logmsgbot>	 !log jhuneidi@deploy1003 Finished scap sync-world: Backport for [[gerrit:1274928|ukwiki: Remove the patroller user group and adjust various user rights (T423461)]], [[gerrit:1279477|nlwiki: Modify autoconfirmed requirements for nlwiki (T424898)]], [[gerrit:1247186|dewiki: Add abusefilter group (T418815)]], [[gerrit:1236361|Add map domains for ruwiki to the list of externallinks-excluded domains (T416174)]], [[gerrit:12804
[20:36:41] <logmsgbot>	 49|[eswiktionary] Switch $wgSignatureValidation to 'disallow' (T424983)]] (duration: 14m 53s)
[20:37:00] <stashbot>	 T423461: Turn off patrolling in ukwiki - https://phabricator.wikimedia.org/T423461
[20:37:00] <stashbot>	 T424898: Modify autoconfirmed requirements for nlwiki - https://phabricator.wikimedia.org/T424898
[20:37:00] <stashbot>	 T418815: Add abusefilter group to dewiki - https://phabricator.wikimedia.org/T418815
[20:37:01] <stashbot>	 T416174: Add map domains for ruwiki to the list of externallinks-excluded domains (wgExternalLinksIgnoreDomains) - https://phabricator.wikimedia.org/T416174
[20:37:02] <stashbot>	 T424983: Set $wgSignatureValidation to 'disallow' on Spanish Wiktionary - https://phabricator.wikimedia.org/T424983
[20:37:43] <logmsgbot>	 !log jasmine@deploy1003 helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply
[20:38:02] <jeena>	 cmede: do you need a deployer?
[20:38:06] <cmede>	 yes please!
[20:38:14] <jeena>	 👍
[20:39:55] <jeena>	 It's fine to do both your changes in one deploy right?
[20:40:01] <cmede>	 yep :)
[20:40:17] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[20:40:26] <swfrench-wmf>	 \i/
[20:40:30] <swfrench-wmf>	 jasmine_: ^
[20:40:31] <rzl>	 \i/
[20:40:39] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280516 (https://phabricator.wikimedia.org/T422736) (owner: 10Medelius)
[20:40:40] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280552 (https://phabricator.wikimedia.org/T422931) (owner: 10Medelius)
[20:40:50] <dancy>	 Chlorinated
[20:41:17] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[20:42:03] <wikibugs>	 (03Merged) 10jenkins-bot: Suggestion Mode controlled experiment: limit exposure to newcomers [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280516 (https://phabricator.wikimedia.org/T422736) (owner: 10Medelius)
[20:42:05] <jeena>	 dancy: 😆
[20:42:08] <wikibugs>	 (03Merged) 10jenkins-bot: Abandon the editor survey: update edit count restriction [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280552 (https://phabricator.wikimedia.org/T422931) (owner: 10Medelius)
[20:42:25] <logmsgbot>	 !log jhuneidi@deploy1003 Started scap sync-world: Backport for [[gerrit:1280516|Suggestion Mode controlled experiment: limit exposure to newcomers (T422736)]], [[gerrit:1280552|Abandon the editor survey: update edit count restriction (T422931)]]
[20:42:32] <stashbot>	 T422736: Define and implement any missing metrics needed for Suggestion Mode controlled experiment - https://phabricator.wikimedia.org/T422736
[20:42:32] <stashbot>	 T422931: Implement the "Exit the editor" survey - https://phabricator.wikimedia.org/T422931
[20:42:33] <jasmine_>	 nicee, thanks swfrench-wmf! 
[20:44:05] <logmsgbot>	 !log jhuneidi@deploy1003 caro, jhuneidi: Backport for [[gerrit:1280516|Suggestion Mode controlled experiment: limit exposure to newcomers (T422736)]], [[gerrit:1280552|Abandon the editor survey: update edit count restriction (T422931)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:44:14] <cmede>	 checking~~
[20:46:12] <cmede>	 all good
[20:46:29] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10observability, 13Patch-For-Review: Q4:rack/setup/install kafka-logging100[6-8] - https://phabricator.wikimedia.org/T418929#11877235 (10Jclark-ctr) As soon as i started provision script it started to ping   I aborted  its back to you.
[20:46:31] <logmsgbot>	 !log jasmine@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/sophroid: apply
[20:46:55] <logmsgbot>	 !log jasmine@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/sophroid: apply
[20:47:13] <logmsgbot>	 !log jhuneidi@deploy1003 caro, jhuneidi: Continuing with deployment
[20:47:16] <jeena>	 thanks cmede 
[20:47:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Power Supply - PS1 Status - issue on wikikube-worker1378:9290 - https://phabricator.wikimedia.org/T425015#11877257 (10Jclark-ctr) 05Open→03Resolved
[20:49:11] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[20:49:35] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[20:49:58] <VadymTS1>	 jeena: Sorry to bother you again, but it seems the change 1280449 hasn't been applied, can you take a look at it? That's my patch
[20:50:57] <jeena>	 VadymTS1: it says it was merged, so it should be deployed. Is it not working?
[20:50:59] <logmsgbot>	 !log jhuneidi@deploy1003 Finished scap sync-world: Backport for [[gerrit:1280516|Suggestion Mode controlled experiment: limit exposure to newcomers (T422736)]], [[gerrit:1280552|Abandon the editor survey: update edit count restriction (T422931)]] (duration: 08m 33s)
[20:51:04] <stashbot>	 T422736: Define and implement any missing metrics needed for Suggestion Mode controlled experiment - https://phabricator.wikimedia.org/T422736
[20:51:05] <stashbot>	 T422931: Implement the "Exit the editor" survey - https://phabricator.wikimedia.org/T422931
[20:51:21] <cmede>	 thank you jeena!
[20:51:30] <jeena>	 yw!
[20:51:50] <VadymTS1>	 No, I see another problem: Phabricator doesn't show the SAL logs
[20:52:29] <kostajh>	 jeena: will you sync the two patches I have up, or would you like for me to do it?
[20:52:45] <jeena>	 I can do it if you prefer!
[20:52:48] <kostajh>	 VadymTS1: that’s probably just because they were all synced together
[20:52:49] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] zuul: remove zuul-nodepool config, user, stop service [puppet] - 10https://gerrit.wikimedia.org/r/1279461 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[20:53:02] <kostajh>	 jeena: happy for you to do it as it’s late here
[20:53:09] <jeena>	 👍
[20:53:13] <VadymTS1>	 thanks kostajh
[20:53:26] <VadymTS1>	 Yes, I see the code is working
[20:54:26] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280657 (https://phabricator.wikimedia.org/T421204) (owner: 10Kosta Harlan)
[20:54:27] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280656 (https://phabricator.wikimedia.org/T421204) (owner: 10Kosta Harlan)
[20:57:55] <jeena>	 VadymTS1: I think what happened is probably there is a character limit for the SAL log and since we synced so many changes the final one got cut off
[20:59:10] <rzl>	 yep this is https://phabricator.wikimedia.org/T285709
[20:59:20] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:59:27] <rzl>	 it's not SAL per se, it's an IRC message length limit -- the message starting with "!log" gets split in two
[20:59:32] <jeena>	 oh thanks rzl!
[20:59:41] <jeena>	 I see
[21:00:03] <rzl>	 but because we use IRC to carry messages from the deployment server to the SAL, that's the limiting factor in between
[21:00:05] <jouncebot>	 Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260430T2100)
[21:00:13] <jeena>	 just noticed the second line after you mentioned that
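Editor's note: the splitting rzl describes can be illustrated with a minimal sketch. It assumes the 512-byte per-message limit from the IRC RFCs and a logger that only records lines beginning with "!log". The `overhead` value, the helper names, and the sample message are hypothetical, and this is not how logmsgbot or stashbot are actually implemented.

```python
IRC_MAX_LINE = 512  # RFC 1459 / RFC 2812: maximum message size, including CR-LF


def split_for_irc(text: str, overhead: int) -> list[str]:
    """Split text into chunks whose UTF-8 size fits in one IRC message.

    overhead is the number of bytes used by everything except the text
    (prefix, command, target, CR-LF). Its real value is an assumption here.
    """
    budget = IRC_MAX_LINE - overhead
    if budget < 4:
        # A UTF-8 character can take up to 4 bytes; smaller budgets cannot progress.
        raise ValueError("overhead leaves too little room for text")
    data = text.encode("utf-8")
    chunks = []
    while data:
        piece = data[:budget]
        # Back off so a multi-byte UTF-8 character is not cut in half.
        while len(piece) < len(data) and (data[len(piece)] & 0xC0) == 0x80:
            piece = piece[:-1]
        chunks.append(piece.decode("utf-8"))
        data = data[len(piece):]
    return chunks


def logged_parts(chunks: list[str]) -> list[str]:
    """Keep only the chunks that a logger matching a leading '!log' would record."""
    return [c for c in chunks if c.startswith("!log")]


# Hypothetical long message listing many changes in one sync.
message = "!log user@host Started sync: " + ", ".join(
    f"[[gerrit:{n}|change {n}]]" for n in range(1, 40)
)
chunks = split_for_irc(message, overhead=100)
recorded = logged_parts(chunks)
```

Under these assumptions only the first chunk carries the "!log" prefix, so anything in the later chunks (such as the last change in a long backport list) would not reach the log, which matches what was observed above.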
[21:00:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[21:05:49] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Reduce default MAX_LOAD_ATTEMPTS from 10 to 6 [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280657 (https://phabricator.wikimedia.org/T421204) (owner: 10Kosta Harlan)
[21:05:51] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Label load and execute duration metrics with outcome [extensions/ConfirmEdit] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1280656 (https://phabricator.wikimedia.org/T421204) (owner: 10Kosta Harlan)
[21:06:08] <logmsgbot>	 !log jhuneidi@deploy1003 Started scap sync-world: Backport for [[gerrit:1280657|hCaptcha: Reduce default MAX_LOAD_ATTEMPTS from 10 to 6 (T421204)]], [[gerrit:1280656|hCaptcha: Label load and execute duration metrics with outcome (T421204)]]
[21:07:09] <mutante>	 !log zuul1001/zuul2001 - rmdir /etc/nodepool
[21:07:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:48] <logmsgbot>	 !log jhuneidi@deploy1003 kharlan, jhuneidi: Backport for [[gerrit:1280657|hCaptcha: Reduce default MAX_LOAD_ATTEMPTS from 10 to 6 (T421204)]], [[gerrit:1280656|hCaptcha: Label load and execute duration metrics with outcome (T421204)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:08:05] <logmsgbot>	 !log jhuneidi@deploy1003 kharlan, jhuneidi: Continuing with deployment
[21:08:39] <A_smart_kitten>	 rzl: i was considering filing a task about that at some point some time ago, glad to see that there is already one :D
[21:10:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:11:55] <logmsgbot>	 !log jhuneidi@deploy1003 Finished scap sync-world: Backport for [[gerrit:1280657|hCaptcha: Reduce default MAX_LOAD_ATTEMPTS from 10 to 6 (T421204)]], [[gerrit:1280656|hCaptcha: Label load and execute duration metrics with outcome (T421204)]] (duration: 05m 47s)
[21:12:06] <kostajh>	 jeena: thanks!
[21:12:16] <jeena>	 yw!
[21:12:22] <rzl>	 A_smart_kitten: yeah! I can't exactly say with a straight face that we're prioritizing it 🙃 but it's known at least
[21:12:48] <wikibugs>	 (03PS6) 10Dzahn: zuul: create profile for new zuul-launcher replacing nodepool [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879)
[21:21:54] <wikibugs>	 (03CR) 10Dzahn: [V:04-1] "Function lookup() did not find a value for the name 'profile::zuul::launcher::user_token'" [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[21:23:46] <jinxer-wm>	 FIRING: Outbound discards: Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[21:28:14] <logmsgbot>	 !log cdobbins@cumin2002 conftool action : get/pooled; selector: name=cp4041.ulsfo.wmnet
[21:28:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:28:42] <logmsgbot>	 !log cdobbins@cumin2002 conftool action : get/pooled; selector: name=cp*
[21:31:41] <logmsgbot>	 !log cdobbins@cumin2002 conftool action : get/pooled; selector: name=cp4044.ulsfo.wmnet
[21:31:48] <logmsgbot>	 !log cdobbins@cumin2002 conftool action : get/pooled; selector: name=cp4040.ulsfo.wmnet
[21:42:29] <wikibugs>	 (03PS1) 10Dzahn: zuul: rename nodepool::user_token to launcher::user_token [labs/private] - 10https://gerrit.wikimedia.org/r/1280729 (https://phabricator.wikimedia.org/T424879)
[21:43:03] <wikibugs>	 (03CR) 10Dzahn: [V:03+2 C:03+2] zuul: rename nodepool::user_token to launcher::user_token [labs/private] - 10https://gerrit.wikimedia.org/r/1280729 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[21:45:34] <wikibugs>	 (03PS1) 10Cwhite: update pyyaml in dev [software/ecs] - 10https://gerrit.wikimedia.org/r/1280733
[21:48:48] <wikibugs>	 (03CR) 10Cwhite: [C:04-1] logstash: add thanos-query-frontend filter (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1275800 (https://phabricator.wikimedia.org/T423986) (owner: 10Tiziano Fogli)
[21:48:54] <wikibugs>	 (03CR) 10Bking: [C:03+1] "Feel free to deploy one or two clusters once the DNS piece is ready, no need to deploy every one quite yet." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1280515 (https://phabricator.wikimedia.org/T424248) (owner: 10Atsuko)
[21:50:29] <wikibugs>	 (03CR) 10Dzahn: [V:04-1] "renamed the nodepool::user_token to launcher::user_token in private and fake private" [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[21:52:45] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[21:52:50] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[21:53:29] <wikibugs>	 (03PS1) 10Cwhite: add query object [software/ecs] - 10https://gerrit.wikimedia.org/r/1280737 (https://phabricator.wikimedia.org/T423986)
[21:55:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[21:58:10] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 13Patch-Needs-Improvement: switchdc SAL log entries are getting cut off because long lines are being split over IRC - https://phabricator.wikimedia.org/T285709#11877426 (10A_smart_kitten) Just noting for the task record that this also affects (e.g.) s...
[21:58:29] <wikibugs>	 (03CR) 10Cwhite: [C:04-1] logstash: add thanos-query-frontend filter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1275800 (https://phabricator.wikimedia.org/T423986) (owner: 10Tiziano Fogli)
[22:02:45] <jinxer-wm>	 RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[22:02:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[22:23:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[22:24:26] <wikibugs>	 06SRE, 10DNS, 06Traffic, 13Patch-For-Review: [Update DNS Record Request] - wikimedia.org - Add TXT verification for Anthropic - https://phabricator.wikimedia.org/T424785#11877498 (10bcampbell) @CDobbins All looks good on the Anthropic end, I'm seeing the domain as verified now. Thanks for your help, feel f...
[22:40:41] <wikibugs>	 (03CR) 10Scott French: [C:03+1] "Thanks, Chris!" [puppet] - 10https://gerrit.wikimedia.org/r/1271028 (https://phabricator.wikimedia.org/T416948) (owner: 10CDanis)
[22:53:20] <wikibugs>	 (03PS1) 10Cwhite: opensearch: move pki::get_cert call into profile module [puppet] - 10https://gerrit.wikimedia.org/r/1280788 (https://phabricator.wikimedia.org/T424204)
[22:53:56] <wikibugs>	 (03CR) 10CI reject: [V:04-1] opensearch: move pki::get_cert call into profile module [puppet] - 10https://gerrit.wikimedia.org/r/1280788 (https://phabricator.wikimedia.org/T424204) (owner: 10Cwhite)
[23:03:32] <jinxer-wm>	 FIRING: [2x] Outbound discards: Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[23:08:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[23:28:32] <jinxer-wm>	 FIRING: [2x] Outbound discards: Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[23:30:14] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] zuul: create profile for new zuul-launcher replacing nodepool [puppet] - 10https://gerrit.wikimedia.org/r/1279470 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[23:40:38] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1280813
[23:40:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1280813 (owner: 10TrainBranchBot)
[23:40:55] <wikibugs>	 (03PS1) 10Dzahn: zuul: remove nodepool profile from zuul::main role [puppet] - 10https://gerrit.wikimedia.org/r/1280815 (https://phabricator.wikimedia.org/T424879)
[23:41:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1018:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:43:25] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] zuul: remove nodepool profile from zuul::main role [puppet] - 10https://gerrit.wikimedia.org/r/1280815 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[23:50:32] <wikibugs>	 (03PS1) 10Dzahn: zuul: add placeholder template for launcher config [puppet] - 10https://gerrit.wikimedia.org/r/1280820 (https://phabricator.wikimedia.org/T424879)
[23:51:26] <logmsgbot>	 !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on zuul2001.codfw.wmnet with reason: T421398
[23:51:31] <stashbot>	 T421398: SystemdUnitFailed - zuul-executor - https://phabricator.wikimedia.org/T421398
[23:51:32] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1280813 (owner: 10TrainBranchBot)
[23:51:58] <logmsgbot>	 !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: T421398
[23:54:39] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[23:55:13] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] zuul: add placeholder template for launcher config [puppet] - 10https://gerrit.wikimedia.org/r/1280820 (https://phabricator.wikimedia.org/T424879) (owner: 10Dzahn)
[23:56:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate