[00:07:28] <icinga-wm>	 PROBLEM - MariaDB disk space on db1208 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[00:07:28] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: matomo on db1208 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[00:07:28] <icinga-wm>	 PROBLEM - mysqld processes on db1208 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[00:07:28] <icinga-wm>	 PROBLEM - MariaDB Replica IO: matomo on db1208 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[00:07:29] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: analytics_meta on db1208 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[00:07:29] <icinga-wm>	 PROBLEM - MariaDB Replica IO: analytics_meta on db1208 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[00:07:50] <icinga-wm>	 PROBLEM - MariaDB read only matomo on db1208 is CRITICAL: Could not connect to localhost:3351 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[00:07:58] <icinga-wm>	 PROBLEM - MariaDB read only analytics_meta on db1208 is CRITICAL: Could not connect to localhost:3352 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[00:08:57] <wikibugs>	 (CR) Santiago Faci: "Also, we should keep in mind that, according to what is said in the related ticket, we don't want to make this change while any experiment" [mediawiki-config] - https://gerrit.wikimedia.org/r/1303490 (owner: Pushpaktiwari)
[00:11:28] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: matomo on db1208 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[00:11:28] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: analytics_meta on db1208 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[00:17:16] <jinxer-wm>	 FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[01:08:01] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#12048730 (Papaul)
[01:10:15] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#12048732 (Papaul)
[01:12:27] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1305281
[01:12:28] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1305281 (owner: TrainBranchBot)
[01:13:19] <wikibugs>	 ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: codfw: rack B2 maintenance 2026-07-01 11:00 am CT - https://phabricator.wikimedia.org/T429861#12048738 (Papaul)
[01:18:54] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[01:20:22] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1305281 (owner: TrainBranchBot)
[01:52:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:00:41] <logmsgbot>	 !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:07:26] <logmsgbot>	 !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 45s)
[02:09:41] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:14:41] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:11:50] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: m2 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 656.52 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[03:13:48] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: m2 on db2160 is OK: OK slave_sql_lag Replication lag: 0.38 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[03:25:41] <wikibugs>	 (PS1) Clare Ming: Remove saved groups config [deployment-charts] - https://gerrit.wikimedia.org/r/1305287 (https://phabricator.wikimedia.org/T429959)
[03:26:53] <wikibugs>	 (PS2) Clare Ming: Remove saved groups config [deployment-charts] - https://gerrit.wikimedia.org/r/1305287 (https://phabricator.wikimedia.org/T429959)
[03:31:36] <icinga-wm>	 PROBLEM - Ensure traffic_manager is running for instance backend on cp6009 is CRITICAL: PROCS CRITICAL: 3 processes with args /usr/bin/traffic_manager --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[03:32:36] <icinga-wm>	 RECOVERY - Ensure traffic_manager is running for instance backend on cp6009 is OK: PROCS OK: 1 process with args /usr/bin/traffic_manager --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[03:35:53] <wikibugs>	 (PS1) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1305288 (https://phabricator.wikimedia.org/T428984)
[03:38:01] <wikibugs>	 (PS1) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1305289 (https://phabricator.wikimedia.org/T428984)
[03:44:36] <wikibugs>	 (PS3) Abijeet Patro: Enable ULS v2 on group2 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1305290
[03:44:39] <wikibugs>	 (CR) CI reject: [V:-1] Enable ULS v2 on group2 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1305290 (owner: Abijeet Patro)
[03:46:04] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[03:55:32] <wikibugs>	 (CR) Abijeet Patro: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1305290 (owner: Abijeet Patro)
[03:55:53] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 25 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1305290 (owner: Abijeet Patro)
[04:17:16] <jinxer-wm>	 FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[05:13:39] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis isvwiki in section s5
[05:13:43] <wikibugs>	 Puppet, Release-Engineering-Team: registry-homepage-builder.py doesn't sort images as expected - https://phabricator.wikimedia.org/T388287#12048852 (hashar) The `build-homepage` service is indeed failing https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&from=now-5m&to=now&timezone=utc&var-...
[05:18:54] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[05:22:41] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis isvwiki in section s5
[05:25:52] <wikibugs>	 (PS1) Ryan Kemper: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844)
[05:30:22] <wikibugs>	 (PS1) Marostegui: mariadb: Check future m2-master [puppet] - https://gerrit.wikimedia.org/r/1305322 (https://phabricator.wikimedia.org/T429929)
[05:31:11] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Check future m2-master [puppet] - https://gerrit.wikimedia.org/r/1305322 (https://phabricator.wikimedia.org/T429929) (owner: Marostegui)
[05:32:02] <wikibugs>	 (PS11) Trueg: dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313)
[05:33:38] <wikibugs>	 (CR) CI reject: [V:-1] dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: Trueg)
[05:35:30] <wikibugs>	 (PS1) Marostegui: Revert "mariadb: Check future m2-master" [puppet] - https://gerrit.wikimedia.org/r/1305323
[05:35:49] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[05:36:37] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "mariadb: Check future m2-master" [puppet] - https://gerrit.wikimedia.org/r/1305323 (owner: Marostegui)
[05:41:07] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Set es7 eqiad as read-only for maintenance - T429867', diff saved to https://phabricator.wikimedia.org/P94392 and previous config saved to /var/cache/conftool/dbconfig/20260624-054106-marostegui.json
[05:41:12] <stashbot>	 T429867: Switchover es7 master (es1035 -> es1039) - https://phabricator.wikimedia.org/T429867
[05:41:22] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Primary switchover es7 T429867
[05:41:32] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Set es1039 with weight 0 T429867', diff saved to https://phabricator.wikimedia.org/P94393 and previous config saved to /var/cache/conftool/dbconfig/20260624-054131-marostegui.json
[05:42:11] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote es1039 to es7 master [puppet] - https://gerrit.wikimedia.org/r/1305020 (https://phabricator.wikimedia.org/T429867) (owner: Gerrit maintenance bot)
[05:44:24] <marostegui>	 !log Starting es7 eqiad failover from es1035 to es1039 - T429867
[05:44:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:47] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote es1039 to es7 primary T429867', diff saved to https://phabricator.wikimedia.org/P94394 and previous config saved to /var/cache/conftool/dbconfig/20260624-054446-marostegui.json
[05:44:55] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Platform-SRE, and 5 others: codfw: rack B2 maintenance 2026-07-01 11:00 am CT - https://phabricator.wikimedia.org/T429861#12048903 (ayounsi)
[05:45:09] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update es7-master alias [dns] - https://gerrit.wikimedia.org/r/1305021 (https://phabricator.wikimedia.org/T429867) (owner: Gerrit maintenance bot)
[05:45:16] <logmsgbot>	 !log marostegui@dns1004 START - running authdns-update
[05:45:48] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool es1035 T429867', diff saved to https://phabricator.wikimedia.org/P94395 and previous config saved to /var/cache/conftool/dbconfig/20260624-054547-marostegui.json
[05:46:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Set es7 eqiad back to read-write - T429867', diff saved to https://phabricator.wikimedia.org/P94396 and previous config saved to /var/cache/conftool/dbconfig/20260624-054611-marostegui.json
[05:46:16] <stashbot>	 T429867: Switchover es7 master (es1035 -> es1039) - https://phabricator.wikimedia.org/T429867
[05:47:07] <logmsgbot>	 !log marostegui@dns1004 END - running authdns-update
[05:52:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:55:41] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[05:55:41] <logmsgbot>	 !log marostegui@cumin1003 dbmaint on es7@eqiad T429463
[05:55:47] <stashbot>	 T429463: Migrate es7 section to Debian Trixie - https://phabricator.wikimedia.org/T429463
[05:55:50] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1035: Upgrading es1035.eqiad.wmnet
[05:56:01] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1035: Upgrading es1035.eqiad.wmnet
[05:56:49] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[05:57:33] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Platform-SRE, and 5 others: codfw: rack B2 maintenance 2026-07-01 11:00 am CT - https://phabricator.wikimedia.org/T429861#12048932 (ayounsi)
[05:59:40] <logmsgbot>	 marostegui@cumin1003 major-upgrade (PID 2533770) is awaiting input
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T0600)
[06:04:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[06:07:59] <logmsgbot>	 marostegui@cumin1003 major-upgrade (PID 2533770) is awaiting input
[06:08:55] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS trixie
[06:21:51] <wikibugs>	 (CR) Slyngshede: [C:+1] hiera: disable awslc on magru hosts [puppet] - https://gerrit.wikimedia.org/r/1305128 (https://phabricator.wikimedia.org/T419825) (owner: Fabfur)
[06:24:36] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
[06:27:25] <wikibugs>	 (CR) Jelto: [C:+2] gerrit: increase thresholds for GerritHigh4xxRatio alert [alerts] - https://gerrit.wikimedia.org/r/1304506 (https://phabricator.wikimedia.org/T428979) (owner: Jelto)
[06:29:19] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
[06:30:08] <wikibugs>	 (Merged) jenkins-bot: gerrit: increase thresholds for GerritHigh4xxRatio alert [alerts] - https://gerrit.wikimedia.org/r/1304506 (https://phabricator.wikimedia.org/T428979) (owner: Jelto)
[06:32:03] <wikibugs>	 (PS1) Slyngshede: data.yaml: new expiry date for aramilferaxa [puppet] - https://gerrit.wikimedia.org/r/1305326
[06:35:02] <wikibugs>	 (CR) Arnaudb: "I see! thanks for the review, lets leave that change aside for now then." [puppet] - https://gerrit.wikimedia.org/r/1302834 (https://phabricator.wikimedia.org/T420865) (owner: Arnaudb)
[06:36:44] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1305326 (owner: Slyngshede)
[06:42:08] <wikibugs>	 (CR) Arnaudb: [C:+1] "thanks for adjusting the threshold!" [alerts] - https://gerrit.wikimedia.org/r/1304506 (https://phabricator.wikimedia.org/T428979) (owner: Jelto)
[06:45:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
[06:46:33] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1035.eqiad.wmnet with OS trixie
[06:51:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
[06:53:06] <wikibugs>	 (CR) Slyngshede: [C:+2] data.yaml: new expiry date for aramilferaxa [puppet] - https://gerrit.wikimedia.org/r/1305326 (owner: Slyngshede)
[06:53:38] <logmsgbot>	 !log jmm@cumin2003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
[06:54:06] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1035: Migration of es1035.eqiad.wmnet completed
[06:54:21] <logmsgbot>	 !log jmm@cumin2003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2028.codfw.wmnet
[06:54:28] <logmsgbot>	 !log jmm@cumin2003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
[06:54:48] <logmsgbot>	 !log jmm@cumin2003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2046.codfw.wmnet
[06:54:57] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:55:01] <wikibugs>	 (PS1) Matthias Mullie: Enable MMV carousel on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1305329 (https://phabricator.wikimedia.org/T429509)
[06:55:15] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1305329 (https://phabricator.wikimedia.org/T429509) (owner: Matthias Mullie)
[06:57:01] <logmsgbot>	 !log jmm@cumin2003 START - Cookbook sre.ganeti.changedisk for changing disk type of ml-staging-etcd2001.codfw.wmnet to drbd
[06:57:27] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mlitn@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1305329 (https://phabricator.wikimedia.org/T429509) (owner: Matthias Mullie)
[06:59:03] <wikibugs>	 Puppet, Release-Engineering-Team: registry-homepage-builder.py doesn't sort images as expected - https://phabricator.wikimedia.org/T388287#12049032 (elukey) ` Jun 24 06:24:07 registry2004 registry-homepage-builder[3522966]: INFO:root:Fetching the image catalog for localhost:5004 Jun 24 06:24:07 registry2...
[06:59:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:00:04] <jouncebot>	 Amir1, urbanecm, and awight: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T0700).
[07:00:04] <jouncebot>	 matthiasmullie: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:09] <matthiasmullie>	 o/
[07:00:17] <matthiasmullie>	 I've already begun
[07:00:20] <wikibugs>	 (Merged) jenkins-bot: Enable MMV carousel on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1305329 (https://phabricator.wikimedia.org/T429509) (owner: Matthias Mullie)
[07:01:22] <logmsgbot>	 !log mlitn@deploy1003 Started scap sync-world: Backport for [[gerrit:1305329|Enable MMV carousel on enwiki (T429509)]]
[07:01:26] <stashbot>	 T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[07:03:54] <logmsgbot>	 !log mlitn@deploy1003 mlitn: Backport for [[gerrit:1305329|Enable MMV carousel on enwiki (T429509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:04:47] <logmsgbot>	 !log mlitn@deploy1003 mlitn: Continuing with deployment
[07:05:57] <wikibugs>	 (PS1) Elukey: docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287)
[07:06:29] <wikibugs>	 (CR) CI reject: [V:-1] docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287) (owner: Elukey)
[07:07:04] <logmsgbot>	 !log jmm@cumin2003 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-staging-etcd2001.codfw.wmnet to drbd
[07:07:19] <icinga-wm>	 PROBLEM - Host ml-staging-etcd2001 is DOWN: PING CRITICAL - Packet loss = 100%
[07:07:43] <wikibugs>	 (PS2) Elukey: docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287)
[07:07:47] <icinga-wm>	 RECOVERY - Host ml-staging-etcd2001 is UP: PING OK - Packet loss = 0%, RTA = 31.89 ms
[07:08:15] <wikibugs>	 (CR) CI reject: [V:-1] docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287) (owner: Elukey)
[07:08:26] <logmsgbot>	 !log jmm@cumin2003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
[07:08:38] <wikibugs>	 (CR) Elukey: [C:+2] docker_registry: remove support for the nginx blob cache [puppet] - https://gerrit.wikimedia.org/r/1304512 (https://phabricator.wikimedia.org/T427175) (owner: Elukey)
[07:09:12] <logmsgbot>	 !log mlitn@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305329|Enable MMV carousel on enwiki (T429509)]] (duration: 07m 49s)
[07:09:16] <stashbot>	 T429509: [Image Browsing] Carousel: Take the feature out of beta and set up a config variable to enable in production - https://phabricator.wikimedia.org/T429509
[07:09:30] <logmsgbot>	 !log jmm@cumin2003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
[07:09:37] <matthiasmullie>	 Done; rest of backport window is up for grabs
[07:11:16] <wikibugs>	 (PS1) Marostegui: db2202: Add note [puppet] - https://gerrit.wikimedia.org/r/1305331 (https://phabricator.wikimedia.org/T430017)
[07:11:32] <logmsgbot>	 !log jmm@cumin2003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
[07:11:39] <wikibugs>	 (PS2) Jelto: Update to v3.30.7 [debs/calico] (v3.30) - https://gerrit.wikimedia.org/r/1305139 (https://phabricator.wikimedia.org/T427400)
[07:11:39] <wikibugs>	 (CR) Jelto: [V:+1] "build verified on `build2002`" [debs/calico] (v3.30) - https://gerrit.wikimedia.org/r/1305139 (https://phabricator.wikimedia.org/T427400) (owner: Jelto)
[07:13:07] <wikibugs>	 (PS3) Elukey: docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287)
[07:15:19] <wikibugs>	 (CR) Marostegui: "Can you do some testing around with test-cookbook just to make sure it is all working as expected without any major issues." [cookbooks] - https://gerrit.wikimedia.org/r/1295480 (https://phabricator.wikimedia.org/T422361) (owner: Federico Ceratto)
[07:16:21] <wikibugs>	 (PS4) Hashar: docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287) (owner: Elukey)
[07:16:35] <wikibugs>	 (CR) Hashar: [C:+1] "Great idea, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287) (owner: Elukey)
[07:21:17] <wikibugs>	 (CR) Elukey: [C:+2] docker_registry: improve homepage-builder.py's tag ordering [puppet] - https://gerrit.wikimedia.org/r/1305330 (https://phabricator.wikimedia.org/T388287) (owner: Elukey)
[07:24:41] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:28:36] <wikibugs>	 (CR) Elukey: "Applied suggestion thanks! The ipmi cookbook will go away soon, but yeah I'll update it right after changing spicerack!" [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: Elukey)
[07:29:36] <wikibugs>	 (PS5) Elukey: __init__: modify the management_password property [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699)
[07:29:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:34:08] <wikibugs>	 (CR) CI reject: [V:-1] __init__: modify the management_password property [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: Elukey)
[07:39:37] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1035: Migration of es1035.eqiad.wmnet completed
[07:39:38] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[07:40:54] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Failover url-downloader.codfw CNAME to one of the new Trixie hosts [dns] - https://gerrit.wikimedia.org/r/1304764 (https://phabricator.wikimedia.org/T427282) (owner: Muehlenhoff)
[07:40:59] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[07:41:12] <wikibugs>	 (CR) Jcrespo: [C:+1] "Ok, I don't think it is a problem to use it as a critical host. The host won't be reclaimed intermediately and you could also "return a di" [puppet] - https://gerrit.wikimedia.org/r/1305331 (https://phabricator.wikimedia.org/T430017) (owner: Marostegui)
[07:42:51] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[07:44:33] <wikibugs>	 (CR) Marostegui: [C:+2] db2202: Add note [puppet] - https://gerrit.wikimedia.org/r/1305331 (https://phabricator.wikimedia.org/T430017) (owner: Marostegui)
[07:45:25] <wikibugs>	 (PS1) Dpogorzelski: ml-serve: temperature/power and partition usage [puppet] - https://gerrit.wikimedia.org/r/1305336 (https://phabricator.wikimedia.org/T403697)
[07:46:21] <wikibugs>	 (PS2) Dpogorzelski: ml-serve: temperature/power and partition usage [puppet] - https://gerrit.wikimedia.org/r/1305336 (https://phabricator.wikimedia.org/T403697)
[07:47:33] <wikibugs>	 (PS3) Dpogorzelski: ml-serve: temperature/power and partition usage [puppet] - https://gerrit.wikimedia.org/r/1305336 (https://phabricator.wikimedia.org/T403697)
[07:48:38] <wikibugs>	 (PS4) Dpogorzelski: ml-serve: temperature/power and partition usage [puppet] - https://gerrit.wikimedia.org/r/1305336 (https://phabricator.wikimedia.org/T403697)
[07:50:33] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[07:50:33] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893
[07:50:39] <stashbot>	 T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893
[07:50:53] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1199: Upgrading db1199.eqiad.wmnet
[07:51:48] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[07:52:50] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[07:53:14] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1199: Upgrading db1199.eqiad.wmnet
[07:54:44] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[07:54:44] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@codfw T429893
[07:55:06] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2179: Upgrading db2179.codfw.wmnet
[07:55:17] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[07:55:38] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2179: Upgrading db2179.codfw.wmnet
[07:56:32] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1199.eqiad.wmnet with OS trixie
[07:57:55] <wikibugs>	 (PS1) Kevin Bazira: ml: assemble venv in build stage and chunk runtime layers to fit registry limit [docker-images/production-images] - https://gerrit.wikimedia.org/r/1305341 (https://phabricator.wikimedia.org/T429667)
[07:58:10] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2179.codfw.wmnet with OS trixie
[08:01:22] <wikibugs>	 (PS1) Brouberol: Fix prometheus for kafka monitoring, to fix linting alert [alerts] - https://gerrit.wikimedia.org/r/1305342 (https://phabricator.wikimedia.org/T429127)
[08:03:03] <wikibugs>	 (PS2) Kevin Bazira: ml: assemble venv in build stage and chunk runtime layers to fit registry limit [docker-images/production-images] - https://gerrit.wikimedia.org/r/1305341 (https://phabricator.wikimedia.org/T429667)
[08:05:39] <wikibugs>	 (CR) Brouberol: [C:+2] Fix prometheus for kafka monitoring, to fix linting alert [alerts] - https://gerrit.wikimedia.org/r/1305342 (https://phabricator.wikimedia.org/T429127) (owner: Brouberol)
[08:06:34] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:07:31] <fabfur>	 !log depooling cp7001 and cp7009 to reimage (T419825)
[08:07:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:36] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[08:08:48] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp7001.*
[08:08:56] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp7001.*
[08:09:07] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp7009.*
[08:09:27] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: disable awslc on magru hosts [puppet] - https://gerrit.wikimedia.org/r/1305128 (https://phabricator.wikimedia.org/T419825) (owner: Fabfur)
[08:10:49] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
[08:11:36] <wikibugs>	 (PS1) Muehlenhoff: Update redis-misc-canary alias [puppet] - https://gerrit.wikimedia.org/r/1305344
[08:13:36] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Move the the hourly httpbb run to cumin2003 [puppet] - https://gerrit.wikimedia.org/r/1304803 (https://phabricator.wikimedia.org/T427897) (owner: Muehlenhoff)
[08:13:47] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS trixie
[08:13:57] <wikibugs>	 SRE-swift-storage, Traffic: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744#12049171 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1003 for host cp7001.magru.wmnet with OS trixie
[08:14:35] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
[08:16:31] <wikibugs>	 (PS5) Ayounsi: netbox: add a BGP getter/setter [software/spicerack] - https://gerrit.wikimedia.org/r/1304554
[08:16:31] <wikibugs>	 (PS1) Ayounsi: tox: add python 3.14 [software/spicerack] - https://gerrit.wikimedia.org/r/1305345
[08:16:56] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
[08:17:16] <jinxer-wm>	 FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[08:17:34] <moritzm>	 fabfur: okay to merge your "hiera: disable awslc on magru hosts" patch along?
[08:18:40] <logmsgbot>	 !log marostegui@cumin1003 conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s1
[08:19:48] <fabfur>	 moritzm: sorry forgot it, thanks
[08:20:28] <moritzm>	 ok
[08:21:58] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1013.eqiad.wmnet with reason: Cloning cloddb1026
[08:24:40] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
[08:25:47] <wikibugs>	 (PS2) Tiziano Fogli: redis: remove nrpe check [puppet] - https://gerrit.wikimedia.org/r/1305075 (https://phabricator.wikimedia.org/T384924) (owner: Hnowlan)
[08:25:47] <wikibugs>	 (PS1) Tiziano Fogli: redis: disable nrpe checks, replace with prometheus checks [puppet] - https://gerrit.wikimedia.org/r/1305347 (https://phabricator.wikimedia.org/T384924)
[08:26:19] <wikibugs>	 (PS1) Marostegui: mariadb: Productionize cloudb1026 [puppet] - https://gerrit.wikimedia.org/r/1305349 (https://phabricator.wikimedia.org/T409557)
[08:27:40] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.reimage for host cp7009.magru.wmnet with OS trixie
[08:29:39] <wikibugs>	 (CR) Tiziano Fogli: "I just added a commit to disable the Icinga check before deleting it from the configuration." [puppet] - https://gerrit.wikimedia.org/r/1305347 (https://phabricator.wikimedia.org/T384924) (owner: Tiziano Fogli)
[08:31:53] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1199.eqiad.wmnet with OS trixie
[08:32:14] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:33:07] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:34:19] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Productionize cloudb1026 [puppet] - https://gerrit.wikimedia.org/r/1305349 (https://phabricator.wikimedia.org/T409557) (owner: Marostegui)
[08:35:55] <wikibugs>	 (PS1) Ayounsi: Add depool policy for VTRS [puppet] - https://gerrit.wikimedia.org/r/1305350 (https://phabricator.wikimedia.org/T327300)
[08:37:58] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] redis: migrate icinga checks to prometheus [alerts] - https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) (owner: Hnowlan)
[08:38:15] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] redis: remove nrpe check [puppet] - https://gerrit.wikimedia.org/r/1305075 (https://phabricator.wikimedia.org/T384924) (owner: Hnowlan)
[08:38:37] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
[08:40:42] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12049245 (MoritzMuehlenhoff)
[08:40:58] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12049248 (MoritzMuehlenhoff) @Gehel This needs your approval for analytics-wmde-users
[08:42:04] <wikibugs>	 (PS1) Muehlenhoff: Add nicholusmuwonge to analytics-wmde-users [puppet] - https://gerrit.wikimedia.org/r/1305352 (https://phabricator.wikimedia.org/T429896)
[08:43:24] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2179.codfw.wmnet with OS trixie
[08:43:45] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
[08:46:41] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1199: Migration of db1199.eqiad.wmnet completed
[08:48:51] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] "LGTM from an o11y perspective. I'll leave the specifics to the team." [alerts] - https://gerrit.wikimedia.org/r/1300745 (https://phabricator.wikimedia.org/T428873) (owner: Filippo Giunchedi)
[08:48:59] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] "LGTM from an o11y perspective. I'll leave the specifics to the team." [alerts] - https://gerrit.wikimedia.org/r/1302151 (https://phabricator.wikimedia.org/T328502) (owner: Filippo Giunchedi)
[08:49:40] <wikibugs>	 (CR) Ayounsi: diffscan: pyhotnify (2 comments) [puppet] - https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: Jbond)
[08:50:02] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp7009.magru.wmnet with reason: host reimage
[08:50:06] <logmsgbot>	 !log jmm@cumin2003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
[08:50:09] <wikibugs>	 (PS20) Ayounsi: diffscan: pyhotnify [puppet] - https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: Jbond)
[08:53:56] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7009.magru.wmnet with reason: host reimage
[08:56:32] <wikibugs>	 (CR) Ayounsi: Cookbook to configure switch port vlans for cloud hosts (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1303397 (https://phabricator.wikimedia.org/T429466) (owner: Cathal Mooney)
[08:56:40] <wikibugs>	 (Abandoned) Blake: mw-wikifunctions: Prune host list for mw-wikifunctions ingress [deployment-charts] - https://gerrit.wikimedia.org/r/1301313 (https://phabricator.wikimedia.org/T427668) (owner: Blake)
[08:58:25] <wikibugs>	 (CR) Cathal Mooney: Cookbook to configure switch port vlans for cloud hosts (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1303397 (https://phabricator.wikimedia.org/T429466) (owner: Cathal Mooney)
[08:59:59] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2179: Migration of db2179.codfw.wmnet completed
[09:00:26] <wikibugs>	 (PS1) Muehlenhoff: Failover url-downloader.eqiad CNAME to one of the new Trixie hosts [dns] - https://gerrit.wikimedia.org/r/1305354 (https://phabricator.wikimedia.org/T427282)
[09:01:14] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12049282 (Gehel) Approved
[09:02:27] <wikibugs>	 (CR) Klausman: [C:+1] ml-serve: temperature/power and partition usage [puppet] - https://gerrit.wikimedia.org/r/1305336 (https://phabricator.wikimedia.org/T403697) (owner: Dpogorzelski)
[09:02:32] <wikibugs>	 (PS6) Elukey: __init__: modify the management_password property [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699)
[09:03:12] <moritzm>	 !log temporarily remove ganeti2028 from the codfw cluster T429817
[09:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:17] <stashbot>	 T429817: codfw: rack A7 maintenance - https://phabricator.wikimedia.org/T429817
[09:04:50] <wikibugs>	 (CR) Slyngshede: [C:+1] Add laurabarluzzi to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/1304751 (https://phabricator.wikimedia.org/T429431) (owner: Muehlenhoff)
[09:05:28] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti2028 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[09:05:28] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti2028 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[09:05:59] <wikibugs>	 (CR) Slyngshede: [C:+1] Add nicholusmuwonge to analytics-wmde-users [puppet] - https://gerrit.wikimedia.org/r/1305352 (https://phabricator.wikimedia.org/T429896) (owner: Muehlenhoff)
[09:06:02] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12049302 (SLyngshede-WMF)
[09:06:50] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti2028:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:07:12] <wikibugs>	 (CR) Klausman: [V:+2 C:+2] ml: assemble venv in build stage and chunk runtime layers to fit registry limit [docker-images/production-images] - https://gerrit.wikimedia.org/r/1305341 (https://phabricator.wikimedia.org/T429667) (owner: Kevin Bazira)
[09:07:32] <wikibugs>	 (CR) CI reject: [V:-1] __init__: modify the management_password property [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: Elukey)
[09:07:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[09:08:07] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[09:08:34] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS trixie
[09:08:43] <wikibugs>	 SRE-swift-storage, Traffic: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744#12049314 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1003 for host cp7001.magru.wmnet with OS trixie completed: - cp7001 (**PASS**)   - Downtimed on Icinga/Alertmana...
[09:12:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[09:12:47] <logmsgbot>	 elukey@cumin1003 provision (PID 2562859) is awaiting input
[09:13:30] <wikibugs>	 (CR) Blake: [C:+2] main: Add a namespace for the mw-pretrain service. [deployment-charts] - https://gerrit.wikimedia.org/r/1304083 (https://phabricator.wikimedia.org/T427668) (owner: Blake)
[09:15:42] <wikibugs>	 (CR) Elukey: tox: add python 3.14 (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[09:16:02] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add laurabarluzzi to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/1304751 (https://phabricator.wikimedia.org/T429431) (owner: Muehlenhoff)
[09:18:18] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for laurabarluzzi - https://phabricator.wikimedia.org/T429431#12049348 (MoritzMuehlenhoff) In progress→Resolved a:XenoRyet→MoritzMuehlenhoff @Laurabarluzzi Your access has been enabled, it...
[09:18:54] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7009.magru.wmnet with OS trixie
[09:18:54] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[09:20:14] <wikibugs>	 (CR) Dpogorzelski: [C:+2] ml-serve: temperature/power and partition usage [puppet] - https://gerrit.wikimedia.org/r/1305336 (https://phabricator.wikimedia.org/T403697) (owner: Dpogorzelski)
[09:20:18] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[09:21:05] <wikibugs>	 (CR) Elukey: [C:+1] netbox: add a BGP getter/setter [software/spicerack] - https://gerrit.wikimedia.org/r/1304554 (owner: Ayounsi)
[09:22:06] <logmsgbot>	 !log jmm@cumin2003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
[09:22:13] <wikibugs>	 (Merged) jenkins-bot: main: Add a namespace for the mw-pretrain service. [deployment-charts] - https://gerrit.wikimedia.org/r/1304083 (https://phabricator.wikimedia.org/T427668) (owner: Blake)
[09:22:57] <logmsgbot>	 !log jmm@cumin2003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2046.codfw.wmnet
[09:22:59] <wikibugs>	 (CR) Ayounsi: tox: add python 3.14 (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[09:24:28] <wikibugs>	 (CR) Elukey: tox: add python 3.14 (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[09:25:39] <wikibugs>	 (PS7) Elukey: __init__: modify the management_password property [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699)
[09:25:39] <wikibugs>	 (PS1) Elukey: Add setuptools to sphinx's tox environment [software/spicerack] - https://gerrit.wikimedia.org/r/1305355
[09:25:46] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[09:26:18] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[09:26:47] <fabfur>	 !log repooling cp7001 and cp7009 after reimage (T419825)
[09:26:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:51] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[09:27:09] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp7009.*
[09:27:13] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp7001.*
[09:27:23] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
[09:27:24] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
[09:27:30] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp7009.magru.wmnet
[09:27:30] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7009.magru.wmnet
[09:28:00] <icinga-wm>	 RECOVERY - Check if Pybal has been restarted after pybal.conf was changed on lvs2014 is OK: OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[09:28:00] <icinga-wm>	 RECOVERY - Check if Pybal has been restarted after pybal.conf was changed on lvs2013 is OK: OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[09:28:31] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add nicholusmuwonge to analytics-wmde-users [puppet] - https://gerrit.wikimedia.org/r/1305352 (https://phabricator.wikimedia.org/T429896) (owner: Muehlenhoff)
[09:28:38] <wikibugs>	 (PS1) Jcrespo: versitygw: Fix service not reloading after certificate change [puppet] - https://gerrit.wikimedia.org/r/1305356 (https://phabricator.wikimedia.org/T430023)
[09:29:31] <wikibugs>	 (PS2) Jcrespo: versitygw: Fix service not reloading after certificate change [puppet] - https://gerrit.wikimedia.org/r/1305356 (https://phabricator.wikimedia.org/T430023)
[09:29:39] <wikibugs>	 (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305356 (https://phabricator.wikimedia.org/T430023) (owner: Jcrespo)
[09:29:53] <wikibugs>	 (PS1) Elukey: docker_registry: fix homepage-builder.py code [puppet] - https://gerrit.wikimedia.org/r/1305357 (https://phabricator.wikimedia.org/T388287)
[09:30:21] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Production Access for Nicholusmuwonge_wmde - https://phabricator.wikimedia.org/T429896#12049392 (MoritzMuehlenhoff) Open→Resolved a:Gehel→MoritzMuehlenhoff @Nicholusmuwonge_wmde  Your access has been enable...
[09:31:22] <James_F>	 jouncebot: nowandnext
[09:31:22] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 28 minute(s)
[09:31:23] <jouncebot>	 In 0 hour(s) and 28 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1000)
[09:31:32] <James_F>	 OK, I'll push some config out now
[09:32:13] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1199: Migration of db1199.eqiad.wmnet completed
[09:32:14] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[09:32:28] <wikibugs>	 (CR) Jforrester: [C:+2] [testwiki] Enable Abstract Client integration mode, not just previews [mediawiki-config] - https://gerrit.wikimedia.org/r/1304800 (https://phabricator.wikimedia.org/T422657) (owner: Jforrester)
[09:32:35] <wikibugs>	 (CR) Jforrester: [C:+2] [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) (owner: Jforrester)
[09:32:59] <wikibugs>	 (CR) Jforrester: [C:+2] [abstractwiki] Update favicon with new version [mediawiki-config] - https://gerrit.wikimedia.org/r/1304110 (https://phabricator.wikimedia.org/T429620) (owner: Jforrester)
[09:33:05] <wikibugs>	 (CR) Jforrester: [C:+2] WikiLambda: Expose wikilambda-abstract-optin for global group assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/1305182 (https://phabricator.wikimedia.org/T422698) (owner: Jforrester)
[09:33:25] <wikibugs>	 (Merged) jenkins-bot: [testwiki] Enable Abstract Client integration mode, not just previews [mediawiki-config] - https://gerrit.wikimedia.org/r/1304800 (https://phabricator.wikimedia.org/T422657) (owner: Jforrester)
[09:33:32] <wikibugs>	 (Merged) jenkins-bot: [abstractwiki] Add the 'allowed' temporary vars for cross-wiki content [mediawiki-config] - https://gerrit.wikimedia.org/r/1304770 (https://phabricator.wikimedia.org/T422657) (owner: Jforrester)
[09:33:53] <wikibugs>	 (CR) Jcrespo: "Hi, Moritz, this looks like a silly mistake (missing notification) I made when setting up this service in a hurry. A quick sanity check wo" [puppet] - https://gerrit.wikimedia.org/r/1305356 (https://phabricator.wikimedia.org/T430023) (owner: Jcrespo)
[09:33:55] <wikibugs>	 (Merged) jenkins-bot: [abstractwiki] Update favicon with new version [mediawiki-config] - https://gerrit.wikimedia.org/r/1304110 (https://phabricator.wikimedia.org/T429620) (owner: Jforrester)
[09:34:00] <wikibugs>	 (Merged) jenkins-bot: WikiLambda: Expose wikilambda-abstract-optin for global group assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/1305182 (https://phabricator.wikimedia.org/T422698) (owner: Jforrester)
[09:34:42] <logmsgbot>	 !log jforrester@deploy1003 Started scap sync-world: Backport for [[gerrit:1304800|[testwiki] Enable Abstract Client integration mode, not just previews (T422657)]], [[gerrit:1304770|[abstractwiki] Add the 'allowed' temporary vars for cross-wiki content (T422657)]], [[gerrit:1305182|WikiLambda: Expose wikilambda-abstract-optin for global group assignment (T422698)]], [[gerrit:1304110|[abstractwiki] Update favicon with new version (T429620)]]
[09:34:48] <stashbot>	 T422657: Enable abstract client mode on Test Wikipedia - https://phabricator.wikimedia.org/T422657
[09:34:48] <stashbot>	 T422698: Grant `wikilambda-abstract-optin` to cross-wiki global groups via mediawiki-config / stewards' config on Special:GlobalGroupPermissions - https://phabricator.wikimedia.org/T422698
[09:34:49] <stashbot>	 T429620: Fix Abstract Wikipedia favicon - https://phabricator.wikimedia.org/T429620
[09:36:03] <wikibugs>	 (CR) Elukey: [C:+2] docker_registry: fix homepage-builder.py code [puppet] - https://gerrit.wikimedia.org/r/1305357 (https://phabricator.wikimedia.org/T388287) (owner: Elukey)
[09:36:46] <logmsgbot>	 !log jforrester@deploy1003 jforrester: Backport for [[gerrit:1304800|[testwiki] Enable Abstract Client integration mode, not just previews (T422657)]], [[gerrit:1304770|[abstractwiki] Add the 'allowed' temporary vars for cross-wiki content (T422657)]], [[gerrit:1305182|WikiLambda: Expose wikilambda-abstract-optin for global group assignment (T422698)]], [[gerrit:1304110|[abstractwiki] Update favicon with new version (T429620)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:36:54] <stashbot>	 T429: Analyze MW-Vagrant qualitative survey - https://phabricator.wikimedia.org/T429
[09:37:46] <logmsgbot>	 !log jforrester@deploy1003 jforrester: Continuing with deployment
[09:38:37] <wikibugs>	 (CR) Elukey: "@rcoccioli@wikimedia.org ready!" [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: Elukey)
[09:38:53] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2013 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[09:40:17] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12049442 (jcrespo) I've stopped ms-backups2003 network operations for now, codfw media backups will continue to flow temporarily only through ms-backup2004. No hurry on h...
[09:42:05] <logmsgbot>	 !log jforrester@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304800|[testwiki] Enable Abstract Client integration mode, not just previews (T422657)]], [[gerrit:1304770|[abstractwiki] Add the 'allowed' temporary vars for cross-wiki content (T422657)]], [[gerrit:1305182|WikiLambda: Expose wikilambda-abstract-optin for global group assignment (T422698)]], [[gerrit:1304110|[abstractwiki] Update favicon with new version (T429620)]] (duration: 07m 23s)
[09:42:13] <stashbot>	 T422657: Enable abstract client mode on Test Wikipedia - https://phabricator.wikimedia.org/T422657
[09:42:14] <stashbot>	 T422698: Grant `wikilambda-abstract-optin` to cross-wiki global groups via mediawiki-config / stewards' config on Special:GlobalGroupPermissions - https://phabricator.wikimedia.org/T422698
[09:42:14] <stashbot>	 T429620: Fix Abstract Wikipedia favicon - https://phabricator.wikimedia.org/T429620
[09:42:29] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2014 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[09:43:17] <wikibugs>	 (CR) Aklapper: "Good point. This was overwritten anyway in https://gitlab.wikimedia.org/repos/phabricator/deployment/-/blob/wmf/stable/scap/templates/phab" [puppet] - https://gerrit.wikimedia.org/r/1305041 (https://phabricator.wikimedia.org/T330797) (owner: Aklapper)
[09:43:36] <wikibugs>	 (PS2) Blake: kubernetes: Add a k8s deployment for pretrain. [puppet] - https://gerrit.wikimedia.org/r/1305358 (https://phabricator.wikimedia.org/T427668)
[09:45:30] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2179: Migration of db2179.codfw.wmnet completed
[09:45:31] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[09:47:18] <wikibugs>	 (CR) Gkyziridis: [C:+2] ml-services: Deploy outlink model latest version on prod. [deployment-charts] - https://gerrit.wikimedia.org/r/1305056 (https://phabricator.wikimedia.org/T429675) (owner: Gkyziridis)
[09:49:21] <wikibugs>	 (PS12) Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613)
[09:49:34] <wikibugs>	 (Merged) jenkins-bot: ml-services: Deploy outlink model latest version on prod. [deployment-charts] - https://gerrit.wikimedia.org/r/1305056 (https://phabricator.wikimedia.org/T429675) (owner: Gkyziridis)
[09:51:50] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti2028:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:52:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:54:17] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:54:27] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:54:44] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/1305355 (owner: Elukey)
[09:55:12] <wikibugs>	 (PS1) Muehlenhoff: thumbor-plugins: Rebuild against latest package versions in Bookworm [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1305363
[09:57:04] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[09:58:13] <logmsgbot>	 !log marostegui@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
[09:58:40] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1305356 (https://phabricator.wikimedia.org/T430023) (owner: Jcrespo)
[09:59:52] <wikibugs>	 (PS1) Marostegui: check_private_data_report: Add clouddb1026 [puppet] - https://gerrit.wikimedia.org/r/1305365 (https://phabricator.wikimedia.org/T409557)
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1000)
[10:00:08] <wikibugs>	 (CR) Volans: "LGTM, one lost thing in rebase ;)" [software/spicerack] - https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: Elukey)
[10:01:12] <wikibugs>	 (CR) Marostegui: [C:+2] check_private_data_report: Add clouddb1026 [puppet] - https://gerrit.wikimedia.org/r/1305365 (https://phabricator.wikimedia.org/T409557) (owner: Marostegui)
[10:01:36] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: Jbond)
[10:04:09] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: disable awslc on codfw hosts [puppet] - https://gerrit.wikimedia.org/r/1305131 (https://phabricator.wikimedia.org/T419825) (owner: Fabfur)
[10:05:04] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp2043.*
[10:05:08] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp2044.*
[10:05:24] <fabfur>	 !log depooling cp2043 and cp2044 to reimage (T419825)
[10:05:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:27] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[10:08:27] <wikibugs>	 (PS4) Abijeet Patro: Enable ULS v2 by default across all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1305290
[10:10:33] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS trixie
[10:10:35] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
[10:11:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] thumbor-plugins: Rebuild against latest package versions in Bookworm [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1305363 (owner: 10Muehlenhoff)
[10:13:45] <wikibugs>	 (03CR) 10Muehlenhoff: docker_registry: remove support for the nginx blob cache (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1304512 (https://phabricator.wikimedia.org/T427175) (owner: 10Elukey)
[10:14:12] <wikibugs>	 (03CR) 10Marostegui: cookbooks/sre/mysql/decommission: add cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto)
[10:17:24] <wikibugs>	 (03PS17) 10Federico Ceratto: mysql: update replication source [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436)
[10:18:13] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] Failover url-downloader.eqiad CNAME to one of the new Trixie hosts [dns] - 10https://gerrit.wikimedia.org/r/1305354 (https://phabricator.wikimedia.org/T427282) (owner: 10Muehlenhoff)
[10:19:01] <wikibugs>	 (03PS2) 10Zabe: Use Hadoop for Mostcategories on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1248909 (https://phabricator.wikimedia.org/T413362)
[10:19:27] <wikibugs>	 (03PS1) 10Marostegui: installserver: Do not format db1290 [puppet] - 10https://gerrit.wikimedia.org/r/1305369
[10:22:42] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] installserver: Do not format db1290 [puppet] - 10https://gerrit.wikimedia.org/r/1305369 (owner: 10Marostegui)
[10:25:42] <wikibugs>	 (03PS1) 10Marostegui: eqiad.yaml: Add clouddb1026 [puppet] - 10https://gerrit.wikimedia.org/r/1305373 (https://phabricator.wikimedia.org/T409557)
[10:25:51] <wikibugs>	 (03PS1) 10Muehlenhoff: thumbor: Bump image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305374
[10:26:16] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
[10:26:17] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
[10:29:00] <wikibugs>	 (03PS3) 10Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - 10https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112)
[10:29:00] <wikibugs>	 (03PS3) 10Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - 10https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[10:30:17] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
[10:30:40] <wikibugs>	 (03PS7) 10Gkyziridis: ml-services: Deploy Qwen3.6 model. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680)
[10:34:12] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
[10:39:02] <wikibugs>	 (03PS1) 10Blake: kube-state-metrics: Add v2.18.0 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1305377 (https://phabricator.wikimedia.org/T427405)
[10:42:59] <claime>	 jouncebot: nowandnext
[10:42:59] <jouncebot>	 For the next 0 hour(s) and 17 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1000)
[10:42:59] <jouncebot>	 In 0 hour(s) and 17 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1100)
[10:43:35] <claime>	 I'm locking scap because I need to test https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1304845 on beta, so I need to +2 it and merge, but I don't want it to get pulled just now
[10:43:47] <claime>	 Please ping me if that's a problem for you
[10:45:03] <logmsgbot>	 !log cgoubert@deploy1003 Locking from deployment [ALL REPOSITORIES]: Testing apiportalwiki deletion in beta - T418494
[10:45:08] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[10:45:09] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin)
[10:46:39] <wikibugs>	 (03PS3) 10Btullis: Grant sudo privileges for the analytics-fr-tech-users group [puppet] - 10https://gerrit.wikimedia.org/r/1266980 (https://phabricator.wikimedia.org/T417213)
[10:49:34] <wikibugs>	 (03CR) 10MSantos: [C:03+2] Publish public PGP key of Yiannis Giannelos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305151 (https://phabricator.wikimedia.org/T423255) (owner: 10Jgiannelos)
[10:49:59] <wikibugs>	 (03CR) 10MSantos: [C:03+2] mediawiki.org keys.html: Limit height of key code blocks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305200 (owner: 10Bartosz Dziewoński)
[10:50:41] <claime>	 mbsantos: I just locked scap btw.
[10:50:52] <wikibugs>	 (03Abandoned) 10Jforrester: ExecuteTestAndCacheJob: Don't explode when there are no connected Implementations/Tests [extensions/WikiLambda] (wmf/1.47.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1304563 (https://phabricator.wikimedia.org/T429460) (owner: 10Jforrester)
[10:51:22] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS trixie
[10:52:14] <wikibugs>	 (03Merged) 10jenkins-bot: Remove config related to the API Portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304845 (https://phabricator.wikimedia.org/T429372) (owner: 10Alex Paskulin)
[10:52:20] <wikibugs>	 (03CR) 10FNegri: [C:03+1] eqiad.yaml: Add clouddb1026 [puppet] - 10https://gerrit.wikimedia.org/r/1305373 (https://phabricator.wikimedia.org/T409557) (owner: 10Marostegui)
[10:52:46] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] eqiad.yaml: Add clouddb1026 [puppet] - 10https://gerrit.wikimedia.org/r/1305373 (https://phabricator.wikimedia.org/T409557) (owner: 10Marostegui)
[10:53:16] <wikibugs>	 (03Merged) 10jenkins-bot: Publish public PGP key of Yiannis Giannelos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305151 (https://phabricator.wikimedia.org/T423255) (owner: 10Jgiannelos)
[10:53:19] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki.org keys.html: Limit height of key code blocks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305200 (owner: 10Bartosz Dziewoński)
[10:53:32] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1302201 (owner: 10MSantos)
[10:55:33] <wikibugs>	 (03CR) 10Atsuko: presto: Test resource groups and spill features on the test cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: 10Btullis)
[10:57:01] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2043.codfw.wmnet with OS trixie
[10:57:03] <wikibugs>	 (03PS1) 10Kosta Harlan: CheckUserGetUsersPager: Fix TypeError for numeric usernames [extensions/CheckUser] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305378 (https://phabricator.wikimedia.org/T429971)
[10:57:54] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/CheckUser] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305378 (https://phabricator.wikimedia.org/T429971) (owner: 10Kosta Harlan)
[10:59:02] <wikibugs>	 (03PS1) 10Clément Goubert: CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305380 (https://phabricator.wikimedia.org/T429372)
[11:00:05] <jouncebot>	 mvolz: Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1100). Please do the needful.
[11:01:22] <wikibugs>	 (03CR) 10Trueg: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: 10Trueg)
[11:01:40] <wikibugs>	 (03CR) 10Zabe: [C:03+1] CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305380 (https://phabricator.wikimedia.org/T429372) (owner: 10Clément Goubert)
[11:01:41] <claime>	 Sooo now https://api.wikimedia.beta.wmcloud.org/wiki/Main_Page# is completely broken but somehow still exists even though it's in the deleted.dblist
[11:01:45] <claime>	 cool cool 
[11:01:55] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305380 (https://phabricator.wikimedia.org/T429372) (owner: 10Clément Goubert)
[11:02:25] <wikibugs>	 (03PS12) 10Trueg: dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313)
[11:02:53] <wikibugs>	 (03Merged) 10jenkins-bot: CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305380 (https://phabricator.wikimedia.org/T429372) (owner: 10Clément Goubert)
[11:05:24] <wikibugs>	 (03PS18) 10Federico Ceratto: mysql: update replication source [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436)
[11:07:17] <wikibugs>	 (03CR) 10Federico Ceratto: mysql: update replication source (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1238368 (https://phabricator.wikimedia.org/T373436) (owner: 10Federico Ceratto)
[11:09:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] thumbor: Bump image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305374 (owner: 10Muehlenhoff)
[11:10:52] <wikibugs>	 (03PS1) 10Gkyziridis: ml-services: Deploy artest version of ticle-country model on staging. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305382 (https://phabricator.wikimedia.org/T429675)
[11:11:19] <logmsgbot>	 !log jmm@deploy1003 helmfile [staging] START helmfile.d/services/thumbor: apply
[11:11:26] <logmsgbot>	 !log jmm@deploy1003 helmfile [staging] DONE helmfile.d/services/thumbor: apply
[11:20:27] <wikibugs>	 (03CR) 10Ozge: [C:03+1] ml-services: Deploy artest version of ticle-country model on staging. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305382 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis)
[11:22:26] <wikibugs>	 (03PS1) 10Gkyziridis: ml-services: Bump revscoring staging images to 2026-06-23-094330-publish [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305384 (https://phabricator.wikimedia.org/T429675)
[11:23:04] <wikibugs>	 (03PS1) 10Clément Goubert: beta: remove api.wikimedia.beta.wmcloud.org [puppet] - 10https://gerrit.wikimedia.org/r/1305385 (https://phabricator.wikimedia.org/T429372)
[11:23:18] <wikibugs>	 (03CR) 10Gkyziridis: "I am not sure if it would be more wise to split it in separated deployments." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305384 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis)
[11:23:26] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+2] ml-services: Deploy artest version of ticle-country model on staging. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305382 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis)
[11:25:37] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Deploy artest version of ticle-country model on staging. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305382 (https://phabricator.wikimedia.org/T429675) (owner: 10Gkyziridis)
[11:27:01] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Grant sudo privileges for the analytics-fr-tech-users group [puppet] - 10https://gerrit.wikimedia.org/r/1266980 (https://phabricator.wikimedia.org/T417213) (owner: 10Btullis)
[11:27:14] <wikibugs>	 (03CR) 10Zabe: [C:03+1] beta: remove api.wikimedia.beta.wmcloud.org [puppet] - 10https://gerrit.wikimedia.org/r/1305385 (https://phabricator.wikimedia.org/T429372) (owner: 10Clément Goubert)
[11:27:35] <logmsgbot>	 !log jmm@deploy1003 helmfile [codfw] START helmfile.d/services/thumbor: apply
[11:27:36] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] beta: remove api.wikimedia.beta.wmcloud.org [puppet] - 10https://gerrit.wikimedia.org/r/1305385 (https://phabricator.wikimedia.org/T429372) (owner: 10Clément Goubert)
[11:27:57] <wikibugs>	 (03PS3) 10Hnowlan: redis: remove nrpe check [puppet] - 10https://gerrit.wikimedia.org/r/1305075 (https://phabricator.wikimedia.org/T384924)
[11:28:08] <claime>	 btullis: if you catch my change you can merge it
[11:28:30] <hnowlan>	 jouncebot: nowandnext
[11:28:30] <jouncebot>	 For the next 0 hour(s) and 31 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1100)
[11:28:30] <jouncebot>	 In 1 hour(s) and 31 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1300)
[11:28:39] <claime>	 hnowlan: scap is locked atm 
[11:28:47] <claime>	 I'm testing mediawiki-config stuff in beta
[11:28:52] <hnowlan>	 ack, nbd
[11:28:57] <hnowlan>	 I'm gonna roll out a new check for redis
[11:29:01] <claime>	 ack
[11:29:06] <hnowlan>	 so won't conflict
[11:29:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[11:29:54] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] redis: migrate icinga checks to prometheus [alerts] - 10https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) (owner: 10Hnowlan)
[11:32:00] <wikibugs>	 (03Merged) 10jenkins-bot: redis: migrate icinga checks to prometheus [alerts] - 10https://gerrit.wikimedia.org/r/1305072 (https://phabricator.wikimedia.org/T384924) (owner: 10Hnowlan)
[11:33:39] <logmsgbot>	 !log jmm@deploy1003 helmfile [codfw] DONE helmfile.d/services/thumbor: apply
[11:34:08] <logmsgbot>	 !log jmm@deploy1003 helmfile [eqiad] START helmfile.d/services/thumbor: apply
[11:35:45] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Inbound errors on interface cr1-eqiad:ae2 (asw2-b-eqiad:ae1) - https://phabricator.wikimedia.org/T429116#12049836 (10Jclark-ctr) I swapped the optic on Switch B2 and also replaced the cable between the core router and the ToR switch.  New Cable ID: G2210253241007206.
[11:36:38] <logmsgbot>	 !log jmm@deploy1003 helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
[11:37:40] <wikibugs>	 (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304652 (owner: 10PipelineBot)
[11:39:41] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[11:39:55] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1304652 (owner: 10PipelineBot)
[11:41:04] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Inbound errors on interface cr1-eqiad:ae2 (asw2-b-eqiad:ae1) - https://phabricator.wikimedia.org/T429116#12049844 (10Jclark-ctr) I decided to swap the cable after noticing that the interface errors had increased quite a bit over the past few days. Since the new switch was recent...
[11:42:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Failover url-downloader.eqiad CNAME to one of the new Trixie hosts [dns] - 10https://gerrit.wikimedia.org/r/1305354 (https://phabricator.wikimedia.org/T427282) (owner: 10Muehlenhoff)
[11:42:41] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[11:44:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[11:44:46] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[11:46:21] <wikibugs>	 (03CR) 10Jcrespo: [C:03+2] versitygw: Fix service not reloading after certificate change [puppet] - 10https://gerrit.wikimedia.org/r/1305356 (https://phabricator.wikimedia.org/T430023) (owner: 10Jcrespo)
[11:46:38] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[11:46:50] <Dreamy_Jazz>	 jouncebot: nowandnext
[11:46:50] <jouncebot>	 For the next 0 hour(s) and 13 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1100)
[11:46:50] <jouncebot>	 In 1 hour(s) and 13 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1300)
[11:47:14] <claime>	 Dreamy_Jazz: scap is locked
[11:47:19] <Dreamy_Jazz>	 Thanks
[11:47:21] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply
[11:47:38] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply
[11:48:11] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
[11:48:27] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[11:49:28] <logmsgbot>	 !log marostegui@cumin1003 conftool action : set/pooled=no; selector: name=clouddb1026.eqiad.wmnet,service=s1
[11:49:41] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[11:50:21] <logmsgbot>	 !log marostegui@cumin1003 conftool action : set/weight=100; selector: name=clouddb1026.eqiad.wmnet
[11:51:27] <wikibugs>	 (03CR) 10Kosta Harlan: [C:03+1] hCaptcha: Enable for Special:Contact [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) (owner: 10Dreamy Jazz)
[11:52:04] <claime>	 Dreamy_Jazz: You may know, when running populateHomeDB.php, what wiki should I pass to mwscript?
[11:52:39] <Dreamy_Jazz>	 Not sure, I haven't used that script before
[11:52:46] <claime>	 Ugh
[11:53:15] <wikibugs>	 (03PS1) 10Muehlenhoff: profile::server_depool: Mark ganeti/test as fine to ignore [puppet] - 10https://gerrit.wikimedia.org/r/1305387 (https://phabricator.wikimedia.org/T327300)
[11:53:43] <moritzm>	 !log installing postgresql security updates
[11:53:46] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.network.depool-rack with action 'depool' for codfw rack A6
[11:53:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:02] <claime>	 ok metawiki seems to be the play
[11:56:46] <logmsgbot>	 ayounsi@cumin1003 depool-rack (PID 2584950) is awaiting input
[11:57:00] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] "LGTM, thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis)
[11:57:02] <Mvolz>	 Is anything alerting for citoid? I only deployed to staging and the SLO is going crazy 
[11:57:14] <Mvolz>	 https://grafana.wikimedia.org/goto/afq3lqap3bdhcd?orgId=1
[11:58:03] <wikibugs>	 (03PS1) 10Marostegui: clouddb1026: Remove note [puppet] - 10https://gerrit.wikimedia.org/r/1305388
[11:58:46] <wikibugs>	 (03CR) 10Marostegui: "This is a noop" [puppet] - 10https://gerrit.wikimedia.org/r/1305388 (owner: 10Marostegui)
[11:58:49] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] clouddb1026: Remove note [puppet] - 10https://gerrit.wikimedia.org/r/1305388 (owner: 10Marostegui)
[11:59:14] <Mvolz>	 claime: halp :(
[11:59:53] <claime>	 Mvolz: taking a look
[12:00:10] <Mvolz>	 ty
[12:00:42] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+2] ml-services: Deploy Qwen3.6 model. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis)
[12:01:45] <claime>	 Mvolz: do you see anything in logstash?
[12:01:48] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 23 hosts with reason: Switch maintenance
[12:01:50] <claime>	 pods seem ok
[12:02:01] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12049887 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b3768ce5-4982-4cdb-ac8d-3735e9e5290b) set by ayounsi@cumin1003 for 2:00:00 on 23 host(s) and th...
[12:02:16] <wikibugs>	 (03PS1) 10Muehlenhoff: Revert "Failover url-downloader.eqiad CNAME to one of the new Trixie hosts" [dns] - 10https://gerrit.wikimedia.org/r/1305389
[12:02:47] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-a6-codfw,lsw1-a6-codfw IPv6,lsw1-a6-codfw.mgmt with reason: Switch maintenance
[12:02:53] <logmsgbot>	 !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.network.depool-rack (exit_code=99) with action 'depool' for codfw rack A6
[12:02:57] <Mvolz>	 claime: 100% of our outgoing response codes are 504s
[12:02:58] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12049888 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=ac4b8e18-e54c-4708-804e-e3c84d435ded) set by ayounsi@cumin1003 for 2:00:00 on 3 host(s) and the...
[12:03:04] <Mvolz>	 I think this is related to url-downloader maybe
[12:03:07] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: Deploy Qwen3.6 model. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1305071 (https://phabricator.wikimedia.org/T425680) (owner: 10Gkyziridis)
[12:03:13] <claime>	 Mvolz: there's a zotero pod misbehaving as well
[12:03:22] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[12:03:45] <Mvolz>	 moritzm: in #wikimedia-serviceops is reverting a url-downloader thing I think
[12:03:50] <claime>	 Ah, may be related to moritzm work on url-downloader
[12:03:52] <claime>	 yeah
[12:03:55] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.mysql.depool depool db2155: rack depool
[12:03:56] <Mvolz>	 ty 
[12:04:15] <Dreamy_Jazz>	 url-downloader issues have caused hCaptcha requests to drop btw
[12:04:15] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2155: rack depool
[12:04:27] <Dreamy_Jazz>	 *specifically those made to the siteverify API from our servers
[12:04:29] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.mysql.depool depool db2156: rack depool
[12:04:33] <Dreamy_Jazz>	 So hCaptcha has gone into failover
[12:04:34] <kostajh>	 what work is being done on urldownloader?
[12:05:00] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2156: rack depool
[12:05:10] <Dreamy_Jazz>	 (The failover state is not ideal as it's not really working)
[12:05:11] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[12:05:24] <wikibugs>	 (03PS1) 10Muehlenhoff: Revert "Failover url-downloader.codfw CNAME to one of the new Trixie hosts" [dns] - 10https://gerrit.wikimedia.org/r/1305390
[12:05:56] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2067,2072-2073,2114-2115,2124-2127,2256-2257].codfw.wmnet
[12:06:21] <dcausse>	 !log T423993: closing ttmserver indices in the cirrussearch opensearch cluster (eqiad & codfw)
[12:06:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:26] <stashbot>	 T423993: Upgrade old indices in the CirrusSearch opensearch clusters - https://phabricator.wikimedia.org/T423993
[12:06:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Revert "Failover url-downloader.codfw CNAME to one of the new Trixie hosts" [dns] - 10https://gerrit.wikimedia.org/r/1305390 (owner: 10Muehlenhoff)
[12:07:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Revert "Failover url-downloader.eqiad CNAME to one of the new Trixie hosts" [dns] - 10https://gerrit.wikimedia.org/r/1305389 (owner: 10Muehlenhoff)
[12:07:20] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[12:07:22] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
[12:07:23] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
[12:07:38] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host ml-staging2001.codfw.wmnet
[12:09:16] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[12:09:27] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[12:09:32] <wikibugs>	 (03CR) 10Jelto: [V:03+1] "@mmuhlenhoff@wikimedia.org how can we move forward here? Do you think we can test this approach for some of the collab hosts (etherpad, gi" [puppet] - 10https://gerrit.wikimedia.org/r/1251406 (owner: 10Jelto)
[12:10:07] <claime>	 Ok the populateHomeDB.php script takes ages on beta I don't think it's been run forever
[12:10:32] <wikibugs>	 (03PS8) 10Elukey: __init__: modify the management_password property [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699)
[12:10:36] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
[12:11:21] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[12:11:22] <wikibugs>	 (03CR) 10Elukey: __init__: modify the management_password property (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey)
[12:11:34] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Add setuptools to sphinx's tox environment [software/spicerack] - 10https://gerrit.wikimedia.org/r/1305355 (owner: 10Elukey)
[12:12:53] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:03+1] CheckUserGetUsersPager: Fix TypeError for numeric usernames [extensions/CheckUser] (wmf/1.47.0-wmf.8) - 10https://gerrit.wikimedia.org/r/1305378 (https://phabricator.wikimedia.org/T429971) (owner: 10Kosta Harlan)
[12:13:18] <elukey>	 Mvolz: o/ did you get a task for the issue or were you checking the dashboard?
[12:13:52] <Mvolz>	 No it was a total coincidence, I was trying to deploy at the same time and noticed nothing was working
[12:14:28] <Mvolz>	 https://phabricator.wikimedia.org/T381372 re-ups the necessity for this
[12:14:47] <wikibugs>	 (03CR) 10Nikerabbit: [C:03+1] Enable ULS v2 by default across all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305290 (owner: 10Abijeet Patro)
[12:15:17] <claime>	 I just stumbled upon https://phabricator.wikimedia.org/T316472 while checking the count for users without a gu_home_db and seeing it is 6909463 in prod... cc Reedy zabe 
[12:16:01] <claime>	 Makes me very unsure about running that script in prod
[12:16:07] <Reedy>	 claime: that it's still happening?
[12:16:13] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
[12:16:27] <Reedy>	 should probably re-run that query
[12:16:48] <claime>	 Reedy: One it's still happening, two I'm deleting a wiki so I'm gonna set that field to '' for another 10752
[12:16:52] <Mvolz>	 elukey: we still have a large increase in failures that started around 8:00 utc today
[12:17:05] <Mvolz>	 Those are still happening post url downloader
[12:17:07] <claime>	 Reedy: I *just* checked the count in prod
[12:17:08] <Mvolz>	 two separate incidents?
[12:17:09] <claime>	 Reedy: select COUNT(*) from globaluser where gu_home_db IS NULL OR gu_home_db = "";
[12:17:11] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2067,2072-2073,2114-2115,2124-2127,2256-2257].codfw.wmnet
[12:17:13] <claime>	 Reedy: 6909463
[12:17:16] <jinxer-wm>	 FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[12:17:45] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-staging2001.codfw.wmnet
[12:18:03] <Reedy>	 claime: I mean grouped by year etc
[12:19:13] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM, ship it!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey)
[12:20:11] <claime>	 Reedy: 130k-180k per year starting 2023
[12:20:14] <claime>	 :)
[12:20:55] <claime>	 Reedy: exact split https://phabricator.wikimedia.org/T418494#12049939
[12:21:09] <wikibugs>	 (03CR) 10Elukey: [C:03+2] __init__: modify the management_password property (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1304753 (https://phabricator.wikimedia.org/T429699) (owner: 10Elukey)
[12:21:43] <icinga-wm>	 PROBLEM - Host gitlab-replica-b.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[12:22:07] <icinga-wm>	 PROBLEM - BFD status on ssw1-a1-codfw.mgmt is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:22:27] <icinga-wm>	 PROBLEM - BFD status on ssw1-a8-codfw.mgmt is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:22:39] <jinxer-wm>	 FIRING: CoreBGPDown: Core BGP session down between ssw1-a8-codfw and lsw1-a6-codfw (10.192.252.8) - group EVPN_IBGP - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=ssw1-a8-codfw:9804&var-bgp_group=EVPN_IBGP&var-bgp_neighbor=lsw1-a6-codfw - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[12:22:40] <jinxer-wm>	 FIRING: [12x] ProbeDown: Service aqs2001-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:23:00] <claime>	 Reedy: I mean I can do the UPDATE to set to null and not reassign a home wiki as well
[12:23:25] <claime>	 I need to move forwards with at least the localuser changes because it's breaking GlobalWatchlist
[12:23:44] <claime>	 So I'm gonna consider my beta tests successful and do that
[12:23:51] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-a1-codfw:et-0/0/5 (Core: lsw1-a6-codfw:et-0/0/55 {#230403800020}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[12:23:53] <logmsgbot>	 !log cgoubert@deploy1003 Unlocked for deployment [ALL REPOSITORIES]: Testing apiportalwiki deletion in beta - T418494 (duration: 98m 50s)
[12:23:57] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[12:26:24] <claime>	 mbsantos: that's gonna pull your and matmarex's patches
[12:26:54] <logmsgbot>	 !log cgoubert@deploy1003 Started scap sync-world: Backport for [[gerrit:1304845|Remove config related to the API Portal (T429372 T418494)]], [[gerrit:1305380|CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org (T429372 T418494)]]
[12:27:00] <stashbot>	 T429372: Remove API Portal from WMF MediaWiki config - https://phabricator.wikimedia.org/T429372
[12:27:11] <wikibugs>	 (CR) Muehlenhoff: profile::reboot::unattended: add class to mark hosts for unattended reboots (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[12:27:39] <jinxer-wm>	 FIRING: [2x] CoreBGPDown: Core BGP session down between ssw1-a1-codfw and lsw1-a6-codfw (10.192.252.8) - group EVPN_IBGP - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[12:27:40] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs2001-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:29:02] <logmsgbot>	 !log cgoubert@deploy1003 apaskulin, cgoubert: Backport for [[gerrit:1304845|Remove config related to the API Portal (T429372 T418494)]], [[gerrit:1305380|CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org (T429372 T418494)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[12:29:07] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[12:29:41] <jinxer-wm>	 FIRING: [5x] JobUnavailable: Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:31:10] <logmsgbot>	 !log cgoubert@deploy1003 apaskulin, cgoubert: Continuing with deployment
[12:31:23] <logmsgbot>	 !log btullis@cumin1003 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host db1208.eqiad.wmnet
[12:31:44] <wikibugs>	 (PS1) Anzx: csbwiki: update logo, wordmark and tagline [mediawiki-config] - https://gerrit.wikimedia.org/r/1304695 (https://phabricator.wikimedia.org/T429126)
[12:31:59] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304695 (https://phabricator.wikimedia.org/T429126) (owner: Anzx)
[12:33:00] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304001 (https://phabricator.wikimedia.org/T427917) (owner: Valn_ilyo)
[12:33:07] <icinga-wm>	 RECOVERY - BFD status on ssw1-a1-codfw.mgmt is OK: UP: 17 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:33:27] <icinga-wm>	 RECOVERY - BFD status on ssw1-a8-codfw.mgmt is OK: UP: 17 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:33:51] <jinxer-wm>	 RESOLVED: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-a1-codfw:et-0/0/5 (Core: lsw1-a6-codfw:et-0/0/55 {#230403800020}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[12:33:55] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host ml-staging2001.codfw.wmnet
[12:33:57] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-staging2001.codfw.wmnet
[12:34:06] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
[12:34:07] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
[12:34:41] <jinxer-wm>	 RESOLVED: [5x] JobUnavailable: Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:35:22] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.mysql.pool pool db2155: rack depool
[12:35:28] <logmsgbot>	 !log cgoubert@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304845|Remove config related to the API Portal (T429372 T418494)]], [[gerrit:1305380|CommonSettings-labs: Remove api.wikimedia.beta.wmcloud.org (T429372 T418494)]] (duration: 08m 34s)
[12:35:35] <stashbot>	 T429372: Remove API Portal from WMF MediaWiki config - https://phabricator.wikimedia.org/T429372
[12:35:35] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[12:35:59] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2067,2072-2073,2114-2115,2124-2127,2256-2257].codfw.wmnet
[12:36:06] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2067,2072-2073,2114-2115,2124-2127,2256-2257].codfw.wmnet
[12:36:42] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.mysql.pool pool db2156: rack depool
[12:36:50] <claime>	 marostegui: heads-up that I'm going to do the localuser and localnames db changes for apiportalwiki deletion
[12:36:55] <icinga-wm>	 RECOVERY - Host gitlab-replica-b.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 30.43 ms
[12:37:37] <marostegui>	 claime: cool, I'll be around if you need me
[12:37:39] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between ssw1-a1-codfw and lsw1-a6-codfw (10.192.252.8) - group EVPN_IBGP - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[12:37:40] <jinxer-wm>	 RESOLVED: [16x] ProbeDown: Service aqs2001-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:37:45] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[12:37:45] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893
[12:37:52] <stashbot>	 T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893
[12:38:05] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1238: Upgrading db1238.eqiad.wmnet
[12:38:30] <claime>	 !log Deleting apiportalwiki references in localuser table - T418494
[12:38:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:45] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1238: Upgrading db1238.eqiad.wmnet
[12:38:54] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[12:38:54] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@codfw T429893
[12:39:14] <wikibugs>	 (PS1) Gkyziridis: ml-services: Deploy the latest version of article-country model on prod. [deployment-charts] - https://gerrit.wikimedia.org/r/1305392 (https://phabricator.wikimedia.org/T429675)
[12:39:14] <claime>	 Haha of course mysql.php connects to a read-only replica
[12:39:15] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2206: Upgrading db2206.codfw.wmnet
[12:39:37] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2206: Upgrading db2206.codfw.wmnet
[12:39:55] <claime>	 marostegui: Do I need to connect to a mariadb server directly or something?
[12:39:56] <logmsgbot>	 !log ayounsi@cumin1003 END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2155: rack depool
[12:40:40] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp2043.codfw.wmnet
[12:40:41] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2043.codfw.wmnet
[12:40:47] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp2044.codfw.wmnet
[12:40:48] <wikibugs>	 (CR) Zaidusyy: "I tried to submit this upstream to Google Gerrit, but my Google account is bugged and giving me a 403 Permission Denied error even after s" [software/gerrit] (wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1305218 (https://phabricator.wikimedia.org/T429901) (owner: Zaidusyy)
[12:40:48] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2044.codfw.wmnet
[12:41:44] <logmsgbot>	 !log filippo@cumin1003 START - Cookbook sre.dns.netbox
[12:41:45] <logmsgbot>	 cwilliams@cumin1003 major-upgrade (PID 2591934) is awaiting input
[12:42:09] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DC-Ops, and 3 others: codfw: rack A6 maintenance - https://phabricator.wikimedia.org/T429812#12049997 (ayounsi) Open→Resolved All done, and all services re-pooled.
[12:42:30] <wikibugs>	 (PS1) Jforrester: wikifunctions: Upgrade evaluators from 2026-06-18-181627 to 2026-06-23-135458 [deployment-charts] - https://gerrit.wikimedia.org/r/1305394 (https://phabricator.wikimedia.org/T416144)
[12:42:33] <wikibugs>	 (PS1) Jforrester: wikifunctions: Upgrade orchestrator from 2026-06-17-182805 to 2026-06-23-115555 [deployment-charts] - https://gerrit.wikimedia.org/r/1305395 (https://phabricator.wikimedia.org/T416144)
[12:42:37] <logmsgbot>	 cwilliams@cumin1003 major-upgrade (PID 2591995) is awaiting input
[12:42:48] <wikibugs>	 (PS1) Jforrester: wikifunctions: Double memory for evaluators from 1G to 2G [deployment-charts] - https://gerrit.wikimedia.org/r/1305396
[12:42:49] <claime>	 Ah there's a --write option
[12:42:50] <claime>	 ofc
[12:42:56] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db2155: repool after rack maintenance
[12:44:33] <claime>	 !log Deleting apiportalwiki references in localnames table - T418494
[12:44:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:38] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[12:44:45] <fabfur>	 !log repooling cp2043 and cp2044 after reimage (T419825)
[12:44:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:49] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[12:44:50] <wikibugs>	 (PS2) Gerrit maintenance bot: wmnet: Update x3-master alias [dns] - https://gerrit.wikimedia.org/r/1296511 (https://phabricator.wikimedia.org/T427895)
[12:45:07] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp2043.*
[12:45:11] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp2044.*
[12:45:20] <wikibugs>	 (CR) CWilliams: [C:+2] wmnet: Update x3-master alias [dns] - https://gerrit.wikimedia.org/r/1296511 (https://phabricator.wikimedia.org/T427895) (owner: Gerrit maintenance bot)
[12:45:48] <wikibugs>	 (PS4) Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112)
[12:45:48] <wikibugs>	 (PS4) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[12:46:18] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: disable awslc on esams hosts [puppet] - https://gerrit.wikimedia.org/r/1305132 (https://phabricator.wikimedia.org/T419825) (owner: Fabfur)
[12:46:24] <logmsgbot>	 !log filippo@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for dumps-nfs - filippo@cumin1003"
[12:46:26] <logmsgbot>	 !log cwilliams@dns1005 START - running authdns-update
[12:46:29] <logmsgbot>	 !log filippo@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for dumps-nfs - filippo@cumin1003"
[12:46:29] <logmsgbot>	 !log filippo@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:46:51] <fabfur>	 !log depooling cp3066 and cp3074 to reimage (T419825)
[12:46:53] <logmsgbot>	 elukey@cumin1003 provision (PID 2592388) is awaiting input
[12:46:55] <claime>	 !log Setting globaluser gu_home_db to NULL for apiportalwiki globalusers - T418494
[12:46:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:12] <logmsgbot>	 !log filippo@cumin1003 START - Cookbook sre.dns.netbox
[12:47:19] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp3066.*
[12:47:22] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp3066.*
[12:47:26] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp3074.*
[12:47:33] <wikibugs>	 (PS2) Ayounsi: tox: add python 3.14 [software/spicerack] - https://gerrit.wikimedia.org/r/1305345
[12:47:33] <wikibugs>	 (PS6) Ayounsi: netbox: add a BGP getter/setter [software/spicerack] - https://gerrit.wikimedia.org/r/1304554
[12:47:39] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[12:48:23] <claime>	 !log Deleting apiportalwiki references in GlobalUsage - T418494
[12:48:23] <logmsgbot>	 !log cwilliams@dns1005 END - running authdns-update
[12:48:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:00] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.reimage for host cp3066.esams.wmnet with OS trixie
[12:49:02] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.reimage for host cp3074.esams.wmnet with OS trixie
[12:49:26] <wikibugs>	 (CR) Ayounsi: tox: add python 3.14 (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[12:51:33] <logmsgbot>	 !log filippo@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for dumps-nfs - filippo@cumin1003"
[12:51:37] <logmsgbot>	 !log filippo@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for dumps-nfs - filippo@cumin1003"
[12:51:37] <logmsgbot>	 !log filippo@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:53:22] <marostegui>	 claime: some lag showing up in codfw
[12:53:26] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[12:53:34] <marostegui>	 but it should recover soon, let me give it some downtime to avoid the p4ge
[12:53:38] <claime>	 ack
[12:54:25] <marostegui>	 claime: is it finished from your side?
[12:54:29] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: maintenance
[12:54:34] <claime>	 yep I'm done with direct db edits
[12:54:47] <marostegui>	 claime: oki, cool, I am still keeping an eye for the lag
[12:54:59] <marostegui>	 should recover soon
[12:55:23] <wikibugs>	 (CR) Reedy: "CC paladox" [software/gerrit] (wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1305218 (https://phabricator.wikimedia.org/T429901) (owner: Zaidusyy)
[12:55:36] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[12:56:53] <wikibugs>	 (PS1) Marostegui: production-m3.sql.erb: Remove old grant [puppet] - https://gerrit.wikimedia.org/r/1305399 (https://phabricator.wikimedia.org/T423727)
[12:57:40] <wikibugs>	 (CR) Marostegui: [C:+2] "This is a noop until removed across the dbs directly." [puppet] - https://gerrit.wikimedia.org/r/1305399 (https://phabricator.wikimedia.org/T423727) (owner: Marostegui)
[12:57:42] <wikibugs>	 (CR) Marostegui: [V:+2 C:+2] production-m3.sql.erb: Remove old grant [puppet] - https://gerrit.wikimedia.org/r/1305399 (https://phabricator.wikimedia.org/T423727) (owner: Marostegui)
[12:58:10] <wikibugs>	 (CR) Btullis: presto: Test resource groups and spill features on the test cluster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[12:59:01] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[12:59:17] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[13:00:05] <jouncebot>	 Lucas_WMDE, urbanecm, and TheresNoTime: That opportune time for a UTC afternoon backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1300).
[13:00:05] <jouncebot>	 nemo-yiannis, kostajh, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:13] <wikibugs>	 (PS1) Filippo Giunchedi: conftool-data: add dumps-nfs [puppet] - https://gerrit.wikimedia.org/r/1305402 (https://phabricator.wikimedia.org/T411248)
[13:00:15] <anzx>	 o/
[13:00:16] <nemo-yiannis>	 👋
[13:00:17] <wikibugs>	 (PS1) Filippo Giunchedi: dumps: open nfs port to lb healthchecks [puppet] - https://gerrit.wikimedia.org/r/1305403 (https://phabricator.wikimedia.org/T411248)
[13:00:20] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: add dumps-nfs service in service_setup state [puppet] - https://gerrit.wikimedia.org/r/1305404 (https://phabricator.wikimedia.org/T411248)
[13:00:23] <wikibugs>	 (PS1) Filippo Giunchedi: dumps: add dumps-nfs service pool [puppet] - https://gerrit.wikimedia.org/r/1305405 (https://phabricator.wikimedia.org/T411248)
[13:00:37] <kostajh>	 hi
[13:00:48] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1238.eqiad.wmnet with OS trixie
[13:01:07] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[13:01:14] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/CheckUser] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305378 (https://phabricator.wikimedia.org/T429971) (owner: Kosta Harlan)
[13:01:32] <Dreamy_Jazz>	 \o
[13:01:59] <wikibugs>	 (PS1) Filippo Giunchedi: wikimedia.org: add dumps-nfs [dns] - https://gerrit.wikimedia.org/r/1305406 (https://phabricator.wikimedia.org/T411248)
[13:02:22] <wikibugs>	 (PS1) Dreamy Jazz: Handle the ConfirmEditGetGlobalInstanceFromContext hook [extensions/ContactPage] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305407 (https://phabricator.wikimedia.org/T429848)
[13:02:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:02:33] <wikibugs>	 (PS1) Dreamy Jazz: Create ConfirmEditGetGlobalInstanceFromContext hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305408 (https://phabricator.wikimedia.org/T429848)
[13:02:40] <wikibugs>	 (CR) CI reject: [V:-1] Handle the ConfirmEditGetGlobalInstanceFromContext hook [extensions/ContactPage] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305407 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:02:54] <wikibugs>	 (CR) Dreamy Jazz: "recheck" [extensions/ContactPage] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305407 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:03:03] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/ContactPage] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305407 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:03:19] <marostegui>	 claime: all fine
[13:03:21] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305408 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:03:28] <claime>	 marostegui: ack, thanks
[13:03:40] <wikibugs>	 ops-codfw, SRE, DC-Ops, observability: Q3:rack/setup/install kafka-logging200[6-8] - https://phabricator.wikimedia.org/T418931#12050131 (elukey) @Jhancock.wm I was able to provision 2006 and 2007 with the new cookbook that I am testing, but I get connection timeouts to kafka-logging2008.s BMC. Is...
[13:03:53] <wikibugs>	 (Merged) jenkins-bot: CheckUserGetUsersPager: Fix TypeError for numeric usernames [extensions/CheckUser] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305378 (https://phabricator.wikimedia.org/T429971) (owner: Kosta Harlan)
[13:04:05] <wikibugs>	 (CR) Elukey: [C:+1] tox: add python 3.14 [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[13:04:29] <kostajh>	 scap is failing 
[13:04:33] <kostajh>	 https://spiderpig.wikimedia.org/jobs/2389
[13:04:46] <kostajh>	 13:04:04 prep failed: <FailedCommand> Command 'git checkout --force -B master origin/master' failed with exit code 128
[13:04:48] <claime>	 kostajh: looking
[13:04:55] <kostajh>	 claime: any ideas about this? 
[13:04:59] <kostajh>	 :) 
[13:05:00] <logmsgbot>	 cwilliams@cumin1003 major-upgrade (PID 2591995) is awaiting input
[13:05:12] <kostajh>	 ty
[13:05:42] <claime>	 kostajh: try again I think we just race-conditioned
[13:05:56] <kostajh>	 ok, trying again
[13:06:22] <logmsgbot>	 !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1305378|CheckUserGetUsersPager: Fix TypeError for numeric usernames (T429971)]]
[13:06:27] <stashbot>	 T429971: TypeError: CheckUserGetUsersPager::formatUserRow(): Argument #1 ($user_text) must be of type string, int given - https://phabricator.wikimedia.org/T429971
[13:06:50] <kostajh>	 seems to be working now, thanks
[13:06:57] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2206.codfw.wmnet with OS trixie
[13:07:11] <wikibugs>	 (CR) Kosta Harlan: [C:+1] Create ConfirmEditGetGlobalInstanceFromContext hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305408 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:07:12] <claime>	 yeah I was testing something in mediawiki-staging and basically you ran scap just as I reset HEAD^
[13:07:31] <claime>	 Sorry about that :D
[13:08:26] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1305378|CheckUserGetUsersPager: Fix TypeError for numeric usernames (T429971)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:11:23] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Continuing with deployment
[13:13:59] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
[13:14:55] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
[13:15:39] <logmsgbot>	 !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305378|CheckUserGetUsersPager: Fix TypeError for numeric usernames (T429971)]] (duration: 09m 17s)
[13:15:44] <stashbot>	 T429971: TypeError: CheckUserGetUsersPager::formatUserRow(): Argument #1 ($user_text) must be of type string, int given - https://phabricator.wikimedia.org/T429971
[13:15:53] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
[13:16:54] <Dreamy_Jazz>	 Who's next
[13:17:19] <anzx>	 i need someone to deploy mine
[13:17:49] <Dreamy_Jazz>	 nemo-yiannis: What about yours?
[13:18:02] <nemo-yiannis>	 i can deploy mine
[13:18:14] <Dreamy_Jazz>	 Do you want to go then, as you are at the top of the list
[13:18:17] <nemo-yiannis>	 ok
[13:18:25] <wikibugs>	 (PS2) MSantos: Disable parser survey for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1302201
[13:18:34] <wikibugs>	 (CR) Jgiannelos: [C:+2] Disable parser survey for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1302201 (owner: MSantos)
[13:18:54] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[13:19:28] <wikibugs>	 (Merged) jenkins-bot: Disable parser survey for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1302201 (owner: MSantos)
[13:19:57] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
[13:20:07] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:21:03] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[13:21:08] <nemo-yiannis>	 ok merged, deploying mine
[13:21:48] <logmsgbot>	 !log jgiannelos@deploy1003 Started scap sync-world: Backport for [[gerrit:1302201|Disable parser survey for all wikis]]
[13:22:15] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2156: rack depool
[13:23:52] <logmsgbot>	 !log jgiannelos@deploy1003 mbsantos, jgiannelos: Backport for [[gerrit:1302201|Disable parser survey for all wikis]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:24:42] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
[13:26:15] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2206.codfw.wmnet with reason: host reimage
[13:26:47] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Platform-SRE, and 5 others: codfw: rack B2 maintenance 2026-07-01 11:00 am CT - https://phabricator.wikimedia.org/T429861#12050252 (ayounsi)
[13:28:25] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2155: repool after rack maintenance
[13:28:46] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
[13:29:04] <wikibugs>	 (PS5) Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112)
[13:29:04] <wikibugs>	 (PS5) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[13:29:54] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[13:30:46] <wikibugs>	 Puppet, Release-Engineering-Team: registry-homepage-builder.py doesn't sort images as expected - https://phabricator.wikimedia.org/T388287#12050271 (hashar) The pages got generated. I went to purge the example page: ` $ mwscript purgeList --wiki=aawiki https://docker-registry.wikimedia.org/releng/node22-...
[13:30:56] <wikibugs>	 Puppet, Release-Engineering-Team: registry-homepage-builder.py doesn't sort images as expected - https://phabricator.wikimedia.org/T388287#12050274 (hashar) Open→Resolved
[13:31:51] <wikibugs>	 SRE: hcaptcha failed to connect to the new URL downloader proxies - https://phabricator.wikimedia.org/T430045 (MoritzMuehlenhoff) NEW
[13:32:10] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[13:32:36] <wikibugs>	 SRE, hCaptcha, Product Safety and Integrity: hcaptcha failed to connect to the new URL downloader proxies - https://phabricator.wikimedia.org/T430045#12050301 (kostajh)
[13:32:50] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2206.codfw.wmnet with reason: host reimage
[13:33:05] <wikibugs>	 (CR) Ayounsi: [C:+2] tox: add python 3.14 [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[13:33:11] <wikibugs>	 (CR) Ayounsi: [C:+2] netbox: add a BGP getter/setter [software/spicerack] - https://gerrit.wikimedia.org/r/1304554 (owner: Ayounsi)
[13:33:36] <logmsgbot>	 !log jgiannelos@deploy1003 mbsantos, jgiannelos: Continuing with deployment
[13:33:38] <wikibugs>	 (CR) Ayounsi: [C:+2] profile::server_depool: Mark ganeti/test as fine to ignore [puppet] - https://gerrit.wikimedia.org/r/1305387 (https://phabricator.wikimedia.org/T327300) (owner: Muehlenhoff)
[13:33:54] <wikibugs>	 (CR) Ayounsi: [C:+2] Add depool policy for VTRS [puppet] - https://gerrit.wikimedia.org/r/1305350 (https://phabricator.wikimedia.org/T327300) (owner: Ayounsi)
[13:36:52] <wikibugs>	 SRE, hCaptcha, Product Safety and Integrity: hcaptcha failed to connect to the new URL downloader proxies - https://phabricator.wikimedia.org/T430045#12050343 (MLechvien-WMF)
[13:37:07] <wikibugs>	 (PS1) Ayounsi: profile::server_depool for memcache and k8s master [puppet] - https://gerrit.wikimedia.org/r/1305422 (https://phabricator.wikimedia.org/T327300)
[13:37:50] <logmsgbot>	 !log jgiannelos@deploy1003 Finished scap sync-world: Backport for [[gerrit:1302201|Disable parser survey for all wikis]] (duration: 16m 01s)
[13:37:52] <wikibugs>	 (CR) Ayounsi: [C:+2] "self merging as noop" [puppet] - https://gerrit.wikimedia.org/r/1305422 (https://phabricator.wikimedia.org/T327300) (owner: Ayounsi)
[13:37:55] <wikibugs>	 (Merged) jenkins-bot: tox: add python 3.14 [software/spicerack] - https://gerrit.wikimedia.org/r/1305345 (owner: Ayounsi)
[13:37:56] <wikibugs>	 (Merged) jenkins-bot: netbox: add a BGP getter/setter [software/spicerack] - https://gerrit.wikimedia.org/r/1304554 (owner: Ayounsi)
[13:38:05] <nemo-yiannis>	 okay done
[13:40:02] <wikibugs>	 (PS21) Ayounsi: diffscan: pynotnify [puppet] - https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: Jbond)
[13:40:13] <wikibugs>	 (CR) Ayounsi: [C:+2] diffscan: pynotnify [puppet] - https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: Jbond)
[13:41:14] <wikibugs>	 (CR) Ayounsi: [C:+2] diffscan: pynotnify (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634572 (https://phabricator.wikimedia.org/T415347) (owner: Jbond)
[13:41:16] <Dreamy_Jazz>	 Okay, up next
[13:41:46] <Dreamy_Jazz>	 I'll take a look at the config patches
[13:42:10] <wikibugs>	 (PS6) Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112)
[13:42:10] <wikibugs>	 (PS6) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[13:45:12] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Migrate diffscan VM to Trixie - https://phabricator.wikimedia.org/T415347#12050399 (ayounsi) Updated script merged, old instance powered down, the v4 public IP needs to be moved but that's not a blocker. Then I'll monitor for a few days.
[13:45:31] <wikibugs>	 (PS4) Dreamy Jazz: hCaptcha: Enable for Special:Contact [mediawiki-config] - https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848)
[13:45:46] <wikibugs>	 SRE, SRE-swift-storage, Essential-Work, Patch-For-Review: Migrate production swift clusters to trixie - https://phabricator.wikimedia.org/T429630#12050411 (MatthewVernon)
[13:45:53] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1238.eqiad.wmnet with OS trixie
[13:46:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304001 (https://phabricator.wikimedia.org/T427917) (owner: Valn_ilyo)
[13:46:27] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304695 (https://phabricator.wikimedia.org/T429126) (owner: Anzx)
[13:46:27] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ContactPage] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305407 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:46:28] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:46:28] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305408 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:46:52] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3074.esams.wmnet with OS trixie
[13:47:31] <wikibugs>	 (Merged) jenkins-bot: Fix autonym for Khasi (kha) in wmgExtraLanguageNames [mediawiki-config] - https://gerrit.wikimedia.org/r/1304001 (https://phabricator.wikimedia.org/T427917) (owner: Valn_ilyo)
[13:47:35] <wikibugs>	 (Merged) jenkins-bot: csbwiki: update logo, wordmark and tagline [mediawiki-config] - https://gerrit.wikimedia.org/r/1304695 (https://phabricator.wikimedia.org/T429126) (owner: Anzx)
[13:47:38] <wikibugs>	 (Merged) jenkins-bot: hCaptcha: Enable for Special:Contact [mediawiki-config] - https://gerrit.wikimedia.org/r/1304919 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:48:28] <wikibugs>	 (PS5) Jelto: profile::base::reboot_unattended: add class to mark hosts for unattended reboots [puppet] - https://gerrit.wikimedia.org/r/1251406
[13:49:08] <wikibugs>	 (PS7) Btullis: presto: Test resource groups and spill features on the test cluster [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112)
[13:49:08] <wikibugs>	 (PS7) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[13:49:41] <wikibugs>	 (Merged) jenkins-bot: Create ConfirmEditGetGlobalInstanceFromContext hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305408 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:49:43] <wikibugs>	 (Merged) jenkins-bot: Handle the ConfirmEditGetGlobalInstanceFromContext hook [extensions/ContactPage] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305407 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:50:17] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1304001|Fix autonym for Khasi (kha) in wmgExtraLanguageNames (T427917)]], [[gerrit:1304695|csbwiki: update logo, wordmark and tagline (T429126)]], [[gerrit:1305407|Handle the ConfirmEditGetGlobalInstanceFromContext hook (T429848)]], [[gerrit:1304919|hCaptcha: Enable for Special:Contact (T429848)]], [[gerrit:1305408|Create ConfirmEditGetGlobalInstanceFromContext hook (T429848)]]
[13:50:22] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2206.codfw.wmnet with OS trixie
[13:50:25] <stashbot>	 T427917: Add monolingual language code kha (khasi language) - https://phabricator.wikimedia.org/T427917
[13:50:26] <stashbot>	 T429126: Change name of Kashubian Wikipedia from Wikipedijô to Wikipediô - https://phabricator.wikimedia.org/T429126
[13:50:26] <stashbot>	 T429848: hCaptcha: Use hCaptcha for contact pages on metawiki - https://phabricator.wikimedia.org/T429848
[13:50:26] <wikibugs>	 ops-codfw, SRE, DC-Ops, observability: Q3:rack/setup/install kafka-logging200[6-8] - https://phabricator.wikimedia.org/T418931#12050429 (Jhancock.wm) @elukey i checked the cabling. reseated everything and rebooted the server. it still looked fine. I double checked the luggage tag and i had the wr...
[13:51:15] <wikibugs>	 (CR) Btullis: presto: Test resource groups and spill features on the test cluster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[13:51:24] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3066.esams.wmnet with OS trixie
[13:51:47] <wikibugs>	 SRE, Citoid: citoid failed to connect to the new URL downloader proxies - https://phabricator.wikimedia.org/T430053 (MoritzMuehlenhoff) NEW
[13:52:24] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz, valn-ilyo, anzx: Backport for [[gerrit:1304001|Fix autonym for Khasi (kha) in wmgExtraLanguageNames (T427917)]], [[gerrit:1304695|csbwiki: update logo, wordmark and tagline (T429126)]], [[gerrit:1305407|Handle the ConfirmEditGetGlobalInstanceFromContext hook (T429848)]], [[gerrit:1304919|hCaptcha: Enable for Special:Contact (T429848)]], [[gerrit:1305408|Create ConfirmEditGetGlobalInstanceFromContext hook (T429848)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:52:39] <anzx>	 checking
[13:52:42] <Dreamy_Jazz>	 Thanks
[13:53:29] <wikibugs>	 (CR) Jelto: [V:+1] "PCC SUCCESS (CORE_DIFF 5 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[13:53:39] <wikibugs>	 (CR) Jelto: [V:+1] profile::base::reboot_unattended: add class to mark hosts for unattended reboots (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[13:53:43] <anzx>	 Dreamy_Jazz: looks good, ok to sync 
[13:54:41] <Dreamy_Jazz>	 Thanks, I'm still testing mine
[13:55:10] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz, valn-ilyo, anzx: Continuing with deployment
[13:55:30] <wikibugs>	 (PS1) Dreamy Jazz: Handle the ConfirmEditGetGlobalInstanceFromContext hook [extensions/ContactPage] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305430 (https://phabricator.wikimedia.org/T429848)
[13:55:43] <wikibugs>	 (PS1) Dreamy Jazz: Create ConfirmEditGetGlobalInstanceFromContext hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305431 (https://phabricator.wikimedia.org/T429848)
[13:55:50] <wikibugs>	 (CR) Muehlenhoff: profile::base::reboot_unattended: add class to mark hosts for unattended reboots (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[13:55:50] <wikibugs>	 (CR) CI reject: [V:-1] Handle the ConfirmEditGetGlobalInstanceFromContext hook [extensions/ContactPage] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305430 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:56:21] <wikibugs>	 (CR) Dreamy Jazz: "recheck" [extensions/ContactPage] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305430 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[13:56:28] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[13:56:36] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[13:57:04] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[13:59:30] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1304001|Fix autonym for Khasi (kha) in wmgExtraLanguageNames (T427917)]], [[gerrit:1304695|csbwiki: update logo, wordmark and tagline (T429126)]], [[gerrit:1305407|Handle the ConfirmEditGetGlobalInstanceFromContext hook (T429848)]], [[gerrit:1304919|hCaptcha: Enable for Special:Contact (T429848)]], [[gerrit:1305408|Create ConfirmEditGetGlobalInstanceFromContext hook (T429848)]] (duration: 09m 13s)
[13:59:38] <stashbot>	 T427917: Add monolingual language code kha (khasi language) - https://phabricator.wikimedia.org/T427917
[13:59:39] <stashbot>	 T429126: Change name of Kashubian Wikipedia from Wikipedijô to Wikipediô - https://phabricator.wikimedia.org/T429126
[13:59:39] <stashbot>	 T429848: hCaptcha: Use hCaptcha for contact pages on metawiki - https://phabricator.wikimedia.org/T429848
[14:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1400)
[14:00:37] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ContactPage] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305430 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[14:00:38] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305431 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[14:01:36] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1238: Migration of db1238.eqiad.wmnet completed
[14:01:45] <James_F>	 Dreamy_Jazz: Don't worry, we're not using the MW side of the deployment window.
[14:02:27] <Dreamy_Jazz>	 Thanks, needed to backport to wmf.7 but only realised during the test stage :D
[14:02:42] <wikibugs>	 (CR) Jforrester: [C:+2] wikifunctions: Upgrade evaluators from 2026-06-18-181627 to 2026-06-23-135458 [deployment-charts] - https://gerrit.wikimedia.org/r/1305394 (https://phabricator.wikimedia.org/T416144) (owner: Jforrester)
[14:02:50] <James_F>	 Always the way!
[14:03:02] <Dreamy_Jazz>	 Maybe we need the backport windows to be 24 hours long, then there is never a worry about going over :D
[14:03:22] <wikibugs>	 (PS3) Btullis: Remove the job that synced the phab dumps to the clouddumps servers [puppet] - https://gerrit.wikimedia.org/r/1245419 (https://phabricator.wikimedia.org/T417824)
[14:04:12] <anzx>	 Dreamy_Jazz: Thanks for deploying 
[14:04:16] <Dreamy_Jazz>	 Np
[14:04:35] <wikibugs>	 (PS13) Btullis: dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: Trueg)
[14:04:58] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Upgrade evaluators from 2026-06-18-181627 to 2026-06-23-135458 [deployment-charts] - https://gerrit.wikimedia.org/r/1305394 (https://phabricator.wikimedia.org/T416144) (owner: Jforrester)
[14:05:01] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp3066.esams.wmnet
[14:05:01] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3066.esams.wmnet
[14:05:08] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp3074.esams.wmnet
[14:05:09] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3074.esams.wmnet
[14:05:28] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2206: Migration of db2206.codfw.wmnet completed
[14:06:35] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:07:30] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:08:41] <wikibugs>	 (Merged) jenkins-bot: Create ConfirmEditGetGlobalInstanceFromContext hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305431 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[14:08:43] <wikibugs>	 (Merged) jenkins-bot: Handle the ConfirmEditGetGlobalInstanceFromContext hook [extensions/ContactPage] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305430 (https://phabricator.wikimedia.org/T429848) (owner: Dreamy Jazz)
[14:08:59] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:09:13] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305430|Handle the ConfirmEditGetGlobalInstanceFromContext hook (T429848)]], [[gerrit:1305431|Create ConfirmEditGetGlobalInstanceFromContext hook (T429848)]]
[14:09:18] <stashbot>	 T429848: hCaptcha: Use hCaptcha for contact pages on metawiki - https://phabricator.wikimedia.org/T429848
[14:09:35] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:09:42] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:10:06] <wikibugs>	 (PS1) Kamila Součková: shellbox: pick up new images [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404)
[14:10:10] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:10:15] <wikibugs>	 (CR) CI reject: [V:-1] shellbox: pick up new images [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[14:11:10] <wikibugs>	 (CR) Cory Massaro: [C:+2] wikifunctions: Upgrade orchestrator from 2026-06-17-182805 to 2026-06-23-115555 [deployment-charts] - https://gerrit.wikimedia.org/r/1305395 (https://phabricator.wikimedia.org/T416144) (owner: Jforrester)
[14:11:18] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1305430|Handle the ConfirmEditGetGlobalInstanceFromContext hook (T429848)]], [[gerrit:1305431|Create ConfirmEditGetGlobalInstanceFromContext hook (T429848)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:12:08] <wikibugs>	 (PS2) Kamila Součková: shellbox: pick up new images [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404)
[14:12:47] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[14:13:23] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Upgrade orchestrator from 2026-06-17-182805 to 2026-06-23-115555 [deployment-charts] - https://gerrit.wikimedia.org/r/1305395 (https://phabricator.wikimedia.org/T416144) (owner: Jforrester)
[14:14:01] <wikibugs>	 (Abandoned) Bernard Wang: Restore menu tab underline style [skins/Vector] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305191 (https://phabricator.wikimedia.org/T428519) (owner: Jdlrobson)
[14:14:23] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:14:28] <icinga-wm>	 PROBLEM - Host db1208 is DOWN: PING CRITICAL - Packet loss = 100%
[14:14:41] <wikibugs>	 (Restored) Bernard Wang: Restore menu tab underline style [skins/Vector] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305191 (https://phabricator.wikimedia.org/T428519) (owner: Jdlrobson)
[14:15:00] <wikibugs>	 (CR) Bernard Wang: [C:+1] "sorry! didn’t realize this was a backport patch" [skins/Vector] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305191 (https://phabricator.wikimedia.org/T428519) (owner: Jdlrobson)
[14:15:04] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:15:32] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:15:56] <icinga-wm>	 RECOVERY - Host db1208 is UP: PING OK - Packet loss = 0%, RTA = 7.76 ms
[14:16:00] <icinga-wm>	 PROBLEM - MariaDB read only matomo on db1208 is CRITICAL: Could not connect to localhost:3351 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[14:16:10] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:16:10] <icinga-wm>	 RECOVERY - MariaDB read only analytics_meta on db1208 is OK: Version 10.6.18-MariaDB-log, Uptime 21s, read_only: True, event_scheduler: True, 2582.64 QPS, connection latency: 0.041477s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[14:16:17] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:16:34] <icinga-wm>	 RECOVERY - MariaDB disk space on db1208 is OK: DISK OK https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[14:16:34] <icinga-wm>	 RECOVERY - MariaDB Replica IO: analytics_meta on db1208 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:16:34] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: matomo on db1208 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:16:34] <icinga-wm>	 PROBLEM - mysqld processes on db1208 is CRITICAL: PROCS CRITICAL: 1 process with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[14:16:34] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: matomo on db1208 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:16:35] <icinga-wm>	 PROBLEM - MariaDB Replica IO: matomo on db1208 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:16:36] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: analytics_meta on db1208 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:16:36] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: analytics_meta on db1208 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 42341.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:16:59] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:17:01] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305430|Handle the ConfirmEditGetGlobalInstanceFromContext hook (T429848)]], [[gerrit:1305431|Create ConfirmEditGetGlobalInstanceFromContext hook (T429848)]] (duration: 07m 48s)
[14:17:06] <stashbot>	 T429848: hCaptcha: Use hCaptcha for contact pages on metawiki - https://phabricator.wikimedia.org/T429848
[14:17:23] <wikibugs>	 (PS1) Slyngshede: P:cache::haproxy image provenance hashing [puppet] - https://gerrit.wikimedia.org/r/1305433 (https://phabricator.wikimedia.org/T414338)
[14:17:30] <Dreamy_Jazz>	 !log Afternoon UTC backport window done
[14:17:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:34] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: analytics_meta on db1208 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:19:25] <wikibugs>	 (CR) Cory Massaro: [C:+2] wikifunctions: Double memory for evaluators from 1G to 2G [deployment-charts] - https://gerrit.wikimedia.org/r/1305396 (owner: Jforrester)
[14:19:42] <wikibugs>	 (PS2) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[14:20:16] <wikibugs>	 (PS2) Klausman: hiera: Switch ml-staging k8s to Maglev LVS config [puppet] - https://gerrit.wikimedia.org/r/1305397 (https://phabricator.wikimedia.org/T420438)
[14:21:42] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Double memory for evaluators from 1G to 2G [deployment-charts] - https://gerrit.wikimedia.org/r/1305396 (owner: Jforrester)
[14:22:34] <wikibugs>	 (PS6) Jelto: profile::base: add parameter to mark hosts for unattended reboots [puppet] - https://gerrit.wikimedia.org/r/1251406
[14:23:03] <wikibugs>	 SRE, Infrastructure-Foundations: Adding Jesse to approvers for Bitu - https://phabricator.wikimedia.org/T430059 (MoritzMuehlenhoff) NEW
[14:23:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.86% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[14:24:45] <wikibugs>	 (PS1) Eric Gardner: Restore the per-reader opt-out for the mobile image carousel [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305436 (https://phabricator.wikimedia.org/T419786)
[14:24:55] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:24:56] <wikibugs>	 (PS1) Eric Gardner: Restore the per-reader opt-out for the mobile image carousel [extensions/MultimediaViewer] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305437 (https://phabricator.wikimedia.org/T419786)
[14:24:59] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:25:22] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Inbound errors on interface cr1-eqiad:ae2 (asw2-b-eqiad:ae1) - https://phabricator.wikimedia.org/T429116#12050708 (Jclark-ctr) Open→Resolved graph looks to have cleaned up since replacement of cable and optic.  Closing ticket for now  {F90294556}
[14:25:23] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:25:27] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:25:37] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:25:42] <matthiasmullie>	 James_F: we have another urgent thing, once again coinciding with wikifunctions services deployment slot; would it be fine to do another emergency scap once again?
[14:25:52] <fabfur>	 !log repooling cp3066 and cp3074 after reimage (T419825)
[14:25:53] <James_F>	 matthiasmullie: go for it.
[14:25:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:57] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[14:26:00] <EricGardner>	 I came here to ask the same thing
[14:26:00] <James_F>	 We're just doing a last services push.
[14:26:04] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp3066.*
[14:26:07] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp3066.*
[14:26:07] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:26:12] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp3074.*
[14:26:12] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2003 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:26:14] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:26:21] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp3066.esams.wmnet
[14:26:22] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3066.esams.wmnet
[14:26:28] <logmsgbot>	 !log fabfur@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp3074.esams.wmnet
[14:26:29] <logmsgbot>	 !log fabfur@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3074.esams.wmnet
[14:26:49] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:27:01] <matthiasmullie>	 James_F: thanks!
[14:27:42] <wikibugs>	 (CR) Jelto: [V:+1] "PCC SUCCESS (CORE_DIFF 8): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8771/co" [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[14:28:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.86% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[14:29:37] <wikibugs>	 (CR) Jforrester: "Yay." [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[14:30:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1400)
[14:30:05] <jouncebot>	 Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1430)
[14:31:09] <wikibugs>	 (CR) Jelto: [V:+1] profile::base: add parameter to mark hosts for unattended reboots (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[14:31:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 19.18% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[14:32:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mfossati@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305436 (https://phabricator.wikimedia.org/T419786) (owner: Eric Gardner)
[14:32:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mfossati@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305437 (https://phabricator.wikimedia.org/T419786) (owner: Eric Gardner)
[14:34:01] <wikibugs>	 (Merged) jenkins-bot: Restore the per-reader opt-out for the mobile image carousel [extensions/MultimediaViewer] (wmf/1.47.0-wmf.7) - https://gerrit.wikimedia.org/r/1305436 (https://phabricator.wikimedia.org/T419786) (owner: Eric Gardner)
[14:34:02] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[14:34:03] <wikibugs>	 (Merged) jenkins-bot: Restore the per-reader opt-out for the mobile image carousel [extensions/MultimediaViewer] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305437 (https://phabricator.wikimedia.org/T419786) (owner: Eric Gardner)
[14:34:31] <logmsgbot>	 !log mfossati@deploy1003 Started scap sync-world: Backport for [[gerrit:1305436|Restore the per-reader opt-out for the mobile image carousel (T419786)]], [[gerrit:1305437|Restore the per-reader opt-out for the mobile image carousel (T419786)]]
[14:34:37] <stashbot>	 T419786: Image Browsing: Opt-out mechanisms - https://phabricator.wikimedia.org/T419786
[14:41:53] <wikibugs>	 (CR) Muehlenhoff: "Looks good, one additional comment inline. And let's doublecheck with a PCC run against" [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[14:46:10] <moritzm>	 !log installing postgresql security updates
[14:46:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:07] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1238: Migration of db1238.eqiad.wmnet completed
[14:47:08] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[14:50:23] <wikibugs>	 (CR) Scott French: [C:+1] shellbox: pick up new images [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[14:50:59] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2206: Migration of db2206.codfw.wmnet completed
[14:51:01] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[14:51:18] <wikibugs>	 (CR) Btullis: [C:+2] Remove the job that synced the phab dumps to the clouddumps servers [puppet] - https://gerrit.wikimedia.org/r/1245419 (https://phabricator.wikimedia.org/T417824) (owner: Btullis)
[14:53:39] <logmsgbot>	 !log mfossati@deploy1003 egardner, mfossati: Backport for [[gerrit:1305436|Restore the per-reader opt-out for the mobile image carousel (T419786)]], [[gerrit:1305437|Restore the per-reader opt-out for the mobile image carousel (T419786)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:53:43] <stashbot>	 T419786: Image Browsing: Opt-out mechanisms - https://phabricator.wikimedia.org/T419786
[14:53:58] <wikibugs>	 (CR) Aleksandar Mastilovic: [V:+1 C:+1] "LGTM! Thank you." [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[14:54:09] <wikibugs>	 (CR) Aleksandar Mastilovic: [V:+1 C:+1] "LGTM! Thank you." [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[14:54:45] <wikibugs>	 (PS14) Btullis: dse-k8s-services: Enable ingress on WDQS namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1302784 (https://phabricator.wikimedia.org/T429313) (owner: Trueg)
[14:54:50] <wikibugs>	 (CR) Atsuko: [C:+1] "+1!" [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[14:54:54] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#12050803 (MoritzMuehlenhoff)
[14:55:16] <moritzm>	 !og installing libarchive security updates
[14:57:19] <logmsgbot>	 !log mfossati@deploy1003 egardner, mfossati: Continuing with deployment
[14:58:29] <wikibugs>	 (CR) Btullis: [C:+2] presto: Test resource groups and spill features on the test cluster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1305108 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[15:01:17] <wikibugs>	 SRE, Infrastructure-Foundations: Adding Jesse to approvers for Bitu - https://phabricator.wikimedia.org/T430059#12050822 (LSobanski) Approved.
[15:02:04] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q3:rack/setup/install cloudcephosd105[3456] - https://phabricator.wikimedia.org/T419892#12050824 (Jclark-ctr) a:Andrew Is there a ticket tracking the rebalancing of CloudVirts for power to make room? Has any decision been made regarding...
[15:02:35] <wikibugs>	 (CR) Scott French: [C:+1] "Looks good!" [puppet] - https://gerrit.wikimedia.org/r/1305358 (https://phabricator.wikimedia.org/T427668) (owner: Blake)
[15:03:01] <wikibugs>	 (CR) Kamila Součková: [C:+2] shellbox: pick up new images [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[15:03:52] <claime>	 jouncebot: nowandnext
[15:03:52] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 56 minute(s)
[15:03:52] <jouncebot>	 In 1 hour(s) and 56 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1700)
[15:05:37] <wikibugs>	 (Merged) jenkins-bot: shellbox: pick up new images [deployment-charts] - https://gerrit.wikimedia.org/r/1305432 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[15:07:49] <wikibugs>	 (PS1) Clément Goubert: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/1305442 (https://phabricator.wikimedia.org/T429372)
[15:08:07] <wikibugs>	 (PS1) Muehlenhoff: Add Jesse to Bitu approvers [puppet] - https://gerrit.wikimedia.org/r/1305443 (https://phabricator.wikimedia.org/T430059)
[15:08:23] <wikibugs>	 (PS1) Jdlrobson: Replace Tools button with vertical ellipsis [skins/Vector] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305444 (https://phabricator.wikimedia.org/T429258)
[15:09:19] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply
[15:09:57] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply
[15:10:03] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
[15:10:23] <logmsgbot>	 !log mfossati@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305436|Restore the per-reader opt-out for the mobile image carousel (T419786)]], [[gerrit:1305437|Restore the per-reader opt-out for the mobile image carousel (T419786)]] (duration: 35m 52s)
[15:10:26] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
[15:10:28] <stashbot>	 T419786: Image Browsing: Opt-out mechanisms - https://phabricator.wikimedia.org/T419786
[15:10:32] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[15:10:57] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[15:11:03] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
[15:11:22] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[15:11:28] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
[15:11:57] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
[15:12:03] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-video: apply
[15:12:30] <wikibugs>	 (CR) Zabe: [C:+1] Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/1305442 (https://phabricator.wikimedia.org/T429372) (owner: Clément Goubert)
[15:12:49] <wikibugs>	 (PS3) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[15:12:51] <wikibugs>	 (CR) Jforrester: [C:+1] Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/1305442 (https://phabricator.wikimedia.org/T429372) (owner: Clément Goubert)
[15:14:07] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
[15:14:41] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cgoubert@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1305442 (https://phabricator.wikimedia.org/T429372) (owner: Clément Goubert)
[15:14:57] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[15:15:15] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply
[15:15:37] <wikibugs>	 (Merged) jenkins-bot: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/1305442 (https://phabricator.wikimedia.org/T429372) (owner: Clément Goubert)
[15:16:06] <logmsgbot>	 !log cgoubert@deploy1003 Started scap sync-world: Backport for [[gerrit:1305442|Update interwiki map (T429372 T418494)]]
[15:16:13] <stashbot>	 T429372: Remove API Portal from WMF MediaWiki config - https://phabricator.wikimedia.org/T429372
[15:16:13] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[15:20:36] <logmsgbot>	 !log cgoubert@deploy1003 cgoubert: Backport for [[gerrit:1305442|Update interwiki map (T429372 T418494)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:21:12] <logmsgbot>	 !log cgoubert@deploy1003 cgoubert: Continuing with deployment
[15:23:23] <wikibugs>	 (PS2) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1305289 (https://phabricator.wikimedia.org/T428984)
[15:25:02] <wikibugs>	 (PS2) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1305288 (https://phabricator.wikimedia.org/T428984)
[15:25:49] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[15:25:55] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[15:26:12] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2003 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[15:26:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.42% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:26:55] <wikibugs>	 (PS4) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[15:27:51] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[15:28:46] <wikibugs>	 (PS1) Brouberol: global_config: register phabricator in the external-services [puppet] - https://gerrit.wikimedia.org/r/1305449 (https://phabricator.wikimedia.org/T430024)
[15:29:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.08% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:29:50] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[15:29:56] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[15:30:15] <logmsgbot>	 !log cgoubert@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305442|Update interwiki map (T429372 T418494)]] (duration: 14m 09s)
[15:30:21] <stashbot>	 T429372: Remove API Portal from WMF MediaWiki config - https://phabricator.wikimedia.org/T429372
[15:30:22] <stashbot>	 T418494: Delete the API Portal wiki - https://phabricator.wikimedia.org/T418494
[15:30:35] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[15:30:41] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[15:31:50] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[15:31:56] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[15:32:30] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[15:32:36] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[15:33:36] <claime>	 Dreamy_Jazz: Any way the backports fro ConfirmEditGetGlobalInstanceFromContext you did earlier would vause worker usage to juump 25%?
[15:33:42] <claime>	 s/vause/cause/
[15:33:47] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[15:33:55] <claime>	 Dreamy_Jazz: https://grafana.wikimedia.org/goto/efq4525qr55vke?orgId=1
[15:35:48] <wikibugs>	 (CR) Bking: [C:+1] global_config: register phabricator in the external-services [puppet] - https://gerrit.wikimedia.org/r/1305449 (https://phabricator.wikimedia.org/T430024) (owner: Brouberol)
[15:36:10] <Dreamy_Jazz>	 I wouldn't have thought that could cause the jump
[15:36:28] <Dreamy_Jazz>	 The changes should only apply to Special:Contact on meta.wikimedia.org
[15:36:41] <claime>	 Dreamy_Jazz: ok looking for another cause then
[15:36:47] <wikibugs>	 (PS1) Kamila Součková: admin: increase shellbox CPU limit quota [deployment-charts] - https://gerrit.wikimedia.org/r/1305457 (https://phabricator.wikimedia.org/T385404)
[15:36:57] <wikibugs>	 (CR) AikoChou: ml-services: Deploy artest version of ticle-country model on staging. (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305382 (https://phabricator.wikimedia.org/T429675) (owner: Gkyziridis)
[15:37:20] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[15:37:29] <wikibugs>	 (CR) Santiago Faci: [C:+2] Remove saved groups config [deployment-charts] - https://gerrit.wikimedia.org/r/1305287 (https://phabricator.wikimedia.org/T429959) (owner: Clare Ming)
[15:38:20] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[15:38:20] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893
[15:38:40] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1241: Upgrading db1241.eqiad.wmnet
[15:38:40] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[15:38:40] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@codfw T429893
[15:39:00] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2210: Upgrading db2210.codfw.wmnet
[15:39:10] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1241: Upgrading db1241.eqiad.wmnet
[15:39:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.24% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:39:28] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
[15:39:32] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2210: Upgrading db2210.codfw.wmnet
[15:39:45] <jinxer-wm>	 FIRING: [2x] PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext releases routed via main at eqiad: 0% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:39:53] <wikibugs>	 (Merged) jenkins-bot: Remove saved groups config [deployment-charts] - https://gerrit.wikimedia.org/r/1305287 (https://phabricator.wikimedia.org/T429959) (owner: Clare Ming)
[15:40:16] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-internal-scholarly_443: Servers wdqs1027.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:40:28] <stashbot>	 T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893
[15:41:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext releases routed via main at eqiad: 0% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:41:22] <claime>	 Yeah on it
[15:41:26] <claime>	 !incidents
[15:41:26] <sirenbot>	 8096 (UNACKED)  PHPFPMTooBusy sre (mw-api-ext main eqiad)
[15:41:29] <claime>	 !ack 8096
[15:41:29] <sirenbot>	 8096 (ACKED)  PHPFPMTooBusy sre (mw-api-ext main eqiad)
[15:42:15] <jinxer-wm>	 FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:42:16] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-internal-scholarly_443: Servers wdqs1027.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:43:16] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:43:23] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[15:43:41] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS trixie
[15:44:04] * Raine is slightly suspicious of DB upgrades
[15:44:13] <logmsgbot>	 cwilliams@cumin1003 major-upgrade (PID 2620532) is awaiting input
[15:44:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-api-ext releases routed via main (k8s) 1.543s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:45:16] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:45:24] * Raine is also slightly suspicious of dumps that needed to be restarted because bookworm made them unhappy
[15:45:35] <wikibugs>	 (PS1) Brouberol: phabricator: enable egress from the dse kubepods networks [puppet] - https://gerrit.wikimedia.org/r/1305460 (https://phabricator.wikimedia.org/T430024)
[15:46:43] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
[15:46:49] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[15:47:15] <jinxer-wm>	 FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:47:31] * Raine actually asked and dumps shouldn't be using mw-api-ext
[15:48:16] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-internal-scholarly_443: Servers wdqs1027.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:49:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-api-ext releases routed via main (k8s) 2.09s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:49:16] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:49:24] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[15:49:30] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext releases routed via main at eqiad: 5.926% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:50:53] <wikibugs>	 (CR) Santiago Faci: Test Kitchen UI: Deploy v1.4.5 release to staging (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305288 (https://phabricator.wikimedia.org/T428984) (owner: Clare Ming)
[15:50:58] <wikibugs>	 (CR) Santiago Faci: Test Kitchen UI: Deploy v1.4.5 release to production (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305289 (https://phabricator.wikimedia.org/T428984) (owner: Clare Ming)
[15:51:12] <cezmunsta>	 Raine: I am not seeing much in the way of errors relating to s4 - still concerned?
[15:51:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext releases routed via main at eqiad: 0% idle #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:51:34] <Raine>	 cezmunsta: no, resolved we think, thank you!
[15:52:06] * cezmunsta sighs with relief
[15:52:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[15:52:16] <wikibugs>	 (CR) Gkyziridis: [C:+2] ml-services: Deploy artest version of ticle-country model on staging. (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305382 (https://phabricator.wikimedia.org/T429675) (owner: Gkyziridis)
[15:52:16] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[15:52:23] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
[15:52:42] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q3:rack/setup/install cloudcephosd105[3456] - https://phabricator.wikimedia.org/T419892#12051057 (fgiunchedi) Yes the cloudvirt rebalancing task is {T424658} and the last table in the description lists the host moves to get to balance. I'll...
[15:52:50] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[15:52:56] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[15:52:56] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2210.codfw.wmnet with OS trixie
[15:53:02] <wikibugs>	 (PS8) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[15:53:02] <wikibugs>	 (PS1) Btullis: presto: update the properties for spilling [puppet] - https://gerrit.wikimedia.org/r/1305462 (https://phabricator.wikimedia.org/T424112)
[15:53:16] <wikibugs>	 (PS2) Btullis: presto: update the properties for spilling [puppet] - https://gerrit.wikimedia.org/r/1305462 (https://phabricator.wikimedia.org/T424112)
[15:53:23] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[15:53:25] <wikibugs>	 (PS9) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[15:53:30] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
[15:54:31] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
[15:56:31] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[15:57:48] <wikibugs>	 ops-codfw, SRE, DC-Ops, observability: Q3:rack/setup/install kafka-logging200[6-8] - https://phabricator.wikimedia.org/T418931#12051106 (elukey) All three provisioned, I'll reimage them tomorrow. Note that they need a new version of reimage as well.
[15:58:49] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
[16:01:56] <wikibugs>	 (PS1) Btullis: Temporarily remove dse-k8s-worker101[567] from service [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773)
[16:02:29] <wikibugs>	 (CR) CI reject: [V:-1] Temporarily remove dse-k8s-worker101[567] from service [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773) (owner: Btullis)
[16:04:32] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
[16:05:00] <wikibugs>	 SRE, SRE-swift-storage, Commons, media-backups, MediaWiki-File-management: Uncompressed TIFFs on commons - https://phabricator.wikimedia.org/T427949#12051135 (GPSLeo) >>! In T427949#12039056, @jcrespo wrote: > My suggestion was to build a dedicated, but separate, repository for professional g...
[16:06:27] <wikibugs>	 (PS2) Btullis: Temporarily remove dse-k8s-worker101[567] from service [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773)
[16:06:57] <wikibugs>	 (PS7) Jelto: profile::base::reboot_unattended: add class to mark hosts for unattended reboots [puppet] - https://gerrit.wikimedia.org/r/1251406
[16:08:41] <wikibugs>	 (PS8) Jelto: profile::base::reboot_unattended: add class to mark hosts for unattended reboots [puppet] - https://gerrit.wikimedia.org/r/1251406
[16:09:41] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:10:32] <wikibugs>	 (CR) Btullis: [C:+2] presto: update the properties for spilling [puppet] - https://gerrit.wikimedia.org/r/1305462 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[16:10:59] <wikibugs>	 (CR) Jelto: profile::base::reboot_unattended: add class to mark hosts for unattended reboots (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[16:12:09] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2210.codfw.wmnet with reason: host reimage
[16:12:36] <icinga-wm>	 RECOVERY - mysqld processes on db1208 is OK: PROCS OK: 2 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[16:13:02] <icinga-wm>	 RECOVERY - MariaDB read only matomo on db1208 is OK: Version 10.6.18-MariaDB-log, Uptime 27s, read_only: True, event_scheduler: True, 11.23 QPS, connection latency: 0.041757s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:13:36] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: matomo on db1208 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[16:14:36] <icinga-wm>	 RECOVERY - MariaDB Replica IO: matomo on db1208 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[16:14:36] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: matomo on db1208 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 18724.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[16:14:41] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:14:46] <wikibugs>	 (CR) Jelto: [V:+1] "PCC SUCCESS (CORE_DIFF 5 NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[16:15:37] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: matomo on db1208 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[16:17:16] <jinxer-wm>	 FIRING: [2x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1023:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[16:18:53] <wikibugs>	 (PS9) Jelto: profile::base::reboot_unattended: add class to mark hosts for unattended reboots [puppet] - https://gerrit.wikimedia.org/r/1251406
[16:19:41] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:19:46] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2210.codfw.wmnet with reason: host reimage
[16:21:50] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1241.eqiad.wmnet with OS trixie
[16:23:01] <wikibugs>	 (CR) Jelto: [V:+1] "PCC SUCCESS (CORE_DIFF 6 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[16:23:19] <wikibugs>	 (PS10) Jelto: profile::base::reboot_unattended: add class to mark hosts for unattended reboots [puppet] - https://gerrit.wikimedia.org/r/1251406
[16:26:29] <wikibugs>	 (CR) Jelto: [V:+1] "PCC SUCCESS (NOOP 2 CORE_DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1251406 (owner: Jelto)
[16:30:43] <wikibugs>	 (CR) Scott French: [C:+1] admin: increase shellbox CPU limit quota (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305457 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[16:33:33] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1241: Migration of db1241.eqiad.wmnet completed
[16:37:10] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:37:14] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2210.codfw.wmnet with OS trixie
[16:37:21] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:41:36] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:41:45] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:45:15] <wikibugs>	 (PS3) Btullis: Temporarily remove dse-k8s-worker101[567] from service [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773)
[16:51:02] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2210: Migration of db2210.codfw.wmnet completed
[16:56:08] <wikibugs>	 (PS4) Btullis: Temporarily remove dse-k8s-worker101[567] from service [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773)
[17:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1700)
[17:00:21] <wikibugs>	 (PS3) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1305288 (https://phabricator.wikimedia.org/T428984)
[17:01:39] <wikibugs>	 (PS3) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1305289 (https://phabricator.wikimedia.org/T428984)
[17:01:54] <wikibugs>	 (CR) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to staging (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305288 (https://phabricator.wikimedia.org/T428984) (owner: Clare Ming)
[17:02:04] <wikibugs>	 (CR) Clare Ming: Test Kitchen UI: Deploy v1.4.5 release to production (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1305289 (https://phabricator.wikimedia.org/T428984) (owner: Clare Ming)
[17:02:49] <wikibugs>	 (PS2) Pushpaktiwari: T429269: Send logged-in experiment events to ins-502b [mediawiki-config] - https://gerrit.wikimedia.org/r/1303490
[17:05:53] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[17:06:10] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[17:06:13] <wikibugs>	 (CR) Pushpaktiwari: T429269: Send logged-in experiment events to ins-502b (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1303490 (owner: Pushpaktiwari)
[17:15:05] <wikibugs>	 (CR) Bking: [C:+1] Temporarily remove dse-k8s-worker101[567] from service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773) (owner: Btullis)
[17:18:54] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[17:19:00] <wikibugs>	 (PS5) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[17:19:04] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1241: Migration of db1241.eqiad.wmnet completed
[17:19:05] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[17:19:35] <wikibugs>	 (CR) CI reject: [V:-1] opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[17:21:03] <wikibugs>	 (PS1) CDanis: turnilo: drop kind:number from X-Is-Browser dim [deployment-charts] - https://gerrit.wikimedia.org/r/1305478
[17:23:59] <wikibugs>	 (CR) Kamila Součková: [C:+2] admin: increase shellbox CPU limit quota [deployment-charts] - https://gerrit.wikimedia.org/r/1305457 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[17:25:27] <wikibugs>	 (PS6) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[17:26:20] <wikibugs>	 (CR) Btullis: [C:+2] Temporarily remove dse-k8s-worker101[567] from service [puppet] - https://gerrit.wikimedia.org/r/1305467 (https://phabricator.wikimedia.org/T429773) (owner: Btullis)
[17:28:34] <wikibugs>	 (CR) Scott French: [C:+1] turnilo: drop kind:number from X-Is-Browser dim [deployment-charts] - https://gerrit.wikimedia.org/r/1305478 (owner: CDanis)
[17:28:37] <logmsgbot>	 !log aokoth@cumin1003 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T430072
[17:29:05] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[17:32:25] <wikibugs>	 (Merged) jenkins-bot: admin: increase shellbox CPU limit quota [deployment-charts] - https://gerrit.wikimedia.org/r/1305457 (https://phabricator.wikimedia.org/T385404) (owner: Kamila Součková)
[17:34:14] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[17:34:51] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[17:35:04] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[17:35:39] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[17:36:32] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2210: Migration of db2210.codfw.wmnet completed
[17:36:33] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[17:36:45] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[17:38:15] <logmsgbot>	 !log aokoth@cumin1003 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T430072
[17:38:22] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[17:41:34] <wikibugs>	 (PS10) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[17:41:34] <wikibugs>	 (PS1) Btullis: presto: Fix the resource-groups configuration [puppet] - https://gerrit.wikimedia.org/r/1305481 (https://phabricator.wikimedia.org/T424112)
[17:42:20] <logmsgbot>	 !log aokoth@cumin1003 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T430072
[17:44:19] <wikibugs>	 (PS1) Ahmon Dancy: scap.cfg.erb: Add jobrunner to beta mw_web_clusters list [puppet] - https://gerrit.wikimedia.org/r/1305483 (https://phabricator.wikimedia.org/T430075)
[17:44:54] <wikibugs>	 (CR) Aleksandar Mastilovic: [V:+1 C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1305481 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[17:45:17] <logmsgbot>	 !log cscott@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[17:45:47] <logmsgbot>	 !log cscott@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[17:45:48] <logmsgbot>	 !log cscott@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[17:46:09] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host dse-k8s-worker1016.eqiad.wmnet
[17:46:09] <logmsgbot>	 !log btullis@cumin1003 END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host dse-k8s-worker1016.eqiad.wmnet
[17:46:16] <logmsgbot>	 !log cscott@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[17:47:54] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305481 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[17:50:34] <wikibugs>	 (CR) Btullis: [C:+2] presto: Fix the resource-groups configuration [puppet] - https://gerrit.wikimedia.org/r/1305481 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[17:51:13] <wikibugs>	 (CR) CDanis: [C:+2] turnilo: drop kind:number from X-Is-Browser dim [deployment-charts] - https://gerrit.wikimedia.org/r/1305478 (owner: CDanis)
[17:51:49] <logmsgbot>	 !log aokoth@cumin1003 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T430072
[17:51:49] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[17:52:52] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
[17:52:54] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1016.eqiad.wmnet with OS bookworm
[17:52:55] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
[17:53:20] <wikibugs>	 (Merged) jenkins-bot: turnilo: drop kind:number from X-Is-Browser dim [deployment-charts] - https://gerrit.wikimedia.org/r/1305478 (owner: CDanis)
[17:54:11] <logmsgbot>	 !log cdanis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
[17:54:35] <logmsgbot>	 !log cdanis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
[17:55:36] <wikibugs>	 (PS7) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[17:55:38] <logmsgbot>	 !log cdanis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
[17:55:50] <logmsgbot>	 !log cdanis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
[17:59:19] <jinxer-wm>	 FIRING: [2x] HelmReleaseBadStatus: Helm release wdqs/main-internal on k8s-dse@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=wdqs - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[18:00:03] <wikibugs>	 (PS8) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[18:00:04] <jouncebot>	 brennen and jeena: Your horoscope predicts another MediaWiki train - Utc-7 Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T1800).
[18:00:11] <brennen>	 o/
[18:00:36] <wikibugs>	 (CR) CI reject: [V:-1] opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[18:01:51] <brennen>	 !log 1.47.0-wmf.8 train status (T423917): no current blockers, logs no worse than expected, rolling to group1
[18:01:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:56] <stashbot>	 T423917: 1.47.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T423917
[18:03:32] <wikibugs>	 (PS1) TrainBranchBot: group1 to 1.47.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1305484 (https://phabricator.wikimedia.org/T423917)
[18:03:35] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Initiated by brennen@deploy1003" [mediawiki-config] - https://gerrit.wikimedia.org/r/1305484 (https://phabricator.wikimedia.org/T423917) (owner: TrainBranchBot)
[18:03:41] <wikibugs>	 (PS3) Ahmon Dancy: modules/profile/files/puppet/bin: cleanup puppet SSL on CA server mismatch [puppet] - https://gerrit.wikimedia.org/r/1302978 (https://phabricator.wikimedia.org/T429413)
[18:04:32] <wikibugs>	 (Merged) jenkins-bot: group1 to 1.47.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/1305484 (https://phabricator.wikimedia.org/T423917) (owner: TrainBranchBot)
[18:05:14] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[18:05:14] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893
[18:05:21] <stashbot>	 T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893
[18:05:34] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1242: Upgrading db1242.eqiad.wmnet
[18:06:05] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1242: Upgrading db1242.eqiad.wmnet
[18:08:38] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1017.eqiad.wmnet with reason: host reimage
[18:09:08] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1015.eqiad.wmnet with reason: host reimage
[18:10:22] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1016.eqiad.wmnet with reason: host reimage
[18:12:44] <logmsgbot>	 !log brennen@deploy1003 rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.8  refs T423917
[18:12:45] <logmsgbot>	 cwilliams@cumin1003 major-upgrade (PID 2641087) is awaiting input
[18:12:48] <stashbot>	 T423917: 1.47.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T423917
[18:14:50] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1017.eqiad.wmnet with reason: host reimage
[18:16:00] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[18:16:55] <wikibugs>	 SRE, hCaptcha, Product Safety and Integrity: hcaptcha failed to connect to the new URL downloader proxies - https://phabricator.wikimedia.org/T430045#12051694 (Scott_French) FWIW, it does not look like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1303341 was ever applied - i.e., th...
[18:17:40] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1015.eqiad.wmnet with reason: host reimage
[18:19:24] <logmsgbot>	 !log kamila@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[18:19:50] <logmsgbot>	 !log cwilliams@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[18:21:54] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[18:21:55] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@eqiad T429893
[18:22:01] <stashbot>	 T429893: Migrate s4 section to Debian Trixie - https://phabricator.wikimedia.org/T429893
[18:22:03] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1242: Upgrading db1242.eqiad.wmnet
[18:22:04] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1016.eqiad.wmnet with reason: host reimage
[18:22:13] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1242: Upgrading db1242.eqiad.wmnet
[18:22:43] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1242.eqiad.wmnet with OS trixie
[18:23:09] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[18:23:09] <logmsgbot>	 !log cwilliams@cumin1003 dbmaint on s4@codfw T429893
[18:23:30] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2219: Upgrading db2219.codfw.wmnet
[18:23:40] <wikibugs>	 (CR) Ahmon Dancy: "Beta-only change.  Already live and working properly in beta." [puppet] - https://gerrit.wikimedia.org/r/1305483 (https://phabricator.wikimedia.org/T430075) (owner: Ahmon Dancy)
[18:23:52] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2219: Upgrading db2219.codfw.wmnet
[18:25:21] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2219.codfw.wmnet with OS trixie
[18:25:27] <wikibugs>	 (PS11) Btullis: presto: Enable resource groups and spill on the production cluster [puppet] - https://gerrit.wikimedia.org/r/1305109 (https://phabricator.wikimedia.org/T424112)
[18:25:27] <wikibugs>	 (PS1) Btullis: presto: Fix up the values for the test cluster [puppet] - https://gerrit.wikimedia.org/r/1305486 (https://phabricator.wikimedia.org/T424112)
[18:25:37] <wikibugs>	 (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305486 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[18:29:21] <wikibugs>	 (CR) Btullis: [C:+2] presto: Fix up the values for the test cluster [puppet] - https://gerrit.wikimedia.org/r/1305486 (https://phabricator.wikimedia.org/T424112) (owner: Btullis)
[18:31:35] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
[18:35:01] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
[18:35:28] <wikibugs>	 (PS4) Ahmon Dancy: modules/profile/files/puppet/bin: cleanup puppet SSL on CA server mismatch [puppet] - https://gerrit.wikimedia.org/r/1302978 (https://phabricator.wikimedia.org/T429413)
[18:39:35] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1016.eqiad.wmnet with OS bookworm
[18:40:11] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1242.eqiad.wmnet with reason: host reimage
[18:40:19] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[18:43:01] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[18:43:45] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[18:44:04] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2219.codfw.wmnet with reason: host reimage
[18:44:21] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1242.eqiad.wmnet with reason: host reimage
[18:45:37] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[18:46:46] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[18:48:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q3:rack/setup/install cloudcephosd105[3456] - https://phabricator.wikimedia.org/T419892#12051798 (ayounsi) {T424871} are the switch replacement tracking task to support 25G. But even after they arrive some time will be needed to configure/au...
[18:48:09] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2219.codfw.wmnet with reason: host reimage
[18:49:28] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[18:52:09] <wikibugs>	 (CR) Dzahn: [C:+2] scap.cfg.erb: Add jobrunner to beta mw_web_clusters list [puppet] - https://gerrit.wikimedia.org/r/1305483 (https://phabricator.wikimedia.org/T430075) (owner: Ahmon Dancy)
[18:52:26] <wikibugs>	 (PS1) Dzahn: jenkins: configure upstream_host: "localhost" for envoy [puppet] - https://gerrit.wikimedia.org/r/1305488 (https://phabricator.wikimedia.org/T418521)
[18:53:25] <wikibugs>	 SRE, DNS, Traffic, Patch-For-Review: new CNAME record for WikiLearn - https://phabricator.wikimedia.org/T429628#12051808 (BCornwall) Open→Resolved I'm marking this as resolved: Please feel free to reopen if this hasn't been!
[18:56:17] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[18:58:35] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[19:01:31] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1242.eqiad.wmnet with OS trixie
[19:01:58] <swfrench-wmf>	 !log applied latent admin_ng diffs for mw-pretrain - T427668
[19:02:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:02] <stashbot>	 T427668: Turn up the Pretrain MVP environment - https://phabricator.wikimedia.org/T427668
[19:02:18] <swfrench-wmf>	 !log applied latent admin_ng diffs for allow-urldownloaders GlobalNetworkPolicy - T430045 T427282
[19:02:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:25] <stashbot>	 T430045: hcaptcha failed to connect to the new URL downloader proxies - https://phabricator.wikimedia.org/T430045
[19:02:25] <stashbot>	 T427282: Move URL downloaders to trixie - https://phabricator.wikimedia.org/T427282
[19:03:55] <wikibugs>	 (PS2) Dzahn: jenkins: configure upstream_addr as localhost for envoy [puppet] - https://gerrit.wikimedia.org/r/1305488 (https://phabricator.wikimedia.org/T418521)
[19:05:49] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2219.codfw.wmnet with OS trixie
[19:06:48] <wikibugs>	 (PS3) Dzahn: jenkins: configure upstream_addr as localhost for envoy [puppet] - https://gerrit.wikimedia.org/r/1305488 (https://phabricator.wikimedia.org/T418521)
[19:08:11] <wikibugs>	 (PS9) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[19:08:40] <wikibugs>	 (PS1) Jgreen: Remove deprecated civi.wm.o and civi.frdev.wm.o CNAMEs [dns] - https://gerrit.wikimedia.org/r/1305490
[19:13:17] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply
[19:14:04] <logmsgbot>	 !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[19:14:14] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox: apply
[19:15:02] <wikibugs>	 (CR) Dzahn: [V:+1 C:+2] "https://puppet-compiler.wmflabs.org/output/1305488/8778/contint1003.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/1305488 (https://phabricator.wikimedia.org/T418521) (owner: Dzahn)
[19:15:13] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1242: Migration of db1242.eqiad.wmnet completed
[19:15:14] <logmsgbot>	 !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
[19:16:13] <wikibugs>	 (CR) Dwisehaupt: [C:+2] Remove deprecated civi.wm.o and civi.frdev.wm.o CNAMEs [dns] - https://gerrit.wikimedia.org/r/1305490 (owner: Jgreen)
[19:17:13] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[19:17:45] <wikibugs>	 (CR) Jgreen: [C:+2] Remove deprecated civi.wm.o and civi.frdev.wm.o CNAMEs [dns] - https://gerrit.wikimedia.org/r/1305490 (owner: Jgreen)
[19:18:17] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2219: Migration of db2219.codfw.wmnet completed
[19:18:25] <logmsgbot>	 !log jgreen@dns1004 START - running authdns-update
[19:20:22] <logmsgbot>	 !log jgreen@dns1004 END - running authdns-update
[19:45:42] <wikibugs>	 (PS10) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[19:47:08] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[20:00:05] <jouncebot>	 RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T2000).
[20:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[20:00:45] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1242: Migration of db1242.eqiad.wmnet completed
[20:00:46] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[20:02:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:03:47] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2219: Migration of db2219.codfw.wmnet completed
[20:03:48] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[20:04:07] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[20:04:19] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[20:15:36] <wikibugs>	 (PS2) Jdlrobson: Restore menu tab underline style [skins/Vector] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305191 (https://phabricator.wikimedia.org/T428519)
[20:15:41] <wikibugs>	 (PS1) C. Scott Ananian: Add $wgParserMigrationEnableParsoid as unified/fine-grained config [extensions/ParserMigration] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305500
[20:15:48] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 24 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/ParserMigration] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305500 (owner: C. Scott Ananian)
[20:16:36] <cscott>	 Is there really no one in this window?
[20:16:56] <cscott>	 RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: I'd like to add a last minute patch to this window
[20:17:01] <jinxer-wm>	 FIRING: [6x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1017:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[20:18:27] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[20:18:37] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[20:19:06] <TheresNoTime>	 cscott: if you're able to deploy, go for it :)
[20:19:34] <logmsgbot>	 !log cscott@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[20:20:01] <logmsgbot>	 !log cscott@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[20:20:03] <logmsgbot>	 !log cscott@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[20:20:31] <logmsgbot>	 !log cscott@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[20:21:32] <cscott>	 TheresNoTime: spiderpig, spiderpig, no one slings code like a spiderpig...
[20:21:54] <TheresNoTime>	 :D
[20:22:01] <jinxer-wm>	 FIRING: [10x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1015:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[20:23:31] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cscott@deploy1003 using scap backport" [extensions/ParserMigration] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305500 (owner: C. Scott Ananian)
[20:24:40] <wikibugs>	 (Merged) jenkins-bot: Add $wgParserMigrationEnableParsoid as unified/fine-grained config [extensions/ParserMigration] (wmf/1.47.0-wmf.8) - https://gerrit.wikimedia.org/r/1305500 (owner: C. Scott Ananian)
[20:25:12] <logmsgbot>	 !log cscott@deploy1003 Started scap sync-world: Backport for [[gerrit:1305500|Add $wgParserMigrationEnableParsoid as unified/fine-grained config]]
[20:27:01] <jinxer-wm>	 FIRING: [14x] NodeBGPSessionStatusNotEstablished: Kubernetes node dse-k8s-worker1015:0 has a BGP session which is not in the 'established' state. - https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#NodeBGPSessionStatusNotEstablished  - https://alerts.wikimedia.org/?q=alertname%3DNodeBGPSessionStatusNotEstablished
[20:27:13] <logmsgbot>	 !log cscott@deploy1003 cscott: Backport for [[gerrit:1305500|Add $wgParserMigrationEnableParsoid as unified/fine-grained config]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:27:36] <XioNoX>	 !log cr1-eqiad# set chassis fpc 1 pic 1 port 5 speed 100g - T429623
[20:27:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:16] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[20:28:30] <logmsgbot>	 !log ryankemper@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[20:29:10] <logmsgbot>	 !log cscott@deploy1003 cscott: Continuing with deployment
[20:33:33] <logmsgbot>	 !log cscott@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305500|Add $wgParserMigrationEnableParsoid as unified/fine-grained config]] (duration: 08m 21s)
[20:33:36] <wikibugs>	 (PS11) Bking: opensearch: split plugins_mandatory into own key [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[20:33:50] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: Ryan Kemper)
[20:34:06] <XioNoX>	 !log draining one of eqiad-codfw transports for PIC bounce
[20:34:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:41] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[20:35:27] <XioNoX>	 expected ^
[20:37:16] <XioNoX>	 !log bouncing cr1-eqiad FPC1 PIC1 - T429623
[20:37:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:51] <wikibugs>	 (PS1) Reedy: InitialiseSettings: Require 2FA for all on arbcom_*wiki and conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1305501 (https://phabricator.wikimedia.org/T428103)
[20:44:41] <jinxer-wm>	 RESOLVED: NetworkDeviceAlarmActive: Alarm active on cr1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[20:49:15] <jinxer-wm>	 FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 18.56% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[20:50:42] <wikibugs>	 (03PS12) 10Bking: opensearch: split plugins_mandatory into own key [puppet] - 10https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: 10Ryan Kemper)
[20:51:12] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: 10Ryan Kemper)
[20:53:42] <wikibugs>	 (03PS1) 10Dreamy Jazz: Drop $wmgEmergencyCaptcha [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305503 (https://phabricator.wikimedia.org/T429849)
[20:53:48] <Dreamy_Jazz>	 jouncebot: nowandnext
[20:53:48] <jouncebot>	 For the next 0 hour(s) and 6 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T2000)
[20:53:48] <jouncebot>	 In 0 hour(s) and 6 minute(s): Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T2100)
[20:55:28] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305503 (https://phabricator.wikimedia.org/T429849) (owner: 10Dreamy Jazz)
[20:55:56] <wikibugs>	 (03CR) 10JHathaway: sre.hosts.provision: introduce the wmfroot user (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291994 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey)
[20:56:21] <wikibugs>	 (03Merged) 10jenkins-bot: Drop $wmgEmergencyCaptcha [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1305503 (https://phabricator.wikimedia.org/T429849) (owner: 10Dreamy Jazz)
[20:56:47] <logmsgbot>	 !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1305503|Drop $wmgEmergencyCaptcha (T429849)]]
[20:56:52] <stashbot>	 T429849: hCaptcha: Emergency CAPTCHA uses FancyCaptcha - https://phabricator.wikimedia.org/T429849
[20:57:41] <wikibugs>	 (03PS13) 10Bking: opensearch: split plugins_mandatory into own key [puppet] - 10https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: 10Ryan Kemper)
[20:57:49] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844) (owner: 10Ryan Kemper)
[20:58:50] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1305503|Drop $wmgEmergencyCaptcha (T429849)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:59:08] <logmsgbot>	 !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment
[21:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260624T2100)
[21:03:31] <logmsgbot>	 !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1305503|Drop $wmgEmergencyCaptcha (T429849)]] (duration: 06m 43s)
[21:03:37] <stashbot>	 T429849: hCaptcha: Emergency CAPTCHA uses FancyCaptcha - https://phabricator.wikimedia.org/T429849
[21:09:15] <jinxer-wm>	 RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 16.99% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[21:14:02] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 761558048 and 61 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[21:18:54] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-codfw and Hurricane Electric (2001:504:61::1b1b:0:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[21:23:02] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 24576 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[21:25:40] <wikibugs>	 (03PS14) 10Ryan Kemper: opensearch: split plugins_mandatory into own key [puppet] - 10https://gerrit.wikimedia.org/r/1305321 (https://phabricator.wikimedia.org/T429844)