[00:05:26] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:16:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:17:06] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [00:19:28] (03CR) 10Abijeet Patro: [V:03+2] Localisation updates from https://translatewiki.net. [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1297670 (owner: 10L10n-bot) [00:25:07] (03PS1) 10Neriah: NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) [00:28:10] (03CR) 10Ladsgroup: "let's enable it gradually btw." 
[extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [00:53:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 11.88% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:08:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.62% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:09:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:09:40] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1298419 [01:09:40] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1298419 (owner: 10TrainBranchBot) [01:14:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.69% idle - https://bit.ly/wmf-fpmsat - 
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [01:20:41] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1298419 (owner: 10TrainBranchBot) [02:08:57] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:33:57] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:19:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:20:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:26:32] PROBLEM - MariaDB Replica Lag: m2 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 652.83 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response [03:28:30] RECOVERY - MariaDB Replica Lag: m2 on db2160 is OK: OK slave_sql_lag Replication lag: 0.42 seconds https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response [03:50:26] FIRING: [2x] SystemdUnitFailed: 
gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:53:12] PROBLEM - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2007 bytes in 0.176 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [03:54:12] RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 27958 bytes in 0.082 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [04:00:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:15:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:16:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:17:06] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [04:20:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 
- https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:39:55] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Add new private info stub for hiddenparma [labs/private] - 10https://gerrit.wikimedia.org/r/1297493 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [04:39:56] FIRING: ProbeDown: Service gitlab1004:443 has failed probes (http_gitlab_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gitlab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [04:43:20] (03PS3) 10Giuseppe Lavagetto: requestctl: sync script [puppet] - 10https://gerrit.wikimedia.org/r/1297289 (https://phabricator.wikimedia.org/T428119) [04:43:20] (03PS3) 10Giuseppe Lavagetto: hiddenparma: switch to db-backed api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1297290 (https://phabricator.wikimedia.org/T428119) [04:43:20] (03PS4) 10Giuseppe Lavagetto: requestctl: fetch api credentials from hiddenparma [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) [04:44:42] (03CR) 10CI reject: [V:04-1] requestctl: fetch api credentials from hiddenparma [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [04:45:00] RESOLVED: ProbeDown: Service gitlab1004:443 has failed probes (http_gitlab_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gitlab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:04:51] (03CR) 10Giuseppe Lavagetto: [C:03+2] requestctl: sync script [puppet] - 10https://gerrit.wikimedia.org/r/1297289 (https://phabricator.wikimedia.org/T428119) (owner: 
10Giuseppe Lavagetto) [05:18:01] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [05:18:22] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2052: Upgrading es2052.codfw.wmnet [05:18:43] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2052: Upgrading es2052.codfw.wmnet [05:19:27] (03PS4) 10Giuseppe Lavagetto: hiddenparma: switch to db-backed api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1297290 (https://phabricator.wikimedia.org/T428119) [05:19:27] (03PS5) 10Giuseppe Lavagetto: requestctl: fetch api credentials from hiddenparma [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) [05:19:56] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2052.codfw.wmnet with OS trixie [05:21:20] (03CR) 10Giuseppe Lavagetto: [V:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8659/console" [puppet] - 10https://gerrit.wikimedia.org/r/1297290 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [05:23:36] (03CR) 10Giuseppe Lavagetto: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8660/co" [puppet] - 10https://gerrit.wikimedia.org/r/1297290 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [05:24:17] (03CR) 10Giuseppe Lavagetto: [V:03+1 C:03+2] hiddenparma: switch to db-backed api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1297290 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [05:31:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote es1054 to es3 eqiad primary T428050', diff saved to https://phabricator.wikimedia.org/P93895 and previous config saved to /var/cache/conftool/dbconfig/20260608-053156-marostegui.json [05:32:01] T428050: Migrate es3 section to Debian Trixie 
- https://phabricator.wikimedia.org/T428050 [05:32:58] (03PS1) 10Marostegui: wmnet: Update es3-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/1298421 (https://phabricator.wikimedia.org/T428050) [05:33:24] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [05:33:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1051: Upgrading es1051.eqiad.wmnet [05:33:56] (03CR) 10Marostegui: [C:03+2] wmnet: Update es3-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/1298421 (https://phabricator.wikimedia.org/T428050) (owner: 10Marostegui) [05:34:00] !log marostegui@dns1004 START - running authdns-update [05:35:31] !log marostegui@dns1004 END - running authdns-update [05:35:46] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2052.codfw.wmnet with reason: host reimage [05:39:11] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2052.codfw.wmnet with reason: host reimage [05:42:45] (03PS6) 10Giuseppe Lavagetto: requestctl: fetch api credentials from hiddenparma [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) [05:44:46] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool es1051: Upgrading es1051.eqiad.wmnet [05:47:46] marostegui@cumin1003 major-upgrade (PID 1789103) is awaiting input [05:52:22] (03PS7) 10Giuseppe Lavagetto: requestctl: fetch api credentials from hiddenparma [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) [05:53:15] (03CR) 10Giuseppe Lavagetto: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8664/co" [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [05:54:42] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host es2052.codfw.wmnet with OS trixie [05:58:01] marostegui@cumin1003 major-upgrade (PID 1786432) is awaiting input [05:58:52] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1051.eqiad.wmnet with OS trixie [05:59:56] (03CR) 10AikoChou: ml-services: makes editing-suggestions publicly available (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [06:10:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:12:53] (03CR) 10Giuseppe Lavagetto: [V:03+1 C:03+2] requestctl: fetch api credentials from hiddenparma [puppet] - 10https://gerrit.wikimedia.org/r/1297291 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [06:15:16] !log taavi@cumin1003 START - Cookbook sre.wikireplicas.add-wiki for database urwikisource (T415977) [06:15:21] T415977: [wikireplicas] Create views for new wiki urwikisource - https://phabricator.wikimedia.org/T415977 [06:15:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:15:55] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1051.eqiad.wmnet with reason: host reimage [06:20:23] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1051.eqiad.wmnet with reason: host reimage [06:21:56] (03PS1) 10Giuseppe Lavagetto: requestctl_client: sync cli script [puppet] - 10https://gerrit.wikimedia.org/r/1298422 [06:25:21] (03CR) 10Giuseppe Lavagetto: [C:03+2] requestctl_client: sync cli script 
[puppet] - 10https://gerrit.wikimedia.org/r/1298422 (owner: 10Giuseppe Lavagetto) [06:25:26] FIRING: [3x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:30:26] FIRING: [3x] SystemdUnitFailed: requestctl-credential-refresh.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:35:28] (03PS1) 10Giuseppe Lavagetto: requestctl_client: sync client script [puppet] - 10https://gerrit.wikimedia.org/r/1298433 [06:35:59] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] requestctl_client: sync client script [puppet] - 10https://gerrit.wikimedia.org/r/1298433 (owner: 10Giuseppe Lavagetto) [06:36:55] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1051.eqiad.wmnet with OS trixie [06:40:12] marostegui@cumin1003 major-upgrade (PID 1789103) is awaiting input [06:40:26] FIRING: [5x] SystemdUnitFailed: requestctl-credential-refresh.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:45:28] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [06:47:13] PROBLEM - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2007 bytes in 0.179 second response time 
https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [06:48:13] RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 27958 bytes in 0.104 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [06:50:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.49% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [06:54:02] (03CR) 10Slyngshede: [C:03+1] admin: upgrade Audrey Penven from ldap_only to restricted [puppet] - 10https://gerrit.wikimedia.org/r/1298299 (https://phabricator.wikimedia.org/T427531) (owner: 10Dzahn) [06:55:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.59% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [06:55:26] FIRING: [5x] SystemdUnitFailed: requestctl-credential-refresh.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:56:50] (03CR) 10Slyngshede: [C:03+1] "LGTM, I assume we verified that key out of band" [puppet] - 10https://gerrit.wikimedia.org/r/1297191 (https://phabricator.wikimedia.org/T428037) (owner: 10Kamila Součková) [06:57:26] (03CR) 10Slyngshede: [C:03+1] admin: add apdube-wmf user [puppet] - 10https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: 10Kamila Součková) 
[07:00:05] Amir1, urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T0700). [07:00:05] WMDE-Fisch, VadymTS1, and Neriah: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [07:00:24] \o [07:00:29] \o [07:00:46] I'll self serve my config change first. [07:01:15] (03CR) 10TrainBranchBot: [C:03+2] "Approved by wmde-fisch@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297681 (https://phabricator.wikimedia.org/T425662) (owner: 10Svantje Lilienthal) [07:02:28] (03Merged) 10jenkins-bot: Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297681 (https://phabricator.wikimedia.org/T425662) (owner: 10Svantje Lilienthal) [07:06:18] (03PS1) 10Slyngshede: data.yaml: offboarding hmonroy [puppet] - 10https://gerrit.wikimedia.org/r/1298446 [07:09:21] There's currently an issue with uncommitted changes on the servers, I'll have to look into it [07:09:22] https://phabricator.wikimedia.org/T426631#11990595 [07:11:56] !log wmde-fisch@deploy1003 Started scap sync-world: Backport for [[gerrit:1297681|Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki (T425662)]] [07:12:00] T425662: Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki - https://phabricator.wikimedia.org/T425662 [07:12:31] !log upgrade exim4 packages on seaborgium for security upgrades [07:12:32] !log taavi@cumin1003 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database urwikisource (T415977) [07:12:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:37] T415977: [wikireplicas] Create views for new wiki urwikisource - 
https://phabricator.wikimedia.org/T415977 [07:15:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.79% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [07:17:11] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [07:17:24] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [07:18:20] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1051: repool after maintenance [07:18:27] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2052: repool after upgrade [07:19:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:20:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.79% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [07:21:50] !log upgrade sudo package on an-* hosts for T428384 [07:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:23:31] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy1003 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner [07:25:23] (03PS1) 10Slyngshede: data.yaml: offboarding dmaza [puppet] - 
10https://gerrit.wikimedia.org/r/1298450 [07:25:26] FIRING: [4x] SystemdUnitFailed: requestctl-credential-refresh.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:27:50] (03CR) 10Slyngshede: [C:03+1] admin: update SSH key for tchanders [puppet] - 10https://gerrit.wikimedia.org/r/1298282 (owner: 10Ssingh) [07:29:37] !log wmde-fisch@deploy1003 wmde-fisch, lilients: Backport for [[gerrit:1297681|Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki (T425662)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [07:29:41] T425662: Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki - https://phabricator.wikimedia.org/T425662 [07:29:43] Testing [07:32:22] !log wmde-fisch@deploy1003 wmde-fisch, lilients: Continuing with deployment [07:33:32] VadymTS1: Do you mind if I bundle the config changes and deploy them together? [07:33:36] Neriah: I think there won't be time for your backport in this slot. Try rescheduling it in the afternoon. 
[07:34:54] WMDE-Fisch I don't mind [07:36:32] (03CR) 10Marostegui: tables-catalog: set betafeatures_user_counts to public visibility (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298329 (https://phabricator.wikimedia.org/T402145) (owner: 10SD0001) [07:39:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.56% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [07:40:26] FIRING: [4x] SystemdUnitFailed: requestctl-credential-refresh.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:40:33] VadymTS1: I fear I also won't have time for your patches... it's going sooo slow this morning ;-/ [07:40:50] Maybe someone else can take over deployments when I'm done. 
[07:41:28] I'll reschedule these changes to the last backport window today [07:41:58] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298390 (https://phabricator.wikimedia.org/T428329) (owner: 10VadymTS1) [07:42:17] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298328 (https://phabricator.wikimedia.org/T428269) (owner: 10VadymTS1) [07:44:48] !log wmde-fisch@deploy1003 Finished scap sync-world: Backport for [[gerrit:1297681|Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki (T425662)]] (duration: 32m 51s) [07:44:52] T425662: Global rollout - Sub-ref deployments to Group 0, Group 1 and frwiki - https://phabricator.wikimedia.org/T425662 [07:48:29] (03PS7) 10Ozge: rest-gateway: Add liftwing editing-suggestions experimental api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) [07:48:40] I'm done :-) [07:49:07] (03PS5) 10Elukey: Fix datetime-related and pytest warnings [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293719 [07:49:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.34% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [07:50:02] !log fceratto@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis urwikisource in section s5 [07:50:04] !log fceratto@cumin1003 END (ERROR) - Cookbook sre.mysql.sanitize-wiki 
(exit_code=97) Managing sanitization for wikis urwikisource in section s5 [07:50:12] !log fceratto@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis urwikisource in section s5 [07:52:36] !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis urwikisource in section s5 [07:53:03] !log fceratto@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis urwikisource in section s5 [07:53:12] (03PS8) 10Ozge: rest-gateway: Add liftwing editing-suggestions experimental api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) [07:53:26] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: reimage [07:53:45] (03CR) 10Ozge: "comments addressed." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [07:53:50] (03PS1) 10Marostegui: db1217: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1298458 (https://phabricator.wikimedia.org/T423069) [07:54:39] (03CR) 10Marostegui: [C:03+2] db1217: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1298458 (https://phabricator.wikimedia.org/T423069) (owner: 10Marostegui) [07:55:13] (03CR) 10Elukey: redfish: improve add_account with AccountTypes (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293593 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [07:55:26] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:55:43] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db1217.eqiad.wmnet with OS trixie 
[07:57:05] (03CR) 10Elukey: [C:03+2] Fix datetime-related and pytest warnings [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293719 (owner: 10Elukey) [07:57:11] PROBLEM - haproxy failover on dbproxy1025 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:11] PROBLEM - haproxy failover on dbproxy1027 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:11] PROBLEM - haproxy failover on dbproxy1022 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:13] PROBLEM - haproxy failover on dbproxy1028 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:13] PROBLEM - haproxy failover on dbproxy1023 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:25] PROBLEM - haproxy failover on dbproxy1029 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:27] PROBLEM - haproxy failover on dbproxy1024 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:57:36] (03PS1) 10JMeybohm: kind.sh: Fix path to istio config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298459 (https://phabricator.wikimedia.org/T396107) [07:57:43] PROBLEM - haproxy failover on dbproxy1026 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:58:30] WMDE-Fisch: ok, no problem [07:58:57] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [07:59:17] (03CR) 10JMeybohm: [C:03+1] k8s: add 
wikikube-worker2331 [puppet] - 10https://gerrit.wikimedia.org/r/1289022 (https://phabricator.wikimedia.org/T426688) (owner: 10Jasmine) [07:59:25] (03CR) 10JMeybohm: [C:03+2] kind.sh: Fix path to istio config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298459 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm) [08:00:07] (03CR) 10JMeybohm: [C:03+1] dse-k8s-aux: migrate internal kafka-ui disc and svc records to k8s-aux [dns] - 10https://gerrit.wikimedia.org/r/1298262 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:00:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [08:01:07] (03CR) 10JMeybohm: [C:03+1] CI: add aux-k8s-codfw to the list of environments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298283 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:01:29] (03CR) 10JMeybohm: [C:03+1] aux-k8s: define the kafka-ui namespace in both clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298266 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:02:02] (03CR) 10JMeybohm: [C:03+1] aux-k8s: define the kafka-ui helmfile and values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298267 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:02:12] (03CR) 10JMeybohm: [C:03+2] dse-k8s: remove the kafka-ui namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298268 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:02:19] (03CR) 10JMeybohm: [C:03+1] dse-k8s: remove the kafka-ui namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298268 
(https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:02:54] (03CR) 10JMeybohm: [C:03+1] "Remember to 'helmfile destroy' this release before merging" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298269 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:03:23] !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis urwikisource in section s5 [08:03:29] (03CR) 10JMeybohm: [C:03+1] aux-k8s: define the kafka-ui kubeconfigs [puppet] - 10https://gerrit.wikimedia.org/r/1298264 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:03:45] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1051: repool after maintenance [08:03:45] (03CR) 10JMeybohm: [C:03+1] dse-k8s: remove kafka-ui kubeconfigs [puppet] - 10https://gerrit.wikimedia.org/r/1298265 (https://phabricator.wikimedia.org/T428053) (owner: 10Brouberol) [08:03:54] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2052: repool after upgrade [08:04:59] (03Merged) 10jenkins-bot: kind.sh: Fix path to istio config [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298459 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm) [08:05:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 21.49% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [08:06:25] 10SRE-swift-storage, 06Commons, 10MediaWiki-File-management: Undeleted file is an incorrect version - https://phabricator.wikimedia.org/T399892#11992357 (10MatthewVernon) 05Open→03Declined a:03Pppery Thanks @Pppery . 
[08:11:48] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage [08:12:10] (03PS1) 10Volans: config: type config_file as PathLike[str] [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298541 [08:14:55] !log taavi@cumin1003 START - Cookbook sre.wikireplicas.add-wiki for database urwikisource (T415977) [08:14:59] T415977: [wikireplicas] Create views for new wiki urwikisource - https://phabricator.wikimedia.org/T415977 [08:15:04] (03CR) 10AikoChou: [C:03+1] "LGTM! We need an SRE to review/merge/deploy this :)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [08:15:08] !log taavi@cumin1003 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database urwikisource (T415977) [08:16:19] (03PS1) 10Neriah: Enable wgNewUserMessageOnAutoCreate on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) [08:16:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:16:42] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [08:17:47] (03PS2) 10Neriah: Enable wgNewUserMessageOnAutoCreateFirstEdit on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) [08:18:21] (03PS1) 10Volans: decorators: fix dynamic callbacks bug in retry [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298656 [08:18:21] 
(03PS1) 10Volans: config: raise on missing INI file when raises=True [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298657 [08:18:22] (03PS1) 10Volans: __init__: fail clearly when unknown __version__ [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298658 [08:18:22] (03PS1) 10Volans: phabricator: reject trailing newline in task ID [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298659 [08:18:23] (03PS1) 10Volans: dns: resolve() instead of deprecated query() [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298660 [08:18:24] (03PS1) 10Volans: actions: fix ActionsDict docstring example output [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298661 [08:18:28] (03PS1) 10Volans: interactive: fix ask_input Returns docstring [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298662 [08:18:32] (03PS1) 10Volans: interactive: improve error message with validators [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298663 [08:18:42] (03PS1) 10Volans: irc: set the handler level via setLevel() [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298664 [08:18:50] (03CR) 10SD0001: tables-catalog: set betafeatures_user_counts to public visibility (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298329 (https://phabricator.wikimedia.org/T402145) (owner: 10SD0001) [08:19:04] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage [08:19:06] (03PS1) 10MVernon: swift: remove 2 drained nodes for reimage [puppet] - 10https://gerrit.wikimedia.org/r/1298665 (https://phabricator.wikimedia.org/T354872) [08:19:14] (03PS1) 10MVernon: swift: move ms-be206[2,3] to new-style storage [puppet] - 10https://gerrit.wikimedia.org/r/1298666 (https://phabricator.wikimedia.org/T354872) [08:20:05] (03CR) 10SD0001: tables-catalog: set betafeatures_user_counts to public visibility (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298329 
(https://phabricator.wikimedia.org/T402145) (owner: 10SD0001) [08:22:54] (03PS2) 10MVernon: swift: move ms-be206[2,3] to new-style storage [puppet] - 10https://gerrit.wikimedia.org/r/1298666 (https://phabricator.wikimedia.org/T354872) [08:25:53] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297644 (https://phabricator.wikimedia.org/T427804) (owner: 10Audrey Penven) [08:26:53] (03CR) 10Tchanders: [C:03+1] admin: update SSH key for tchanders [puppet] - 10https://gerrit.wikimedia.org/r/1298282 (owner: 10Ssingh) [08:29:10] (03CR) 10Volans: config: type config_file as PathLike[str] (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298541 (owner: 10Volans) [08:31:35] PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs1021 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [08:31:46] (03CR) 10Marostegui: tables-catalog: set betafeatures_user_counts to public visibility (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298329 (https://phabricator.wikimedia.org/T402145) (owner: 10SD0001) [08:32:35] RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs1021 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [08:35:17] (03PS1) 10Marostegui: Revert "db1217: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1298707 [08:38:27] RECOVERY - haproxy failover on dbproxy1024 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:38:43] RECOVERY - haproxy failover on dbproxy1026 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:39:11] RECOVERY - haproxy failover on dbproxy1025 is OK: OK check_failover 
servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:39:11] RECOVERY - haproxy failover on dbproxy1027 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:39:11] RECOVERY - haproxy failover on dbproxy1022 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:39:13] RECOVERY - haproxy failover on dbproxy1028 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:39:13] RECOVERY - haproxy failover on dbproxy1023 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:39:13] (03CR) 10Marostegui: [C:03+2] Revert "db1217: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1298707 (owner: 10Marostegui) [08:39:25] RECOVERY - haproxy failover on dbproxy1029 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:40:26] RESOLVED: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:41:44] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1217.eqiad.wmnet with OS trixie [08:43:00] (03PS1) 10Jelto: gitlab: temporary block chrome on GitLab hosts [puppet] - 10https://gerrit.wikimedia.org/r/1298708 (https://phabricator.wikimedia.org/T428381) [08:43:32] (03CR) 10Arnaudb: [C:03+1] gitlab: temporary block chrome on GitLab hosts [puppet] - 10https://gerrit.wikimedia.org/r/1298708 (https://phabricator.wikimedia.org/T428381) (owner: 10Jelto) [08:44:57] (03PS3) 10Lucas Werkmeister (WMDE): WikiProject links - remove 'text' config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297644 (https://phabricator.wikimedia.org/T427804) (owner: 10Audrey Penven) [08:44:57] (03PS2) 10Lucas Werkmeister (WMDE): Add 
Wikidata configuration for WikiProject links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298293 (https://phabricator.wikimedia.org/T422935) [08:46:27] (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8665/co" [puppet] - 10https://gerrit.wikimedia.org/r/1298708 (https://phabricator.wikimedia.org/T428381) (owner: 10Jelto) [08:46:54] (03CR) 10CWilliams: [C:03+2] Provide downtime duration information in sre.mysql cookbooks [software/spicerack] - 10https://gerrit.wikimedia.org/r/1297126 (https://phabricator.wikimedia.org/T427780) (owner: 10CWilliams) [08:47:32] (03CR) 10Jelto: [V:03+1 C:03+2] gitlab: temporary block chrome on GitLab hosts [puppet] - 10https://gerrit.wikimedia.org/r/1298708 (https://phabricator.wikimedia.org/T428381) (owner: 10Jelto) [08:48:16] (03PS1) 10Lucas Werkmeister (WMDE): Add translatable messages for WikiProject names [extensions/Wikidata.org] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298709 (https://phabricator.wikimedia.org/T427804) [08:48:17] (03PS1) 10Lucas Werkmeister (WMDE): Use translatable messages for WikiProject links [extensions/Wikibase] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298710 (https://phabricator.wikimedia.org/T427804) [08:49:05] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/Wikidata.org] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298709 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [08:49:12] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/Wikibase] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298710 (https://phabricator.wikimedia.org/T427804) (owner: 
10Lucas Werkmeister (WMDE)) [08:49:26] (03CR) 10CWilliams: [C:03+1] swift: remove 2 drained nodes for reimage [puppet] - 10https://gerrit.wikimedia.org/r/1298665 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon) [08:50:28] (03CR) 10CWilliams: [C:03+1] swift: move ms-be206[2,3] to new-style storage [puppet] - 10https://gerrit.wikimedia.org/r/1298666 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon) [08:50:54] (03CR) 10Lucas Werkmeister (WMDE): NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit (031 comment) [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [08:52:07] (03Merged) 10jenkins-bot: Provide downtime duration information in sre.mysql cookbooks [software/spicerack] - 10https://gerrit.wikimedia.org/r/1297126 (https://phabricator.wikimedia.org/T427780) (owner: 10CWilliams) [08:53:33] 10SRE-tools, 06DBA, 06Infrastructure-Foundations, 10Spicerack, 13Patch-For-Review: Provide downtime duration information in sre.mysql cookbooks - https://phabricator.wikimedia.org/T427780#11992668 (10CWilliams-WMF) 05Open→03Resolved [08:59:27] (03CR) 10Neriah: NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit (031 comment) [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [09:00:56] FIRING: ProbeDown: Service gitlab1004:443 has failed probes (http_gitlab_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gitlab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [09:01:05] (03CR) 10Dpogorzelski: [C:03+1] rest-gateway: Add liftwing editing-suggestions experimental api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [09:01:12] PROBLEM - 
Gitlab HTTPS healthcheck on gitlab.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 2353 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [09:01:21] (03CR) 10Ozge: [C:03+2] rest-gateway: Add liftwing editing-suggestions experimental api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [09:03:03] (03CR) 10Ozge: [V:03+2 C:03+2] rest-gateway: Add liftwing editing-suggestions experimental api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [09:03:44] (03CR) 10MVernon: [C:03+2] swift: remove 2 drained nodes for reimage [puppet] - 10https://gerrit.wikimedia.org/r/1298665 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon) [09:04:08] (03Merged) 10jenkins-bot: rest-gateway: Add liftwing editing-suggestions experimental api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297748 (https://phabricator.wikimedia.org/T427794) (owner: 10Ozge) [09:05:49] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [09:05:56] FIRING: [2x] ProbeDown: Service gitlab1004:443 has failed probes (http_gitlab_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gitlab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [09:06:11] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2043: Upgrading es2043.codfw.wmnet [09:06:32] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2043: Upgrading es2043.codfw.wmnet [09:07:37] (03CR) 10SD0001: tables-catalog: set betafeatures_user_counts to public visibility (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1298329 (https://phabricator.wikimedia.org/T402145) (owner: 10SD0001) [09:07:47] !log marostegui@cumin1003 START - 
Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS trixie [09:13:48] FIRING: PuppetFailure: Puppet has failed on puppetserver1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [09:15:03] (03CR) 10Dpogorzelski: [C:03+2] liftwing-openapi-server: Add new admin_ng service for serving OpenAPI specs [puppet] - 10https://gerrit.wikimedia.org/r/1297168 (https://phabricator.wikimedia.org/T427902) (owner: 10Gkyziridis) [09:15:46] !log ozge@deploy1003 helmfile [staging] START helmfile.d/services/rest-gateway: sync [09:15:52] !log ozge@deploy1003 helmfile [staging] DONE helmfile.d/services/rest-gateway: sync [09:16:33] (03CR) 10Lucas Werkmeister (WMDE): NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit (031 comment) [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [09:16:58] (03PS1) 10Santiago Faci: Deploy GrowthBook 4.4.0 to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298711 (https://phabricator.wikimedia.org/T427506) [09:17:37] !log jelto@cumin1003 START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org [09:18:20] (03PS22) 10Ayounsi: Create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300) [09:19:06] PROBLEM - Host gitlab.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [09:22:14] RECOVERY - Host gitlab.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [09:22:32] (03CR) 10Ayounsi: "Thanks for all the great feedback. 
I've updated it to take it into account and I've added content to def downtime()" [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300) (owner: 10Ayounsi) [09:23:12] RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 28102 bytes in 0.103 second response time https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [09:23:56] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2043.codfw.wmnet with reason: host reimage [09:24:12] PROBLEM - Gitlab SSH healthcheck git daemon on gitlab.wikimedia.org is CRITICAL: connect to address gitlab.wikimedia.org and port 22: Connection refused https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [09:25:12] RECOVERY - Gitlab SSH healthcheck git daemon on gitlab.wikimedia.org is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/GitLab%23Monitoring [09:27:12] !log jelto@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org [09:28:05] (03CR) 10Kamila Součková: [C:03+2] php8.3: Rebuild 8.3 image stack on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1295044 (https://phabricator.wikimedia.org/T418200) (owner: 10Scott French) [09:28:48] RESOLVED: PuppetFailure: Puppet has failed on puppetserver1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [09:29:27] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2043.codfw.wmnet with reason: host reimage [09:30:56] RESOLVED: [4x] ProbeDown: Service gitlab1004:22 has failed probes (tcp_gitlab_wikimedia_org_ssh_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [09:33:53] 
jouncebot: nowandnext [09:33:53] No deployments scheduled for the next 0 hour(s) and 26 minute(s) [09:33:53] In 0 hour(s) and 26 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1000) [09:34:23] anyone mind if I use the open window? [09:34:48] (03PS3) 10Neriah: Replace NewUserMessageOnAutoCreateFirstEdit with wgNewUserMessageOnFirstEdit [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298717 (https://phabricator.wikimedia.org/T426206) [09:35:07] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298717 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [09:38:51] (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297111 (owner: 10PipelineBot) [09:39:23] (03PS3) 10Neriah: Enable wgNewUserMessageOnFirstEdit on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) [09:39:28] 06SRE, 06ServiceOps new, 10ServiceOps-Services-Oids, 10Thumbor: Thumbor-k8s performance improvements - https://phabricator.wikimedia.org/T333445#11992924 (10Clement_Goubert) [09:39:52] (03CR) 10Jelto: "After our most recent discussion this change is not needed anymore? 
Instead a dedicated ssh hostname is used" [puppet] - 10https://gerrit.wikimedia.org/r/1282428 (https://phabricator.wikimedia.org/T425441) (owner: 10Dzahn) [09:40:56] (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297111 (owner: 10PipelineBot) [09:41:11] !log ozge@deploy1003 helmfile [eqiad] START helmfile.d/services/rest-gateway: sync [09:41:24] !log ozge@deploy1003 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync [09:41:43] !log ozge@deploy1003 helmfile [codfw] START helmfile.d/services/rest-gateway: sync [09:41:57] !log ozge@deploy1003 helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync [09:42:02] !log ozge@deploy1003 helmfile [eqiad] START helmfile.d/services/rest-gateway: sync [09:42:05] !log ozge@deploy1003 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync [09:44:04] !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply [09:44:22] !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply [09:46:48] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2043.codfw.wmnet with OS trixie [09:48:38] (03CR) 10Jelto: service: add gitlab-https and gitlab-ssh service to service catalog (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1290684 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [09:49:04] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [09:49:49] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2043: repool after upgrade [09:49:50] !log mvolz@deploy1003 helmfile [eqiad] START helmfile.d/services/citoid: apply [09:50:21] !log mvolz@deploy1003 helmfile [eqiad] DONE helmfile.d/services/citoid: apply [09:51:14] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416 (10mahmoud.abdelsattar.wmde) 03NEW 
[09:52:08] !log mvolz@deploy1003 helmfile [codfw] START helmfile.d/services/citoid: apply [09:52:38] !log mvolz@deploy1003 helmfile [codfw] DONE helmfile.d/services/citoid: apply [09:58:57] (03PS5) 10Gkyziridis: ml-services: add liftwing-openapi-server deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297167 (https://phabricator.wikimedia.org/T427902) [09:59:15] (03CR) 10Filippo Giunchedi: [C:03+1] P:openstack: cloudweb_mcrouter: Migrate to firewall defines [puppet] - 10https://gerrit.wikimedia.org/r/1294946 (owner: 10Majavah) [09:59:43] (03CR) 10Filippo Giunchedi: [C:03+1] P:wmcs::kubeadm::etcd: Migrate to firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/1295905 (https://phabricator.wikimedia.org/T427799) (owner: 10Majavah) [10:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1000) [10:00:43] (03CR) 10Clément Goubert: [C:03+2] swift::proxy: Deploy shadow ratelimit [puppet] - 10https://gerrit.wikimedia.org/r/1295430 (https://phabricator.wikimedia.org/T414440) (owner: 10Clément Goubert) [10:01:35] (03CR) 10Majavah: [V:03+1 C:03+2] P:openstack: cloudweb_mcrouter: Migrate to firewall defines [puppet] - 10https://gerrit.wikimedia.org/r/1294946 (owner: 10Majavah) [10:05:36] (03CR) 10MVernon: [C:03+2] swift: move ms-be206[2,3] to new-style storage [puppet] - 10https://gerrit.wikimedia.org/r/1298666 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon) [10:06:35] !log ihurbain@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [10:07:17] !log ihurbain@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [10:07:18] !log ihurbain@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [10:07:59] !log ihurbain@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [10:09:31] !log mvernon@cumin2002 START - Cookbook sre.swift.convert-disks for host ms-be2062 [10:11:37] (03PS3) 10Kamila 
Součková: php8.3: Rebuild 8.3 image stack on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1295044 (https://phabricator.wikimedia.org/T418200) (owner: 10Scott French) [10:12:21] (03CR) 10Kamila Součková: [C:03+2] php8.3: Rebuild 8.3 image stack on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1295044 (https://phabricator.wikimedia.org/T418200) (owner: 10Scott French) [10:12:40] !log mvernon@cumin2002 START - Cookbook sre.swift.convert-disks for host ms-be2063 [10:12:53] jouncebot: nowandnext [10:12:54] For the next 0 hour(s) and 47 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1000) [10:12:54] In 2 hour(s) and 47 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1300) [10:13:34] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [10:13:54] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1042: Upgrading es1042.eqiad.wmnet [10:14:21] !log ihurbain@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [10:14:23] !log ihurbain@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [10:14:24] !log ihurbain@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [10:14:25] (03PS1) 10Ladsgroup: GuessedThumbnailInfo: Also allow showing webp originals [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298721 (https://phabricator.wikimedia.org/T428202) [10:14:28] !log ihurbain@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [10:15:14] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1042: Upgrading es1042.eqiad.wmnet [10:15:40] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 
10https://gerrit.wikimedia.org/r/1298721 (https://phabricator.wikimedia.org/T428202) (owner: 10Ladsgroup) [10:16:20] (03CR) 10Kamila Součková: [V:03+2 C:03+2] php8.3: Rebuild 8.3 image stack on bookworm [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1295044 (https://phabricator.wikimedia.org/T418200) (owner: 10Scott French) [10:16:42] !log ihurbain@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [10:16:45] !log ihurbain@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [10:16:46] !log ihurbain@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [10:16:49] !log ihurbain@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [10:18:02] !log ihurbain@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [10:18:05] !log ihurbain@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [10:18:06] !log ihurbain@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [10:18:09] !log ihurbain@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [10:18:14] marostegui@cumin1003 major-upgrade (PID 1894481) is awaiting input [10:18:59] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1042.eqiad.wmnet with OS trixie [10:19:06] (03PS3) 10Clément Goubert: ratelimit-media: policy and user-class level metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295457 (https://phabricator.wikimedia.org/T424051) [10:19:48] (03CR) 10CI reject: [V:04-1] GuessedThumbnailInfo: Also allow showing webp originals [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298721 (https://phabricator.wikimedia.org/T428202) (owner: 10Ladsgroup) [10:20:55] (03Merged) 10jenkins-bot: GuessedThumbnailInfo: Also allow showing webp originals [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298721 (https://phabricator.wikimedia.org/T428202) 
(owner: 10Ladsgroup) [10:22:19] (03PS1) 10Atsuko: kubernetes: dummy secret for opensearch CFSSL profile [labs/private] - 10https://gerrit.wikimedia.org/r/1298722 (https://phabricator.wikimedia.org/T427517) [10:23:25] (03CR) 10Atsuko: "check experimental" [labs/private] - 10https://gerrit.wikimedia.org/r/1298722 (https://phabricator.wikimedia.org/T427517) (owner: 10Atsuko) [10:25:18] (03PS1) 10Giuseppe Lavagetto: Remove api token definitions [labs/private] - 10https://gerrit.wikimedia.org/r/1298723 (https://phabricator.wikimedia.org/T428119) [10:25:31] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Remove api token definitions [labs/private] - 10https://gerrit.wikimedia.org/r/1298723 (https://phabricator.wikimedia.org/T428119) (owner: 10Giuseppe Lavagetto) [10:30:50] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1298721|GuessedThumbnailInfo: Also allow showing webp originals (T428202)]] [10:30:55] T428202: Wikipedia full-screen image view does not display the image - https://phabricator.wikimedia.org/T428202 [10:32:14] (03PS1) 10Marco Fossati: Add exception for main page [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298726 (https://phabricator.wikimedia.org/T421019) [10:32:50] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298726 (https://phabricator.wikimedia.org/T421019) (owner: 10Marco Fossati) [10:34:17] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1042.eqiad.wmnet with reason: host reimage [10:34:38] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1298721|GuessedThumbnailInfo: Also allow showing webp originals (T428202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[10:35:02] !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2160.codfw.wmnet with reason: Reboot [10:35:13] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2043: repool after upgrade [10:36:35] !log fceratto@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2160.codfw.wmnet [10:36:36] !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2160.codfw.wmnet [10:37:46] (03PS4) 10Clément Goubert: ratelimit-media: policy and user-class level metrics [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295457 (https://phabricator.wikimedia.org/T424051) [10:38:09] (03CR) 10Clément Goubert: ratelimit-media: policy and user-class level metrics (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1295457 (https://phabricator.wikimedia.org/T424051) (owner: 10Clément Goubert) [10:38:55] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply [10:39:15] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply [10:39:35] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment [10:39:51] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1042.eqiad.wmnet with reason: host reimage [10:43:48] (03CR) 10Clément Goubert: [C:03+1] ml-services: add liftwing-openapi-server deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297167 (https://phabricator.wikimedia.org/T427902) (owner: 10Gkyziridis) [10:47:31] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298721|GuessedThumbnailInfo: Also allow showing webp originals (T428202)]] (duration: 16m 41s) [10:47:35] T428202: Wikipedia full-screen image view does not display the image - https://phabricator.wikimedia.org/T428202 [10:48:52] (03PS1) 10Ladsgroup: SpecialMediaSearch: Prefer thumb steps over thumb limits 
[extensions/MediaSearch] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298728 (https://phabricator.wikimedia.org/T424032) [10:49:17] (03PS1) 10Kamila Součková: shellbox: switch to bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) [10:50:02] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] Enable wgNewUserMessageOnFirstEdit on commonswiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [10:51:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.93% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [10:51:29] (03CR) 10Ladsgroup: [C:04-1] "- The config is now renamed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [10:53:42] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "> The config is now renamed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [10:55:31] (03CR) 10Ladsgroup: [C:04-1] "> This change was already updated for that, with an extra Depends-On." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [10:56:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.9% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [10:56:49] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1042.eqiad.wmnet with OS trixie [10:57:52] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "> Yeah, that's my point. Right going for commons is a bit scary. Let's have it enabled in a small multilingual wiki (incubator maybe?) to " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [10:57:55] (03CR) 10Effie Mouzeli: "nit: please add in the commit comment exactly that we are using I92b0d173bca87777da77a8f040fd86886a6a6964" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) (owner: 10Kamila Součková) [10:58:25] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [10:58:35] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1042: repool after maintenance [10:58:57] mvernon@cumin2002 convert-disks (PID 1075602) is awaiting input [11:00:14] (03PS2) 10Kamila Součková: shellbox: switch to bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) [11:00:32] (03CR) 10Kamila Součková: "Done" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) (owner: 10Kamila Součková) [11:00:58] (03CR) 10Neriah: "I can do it on a smaller wiki. 
The reason I went with commons is that they're the ones who complained the most about the change..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [11:01:02] mvernon@cumin2002 convert-disks (PID 1076093) is awaiting input [11:01:09] (03CR) 10Atsuko: [V:03+2 C:03+2] kubernetes: dummy secret for opensearch CFSSL profile [labs/private] - 10https://gerrit.wikimedia.org/r/1298722 (https://phabricator.wikimedia.org/T427517) (owner: 10Atsuko) [11:02:16] !log mvernon@cumin2002 END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2063 [11:02:19] !log mvernon@cumin2002 END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2062 [11:05:39] (03CR) 10Effie Mouzeli: [C:03+1] shellbox: switch to bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) (owner: 10Kamila Součková) [11:05:53] (03CR) 10Kamila Součková: [C:03+2] shellbox: switch to bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) (owner: 10Kamila Součková) [11:07:25] (03CR) 10Cparle: [C:03+2] SpecialMediaSearch: Prefer thumb steps over thumb limits [extensions/MediaSearch] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298728 (https://phabricator.wikimedia.org/T424032) (owner: 10Ladsgroup) [11:08:35] (03Merged) 10jenkins-bot: shellbox: switch to bookworm [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298729 (https://phabricator.wikimedia.org/T427820) (owner: 10Kamila Součková) [11:08:39] (03Merged) 10jenkins-bot: SpecialMediaSearch: Prefer thumb steps over thumb limits [extensions/MediaSearch] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298728 (https://phabricator.wikimedia.org/T424032) (owner: 10Ladsgroup) [11:12:07] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [extensions/MediaSearch] (wmf/1.47.0-wmf.5) - 
10https://gerrit.wikimedia.org/r/1298728 (https://phabricator.wikimedia.org/T424032) (owner: 10Ladsgroup) [11:12:21] !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1298728|SpecialMediaSearch: Prefer thumb steps over thumb limits (T424032)]] [11:12:25] T424032: MediaSearch results does not use the standard thumbnail sizes - https://phabricator.wikimedia.org/T424032 [11:12:35] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply [11:13:50] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply [11:14:07] !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1298728|SpecialMediaSearch: Prefer thumb steps over thumb limits (T424032)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [11:14:25] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply [11:14:57] 10ops-codfw, 06SRE, 06DC-Ops, 07Wikimedia-Incident: 2022-12-15 codfw worker exhaustion - https://phabricator.wikimedia.org/T328353#11993448 (10hnowlan) [11:15:15] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [11:16:34] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [11:17:32] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1017.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [11:17:34] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2015.codfw.wmnet are marked down but pooled 
https://wikitech.wikimedia.org/wiki/PyBal [11:18:28] !log progressively switching shellbox to bookworm (start) [11:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:20:34] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1017.eqiad.wmnet, wdqs1018.eqiad.wmnet, wdqs1011.eqiad.wmnet, wdqs1013.eqiad.wmnet, wdqs1014.eqiad.wmnet, wdqs1021.eqiad.wmnet, wdqs1016.eqiad.wmnet, wdqs1020.eqiad.wmnet, wdqs1012.eqiad.wmnet, wdqs1022.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [11:21:32] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:21:34] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:25:34] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:25:43] !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment [11:26:34] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:28:46] (03PS1) 10Neriah: Enable wgNewUserMessageOnFirstEdit on incubatorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) [11:30:01] !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298728|SpecialMediaSearch: Prefer thumb steps over thumb limits (T424032)]] (duration: 17m 39s) [11:30:05] T424032: MediaSearch results does not use the standard thumbnail sizes - 
https://phabricator.wikimedia.org/T424032 [11:30:27] (03CR) 10Neriah: "Per https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1298654?tab=comments." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [11:33:07] (03PS2) 10Neriah: Enable wgNewUserMessageOnFirstEdit on incubatorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) [11:34:09] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [11:35:35] 10ops-eqiad, 06SRE, 06DC-Ops: C/D refresh Nokia switches Exhaust direction is reversed - https://phabricator.wikimedia.org/T428260#11993571 (10Jclark-ctr) Dell advised performing the same steps that had already been completed: a flea-power drain, firmware updates, and hardware diagnostic testing. I ran the... [11:36:12] 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops: db1274 is not booting up - https://phabricator.wikimedia.org/T428240#11993578 (10Jclark-ctr) Dell advised performing the same steps that had already been completed: a flea-power drain, firmware updates, and hardware diagnostic testing. I ran the diagnostics twice on Fr... [11:37:51] (03CR) 10Neriah: [C:04-1] "(I submitted a patch for incubator, I789695d76a7cd61e0054fb386f5214e741b2cb7c)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [11:39:10] 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops: db1274 is not booting up - https://phabricator.wikimedia.org/T428240#11993579 (10Marostegui) 05Open→03Resolved Thanks John - we can close this. 
[11:40:05] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: CPU1 thermal fault for wdqs1015.eqiad.wmnet - https://phabricator.wikimedia.org/T427852#11993595 (10Jclark-ctr) Discussed with @RKemper via IRC. He mentioned that we should decommission this one if the replacement is already here T423314 and is racked and cab... [11:42:50] !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2062.codfw.wmnet with OS bullseye [11:43:01] 06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11993622 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be2062.codfw.wmnet with OS bullseye [11:43:09] !log mvernon@cumin2002 START - Cookbook sre.hosts.move-vlan for host ms-be2062 [11:43:19] !log mvernon@cumin2002 START - Cookbook sre.dns.netbox [11:44:00] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: repool after maintenance [11:44:30] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [11:44:49] !log marostegui@cumin1003 END (ERROR) - Cookbook sre.mysql.major-upgrade (exit_code=97) [11:44:53] 10ops-codfw, 06SRE, 06DC-Ops: codfw: move public baremetal servers to per rack vlan - https://phabricator.wikimedia.org/T428060#11993627 (10ayounsi) @ssingh For the DNS servers, the ones peering with the core routers will have a higher priority (as-path) than the ones peering with the ToR switches. So if on... 
[11:44:55] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [11:45:15] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2041: Upgrading es2041.codfw.wmnet [11:45:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2041: Upgrading es2041.codfw.wmnet [11:47:21] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS trixie [11:48:43] 10ops-eqiad, 06SRE, 06DC-Ops: Alert for device ps1-d1-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T428361#11993634 (10Jclark-ctr) Rebalanced the PDU. I will leave the ticket open to monitor for any additional alerts [11:49:18] !log mvernon@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2062 - mvernon@cumin2002" [11:49:25] !log mvernon@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2062 - mvernon@cumin2002" [11:49:25] !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [11:49:26] !log mvernon@cumin2002 START - Cookbook sre.dns.wipe-cache ms-be2062.codfw.wmnet 123.0.192.10.in-addr.arpa 3.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [11:49:29] !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2062.codfw.wmnet 123.0.192.10.in-addr.arpa 3.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [11:49:30] !log mvernon@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ms-be2062 [11:50:20] !log mvernon@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2062 [11:50:20] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host 
ms-be2062 [11:50:45] !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2063.codfw.wmnet with OS bullseye [11:50:54] 06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11993644 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be2063.codfw.wmnet with OS bullseye [11:51:06] !log mvernon@cumin2002 START - Cookbook sre.hosts.move-vlan for host ms-be2063 [11:51:16] !log mvernon@cumin2002 START - Cookbook sre.dns.netbox [11:52:28] 06SRE, 06DBA, 07Incident Severity 3, 07Wikimedia-Incident: External store unreachable: "Database servers in clusterXX are overloaded" - https://phabricator.wikimedia.org/T422130#11993658 (10MLechvien-WMF) [11:54:00] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: Replacement top-of-rack switch for rack C1 - https://phabricator.wikimedia.org/T403031#11993667 (10ayounsi) [11:54:27] (03PS1) 10Jcrespo: bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) [11:54:31] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Eqiad: Replacement top-of-rack switch for rack C1 - https://phabricator.wikimedia.org/T403031#11993669 (10ayounsi) [11:54:58] 10ops-codfw, 06SRE, 06DC-Ops: codfw: move public baremetal servers to per rack vlan - https://phabricator.wikimedia.org/T428060#11993677 (10taavi) Cloudweb hosts are in an interesting state: T411783 proposes moving those to the cloud racks, while T392478 proposes getting rid of them entirely. 
[11:56:42] (03CR) 10CI reject: [V:04-1] bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo) [11:56:49] !log mvernon@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2063 - mvernon@cumin2002" [11:56:54] !log mvernon@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2063 - mvernon@cumin2002" [11:56:55] !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [11:56:55] !log mvernon@cumin2002 START - Cookbook sre.dns.wipe-cache ms-be2063.codfw.wmnet 52.16.192.10.in-addr.arpa 2.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [11:56:59] !log mvernon@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2063.codfw.wmnet 52.16.192.10.in-addr.arpa 2.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [11:57:00] !log mvernon@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ms-be2063 [11:57:14] !log mvernon@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2063 [11:57:14] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2063 [11:57:47] (03CR) 10Jcrespo: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo) [12:00:20] (03PS2) 10Jcrespo: bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) [12:00:24] !log cgoubert@deploy1003 helmfile [codfw] START helmfile.d/services/ratelimit: apply [12:00:28] (03CR) 10Jcrespo: 
"check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo) [12:00:37] !log cgoubert@deploy1003 helmfile [codfw] DONE helmfile.d/services/ratelimit: apply [12:00:39] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox [12:00:42] !log ayounsi@cumin1003 END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) [12:01:17] !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox [12:01:44] !log joal@deploy1003 Started deploy [analytics/refinery@d67c584] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d67c584f] [12:02:31] (03CR) 10CI reject: [V:04-1] bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo) [12:03:09] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2041.codfw.wmnet with reason: host reimage [12:03:44] !log joal@deploy1003 Finished deploy [analytics/refinery@d67c584] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d67c584f] (duration: 02m 00s) [12:06:14] !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add eqiad e8 public vlans - ayounsi@cumin1003" [12:06:18] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add eqiad e8 public vlans - ayounsi@cumin1003" [12:06:18] !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [12:06:59] (03CR) 10Ladsgroup: [C:03+1] Enable wgNewUserMessageOnFirstEdit on incubatorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [12:08:26] !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage [12:08:41] !log 
joal@deploy1003 Started deploy [analytics/refinery@d67c584]: Regular analytics weekly train [analytics/refinery@d67c584f] [12:09:30] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2041.codfw.wmnet with reason: host reimage [12:10:47] (03PS1) 10Filippo Giunchedi: icinga: remove toolschecker-based checks [puppet] - 10https://gerrit.wikimedia.org/r/1298742 (https://phabricator.wikimedia.org/T313030) [12:13:08] (03CR) 10CI reject: [V:04-1] icinga: remove toolschecker-based checks [puppet] - 10https://gerrit.wikimedia.org/r/1298742 (https://phabricator.wikimedia.org/T313030) (owner: 10Filippo Giunchedi) [12:13:24] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage [12:15:22] !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage [12:16:34] !log joal@deploy1003 Finished deploy [analytics/refinery@d67c584]: Regular analytics weekly train [analytics/refinery@d67c584f] (duration: 07m 52s) [12:16:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:17:03] (03PS1) 10Arnaudb: gitlab: add gitlab-ssh.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/1298744 (https://phabricator.wikimedia.org/T425441) [12:17:55] !log cgoubert@deploy1003 helmfile [codfw] START helmfile.d/services/ratelimit: apply [12:18:02] !log cgoubert@deploy1003 helmfile [codfw] DONE helmfile.d/services/ratelimit: apply [12:18:42] (03CR) 10Gkyziridis: [C:03+2] ml-services: add liftwing-openapi-server deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297167 (https://phabricator.wikimedia.org/T427902) (owner: 10Gkyziridis) 
[12:19:04] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage [12:19:33] !log joal@deploy1003 Started deploy [analytics/refinery@d67c584] (thin): Regular analytics weekly train THIN [analytics/refinery@d67c584f] [12:21:33] !log joal@deploy1003 Finished deploy [analytics/refinery@d67c584] (thin): Regular analytics weekly train THIN [analytics/refinery@d67c584f] (duration: 02m 00s) [12:21:55] (03CR) 10CI reject: [V:04-1] Localisation updates from https://translatewiki.net. [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1298746 (owner: 10L10n-bot) [12:22:05] jouncebot: nowandnext [12:22:06] No deployments scheduled for the next 0 hour(s) and 37 minute(s) [12:22:06] In 0 hour(s) and 37 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1300) [12:27:06] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2041.codfw.wmnet with OS trixie [12:27:58] (03PS1) 10Michael Große: feat(V2): toggle experiment features based on custom url override [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298758 (https://phabricator.wikimedia.org/T424646) [12:28:13] (03Merged) 10jenkins-bot: ml-services: add liftwing-openapi-server deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1297167 (https://phabricator.wikimedia.org/T427902) (owner: 10Gkyziridis) [12:30:33] marostegui@cumin1003 major-upgrade (PID 1957788) is awaiting input [12:30:40] (03PS1) 10Michael Große: specialCreateAccount: use GECreateAccountExperimentV2 instead of hook [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298762 (https://phabricator.wikimedia.org/T424646) [12:30:45] (03PS1) 10Aklapper: Remove Phabricator Diffusion as canonical repository source [puppet] - 10https://gerrit.wikimedia.org/r/1298763 
(https://phabricator.wikimedia.org/T405596) [12:31:11] (03PS1) 10Michael Große: fix: correctly read experiments param on Special:UserLogin [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298764 [12:31:31] (03PS1) 10Michael Große: signup.js: use JS var instead of TestKitchen to show experiment [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298765 (https://phabricator.wikimedia.org/T424646) [12:32:07] (03PS1) 10Michael Große: UsernamePolicyPopover: add instrumentation for links [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298766 (https://phabricator.wikimedia.org/T424246) [12:32:22] jouncebot: nowandnext [12:32:22] No deployments scheduled for the next 0 hour(s) and 27 minute(s) [12:32:22] In 0 hour(s) and 27 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1300) [12:32:42] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2062.codfw.wmnet with OS bullseye [12:32:48] (03PS1) 10Dreamy Jazz: Follow-up: Allow CaptchaConsequence to be skipped via hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298767 (https://phabricator.wikimedia.org/T427608) [12:32:48] 06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11993816 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be2062.codfw.wmnet with OS bullseye compl... [12:33:00] jnuche: Are you using scap? 
[12:33:01] (03CR) 10CI reject: [V:04-1] Remove Phabricator Diffusion as canonical repository source [puppet] - 10https://gerrit.wikimedia.org/r/1298763 (https://phabricator.wikimedia.org/T405596) (owner: 10Aklapper) [12:33:08] If not, I'd like to backport [12:33:24] Dreamy_Jazz: nope, you can go ahead [12:33:25] (03PS1) 10Giuseppe Lavagetto: hiddenparma: remove remaining references to api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1298768 [12:33:28] Thanks [12:35:50] (03CR) 10CI reject: [V:04-1] hiddenparma: remove remaining references to api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1298768 (owner: 10Giuseppe Lavagetto) [12:37:40] (03PS2) 10Giuseppe Lavagetto: hiddenparma: remove remaining references to api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1298768 [12:37:43] <_joe_> sith that was stupid [12:37:51] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298758 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [12:38:09] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298762 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [12:38:28] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298764 (owner: 10Michael Große) [12:38:40] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 
10https://gerrit.wikimedia.org/r/1298767 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz) [12:38:58] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298765 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [12:39:03] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298766 (https://phabricator.wikimedia.org/T424246) (owner: 10Michael Große) [12:40:08] (03CR) 10Giuseppe Lavagetto: [C:03+2] hiddenparma: remove remaining references to api tokens [puppet] - 10https://gerrit.wikimedia.org/r/1298768 (owner: 10Giuseppe Lavagetto) [12:40:21] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2063.codfw.wmnet with OS bullseye [12:40:35] 06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11993849 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be2063.codfw.wmnet with OS bullseye compl... [12:41:38] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [12:42:50] (03PS1) 10MVernon: swift: restore 2 nodes to rings [puppet] - 10https://gerrit.wikimedia.org/r/1298773 (https://phabricator.wikimedia.org/T354872) [12:43:04] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [12:43:29] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [12:44:26] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. 
[12:46:19] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [12:46:57] (03CR) 10Marostegui: [C:03+1] swift: restore 2 nodes to rings [puppet] - 10https://gerrit.wikimedia.org/r/1298773 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon) [12:47:26] !log dpogorzelski@deploy1003 helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [12:47:47] jouncebot: nowandnext [12:47:48] No deployments scheduled for the next 0 hour(s) and 12 minute(s) [12:47:48] In 0 hour(s) and 12 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1300) [12:49:16] !log dpogorzelski@deploy1003 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [12:49:23] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2041: repool after upgrade [12:49:27] !log dpogorzelski@deploy1003 helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [12:50:55] (03Merged) 10jenkins-bot: Follow-up: Allow CaptchaConsequence to be skipped via hook [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298767 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz) [12:51:14] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1298767|Follow-up: Allow CaptchaConsequence to be skipped via hook (T427608)]] [12:51:18] T427608: hCaptcha: Edits made via the API on WMF wikis that trigger AbuseFilter still require hCaptcha completion - https://phabricator.wikimedia.org/T427608 [12:51:21] !log dpogorzelski@deploy1003 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [12:52:59] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1298767|Follow-up: Allow CaptchaConsequence to be skipped via hook (T427608)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[12:53:24] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [12:54:38] !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [12:55:16] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [12:55:33] (03PS1) 10Giuseppe Lavagetto: profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 [12:55:36] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1041: Upgrading es1041.eqiad.wmnet [12:55:40] !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [12:55:48] !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [12:56:06] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1041: Upgrading es1041.eqiad.wmnet [12:56:47] !log dpogorzelski@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [12:57:02] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1041.eqiad.wmnet with OS trixie [12:57:33] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298767|Follow-up: Allow CaptchaConsequence to be skipped via hook (T427608)]] (duration: 06m 20s) [12:57:37] T427608: hCaptcha: Edits made via the API on WMF wikis that trigger AbuseFilter still require hCaptcha completion - https://phabricator.wikimedia.org/T427608 [12:57:40] (03CR) 10CI reject: [V:04-1] profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 (owner: 10Giuseppe Lavagetto) [12:58:21] (03CR) 10MVernon: [C:03+2] swift: restore 2 nodes to rings [puppet] - 10https://gerrit.wikimedia.org/r/1298773 (https://phabricator.wikimedia.org/T354872) (owner: 10MVernon) [12:59:25] 06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations, 13Patch-For-Review: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11993964 (10MatthewVernon) [12:59:41] 
06SRE, 10SRE-swift-storage, 06Infrastructure-Foundations, 13Patch-For-Review: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11993965 (10MatthewVernon) 05Open→03Resolved a:03MatthewVernon All done! And all codfw backends moved to new-style stor... [13:00:04] Lucas_WMDE, urbanecm, and TheresNoTime: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1300). [13:00:04] Neriah, Lucas_WMDE, yerdua_wmde, and MichaelG_WMF: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:12] hi [13:00:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.66% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:00:29] hey hey [13:00:43] hi [13:00:59] (03PS3) 10Jcrespo: bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) [13:01:26] (Sorry for adding such a big backport, those changes were waiting for one of them, and it can only be tested at enwiki) [13:01:54] o/ [13:01:57] I can deploy! 
[13:02:25] let’s see how far we get [13:02:36] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz) [13:02:40] ❤️ [13:02:54] (03PS2) 10Giuseppe Lavagetto: profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 [13:03:14] who is first? [13:03:27] you are ^^ [13:03:37] oh wait [13:04:03] I clicked through scap without reading the message at first [13:04:10] “Change(s) 1298418, 1298717 touch l10n-related files and are likely to trigger a large l10n rebuild, resulting in a slow deployment (~20 minutes)” [13:04:11] why? [13:04:23] Any .json file modification causes the warning I've found [13:04:24] does extension.json count as l10n-related? o_O [13:04:25] RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:04:35] It doesn't seem to affect the run time of the job though [13:04:44] (because it could technically change the $wgMessagesDirs? [13:04:47] okay [13:04:50] then let’s do it anyway, thanks [13:04:52] Like I've seen the warning for extension.json and it's ran in the normal time [13:04:55] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:04:56] I suspect it's a bug? 
[13:04:56] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298717 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:04:56] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:05:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.66% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:05:24] (03CR) 10CI reject: [V:04-1] profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 (owner: 10Giuseppe Lavagetto) [13:05:35] I think it’s probably because scap can only look at the diffstat, and the extension.json change has the *potential* to require a l10n cache rebuild (because it could change $wgMessagesDirs) [13:05:41] (03CR) 10Ladsgroup: [C:03+1] profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 (owner: 10Giuseppe Lavagetto) [13:05:49] whereas the actual l10n cache rebuild later can see that it didn’t change, and finish quickly (0 languages rebuilt) [13:05:52] let’s hope for the best [13:06:20] (some of the later backports definitely will require a l10n cache rebuild, so let’s hope we don’t have two separate ones in one window) [13:07:02] (03PS3) 10Giuseppe Lavagetto: profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 [13:07:57] (03Merged) 10jenkins-bot: NewUserMessage: Add 
$wgNewUserMessageOnAutoCreateFirstEdit [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298418 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:07:59] (03Merged) 10jenkins-bot: Replace NewUserMessageOnAutoCreateFirstEdit with wgNewUserMessageOnFirstEdit [extensions/NewUserMessage] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298717 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:08:02] (03CR) 10Filippo Giunchedi: [C:03+1] profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 (owner: 10Giuseppe Lavagetto) [13:08:21] (03CR) 10Volans: [C:04-1] "Too bad an AI agent will never see it, as they run non-interactive shells usually that will not trigger MOTD ;)" [puppet] - 10https://gerrit.wikimedia.org/r/1298775 (owner: 10Giuseppe Lavagetto) [13:08:33] PROBLEM - HTTPS non-canonical-redirect-10 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-39 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-20 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-29 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-6 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-17 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL 
handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-33 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:41] PROBLEM - HTTPS non-canonical-redirect-28 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:42] PROBLEM - HTTPS non-canonical-redirect-23 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:42] PROBLEM - HTTPS non-canonical-redirect-35 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:43] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "poke zuul?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:08:43] PROBLEM - HTTPS non-canonical-redirect-24 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:43] PROBLEM - HTTPS non-canonical-redirect-27 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:44] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:08:44] PROBLEM - HTTPS non-canonical-redirect-21 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:09:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.58% 
idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:09:18] (03CR) 10CI reject: [V:04-1] profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 (owner: 10Giuseppe Lavagetto) [13:09:33] RECOVERY - HTTPS non-canonical-redirect-10 on ncredir3006 is OK: SSL OK - Certificate wikipediya.org valid until 2026-07-30 14:39:31 +0000 (expires in 52 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:09:45] PROBLEM - HTTPS non-canonical-redirect-36 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:09:45] PROBLEM - HTTPS non-canonical-redirect-12 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:10:11] what’s going on with ncredir3006? 
[13:10:16] (03Merged) 10jenkins-bot: Enable wgNewUserMessageOnFirstEdit on incubatorwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298734 (https://phabricator.wikimedia.org/T426206) (owner: 10Neriah) [13:10:17] FIRING: ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [13:10:38] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1298418|NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit (T426206)]], [[gerrit:1298717|Replace NewUserMessageOnAutoCreateFirstEdit with wgNewUserMessageOnFirstEdit (T426206)]], [[gerrit:1298734|Enable wgNewUserMessageOnFirstEdit on incubatorwiki (T426206)]] [13:10:41] RECOVERY - HTTPS non-canonical-redirect-35 on ncredir3006 is OK: SSL OK - Certificate wikipdeia.org valid until 2026-07-18 18:57:14 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:41] RECOVERY - HTTPS non-canonical-redirect-33 on ncredir3006 is OK: SSL OK - Certificate wikiwpedia.com valid until 2026-07-18 17:57:10 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:41] RECOVERY - HTTPS non-canonical-redirect-17 on ncredir3006 is OK: SSL OK - Certificate wikipediaparticlecreation.com valid until 2026-08-26 03:22:02 +0000 (expires in 78 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:41] RECOVERY - HTTPS non-canonical-redirect-6 on ncredir3006 is OK: SSL OK - Certificate wikipedia.fi valid until 2026-08-21 00:13:57 +0000 (expires in 73 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:41] RECOVERY - HTTPS non-canonical-redirect-28 on ncredir3006 is OK: SSL OK - Certificate wikimedia.li valid until 2026-07-17 19:58:58 +0000 (expires in 39 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:41] RECOVERY - 
HTTPS non-canonical-redirect-20 on ncredir3006 is OK: SSL OK - Certificate wikidestination.org valid until 2026-07-15 18:55:00 +0000 (expires in 37 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:41] RECOVERY - HTTPS non-canonical-redirect-39 on ncredir3006 is OK: SSL OK - Certificate wikipedia.ie valid until 2026-08-24 08:14:52 +0000 (expires in 76 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:42] T426206: Per global RfC, only welcome users on Wikimedia projects where they created their account or have edited - https://phabricator.wikimedia.org/T426206 [13:10:42] RECOVERY - HTTPS non-canonical-redirect-29 on ncredir3006 is OK: SSL OK - Certificate wikimediacommons.com valid until 2026-07-18 14:57:06 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:42] RECOVERY - HTTPS non-canonical-redirect-23 on ncredir3006 is OK: SSL OK - Certificate wikipedianet.work valid until 2026-07-16 15:55:52 +0000 (expires in 38 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:43] no spikes on https://www.wikimediastatus.net/ yet… [13:10:43] RECOVERY - HTTPS non-canonical-redirect-21 on ncredir3006 is OK: SSL OK - Certificate wikipedia.org.tr valid until 2026-07-15 21:55:11 +0000 (expires in 37 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:43] RECOVERY - HTTPS non-canonical-redirect-24 on ncredir3006 is OK: SSL OK - Certificate wikiversity.us valid until 2026-07-17 15:58:57 +0000 (expires in 39 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:44] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir3006 is OK: SSL OK - Certificate 2wikipedia.com valid until 2026-07-04 04:49:50 +0000 (expires in 25 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:44] RECOVERY - HTTPS non-canonical-redirect-27 on ncredir3006 is OK: SSL OK - Certificate wiktionary.ee valid until 2026-07-17 19:56:54 +0000 (expires in 39 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:45] RECOVERY - HTTPS non-canonical-redirect-36 on 
ncredir3006 is OK: SSL OK - Certificate wikipediamovement.com valid until 2026-07-18 19:57:16 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:10:45] RECOVERY - HTTPS non-canonical-redirect-12 on ncredir3006 is OK: SSL OK - Certificate wikiedia.org valid until 2026-08-12 12:04:29 +0000 (expires in 64 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:11:10] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply [13:11:36] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [13:11:59] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1041.eqiad.wmnet with reason: host reimage [13:12:04] “0 languages rebuilt out of 549” yay [13:12:11] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply [13:12:23] looks like the alerts recovered, so I’ll continue deploying unless someone shouts [13:12:23] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, neriah: Backport for [[gerrit:1298418|NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit (T426206)]], [[gerrit:1298717|Replace NewUserMessageOnAutoCreateFirstEdit with wgNewUserMessageOnFirstEdit (T426206)]], [[gerrit:1298734|Enable wgNewUserMessageOnFirstEdit on incubatorwiki (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki [13:12:23] /Mwdebug). Changes can now be verified there. [13:12:38] Neriah: anything to test on mwdebug for this change? [13:12:44] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply [13:12:49] (03PS4) 10Giuseppe Lavagetto: profile::base::production: add motd to aid LLM agents [puppet] - 10https://gerrit.wikimedia.org/r/1298775 [13:13:14] I think just making sure it works well [13:13:36] okay, are you testing it? 
[13:13:46] ya [13:14:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.23% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:14:23] thanks [13:14:27] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298758 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:14:31] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298762 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:14:34] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298764 (owner: 10Michael Große) [13:14:37] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298765 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:14:40] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298766 (https://phabricator.wikimedia.org/T424246) (owner: 10Michael Große) [13:14:53] I’m preparing for MichaelG_WMF in the meantime [13:15:13] 06SRE, 06DBA, 07Incident Severity 3, 07Wikimedia-Incident: External store unreachable: "Database servers in clusterXX are overloaded" - https://phabricator.wikimedia.org/T422130#11994071 (10MLechvien-WMF) 05Open→03Resolved a:03MLechvien-WMF [13:15:17] RESOLVED: 
ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [13:15:24] thank you 🙏 [13:15:42] !incidents [13:15:43] 8063 (RESOLVED) PHPFPMTooBusy sre (mw-web main eqiad) [13:15:43] 8062 (RESOLVED) PHPFPMTooBusy sre (mw-web main eqiad) [13:15:43] 8061 (RESOLVED) PHPFPMTooBusy sre (mw-web main eqiad) [13:15:43] 8060 (RESOLVED) PHPFPMTooBusy sre (mw-web main eqiad) [13:15:50] All resolved, good [13:17:18] good [13:17:22] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, neriah: Continuing with deployment [13:17:24] ok, thanks! [13:18:40] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1041.eqiad.wmnet with reason: host reimage [13:18:42] (03CR) 10Elukey: [C:03+1] config: type config_file as PathLike[str] [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298541 (owner: 10Volans) [13:19:48] (03CR) 10Elukey: [C:03+1] decorators: fix dynamic callbacks bug in retry [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298656 (owner: 10Volans) [13:20:33] (03CR) 10Elukey: [C:03+1] config: raise on missing INI file when raises=True [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298657 (owner: 10Volans) [13:20:59] (03PS1) 10Clément Goubert: service::catalog: Add liftwing-openapi-server [puppet] - 10https://gerrit.wikimedia.org/r/1298779 (https://phabricator.wikimedia.org/T427902) [13:21:11] anyone happen to know what /srv/parsoid-testing is? 
(seen in T428452) [13:21:11] T428452: Error: Call to a member function getId() on null - https://phabricator.wikimedia.org/T428452 [13:21:11] (03CR) 10Elukey: [C:03+1] __init__: fail clearly when unknown __version__ [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298658 (owner: 10Volans) [13:21:33] (03PS2) 10Clément Goubert: service::catalog: Add liftwing-openapi-server [puppet] - 10https://gerrit.wikimedia.org/r/1298779 (https://phabricator.wikimedia.org/T427902) [13:21:35] (03CR) 10Clément Goubert: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1298779 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [13:21:39] (03CR) 10Elukey: [C:03+1] phabricator: reject trailing newline in task ID [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298659 (owner: 10Volans) [13:21:40] (03Abandoned) 10Jgiannelos: wikifeeds: remove rest-gateway references [deployment-charts] - 10https://gerrit.wikimedia.org/r/1156416 (https://phabricator.wikimedia.org/T367418) (owner: 10Hnowlan) [13:21:44] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298418|NewUserMessage: Add $wgNewUserMessageOnAutoCreateFirstEdit (T426206)]], [[gerrit:1298717|Replace NewUserMessageOnAutoCreateFirstEdit with wgNewUserMessageOnFirstEdit (T426206)]], [[gerrit:1298734|Enable wgNewUserMessageOnFirstEdit on incubatorwiki (T426206)]] (duration: 11m 06s) [13:21:48] T426206: Per global RfC, only welcome users on Wikimedia projects where they created their account or have edited - https://phabricator.wikimedia.org/T426206 [13:21:52] (03CR) 10Ssingh: [C:03+2] admin: update SSH key for tchanders [puppet] - 10https://gerrit.wikimedia.org/r/1298282 (owner: 10Ssingh) [13:21:55] (03CR) 10Gkyziridis: [C:03+1] "Thnx for fixing this!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1298779 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [13:22:00] (03CR) 10Elukey: [C:03+1] dns: resolve() instead of deprecated query() [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298660 (owner: 10Volans) [13:22:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.26% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:22:22] (03CR) 10Elukey: [C:03+1] actions: fix ActionsDict docstring example output [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298661 (owner: 10Volans) [13:22:27] “Change(s) 1298762 touch l10n-related files and are likely to trigger a large l10n rebuild, resulting in a slow deployment (~20 minutes)” [13:22:35] Thank you Lucas_WMDE! [13:22:38] extension.json again [13:22:39] Neriah: np :) [13:22:40] (03CR) 10Elukey: [C:03+1] interactive: fix ask_input Returns docstring [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298662 (owner: 10Volans) [13:22:47] Lucas_WMDE: parsoid rt testing maybe? 
[13:22:49] (03Abandoned) 10Jgiannelos: Configure stream for parser cache change events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1170174 (https://phabricator.wikimedia.org/T397072) (owner: 10Jgiannelos) [13:22:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298758 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:22:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298762 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:22:54] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298764 (owner: 10Michael Große) [13:22:56] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298765 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:23:00] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298766 (https://phabricator.wikimedia.org/T424246) (owner: 10Michael Große) [13:23:07] (03CR) 10Elukey: [C:03+1] interactive: improve error message with validators [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298663 (owner: 10Volans) [13:23:22] (03CR) 10Elukey: [C:03+1] irc: set the handler level via setLevel() [software/pywmflib] - 10https://gerrit.wikimedia.org/r/1298664 (owner: 10Volans) [13:23:28] (03CR) 10Ssingh: "Yes, try profile::cache::varnish::frontend::fe_vcl_config::csp_header." 
[puppet] - 10https://gerrit.wikimedia.org/r/1297769 (owner: 10CDobbins) [13:23:41] claime: I guess so, it seems to be mentioned a few times at https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing [13:25:19] (03CR) 10Filippo Giunchedi: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1298742 (https://phabricator.wikimedia.org/T313030) (owner: 10Filippo Giunchedi) [13:26:26] waiting for CI on those backports… [13:26:28] (03CR) 10Clément Goubert: [C:03+2] service::catalog: Add liftwing-openapi-server [puppet] - 10https://gerrit.wikimedia.org/r/1298779 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [13:26:33] (03PS5) 10Neriah: Enable wgNewUserMessageOnFirstEdit on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298654 (https://phabricator.wikimedia.org/T426206) [13:26:34] (03PS2) 10Tiziano Fogli: slothslos/report2drive: add modules [puppet] - 10https://gerrit.wikimedia.org/r/1298294 (https://phabricator.wikimedia.org/T425795) [13:26:39] (03PS4) 10Tiziano Fogli: slothslos/report2drive: add profiles [puppet] - 10https://gerrit.wikimedia.org/r/1298295 (https://phabricator.wikimedia.org/T425795) [13:26:40] PROBLEM - HTTPS non-canonical-redirect-29 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:40] PROBLEM - HTTPS non-canonical-redirect-35 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:40] PROBLEM - HTTPS non-canonical-redirect-20 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:42] PROBLEM - HTTPS non-canonical-redirect-33 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:42] PROBLEM - HTTPS 
non-canonical-redirect-39 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:42] PROBLEM - HTTPS non-canonical-redirect-13 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:42] PROBLEM - HTTPS non-canonical-redirect-21 on ncredir3006 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/Ncredir [13:26:44] (03PS4) 10Tiziano Fogli: slothslos/report2drive: instantiate resources [puppet] - 10https://gerrit.wikimedia.org/r/1298296 (https://phabricator.wikimedia.org/T425795) [13:26:50] (03PS4) 10Tiziano Fogli: slothslos/report2drive: add Hiera configuration [puppet] - 10https://gerrit.wikimedia.org/r/1298297 (https://phabricator.wikimedia.org/T425795) [13:26:55] (03PS4) 10Tiziano Fogli: slothslos/report2drive: enable deep merge for vars [puppet] - 10https://gerrit.wikimedia.org/r/1298298 (https://phabricator.wikimedia.org/T425795) [13:26:55] ncredir3006 getting noisy again [13:27:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.83% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:27:22] (03PS1) 10Urbanecm: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298785 (https://phabricator.wikimedia.org/T420255) [13:27:24] (03Merged) 10jenkins-bot: feat(V2): toggle experiment features based on custom url override [extensions/GrowthExperiments] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298758 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael 
Große) [13:27:26] (03Merged) 10jenkins-bot: specialCreateAccount: use GECreateAccountExperimentV2 instead of hook [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298762 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:27:28] (03Merged) 10jenkins-bot: fix: correctly read experiments param on Special:UserLogin [extensions/WikimediaEvents] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298764 (owner: 10Michael Große) [13:27:36] :O https://integration.wikimedia.org/ci/job/quibble-with-gated-extensions-vendor-mysql-php83/39287/console had castor-save-workspace-cache waiting, started and completed within one second [13:27:37] jouncebot: nowandnext [13:27:37] For the next 0 hour(s) and 32 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1300) [13:27:37] In 1 hour(s) and 2 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1430) [13:27:39] I didn’t know that was possible!! 
[13:27:42] RECOVERY - HTTPS non-canonical-redirect-20 on ncredir3006 is OK: SSL OK - Certificate wikidestination.org valid until 2026-07-15 18:55:00 +0000 (expires in 37 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:42] RECOVERY - HTTPS non-canonical-redirect-29 on ncredir3006 is OK: SSL OK - Certificate wikimediacommons.com valid until 2026-07-18 14:57:06 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:42] RECOVERY - HTTPS non-canonical-redirect-35 on ncredir3006 is OK: SSL OK - Certificate wikipdeia.org valid until 2026-07-18 18:57:14 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:42] RECOVERY - HTTPS non-canonical-redirect-33 on ncredir3006 is OK: SSL OK - Certificate wikiwpedia.com valid until 2026-07-18 17:57:10 +0000 (expires in 40 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:42] RECOVERY - HTTPS non-canonical-redirect-39 on ncredir3006 is OK: SSL OK - Certificate wikipedia.ie valid until 2026-08-24 08:14:52 +0000 (expires in 76 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:42] RECOVERY - HTTPS non-canonical-redirect-13 on ncredir3006 is OK: SSL OK - Certificate 2wikipedia.com valid until 2026-07-04 04:49:50 +0000 (expires in 25 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:42] RECOVERY - HTTPS non-canonical-redirect-21 on ncredir3006 is OK: SSL OK - Certificate wikipedia.org.tr valid until 2026-07-15 21:55:11 +0000 (expires in 37 days) https://wikitech.wikimedia.org/wiki/Ncredir [13:27:57] (03CR) 10Urbanecm: [C:03+2] linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298785 (https://phabricator.wikimedia.org/T420255) (owner: 10Urbanecm) [13:28:26] (03PS5) 10Arnaudb: gitlab: support extra ssh host_aliases [puppet] - 10https://gerrit.wikimedia.org/r/1298771 (https://phabricator.wikimedia.org/T425441) [13:28:26] (03CR) 10Arnaudb: "This change is followed up by 1298781" [puppet] - 10https://gerrit.wikimedia.org/r/1298771 
(https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb) [13:28:30] (03Merged) 10jenkins-bot: signup.js: use JS var instead of TestKitchen to show experiment [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298765 (https://phabricator.wikimedia.org/T424646) (owner: 10Michael Große) [13:28:38] (03Merged) 10jenkins-bot: UsernamePolicyPopover: add instrumentation for links [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298766 (https://phabricator.wikimedia.org/T424246) (owner: 10Michael Große) [13:28:55] (03PS1) 10Arnaudb: gitlab: advertise gitlab-ssh.wikimedia.org in UI clone URLs [puppet] - 10https://gerrit.wikimedia.org/r/1298781 (https://phabricator.wikimedia.org/T425441) [13:29:02] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1298758|feat(V2): toggle experiment features based on custom url override (T424646)]], [[gerrit:1298762|specialCreateAccount: use GECreateAccountExperimentV2 instead of hook (T424646)]], [[gerrit:1298764|fix: correctly read experiments param on Special:UserLogin]], [[gerrit:1298765|signup.js: use JS var instead of TestKitchen to show expe [13:29:02] riment (T424646)]], [[gerrit:1298766|UsernamePolicyPopover: add instrumentation for links (T424246)]] [13:29:06] T424646: Prepare V2 experiment for improved mobile account creation form (builds on V1) - https://phabricator.wikimedia.org/T424646 [13:29:06] T424246: Instrument policy popover links - https://phabricator.wikimedia.org/T424246 [13:29:19] “0 languages rebuilt out of 549” phew [13:30:18] (03Merged) 10jenkins-bot: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298785 (https://phabricator.wikimedia.org/T420255) (owner: 10Urbanecm) [13:30:47] !log lucaswerkmeister-wmde@deploy1003 migr, lucaswerkmeister-wmde: Backport for [[gerrit:1298758|feat(V2): toggle experiment features based on custom url override (T424646)]], [[gerrit:1298762|specialCreateAccount: use 
GECreateAccountExperimentV2 instead of hook (T424646)]], [[gerrit:1298764|fix: correctly read experiments param on Special:UserLogin]], [[gerrit:1298765|signup.js: use JS var instead of TestKitchen to show [13:30:48] experiment (T424646)]], [[gerrit:1298766|UsernamePolicyPopover: add instrumentation for links (T424246)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:30:54] MichaelG_WMF: please test :) [13:31:01] Lucas_WMDE: will do! [13:31:11] Dreamy_Jazz: I’m a bit confused why https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1296550 is only checking wbsetclaim, there are lots of Wikibase API modules [13:31:25] I guess mostly it’s because this is a Commons/SDC problem, where only a limited number of APIs are used? [13:31:27] It's the one we saw the issue on prod [13:31:30] Yes [13:31:41] Also because it's limited to Wikimedia Commons [13:31:51] If we find other places where it's broken we would update that [13:31:56] RECOVERY - Confd vcl based reload on cp6014 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [13:32:02] I suspect wbsetlabel might be useful though [13:32:04] (that’s used for captions) [13:32:37] (I just tested it at https://commons.wikimedia.org/w/index.php?title=File:PNG_Test.png&diff=prev&oldid=1227902574) [13:32:59] !log urbanecm@deploy1003 helmfile [staging] START helmfile.d/services/linkrecommendation: apply [13:33:15] It doesn't trigger the issue for me [13:33:26] ok, if you say so [13:33:39] b/c it's when an AbuseFilter asks for a CAPTCHA [13:33:40] (me neither but idk how it decides who gets capchaed and who doesn’t) [13:33:42] RECOVERY - Confd vcl based reload on cp6009 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [13:33:42] ah ok [13:33:48] The only case we saw was when adding external URLs [13:33:56] RECOVERY - Confd vcl based reload on cp6010 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [13:33:56] Which can't be done in the caption AFAICS [13:34:08] !log urbanecm@deploy1003 helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply [13:34:14] @Lucas_WMDE Tested and it looks good to me 👍. Ready to move forward from my side. [13:34:17] !log lucaswerkmeister-wmde@deploy1003 migr, lucaswerkmeister-wmde: Continuing with deployment [13:34:18] thanks! [13:34:47] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2041: repool after upgrade [13:34:51] !log urbanecm@deploy1003 helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply [13:35:01] urbanecm: did you want to scap something btw? [13:35:12] or just helmfile? [13:35:17] Lucas_WMDE: just helmfile [13:35:20] ok [13:35:37] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1041.eqiad.wmnet with OS trixie [13:36:46] !log urbanecm@deploy1003 helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply [13:37:11] !log urbanecm@deploy1003 helmfile [codfw] START helmfile.d/services/linkrecommendation: apply [13:37:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.78% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:37:56] PROBLEM - Confd vcl based reload on cp6014 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [13:38:01] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [13:38:17] !log gkyziridis@deploy1003 helmfile [ml-staging-codfw] 'sync' command on namespace 'liftwing-openapi-server' for release 'main' . [13:38:19] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1041: repool after maintenance [13:38:27] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298758|feat(V2): toggle experiment features based on custom url override (T424646)]], [[gerrit:1298762|specialCreateAccount: use GECreateAccountExperimentV2 instead of hook (T424646)]], [[gerrit:1298764|fix: correctly read experiments param on Special:UserLogin]], [[gerrit:1298765|signup.js: use JS var instead of TestKitchen to show exp [13:38:27] eriment (T424646)]], [[gerrit:1298766|UsernamePolicyPopover: add instrumentation for links (T424246)]] (duration: 09m 24s) [13:38:29] Dreamy_Jazz: over to you, please let me know when you’re done so I can do the last backports + config changes :) [13:38:29] (which will take ages because l10n cache rebuild) [13:38:30] T424646: Prepare V2 experiment for improved mobile account creation form (builds on V1) - https://phabricator.wikimedia.org/T424646 [13:38:31] T424246: Instrument policy popover links - https://phabricator.wikimedia.org/T424246 [13:38:35] Sure [13:39:26] !log urbanecm@deploy1003 helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply [13:40:06] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz) [13:40:56] RECOVERY - Confd vcl based reload on cp6014 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [13:41:29] (03Merged) 10jenkins-bot: hCaptcha: Don't show AbuseFilter CAPTCHA for wbsetclaim API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1296550 (https://phabricator.wikimedia.org/T427608) (owner: 10Dreamy Jazz) [13:41:44] (03CR) 10Clément Goubert: [C:03+2] dns: Add liftwing-openapi-server CNAME records [dns] - 10https://gerrit.wikimedia.org/r/1297710 (https://phabricator.wikimedia.org/T427902) (owner: 10Gkyziridis) [13:41:47] !log dreamyjazz@deploy1003 Started scap sync-world: Backport for [[gerrit:1296550|hCaptcha: Don't show AbuseFilter CAPTCHA for wbsetclaim API (T427608)]] [13:41:52] T427608: hCaptcha: Edits made via the API on WMF wikis that trigger AbuseFilter still require hCaptcha completion - https://phabricator.wikimedia.org/T427608 [13:41:59] !log cgoubert@dns1004 START - running authdns-update [13:43:34] !log dreamyjazz@deploy1003 dreamyjazz: Backport for [[gerrit:1296550|hCaptcha: Don't show AbuseFilter CAPTCHA for wbsetclaim API (T427608)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:43:59] !log cgoubert@dns1004 END - running authdns-update [13:45:22] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/Wikidata.org] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298709 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [13:45:27] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/Wikibase] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298710 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [13:46:04] !log dreamyjazz@deploy1003 dreamyjazz: Continuing with deployment [13:48:32] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. 
[13:48:56] RECOVERY - Confd vcl based reload on cp6011 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [13:50:19] !log dreamyjazz@deploy1003 Finished scap sync-world: Backport for [[gerrit:1296550|hCaptcha: Don't show AbuseFilter CAPTCHA for wbsetclaim API (T427608)]] (duration: 08m 31s) [13:50:23] T427608: hCaptcha: Edits made via the API on WMF wikis that trigger AbuseFilter still require hCaptcha completion - https://phabricator.wikimedia.org/T427608 [13:50:24] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [13:50:37] Lucas_WMDE: Over to you [13:50:44] thanks! [13:50:52] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/Wikidata.org] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298709 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [13:50:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/Wikibase] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298710 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [13:50:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297644 (https://phabricator.wikimedia.org/T427804) (owner: 10Audrey Penven) [13:50:53] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [13:51:44] note: i'm currently assuming that the logspam coming from our rt-testing (T428452) can be filtered out and is not impairing current production/deployment operations - let us know if we should be more proactive in squashing it (I *think* we'd rather have it run its course - it's another few hours, but since we're probably not going to deploy this anyway... 
it can probably be halted if that's necessary) [13:51:45] T428452: Error: Call to a member function getId() on null - https://phabricator.wikimedia.org/T428452 [13:52:06] ihurbain: if it’s only a few more hours then IMHO it’s okay [13:52:46] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [13:53:18] (03Abandoned) 10JHathaway: WIP: rowlf-pp [puppet] - 10https://gerrit.wikimedia.org/r/1295078 (owner: 10JHathaway) [13:53:23] (03Abandoned) 10JHathaway: WIP: Puppet 8 legacy fact removal [puppet] - 10https://gerrit.wikimedia.org/r/1282364 (owner: 10JHathaway) [13:53:26] (03Merged) 10jenkins-bot: Add translatable messages for WikiProject names [extensions/Wikidata.org] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298709 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [13:53:33] (03Abandoned) 10JHathaway: puppet8: migrate "easy" legacy puppet facts to structured facts [puppet] - 10https://gerrit.wikimedia.org/r/1074239 (owner: 10JHathaway) [13:54:09] (03CR) 10JHathaway: profile::postfix::mx: Mark the SMTP port as intentionally open (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1283043 (https://phabricator.wikimedia.org/T149804) (owner: 10Muehlenhoff) [13:54:22] (03CR) 10Giuseppe Lavagetto: [C:03+1] P:cache:haproxy add image generator information [puppet] - 10https://gerrit.wikimedia.org/r/1295921 (https://phabricator.wikimedia.org/T414338) (owner: 10Slyngshede) [13:54:28] !log dpogorzelski@deploy1003 helmfile [ml-staging-codfw] 'sync' command on namespace 'liftwing-openapi-server' for release 'main' . [13:54:56] RECOVERY - Confd vcl based reload on cp6012 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [13:55:22] (03PS3) 10Federico Ceratto: sre.mysql: add local ruff.toml [cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 [13:55:26] (03PS5) 10Atsuko: admin_ng/dse-k8s: create opensearch ClusterIssuer [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298327 (https://phabricator.wikimedia.org/T427517) [13:56:32] !log cgoubert@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [13:57:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.24% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:57:19] !log cgoubert@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [13:57:23] would there be time for https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1298745 as well now? [13:57:39] !log cgoubert@deploy1003 helmfile [ml-staging-codfw] 'sync' command on namespace 'liftwing-openapi-server' for release 'main' . [13:57:47] Neriah33: I doubt it, sorry :/ [13:57:57] I started another deploy that includes message changes and so will take ages to run [13:58:03] (that’s currently waiting for CI) [13:58:13] np [13:58:54] (03Merged) 10jenkins-bot: Use translatable messages for WikiProject links [extensions/Wikibase] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298710 (https://phabricator.wikimedia.org/T427804) (owner: 10Lucas Werkmeister (WMDE)) [13:58:56] RECOVERY - Confd vcl based reload on cp6016 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [13:59:01] i just noticed that this patch got merged, and it fixes an issue that had been bothering me :) [13:59:06] (03PS1) 10Clément Goubert: admin_ng: Disable istio injection for openapi-server [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298794 (https://phabricator.wikimedia.org/T427902) [13:59:11] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "poke zuul" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297644 (https://phabricator.wikimedia.org/T427804) (owner: 10Audrey Penven) [13:59:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 19.87% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [13:59:43] (03CR) 10Elukey: [C:03+1] "I think it is good to a first real test! Really nice job :)" [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300) (owner: 10Ayounsi) [14:00:05] (03Merged) 10jenkins-bot: WikiProject links - remove 'text' config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297644 (https://phabricator.wikimedia.org/T427804) (owner: 10Audrey Penven) [14:00:29] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1298709|Add translatable messages for WikiProject names (T427804)]], [[gerrit:1298710|Use translatable messages for WikiProject links (T427804)]], [[gerrit:1297644|WikiProject links - remove 'text' config (T427804)]] [14:01:16] !log cgoubert@deploy1003 helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [14:01:36] 10ops-codfw, 06SRE, 06DC-Ops: codfw: move public baremetal servers to per rack vlan - https://phabricator.wikimedia.org/T428060#11994303 (10ssingh) >>! 
In T428060#11993627, @ayounsi wrote: > > @ssingh For the DNS servers, the ones peering with the core routers will have a higher priority (as-path) than the... [14:02:05] !log cgoubert@deploy1003 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [14:02:39] jouncebot: now [14:02:39] No deployments scheduled for the next 0 hour(s) and 27 minute(s) [14:02:42] * Lucas_WMDE is still deploying btw [14:02:50] (03PS1) 10Bartosz Wójtowicz: ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298796 (https://phabricator.wikimedia.org/T428127) [14:02:51] !log cgoubert@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [14:02:51] hopefully it’ll finish within those 27 minutes [14:02:57] (03PS4) 10Jcrespo: bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) [14:03:44] !log cgoubert@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [14:03:46] (03CR) 10Jcrespo: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo) [14:04:56] (03CR) 10Slyngshede: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1297749 (owner: 10CDanis) [14:05:27] (03CR) 10Jcrespo: [C:03+2] bacula: Reenable ro ES bacula backups to finalize eqiad->codfw ones [puppet] - 10https://gerrit.wikimedia.org/r/1298737 (https://phabricator.wikimedia.org/T424661) (owner: 10Jcrespo) [14:05:32] !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] 'sync' command on namespace 'liftwing-openapi-server' for release 'main' . [14:05:51] !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] 'sync' command on namespace 'liftwing-openapi-server' for release 'main' . 
[14:07:25] !log cgoubert@cumin1003 START - Cookbook sre.dns.netbox [14:07:50] (03PS1) 10Clément Goubert: conftool-data: Add liftwing-openapi-server [puppet] - 10https://gerrit.wikimedia.org/r/1298798 (https://phabricator.wikimedia.org/T427902) [14:07:56] RECOVERY - Confd vcl based reload on cp6015 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [14:07:59] !log depooling cp6013 to restart varnish [14:08:03] (03CR) 10CDanis: [C:03+2] cache: haproxy: enable_mlock globally 🚀🌍 [puppet] - 10https://gerrit.wikimedia.org/r/1297749 (owner: 10CDanis) [14:08:06] !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp6013.* [14:08:58] (03CR) 10Clément Goubert: [C:03+2] conftool-data: Add liftwing-openapi-server [puppet] - 10https://gerrit.wikimedia.org/r/1298798 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [14:09:14] jynus: ok to merge your bacula patch too? [14:09:34] cdanis: if it's got my conftool patch go ahead and merge that as well [14:10:17] claime: nope, but i'm done now [14:10:26] !log cgoubert@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [14:10:39] !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp6013.* [14:10:42] PROBLEM - Confd vcl based reload on cp6009 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [14:10:56] RECOVERY - Confd vcl based reload on cp6013 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [14:11:51] !log cgoubert@cumin1003 conftool action : set/pooled=true; selector: dnsdisc=liftwing-openapi-server.* [14:12:35] (03PS5) 10Federico Ceratto: sre.mysql.pool Remove linter hints, rename logger [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) [14:13:02] (03PS1) 10JMeybohm: Bump kubeconform checks to 1.34.8, remove 1.23.6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298799 (https://phabricator.wikimedia.org/T427069) [14:13:32] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy1003 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner [14:13:56] PROBLEM - Confd vcl based reload on cp6016 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [14:13:56] PROBLEM - Confd vcl based reload on cp6015 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [14:13:56] PROBLEM - Confd vcl based reload on cp6012 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [14:13:56] PROBLEM - Confd vcl based reload on cp6011 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [14:13:56] PROBLEM - Confd vcl based reload on cp6014 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [14:13:56] images built \o/ [14:13:58] PROBLEM - Confd vcl based reload on cp6010 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [14:14:30] (03CR) 10CI reject: [V:04-1] Bump kubeconform checks to 1.34.8, remove 1.23.6 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298799 (https://phabricator.wikimedia.org/T427069) (owner: 10JMeybohm) [14:17:36] (03CR) 10Gkyziridis: [C:03+1] "Thnx for fixing this!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298794 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [14:17:56] (03CR) 10AikoChou: [C:03+1] ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298796 (https://phabricator.wikimedia.org/T428127) (owner: 10Bartosz Wójtowicz) [14:18:08] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, audreypenven: Backport for [[gerrit:1298709|Add translatable messages for WikiProject names (T427804)]], [[gerrit:1298710|Use translatable messages for WikiProject links (T427804)]], [[gerrit:1297644|WikiProject links - remove 'text' config (T427804)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:18:12] T427804: [WIPR] Allow for translations of Wikiproject names - https://phabricator.wikimedia.org/T427804 [14:18:14] yerdua_wmde: please test :) [14:18:36] I see the message key at https://test.wikidata.org/wiki/Q42?uselang=qqx when I turn on WikimediaDebug, yay [14:19:51] looks right to me [14:19:57] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, audreypenven: Continuing with deployment [14:19:59] alright, then let’s go [14:20:04] and find out how long the prod deploy will take [14:20:09] thanks! [14:20:22] (03CR) 10Bartosz Wójtowicz: [C:03+2] ml-services: Bump outlink-topic-model image. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1298796 (https://phabricator.wikimedia.org/T428127) (owner: 10Bartosz Wójtowicz) [14:21:41] (03PS2) 10Federico Ceratto: sre.mysql: Auto-lint imports [cookbooks] - 10https://gerrit.wikimedia.org/r/1293666 (https://phabricator.wikimedia.org/T419874) [14:22:25] (03Merged) 10jenkins-bot: ml-services: Bump outlink-topic-model image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298796 (https://phabricator.wikimedia.org/T428127) (owner: 10Bartosz Wójtowicz) [14:23:44] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1041: repool after maintenance [14:23:56] !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . [14:24:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote es2043 to es4 codfw primary T428386', diff saved to https://phabricator.wikimedia.org/P93926 and previous config saved to /var/cache/conftool/dbconfig/20260608-142423-marostegui.json [14:24:28] T428386: Migrate es4 section to Debian Trixie - https://phabricator.wikimedia.org/T428386 [14:25:02] !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade [14:25:12] !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2042: Upgrading es2042.codfw.wmnet [14:25:43] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2042: Upgrading es2042.codfw.wmnet [14:26:07] !log bwojtowicz@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . [14:26:42] !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS trixie [14:27:04] !log bwojtowicz@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . 
[14:29:16] I think we might run slightly into the test kitchen window, but hopefully not too much [14:30:06] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1430) [14:30:37] still deploying, sorry [14:30:38] (03CR) 10Elukey: "I have to say that I am not 100% happy that we are keeping different linting standards, it would be nice if we kept all cookbooks to the s" [cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 (owner: 10Federico Ceratto) [14:32:26] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298709|Add translatable messages for WikiProject names (T427804)]], [[gerrit:1298710|Use translatable messages for WikiProject links (T427804)]], [[gerrit:1297644|WikiProject links - remove 'text' config (T427804)]] (duration: 31m 57s) [14:32:29] !log UTC afternoon backport+config window done [14:32:31] T427804: [WIPR] Allow for translations of Wikiproject names - https://phabricator.wikimedia.org/T427804 [14:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:36] over to the test kitcheners, sorry for the delay [14:33:32] (03CR) 10CWilliams: "I don't follow why there was a rename of log -> logger, nor why it is mixed in with removing the linting hints for pylint etc... 
please ca" [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) (owner: 10Federico Ceratto) [14:36:53] (03CR) 10CWilliams: [C:03+1] sre.mysql: Auto-lint imports [cookbooks] - 10https://gerrit.wikimedia.org/r/1293666 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto) [14:38:29] (03PS3) 10Lucas Werkmeister (WMDE): Add Wikidata configuration for WikiProject links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298293 (https://phabricator.wikimedia.org/T422935) [14:41:23] (03PS1) 10Eevans: cassandra: GRANT access to linked_artifacts.editor_counts_per_page [puppet] - 10https://gerrit.wikimedia.org/r/1298809 (https://phabricator.wikimedia.org/T428218) [14:42:22] (03CR) 10Eevans: [C:03+2] cassandra: GRANT access to linked_artifacts.editor_counts_per_page [puppet] - 10https://gerrit.wikimedia.org/r/1298809 (https://phabricator.wikimedia.org/T428218) (owner: 10Eevans) [14:42:38] !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2042.codfw.wmnet with reason: host reimage [14:44:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [14:44:24] (03CR) 10Federico Ceratto: [C:03+2] sre.mysql: Auto-lint imports [cookbooks] - 10https://gerrit.wikimedia.org/r/1293666 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto) [14:45:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.45% idle - https://bit.ly/wmf-fpmsat - 
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [14:45:28] (03PS1) 10Hnowlan: thumbor: change readiness probes to make surge recovery safer [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298811 (https://phabricator.wikimedia.org/T357145) [14:46:03] (03PS2) 10Hnowlan: thumbor: change readiness probes to make surge recovery safer [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298811 (https://phabricator.wikimedia.org/T357145) [14:47:13] (03PS23) 10Ayounsi: Create cookbook to depool all services in a given rack [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300) [14:47:48] (03CR) 10Federico Ceratto: [V:03+2 C:03+2] sre.mysql: Auto-lint imports [cookbooks] - 10https://gerrit.wikimedia.org/r/1293666 (https://phabricator.wikimedia.org/T419874) (owner: 10Federico Ceratto) [14:48:06] (03CR) 10Ayounsi: Create cookbook to depool all services in a given rack (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1239896 (https://phabricator.wikimedia.org/T327300) (owner: 10Ayounsi) [14:49:26] (03CR) 10CWilliams: "@ltoscano@wikimedia.org I am not sure for the reason, but perhaps given that if I run "ruff format" then rather a lot changes, perhaps it " [cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 (owner: 10Federico Ceratto) [14:49:35] (03CR) 10Lucas Werkmeister (WMDE): "Done, seemed to work fine (T422935#11994576)." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298293 (https://phabricator.wikimedia.org/T422935) (owner: 10Lucas Werkmeister (WMDE)) [14:49:50] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2042.codfw.wmnet with reason: host reimage [14:49:56] !log jgiannelos@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply [14:50:23] !log jgiannelos@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply [14:50:24] !log jgiannelos@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply [14:50:26] (03PS1) 10Jforrester: wikifunctions: Upgrade evaluators from 2026-06-03-023342 to 2026-06-06-013944 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298812 (https://phabricator.wikimedia.org/T426332) [14:50:34] (03PS1) 10Jforrester: wikifunctions: Switch Python evaluator to Rust-based version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298813 (https://phabricator.wikimedia.org/T417870) [14:50:35] (03CR) 10Dzahn: [C:03+2] admin: upgrade Audrey Penven from ldap_only to restricted [puppet] - 10https://gerrit.wikimedia.org/r/1298299 (https://phabricator.wikimedia.org/T427531) (owner: 10Dzahn) [14:50:48] (03PS1) 10Eevans: data-gateway: deploy version v1.0.15 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298814 (https://phabricator.wikimedia.org/T428218) [14:50:50] !log jgiannelos@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply [14:50:56] (03CR) 10Jforrester: [C:03+2] wikifunctions: Upgrade evaluators from 2026-06-03-023342 to 2026-06-06-013944 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298812 (https://phabricator.wikimedia.org/T426332) (owner: 10Jforrester) [14:53:09] (03Merged) 10jenkins-bot: wikifunctions: Upgrade evaluators from 2026-06-03-023342 to 2026-06-06-013944 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298812 (https://phabricator.wikimedia.org/T426332) (owner: 10Jforrester) [14:54:11] !log 
jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [14:54:16] (03CR) 10Federico Ceratto: "The rename goes from logger to log to keep consistent to our other cookbooks and scripts (and less verbose)." [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) (owner: 10Federico Ceratto) [14:55:09] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [14:55:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 22.35% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [14:55:41] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [14:55:58] (03CR) 10Eevans: [C:03+2] data-gateway: deploy version v1.0.15 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298814 (https://phabricator.wikimedia.org/T428218) (owner: 10Eevans) [14:56:59] (03PS2) 10Elukey: role::cache::{text,upload}: enable webrequest tagging globally [puppet] - 10https://gerrit.wikimedia.org/r/1298318 (https://phabricator.wikimedia.org/T402512) [14:58:05] (03PS2) 10Jforrester: wikifunctions: Switch Python evaluator to Rust-based version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298813 (https://phabricator.wikimedia.org/T417870) [14:58:12] (03Merged) 10jenkins-bot: data-gateway: deploy version v1.0.15 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298814 (https://phabricator.wikimedia.org/T428218) (owner: 10Eevans) [14:59:02] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [14:59:16] (03CR) 10Jforrester: [C:03+2] wikifunctions: Switch Python evaluator to Rust-based version [deployment-charts] - 
10https://gerrit.wikimedia.org/r/1298813 (https://phabricator.wikimedia.org/T417870) (owner: 10Jforrester) [14:59:54] (03PS6) 10Federico Ceratto: sre.mysql.pool Remove linter hints, rename logger [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) [15:00:28] (03CR) 10CWilliams: "OK... maybe that deserved a separate commit? Easier to review and not ask questions that way ;)" [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) (owner: 10Federico Ceratto) [15:00:54] !log eevans@deploy1003 helmfile [staging] START helmfile.d/services/data-gateway: apply [15:01:29] !log eevans@deploy1003 helmfile [staging] DONE helmfile.d/services/data-gateway: apply [15:01:51] (03Merged) 10jenkins-bot: wikifunctions: Switch Python evaluator to Rust-based version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298813 (https://phabricator.wikimedia.org/T417870) (owner: 10Jforrester) [15:01:59] (03PS1) 10Jcrespo: mariadb: Switchover backup1-codfw primary db2183->db2184 [puppet] - 10https://gerrit.wikimedia.org/r/1298816 (https://phabricator.wikimedia.org/T427357) [15:02:28] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [15:03:04] (03CR) 10Jcrespo: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1298816 (https://phabricator.wikimedia.org/T427357) (owner: 10Jcrespo) [15:03:18] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [15:03:24] (03CR) 10CDanis: [C:03+1] role::cache::{text,upload}: enable webrequest tagging globally [puppet] - 10https://gerrit.wikimedia.org/r/1298318 (https://phabricator.wikimedia.org/T402512) (owner: 10Elukey) [15:03:40] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [15:03:42] !log jynus@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2183-2184].codfw.wmnet with reason: Switchover 
db [15:03:52] (03PS1) 10Trueg: dse-k8s: Allow the usage of ceph-rdb-ssd for wdqs namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298817 (https://phabricator.wikimedia.org/T425007) [15:04:10] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [15:04:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.86% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:04:32] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [15:06:02] (03CR) 10Elukey: [C:03+2] role::cache::{text,upload}: enable webrequest tagging globally [puppet] - 10https://gerrit.wikimedia.org/r/1298318 (https://phabricator.wikimedia.org/T402512) (owner: 10Elukey) [15:07:14] !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2042.codfw.wmnet with OS trixie [15:08:06] (03CR) 10Clément Goubert: [C:03+2] admin_ng: Disable istio injection for openapi-server [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298794 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [15:08:28] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [15:08:46] !log jforrester@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply [15:08:50] !log jforrester@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [15:08:58] !log jforrester@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [15:09:01] !log jforrester@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [15:09:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web 
releases routed via main at eqiad: 24.86% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:09:18] !log jforrester@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [15:09:23] !log jforrester@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [15:09:51] (03PS6) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) [15:10:10] (03CR) 10Gmodena: wdqs-backend: Deployment chart for the WDQS triple-store (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1286374 (https://phabricator.wikimedia.org/T425007) (owner: 10Trueg) [15:10:35] (03CR) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [15:10:41] marostegui@cumin1003 major-upgrade (PID 1980850) is awaiting input [15:12:03] !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99) [15:12:11] !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2042: repool after upgrade [15:12:36] (03CR) 10CI reject: [V:04-1] cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [15:15:07] (03CR) 10Federico Ceratto: "In gerrit multiple commits become individual code reviews 😐 there is no way AFAIK to bundle more commits in one CR, so I try avoid making " [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) (owner: 10Federico Ceratto) [15:15:10] (03CR) 10Federico Ceratto: 
[C:03+2] sre.mysql.pool Remove linter hints, rename logger [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) (owner: 10Federico Ceratto) [15:16:47] (03PS2) 10AOkoth: site: apply production role to phab2003 [puppet] - 10https://gerrit.wikimedia.org/r/1295460 (https://phabricator.wikimedia.org/T423727) [15:16:52] (03Merged) 10jenkins-bot: admin_ng: Disable istio injection for openapi-server [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298794 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [15:18:18] !log dbmaint on backup1-codfw@codfw (T428467) [15:18:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:22] T428467: Switchover backup1-eqiad primary before network maintenance - https://phabricator.wikimedia.org/T428467 [15:18:47] (03Merged) 10jenkins-bot: sre.mysql.pool Remove linter hints, rename logger [cookbooks] - 10https://gerrit.wikimedia.org/r/1294331 (https://phabricator.wikimedia.org/T427381) (owner: 10Federico Ceratto) [15:20:02] (03PS1) 10Ladsgroup: [WIP] Start of thumb.wikimedia.org in text [puppet] - 10https://gerrit.wikimedia.org/r/1298820 (https://phabricator.wikimedia.org/T427465) [15:20:19] (03CR) 10Elukey: "Looks good, the Python script looks ok at first glance, I haven't tested it though. Left some minor questions in the code to understand so" [puppet] - 10https://gerrit.wikimedia.org/r/1298294 (https://phabricator.wikimedia.org/T425795) (owner: 10Tiziano Fogli) [15:20:46] (03PS1) 10Ladsgroup: wikimedia.org: Introduce thumb.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/1298821 (https://phabricator.wikimedia.org/T427465) [15:23:32] (03CR) 10Federico Ceratto: "By setting parameters in a custom ruff.yaml I introduce more strict checks (where useful) in a more gentle, incremental way." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 (owner: 10Federico Ceratto) [15:25:14] (03PS1) 10Clément Goubert: rest-gateway: Add routing for liftwing-openapi-server [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298819 (https://phabricator.wikimedia.org/T427902) [15:28:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.48% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:28:42] (03CR) 10Jcrespo: [C:03+2] mariadb: Switchover backup1-codfw primary db2183->db2184 [puppet] - 10https://gerrit.wikimedia.org/r/1298816 (https://phabricator.wikimedia.org/T427357) (owner: 10Jcrespo) [15:30:04] jan_drewniak: Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1530). Please do the needful. [15:31:31] (03CR) 10Gkyziridis: [C:03+1] "Thnx for configuring this!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1298819 (https://phabricator.wikimedia.org/T427902) (owner: 10Clément Goubert) [15:33:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 23.48% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:40:14] (03PS4) 10Federico Ceratto: sre.mysql: add local ruff.toml [cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 [15:40:30] (03PS1) 10Santiago Faci: Test Kitchen UI: Deploy v1.4.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298823 (https://phabricator.wikimedia.org/T427976) [15:40:51] (03PS1) 10Jcrespo: dbbackups: Point media backups to the new replica, db2183 [puppet] - 10https://gerrit.wikimedia.org/r/1298824 (https://phabricator.wikimedia.org/T427357) [15:41:41] (03CR) 10Elukey: "Sure but it also introduces a new config for a subset of cookbooks, that we may forget etc.. 
I am not against it, I think it is fine to pr" [cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 (owner: 10Federico Ceratto) [15:41:55] (03Abandoned) 10Fabfur: hiera: disable cidergrinder (as emergency measure) [puppet] - 10https://gerrit.wikimedia.org/r/1292002 (owner: 10Fabfur) [15:43:20] (03CR) 10Hnowlan: "recheck" [software/klaxon] - 10https://gerrit.wikimedia.org/r/1274026 (owner: 10CDanis) [15:45:15] !log jynus@cumin2002 START - Cookbook sre.hosts.remove-downtime for db[2183-2184].codfw.wmnet [15:45:17] !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2183-2184].codfw.wmnet [15:47:04] (03CR) 10Jcrespo: [C:03+2] dbbackups: Point media backups to the new replica, db2183 [puppet] - 10https://gerrit.wikimedia.org/r/1298824 (https://phabricator.wikimedia.org/T427357) (owner: 10Jcrespo) [15:51:57] (03PS8) 10Federico Ceratto: sre.mysql: split pool/depool [cookbooks] - 10https://gerrit.wikimedia.org/r/1295480 (https://phabricator.wikimedia.org/T422361) [15:51:57] (03CR) 10Federico Ceratto: "This initial CR move code from the pool cookbook into depool as needed." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1295480 (https://phabricator.wikimedia.org/T422361) (owner: 10Federico Ceratto) [15:53:26] (03CR) 10CI reject: [V:04-1] sre.mysql: split pool/depool [cookbooks] - 10https://gerrit.wikimedia.org/r/1295480 (https://phabricator.wikimedia.org/T422361) (owner: 10Federico Ceratto) [15:57:36] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2042: repool after upgrade [15:59:37] (03CR) 10Clare Ming: [C:03+2] Test Kitchen UI: Deploy v1.4.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298823 (https://phabricator.wikimedia.org/T427976) (owner: 10Santiago Faci) [16:01:54] (03Merged) 10jenkins-bot: Test Kitchen UI: Deploy v1.4.1 release to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298823 (https://phabricator.wikimedia.org/T427976) (owner: 10Santiago Faci) [16:04:25] (03PS1) 10Dzahn: admin: add osleger to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/1298830 (https://phabricator.wikimedia.org/T428262) [16:05:11] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298831 [16:06:46] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply [16:07:26] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply [16:08:03] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply [16:08:31] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply [16:08:57] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:09:24] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply [16:10:04] !log kamila@deploy1003 helmfile [staging] DONE 
helmfile.d/services/shellbox: apply [16:10:10] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply [16:10:18] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply [16:10:24] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply [16:10:38] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply [16:10:44] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply [16:12:00] (03PS1) 10Mmartorana: config: Disable EmailConfirmationBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298834 (https://phabricator.wikimedia.org/T428291) [16:12:36] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [16:12:43] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply [16:13:04] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply [16:13:10] !log kamila@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-video: apply [16:13:36] !log kamila@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply [16:13:55] (03PS1) 10FNegri: toolsdb: automatically terminate idle transactions [puppet] - 10https://gerrit.wikimedia.org/r/1298835 (https://phabricator.wikimedia.org/T409857) [16:14:04] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298834 (https://phabricator.wikimedia.org/T428291) (owner: 10Mmartorana) [16:14:22] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply [16:14:25] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC late backport 
window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297162 (owner: 10Matthias Mullie) [16:14:32] (03CR) 10Btullis: [C:03+2] dse-k8s: Allow the usage of ceph-rdb-ssd for wdqs namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298817 (https://phabricator.wikimedia.org/T425007) (owner: 10Trueg) [16:14:41] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply [16:14:48] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply [16:14:54] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply [16:15:00] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply [16:15:42] (03CR) 10Matthias Mullie: [C:03+1] "Approved, will deploy shortly" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297162 (owner: 10Matthias Mullie) [16:15:53] (03PS2) 10Matthias Mullie: MultimediaViewer: enable image carousel as a beta feature on Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297162 [16:16:12] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply [16:16:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.69% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [16:16:18] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply [16:16:25] RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:16:26] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [16:16:32] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply [16:17:37] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply [16:17:44] !log kamila@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-video: apply [16:18:55] !log kamila@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply [16:21:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at eqiad: 24.69% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [16:21:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [16:23:33] (03Merged) 10jenkins-bot: dse-k8s: Allow the usage of ceph-rdb-ssd for wdqs namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298817 (https://phabricator.wikimedia.org/T425007) (owner: 10Trueg) [16:25:55] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox: apply [16:26:48] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply [16:26:54] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply [16:27:11] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply [16:27:17] !log kamila@deploy1003 helmfile [eqiad] START 
helmfile.d/services/shellbox-media: apply [16:28:04] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply [16:28:10] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply [16:28:21] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [16:28:28] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply [16:28:45] (03PS3) 10Mpostoronca: wmf-config: Enable hCaptcha on UploadWizard publish for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298829 (https://phabricator.wikimedia.org/T426126) [16:29:08] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply [16:29:14] !log kamila@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply [16:29:16] (03PS7) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) [16:30:33] !log kamila@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply [16:32:25] (03CR) 10Slyngshede: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1298830 (https://phabricator.wikimedia.org/T428262) (owner: 10Dzahn) [16:32:29] (03CR) 10CI reject: [V:04-1] cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto) [16:33:21] (03PS1) 10Urbanecm: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298837 (https://phabricator.wikimedia.org/T321316) [16:33:57] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:34:09] 
(03CR) 10Urbanecm: [C:03+2] linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298837 (https://phabricator.wikimedia.org/T321316) (owner: 10Urbanecm) [16:35:58] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:36:22] (03Merged) 10jenkins-bot: linkrecommendation: Bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1298837 (https://phabricator.wikimedia.org/T321316) (owner: 10Urbanecm) [16:38:08] (03Abandoned) 10Bearloga: shiny_server: Minimal dependencies [puppet] - 10https://gerrit.wikimedia.org/r/817903 (owner: 10Bearloga) [16:38:20] (03Abandoned) 10Bearloga: r_lang: Switch from devtools to remotes [puppet] - 10https://gerrit.wikimedia.org/r/817907 (owner: 10Bearloga) [16:39:17] `/srv/deployment-charts` seems to be in dirty state, is that intended/expected? [16:39:30] PROBLEM - Host lsw1-b2-codfw.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:39:54] PROBLEM - Host ps1-b2-codfw is DOWN: PING CRITICAL - Packet loss = 100% [16:40:06] urbanecm: not expected, asking in -sre [16:40:13] ty! 
[16:43:22] FIRING: CertAlmostExpired: gNMI TLS certificate for lsw1-b2-codfw.mgmt.codfw.wmnet is going to expire in 0s - https://wikitech.wikimedia.org/wiki/Network_monitoring#CertAlmostExpired - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic?var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:44:02] RECOVERY - Host ps1-b2-codfw is UP: PING OK - Packet loss = 0%, RTA = 33.36 ms [16:44:03] Raine: i see a message from someone else in #wikimedia-sre; let me know if i should monitor somewhere else for follow-up [16:44:04] RECOVERY - Host lsw1-b2-codfw.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.42 ms [16:44:35] urbanecm: b.tullis beat me to it [16:44:41] sounds good [16:44:59] if nobody responds soon, I will save the change and clean it up [16:45:06] ty ❤️ [16:48:08] (03PS9) 10Federico Ceratto: sre.mysql: split pool/depool [cookbooks] - 10https://gerrit.wikimedia.org/r/1295480 (https://phabricator.wikimedia.org/T422361) [16:48:22] RESOLVED: CertAlmostExpired: gNMI TLS certificate for lsw1-b2-codfw.mgmt.codfw.wmnet is going to expire in 0s - https://wikitech.wikimedia.org/wiki/Network_monitoring#CertAlmostExpired - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic?var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:49:04] (03PS1) 10Matthias Mullie: Squashed diff to master [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298841 [16:49:41] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298841 (owner: 10Matthias Mullie) [16:51:31] urbanecm: cleaned up [16:51:34] ty! [16:51:58] RECOVERY - Confd vcl based reload on cp6015 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [16:51:58] RECOVERY - Confd vcl based reload on cp6016 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [16:52:01] repo is `behind 'origin/master' by 4 commits` , i guess that'll be fixed by the autopuller? [16:52:07] fixed now [16:52:10] * urbanecm deploys [16:52:24] !log urbanecm@deploy1003 helmfile [staging] START helmfile.d/services/linkrecommendation: apply [16:52:47] (03PS5) 10Federico Ceratto: sre.mysql: add local ruff.toml [cookbooks] - 10https://gerrit.wikimedia.org/r/1297100 [16:52:58] RECOVERY - Confd vcl based reload on cp6012 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [16:52:58] RECOVERY - Confd vcl based reload on cp6010 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [16:53:36] !log urbanecm@deploy1003 helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply [16:53:56] !log urbanecm@deploy1003 helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply [16:55:42] !log urbanecm@deploy1003 helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply [16:57:15] !log urbanecm@deploy1003 helmfile [codfw] START helmfile.d/services/linkrecommendation: apply [16:58:59] !log urbanecm@deploy1003 helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply [16:59:20] * urbanecm done [17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1700) [17:00:05] ryankemper: That opportune time for a Wikidata Query Service weekly deploy deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T1700). 
[17:03:25] (03CR) 10Dreamy Jazz: wmf-config: Enable hCaptcha on UploadWizard publish for testwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298829 (https://phabricator.wikimedia.org/T426126) (owner: 10Mpostoronca) [17:05:29] !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply [17:05:52] !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply [17:08:32] (03CR) 10Dreamy Jazz: "Additionally would be good to document this config in https://wikitech.wikimedia.org/wiki/HCaptcha incase SRE want to disable it in an eme" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298829 (https://phabricator.wikimedia.org/T426126) (owner: 10Mpostoronca) [17:11:58] PROBLEM - Confd vcl based reload on cp6012 is CRITICAL: reload-vcl failed to run since 0h, 1 minutes. https://wikitech.wikimedia.org/wiki/Varnish [17:12:08] (03PS5) 10CDobbins: trying out `alias` to get rid of redundancy [puppet] - 10https://gerrit.wikimedia.org/r/1297769 [17:12:58] PROBLEM - Confd vcl based reload on cp6016 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [17:13:53] !log bounce sirenbot to get it to re-join a channel [17:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:51] (03PS9) 10Aleksandar Mastilovic: Presto memory tuning, resource groups [puppet] - 10https://gerrit.wikimedia.org/r/1285926 (https://phabricator.wikimedia.org/T424112) [17:18:51] (03PS1) 10Aleksandar Mastilovic: Presto memory tuning, resource groups [puppet] - 10https://gerrit.wikimedia.org/r/1298852 (https://phabricator.wikimedia.org/T424112) [17:19:13] (03CR) 10CDobbins: "Same error: https://puppet-compiler.wmflabs.org/output/1297769/8666/cp2044.codfw.wmnet/change.cp2044.codfw.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/1297769 (owner: 10CDobbins) [17:21:15] !log restarted varnish-frontend service on cp6009 [17:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:20] !log restarting varnish-frontend service on cp6011 [17:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:55] !log restarting varnish-frontend service on cp6012 [17:21:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:52] RECOVERY - jenkins_service_running on contint1003 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins [17:26:52] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins [17:26:58] RECOVERY - Confd vcl based reload on cp6014 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [17:28:58] RECOVERY - Confd vcl based reload on cp6016 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [17:33:52] RECOVERY - jenkins_service_running on contint1003 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins [17:34:00] RECOVERY - Confd vcl based reload on cp6011 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [17:35:57] !log jnuche@deploy1003 Installing scap version "4.268.0" for 2 host(s) [17:36:52] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins [17:37:49] !log jnuche@deploy1003 Installation of scap version "4.268.0" completed for 2 hosts [17:45:02] (03CR) 10BCornwall: [C:03+1] wikimedia.org: Introduce thumb.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/1298821 (https://phabricator.wikimedia.org/T427465) (owner: 10Ladsgroup) [17:46:18] (03CR) 10Majavah: "is this intentionally not a CNAME to `dyna`?" 
[dns] - 10https://gerrit.wikimedia.org/r/1298821 (https://phabricator.wikimedia.org/T427465) (owner: 10Ladsgroup) [17:53:04] (03PS3) 10DLynch: Add script to get constructive edits for all wikis [puppet] - 10https://gerrit.wikimedia.org/r/1272633 (https://phabricator.wikimedia.org/T428490) (owner: 10Clare Ming) [17:53:16] (03CR) 10AOkoth: [C:03+2] site: apply production role to phab2003 [puppet] - 10https://gerrit.wikimedia.org/r/1295460 (https://phabricator.wikimedia.org/T423727) (owner: 10AOkoth) [17:55:05] (03PS1) 10DCausse: deployment-prep: drop cirrus opensearch settings [puppet] - 10https://gerrit.wikimedia.org/r/1298864 [17:59:18] (03CR) 10Majavah: [C:03+2] deployment-prep: drop cirrus opensearch settings [puppet] - 10https://gerrit.wikimedia.org/r/1298864 (owner: 10DCausse) [18:02:46] !log aokoth@deploy1003 Started deploy [phabricator/deployment@939557b]: deploy phab2003 - T427286 [18:02:50] T427286: Deploy Phab/Phorge 2026-05-26 - https://phabricator.wikimedia.org/T427286 [18:02:58] !log aokoth@deploy1003 Finished deploy [phabricator/deployment@939557b]: deploy phab2003 - T427286 (duration: 00m 12s) [18:06:35] (03PS1) 10BCornwall: varnish: Set VCL reload delay to 5 seconds [puppet] - 10https://gerrit.wikimedia.org/r/1298867 [18:07:13] (03CR) 10Ssingh: trying out `alias` to get rid of redundancy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1297769 (owner: 10CDobbins) [18:13:14] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8669/co" [puppet] - 10https://gerrit.wikimedia.org/r/1298867 (owner: 10BCornwall) [18:17:20] !log jhancock@cumin2002 START - Cookbook sre.dns.netbox [18:21:31] (03CR) 10Ssingh: [C:03+1] varnish: Set VCL reload delay to 5 seconds [puppet] - 10https://gerrit.wikimedia.org/r/1298867 (owner: 10BCornwall) [18:21:34] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: updating dse-k8s-wdqs2001 to codfw - jhancock@cumin2002" [18:21:37] (03CR) 10Ssingh: [C:03+1] "Thanks for the fix!" [puppet] - 10https://gerrit.wikimedia.org/r/1298867 (owner: 10BCornwall) [18:21:39] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating dse-k8s-wdqs2001 to codfw - jhancock@cumin2002" [18:21:39] !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:21:59] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs2001 [18:22:02] (03CR) 10BCornwall: [V:03+1 C:03+2] P:cache:haproxy add image generator information [puppet] - 10https://gerrit.wikimedia.org/r/1295921 (https://phabricator.wikimedia.org/T414338) (owner: 10Slyngshede) [18:22:15] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs2001 [18:22:18] (03CR) 10BCornwall: [V:03+1 C:03+2] varnish: Set VCL reload delay to 5 seconds [puppet] - 10https://gerrit.wikimedia.org/r/1298867 (owner: 10BCornwall) [18:25:47] !log jhancock@cumin2002 START - Cookbook sre.dns.netbox [18:31:50] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dse-k8s-wdqs2002 to codfw - jhancock@cumin2002" [18:31:56] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dse-k8s-wdqs2002 to codfw - jhancock@cumin2002" [18:31:56] !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:32:02] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs2002 [18:33:33] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs2002 [18:37:41] !log jhancock@cumin2002 START - Cookbook sre.dns.netbox [18:42:26] (03PS1) 10Cathal Mooney: Rancid: add config backup for missing leaf lsw1-d3-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1298871 [18:42:39] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2030 to codfw - jhancock@cumin2002" [18:42:45] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2030 to codfw - jhancock@cumin2002" [18:42:45] !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:43:49] (03PS2) 10Cathal Mooney: Rancid: add config backup for missing leaf lsw1-d3-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1298871 [18:44:38] !log jhancock@cumin2002 START - Cookbook sre.dns.netbox [18:50:28] jhancock@cumin2002 netbox (PID 1183346) is awaiting input [18:50:58] (03CR) 10Catrope: [C:03+1] config: Disable EmailConfirmationBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298834 (https://phabricator.wikimedia.org/T428291) (owner: 10Mmartorana) [18:51:48] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dse-k8s-wdqs2004 to codfw - jhancock@cumin2002" [18:51:54] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dse-k8s-wdqs2004 to codfw - jhancock@cumin2002" [18:51:54] !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:52:17] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs2003 [18:52:27] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs2003 
[18:52:29] (03PS1) 10Kimberly Sarabia: Remove custom streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298875 (https://phabricator.wikimedia.org/T423148) [18:52:33] !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs2004 [18:54:35] 06SRE, 10SRE-Access-Requests: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11995873 (10AKhatun_WMF) @KOfori would you be able to take a look and approve? [18:55:36] jhancock@cumin2002 configure-switch-interfaces (PID 1184470) is awaiting input [18:57:41] !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs2004 [18:58:23] (03PS1) 10Kosta Harlan: SimpleCaptcha: Re-render captcha when edit form is redisplayed [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298879 (https://phabricator.wikimedia.org/T428437) [18:58:32] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298879 (https://phabricator.wikimedia.org/T428437) (owner: 10Kosta Harlan) [18:59:06] jouncebot: nowandnext [18:59:06] No deployments scheduled for the next 1 hour(s) and 0 minute(s) [18:59:06] In 1 hour(s) and 0 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T2000) [18:59:19] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host dse-k8s-wdqs2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:03:08] (03CR) 10Ssingh: "Just for clarity: CNAME to dyna and dyna!geoip are _mostly_ equivalent, with a preference for the former to improve cache hitrates in gene" [dns] - 10https://gerrit.wikimedia.org/r/1298821 (https://phabricator.wikimedia.org/T427465) (owner: 10Ladsgroup) [19:06:25] !log 
jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-wdqs2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:07:27] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host dse-k8s-wdqs2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:08:29] 06SRE, 10SRE-Access-Requests: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11995921 (10KOfori) Sure. Approved. [19:14:33] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-wdqs2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:16:35] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host dse-k8s-wdqs2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:18:15] (03CR) 10Ayounsi: [C:03+1] Rancid: add config backup for missing leaf lsw1-d3-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/1298871 (owner: 10Cathal Mooney) [19:19:18] !log aokoth@deploy1003 Started deploy [phabricator/deployment@939557b]: deploy phab [19:20:59] !log aokoth@deploy1003 Finished deploy [phabricator/deployment@939557b]: deploy phab (duration: 01m 40s) [19:22:11] Going to deploy a wmf.5 change in a few minutes, unless someone else is deploying now [19:22:32] !log aokoth@deploy1003 Started deploy [phabricator/deployment@939557b]: deploy phab [19:23:47] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-wdqs2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:23:52] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298879 (https://phabricator.wikimedia.org/T428437) (owner: 10Kosta Harlan) [19:24:04] !log aokoth@deploy1003 Finished deploy [phabricator/deployment@939557b]: deploy phab (duration: 01m 32s) [19:25:20] (03Merged) 10jenkins-bot: SimpleCaptcha: Re-render captcha when 
edit form is redisplayed [extensions/ConfirmEdit] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298879 (https://phabricator.wikimedia.org/T428437) (owner: 10Kosta Harlan) [19:25:41] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1298879|SimpleCaptcha: Re-render captcha when edit form is redisplayed (T428437)]] [19:25:45] T428437: hCaptcha widget not rendering after "warn" AbuseFilter consequence on desktop wikitext editor - https://phabricator.wikimedia.org/T428437 [19:27:36] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host dse-k8s-wdqs2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:28:28] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-wdqs2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:29:41] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1298879|SimpleCaptcha: Re-render captcha when edit form is redisplayed (T428437)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[19:30:00] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host dse-k8s-wdqs2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:30:25] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-wdqs2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:31:35] !log kharlan@deploy1003 kharlan: Continuing with deployment [19:32:18] (03PS1) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [19:32:30] FIRING: [3x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [19:32:33] (03PS2) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [19:32:43] (03PS2) 10Matthias Mullie: Squashed diff to master [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298841 [19:33:08] (03CR) 10CI reject: [V:04-1] varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [19:34:08] jhancock@cumin2002 provision (PID 1194028) is awaiting input [19:35:52] RECOVERY - jenkins_service_running on contint1003 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins [19:37:30] FIRING: [4x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [19:38:52] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins [19:38:56] (03PS3) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [19:40:29] (03CR) 10CI reject: [V:04-1] varnish: Remove reload_vcl_opts 
function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [19:43:17] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host dse-k8s-wdqs2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:43:33] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-wdqs2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [19:46:12] (03PS4) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [19:46:43] (03CR) 10CI reject: [V:04-1] varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [19:46:56] Hmm, K8s deployment to stage canaries failed [19:47:30] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8672/co" [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [19:47:32] cdanis: should I retry? 
https://spiderpig.wikimedia.org/jobs/2210 [19:52:30] FIRING: [4x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [19:53:30] (03PS1) 10Alex.sanford: Add 2FA enforcement demotion config for phase 3 groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298890 (https://phabricator.wikimedia.org/T423120) [19:55:12] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 11 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298890 (https://phabricator.wikimedia.org/T423120) (owner: 10Alex.sanford) [19:55:24] (03PS1) 10Neriah: OOUIHTMLForm: Avoid treating form header as a clickable label [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) [19:55:38] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, June 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [19:57:30] RESOLVED: Traffic bill over quota: Alert for device cr2-eqord.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [20:00:04] RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: OwO what's this, a deployment window?? UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T2000). nyaa~ [20:00:05] VadymTS1, matthiasmullie, kostajh, and Neriah: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[20:00:10] o/ [20:00:15] hey [20:00:21] here [20:00:22] I’m finishing a backport, but scap failed, so retrying now [20:01:05] Seems to be going better on the retry. I’m not able to backport other patches though, can someone else deploy? [20:03:25] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298879|SimpleCaptcha: Re-render captcha when edit form is redisplayed (T428437)]] (duration: 37m 43s) [20:03:29] T428437: hCaptcha widget not rendering after "warn" AbuseFilter consequence on desktop wikitext editor - https://phabricator.wikimedia.org/T428437 [20:05:11] matthiasmullie: are you able to deploy? [20:05:31] I can certainly take care of my own [20:05:36] looking at the other patches now [20:06:02] yeah, should be fine [20:06:24] matthiasmullie: thank you [20:06:48] (03PS1) 10Hnowlan: logging: use ECS formatter [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1298894 (https://phabricator.wikimedia.org/T368180) [20:07:28] (03PS5) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [20:07:57] FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:08:08] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mlitn@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298390 (https://phabricator.wikimedia.org/T428329) (owner: 10VadymTS1) [20:08:09] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mlitn@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298328 (https://phabricator.wikimedia.org/T428269) (owner: 10VadymTS1) [20:08:53] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): 
https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8673/co" [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [20:08:56] (03CR) 10Komla Sapaty: "Friendly ping on this when you have a moment." [puppet] - 10https://gerrit.wikimedia.org/r/1294864 (https://phabricator.wikimedia.org/T423549) (owner: 10Komla Sapaty) [20:10:05] (03PS6) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [20:11:28] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (DIFF 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [20:11:44] (03CR) 10CI reject: [V:04-1] logging: use ECS formatter [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1298894 (https://phabricator.wikimedia.org/T368180) (owner: 10Hnowlan) [20:12:22] (03PS7) 10BCornwall: varnish: Remove reload_vcl_opts function [puppet] - 10https://gerrit.wikimedia.org/r/1298885 [20:12:38] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Moving switches to make space for the refreshed switches. - https://phabricator.wikimedia.org/T428195#11996138 (10VRiley-WMF) 05Open→03Resolved All of the switches in row A and most of row B have been shifted in order to make s... 
[20:12:57] RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:13:12] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#11996145 (10RLazarus) > SSH public key (must be a separate key from Wikimedia cloud SSH access): N/A (already in modules/admin/data/data.yaml) Expanding on what @Dzahn said -- @medelius, you do a... [20:13:40] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8675/console" [puppet] - 10https://gerrit.wikimedia.org/r/1298885 (owner: 10BCornwall) [20:15:44] FIRING: KubernetesDeploymentUnavailableReplicas: ... [20:15:44] Deployment linkrecommendation-internal in linkrecommendation at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s&var-namespace=linkrecommendation&var-deployment=linkrecommendation-internal - ... 
[20:15:44] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas [20:20:15] (03Merged) 10jenkins-bot: English Wikibooks: update FlaggedRevs configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298390 (https://phabricator.wikimedia.org/T428329) (owner: 10VadymTS1) [20:20:20] (03Merged) 10jenkins-bot: English Wikiversity: Add new user group "autopatrolled" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298328 (https://phabricator.wikimedia.org/T428269) (owner: 10VadymTS1) [20:20:36] !log mlitn@deploy1003 Started scap sync-world: Backport for [[gerrit:1298390|English Wikibooks: update FlaggedRevs configuration (T428329)]], [[gerrit:1298328|English Wikiversity: Add new user group "autopatrolled" (T428269)]] [20:20:41] T428329: Update FlaggedRevs configuration for English Wikibooks - https://phabricator.wikimedia.org/T428329 [20:20:42] T428269: Create an autopatroller user group on English Wikiversity - https://phabricator.wikimedia.org/T428269 [20:21:44] (03PS1) 10RLazarus: admin: Grant access to cassandra-staging-devs for akhatun [puppet] - 10https://gerrit.wikimedia.org/r/1298898 (https://phabricator.wikimedia.org/T427701) [20:21:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [20:22:22] !log mlitn@deploy1003 mlitn, vadymts1: Backport for [[gerrit:1298390|English Wikibooks: update FlaggedRevs configuration (T428329)]], [[gerrit:1298328|English Wikiversity: Add new user group "autopatrolled" (T428269)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:22:35] testing [20:22:43] VadymTS1: both of your patches are on test servers - can you check & confirm it's ok to proceed? 
[20:23:46] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11996232 (10RLazarus) [20:25:00] alls good [20:25:12] (03CR) 10RLazarus: [C:03+2] admin: add osleger to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/1298830 (https://phabricator.wikimedia.org/T428262) (owner: 10Dzahn) [20:25:17] !log mlitn@deploy1003 mlitn, vadymts1: Continuing with deployment [20:27:59] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to deployment for OSleger_WMF - https://phabricator.wikimedia.org/T428262#11996255 (10RLazarus) 05In progress→03Resolved a:03Dzahn All done! This will take up to 30 minutes to roll out everywhere, then make sure to follow the instr... [20:28:58] matthiasmullie: i have to drive now for 20 minutes, so deploy yours first [20:29:33] !log mlitn@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298390|English Wikibooks: update FlaggedRevs configuration (T428329)]], [[gerrit:1298328|English Wikiversity: Add new user group "autopatrolled" (T428269)]] (duration: 08m 58s) [20:29:39] T428329: Update FlaggedRevs configuration for English Wikibooks - https://phabricator.wikimedia.org/T428329 [20:29:39] T428269: Create an autopatroller user group on English Wikiversity - https://phabricator.wikimedia.org/T428269 [20:29:56] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mlitn@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297162 (owner: 10Matthias Mullie) [20:29:57] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mlitn@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298841 (owner: 10Matthias Mullie) [20:30:33] matthiasmullie: Thanks [20:30:52] actually the patch I want to deploy is quite simple, so if you prefer to do it without me feel free to do it [20:31:40] Neriah: no need to test? 
[20:32:42] (03CR) 10Ladsgroup: "I did that to mirror exactly upload.wikimedia.org (except pointing to text). I defer to your experience and knowledge on why upload.wikime" [dns] - 10https://gerrit.wikimedia.org/r/1298821 (https://phabricator.wikimedia.org/T427465) (owner: 10Ladsgroup) [20:35:42] (03Merged) 10jenkins-bot: MultimediaViewer: enable image carousel as a beta feature on Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1297162 (owner: 10Matthias Mullie) [20:36:06] (03Merged) 10jenkins-bot: Squashed diff to master [extensions/MultimediaViewer] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298841 (owner: 10Matthias Mullie) [20:36:26] !log mlitn@deploy1003 Started scap sync-world: Backport for [[gerrit:1297162|MultimediaViewer: enable image carousel as a beta feature on Wikipedias]], [[gerrit:1298841|Squashed diff to master]] [20:37:31] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#11996297 (10VPuffetMichel) Hi all, I approve this access for Caro. She will continue the set up when she is back. Thank you all! [20:38:18] !log mlitn@deploy1003 mlitn: Backport for [[gerrit:1297162|MultimediaViewer: enable image carousel as a beta feature on Wikipedias]], [[gerrit:1298841|Squashed diff to master]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:39:17] (03CR) 10BCornwall: "I would advise using `map` instead of `if` when in location context. 
More info at https://github.com/nginxinc/nginx-wiki/blob/master/sourc" [puppet] - 10https://gerrit.wikimedia.org/r/1297102 (https://phabricator.wikimedia.org/T427836) (owner: 10Slyngshede) [20:39:18] FIRING: [2x] ProbeDown: Service wdqs1015:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:39:20] !log mlitn@deploy1003 mlitn: Continuing with deployment [20:39:32] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#11996300 (10RLazarus) 05Open→03In progress a:03karapayneWMDE Hi @mahmoud.abdelsattar.wmde! I see you already have `restricted` access (using the SSH... [20:40:02] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#11996307 (10RLazarus) [20:40:16] (03CR) 10BCornwall: "Ugh, we're not in a location context. Silly me... 
Still, I'd probably argue for using `map` but LMK if you think your version is more main" [puppet] - 10https://gerrit.wikimedia.org/r/1297102 (https://phabricator.wikimedia.org/T427836) (owner: 10Slyngshede) [20:41:49] (03PS1) 10Cwhite: logstash: add drop for istio ratelimit-media.svc [puppet] - 10https://gerrit.wikimedia.org/r/1298904 (https://phabricator.wikimedia.org/T390215) [20:43:32] !log mlitn@deploy1003 Finished scap sync-world: Backport for [[gerrit:1297162|MultimediaViewer: enable image carousel as a beta feature on Wikipedias]], [[gerrit:1298841|Squashed diff to master]] (duration: 07m 05s) [20:44:00] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mlitn@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [20:44:20] Neriah: I've begun backporting yours. Ping me if you're here - otherwise will proceed without testing [20:44:26] 06SRE, 10SRE-Access-Requests: SSH key replacement for tchanders - https://phabricator.wikimedia.org/T417056#11996316 (10RLazarus) 05Open→03In progress a:03RLazarus Verified out of band, updating. [20:44:30] 06SRE, 06Traffic, 13Patch-For-Review: WE5.2.13 Dumps UA enforcement - https://phabricator.wikimedia.org/T427836#11996319 (10BCornwall) 05Open→03In progress [20:45:52] (03CR) 10Cwhite: [C:03+2] logstash: add drop for istio ratelimit-media.svc [puppet] - 10https://gerrit.wikimedia.org/r/1298904 (https://phabricator.wikimedia.org/T390215) (owner: 10Cwhite) [20:47:28] 06SRE, 10SRE-Access-Requests: SSH key replacement for tchanders - https://phabricator.wikimedia.org/T417056#11996321 (10RLazarus) 05In progress→03Resolved a:05RLazarus→03ssingh Never mind! Imagine my surprise to find that key already there. :) This was done in https://gerrit.wikimedia.org/r/1298282... [20:52:05] matthiasmullie: I'm back [20:52:18] CI seems to have failed [20:52:50] 23:44:37 Syncing... 
[20:52:50] 23:44:37 rsync: [sender] change_dir "/mediawiki-core/wmf-1.47.0-wmf.5/mediawiki-node24" (in caches) failed: No such file or directory (2) [20:52:51] 23:44:37 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1819) [Receiver=3.2.3] [20:52:51] 23:44:37 rsync: [Receiver] read error: Connection reset by peer (104) [20:54:06] (03CR) 10Matthias Mullie: "resubmit" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [20:56:02] matthiasmullie: don't you have to add +2 to resubmit? [20:56:21] (03CR) 10Matthias Mullie: [C:03+2] OOUIHTMLForm: Avoid treating form header as a clickable label [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [20:56:26] (03CR) 10CI reject: [V:04-1] OOUIHTMLForm: Avoid treating form header as a clickable label [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [20:56:27] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for caro - https://phabricator.wikimedia.org/T426995#11996350 (10RLazarus) [20:56:48] It used to just resubmit when there was a preexisting +2 - not sure why it didn't this time (or my recollection is just plain wrong :p) [20:58:42] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mlitn@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [21:00:05] alexsanford, Reedy, sbassett, Maryum, and manfredi: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Weekly Security deployment window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T2100). 
[21:03:30] (03Merged) 10jenkins-bot: OOUIHTMLForm: Avoid treating form header as a clickable label [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298891 (https://phabricator.wikimedia.org/T428359) (owner: 10Neriah) [21:03:49] !log mlitn@deploy1003 Started scap sync-world: Backport for [[gerrit:1298891|OOUIHTMLForm: Avoid treating form header as a clickable label (T428359)]] [21:03:53] T428359: GlobalRenameQueue: Clicking anywhere on the page opens the UserInfoCard - https://phabricator.wikimedia.org/T428359 [21:05:40] !log mlitn@deploy1003 mlitn, neriah: Backport for [[gerrit:1298891|OOUIHTMLForm: Avoid treating form header as a clickable label (T428359)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:05:46] testing [21:06:05] Neriah: changes are on test - can you check & confirm we're good to proceed? [21:06:17] (03CR) 10RLazarus: [C:03+2] admin: add apdube-wmf user [puppet] - 10https://gerrit.wikimedia.org/r/1295979 (https://phabricator.wikimedia.org/T427553) (owner: 10Kamila Součková) [21:07:33] looks good [21:07:36] Continue [21:07:42] matthiasmullie [21:07:48] !log mlitn@deploy1003 mlitn, neriah: Continuing with deployment [21:12:00] !log mlitn@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298891|OOUIHTMLForm: Avoid treating form header as a clickable label (T428359)]] (duration: 08m 10s) [21:12:04] T428359: GlobalRenameQueue: Clicking anywhere on the page opens the UserInfoCard - https://phabricator.wikimedia.org/T428359 [21:13:03] thank you! 
[21:13:56] (03PS1) 10Cwhite: logstash: move page-analytics and ratelimit webrequest log filters [puppet] - 10https://gerrit.wikimedia.org/r/1298907 (https://phabricator.wikimedia.org/T390215) [21:14:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:16:46] (03CR) 10CDobbins: "After some research, I'm convinced that there's no easy way to do this. More importantly, this feels contrary to the intended use of Hiera" [puppet] - 10https://gerrit.wikimedia.org/r/1297769 (owner: 10CDobbins) [21:18:09] (03PS6) 10CDobbins: trying out `alias` to get rid of redundancy [puppet] - 10https://gerrit.wikimedia.org/r/1297769 [21:18:09] (03CR) 10Cwhite: [C:03+2] logstash: move page-analytics and ratelimit webrequest log filters [puppet] - 10https://gerrit.wikimedia.org/r/1298907 (https://phabricator.wikimedia.org/T390215) (owner: 10Cwhite) [21:20:39] (03PS1) 10BCornwall: Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [21:22:44] (03CR) 10CI reject: [V:04-1] Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 (owner: 10BCornwall) [21:24:14] (03CR) 10CDobbins: trying out `alias` to get rid of redundancy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1297769 (owner: 10CDobbins) [21:29:08] (03PS1) 10BCornwall: This repo and this ci makes me sad [alerts] - 10https://gerrit.wikimedia.org/r/1298911 [21:32:30] (03Abandoned) 10BCornwall: This repo and this ci makes me sad [alerts] - 10https://gerrit.wikimedia.org/r/1298911 (owner: 10BCornwall) [21:33:05] (03PS1) 10Reedy: CommonSettings: Set $wgScoreSafeMode = false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298915 (https://phabricator.wikimedia.org/T428484) [21:33:14] (03PS2) 10BCornwall: Rewrite 
VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [21:35:00] (03CR) 10CI reject: [V:04-1] Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 (owner: 10BCornwall) [21:38:12] (03PS3) 10BCornwall: Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [21:40:09] (03CR) 10CI reject: [V:04-1] Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 (owner: 10BCornwall) [21:42:11] (03CR) 10Jforrester: [C:03+1] CommonSettings: Set $wgScoreSafeMode = false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298915 (https://phabricator.wikimedia.org/T428484) (owner: 10Reedy) [21:46:41] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#11996515 (10KFrancis) Hi @RLazarus, yes Mahmoud's NDA is on file. Thanks! [21:47:10] (03PS4) 10BCornwall: Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [21:48:54] (03CR) 10CI reject: [V:04-1] Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 (owner: 10BCornwall) [21:50:45] jouncebot: nowandnext [21:50:45] For the next 1 hour(s) and 9 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T2100) [21:50:45] In 1 hour(s) and 9 minute(s): Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T2300) [21:50:50] oh, perfect [21:50:57] (03PS5) 10BCornwall: Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [21:51:02] (03CR) 10Reedy: [C:03+2] CommonSettings: Set $wgScoreSafeMode = false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298915 (https://phabricator.wikimedia.org/T428484) (owner: 10Reedy) [21:52:19] 
(03Merged) 10jenkins-bot: CommonSettings: Set $wgScoreSafeMode = false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298915 (https://phabricator.wikimedia.org/T428484) (owner: 10Reedy) [21:52:41] (03CR) 10CI reject: [V:04-1] Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 (owner: 10BCornwall) [21:52:59] !log reedy@deploy1003 Started scap sync-world: Backport for [[gerrit:1298915|CommonSettings: Set $wgScoreSafeMode = false (T428484)]] [21:53:03] T428484: Including Lilypond notation with the "score" tag results in error message "Safe mode has been removed from LilyPond as of version 2.23.12." - https://phabricator.wikimedia.org/T428484 [21:53:17] (03PS6) 10BCornwall: Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [21:54:47] !log reedy@deploy1003 reedy: Backport for [[gerrit:1298915|CommonSettings: Set $wgScoreSafeMode = false (T428484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:56:27] !log reedy@deploy1003 reedy: Continuing with deployment [21:58:35] 06SRE, 10SRE-Access-Requests, 06Data-Engineering: Requesting access to for - https://phabricator.wikimedia.org/T427553#11996540 (10RLazarus) Hi @APDube-WMF! I see you provided an SSH key on the task, but if Superset access is all you need, we won't actually need it. I'll set you up wi... 
[21:59:12] (03CR) 10Effie Mouzeli: [C:03+1] admin: Grant access to cassandra-staging-devs for akhatun [puppet] - 10https://gerrit.wikimedia.org/r/1298898 (https://phabricator.wikimedia.org/T427701) (owner: 10RLazarus) [21:59:52] (03CR) 10CI reject: [V:04-1] Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 (owner: 10BCornwall) [22:00:41] !log reedy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1298915|CommonSettings: Set $wgScoreSafeMode = false (T428484)]] (duration: 07m 42s) [22:00:46] T428484: Including Lilypond notation with the "score" tag results in error message "Safe mode has been removed from LilyPond as of version 2.23.12." - https://phabricator.wikimedia.org/T428484 [22:00:55] (03PS1) 10RLazarus: admin: Add apdube to analytics-private-datausers [puppet] - 10https://gerrit.wikimedia.org/r/1298924 (https://phabricator.wikimedia.org/T427553) [22:01:25] (03PS7) 10BCornwall: Rewrite VarnishHighThreadCount to trigger less [alerts] - 10https://gerrit.wikimedia.org/r/1298909 [22:01:48] (03CR) 10RLazarus: [C:03+2] admin: Grant access to cassandra-staging-devs for akhatun [puppet] - 10https://gerrit.wikimedia.org/r/1298898 (https://phabricator.wikimedia.org/T427701) (owner: 10RLazarus) [22:03:23] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11996603 (10RLazarus) 05Open→03Resolved a:03RLazarus This is done! Wait up to 30 minutes for it to propagate to all hosts, and you'll be all set. Resolvin... [22:11:11] (03PS2) 10C. Scott Ananian: Move ::getFragmentsToTransform() to Content{Text,DOM}TransformStage [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298926 [22:11:46] (03PS1) 10C. 
Scott Ananian: OutputTransform: Rename DeduplicateStyles and ExpandToAbsoluteUrls stages [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298927 [22:11:54] PROBLEM - zuul_merger_service_running on contint2002 is CRITICAL: PROCS CRITICAL: 2 processes with regex args bin/zuul-merger https://www.mediawiki.org/wiki/Continuous_integration/Zuul [22:12:54] RECOVERY - zuul_merger_service_running on contint2002 is OK: PROCS OK: 1 process with regex args bin/zuul-merger https://www.mediawiki.org/wiki/Continuous_integration/Zuul [22:15:43] (03PS1) 10Reedy: CommonSettings: Set $wgScoreUseSvg = true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298928 (https://phabricator.wikimedia.org/T49578) [22:16:36] (03PS3) 10C. Scott Ananian: Move ::getFragmentsToTransform() to Content{Text,DOM}TransformStage [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298926 [22:16:36] (03PS2) 10C. Scott Ananian: OutputTransform: Rename DeduplicateStyles and ExpandToAbsoluteUrls stages [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298927 [22:16:38] (03PS1) 10C. Scott Ananian: Simplify fragment processing [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298929 (https://phabricator.wikimedia.org/T423700) [22:17:32] 06SRE, 10SRE-Access-Requests: Requesting access to "analytics-privatedata-users" for Mahmoud Abdelsattar (WMDE) - https://phabricator.wikimedia.org/T428416#11996629 (10RLazarus) [22:20:34] (03PS2) 10C. Scott Ananian: Simplify fragment processing [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298929 (https://phabricator.wikimedia.org/T423700) [22:20:34] (03PS4) 10C. Scott Ananian: Move ::getFragmentsToTransform() to Content{Text,DOM}TransformStage [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298926 [22:20:36] (03PS3) 10C. 
Scott Ananian: OutputTransform: Rename DeduplicateStyles and ExpandToAbsoluteUrls stages [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298927 [22:21:41] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298929 (https://phabricator.wikimedia.org/T423700) (owner: 10C. Scott Ananian) [22:22:03] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298926 (owner: 10C. Scott Ananian) [22:22:13] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298927 (owner: 10C. Scott Ananian) [22:22:23] (03PS3) 10C. Scott Ananian: Reset DeduplicateStyles state between different pipeline executions [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298925 (https://phabricator.wikimedia.org/T428336) [22:22:39] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, June 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.47.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1298925 (https://phabricator.wikimedia.org/T428336) (owner: 10C. 
Scott Ananian) [22:29:10] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to Cassandra staging for akhatun - https://phabricator.wikimedia.org/T427701#11996657 (10AKhatun_WMF) [22:44:22] (03PS1) 10Dzahn: gerrit: ensure error_log.json, sshd_log.json are always shipped to ELK [puppet] - 10https://gerrit.wikimedia.org/r/1298931 (https://phabricator.wikimedia.org/T425667) [22:46:35] (03PS1) 10Dzahn: gerrit: adjust path to gc_log [puppet] - 10https://gerrit.wikimedia.org/r/1298932 (https://phabricator.wikimedia.org/T425667) [22:49:36] (03PS1) 10Eevans: cassandra: aqsloader GRANT for linked_artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1298934 (https://phabricator.wikimedia.org/T428218) [22:50:17] (03PS1) 10Dzahn: gerrit: flip direction of symlink for log directories [puppet] - 10https://gerrit.wikimedia.org/r/1298938 (https://phabricator.wikimedia.org/T425667) [22:51:04] (03CR) 10CI reject: [V:04-1] gerrit: flip direction of symlink for log directories [puppet] - 10https://gerrit.wikimedia.org/r/1298938 (https://phabricator.wikimedia.org/T425667) (owner: 10Dzahn) [22:51:49] (03PS2) 10Eevans: cassandra: aqsloader GRANT for linked_artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1298934 (https://phabricator.wikimedia.org/T428218) [22:53:10] (03CR) 10Eevans: [C:03+2] cassandra: aqsloader GRANT for linked_artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1298934 (https://phabricator.wikimedia.org/T428218) (owner: 10Eevans) [22:54:04] (03PS1) 10Dzahn: gerrit: move httpd logs to $site_path/logs [puppet] - 10https://gerrit.wikimedia.org/r/1298939 (https://phabricator.wikimedia.org/T425667) [23:00:04] Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260608T2300) [23:22:16] (03PS11) 10JHathaway: redfish: improve add_account with AccountTypes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293593 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [23:23:51] (03PS12) 
10JHathaway: redfish: improve add_account with AccountTypes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293593 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [23:27:51] (03CR) 10JHathaway: "@ltoscano@wikimedia.org, while reviewing and testing the code, I had an alternative idea, let me know if you think it is worse or better. " [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293593 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [23:28:41] (03CR) 10CI reject: [V:04-1] redfish: improve add_account with AccountTypes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1293593 (https://phabricator.wikimedia.org/T426180) (owner: 10Elukey) [23:34:11] (03PS1) 10Mstyles: ReauthenticateForActions: Add new config var [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1298944 (https://phabricator.wikimedia.org/T427947) [23:39:46] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1298945 [23:39:46] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1298945 (owner: 10TrainBranchBot) [23:51:29] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1298945 (owner: 10TrainBranchBot)