[00:04:26] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:15:07] (Abandoned) Krinkle: Use Request-Timeout header to set jobrunner PHP timeouts [mediawiki-config] - https://gerrit.wikimedia.org/r/577642 (https://phabricator.wikimedia.org/T247114) (owner: Ppchelko)
[00:33:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88056 and previous config saved to /var/cache/conftool/dbconfig/20260129-003310-marostegui.json
[00:33:16] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[00:40:22] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550
[00:40:23] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550 (owner: TrainBranchBot)
[00:48:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P88057 and previous config saved to /var/cache/conftool/dbconfig/20260129-004818-marostegui.json
[00:53:47] (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550 (owner: TrainBranchBot)
[01:03:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P88058 and previous config saved to /var/cache/conftool/dbconfig/20260129-010327-marostegui.json
[01:10:31] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553
[01:10:31] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553 (owner: TrainBranchBot)
[01:18:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88059 and previous config saved to /var/cache/conftool/dbconfig/20260129-011836-marostegui.json
[01:18:42] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[01:18:52] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2224.codfw.wmnet with reason: Maintenance
[01:19:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88060 and previous config saved to /var/cache/conftool/dbconfig/20260129-011900-marostegui.json
[01:29:54] PROBLEM - MariaDB Replica Lag: m2 on db1217 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 2231.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:31:54] RECOVERY - MariaDB Replica Lag: m2 on db1217 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:36:11] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553 (owner: TrainBranchBot)
[01:41:00] PROBLEM - MariaDB Replica Lag: m2 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 647.38 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:44:02] RECOVERY - MariaDB Replica Lag: m2 on db2160 is OK: OK slave_sql_lag Replication lag: 30.65 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:01:00] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[02:13:44] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 44s)
[02:14:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88061 and previous config saved to /var/cache/conftool/dbconfig/20260129-021418-marostegui.json
[02:14:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[02:29:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P88062 and previous config saved to /var/cache/conftool/dbconfig/20260129-022926-marostegui.json
[02:44:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P88063 and previous config saved to /var/cache/conftool/dbconfig/20260129-024435-marostegui.json
[02:59:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88064 and previous config saved to /var/cache/conftool/dbconfig/20260129-025943-marostegui.json
[02:59:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[03:00:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2229.codfw.wmnet with reason: Maintenance
[03:00:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88065 and previous config saved to /var/cache/conftool/dbconfig/20260129-030008-marostegui.json
[03:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:39:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:49:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88066 and previous config saved to /var/cache/conftool/dbconfig/20260129-034917-marostegui.json
[03:49:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:02:56] (PS6) Ryan Kemper: opensearch-semantic-search: provision namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:02:56] (PS2) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:04:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P88067 and previous config saved to /var/cache/conftool/dbconfig/20260129-040426-marostegui.json
[04:04:41] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:08:34] (PS7) Ryan Kemper: opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:08:34] (PS3) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:08:34] (PS1) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702)
[04:08:36] (PS1) Ryan Kemper: opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691)
[04:18:37] (PS8) Ryan Kemper: opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:18:37] (PS4) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:18:37] (PS2) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702)
[04:18:38] (PS2) Ryan Kemper: opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691)
[04:19:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P88068 and previous config saved to /var/cache/conftool/dbconfig/20260129-041934-marostegui.json
[04:24:08] (CR) Ryan Kemper: "Should be ready for final review now" [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[04:34:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88069 and previous config saved to /var/cache/conftool/dbconfig/20260129-043443-marostegui.json
[04:34:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:48:06] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl2002.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[04:49:06] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[05:09:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:14:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:18:46] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:21:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:22:36] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:33:14] PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.096e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[05:34:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:42:42] (CR) Ryan Kemper: Replace elasticsearch lib w/ spicerack APIClient (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860) (owner: Ryan Kemper)
[05:49:05] (PS8) Ryan Kemper: hadoop.reboot-workers: make host override smarter [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568)
[06:03:30] (Abandoned) Ryan Kemper: wdqs: Add new endpoints to allowlist [puppet] - https://gerrit.wikimedia.org/r/1201296 (https://phabricator.wikimedia.org/T407407) (owner: Bking)
[06:06:37] (Abandoned) Ryan Kemper: flink-kubernetes-operator: change flink download URL [docker-images/production-images] - https://gerrit.wikimedia.org/r/1008534 (https://phabricator.wikimedia.org/T358879) (owner: Bking)
[06:16:32] (PS3) Bking: wdqs-categories: enable scrapes for jmx exporter [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236)
[06:16:50] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236) (owner: Bking)
[06:17:14] RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[06:17:18] (CR) Ryan Kemper: "addressed by ps2" [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236) (owner: Bking)
[06:22:57] (Abandoned) Ryan Kemper: elasticsearch: move to opensearch client [software/spicerack] - https://gerrit.wikimedia.org/r/966492 (https://phabricator.wikimedia.org/T345337) (owner: David Caro)
[06:35:37] (PS1) Marostegui: Revert "db2212: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234780
[06:35:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db2212: After schema change
[06:36:41] (CR) Marostegui: [C:+2] Revert "db2212: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234780 (owner: Marostegui)
[06:38:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
[06:38:13] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[06:38:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88071 and previous config saved to /var/cache/conftool/dbconfig/20260129-063813-marostegui.json
[06:38:21] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[06:38:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88072 and previous config saved to /var/cache/conftool/dbconfig/20260129-063820-marostegui.json
[06:52:08] (PS1) Gerrit maintenance bot: mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234787 (https://phabricator.wikimedia.org/T415861)
[06:52:38] (PS1) Gerrit maintenance bot: mariadb: Promote db2229 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234788 (https://phabricator.wikimedia.org/T415862)
[06:52:45] (PS1) Gerrit maintenance bot: wmnet: Update s6-master alias [dns] - https://gerrit.wikimedia.org/r/1234789 (https://phabricator.wikimedia.org/T415862)
[06:55:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1173 with weight 0 T415861', diff saved to https://phabricator.wikimedia.org/P88074 and previous config saved to /var/cache/conftool/dbconfig/20260129-065528-marostegui.json
[06:55:38] T415861: Switchover s6 master (db1201 -> db1173) - https://phabricator.wikimedia.org/T415861
[06:55:40] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s6 T415861
[06:56:02] (CR) Marostegui: [C:+2] mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234787 (https://phabricator.wikimedia.org/T415861) (owner: Gerrit maintenance bot)
[06:57:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1173 to s6 primary T415861', diff saved to https://phabricator.wikimedia.org/P88075 and previous config saved to /var/cache/conftool/dbconfig/20260129-065753-marostegui.json
[06:58:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1201 T415861', diff saved to https://phabricator.wikimedia.org/P88076 and previous config saved to /var/cache/conftool/dbconfig/20260129-065838-marostegui.json
[06:58:48] !log Starting s6 eqiad failover from db1201 to db1173 - T415861
[06:58:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:36] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1201.eqiad.wmnet with reason: Schema change on db1201
[07:00:04] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0700)
[07:00:04] marostegui, Amir1, and federico3: #bothumor My software never has bugs. It just develops random features. Rise for Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0700).
[07:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[07:17:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88078 and previous config saved to /var/cache/conftool/dbconfig/20260129-071724-marostegui.json
[07:17:31] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[07:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:21:13] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2212: After schema change
[07:32:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P88080 and previous config saved to /var/cache/conftool/dbconfig/20260129-073232-marostegui.json
[07:41:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88081 and previous config saved to /var/cache/conftool/dbconfig/20260129-074130-marostegui.json
[07:41:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[07:47:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P88082 and previous config saved to /var/cache/conftool/dbconfig/20260129-074742-marostegui.json
[07:53:30] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565014 (KartikMistry) I'm still debugging, and probably best way to check with reverting original memory allocation. Patch is co...
[07:54:04] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565016 (KartikMistry) Open→In progress
[07:54:38] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565017 (KartikMistry) a:KartikMistry
[07:56:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P88083 and previous config saved to /var/cache/conftool/dbconfig/20260129-075639-marostegui.json
[07:57:31] Hey folks! My name is Charlie and I found my way here from the "Get involved" page on WikiTech. I just moved on from 13 years with Puppet Labs as a tech lead on their support team. It looks like you folks are using Puppet for this and that and I would love to put my experience to work volunteering if there's anything I could help with.
[07:58:19] Also, if anyone happens to be in Belgium this weekend for Fosdem or CfgMgmtCamp next week, I would love to say hi!
[07:59:08] (CR) Brouberol: [C:+1] opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[07:59:45] (CR) Brouberol: [C:+1] opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[07:59:57] (CR) Brouberol: [C:+1] opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[08:00:05] Amir1, Urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0800).
[08:00:05] No Gerrit patches in the queue for this window AFAICS.
[08:00:12] (CR) Brouberol: [C:+1] opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[08:01:37] csharpsteen: you'll probably have more of a chance of not getting lost in noise in #wikimedia-sre
[08:02:23] Awesome. Thanks for the pointer!
[08:02:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88084 and previous config saved to /var/cache/conftool/dbconfig/20260129-080251-marostegui.json
[08:02:58] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:03:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[08:03:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:03:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88085 and previous config saved to /var/cache/conftool/dbconfig/20260129-080327-marostegui.json
[08:04:26] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P88086 and previous config saved to /var/cache/conftool/dbconfig/20260129-081148-marostegui.json
[08:26:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88087 and previous config saved to /var/cache/conftool/dbconfig/20260129-082656-marostegui.json
[08:27:00] I have a patch to backport
[08:27:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:27:14] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
[08:27:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88088 and previous config saved to /var/cache/conftool/dbconfig/20260129-082722-marostegui.json
[08:30:37] (PS1) Kosta Harlan: BlockUtils: Remove x-provenance [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354)
[08:30:50] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:31:52] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:32:54] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:35:16] (Merged) jenkins-bot: BlockUtils: Remove x-provenance [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:36:33] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]]
[08:36:38] T415354: Record CDN/Backend api values in editattemptsblocked schema - https://phabricator.wikimedia.org/T415354
[08:38:55] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:40:04] !log kharlan@deploy2002 kharlan: Continuing with sync
[08:42:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88089 and previous config saved to /var/cache/conftool/dbconfig/20260129-084216-marostegui.json
[08:42:22] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:44:18] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]] (duration: 07m 45s)
[08:44:23] T415354: Record CDN/Backend api values in editattemptsblocked schema - https://phabricator.wikimedia.org/T415354
[08:57:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P88091 and previous config saved to /var/cache/conftool/dbconfig/20260129-085724-marostegui.json
[09:00:05] brennen and andre: Time to do the MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0900).
[09:00:25] nah
[09:05:53] SRE, SRE-Access-Requests: Requesting access to deployment for trueg - https://phabricator.wikimedia.org/T415632#11565160 (DSantamaria) Approved!
[09:06:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88092 and previous config saved to /var/cache/conftool/dbconfig/20260129-090628-marostegui.json [09:06:35] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:12:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P88093 and previous config saved to /var/cache/conftool/dbconfig/20260129-091232-marostegui.json [09:21:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P88094 and previous config saved to /var/cache/conftool/dbconfig/20260129-092135-marostegui.json [09:27:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88095 and previous config saved to /var/cache/conftool/dbconfig/20260129-092741-marostegui.json [09:27:50] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:27:58] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance [09:28:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88096 and previous config saved to /var/cache/conftool/dbconfig/20260129-092806-marostegui.json [09:30:05] (03PS1) 10Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 [09:32:11] (03PS2) 10Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [09:34:15] FIRING: JobUnavailable: Reduced availability for job 
thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [09:36:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P88097 and previous config saved to /var/cache/conftool/dbconfig/20260129-093644-marostegui.json [09:41:16] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1193 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1234935 (https://phabricator.wikimedia.org/T415879) [09:42:46] (03PS1) 10Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234940 [09:51:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88098 and previous config saved to /var/cache/conftool/dbconfig/20260129-095151-marostegui.json [09:51:59] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:52:08] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance [09:52:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88099 and previous config saved to /var/cache/conftool/dbconfig/20260129-095216-marostegui.json [10:01:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88100 and previous config saved to /var/cache/conftool/dbconfig/20260129-100158-marostegui.json [10:02:04] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:02:30] hi everyone! 
can someone help me with this problem https://phabricator.wikimedia.org/T415876 ? on it.wiki the recent deploy broke the modules/templates that handle datetime, triggered by a faulty localization on translatewiki. the changes were reverted but we don't want to wait another week to fix the problem. thanks! [10:17:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P88101 and previous config saved to /var/cache/conftool/dbconfig/20260129-101706-marostegui.json [10:17:22] (03PS3) 10Jgiannelos: Remove duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [10:18:06] (03PS4) 10Jgiannelos: Remove duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [10:28:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88103 and previous config saved to /var/cache/conftool/dbconfig/20260129-102834-marostegui.json [10:28:40] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:32:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P88104 and previous config saved to /var/cache/conftool/dbconfig/20260129-103215-marostegui.json [10:43:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P88105 and previous config saved to /var/cache/conftool/dbconfig/20260129-104343-marostegui.json [10:47:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88106 and previous config saved to 
/var/cache/conftool/dbconfig/20260129-104723-marostegui.json [10:47:33] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:47:40] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance [10:47:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88107 and previous config saved to /var/cache/conftool/dbconfig/20260129-104748-marostegui.json [10:58:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P88108 and previous config saved to /var/cache/conftool/dbconfig/20260129-105851-marostegui.json [11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1100) [11:01:17] FIRING: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [11:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [11:04:41] !log root@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None [11:14:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88109 and previous config saved to /var/cache/conftool/dbconfig/20260129-111359-marostegui.json 
[11:14:06] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:14:17] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance [11:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:21:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88110 and previous config saved to /var/cache/conftool/dbconfig/20260129-112137-marostegui.json [11:21:45] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:24:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88111 and previous config saved to /var/cache/conftool/dbconfig/20260129-112437-marostegui.json [11:24:47] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [11:24:47] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [11:34:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P88112 and previous config saved to /var/cache/conftool/dbconfig/20260129-113446-marostegui.json [11:36:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P88113 and previous config saved to /var/cache/conftool/dbconfig/20260129-113645-marostegui.json [11:44:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff 
saved to https://phabricator.wikimedia.org/P88114 and previous config saved to /var/cache/conftool/dbconfig/20260129-114455-marostegui.json [11:46:53] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2211.codfw.wmnet with reason: Maintenance [11:47:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2211 (T415786)', diff saved to https://phabricator.wikimedia.org/P88115 and previous config saved to /var/cache/conftool/dbconfig/20260129-114701-marostegui.json [11:47:07] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:51:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P88116 and previous config saved to /var/cache/conftool/dbconfig/20260129-115154-marostegui.json [11:55:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88117 and previous config saved to /var/cache/conftool/dbconfig/20260129-115503-marostegui.json [11:55:12] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [11:55:13] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [11:55:21] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [12:04:41] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:06:40] (03PS1) 10Jelto: gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 [12:07:03] !log marostegui@cumin1003 dbctl commit (dc=all): 
'Repooling after maintenance db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88118 and previous config saved to /var/cache/conftool/dbconfig/20260129-120702-marostegui.json [12:07:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:07:10] (03CR) 10CI reject: [V:04-1] gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 (owner: 10Jelto) [12:07:19] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance [12:07:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88119 and previous config saved to /var/cache/conftool/dbconfig/20260129-120727-marostegui.json [12:08:07] (03PS2) 10Jelto: gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 [12:11:18] (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7960/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234984 (owner: 10Jelto) [12:15:14] (03PS3) 10Jelto: gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 [12:17:52] (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7961/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234984 (owner: 10Jelto) [12:21:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211 (T415786)', diff saved to https://phabricator.wikimedia.org/P88120 and previous config saved to /var/cache/conftool/dbconfig/20260129-122138-marostegui.json [12:21:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router 
interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [12:36:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88121 and previous config saved to /var/cache/conftool/dbconfig/20260129-123630-marostegui.json [12:36:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:36:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P88122 and previous config saved to /var/cache/conftool/dbconfig/20260129-123647-marostegui.json [12:51:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P88123 and previous config saved to /var/cache/conftool/dbconfig/20260129-125138-marostegui.json [12:51:47] RESOLVED: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:51:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P88124 and previous config saved to /var/cache/conftool/dbconfig/20260129-125157-marostegui.json [12:53:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1193 with weight 0 T415879', diff saved to https://phabricator.wikimedia.org/P88125 and previous config saved to /var/cache/conftool/dbconfig/20260129-125312-marostegui.json [12:53:23] T415879: Switchover s8 master (db1209 -> 
db1193) - https://phabricator.wikimedia.org/T415879 [12:53:28] (03CR) 10Marostegui: [C:03+2] mariadb: Promote db1193 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1234935 (https://phabricator.wikimedia.org/T415879) (owner: 10Gerrit maintenance bot) [12:53:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T415879 [12:56:34] !log Starting s8 eqiad failover from db1209 to db1193 - T415879 [12:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1193 to s8 primary T415879', diff saved to https://phabricator.wikimedia.org/P88126 and previous config saved to /var/cache/conftool/dbconfig/20260129-125700-marostegui.json [12:57:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1209 T415879', diff saved to https://phabricator.wikimedia.org/P88127 and previous config saved to /var/cache/conftool/dbconfig/20260129-125739-marostegui.json [12:58:51] (03PS2) 10Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod [puppet] - 10https://gerrit.wikimedia.org/r/1227851 (https://phabricator.wikimedia.org/T414112) [13:00:05] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1300) [13:00:21] (03PS1) 10Marostegui: db1209: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1235001 [13:00:48] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1209.eqiad.wmnet with reason: long schema change on db1209 [13:00:59] (03CR) 10Marostegui: [C:03+2] db1209: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1235001 (owner: 10Marostegui) [13:02:07] (03PS3) 10Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod [puppet] - 10https://gerrit.wikimedia.org/r/1227851 
(https://phabricator.wikimedia.org/T414112) [13:02:50] !log Run schema change on old s8 eqiad master (db1209) T411164 T411163 [13:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:58] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [13:02:58] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [13:06:04] (03CR) 10Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1227851 (https://phabricator.wikimedia.org/T414112) (owner: 10Federico Ceratto) [13:06:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P88129 and previous config saved to /var/cache/conftool/dbconfig/20260129-130646-marostegui.json [13:07:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211 (T415786)', diff saved to https://phabricator.wikimedia.org/P88130 and previous config saved to /var/cache/conftool/dbconfig/20260129-130705-marostegui.json [13:07:11] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:07:24] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2213.codfw.wmnet with reason: Maintenance [13:07:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2213 (T415786)', diff saved to https://phabricator.wikimedia.org/P88131 and previous config saved to /var/cache/conftool/dbconfig/20260129-130731-marostegui.json [13:21:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88132 and previous config saved to /var/cache/conftool/dbconfig/20260129-132154-marostegui.json [13:22:04] T415786: Update imagelinks primary key on wmf production - 
https://phabricator.wikimedia.org/T415786 [13:22:15] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance [13:27:54] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1008.eqiad.wmnet with reason: long schema change [13:28:57] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1245.eqiad.wmnet with reason: long schema change [13:34:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [13:41:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213 (T415786)', diff saved to https://phabricator.wikimedia.org/P88133 and previous config saved to /var/cache/conftool/dbconfig/20260129-134153-marostegui.json [13:42:00] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:48:20] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1230 to s5 master [puppet] - 10https://gerrit.wikimedia.org/r/1235032 (https://phabricator.wikimedia.org/T415893) [13:54:54] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance [13:55:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1230 (T415786)', diff saved to https://phabricator.wikimedia.org/P88134 and previous config saved to /var/cache/conftool/dbconfig/20260129-135501-marostegui.json [13:55:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:57:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to 
https://phabricator.wikimedia.org/P88135 and previous config saved to /var/cache/conftool/dbconfig/20260129-135702-marostegui.json [14:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1400). [14:00:05] No Gerrit patches in the queue for this window AFAICS. [14:00:32] o/ [14:00:40] Lucas_WMDE: if I got a patch together, do you think you'd be willing/able to deploy the backport requested in T415876? [14:00:41] T415876: Revert datetime it localization on it.wiki - https://phabricator.wikimedia.org/T415876 [14:00:53] (would just be reverting the changes to datetime/it.json IIUC) [14:01:23] can’t they override that in the MediaWiki: namespace to unbreak their wiki? [14:01:25] but yeah sure [14:01:38] ty :] [14:01:59] given that the changes seem to have been reverted on twn already [14:02:02] i think they have actually (which might make the backport a bit hard to test...) 
but I guess they might want to stop doing that sooner rather than later [14:02:14] (re: MediaWiki: namespace changes) [14:02:19] (03PS1) 10Kareid: Test Kitchen UI: Deploy v.1.1.7 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235033 (https://phabricator.wikimedia.org/T415325) [14:02:28] ack [14:02:45] ETA 2 mins on patch [14:04:26] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:04:30] (03PS1) 10A smart kitten: Update Italian datetime messages (from https://translatewiki.net) [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) [14:05:19] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:05:20] * Lucas_WMDE opens deployment calendar, misreads Test Kitchen as Test Kitten [14:05:42] lets rename it again! [14:05:50] rename ALL the things! 
[14:07:12] patch should be ready now; i guess, given the CI success caching, it might be quicker to actually wait for the main test to pass before +2ing it for deployment [14:07:14] * Lucas_WMDE reviews the messages on twn [14:08:46] !log Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/fixFunctionTesterImplementationIssues.php --wiki=wikifunctionswiki` for T399934 [14:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:50] T399934: tests moved to a different function still show on implementations of the original - https://phabricator.wikimedia.org/T399934 [14:09:28] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "I checked on TWN that all of these messages match the current on-wiki revision." [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:09:39] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:09:45] * Lucas_WMDE checks wmf.12 [14:10:21] ok everything’s still lowercase there, no further backport needed [14:10:28] (I guess otherwise they would’ve noticed the error sooner anyway) [14:11:10] Yeah, FWICS https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1229450 was first included in wmf.13 [14:12:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P88137 and previous config saved to /var/cache/conftool/dbconfig/20260129-141210-marostegui.json [14:13:01] hm, https://it.wikipedia.org/w/index.php?title=MediaWiki:January&action=history and https://it.wikipedia.org/w/index.php?title=MediaWiki:Jan&action=history both exist since 2005 [14:13:17] ok but https://it.wikipedia.org/wiki/MediaWiki:Monday doesn’t [14:13:33] they were undeleted today FWICS 
[14:13:36] (I also got caught out by that) [14:13:41] ah, I couldn’t see that ^^ [14:14:17] oh, I just realized this will be one of those super long deployments /o\ [14:14:20] due to the touched l10n cache [14:14:23] but no way around it [14:15:02] oh yeah, apologies :/ (and apologies also for the VERY late-notice request) [14:18:53] RESOLVED: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:19:15] RESOLVED: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:22:54] (FWIW, my perspective on the request that was made in T415876 is something like: the i18n change being reverted here (that's been objected to by folks on the wiki) is a very recent one, AFAICS (from looking at TWN history pages) the previous versions of those messages have been that way since at least 2008, and the updated message would presumably be deployed by next-week's train anyway) [14:22:55] T415876: Revert datetime it localization on it.wiki - https://phabricator.wikimedia.org/T415876 [14:23:22] (03Merged) 10jenkins-bot: Update Italian datetime messages (from https://translatewiki.net) [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:23:56] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1235035|Update Italian datetime messages (from https://translatewiki.net) (T415876)]] [14:24:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1230 (T415786)', diff saved to 
https://phabricator.wikimedia.org/P88138 and previous config saved to /var/cache/conftool/dbconfig/20260129-142428-marostegui.json [14:24:35] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [14:26:24] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, asmartkitten: Backport for [[gerrit:1235035|Update Italian datetime messages (from https://translatewiki.net) (T415876)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:26:41] looking [14:26:45] thanks [14:27:04] and yeah IMHO an out-of-cadence sync with TWN is fine; if the messages hadn’t been reverted on TWN already then I’d be more hesitant about the deploy [14:27:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1230 with weight 0 T415893', diff saved to https://phabricator.wikimedia.org/P88139 and previous config saved to /var/cache/conftool/dbconfig/20260129-142716-marostegui.json [14:27:22] T415893: Switchover s5 master (db1210 -> db1230) - https://phabricator.wikimedia.org/T415893 [14:27:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213 (T415786)', diff saved to https://phabricator.wikimedia.org/P88140 and previous config saved to /var/cache/conftool/dbconfig/20260129-142725-marostegui.json [14:27:32] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2223.codfw.wmnet with reason: Maintenance [14:27:38] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T415893 [14:27:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2223 (T415786)', diff saved to https://phabricator.wikimedia.org/P88141 and previous config saved to /var/cache/conftool/dbconfig/20260129-142740-marostegui.json [14:27:48] (03CR) 10Marostegui: [C:03+2] mariadb: Promote db1230 to s5 master [puppet] - 
10https://gerrit.wikimedia.org/r/1235032 (https://phabricator.wikimedia.org/T415893) (owner: 10Gerrit maintenance bot) [14:27:53] well, https://it.wikivoyage.org/wiki/MediaWiki:January has the new lowercase-first version of the message on mwdebug [14:28:34] and so as I might not be able to test on itwiki itself (as they've overridden their messages temporarily), I'd personally call that okay I think [14:29:06] !log Starting s5 eqiad failover from db1210 to db1230 - T415893 [14:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1230 to s5 primary T415893', diff saved to https://phabricator.wikimedia.org/P88142 and previous config saved to /var/cache/conftool/dbconfig/20260129-142953-marostegui.json [14:30:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1210 T415893', diff saved to https://phabricator.wikimedia.org/P88143 and previous config saved to /var/cache/conftool/dbconfig/20260129-143030-marostegui.json [14:31:34] Lucas_WMDE: ^ [14:31:59] sorry, got distracted for a second [14:32:07] no worries :) [14:32:12] so long as everything seems okay your end [14:32:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1210.eqiad.wmnet with reason: Long schema change [14:32:26] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, asmartkitten: Continuing with sync [14:32:29] yeah let’s go [14:35:11] FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:36:35] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235035|Update Italian datetime messages (from https://translatewiki.net) (T415876)]] (duration: 12m 
39s) [14:36:41] :o [14:36:42] T415876: Revert datetime it localization on it.wiki - https://phabricator.wikimedia.org/T415876 [14:36:47] that’s a lot faster than I expected [14:37:04] same here! [14:37:12] “17 languages rebuilt out of 545” [14:37:26] I wonder if it was faster because it didn’t affect en.json, and therefore didn’t change the many languages that end up copying the English message [14:38:24] thanks again for deploying Lucas_WMDE :) I will try and give a fair amount more notice the next time I ask if something can be deployed... [14:38:33] np, thanks for the backport :) [14:38:37] !log UTC afternoon backport+config window done [14:38:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:15] RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:45:31] (03CR) 10Clare Ming: [C:03+2] "looks good !" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235033 (https://phabricator.wikimedia.org/T415325) (owner: 10Kareid) [14:47:17] (03Merged) 10jenkins-bot: Test Kitchen UI: Deploy v.1.1.7 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235033 (https://phabricator.wikimedia.org/T415325) (owner: 10Kareid) [15:01:13] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops, 06Infrastructure-Foundations: DHCP failing for at least 2 ms-be servers in codfw - https://phabricator.wikimedia.org/T415189#11566106 (10cmooney) >>! In T415189#11546923, @jhathaway wrote: > Perhaps there is a race condition with that script updating the... 
[15:01:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223 (T415786)', diff saved to https://phabricator.wikimedia.org/P88144 and previous config saved to /var/cache/conftool/dbconfig/20260129-150156-marostegui.json [15:02:05] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:08:37] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance [15:08:45] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [15:08:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88145 and previous config saved to /var/cache/conftool/dbconfig/20260129-150852-marostegui.json [15:09:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:09:15] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:53] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1210: After schema change [15:10:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1201: After schema change [15:17:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P88148 and previous config saved to /var/cache/conftool/dbconfig/20260129-151705-marostegui.json [15:18:01] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2213 to s5 master [puppet] - 10https://gerrit.wikimedia.org/r/1235054 (https://phabricator.wikimedia.org/T415900) 
[15:18:11] (03PS1) 10Gerrit maintenance bot: wmnet: Update s5-master alias [dns] - 10https://gerrit.wikimedia.org/r/1235055 (https://phabricator.wikimedia.org/T415900) [15:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:32:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P88151 and previous config saved to /var/cache/conftool/dbconfig/20260129-153215-marostegui.json [15:34:15] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:47:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223 (T415786)', diff saved to https://phabricator.wikimedia.org/P88154 and previous config saved to /var/cache/conftool/dbconfig/20260129-154725-marostegui.json [15:47:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:47:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2228.codfw.wmnet with reason: Maintenance [15:47:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2228 (T415786)', diff saved to https://phabricator.wikimedia.org/P88155 and previous config saved to /var/cache/conftool/dbconfig/20260129-154751-marostegui.json [15:55:24] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1210: After schema change [15:56:15] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1201: After schema change 
[16:17:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228 (T415786)', diff saved to https://phabricator.wikimedia.org/P88158 and previous config saved to /var/cache/conftool/dbconfig/20260129-161741-marostegui.json [16:17:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:18:18] (03PS1) 10Dzahn: admin/nagios/wmcs: offboard akosiaris [puppet] - 10https://gerrit.wikimedia.org/r/1235066 [16:32:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P88159 and previous config saved to /var/cache/conftool/dbconfig/20260129-163250-marostegui.json [16:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [16:33:38] (03PS1) 10Dzahn: cumin/insetup_role_report: send wmcs report to Mark and Levi, not Alex [puppet] - 10https://gerrit.wikimedia.org/r/1235071 [16:33:52] !log dancy@deploy2002 Installing scap version "4.240.0" for 2 host(s) [16:35:43] !log dancy@deploy2002 Installation of scap version "4.240.0" completed for 2 hosts [16:48:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P88160 and previous config saved to /var/cache/conftool/dbconfig/20260129-164800-marostegui.json [16:54:50] (03PS1) 10Bking: apt: mirror opensearch 2 and 3 repos in trixie-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/1235075 (https://phabricator.wikimedia.org/T415699) [16:57:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88161 and previous config saved to 
/var/cache/conftool/dbconfig/20260129-165701-marostegui.json [16:57:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:00:05] jhathaway and rzl: That opportune time for a Puppet request window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1700). [17:00:05] No Gerrit patches in the queue for this window AFAICS. [17:03:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228 (T415786)', diff saved to https://phabricator.wikimedia.org/P88162 and previous config saved to /var/cache/conftool/dbconfig/20260129-170308-marostegui.json [17:03:14] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:04:53] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance [17:05:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2148 (T415786)', diff saved to https://phabricator.wikimedia.org/P88163 and previous config saved to /var/cache/conftool/dbconfig/20260129-170501-marostegui.json [17:12:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P88164 and previous config saved to /var/cache/conftool/dbconfig/20260129-171210-marostegui.json [17:27:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P88166 and previous config saved to /var/cache/conftool/dbconfig/20260129-172718-marostegui.json [17:42:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88167 and previous config saved to /var/cache/conftool/dbconfig/20260129-174226-marostegui.json [17:42:33] T415786: Update imagelinks primary key on wmf production - 
https://phabricator.wikimedia.org/T415786 [17:42:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance [17:42:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1162 (T415786)', diff saved to https://phabricator.wikimedia.org/P88168 and previous config saved to /var/cache/conftool/dbconfig/20260129-174252-marostegui.json [17:52:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:00:05] bd808: gettimeofday() says it's time for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1800) [18:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1800) [18:00:43] I don't have anything to ship today [18:04:41] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:14:08] (03PS1) 10Jdlrobson: Revert "TypeaheadSearch: ensure fix for mobile keyboard is only applied when the search is using the mobile experience" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235088 (https://phabricator.wikimedia.org/T413378) [18:14:49] (03PS2) 10Jdlrobson: Revert "TypeaheadSearch: ensure fix for mobile keyboard is only applied when the search is using the mobile experience" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235088 (https://phabricator.wikimedia.org/T413378) [18:32:39] 06SRE, 06Traffic: All github action tests of Pywikibot fails due 
to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11567357 (10Lupascriptix) I've been having the same issue this week. My script doesn't edit wikidata at all, just pulls demographic information about various entit... [18:49:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T415786)', diff saved to https://phabricator.wikimedia.org/P88170 and previous config saved to /var/cache/conftool/dbconfig/20260129-184938-marostegui.json [18:49:46] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:53:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T415786)', diff saved to https://phabricator.wikimedia.org/P88171 and previous config saved to /var/cache/conftool/dbconfig/20260129-185300-marostegui.json [18:56:17] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235088 (https://phabricator.wikimedia.org/T413378) (owner: 10Jdlrobson) [19:00:05] brennen and andre: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - Utc-7+Utc-0 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1900). [19:04:24] !log 1.46.0-wmf.13 train status (T413804): logs look good, no current blockers. will roll to all wikis after i finish eating a sandwich. 
[19:04:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:30] T413804: 1.46.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T413804 [19:04:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P88172 and previous config saved to /var/cache/conftool/dbconfig/20260129-190446-marostegui.json [19:04:55] inquiring minds want to know for operational purposes: is it a good sandwich [19:05:38] i would say somewhat above average [19:06:07] nice [19:08:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P88173 and previous config saved to /var/cache/conftool/dbconfig/20260129-190810-marostegui.json [19:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [19:19:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P88174 and previous config saved to /var/cache/conftool/dbconfig/20260129-191955-marostegui.json [19:21:11] (03PS1) 10TrainBranchBot: group2 to 1.46.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235099 (https://phabricator.wikimedia.org/T413804) [19:21:14] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by brennen@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235099 (https://phabricator.wikimedia.org/T413804) (owner: 10TrainBranchBot) [19:22:40] (03Merged) 10jenkins-bot: group2 to 1.46.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235099 (https://phabricator.wikimedia.org/T413804) (owner: 10TrainBranchBot) [19:23:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after 
maintenance db2148', diff saved to https://phabricator.wikimedia.org/P88175 and previous config saved to /var/cache/conftool/dbconfig/20260129-192318-marostegui.json [19:28:56] !log brennen@deploy2002 rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.13 refs T413804 [19:29:03] T413804: 1.46.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T413804 [19:30:46] (03PS1) 10Stoyofuku-wmf: Revert "Fix sticky header TOC spacing and TOC disappearing on viewport change" [skins/Vector] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235102 [19:32:08] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [skins/Vector] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235102 (owner: 10Stoyofuku-wmf) [19:35:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T415786)', diff saved to https://phabricator.wikimedia.org/P88176 and previous config saved to /var/cache/conftool/dbconfig/20260129-193503-marostegui.json [19:35:10] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [19:35:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance [19:35:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88177 and previous config saved to /var/cache/conftool/dbconfig/20260129-193528-marostegui.json [19:38:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T415786)', diff saved to https://phabricator.wikimedia.org/P88178 and previous config saved to /var/cache/conftool/dbconfig/20260129-193829-marostegui.json [19:38:47] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with 
reason: Maintenance [19:38:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88179 and previous config saved to /var/cache/conftool/dbconfig/20260129-193855-marostegui.json [19:52:02] (03PS1) 10Bvibber: Update chart-renderer to 2026-01-29-153835-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235107 (https://phabricator.wikimedia.org/T411319) [19:52:57] any objection to my pulling a service update for chart-renderer in the above^ ? :D [19:54:54] (03CR) 10Kimberly Sarabia: [C:03+1] "Thank you!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235107 (https://phabricator.wikimedia.org/T411319) (owner: 10Bvibber) [19:56:30] (03CR) 10Bvibber: [C:03+2] "self +2'ing for deployment of service code update reviewed by my team" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235107 (https://phabricator.wikimedia.org/T411319) (owner: 10Bvibber) [19:58:26] (03Merged) 10jenkins-bot: Update chart-renderer to 2026-01-29-153835-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235107 (https://phabricator.wikimedia.org/T411319) (owner: 10Bvibber) [20:01:02] !log bvibber@deploy2002 helmfile [staging] START helmfile.d/services/chart-renderer: apply [20:01:40] !log bvibber@deploy2002 helmfile [staging] DONE helmfile.d/services/chart-renderer: apply [20:02:36] !log bvibber@deploy2002 helmfile [codfw] START helmfile.d/services/chart-renderer: apply [20:03:05] !log bvibber@deploy2002 helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply [20:03:34] !log bvibber@deploy2002 helmfile [eqiad] START helmfile.d/services/chart-renderer: apply [20:04:05] !log bvibber@deploy2002 helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply [20:06:51] (03PS1) 10Esanders: Enable suggestions BeatFeature on beta wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235111 (https://phabricator.wikimedia.org/T415504) [20:08:31] edsanders chose violence 
[20:13:37] (03PS2) 10Esanders: Enable suggestions BetaFeature on beta wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235111 (https://phabricator.wikimedia.org/T415504) [20:13:47] (03PS1) 10Ryan Kemper: elasticsearch_cluster: allow checking last reboot [software/spicerack] - 10https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) [20:14:00] (03PS1) 10Ryan Kemper: sre.elasticsearch.rolling-operation: use boottime for reboot operations [cookbooks] - 10https://gerrit.wikimedia.org/r/1235113 (https://phabricator.wikimedia.org/T410577) [20:27:26] 06SRE, 06Traffic: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11567834 (10Benwing2) FWIW I have not been having this issue recently with enwiktionary at least. My Pywikibot settings look like ` user_agent_format = '{script_p... [20:31:47] (03PS2) 10Ryan Kemper: elasticsearch_cluster: allow checking last reboot [software/spicerack] - 10https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) [20:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [20:33:29] (03PS1) 10RLazarus: sophroid: Bump to 2026-01-29-200319 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235120 [20:35:00] (03CR) 10Ryan Kemper: "Alright, I think I'm mostly happy with this implementation, but could definitely use some feedback" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: 10Ryan Kemper) [20:36:25] (03CR) 10RLazarus: [C:03+2] sophroid: Bump to 2026-01-29-200319 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235120 (owner: 10RLazarus) [20:41:45] (03CR) 10CI 
reject: [V:04-1] elasticsearch_cluster: allow checking last reboot [software/spicerack] - 10https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: 10Ryan Kemper) [20:41:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [20:46:08] (03Merged) 10jenkins-bot: sophroid: Bump to 2026-01-29-200319 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235120 (owner: 10RLazarus) [20:46:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [20:59:40] (03PS1) 10Superpes15: [kaawiki] Add a temporary logo for Wikipedia 25 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235129 (https://phabricator.wikimedia.org/T415457) [21:00:05] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T2100). [21:00:05] jdlrobson, jan_drewniak, and Superpes: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[21:00:40] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235088 (https://phabricator.wikimedia.org/T413378) (owner: 10Jdlrobson) [21:01:29] (03Abandoned) 10Ryan Kemper: Start Blazegraph from systemd unit, without runBlazegraph.sh [puppet] - 10https://gerrit.wikimedia.org/r/956432 (https://phabricator.wikimedia.org/T342361) (owner: 10Gehel) [21:01:46] I'll be deploying for Jdlrobson, so I guess I'll start with my two patches [21:02:37] o/ [21:03:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [skins/Vector] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235102 (owner: 10Stoyofuku-wmf) [21:03:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235088 (https://phabricator.wikimedia.org/T413378) (owner: 10Jdlrobson) [21:05:19] Superpes do you need someone to do your deploy? I can offer to help with it if so. 
[21:05:36] (03Merged) 10jenkins-bot: Revert "Fix sticky header TOC spacing and TOC disappearing on viewport change" [skins/Vector] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235102 (owner: 10Stoyofuku-wmf) [21:05:38] Yep thanks jan_drewniak :) [21:07:19] PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100% [21:07:39] RECOVERY - Host titan1002 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms [21:18:08] (03Merged) 10jenkins-bot: Revert "TypeaheadSearch: ensure fix for mobile keyboard is only applied when the search is using the mobile experience" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235088 (https://phabricator.wikimedia.org/T413378) (owner: 10Jdlrobson) [21:18:29] !log jdrewniak@deploy2002 Started scap sync-world: Backport for [[gerrit:1235102|Revert "Fix sticky header TOC spacing and TOC disappearing on viewport change"]], [[gerrit:1235088|Revert "TypeaheadSearch: ensure fix for mobile keyboard is only applied when the search is using the mobile experience" (T413378 T415677)]] [21:18:37] T413378: TAHS search menu appears in wrong position when first initialized with mobile search overlay - https://phabricator.wikimedia.org/T413378 [21:18:38] T415677: Search field doesn't get focus when selecting the magnifying glass on mobile - https://phabricator.wikimedia.org/T415677 [21:20:26] !log jdrewniak@deploy2002 toyofuku, jdlrobson, jdrewniak: Backport for [[gerrit:1235102|Revert "Fix sticky header TOC spacing and TOC disappearing on viewport change"]], [[gerrit:1235088|Revert "TypeaheadSearch: ensure fix for mobile keyboard is only applied when the search is using the mobile experience" (T413378 T415677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:24:16] !log jdrewniak@deploy2002 toyofuku, jdlrobson, jdrewniak: Continuing with sync [21:24:55] (03PS1) 10Superpes15: [specieswiki] Enable block feature for AbuseFilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235134 (https://phabricator.wikimedia.org/T415802) [21:25:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88180 and previous config saved to /var/cache/conftool/dbconfig/20260129-212552-marostegui.json [21:25:58] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [21:27:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88181 and previous config saved to /var/cache/conftool/dbconfig/20260129-212714-marostegui.json [21:27:37] !log kareid@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply [21:28:14] !log kareid@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply [21:28:26] !log jdrewniak@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235102|Revert "Fix sticky header TOC spacing and TOC disappearing on viewport change"]], [[gerrit:1235088|Revert "TypeaheadSearch: ensure fix for mobile keyboard is only applied when the search is using the mobile experience" (T413378 T415677)]] (duration: 09m 56s) [21:28:33] T413378: TAHS search menu appears in wrong position when first initialized with mobile search overlay - https://phabricator.wikimedia.org/T413378 [21:28:33] T415677: Search field doesn't get focus when selecting the magnifying glass on mobile - https://phabricator.wikimedia.org/T415677 [21:29:14] Superpes ok now I can start your change [21:29:29] Thanks [21:29:35] jan_drewniak There are 2 patches [21:29:54] !log rzl@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply 
[21:30:06] !log rzl@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply [21:30:53] Superpes: yup I got both of them, they can both be done at the same time right? [21:30:54] !log rzl@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/sophroid: apply [21:31:02] Yep absolutely :) [21:31:06] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235129 (https://phabricator.wikimedia.org/T415457) (owner: 10Superpes15) [21:31:07] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdrewniak@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235134 (https://phabricator.wikimedia.org/T415802) (owner: 10Superpes15) [21:31:07] !log rzl@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/sophroid: apply [21:31:57] (03PS2) 10RLazarus: sophroid: Re-insert readiness probe, as a gRPC probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230544 [21:32:13] (03Merged) 10jenkins-bot: [kaawiki] Add a temporary logo for Wikipedia 25 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235129 (https://phabricator.wikimedia.org/T415457) (owner: 10Superpes15) [21:32:30] (03Merged) 10jenkins-bot: [specieswiki] Enable block feature for AbuseFilter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235134 (https://phabricator.wikimedia.org/T415802) (owner: 10Superpes15) [21:32:52] !log jdrewniak@deploy2002 Started scap sync-world: Backport for [[gerrit:1235129|[kaawiki] Add a temporary logo for Wikipedia 25 (T415457)]], [[gerrit:1235134|[specieswiki] Enable block feature for AbuseFilter (T415802)]] [21:32:59] T415457: Changes to the Karakalpak Wikipedia logo - https://phabricator.wikimedia.org/T415457 [21:32:59] T415802: Enable the AbuseFilter block action on Wikispecies - https://phabricator.wikimedia.org/T415802 [21:34:53] !log jdrewniak@deploy2002 jdrewniak, superpes: Backport for 
[[gerrit:1235129|[kaawiki] Add a temporary logo for Wikipedia 25 (T415457)]], [[gerrit:1235134|[specieswiki] Enable block feature for AbuseFilter (T415802)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:34:57] Testing [21:38:15] Everything looks fine to me jan_drewniak sorry for keeping you waiting [21:38:32] Superpes: no problem at all, continuing with the sync :) [21:38:38] !log jdrewniak@deploy2002 jdrewniak, superpes: Continuing with sync [21:41:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P88182 and previous config saved to /var/cache/conftool/dbconfig/20260129-214101-marostegui.json [21:42:20] (03CR) 10RLazarus: [C:03+2] sophroid: Re-insert readiness probe, as a gRPC probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230544 (owner: 10RLazarus) [21:42:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P88183 and previous config saved to /var/cache/conftool/dbconfig/20260129-214223-marostegui.json [21:42:44] !log jdrewniak@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235129|[kaawiki] Add a temporary logo for Wikipedia 25 (T415457)]], [[gerrit:1235134|[specieswiki] Enable block feature for AbuseFilter (T415802)]] (duration: 09m 53s) [21:42:50] T415457: Changes to the Karakalpak Wikipedia logo - https://phabricator.wikimedia.org/T415457 [21:42:51] T415802: Enable the AbuseFilter block action on Wikispecies - https://phabricator.wikimedia.org/T415802 [21:43:17] Many thanks for your assistance jan_drewniak :3 [21:43:44] alright Superpes all done! 
no problem, the wp25 looks good 👍 [21:44:31] (03Merged) 10jenkins-bot: sophroid: Re-insert readiness probe, as a gRPC probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230544 (owner: 10RLazarus) [21:46:26] !log rzl@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply [21:52:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:56:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P88184 and previous config saved to /var/cache/conftool/dbconfig/20260129-215609-marostegui.json [21:56:35] !log rzl@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply [21:56:48] !log Ran `foreachwikiindblist checkuser-suggested-investigations.dblist extensions/CheckUser/maintenance/populateSicUpdatedTimestamp.php` for T415055 [21:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:56:53] T415055: Populate sic_updated_timestamp for existing cases - https://phabricator.wikimedia.org/T415055 [21:57:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P88185 and previous config saved to /var/cache/conftool/dbconfig/20260129-215732-marostegui.json [22:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T2200) [22:01:03] (03PS1) 10RLazarus: sophroid: Re-remove readiness probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235149 [22:03:23] (03CR) 10Zabe: [C:03+2] BETA: Stop writing to il_to [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) (owner: 10Zabe) 
[22:04:10] (03Merged) 10jenkins-bot: BETA: Stop writing to il_to [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) (owner: 10Zabe) [22:04:41] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:05:31] (03CR) 10Zabe: [C:03+2] Start reading from il_target_id on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233947 (https://phabricator.wikimedia.org/T413669) (owner: 10Zabe) [22:05:50] (03PS2) 10RLazarus: sophroid: Re-remove readiness probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235149 [22:06:34] (03Merged) 10jenkins-bot: Start reading from il_target_id on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233947 (https://phabricator.wikimedia.org/T413669) (owner: 10Zabe) [22:06:58] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1233947|Start reading from il_target_id on enwiki (T413669)]] [22:07:03] T413669: Set imagelinks migration to read new - https://phabricator.wikimedia.org/T413669 [22:07:11] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl2001.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [22:07:47] (03CR) 10RLazarus: [C:03+2] sophroid: Re-remove readiness probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235149 (owner: 10RLazarus) [22:08:11] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [22:08:55] !log zabe@deploy2002 zabe: Backport for [[gerrit:1233947|Start reading from il_target_id on enwiki (T413669)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[22:09:47] !log zabe@deploy2002 zabe: Continuing with sync [22:10:09] (03Merged) 10jenkins-bot: sophroid: Re-remove readiness probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235149 (owner: 10RLazarus) [22:11:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88186 and previous config saved to /var/cache/conftool/dbconfig/20260129-221118-marostegui.json [22:11:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [22:11:23] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance [22:11:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88187 and previous config saved to /var/cache/conftool/dbconfig/20260129-221131-marostegui.json [22:12:39] !log rzl@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply [22:12:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88188 and previous config saved to /var/cache/conftool/dbconfig/20260129-221242-marostegui.json [22:12:45] !log rzl@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply [22:12:53] !log rzl@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/sophroid: apply [22:12:58] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance [22:13:06] !log rzl@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/sophroid: apply [22:13:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88189 and previous config saved to 
/var/cache/conftool/dbconfig/20260129-221306-marostegui.json
[22:13:57] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1233947|Start reading from il_target_id on enwiki (T413669)]] (duration: 06m 59s)
[22:14:02] T413669: Set imagelinks migration to read new - https://phabricator.wikimedia.org/T413669
[22:19:12] (PS1) Zabe: Update composer requirements [mediawiki-config] - https://gerrit.wikimedia.org/r/1235153
[22:20:09] (PS1) Zabe: Prepare pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235155 (https://phabricator.wikimedia.org/T413273)
[22:20:12] (PS1) Zabe: Activate pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235156 (https://phabricator.wikimedia.org/T413273)
[22:23:34] (PS1) Zabe: manage-dblist: Only add trailing new line if dblist is not empty [mediawiki-config] - https://gerrit.wikimedia.org/r/1235158
[22:26:11] (PS2) Zabe: manage-dblist: Only add trailing new line if dblist is not empty [mediawiki-config] - https://gerrit.wikimedia.org/r/1235158
[22:26:24] (CR) Zabe: [C:+2] Update composer requirements [mediawiki-config] - https://gerrit.wikimedia.org/r/1235153 (owner: Zabe)
[22:26:32] (CR) Zabe: [C:+2] Prepare pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235155 (https://phabricator.wikimedia.org/T413273) (owner: Zabe)
[22:27:18] (Merged) jenkins-bot: Update composer requirements [mediawiki-config] - https://gerrit.wikimedia.org/r/1235153 (owner: Zabe)
[22:27:20] (Merged) jenkins-bot: Prepare pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235155 (https://phabricator.wikimedia.org/T413273) (owner: Zabe)
[22:27:50] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1235155|Prepare pplwiki (T413273)]]
[22:27:56] T413273: Create Nawat Wikipedia - https://phabricator.wikimedia.org/T413273
[22:29:50] !log zabe@deploy2002 zabe: Backport for [[gerrit:1235155|Prepare pplwiki (T413273)]] synced to the
testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:30:41] !log zabe@deploy2002 zabe: Continuing with sync
[22:32:21] (PS2) Zabe: Activate pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235156 (https://phabricator.wikimedia.org/T413273)
[22:32:24] (CR) Zabe: [C:+2] Activate pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235156 (https://phabricator.wikimedia.org/T413273) (owner: Zabe)
[22:33:11] (Merged) jenkins-bot: Activate pplwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1235156 (https://phabricator.wikimedia.org/T413273) (owner: Zabe)
[22:33:19] (PS3) Zabe: manage-dblist: Only add trailing new line if dblist is not empty [mediawiki-config] - https://gerrit.wikimedia.org/r/1235158
[22:33:26] (CR) Zabe: [C:+2] manage-dblist: Only add trailing new line if dblist is not empty [mediawiki-config] - https://gerrit.wikimedia.org/r/1235158 (owner: Zabe)
[22:34:15] (Merged) jenkins-bot: manage-dblist: Only add trailing new line if dblist is not empty [mediawiki-config] - https://gerrit.wikimedia.org/r/1235158 (owner: Zabe)
[22:34:51] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235155|Prepare pplwiki (T413273)]] (duration: 07m 00s)
[22:34:56] T413273: Create Nawat Wikipedia - https://phabricator.wikimedia.org/T413273
[22:35:42] Ah its been so long
[22:36:24] addWiki breaking
[22:39:03] (Abandoned) Wargo: Revert "REST: enable the site.v1 module" [mediawiki-config] - https://gerrit.wikimedia.org/r/1234437 (https://phabricator.wikimedia.org/T415771) (owner: Wargo)
[22:42:06] is pplwiki gonna happen today?
🥳
[22:44:17] !log zabe@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[22:44:33] Yep:)
[22:45:07] nice
[22:45:15] !log zabe@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[22:45:28] Martin said addwiki only created half the database tables for kajwiki, so he had to create the rest manually
[22:45:33] has that problem been fixed?
[22:49:40] no:/
[22:50:04] ah :\
[22:50:15] but it will still happen? 😅
[22:52:21] yes, I just have to manually check everything else is now done correctly, but I think it should be fine
[22:53:55] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1235156|Activate pplwiki (T413273)]]
[22:54:01] T413273: Create Nawat Wikipedia - https://phabricator.wikimedia.org/T413273
[22:55:56] !log zabe@deploy2002 zabe: Backport for [[gerrit:1235156|Activate pplwiki (T413273)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:57:33] !log zabe@deploy2002 zabe: Continuing with sync
[23:01:38] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235156|Activate pplwiki (T413273)]] (duration: 07m 42s)
[23:01:44] T413273: Create Nawat Wikipedia - https://phabricator.wikimedia.org/T413273
[23:02:19] (PS1) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1235163 (https://phabricator.wikimedia.org/T413273)
[23:02:21] (CR) Zabe: [C:+2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1235163 (https://phabricator.wikimedia.org/T413273) (owner: Zabe)
[23:03:13] (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1235163 (https://phabricator.wikimedia.org/T413273) (owner: Zabe)
[23:03:35] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1235163|Update interwiki cache (T413273)]]
[23:05:33] !log zabe@deploy2002 zabe: Backport for [[gerrit:1235163|Update interwiki
cache (T413273)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[23:06:04] !log zabe@deploy2002 zabe: Continuing with sync
[23:06:39] SRE, Traffic: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11568268 (Aklapper) @Lupascriptix, @Benwing2: Do your issues involve //GitHub action tests//? What is the full error message output?
[23:11:26] !log zabe@deploy2002 Started scap sync-world: (no justification provided)
[23:14:13] !log zabe@deploy2002 Finished scap sync-world: (no justification provided) (duration: 02m 51s)
[23:15:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88192 and previous config saved to /var/cache/conftool/dbconfig/20260129-231542-marostegui.json
[23:15:51] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[23:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[23:24:25] (CR) Brouberol: [C:+1] apt: mirror opensearch 2 and 3 repos in trixie-wikimedia [puppet] - https://gerrit.wikimedia.org/r/1235075 (https://phabricator.wikimedia.org/T415699) (owner: Bking)
[23:30:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P88195 and previous config saved to /var/cache/conftool/dbconfig/20260129-233052-marostegui.json
[23:45:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88198 and previous config saved to
/var/cache/conftool/dbconfig/20260129-234526-marostegui.json
[23:45:32] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[23:46:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P88199 and previous config saved to /var/cache/conftool/dbconfig/20260129-234601-marostegui.json
[23:46:52] !log zabe@deploy2002 mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # T414241,T415051
[23:46:59] T414241: Add Wikidata support for kaiwiki - https://phabricator.wikimedia.org/T414241
[23:46:59] T415051: Add Wikidata support for pplwiki - https://phabricator.wikimedia.org/T415051
[23:51:56] zabe, kajwiki, not kaiwiki (unless you also created kaiwiki and I didn't notice yet)?
[23:52:38] ah yeah, linked the wrong wikidata task