[00:00:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P88200 and previous config saved to /var/cache/conftool/dbconfig/20260130-000034-marostegui.json [00:01:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88201 and previous config saved to /var/cache/conftool/dbconfig/20260130-000109-marostegui.json [00:01:16] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [00:01:27] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance [00:01:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1197 (T415786)', diff saved to https://phabricator.wikimedia.org/P88202 and previous config saved to /var/cache/conftool/dbconfig/20260130-000135-marostegui.json [00:09:27] FIRING: [9x] SystemdUnitFailed: dump_proxy_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:15:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P88203 and previous config saved to /var/cache/conftool/dbconfig/20260130-001543-marostegui.json [00:25:47] !log rzl@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply [00:25:58] !log rzl@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply [00:28:46] (03CR) 10Zabe: "ping:)" [puppet] - 10https://gerrit.wikimedia.org/r/1225119 (https://phabricator.wikimedia.org/T371662) (owner: 10Zabe) [00:30:02] (03Abandoned) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 
10https://gerrit.wikimedia.org/r/1234550 (owner: 10TrainBranchBot) [00:30:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88204 and previous config saved to /var/cache/conftool/dbconfig/20260130-003051-marostegui.json [00:31:00] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [00:31:10] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance [00:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [00:40:39] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1235179 [00:40:39] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1235179 (owner: 10TrainBranchBot) [00:53:21] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1235179 (owner: 10TrainBranchBot) [01:05:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T415786)', diff saved to https://phabricator.wikimedia.org/P88205 and previous config saved to /var/cache/conftool/dbconfig/20260130-010500-marostegui.json [01:05:20] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [01:11:39] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1235180 [01:11:39] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] 
(wmf/next) - 10https://gerrit.wikimedia.org/r/1235180 (owner: 10TrainBranchBot) [01:20:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P88206 and previous config saved to /var/cache/conftool/dbconfig/20260130-012008-marostegui.json [01:35:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P88207 and previous config saved to /var/cache/conftool/dbconfig/20260130-013517-marostegui.json [01:35:58] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1235180 (owner: 10TrainBranchBot) [01:50:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T415786)', diff saved to https://phabricator.wikimedia.org/P88208 and previous config saved to /var/cache/conftool/dbconfig/20260130-015025-marostegui.json [01:50:35] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [01:50:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance [01:52:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:59:38] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance [01:59:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88209 and previous config saved to /var/cache/conftool/dbconfig/20260130-015946-marostegui.json [01:59:53] T415786: Update imagelinks primary key on wmf production - 
https://phabricator.wikimedia.org/T415786 [02:01:01] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image [02:14:28] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 13m 27s) [02:36:58] (03PS1) 10RLazarus: sophroid: Fork app.generic.container template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235191 [02:36:59] (03PS1) 10RLazarus: sophroid: Combine our own volumeMounts with the ones from the template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235192 [02:36:59] (03PS1) 10RLazarus: sophroid: Move our custom arguments into the chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235193 [02:41:21] (03PS2) 10RLazarus: sophroid: Move our custom arguments into the chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235193 [03:16:36] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance [03:16:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88210 and previous config saved to /var/cache/conftool/dbconfig/20260130-031644-marostegui.json [03:16:54] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [03:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [03:33:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88211 and previous config saved to /var/cache/conftool/dbconfig/20260130-033318-marostegui.json [03:33:28] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 
[03:38:35] 06SRE, 06Traffic: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11568636 (10Lupascriptix) Hi, No to the Github action tests- I'm sorry, I thought this was the right place to put this post, but I can make a new task and delete m... [03:48:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P88212 and previous config saved to /var/cache/conftool/dbconfig/20260130-034828-marostegui.json [04:03:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P88213 and previous config saved to /var/cache/conftool/dbconfig/20260130-040336-marostegui.json [04:09:41] FIRING: [9x] SystemdUnitFailed: dump_proxy_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:10:06] (03PS1) 10Gerrit maintenance bot: Add ur to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1235199 (https://phabricator.wikimedia.org/T415960) [04:10:54] (03CR) 10CI reject: [V:04-1] Add ur to langlist helper [dns] - 10https://gerrit.wikimedia.org/r/1235199 (https://phabricator.wikimedia.org/T415960) (owner: 10Gerrit maintenance bot) [04:18:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88214 and previous config saved to /var/cache/conftool/dbconfig/20260130-041845-marostegui.json [04:18:53] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [04:19:03] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2225.codfw.wmnet with reason: Maintenance [04:19:11] !log 
marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2225 (T415786)', diff saved to https://phabricator.wikimedia.org/P88215 and previous config saved to /var/cache/conftool/dbconfig/20260130-041910-marostegui.json [04:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [04:50:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88216 and previous config saved to /var/cache/conftool/dbconfig/20260130-045045-marostegui.json [04:50:59] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [05:05:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P88217 and previous config saved to /var/cache/conftool/dbconfig/20260130-050555-marostegui.json [05:09:15] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:21:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P88218 and previous config saved to /var/cache/conftool/dbconfig/20260130-052104-marostegui.json [05:34:15] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:35:11] RESOLVED: 
[2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:36:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88219 and previous config saved to /var/cache/conftool/dbconfig/20260130-053612-marostegui.json [05:36:27] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [05:36:30] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance [05:36:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1233 (T415786)', diff saved to https://phabricator.wikimedia.org/P88220 and previous config saved to /var/cache/conftool/dbconfig/20260130-053638-marostegui.json [05:45:42] !log marostegui@cumin1003 START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis pplwiki in section s5 [05:48:49] (03PS1) 10Marostegui: Revert "db1209: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1235209 [05:48:54] (03CR) 10Marostegui: [C:04-2] "Not yet" [puppet] - 10https://gerrit.wikimedia.org/r/1235209 (owner: 10Marostegui) [05:51:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225 (T415786)', diff saved to https://phabricator.wikimedia.org/P88221 and previous config saved to /var/cache/conftool/dbconfig/20260130-055112-marostegui.json [05:51:19] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [05:52:04] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis pplwiki in section s5 [05:52:18] !log marostegui@cumin1003 START - Cookbook sre.mysql.sanitize-wiki 
Checking sanitization for wikis pplwiki in section s5 [05:52:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:55:02] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis pplwiki in section s5 [06:04:27] FIRING: [9x] SystemdUnitFailed: dump_proxy_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:06:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P88222 and previous config saved to /var/cache/conftool/dbconfig/20260130-060621-marostegui.json [06:19:25] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1162 to s2 master [puppet] - 10https://gerrit.wikimedia.org/r/1235227 (https://phabricator.wikimedia.org/T415983) [06:21:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-062130-marostegui.json [06:36:48] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-063643-marostegui.json [06:37:04] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2226.codfw.wmnet with reason: Maintenance [06:37:07] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [06:37:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling 
db2226 (T415786)', diff saved to https://phabricator.wikimedia.org/P88225 and previous config saved to /var/cache/conftool/dbconfig/20260130-063712-marostegui.json [07:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260130T0700) [07:11:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T415786)', diff saved to https://phabricator.wikimedia.org/P88227 and previous config saved to /var/cache/conftool/dbconfig/20260130-071153-marostegui.json [07:12:04] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [07:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [07:27:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P88228 and previous config saved to /var/cache/conftool/dbconfig/20260130-072702-marostegui.json [07:30:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226 (T415786)', diff saved to https://phabricator.wikimedia.org/P88229 and previous config saved to /var/cache/conftool/dbconfig/20260130-073032-marostegui.json [07:30:43] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [07:42:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P88230 and previous config saved to /var/cache/conftool/dbconfig/20260130-074213-marostegui.json [07:45:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P88231 and previous config saved to 
/var/cache/conftool/dbconfig/20260130-074540-marostegui.json [07:47:44] FIRING: SLOMetricAbsent: charts-client-side-availability-v1 - https://slo.wikimedia.org/?search=charts-client-side-availability-v1 - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [07:49:24] FIRING: SLOMetricAbsent: edit-check-pre-save-checks-ratio - https://slo.wikimedia.org/?search=edit-check-pre-save-checks-ratio - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [07:52:44] RESOLVED: SLOMetricAbsent: charts-client-side-availability-v1 - https://slo.wikimedia.org/?search=charts-client-side-availability-v1 - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [07:54:24] RESOLVED: SLOMetricAbsent: edit-check-pre-save-checks-ratio - https://slo.wikimedia.org/?search=edit-check-pre-save-checks-ratio - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [07:57:23] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T415786)', diff saved to https://phabricator.wikimedia.org/P88232 and previous config saved to /var/cache/conftool/dbconfig/20260130-075721-marostegui.json [07:57:28] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance [07:57:32] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260130T0800) [08:00:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P88233 and previous config saved to /var/cache/conftool/dbconfig/20260130-080051-marostegui.json [08:16:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226 (T415786)', diff saved to https://phabricator.wikimedia.org/P88234 and previous config saved to /var/cache/conftool/dbconfig/20260130-081559-marostegui.json [08:16:21] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2238.codfw.wmnet with reason: Maintenance [08:16:27] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [08:16:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2238 (T415786)', diff saved to https://phabricator.wikimedia.org/P88235 and previous config saved to /var/cache/conftool/dbconfig/20260130-081629-marostegui.json [08:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [08:33:15] 06SRE, 06Traffic: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11569031 (10Aklapper) > Hi, No to the Github action tests- I'm sorry, I thought this was the right place to put this post **This ticket is only about Github actio... 
[08:44:28] (03PS1) 10Dpogorzelski: aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1235307 [08:44:53] (03Abandoned) 10Dpogorzelski: aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1234291 (owner: 10Dpogorzelski) [08:52:24] (03CR) 10Gehel: [C:03+1] "lgtm, see previous approvals in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1234291" [puppet] - 10https://gerrit.wikimedia.org/r/1235307 (owner: 10Dpogorzelski) [08:54:13] (03CR) 10Dpogorzelski: [C:03+2] aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1235307 (owner: 10Dpogorzelski) [09:21:00] (03CR) 10Btullis: [C:03+1] aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1235307 (owner: 10Dpogorzelski) [09:27:21] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1254.eqiad.wmnet with reason: Maintenance [09:27:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1254 (T415786)', diff saved to https://phabricator.wikimedia.org/P88236 and previous config saved to /var/cache/conftool/dbconfig/20260130-092729-marostegui.json [09:27:36] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:50:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238 (T415786)', diff saved to https://phabricator.wikimedia.org/P88237 and previous config saved to /var/cache/conftool/dbconfig/20260130-095042-marostegui.json [09:50:57] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [09:52:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:04:31] !log fnegri@cumin1003 START - Cookbook 
sre.wikireplicas.add-wiki for database pplwiki (T415050) [10:04:41] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:04:42] T415050: [wikireplicas] Create views for new wiki pplwiki - https://phabricator.wikimedia.org/T415050 [10:05:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P88238 and previous config saved to /var/cache/conftool/dbconfig/20260130-100552-marostegui.json [10:21:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P88240 and previous config saved to /var/cache/conftool/dbconfig/20260130-102102-marostegui.json [10:25:00] (03PS1) 10Dpogorzelski: ml_builder: add profile::docker::ferm [puppet] - 10https://gerrit.wikimedia.org/r/1235314 [10:25:29] (03CR) 10Volans: [C:03+1] "Thanks for the patch Daniel, LGTM. If we ship it today/tomorrow they will get it already from Feb. 1st." 
[puppet] - 10https://gerrit.wikimedia.org/r/1235071 (owner: 10Dzahn) [10:26:01] (03CR) 10Dpogorzelski: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1235314 (owner: 10Dpogorzelski) [10:31:54] (03CR) 10Brouberol: [C:03+1] ml_builder: add profile::docker::ferm [puppet] - 10https://gerrit.wikimedia.org/r/1235314 (owner: 10Dpogorzelski) [10:32:35] (03CR) 10Dpogorzelski: [C:03+2] ml_builder: add profile::docker::ferm [puppet] - 10https://gerrit.wikimedia.org/r/1235314 (owner: 10Dpogorzelski) [10:36:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-103610-marostegui.json [10:36:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:00:30] (03CR) 10Zabe: "can be abandoned" [dns] - 10https://gerrit.wikimedia.org/r/1235199 (https://phabricator.wikimedia.org/T415960) (owner: 10Gerrit maintenance bot) [11:01:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T415786)', diff saved to https://phabricator.wikimedia.org/P88242 and previous config saved to /var/cache/conftool/dbconfig/20260130-110122-marostegui.json [11:01:29] !log fnegri@cumin1003 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database pplwiki (T415050) [11:01:45] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:02:04] T415050: [wikireplicas] Create views for new wiki pplwiki - https://phabricator.wikimedia.org/T415050 [11:16:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-111633-marostegui.json [11:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about 
to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:21:29] (03CR) 10Scott French: [C:03+1] sophroid: Fork app.generic.container template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235191 (owner: 10RLazarus) [11:21:31] (03CR) 10Scott French: [C:03+1] sophroid: Combine our own volumeMounts with the ones from the template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235192 (owner: 10RLazarus) [11:23:38] (03CR) 10Scott French: "Thanks, Reuven! One question about indentation, but otherwise looks good." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235193 (owner: 10RLazarus) [11:30:03] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1209: After schema change [11:31:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P88244 and previous config saved to /var/cache/conftool/dbconfig/20260130-113146-marostegui.json [11:32:16] (03CR) 10Marostegui: Revert "db1209: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1235209 (owner: 10Marostegui) [11:32:21] (03CR) 10Marostegui: [C:03+2] Revert "db1209: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1235209 (owner: 10Marostegui) [11:33:16] !log marostegui@cumin1003 END (ERROR) - Cookbook sre.mysql.newpool (exit_code=97) pool db1209: After schema change [11:33:26] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1209: After schema change [11:38:52] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance [11:39:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2152 (T415786)', diff saved to https://phabricator.wikimedia.org/P88246 and previous config saved to /var/cache/conftool/dbconfig/20260130-113900-marostegui.json 
[11:39:06] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:46:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T415786)', diff saved to https://phabricator.wikimedia.org/P88248 and previous config saved to /var/cache/conftool/dbconfig/20260130-114654-marostegui.json [11:47:04] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:47:12] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1259.eqiad.wmnet with reason: Maintenance [11:47:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1259 (T415786)', diff saved to https://phabricator.wikimedia.org/P88249 and previous config saved to /var/cache/conftool/dbconfig/20260130-114719-marostegui.json [11:49:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T415786)', diff saved to https://phabricator.wikimedia.org/P88251 and previous config saved to /var/cache/conftool/dbconfig/20260130-114913-marostegui.json [12:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260130T0800) [12:00:05] jelto, arnoldokoth, mutante, and arnaudb: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) GitLab version upgrades deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260130T1200). 
[12:00:13] 06SRE, 06Infrastructure-Foundations: Integrate Trixie 13.3 point update - https://phabricator.wikimedia.org/T414179#11569372 (10MoritzMuehlenhoff) [12:04:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P88253 and previous config saved to /var/cache/conftool/dbconfig/20260130-120423-marostegui.json [12:06:39] (03CR) 10Muehlenhoff: [C:03+1] "Looks good, I'll go ahead and merge" [puppet] - 10https://gerrit.wikimedia.org/r/1234520 (owner: 10Clare Ming) [12:09:29] (03CR) 10Muehlenhoff: [C:03+2] Remove old ssh key for cjming [puppet] - 10https://gerrit.wikimedia.org/r/1234520 (owner: 10Clare Ming) [12:18:56] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1209: After schema change [12:19:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P88256 and previous config saved to /var/cache/conftool/dbconfig/20260130-121934-marostegui.json [12:20:48] (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1230336 (https://phabricator.wikimedia.org/T273950) (owner: 10Majavah) [12:28:48] (03CR) 10Ladsgroup: "Thanks. I will deploy this once I'm back from SRE summit!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1225119 (https://phabricator.wikimedia.org/T371662) (owner: 10Zabe) [12:33:10] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [12:34:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T415786)', diff saved to https://phabricator.wikimedia.org/P88257 and previous config saved to /var/cache/conftool/dbconfig/20260130-123442-marostegui.json [12:35:02] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance [12:35:08] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:35:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2154 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-123510-marostegui.json [12:39:56] (03CR) 10Muehlenhoff: apt: mirror opensearch 2 and 3 repos in trixie-wikimedia (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1235075 (https://phabricator.wikimedia.org/T415699) (owner: 10Bking) [12:45:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T415786)', diff saved to https://phabricator.wikimedia.org/P88259 and previous config saved to /var/cache/conftool/dbconfig/20260130-124515-marostegui.json [12:45:24] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:00:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P88260 and previous config saved to 
/var/cache/conftool/dbconfig/20260130-130024-marostegui.json [13:02:25] RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:15:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P88261 and previous config saved to /var/cache/conftool/dbconfig/20260130-131533-marostegui.json [13:24:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T415786)', diff saved to https://phabricator.wikimedia.org/P88262 and previous config saved to /var/cache/conftool/dbconfig/20260130-132452-marostegui.json [13:25:00] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:30:48] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T415786)', diff saved to https://phabricator.wikimedia.org/P88263 and previous config saved to /var/cache/conftool/dbconfig/20260130-133045-marostegui.json [13:30:59] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:31:03] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance [13:31:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88264 and previous config saved to /var/cache/conftool/dbconfig/20260130-133111-marostegui.json [13:40:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P88265 and previous config saved to /var/cache/conftool/dbconfig/20260130-134001-marostegui.json [13:41:20] !log marostegui@cumin1003 dbctl commit 
(dc=all): 'Repooling after maintenance db2163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88266 and previous config saved to /var/cache/conftool/dbconfig/20260130-134117-marostegui.json [13:41:48] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:55:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-135513-marostegui.json [13:56:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P88268 and previous config saved to /var/cache/conftool/dbconfig/20260130-135627-marostegui.json [14:04:42] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:10:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T415786)', diff saved to https://phabricator.wikimedia.org/P88269 and previous config saved to /var/cache/conftool/dbconfig/20260130-141026-marostegui.json [14:10:36] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [14:10:46] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [14:11:23] !log puppet enabled and services repooled on titan2001 T410152 [14:11:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:30] T410152: Disk space saturation (/srv) on Titan hosts - https://phabricator.wikimedia.org/T410152 [14:11:41] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to 
https://phabricator.wikimedia.org/P88270 and previous config saved to /var/cache/conftool/dbconfig/20260130-141139-marostegui.json [14:26:50] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88271 and previous config saved to /var/cache/conftool/dbconfig/20260130-142648-marostegui.json [14:26:58] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [14:27:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance [14:27:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2164 (T415786)', diff saved to https://phabricator.wikimedia.org/P88272 and previous config saved to /var/cache/conftool/dbconfig/20260130-142714-marostegui.json [14:39:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T415786)', diff saved to https://phabricator.wikimedia.org/P88273 and previous config saved to /var/cache/conftool/dbconfig/20260130-143913-marostegui.json [14:39:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [14:54:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P88274 and previous config saved to /var/cache/conftool/dbconfig/20260130-145421-marostegui.json [15:09:15] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P88275 and previous config saved to 
/var/cache/conftool/dbconfig/20260130-150929-marostegui.json [15:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:24:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T415786)', diff saved to https://phabricator.wikimedia.org/P88276 and previous config saved to /var/cache/conftool/dbconfig/20260130-152438-marostegui.json [15:24:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:24:59] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance [15:25:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2165 (T415786)', diff saved to https://phabricator.wikimedia.org/P88277 and previous config saved to /var/cache/conftool/dbconfig/20260130-152507-marostegui.json [15:34:15] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:35:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2165 (T415786)', diff saved to https://phabricator.wikimedia.org/P88278 and previous config saved to /var/cache/conftool/dbconfig/20260130-153508-marostegui.json [15:35:17] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:48:04] PROBLEM - Host mr1-esams.oob IPv6 is DOWN: CRITICAL - Host Unreachable (2a00:1188:5:e::4) [15:50:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to 
https://phabricator.wikimedia.org/P88279 and previous config saved to /var/cache/conftool/dbconfig/20260130-155017-marostegui.json [15:58:10] RECOVERY - Host mr1-esams.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 84.07 ms [16:00:33] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance [16:00:54] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [16:01:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1167 (T415786)', diff saved to https://phabricator.wikimedia.org/P88280 and previous config saved to /var/cache/conftool/dbconfig/20260130-160101-marostegui.json [16:01:08] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:05:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-160525-marostegui.json [16:11:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T415786)', diff saved to https://phabricator.wikimedia.org/P88282 and previous config saved to /var/cache/conftool/dbconfig/20260130-161112-marostegui.json [16:11:24] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:20:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2165 (T415786)', diff saved to https://phabricator.wikimedia.org/P88283 and previous config saved to /var/cache/conftool/dbconfig/20260130-162038-marostegui.json [16:20:44] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:20:55] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
4:00:00 on db2166.codfw.wmnet with reason: Maintenance [16:21:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88284 and previous config saved to /var/cache/conftool/dbconfig/20260130-162103-marostegui.json [16:26:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P88285 and previous config saved to /var/cache/conftool/dbconfig/20260130-162620-marostegui.json [16:31:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88286 and previous config saved to /var/cache/conftool/dbconfig/20260130-163108-marostegui.json [16:31:16] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:33:10] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [16:41:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P88287 and previous config saved to /var/cache/conftool/dbconfig/20260130-164129-marostegui.json [16:46:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P88288 and previous config saved to /var/cache/conftool/dbconfig/20260130-164617-marostegui.json [16:56:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T415786)', diff saved to https://phabricator.wikimedia.org/P88289 and previous config saved to /var/cache/conftool/dbconfig/20260130-165637-marostegui.json [16:56:44] T415786: Update imagelinks 
primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:56:54] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance [17:01:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P88291 and previous config saved to /var/cache/conftool/dbconfig/20260130-170125-marostegui.json [17:05:17] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance [17:05:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88292 and previous config saved to /var/cache/conftool/dbconfig/20260130-170525-marostegui.json [17:05:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:15:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88293 and previous config saved to /var/cache/conftool/dbconfig/20260130-171539-marostegui.json [17:15:47] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:16:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88294 and previous config saved to /var/cache/conftool/dbconfig/20260130-171635-marostegui.json [17:16:53] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance [17:17:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2167 (T415786)', diff saved to https://phabricator.wikimedia.org/P88295 and previous config saved to /var/cache/conftool/dbconfig/20260130-171701-marostegui.json [17:27:03] !log 
marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2167 (T415786)', diff saved to https://phabricator.wikimedia.org/P88297 and previous config saved to /var/cache/conftool/dbconfig/20260130-172701-marostegui.json [17:27:08] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:30:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P88298 and previous config saved to /var/cache/conftool/dbconfig/20260130-173048-marostegui.json [17:34:55] 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q2:install (1) SSD each into franio200[1-3] - https://phabricator.wikimedia.org/T405982#11570200 (10Dwisehaupt) [17:35:12] 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q2:install (1) SSD each into franio200[1-3] - https://phabricator.wikimedia.org/T405982#11570201 (10Dwisehaupt) 05Open→03Resolved Disks were added to the 3 hosts this morning and they all show up properly in dmesg. Thanks for the help with this. 
[17:42:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P88299 and previous config saved to /var/cache/conftool/dbconfig/20260130-174211-marostegui.json [17:45:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P88300 and previous config saved to /var/cache/conftool/dbconfig/20260130-174556-marostegui.json [17:57:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P88301 and previous config saved to /var/cache/conftool/dbconfig/20260130-175720-marostegui.json [18:01:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88302 and previous config saved to /var/cache/conftool/dbconfig/20260130-180106-marostegui.json [18:01:12] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:01:23] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [18:01:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88303 and previous config saved to /var/cache/conftool/dbconfig/20260130-180131-marostegui.json [18:04:24] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1013.eqiad.wmnet [18:04:40] !log robh@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts backup1013.eqiad.wmnet [18:04:42] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:08:03] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1013.eqiad.wmnet [18:08:22] !log robh@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts backup1013.eqiad.wmnet [18:08:49] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1013.eqiad.wmnet [18:08:54] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts backup1013.eqiad.wmnet [18:09:39] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1013.eqiad.wmnet [18:10:06] all that nonsense is me [18:10:27] moving around new firmware flags and had perm issues not allowing script to see them so had to fire off the start (but not proceed) with it a few times. [18:10:45] i wish it stated idrac firmware cuz its non evasive but oh well [18:11:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88304 and previous config saved to /var/cache/conftool/dbconfig/20260130-181143-marostegui.json [18:11:51] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:12:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2167 (T415786)', diff saved to https://phabricator.wikimedia.org/P88305 and previous config saved to /var/cache/conftool/dbconfig/20260130-181228-marostegui.json [18:12:36] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance [18:12:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88306 and previous config saved to 
/var/cache/conftool/dbconfig/20260130-181244-marostegui.json [18:19:36] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts backup1013.eqiad.wmnet [18:20:18] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1013.eqiad.wmnet [18:21:58] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1014.eqiad.wmnet [18:22:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88307 and previous config saved to /var/cache/conftool/dbconfig/20260130-182241-marostegui.json [18:22:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:26:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P88308 and previous config saved to /var/cache/conftool/dbconfig/20260130-182653-marostegui.json [18:33:44] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts backup1014.eqiad.wmnet [18:34:10] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1047.eqiad.wmnet [18:37:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P88309 and previous config saved to /var/cache/conftool/dbconfig/20260130-183749-marostegui.json [18:42:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P88310 and previous config saved to /var/cache/conftool/dbconfig/20260130-184202-marostegui.json [18:50:25] (03PS2) 10RLazarus: sophroid: Fork app.generic.container template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235191 [18:50:25] 
(03PS2) 10RLazarus: sophroid: Combine our own volumeMounts with the ones from the template [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235192 [18:50:25] (03PS3) 10RLazarus: sophroid: Move our custom arguments into the chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235193 [18:51:42] !log robh@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts backup1013.eqiad.wmnet [18:53:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P88311 and previous config saved to /var/cache/conftool/dbconfig/20260130-185302-marostegui.json [18:53:40] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts backup1014.eqiad.wmnet [18:54:20] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1047.eqiad.wmnet [18:54:41] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1046.eqiad.wmnet [18:56:03] (03CR) 10RLazarus: "Great catch, thank you." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1235193 (owner: 10RLazarus) [18:57:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88312 and previous config saved to /var/cache/conftool/dbconfig/20260130-185713-marostegui.json [18:57:21] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:57:33] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance [18:57:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88313 and previous config saved to /var/cache/conftool/dbconfig/20260130-185741-marostegui.json [19:07:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88314 and previous config saved to /var/cache/conftool/dbconfig/20260130-190754-marostegui.json [19:07:55] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1046.eqiad.wmnet [19:08:04] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [19:08:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88315 and previous config saved to /var/cache/conftool/dbconfig/20260130-190811-marostegui.json [19:08:28] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance [19:08:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88316 and previous config saved to /var/cache/conftool/dbconfig/20260130-190836-marostegui.json 
[19:09:18] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1045.eqiad.wmnet [19:18:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88317 and previous config saved to /var/cache/conftool/dbconfig/20260130-191804-marostegui.json [19:18:13] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [19:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [19:23:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-192303-marostegui.json [19:25:10] FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and 208.80.153.215 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [19:29:17] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts backup1014.eqiad.wmnet [19:29:24] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1045.eqiad.wmnet [19:30:10] RESOLVED: [2x] BFDdown: BFD session down between cr1-eqiad and 208.80.153.215 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [19:32:42] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1044.eqiad.wmnet [19:32:57] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts 
cloudcephosd1043.eqiad.wmnet [19:33:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P88320 and previous config saved to /var/cache/conftool/dbconfig/20260130-193312-marostegui.json [19:33:14] !log robh@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1042.eqiad.wmnet [19:38:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P88321 and previous config saved to /var/cache/conftool/dbconfig/20260130-193815-marostegui.json [19:45:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:48:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P88322 and previous config saved to /var/cache/conftool/dbconfig/20260130-194821-marostegui.json [19:51:07] hey there SREs -- topranks and raine are on call? we (content transform team) might need to do an emergency friday deploy of a new version of parsoid to fix some citation-corruption issues on dewiki. 
[19:53:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88323 and previous config saved to /var/cache/conftool/dbconfig/20260130-195325-marostegui.json [19:53:36] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [19:53:43] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance [19:53:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1192 (T415786)', diff saved to https://phabricator.wikimedia.org/P88324 and previous config saved to /var/cache/conftool/dbconfig/20260130-195350-marostegui.json [19:54:15] cscott: i think deploy is ok from a releng perspective; logs are more or less ok at the moment (sort of noisy but not at a high volume). (cc: thcipriani) [19:54:27] not speaking for sre obviously. [19:55:29] (03PS2) 10C. Scott Ananian: Bump wikimedia/parsoid to 0.23.0-a13.1 [vendor] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235384 (https://phabricator.wikimedia.org/T415888) [19:55:56] (03PS1) 10C. 
Scott Ananian: Bump wikimedia/parsoid to 0.23.0-a13.1 [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235386 (https://phabricator.wikimedia.org/T415328) [19:56:09] cscott: despite the topic we're actually All™️ on call right now -- folks are still traveling back home from the offsite [19:57:10] please take an even more cautious than usual view of "is this *really* an emergency that can't wait, and am I *really* sure this fix is safe," because our coverage is very thin -- but after that if you still think it's the right move, go ahead and I'll be nearby if needed [19:59:04] to be specific, the patches to deploy would be https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1235386 and https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1235384 and the bug is (the latest instance of) https://phabricator.wikimedia.org/T411238, a corruption in tags caused by nearby edits. I'm also supposed to cc: thcipriani according to wiki. :) We can self-deploy with spiderpig. [20:00:01] The new version of parsoid is on master and live on beta now, and we're running our usual QA process (https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing), but that's not expected to finish for a few hours yet. [20:00:59] But we wanted to start the conversation early. The info that coverage is thin is useful, I'll bring that back to the group. If our QA looks dodgy in any way or we decide prudence is best, we'll do a 'normal' backport deploy on monday morning. 
[20:03:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88325 and previous config saved to /var/cache/conftool/dbconfig/20260130-200329-marostegui.json [20:03:38] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance [20:03:40] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [20:04:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T415786)', diff saved to https://phabricator.wikimedia.org/P88326 and previous config saved to /var/cache/conftool/dbconfig/20260130-200401-marostegui.json [20:04:04] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1044.eqiad.wmnet [20:04:06] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1043.eqiad.wmnet [20:05:25] cscott: cc ack'd and appreciated :) [20:05:29] good luck [20:07:24] !log robh@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1042.eqiad.wmnet [20:08:15] cscott: okay sounds good -- I'll be online for the next five hours or so, just give me a ping if you end up going ahead, but no need to wait for me. 
after that you should probably get a fresh ack from SRE, and TZ-wise it might be hard to come by [20:08:37] got it, thanks [20:11:57] feel free to check with me as well [20:19:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P88327 and previous config saved to /var/cache/conftool/dbconfig/20260130-201910-marostegui.json [20:22:32] (03PS1) 10DLynch: Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235392 (https://phabricator.wikimedia.org/T411914) [20:33:10] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [20:34:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P88328 and previous config saved to /var/cache/conftool/dbconfig/20260130-203419-marostegui.json [20:39:15] FIRING: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:49:15] RESOLVED: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:49:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T415786)', diff saved to https://phabricator.wikimedia.org/P88329 and previous config saved to 
/var/cache/conftool/dbconfig/20260130-204928-marostegui.json [20:49:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [20:49:46] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance [20:49:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1203 (T415786)', diff saved to https://phabricator.wikimedia.org/P88330 and previous config saved to /var/cache/conftool/dbconfig/20260130-204954-marostegui.json [21:00:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260130-210003-marostegui.json [21:00:19] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [21:15:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P88332 and previous config saved to /var/cache/conftool/dbconfig/20260130-211516-marostegui.json [21:30:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P88333 and previous config saved to /var/cache/conftool/dbconfig/20260130-213025-marostegui.json [21:45:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T415786)', diff saved to https://phabricator.wikimedia.org/P88334 and previous config saved to /var/cache/conftool/dbconfig/20260130-214534-marostegui.json [21:45:40] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [21:45:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1209.eqiad.wmnet with reason: Maintenance [21:45:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1209 
(T415786)', diff saved to https://phabricator.wikimedia.org/P88335 and previous config saved to /var/cache/conftool/dbconfig/20260130-214548-marostegui.json [21:55:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88336 and previous config saved to /var/cache/conftool/dbconfig/20260130-215517-marostegui.json [21:55:25] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [22:04:42] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:10:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P88337 and previous config saved to /var/cache/conftool/dbconfig/20260130-221025-marostegui.json [22:25:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P88338 and previous config saved to /var/cache/conftool/dbconfig/20260130-222534-marostegui.json [22:28:28] (PS1) Reedy: Upgrading psy/psysh (v0.12.10 => v0.12.19) [vendor] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1235418 (https://phabricator.wikimedia.org/T416050) [22:29:30] ^ Will deploy that next week probably [22:37:52] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:40:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88339 and previous config saved to /var/cache/conftool/dbconfig/20260130-224043-marostegui.json [22:40:51] T415786: Update imagelinks primary key on
wmf production - https://phabricator.wikimedia.org/T415786 [22:41:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance [22:41:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1214 (T415786)', diff saved to https://phabricator.wikimedia.org/P88340 and previous config saved to /var/cache/conftool/dbconfig/20260130-224108-marostegui.json [22:43:42] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:51:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T415786)', diff saved to https://phabricator.wikimedia.org/P88341 and previous config saved to /var/cache/conftool/dbconfig/20260130-225111-marostegui.json [22:51:20] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [23:06:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P88342 and previous config saved to /var/cache/conftool/dbconfig/20260130-230620-marostegui.json [23:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [23:21:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P88343 and previous config saved to /var/cache/conftool/dbconfig/20260130-232129-marostegui.json [23:24:21] (PS2) Gergő Tisza: WikimediaCustomizations: Set WMCBadEmailDomainsFile [mediawiki-config] - https://gerrit.wikimedia.org/r/1230462 (https://phabricator.wikimedia.org/T397244)
[23:36:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T415786)', diff saved to https://phabricator.wikimedia.org/P88344 and previous config saved to /var/cache/conftool/dbconfig/20260130-233638-marostegui.json [23:36:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance [23:36:46] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [23:36:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1226 (T415786)', diff saved to https://phabricator.wikimedia.org/P88345 and previous config saved to /var/cache/conftool/dbconfig/20260130-233652-marostegui.json [23:45:03] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (conflict) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [23:45:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:46:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T415786)', diff saved to https://phabricator.wikimedia.org/P88346 and previous config saved to /var/cache/conftool/dbconfig/20260130-234616-marostegui.json [23:46:26] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [23:50:03] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (conflict) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - 
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures