[00:15:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T413525)', diff saved to https://phabricator.wikimedia.org/P87109 and previous config saved to /var/cache/conftool/dbconfig/20260111-001543-marostegui.json [00:15:47] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [00:25:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P87110 and previous config saved to /var/cache/conftool/dbconfig/20260111-002551-marostegui.json [00:27:09] (PS1) Zabe: Stop updating Deadendpages and Lonelypages on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/1225118 (https://phabricator.wikimedia.org/T371662) [00:28:15] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [00:31:31] FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1 - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [00:33:34] (PS1) Zabe: mediawiki: Do not run updateSpecialPages for DeadendPages on commons [puppet] - https://gerrit.wikimedia.org/r/1225119 (https://phabricator.wikimedia.org/T371662) [00:36:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P87111 and previous config saved to /var/cache/conftool/dbconfig/20260111-003559-marostegui.json [00:39:55] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1225120 [00:39:56] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1225120 (owner: TrainBranchBot) [00:46:08] !log marostegui@cumin1003
dbctl commit (dc=all): 'Repooling after maintenance db2190 (T413525)', diff saved to https://phabricator.wikimedia.org/P87112 and previous config saved to /var/cache/conftool/dbconfig/20260111-004608-marostegui.json [00:46:12] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [00:46:25] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance [00:46:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2194 (T413525)', diff saved to https://phabricator.wikimedia.org/P87113 and previous config saved to /var/cache/conftool/dbconfig/20260111-004632-marostegui.json [00:53:17] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1225120 (owner: TrainBranchBot) [01:10:05] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1225123 [01:10:05] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1225123 (owner: TrainBranchBot) [01:24:10] FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [01:32:36] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1225123 (owner: TrainBranchBot) [01:34:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T413525)', diff saved to https://phabricator.wikimedia.org/P87114 and previous config saved to /var/cache/conftool/dbconfig/20260111-013429-marostegui.json [01:34:34] T413525: Add il_target_id to imagelinks table in wmf
production - https://phabricator.wikimedia.org/T413525 [01:44:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P87115 and previous config saved to /var/cache/conftool/dbconfig/20260111-014438-marostegui.json [01:54:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P87116 and previous config saved to /var/cache/conftool/dbconfig/20260111-015446-marostegui.json [02:04:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T413525)', diff saved to https://phabricator.wikimedia.org/P87117 and previous config saved to /var/cache/conftool/dbconfig/20260111-020454-marostegui.json [02:04:59] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [02:05:12] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance [02:05:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2209 (T413525)', diff saved to https://phabricator.wikimedia.org/P87118 and previous config saved to /var/cache/conftool/dbconfig/20260111-020520-marostegui.json [02:15:21] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [02:20:15] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. 
https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [02:24:10] FIRING: [3x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [02:42:17] (PS1) Chlod Alejandro: enwiki: change to 25th anniversary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1225127 (https://phabricator.wikimedia.org/T272094) [02:43:05] (CR) CI reject: [V:-1] enwiki: change to 25th anniversary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1225127 (https://phabricator.wikimedia.org/T272094) (owner: Chlod Alejandro) [02:47:59] FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133216 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [02:49:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T413525)', diff saved to https://phabricator.wikimedia.org/P87119 and previous config saved to /var/cache/conftool/dbconfig/20260111-024936-marostegui.json [02:49:41] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [02:50:43] (PS2) Chlod Alejandro: enwiki: change to 25th anniversary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1225127 (https://phabricator.wikimedia.org/T272094) [02:52:48] (PS3) Chlod Alejandro: enwiki: change to 25th anniversary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1225127 (https://phabricator.wikimedia.org/T414271) [02:59:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P87120 and previous config saved
to /var/cache/conftool/dbconfig/20260111-025945-marostegui.json [02:59:56] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1225127 (https://phabricator.wikimedia.org/T414271) (owner: Chlod Alejandro) [03:01:21] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [03:03:13] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [03:09:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P87121 and previous config saved to /var/cache/conftool/dbconfig/20260111-030953-marostegui.json [03:20:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T413525)', diff saved to https://phabricator.wikimedia.org/P87122 and previous config saved to /var/cache/conftool/dbconfig/20260111-032001-marostegui.json [03:20:06] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [03:20:08] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2227.codfw.wmnet with reason: Maintenance [03:20:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2227 (T413525)', diff saved to https://phabricator.wikimedia.org/P87123 and previous config saved to /var/cache/conftool/dbconfig/20260111-032015-marostegui.json [04:05:50] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2227 (T413525)', diff saved to https://phabricator.wikimedia.org/P87124 and previous config saved to
/var/cache/conftool/dbconfig/20260111-040549-marostegui.json [04:05:54] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [04:15:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P87125 and previous config saved to /var/cache/conftool/dbconfig/20260111-041558-marostegui.json [04:26:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P87126 and previous config saved to /var/cache/conftool/dbconfig/20260111-042606-marostegui.json [04:31:31] FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1 - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [04:36:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2227 (T413525)', diff saved to https://phabricator.wikimedia.org/P87127 and previous config saved to /var/cache/conftool/dbconfig/20260111-043614-marostegui.json [04:36:18] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [04:36:31] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance [05:09:10] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:09:20] FIRING: [2x] PfwCoreBGPDown: Fundraising Firewall core BGP session down between pfw1-codfw and (null) (10.195.0.248) - group VPN - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DPfwCoreBGPDown [05:16:57] PROBLEM - Check 
unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [05:24:10] FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [05:29:11] FIRING: [2x] PfwCoreBGPDown: Fundraising Firewall core BGP session down between pfw1-codfw and (null) (10.195.0.248) - group VPN - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DPfwCoreBGPDown [05:34:10] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:34:11] RESOLVED: [2x] PfwCoreBGPDown: Fundraising Firewall core BGP session down between pfw1-codfw and (null) (10.195.0.248) - group VPN - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DPfwCoreBGPDown [06:16:57] RECOVERY - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [06:24:10] FIRING: [3x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [06:30:45] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance [06:32:24] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance [06:32:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2150 (T413525)', diff saved to https://phabricator.wikimedia.org/P87128 and previous config saved to /var/cache/conftool/dbconfig/20260111-063232-marostegui.json [06:32:36] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [06:47:59] FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133216 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [06:50:01] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance [06:50:10] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [06:50:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1158 (T413525)', diff saved to https://phabricator.wikimedia.org/P87129 and previous config saved to /var/cache/conftool/dbconfig/20260111-065018-marostegui.json [06:50:22] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [07:00:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:03:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after 
maintenance db1158 (T413525)', diff saved to https://phabricator.wikimedia.org/P87130 and previous config saved to /var/cache/conftool/dbconfig/20260111-070306-marostegui.json [07:03:10] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [07:04:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T413525)', diff saved to https://phabricator.wikimedia.org/P87131 and previous config saved to /var/cache/conftool/dbconfig/20260111-070404-marostegui.json [07:13:15] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P87132 and previous config saved to /var/cache/conftool/dbconfig/20260111-071314-marostegui.json [07:14:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P87133 and previous config saved to /var/cache/conftool/dbconfig/20260111-071412-marostegui.json [07:23:23] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P87134 and previous config saved to /var/cache/conftool/dbconfig/20260111-072322-marostegui.json [07:24:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P87135 and previous config saved to /var/cache/conftool/dbconfig/20260111-072421-marostegui.json [07:33:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T413525)', diff saved to https://phabricator.wikimedia.org/P87136 and previous config saved to /var/cache/conftool/dbconfig/20260111-073330-marostegui.json [07:33:35] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [07:33:47] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 
db1170.eqiad.wmnet with reason: Maintenance [07:33:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1170 (T413525)', diff saved to https://phabricator.wikimedia.org/P87137 and previous config saved to /var/cache/conftool/dbconfig/20260111-073354-marostegui.json [07:34:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T413525)', diff saved to https://phabricator.wikimedia.org/P87138 and previous config saved to /var/cache/conftool/dbconfig/20260111-073429-marostegui.json [07:34:45] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance [07:34:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2159 (T413525)', diff saved to https://phabricator.wikimedia.org/P87139 and previous config saved to /var/cache/conftool/dbconfig/20260111-073453-marostegui.json [07:36:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87140 and previous config saved to /var/cache/conftool/dbconfig/20260111-073655-marostegui.json [07:37:00] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [07:37:01] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [07:47:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P87141 and previous config saved to /var/cache/conftool/dbconfig/20260111-074703-marostegui.json [07:57:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P87142 and previous config saved to /var/cache/conftool/dbconfig/20260111-075712-marostegui.json [08:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260111T0800) [08:04:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170 (T413525)', diff saved to https://phabricator.wikimedia.org/P87143 and previous config saved to /var/cache/conftool/dbconfig/20260111-080454-marostegui.json [08:04:58] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [08:06:10] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T413525)', diff saved to https://phabricator.wikimedia.org/P87144 and previous config saved to /var/cache/conftool/dbconfig/20260111-080609-marostegui.json [08:07:05] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87145 and previous config saved to /var/cache/conftool/dbconfig/20260111-080704-marostegui.json [08:07:10] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [08:07:10] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [08:07:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87146 and previous config saved to /var/cache/conftool/dbconfig/20260111-080720-marostegui.json [08:07:37] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance [08:07:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2236 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87147 and previous config saved to /var/cache/conftool/dbconfig/20260111-080744-marostegui.json [08:15:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to 
https://phabricator.wikimedia.org/P87148 and previous config saved to /var/cache/conftool/dbconfig/20260111-081502-marostegui.json [08:16:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P87149 and previous config saved to /var/cache/conftool/dbconfig/20260111-081617-marostegui.json [08:17:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P87150 and previous config saved to /var/cache/conftool/dbconfig/20260111-081712-marostegui.json [08:25:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P87151 and previous config saved to /var/cache/conftool/dbconfig/20260111-082511-marostegui.json [08:26:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P87152 and previous config saved to /var/cache/conftool/dbconfig/20260111-082626-marostegui.json [08:27:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P87153 and previous config saved to /var/cache/conftool/dbconfig/20260111-082721-marostegui.json [08:31:31] FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1 - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [08:35:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170 (T413525)', diff saved to https://phabricator.wikimedia.org/P87154 and previous config saved to /var/cache/conftool/dbconfig/20260111-083519-marostegui.json [08:35:23] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [08:35:35] !log marostegui@cumin1003 DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance [08:36:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T413525)', diff saved to https://phabricator.wikimedia.org/P87155 and previous config saved to /var/cache/conftool/dbconfig/20260111-083634-marostegui.json [08:36:51] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance [08:36:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2168 (T413525)', diff saved to https://phabricator.wikimedia.org/P87156 and previous config saved to /var/cache/conftool/dbconfig/20260111-083659-marostegui.json [08:37:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87157 and previous config saved to /var/cache/conftool/dbconfig/20260111-083729-marostegui.json [08:37:34] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [08:37:35] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [08:37:45] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance [08:37:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1243 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87158 and previous config saved to /var/cache/conftool/dbconfig/20260111-083753-marostegui.json [08:41:49] (PS4) Chlod Alejandro: enwiki: change to 25th anniversary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1225127 (https://phabricator.wikimedia.org/T414271) [09:04:47] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance [09:04:56] !log
marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1174 (T413525)', diff saved to https://phabricator.wikimedia.org/P87159 and previous config saved to /var/cache/conftool/dbconfig/20260111-090455-marostegui.json [09:04:59] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [09:08:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T413525)', diff saved to https://phabricator.wikimedia.org/P87160 and previous config saved to /var/cache/conftool/dbconfig/20260111-090811-marostegui.json [09:18:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P87161 and previous config saved to /var/cache/conftool/dbconfig/20260111-091819-marostegui.json [09:18:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T413525)', diff saved to https://phabricator.wikimedia.org/P87162 and previous config saved to /var/cache/conftool/dbconfig/20260111-091838-marostegui.json [09:18:42] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [09:24:10] FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [09:28:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P87163 and previous config saved to /var/cache/conftool/dbconfig/20260111-092828-marostegui.json [09:28:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P87164 and previous config saved to 
/var/cache/conftool/dbconfig/20260111-092846-marostegui.json [09:38:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T413525)', diff saved to https://phabricator.wikimedia.org/P87165 and previous config saved to /var/cache/conftool/dbconfig/20260111-093836-marostegui.json [09:38:40] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [09:38:53] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance [09:38:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P87166 and previous config saved to /var/cache/conftool/dbconfig/20260111-093855-marostegui.json [09:39:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2182 (T413525)', diff saved to https://phabricator.wikimedia.org/P87167 and previous config saved to /var/cache/conftool/dbconfig/20260111-093907-marostegui.json [09:48:54] (PS1) Giuseppe Lavagetto: admin: add the ssh key for my backup yubikey [puppet] - https://gerrit.wikimedia.org/r/1225136 [09:49:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T413525)', diff saved to https://phabricator.wikimedia.org/P87168 and previous config saved to /var/cache/conftool/dbconfig/20260111-094903-marostegui.json [09:49:07] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [09:49:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance [09:49:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1181 (T413525)', diff saved to https://phabricator.wikimedia.org/P87169 and previous config saved to /var/cache/conftool/dbconfig/20260111-094928-marostegui.json [10:03:45] !log marostegui@cumin1003
dbctl commit (dc=all): 'Repooling after maintenance db1181 (T413525)', diff saved to https://phabricator.wikimedia.org/P87170 and previous config saved to /var/cache/conftool/dbconfig/20260111-100344-marostegui.json [10:03:48] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [10:11:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T413525)', diff saved to https://phabricator.wikimedia.org/P87171 and previous config saved to /var/cache/conftool/dbconfig/20260111-101155-marostegui.json [10:11:59] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [10:13:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P87172 and previous config saved to /var/cache/conftool/dbconfig/20260111-101352-marostegui.json [10:22:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P87173 and previous config saved to /var/cache/conftool/dbconfig/20260111-102203-marostegui.json [10:24:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P87174 and previous config saved to /var/cache/conftool/dbconfig/20260111-102401-marostegui.json [10:24:10] FIRING: [3x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [10:32:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P87175 and previous config saved to /var/cache/conftool/dbconfig/20260111-103211-marostegui.json [10:34:11] !log marostegui@cumin1003 
dbctl commit (dc=all): 'Repooling after maintenance db1181 (T413525)', diff saved to https://phabricator.wikimedia.org/P87176 and previous config saved to /var/cache/conftool/dbconfig/20260111-103409-marostegui.json [10:34:14] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [10:34:27] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance [10:34:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1191 (T413525)', diff saved to https://phabricator.wikimedia.org/P87177 and previous config saved to /var/cache/conftool/dbconfig/20260111-103435-marostegui.json [10:42:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T413525)', diff saved to https://phabricator.wikimedia.org/P87178 and previous config saved to /var/cache/conftool/dbconfig/20260111-104219-marostegui.json [10:42:25] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [10:42:37] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance [10:47:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T413525)', diff saved to https://phabricator.wikimedia.org/P87179 and previous config saved to /var/cache/conftool/dbconfig/20260111-104726-marostegui.json [10:47:31] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [10:47:59] FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133216 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [10:57:36] !log marostegui@cumin1003 
dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P87180 and previous config saved to /var/cache/conftool/dbconfig/20260111-105735-marostegui.json [11:00:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:07:42] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet with reason: Maintenance [11:07:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P87181 and previous config saved to /var/cache/conftool/dbconfig/20260111-110743-marostegui.json [11:17:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T413525)', diff saved to https://phabricator.wikimedia.org/P87182 and previous config saved to /var/cache/conftool/dbconfig/20260111-111751-marostegui.json [11:17:55] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [11:17:58] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance [11:18:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1194 (T413525)', diff saved to https://phabricator.wikimedia.org/P87183 and previous config saved to /var/cache/conftool/dbconfig/20260111-111805-marostegui.json [11:30:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T413525)', diff saved to https://phabricator.wikimedia.org/P87184 and previous config saved to /var/cache/conftool/dbconfig/20260111-113051-marostegui.json [11:30:55] T413525: Add il_target_id to imagelinks table in wmf production - 
https://phabricator.wikimedia.org/T413525 [11:33:23] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2208.codfw.wmnet with reason: Maintenance [11:33:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2208 (T413525)', diff saved to https://phabricator.wikimedia.org/P87185 and previous config saved to /var/cache/conftool/dbconfig/20260111-113331-marostegui.json [11:41:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P87186 and previous config saved to /var/cache/conftool/dbconfig/20260111-114059-marostegui.json [11:51:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P87187 and previous config saved to /var/cache/conftool/dbconfig/20260111-115107-marostegui.json [12:01:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T413525)', diff saved to https://phabricator.wikimedia.org/P87188 and previous config saved to /var/cache/conftool/dbconfig/20260111-120115-marostegui.json [12:01:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208 (T413525)', diff saved to https://phabricator.wikimedia.org/P87189 and previous config saved to /var/cache/conftool/dbconfig/20260111-120116-marostegui.json [12:01:19] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [12:01:32] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance [12:01:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1202 (T413525)', diff saved to https://phabricator.wikimedia.org/P87190 and previous config saved to /var/cache/conftool/dbconfig/20260111-120139-marostegui.json [12:04:10] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA 
certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [12:11:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P87191 and previous config saved to /var/cache/conftool/dbconfig/20260111-121124-marostegui.json [12:14:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T413525)', diff saved to https://phabricator.wikimedia.org/P87192 and previous config saved to /var/cache/conftool/dbconfig/20260111-121436-marostegui.json [12:14:40] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [12:21:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P87193 and previous config saved to /var/cache/conftool/dbconfig/20260111-122132-marostegui.json [12:24:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P87194 and previous config saved to /var/cache/conftool/dbconfig/20260111-122444-marostegui.json [12:31:31] FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1 - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [12:31:41] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208 (T413525)', diff saved to https://phabricator.wikimedia.org/P87195 and previous config saved to /var/cache/conftool/dbconfig/20260111-123140-marostegui.json [12:31:44] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [12:31:57] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 6:00:00 on db2220.codfw.wmnet with reason: Maintenance [12:32:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2220 (T413525)', diff saved to https://phabricator.wikimedia.org/P87196 and previous config saved to /var/cache/conftool/dbconfig/20260111-123205-marostegui.json [12:34:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P87197 and previous config saved to /var/cache/conftool/dbconfig/20260111-123452-marostegui.json [12:35:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [12:45:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T413525)', diff saved to https://phabricator.wikimedia.org/P87198 and previous config saved to /var/cache/conftool/dbconfig/20260111-124501-marostegui.json [12:45:05] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [12:45:18] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance [12:45:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1227 (T413525)', diff saved to https://phabricator.wikimedia.org/P87199 and previous config saved to /var/cache/conftool/dbconfig/20260111-124525-marostegui.json [12:59:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220 (T413525)', diff saved to https://phabricator.wikimedia.org/P87200 and previous config saved to /var/cache/conftool/dbconfig/20260111-125930-marostegui.json [12:59:34] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 
[13:05:25] RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:09:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P87201 and previous config saved to /var/cache/conftool/dbconfig/20260111-130938-marostegui.json [13:13:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227 (T413525)', diff saved to https://phabricator.wikimedia.org/P87202 and previous config saved to /var/cache/conftool/dbconfig/20260111-131305-marostegui.json [13:13:10] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [13:19:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P87203 and previous config saved to /var/cache/conftool/dbconfig/20260111-131946-marostegui.json [13:23:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P87204 and previous config saved to /var/cache/conftool/dbconfig/20260111-132314-marostegui.json [13:24:10] FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [13:29:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220 (T413525)', diff saved to https://phabricator.wikimedia.org/P87205 and previous config saved to /var/cache/conftool/dbconfig/20260111-132955-marostegui.json [13:29:59] T413525: 
Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [13:30:12] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2221.codfw.wmnet with reason: Maintenance [13:30:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2221 (T413525)', diff saved to https://phabricator.wikimedia.org/P87206 and previous config saved to /var/cache/conftool/dbconfig/20260111-133019-marostegui.json [13:33:23] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P87207 and previous config saved to /var/cache/conftool/dbconfig/20260111-133322-marostegui.json [13:40:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [13:43:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227 (T413525)', diff saved to https://phabricator.wikimedia.org/P87208 and previous config saved to /var/cache/conftool/dbconfig/20260111-134330-marostegui.json [13:43:35] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [13:43:47] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance [13:43:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1231 (T413525)', diff saved to https://phabricator.wikimedia.org/P87209 and previous config saved to /var/cache/conftool/dbconfig/20260111-134355-marostegui.json [13:57:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T413525)', diff saved to https://phabricator.wikimedia.org/P87210 and previous config saved to 
/var/cache/conftool/dbconfig/20260111-135710-marostegui.json [13:57:14] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [13:58:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221 (T413525)', diff saved to https://phabricator.wikimedia.org/P87211 and previous config saved to /var/cache/conftool/dbconfig/20260111-135829-marostegui.json [14:07:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P87212 and previous config saved to /var/cache/conftool/dbconfig/20260111-140718-marostegui.json [14:08:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P87213 and previous config saved to /var/cache/conftool/dbconfig/20260111-140837-marostegui.json [14:17:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P87214 and previous config saved to /var/cache/conftool/dbconfig/20260111-141726-marostegui.json [14:18:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P87215 and previous config saved to /var/cache/conftool/dbconfig/20260111-141846-marostegui.json [14:27:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T413525)', diff saved to https://phabricator.wikimedia.org/P87216 and previous config saved to /var/cache/conftool/dbconfig/20260111-142735-marostegui.json [14:27:39] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [14:27:51] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1253.eqiad.wmnet with reason: Maintenance [14:28:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling 
db1253 (T413525)', diff saved to https://phabricator.wikimedia.org/P87217 and previous config saved to /var/cache/conftool/dbconfig/20260111-142759-marostegui.json [14:28:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221 (T413525)', diff saved to https://phabricator.wikimedia.org/P87218 and previous config saved to /var/cache/conftool/dbconfig/20260111-142854-marostegui.json [14:29:12] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2222.codfw.wmnet with reason: Maintenance [14:29:20] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2222 (T413525)', diff saved to https://phabricator.wikimedia.org/P87219 and previous config saved to /var/cache/conftool/dbconfig/20260111-142919-marostegui.json [14:42:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253 (T413525)', diff saved to https://phabricator.wikimedia.org/P87220 and previous config saved to /var/cache/conftool/dbconfig/20260111-144213-marostegui.json [14:42:17] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [14:47:59] FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133216 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [14:48:56] SRE, Traffic: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11510449 (Paladox) Hi, Is there any update on this? I see some *.wikimedia.org have ipv6 addresses which is useless if the host is ipv6 only. Since they can't query the dns as it's ipv4 only. 
[14:52:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P87221 and previous config saved to /var/cache/conftool/dbconfig/20260111-145221-marostegui.json [14:57:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222 (T413525)', diff saved to https://phabricator.wikimedia.org/P87222 and previous config saved to /var/cache/conftool/dbconfig/20260111-145721-marostegui.json [14:57:26] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [15:02:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P87223 and previous config saved to /var/cache/conftool/dbconfig/20260111-150230-marostegui.json [15:07:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P87224 and previous config saved to /var/cache/conftool/dbconfig/20260111-150729-marostegui.json [15:09:10] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:12:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253 (T413525)', diff saved to https://phabricator.wikimedia.org/P87225 and previous config saved to /var/cache/conftool/dbconfig/20260111-151238-marostegui.json [15:12:43] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [15:12:55] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance [15:17:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after 
maintenance db2222', diff saved to https://phabricator.wikimedia.org/P87226 and previous config saved to /var/cache/conftool/dbconfig/20260111-151738-marostegui.json [15:27:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222 (T413525)', diff saved to https://phabricator.wikimedia.org/P87227 and previous config saved to /var/cache/conftool/dbconfig/20260111-152746-marostegui.json [15:27:50] T413525: Add il_target_id to imagelinks table in wmf production - https://phabricator.wikimedia.org/T413525 [15:34:10] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:52:01] PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:10] FIRING: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:54:31] RECOVERY - Host titan1002 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [15:59:10] RESOLVED: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:04:10] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [16:41:16] RESOLVED: ErrorBudgetBurn: 
xlab-standalone-event-system-success-rate-v1 - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1 - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [17:10:17] (CR) Gergő Tisza: cache:haproxy: add new contact type (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1224977 (https://phabricator.wikimedia.org/T414173) (owner: Fabfur) [17:19:00] SRE, Traffic: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11510527 (Tgr) >>! In T414173#11507767, @Joe wrote: > But I agree with @revi's point that the string respects the spirit if not the letter of the policy. We'... [17:24:10] FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [17:34:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87228 and previous config saved to /var/cache/conftool/dbconfig/20260111-173442-marostegui.json [17:34:48] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [17:34:48] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [17:44:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P87229 and previous config saved to /var/cache/conftool/dbconfig/20260111-174451-marostegui.json [17:55:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P87230 and previous config saved to 
/var/cache/conftool/dbconfig/20260111-175459-marostegui.json [18:05:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87231 and previous config saved to /var/cache/conftool/dbconfig/20260111-180507-marostegui.json [18:05:13] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [18:05:13] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [18:05:24] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance [18:05:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2237 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87232 and previous config saved to /var/cache/conftool/dbconfig/20260111-180532-marostegui.json [18:41:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87233 and previous config saved to /var/cache/conftool/dbconfig/20260111-184150-marostegui.json [18:41:55] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [18:41:56] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [18:47:59] FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133216 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [18:51:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P87234 and previous config saved to 
/var/cache/conftool/dbconfig/20260111-185157-marostegui.json [19:02:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P87235 and previous config saved to /var/cache/conftool/dbconfig/20260111-190206-marostegui.json [19:12:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87237 and previous config saved to /var/cache/conftool/dbconfig/20260111-191214-marostegui.json [19:12:19] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [19:12:20] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [19:12:30] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance [19:12:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1244 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87238 and previous config saved to /var/cache/conftool/dbconfig/20260111-191238-marostegui.json [19:29:13] (PS1) Scott French: varnish: remove unit from retry-after header value [puppet] - https://gerrit.wikimedia.org/r/1225103 [19:30:12] (PS2) Scott French: varnish: remove unit from retry-after header value [puppet] - https://gerrit.wikimedia.org/r/1225103 (https://phabricator.wikimedia.org/T406545) [20:01:19] (CR) Scott French: [C:+2] varnish: remove unit from retry-after header value [puppet] - https://gerrit.wikimedia.org/r/1225103 (https://phabricator.wikimedia.org/T406545) (owner: Scott French) [20:04:10] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - 
https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [20:34:11] SRE, Traffic: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11510658 (Scott_French) @Benwing2 - Thanks for calling our attention to the Retry-After response header format issue. We've made a change that we believe sho... [20:48:49] (PS1) Zabe: Stop setting $wgBlockTargetMigrationStage [mediawiki-config] - https://gerrit.wikimedia.org/r/1225165 (https://phabricator.wikimedia.org/T355034) [20:54:29] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2008.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2011.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [20:55:29] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [21:00:29] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:00:29] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:24:10] FIRING: KubernetesCalicoDown: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [21:36:17] (PS1) Zabe: manage-dblist: Improve generation of db-sections.php [mediawiki-config] - https://gerrit.wikimedia.org/r/1225171 [22:04:44] (PS4) Zabe: 
manage-dblist: Improve generation of db-sections.php [mediawiki-config] - https://gerrit.wikimedia.org/r/1225171 [22:47:59] FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to magru RIPE Atlas anchor: failures over threshold for measurement 95133216 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [22:54:10] FIRING: [4x] ProbeDown: Service wdqs2011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [23:19:10] FIRING: [6x] ProbeDown: Service wdqs1024:443 has failed probes (http_wdqs_scholarly_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown