[00:02:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P93113 and previous config saved to /var/cache/conftool/dbconfig/20260527-000209-fceratto.json
[00:12:21] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P93114 and previous config saved to /var/cache/conftool/dbconfig/20260527-001220-fceratto.json
[00:17:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1018:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:22:29] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T426633)', diff saved to https://phabricator.wikimedia.org/P93115 and previous config saved to /var/cache/conftool/dbconfig/20260527-002228-fceratto.json
[00:23:02] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
[00:23:10] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2155 (T426633)', diff saved to https://phabricator.wikimedia.org/P93116 and previous config saved to /var/cache/conftool/dbconfig/20260527-002309-fceratto.json
[00:31:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T426633)', diff saved to https://phabricator.wikimedia.org/P93117 and previous config saved to /var/cache/conftool/dbconfig/20260527-003141-fceratto.json
[00:41:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P93118 and previous config saved to /var/cache/conftool/dbconfig/20260527-004149-fceratto.json
[00:51:58] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P93119 and previous config saved to /var/cache/conftool/dbconfig/20260527-005157-fceratto.json
[00:52:33] <wikibugs>	 (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1293102 (owner: L10n-bot)
[01:02:06] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T426633)', diff saved to https://phabricator.wikimedia.org/P93120 and previous config saved to /var/cache/conftool/dbconfig/20260527-010205-fceratto.json
[01:02:27] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
[01:02:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2172 (T426633)', diff saved to https://phabricator.wikimedia.org/P93121 and previous config saved to /var/cache/conftool/dbconfig/20260527-010234-fceratto.json
[01:04:49] <wikibugs>	 (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [software/mailman-templates] - https://gerrit.wikimedia.org/r/1293107 (owner: L10n-bot)
[01:09:35] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1293825
[01:09:35] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1293825 (owner: TrainBranchBot)
[01:11:12] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T426633)', diff saved to https://phabricator.wikimedia.org/P93122 and previous config saved to /var/cache/conftool/dbconfig/20260527-011111-fceratto.json
[01:21:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P93123 and previous config saved to /var/cache/conftool/dbconfig/20260527-012119-fceratto.json
[01:23:38] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1293825 (owner: TrainBranchBot)
[01:31:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P93124 and previous config saved to /var/cache/conftool/dbconfig/20260527-013126-fceratto.json
[01:41:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T426633)', diff saved to https://phabricator.wikimedia.org/P93125 and previous config saved to /var/cache/conftool/dbconfig/20260527-014134-fceratto.json
[01:41:57] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
[01:42:04] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2179 (T426633)', diff saved to https://phabricator.wikimedia.org/P93126 and previous config saved to /var/cache/conftool/dbconfig/20260527-014204-fceratto.json
[01:50:38] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179 (T426633)', diff saved to https://phabricator.wikimedia.org/P93127 and previous config saved to /var/cache/conftool/dbconfig/20260527-015037-fceratto.json
[02:00:45] <logmsgbot>	 !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:00:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P93128 and previous config saved to /var/cache/conftool/dbconfig/20260527-020045-fceratto.json
[02:07:15] <logmsgbot>	 !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 29s)
[02:08:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:09:14] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:10:54] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P93129 and previous config saved to /var/cache/conftool/dbconfig/20260527-021053-fceratto.json
[02:21:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2179 (T426633)', diff saved to https://phabricator.wikimedia.org/P93130 and previous config saved to /var/cache/conftool/dbconfig/20260527-022100-fceratto.json
[02:21:25] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
[02:21:33] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2206 (T426633)', diff saved to https://phabricator.wikimedia.org/P93131 and previous config saved to /var/cache/conftool/dbconfig/20260527-022133-fceratto.json
[02:29:53] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T426633)', diff saved to https://phabricator.wikimedia.org/P93132 and previous config saved to /var/cache/conftool/dbconfig/20260527-022953-fceratto.json
[02:34:14] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:35:29] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:39:14] <jinxer-wm>	 RESOLVED: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:40:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P93133 and previous config saved to /var/cache/conftool/dbconfig/20260527-024000-fceratto.json
[02:47:08] <wikibugs>	 (PS1) RLazarus: Refactor the backend regex in ATSBackendErrorsHigh [alerts] - https://gerrit.wikimedia.org/r/1293839
[02:50:09] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P93134 and previous config saved to /var/cache/conftool/dbconfig/20260527-025008-fceratto.json
[02:54:41] <wikibugs>	 (CR) RLazarus: "Not particularly urgent, just a tiny quality-of-life improvement. :)" [alerts] - https://gerrit.wikimedia.org/r/1293839 (owner: RLazarus)
[03:00:17] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T426633)', diff saved to https://phabricator.wikimedia.org/P93135 and previous config saved to /var/cache/conftool/dbconfig/20260527-030016-fceratto.json
[03:00:35] <jinxer-wm>	 FIRING: DiskSpace: Disk space krb1002:9100:/ 2.867% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=krb1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:00:38] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
[03:00:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2210 (T426633)', diff saved to https://phabricator.wikimedia.org/P93136 and previous config saved to /var/cache/conftool/dbconfig/20260527-030045-fceratto.json
[03:05:35] <jinxer-wm>	 RESOLVED: DiskSpace: Disk space krb1002:9100:/ 2.263% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=krb1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:05:53] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:06:07] <icinga-wm>	 PROBLEM - OSPF status on cr1-drmrs is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:06:39] <jinxer-wm>	 FIRING: CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=drmrs&var-device=cr1-drmrs:9804&var-bgp_group=Confed_eqiad&var-bgp_neighbor=cr2-eqiad - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:07:10] <jinxer-wm>	 FIRING: [4x] BFDdown: BFD session down between cr1-drmrs and 185.15.58.138 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[03:07:55] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:08:07] <icinga-wm>	 RECOVERY - OSPF status on cr1-drmrs is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:09:16] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T426633)', diff saved to https://phabricator.wikimedia.org/P93137 and previous config saved to /var/cache/conftool/dbconfig/20260527-030915-fceratto.json
[03:11:39] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=drmrs&var-device=cr1-drmrs:9804&var-bgp_group=Confed_eqiad&var-bgp_neighbor=cr2-eqiad - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:12:10] <jinxer-wm>	 RESOLVED: [4x] BFDdown: BFD session down between cr1-drmrs and 185.15.58.138 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[03:19:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P93138 and previous config saved to /var/cache/conftool/dbconfig/20260527-031923-fceratto.json
[03:29:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P93139 and previous config saved to /var/cache/conftool/dbconfig/20260527-032931-fceratto.json
[03:39:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T426633)', diff saved to https://phabricator.wikimedia.org/P93140 and previous config saved to /var/cache/conftool/dbconfig/20260527-033938-fceratto.json
[03:40:01] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
[03:40:09] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2219 (T426633)', diff saved to https://phabricator.wikimedia.org/P93141 and previous config saved to /var/cache/conftool/dbconfig/20260527-034008-fceratto.json
[03:48:29] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T426633)', diff saved to https://phabricator.wikimedia.org/P93142 and previous config saved to /var/cache/conftool/dbconfig/20260527-034828-fceratto.json
[03:58:37] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P93143 and previous config saved to /var/cache/conftool/dbconfig/20260527-035836-fceratto.json
[04:07:51] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:08:45] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P93144 and previous config saved to /var/cache/conftool/dbconfig/20260527-040844-fceratto.json
[04:18:53] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T426633)', diff saved to https://phabricator.wikimedia.org/P93145 and previous config saved to /var/cache/conftool/dbconfig/20260527-041852-fceratto.json
[04:19:14] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
[04:19:22] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2236 (T426633)', diff saved to https://phabricator.wikimedia.org/P93146 and previous config saved to /var/cache/conftool/dbconfig/20260527-041921-fceratto.json
[04:27:37] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236 (T426633)', diff saved to https://phabricator.wikimedia.org/P93147 and previous config saved to /var/cache/conftool/dbconfig/20260527-042737-fceratto.json
[04:37:45] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P93148 and previous config saved to /var/cache/conftool/dbconfig/20260527-043744-fceratto.json
[04:47:52] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P93149 and previous config saved to /var/cache/conftool/dbconfig/20260527-044751-fceratto.json
[04:57:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2236 (T426633)', diff saved to https://phabricator.wikimedia.org/P93150 and previous config saved to /var/cache/conftool/dbconfig/20260527-045759-fceratto.json
[04:58:21] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
[04:58:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2237 (T426633)', diff saved to https://phabricator.wikimedia.org/P93151 and previous config saved to /var/cache/conftool/dbconfig/20260527-045827-fceratto.json
[05:06:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237 (T426633)', diff saved to https://phabricator.wikimedia.org/P93152 and previous config saved to /var/cache/conftool/dbconfig/20260527-050645-fceratto.json
[05:08:17] <wikibugs>	 (PS1) Marostegui: pc1024: Remove note [puppet] - https://gerrit.wikimedia.org/r/1293842
[05:09:14] <wikibugs>	 (CR) Marostegui: [C:+2] pc1024: Remove note [puppet] - https://gerrit.wikimedia.org/r/1293842 (owner: Marostegui)
[05:16:53] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P93153 and previous config saved to /var/cache/conftool/dbconfig/20260527-051653-fceratto.json
[05:22:57] <wikibugs>	 (PS1) Marostegui: instances.yaml: Remove pc1014 [puppet] - https://gerrit.wikimedia.org/r/1293843 (https://phabricator.wikimedia.org/T427270)
[05:24:21] <wikibugs>	 (CR) Marostegui: [C:+2] instances.yaml: Remove pc1014 [puppet] - https://gerrit.wikimedia.org/r/1293843 (https://phabricator.wikimedia.org/T427270) (owner: Marostegui)
[05:26:24] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove pc1014 from dbctl T427270', diff saved to https://phabricator.wikimedia.org/P93154 and previous config saved to /var/cache/conftool/dbconfig/20260527-052624-marostegui.json
[05:26:29] <stashbot>	 T427270: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270
[05:27:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P93155 and previous config saved to /var/cache/conftool/dbconfig/20260527-052700-fceratto.json
[05:28:32] <wikibugs>	 (PS2) Robertsky: Update wikimania wordmark for 2026 [mediawiki-config] - https://gerrit.wikimedia.org/r/1270986 (https://phabricator.wikimedia.org/T413331)
[05:33:00] <wikibugs>	 (CR) Marostegui: "Yeah, you'd need to restart replication on sanitarium." [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[05:33:03] <wikibugs>	 (CR) Marostegui: [C:+1] Add config for conductwiki [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[05:37:09] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2237 (T426633)', diff saved to https://phabricator.wikimedia.org/P93156 and previous config saved to /var/cache/conftool/dbconfig/20260527-053708-fceratto.json
[05:37:20] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance
[05:37:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2245 (T426633)', diff saved to https://phabricator.wikimedia.org/P93157 and previous config saved to /var/cache/conftool/dbconfig/20260527-053727-fceratto.json
[05:38:07] <moritzm>	 !log remove ganeti1026 from eqiad Ganeti cluster T424680
[05:38:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:11] <stashbot>	 T424680: Add ganeti105[5678] and decom ganeti102[3456] - https://phabricator.wikimedia.org/T424680
[05:39:14] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:40:05] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[05:40:28] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2055: Upgrading es2055.codfw.wmnet
[05:40:29] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:40:47] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2055: Upgrading es2055.codfw.wmnet
[05:41:31] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1026 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[05:41:31] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1026 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[05:41:34] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2055.codfw.wmnet with OS trixie
[05:42:50] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti1026:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:45:51] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245 (T426633)', diff saved to https://phabricator.wikimedia.org/P93159 and previous config saved to /var/cache/conftool/dbconfig/20260527-054550-fceratto.json
[05:49:14] <icinga-wm>	 PROBLEM - SSH on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[05:49:38] <icinga-wm>	 PROBLEM - librenms.wikimedia.org requires authentication on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[05:50:02] <icinga-wm>	 PROBLEM - librenms.wikimedia.org tls expiry on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[05:52:10] <icinga-wm>	 RECOVERY - SSH on netmon2002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u10 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[05:53:35] <wikibugs>	 (PS3) JavierMonton: image: Flink 2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1293664 (https://phabricator.wikimedia.org/T412978)
[05:54:14] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:55:14] <icinga-wm>	 PROBLEM - SSH on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[05:55:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P93160 and previous config saved to /var/cache/conftool/dbconfig/20260527-055558-fceratto.json
[05:56:43] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2055.codfw.wmnet with reason: host reimage
[05:56:57] <wikibugs>	 (PS3) Robertsky: Update wikimania wordmark for 2026 [mediawiki-config] - https://gerrit.wikimedia.org/r/1270986 (https://phabricator.wikimedia.org/T413331)
[05:57:06] <icinga-wm>	 RECOVERY - SSH on netmon2002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u10 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[05:57:28] <icinga-wm>	 RECOVERY - librenms.wikimedia.org requires authentication on netmon2002 is OK: HTTP OK: Status line output matched HTTP/1.1 302 - 701 bytes in 0.143 second response time https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[05:57:52] <icinga-wm>	 RECOVERY - librenms.wikimedia.org tls expiry on netmon2002 is OK: OK - Certificate librenms.wikimedia.org will expire on Sun 12 Jul 2026 02:51:32 AM GMT +0000. https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[05:58:14] <wikibugs>	 (CR) Robertsky: Update wikimania wordmark for 2026 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1270986 (https://phabricator.wikimedia.org/T413331) (owner: Robertsky)
[06:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T0600)
[06:02:49] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1270986 (https://phabricator.wikimedia.org/T413331) (owner: Robertsky)
[06:04:14] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:04:17] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2055.codfw.wmnet with reason: host reimage
[06:06:07] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P93161 and previous config saved to /var/cache/conftool/dbconfig/20260527-060606-fceratto.json
[06:08:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:16:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2245 (T426633)', diff saved to https://phabricator.wikimedia.org/P93162 and previous config saved to /var/cache/conftool/dbconfig/20260527-061613-fceratto.json
[06:16:36] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance
[06:16:41] <wikibugs>	 (CR) Chlod Alejandro: [C:+1] Update wikimania wordmark for 2026 [mediawiki-config] - https://gerrit.wikimedia.org/r/1270986 (https://phabricator.wikimedia.org/T413331) (owner: Robertsky)
[06:16:44] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2246 (T426633)', diff saved to https://phabricator.wikimedia.org/P93163 and previous config saved to /var/cache/conftool/dbconfig/20260527-061643-fceratto.json
[06:21:09] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2055.codfw.wmnet with OS trixie
[06:21:56] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[06:22:32] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2055: repool after maintenance
[06:22:50] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti1026:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:25:04] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246 (T426633)', diff saved to https://phabricator.wikimedia.org/P93165 and previous config saved to /var/cache/conftool/dbconfig/20260527-062503-fceratto.json
[06:30:42] <wikibugs>	 (PS1) JavierMonton: html-enrichment: relax offset lag monitors [alerts] - https://gerrit.wikimedia.org/r/1294113 (https://phabricator.wikimedia.org/T423920)
[06:35:12] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P93166 and previous config saved to /var/cache/conftool/dbconfig/20260527-063511-fceratto.json
[06:36:28] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:36:35] <wikibugs>	 (PS2) JavierMonton: html-enrichment: relax offset lag monitors [alerts] - https://gerrit.wikimedia.org/r/1294113 (https://phabricator.wikimedia.org/T423920)
[06:44:02] <wikibugs>	 (CR) JMeybohm: [C:+1] Remove k8s version from all services [deployment-charts] - https://gerrit.wikimedia.org/r/1273967 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[06:44:13] <wikibugs>	 (CR) JMeybohm: [C:+1] CI: Fix race condition [deployment-charts] - https://gerrit.wikimedia.org/r/1293757 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[06:45:19] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P93168 and previous config saved to /var/cache/conftool/dbconfig/20260527-064519-fceratto.json
[06:45:30] <wikibugs>	 (CR) Ryan Kemper: [C:+1] relforge: remove logstash (gelf) profile [puppet] - https://gerrit.wikimedia.org/r/1293809 (https://phabricator.wikimedia.org/T324335) (owner: Bking)
[06:45:39] <wikibugs>	 (CR) JMeybohm: [C:+1] "I think it's just taking it's time because changing the rake_modules triggers a full CI run. ~25min is not uncommon for that." [deployment-charts] - https://gerrit.wikimedia.org/r/1293757 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[06:50:52] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1293790 (https://phabricator.wikimedia.org/T427312) (owner: Scott French)
[06:51:23] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1293789 (https://phabricator.wikimedia.org/T427312) (owner: Scott French)
[06:54:19] <wikibugs>	 (PS1) Muehlenhoff: Remove ganeti1025/1026 [puppet] - https://gerrit.wikimedia.org/r/1294115 (https://phabricator.wikimedia.org/T424680)
[06:54:42] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts ganeti1025.eqiad.wmnet
[06:55:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2246 (T426633)', diff saved to https://phabricator.wikimedia.org/P93170 and previous config saved to /var/cache/conftool/dbconfig/20260527-065526-fceratto.json
[06:55:38] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance
[06:55:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2247 (T426633)', diff saved to https://phabricator.wikimedia.org/P93171 and previous config saved to /var/cache/conftool/dbconfig/20260527-065545-fceratto.json
[06:59:55] <wikibugs>	 (03Abandoned) 10Elukey: admin_ng: disable tag->sha256 for all ml clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/1163712 (https://phabricator.wikimedia.org/T397696) (owner: 10Elukey)
[06:59:56] <logmsgbot>	 jmm@cumin2002 decommission (PID 1477266) is awaiting input
[07:00:05] <jouncebot>	 Amir1, urbanecm, and awight: UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T0700). Please do the needful.
[07:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[07:00:35] <wikibugs>	 (03CR) 10Elukey: [C:03+2] team-sre: modify pki's alert to notify users earlier [alerts] - 10https://gerrit.wikimedia.org/r/1286923 (owner: 10Elukey)
[07:02:32] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
[07:02:51] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:04:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247 (T426633)', diff saved to https://phabricator.wikimedia.org/P93172 and previous config saved to /var/cache/conftool/dbconfig/20260527-070410-fceratto.json
[07:06:02] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1190.eqiad.wmnet with reason: Maintenance on db1190
[07:06:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:07:01] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
[07:07:26] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
[07:07:57] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2055: repool after maintenance
[07:11:04] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Decommission pc1014 [puppet] - 10https://gerrit.wikimedia.org/r/1294123 (https://phabricator.wikimedia.org/T427270)
[07:11:55] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
[07:11:59] <logmsgbot>	 jmm@cumin2002 decommission (PID 1477266) is awaiting input
[07:13:41] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1025.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:13:43] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.decommission
[07:13:58] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.decommission for hosts pc1014.eqiad.wmnet
[07:14:12] <wikibugs>	 (03PS1) 10Muehlenhoff: Failover url downloaders for reboots [dns] - 10https://gerrit.wikimedia.org/r/1294124
[07:14:19] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P93174 and previous config saved to /var/cache/conftool/dbconfig/20260527-071418-fceratto.json
[07:14:25] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1025.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:14:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:14:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1025.eqiad.wmnet
[07:14:37] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Add ganeti105[5678] and decom ganeti102[3456] - https://phabricator.wikimedia.org/T424680#11958012 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `ganeti1025.eqiad.wmnet` - ganeti1025.eqiad.wmne...
[07:15:19] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts ganeti1026.eqiad.wmnet
[07:18:27] <logmsgbot>	 jmm@cumin2002 decommission (PID 1491224) is awaiting input
[07:18:57] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.dns.netbox
[07:20:17] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Decommission pc1014 [puppet] - 10https://gerrit.wikimedia.org/r/1294123 (https://phabricator.wikimedia.org/T427270) (owner: 10Marostegui)
[07:23:09] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[07:23:24] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[07:23:24] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:23:25] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1014.eqiad.wmnet
[07:23:28] <logmsgbot>	 !log marostegui@cumin1003 Removing pc1014 from zarcillo T427190
[07:23:31] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.decommission (exit_code=0)
[07:23:33] <wikibugs>	 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops, 10decommission-hardware: decommission pc1013.eqiad.wmnet - https://phabricator.wikimedia.org/T427190#11958022 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1003 for hosts: `pc1014.eqiad.wmnet` - pc1014.eqiad.wmnet (**PASS**)   - D...
[07:23:36] <wikibugs>	 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops, 10decommission-hardware: decommission pc1013.eqiad.wmnet - https://phabricator.wikimedia.org/T427190#11958023 (10ops-monitoring-bot) pc1014 has been deleted from zarcillo
[07:23:36] <stashbot>	 T427190: decommission pc1013.eqiad.wmnet - https://phabricator.wikimedia.org/T427190
[07:23:38] <wikibugs>	 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops, 10decommission-hardware: decommission pc1013.eqiad.wmnet - https://phabricator.wikimedia.org/T427190#11958024 (10ops-monitoring-bot) pc1014 has been decommissioned by Data Persistence
[07:24:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P93175 and previous config saved to /var/cache/conftool/dbconfig/20260527-072426-fceratto.json
[07:24:54] <wikibugs>	 10ops-eqiad, 06DBA, 06DC-Ops, 10decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11958028 (10Marostegui) a:05Marostegui→03None
[07:24:57] <wikibugs>	 10ops-eqiad, 06DBA, 06DC-Ops, 10decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11958034 (10Marostegui) This host is ready for DC-Ops to decommission
[07:25:38] <icinga-wm>	 PROBLEM - orchestrator resolve cache non-FQDNs on dborch1002 is CRITICAL: CRITICAL: 1 non-FQDN entries in orchestrator resolve cache: https://wikitech.wikimedia.org/wiki/Orchestrator
[07:26:28] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:26:34] <wikibugs>	 (03PS1) 10Mszwarc: Add script to demote ineligible members of restricted global groups [extensions/CentralAuth] (wmf/1.47.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1294125 (https://phabricator.wikimedia.org/T425395)
[07:26:49] <wikibugs>	 (03PS1) 10Mszwarc: Add script to demote ineligible members of restricted global groups [extensions/CentralAuth] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294126 (https://phabricator.wikimedia.org/T425395)
[07:28:08] <wikibugs>	 (03CR) 10Marostegui: "Worked nicely, check the comment below, mostly UI related." [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto)
[07:28:55] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:30:32] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszwarc@deploy1003 using scap backport" [extensions/CentralAuth] (wmf/1.47.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1294125 (https://phabricator.wikimedia.org/T425395) (owner: 10Mszwarc)
[07:30:33] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by mszwarc@deploy1003 using scap backport" [extensions/CentralAuth] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294126 (https://phabricator.wikimedia.org/T425395) (owner: 10Mszwarc)
[07:32:06] <wikibugs>	 (03Merged) 10jenkins-bot: Add script to demote ineligible members of restricted global groups [extensions/CentralAuth] (wmf/1.47.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1294125 (https://phabricator.wikimedia.org/T425395) (owner: 10Mszwarc)
[07:32:11] <wikibugs>	 (03Merged) 10jenkins-bot: Add script to demote ineligible members of restricted global groups [extensions/CentralAuth] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294126 (https://phabricator.wikimedia.org/T425395) (owner: 10Mszwarc)
[07:32:42] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/1294124 (owner: 10Muehlenhoff)
[07:33:36] <logmsgbot>	 !log mszwarc@deploy1003 Started scap sync-world: Backport for [[gerrit:1294125|Add script to demote ineligible members of restricted global groups (T425395)]], [[gerrit:1294126|Add script to demote ineligible members of restricted global groups (T425395)]]
[07:33:41] <stashbot>	 T425395: Add a script to demote ineligible users from restricted global groups - https://phabricator.wikimedia.org/T425395
[07:34:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2247 (T426633)', diff saved to https://phabricator.wikimedia.org/P93176 and previous config saved to /var/cache/conftool/dbconfig/20260527-073434-fceratto.json
[07:34:40] <logmsgbot>	 jmm@cumin2002 decommission (PID 1491224) is awaiting input
[07:34:57] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2248.codfw.wmnet with reason: Maintenance
[07:35:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2248 (T426633)', diff saved to https://phabricator.wikimedia.org/P93177 and previous config saved to /var/cache/conftool/dbconfig/20260527-073504-fceratto.json
[07:35:35] <logmsgbot>	 !log mszwarc@deploy1003 mszwarc: Backport for [[gerrit:1294125|Add script to demote ineligible members of restricted global groups (T425395)]], [[gerrit:1294126|Add script to demote ineligible members of restricted global groups (T425395)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:36:01] <logmsgbot>	 !log mszwarc@deploy1003 mszwarc: Continuing with deployment
[07:38:33] <wikibugs>	 (03CR) 10Marostegui: "We should also update: https://wikitech.wikimedia.org/wiki/MariaDB/Decommissioning_a_DB_Host" [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto)
[07:40:19] <logmsgbot>	 !log mszwarc@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294125|Add script to demote ineligible members of restricted global groups (T425395)]], [[gerrit:1294126|Add script to demote ineligible members of restricted global groups (T425395)]] (duration: 06m 42s)
[07:40:24] <stashbot>	 T425395: Add a script to demote ineligible users from restricted global groups - https://phabricator.wikimedia.org/T425395
[07:40:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248 (T426633)', diff saved to https://phabricator.wikimedia.org/P93178 and previous config saved to /var/cache/conftool/dbconfig/20260527-074031-fceratto.json
[07:41:37] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[07:42:02] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es2051: Upgrading es2051.codfw.wmnet
[07:42:21] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es2051: Upgrading es2051.codfw.wmnet
[07:43:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:43:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:43:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:43:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1026.eqiad.wmnet
[07:43:53] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Add ganeti105[5678] and decom ganeti102[3456] - https://phabricator.wikimedia.org/T424680#11958085 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `ganeti1026.eqiad.wmnet` - ganeti1026.eqiad.wmne...
[07:49:04] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove ganeti1025/1026 [puppet] - 10https://gerrit.wikimedia.org/r/1294115 (https://phabricator.wikimedia.org/T424680)
[07:49:39] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Add urldownloader[12]00[56] [puppet] - 10https://gerrit.wikimedia.org/r/1293743 (https://phabricator.wikimedia.org/T427282) (owner: 10Muehlenhoff)
[07:50:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P93180 and previous config saved to /var/cache/conftool/dbconfig/20260527-075039-fceratto.json
[07:52:50] <wikibugs>	 (03PS1) 10Elukey: Set pki-root1001 to role insetup [puppet] - 10https://gerrit.wikimedia.org/r/1294179 (https://phabricator.wikimedia.org/T416664)
[07:53:07] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove ganeti1025/1026 [puppet] - 10https://gerrit.wikimedia.org/r/1294115 (https://phabricator.wikimedia.org/T424680) (owner: 10Muehlenhoff)
[07:56:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10procurement, and 2 others: decomission deploy2002.codfw.wmnet - https://phabricator.wikimedia.org/T426222#11958159 (10MLechvien-WMF) p:05Triage→03Medium
[07:56:42] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10procurement, and 2 others: decommission deploy2002.codfw.wmnet - https://phabricator.wikimedia.org/T426222#11958160 (10MLechvien-WMF)
[07:59:09] <wikibugs>	 (03PS4) 10Mszwarc: Periodic jobs: add demote_ineligible_users (and _central_ counterpart) [puppet] - 10https://gerrit.wikimedia.org/r/1285315 (https://phabricator.wikimedia.org/T425396)
[07:59:09] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es2051.codfw.wmnet with OS trixie
[07:59:22] <wikibugs>	 (03PS5) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613)
[08:00:05] <jouncebot>	 jnuche and hashar: Deploy window MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T0800)
[08:00:20] <jnuche>	 morning, train will start soon
[08:00:47] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P93181 and previous config saved to /var/cache/conftool/dbconfig/20260527-080046-fceratto.json
[08:01:57] <wikibugs>	 (03PS1) 10Jelto: miscweb: remove wmf-navigator public and private config from web container [deployment-charts] - 10https://gerrit.wikimedia.org/r/1294208 (https://phabricator.wikimedia.org/T414405)
[08:02:14] <wikibugs>	 (03CR) 10Federico Ceratto: cookbooks/sre/mysql/decommission: add cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto)
[08:02:26] <wikibugs>	 (03CR) 10CI reject: [V:04-1] cookbooks/sre/mysql/decommission: add cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto)
[08:02:59] <wikibugs>	 (03CR) 10Mszwarc: "I433a6c82f42550b9c91d1ed5691dc5b12d4c34df has been merged and backported to wikis" [puppet] - 10https://gerrit.wikimedia.org/r/1285315 (https://phabricator.wikimedia.org/T425396) (owner: 10Mszwarc)
[08:03:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] "All good from my side, pending the discussion with Ceri." [cookbooks] - 10https://gerrit.wikimedia.org/r/1291952 (https://phabricator.wikimedia.org/T426613) (owner: 10Federico Ceratto)
[08:03:51] <wikibugs>	 (03PS1) 10TrainBranchBot: group1 to 1.47.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294209 (https://phabricator.wikimedia.org/T423913)
[08:03:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Initiated by jnuche@deploy1003" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294209 (https://phabricator.wikimedia.org/T423913) (owner: 10TrainBranchBot)
[08:04:14] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:05:29] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:05:38] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Failover url downloaders for reboots [dns] - 10https://gerrit.wikimedia.org/r/1294124 (owner: 10Muehlenhoff)
[08:05:43] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[08:05:54] <wikibugs>	 (03Merged) 10jenkins-bot: group1 to 1.47.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294209 (https://phabricator.wikimedia.org/T423913) (owner: 10TrainBranchBot)
[08:07:23] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[08:07:26] <wikibugs>	 (03CR) 10Muehlenhoff: Set pki-root1001 to role insetup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1294179 (https://phabricator.wikimedia.org/T416664) (owner: 10Elukey)
[08:08:04] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations, 13Patch-For-Review: Add ganeti105[5678] and decom ganeti102[3456] - https://phabricator.wikimedia.org/T424680#11958176 (10MoritzMuehlenhoff)
[08:10:55] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2248 (T426633)', diff saved to https://phabricator.wikimedia.org/P93182 and previous config saved to /var/cache/conftool/dbconfig/20260527-081054-fceratto.json
[08:11:06] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
[08:11:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2153 (T426633)', diff saved to https://phabricator.wikimedia.org/P93183 and previous config saved to /var/cache/conftool/dbconfig/20260527-081112-fceratto.json
[08:11:45] <wikibugs>	 (03PS2) 10Elukey: Set pki-root1001 to role insetup [puppet] - 10https://gerrit.wikimedia.org/r/1294179 (https://phabricator.wikimedia.org/T416664)
[08:11:54] <wikibugs>	 (03CR) 10Elukey: Set pki-root1001 to role insetup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1294179 (https://phabricator.wikimedia.org/T416664) (owner: 10Elukey)
[08:11:56] <logmsgbot>	 !log jnuche@deploy1003 rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.4  refs T423913
[08:12:01] <stashbot>	 T423913: 1.47.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T423913
[08:15:15] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es2051.codfw.wmnet with reason: host reimage
[08:16:31] <wikibugs>	 (03PS2) 10Arnaudb: gitlab: add envoy on Gitlab [puppet] - 10https://gerrit.wikimedia.org/r/1293722 (https://phabricator.wikimedia.org/T425441)
[08:16:31] <wikibugs>	 (03CR) 10Arnaudb: "thanks for the reviews, all things considered I think it's better to avoid adding Envoy on WMCS outside of the scope of a dedicated task" [puppet] - 10https://gerrit.wikimedia.org/r/1293722 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb)
[08:18:06] <wikibugs>	 (03PS5) 10Arnaudb: trafficserver: add a map for gitlab as a backend [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441)
[08:18:46] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2051.codfw.wmnet with reason: host reimage
[08:19:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153 (T426633)', diff saved to https://phabricator.wikimedia.org/P93184 and previous config saved to /var/cache/conftool/dbconfig/20260527-081942-fceratto.json
[08:27:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
[08:27:56] <wikibugs>	 (03CR) 10Atsuko: [C:03+2] image: Flink 2 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1293664 (https://phabricator.wikimedia.org/T412978) (owner: 10JavierMonton)
[08:28:41] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353 (10MoritzMuehlenhoff) 03NEW
[08:29:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P93185 and previous config saved to /var/cache/conftool/dbconfig/20260527-082950-fceratto.json
[08:29:55] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#11958231 (10MoritzMuehlenhoff)
[08:31:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
[08:31:49] <wikibugs>	 (03PS6) 10Arnaudb: trafficserver: add a map for gitlab as a backend [puppet] - 10https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441)
[08:32:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
[08:33:45] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[08:33:57] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2166: Upgrading db2166.codfw.wmnet
[08:34:17] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2166: Upgrading db2166.codfw.wmnet
[08:35:07] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2051.codfw.wmnet with OS trixie
[08:35:38] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2166.codfw.wmnet with OS trixie
[08:36:00] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[08:36:22] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1203: Upgrading db1203.eqiad.wmnet
[08:36:36] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
[08:36:51] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1203: Upgrading db1203.eqiad.wmnet
[08:37:43] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[08:38:27] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es2051: repool after maintenance
[08:38:28] <wikibugs>	 (03PS1) 10Phuedx: ext.wikimediaEvents: Add hoisting error detection test [extensions/WikimediaEvents] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294217 (https://phabricator.wikimedia.org/T427092)
[08:38:40] <wikibugs>	 (03PS1) 10Blake: mcrouter_wancache: swap mc1055 for mc1054 for trixie testing [puppet] - 10https://gerrit.wikimedia.org/r/1294216 (https://phabricator.wikimedia.org/T426044)
[08:38:40] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294217 (https://phabricator.wikimedia.org/T427092) (owner: 10Phuedx)
[08:39:58] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P93189 and previous config saved to /var/cache/conftool/dbconfig/20260527-083957-fceratto.json
[08:41:15] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11958339 (10ayounsi)
[08:41:29] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1203.eqiad.wmnet with OS trixie
[08:42:08] <wikibugs>	 (03PS4) 10Arnaudb: gitlab: add envoy on Gitlab [puppet] - 10https://gerrit.wikimedia.org/r/1293722 (https://phabricator.wikimedia.org/T425441)
[08:42:14] <wikibugs>	 (03CR) 10Arnaudb: [C:03+2] gitlab: add envoy on Gitlab [puppet] - 10https://gerrit.wikimedia.org/r/1293722 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb)
[08:42:24] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Finally!" [puppet] - 10https://gerrit.wikimedia.org/r/1294179 (https://phabricator.wikimedia.org/T416664) (owner: 10Elukey)
[08:43:06] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11958357 (10ayounsi)
[08:43:32] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11958374 (10ayounsi)
[08:47:26] <wikibugs>	 (03PS3) 10Filippo Giunchedi: alerts: add transformations option [puppet] - 10https://gerrit.wikimedia.org/r/1291947 (https://phabricator.wikimedia.org/T424814)
[08:47:26] <wikibugs>	 (03PS3) 10Filippo Giunchedi: toolforge: use alerts::deploy transformations [puppet] - 10https://gerrit.wikimedia.org/r/1291948 (https://phabricator.wikimedia.org/T424814)
[08:47:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: alerts: add transformations option (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1291947 (https://phabricator.wikimedia.org/T424814) (owner: 10Filippo Giunchedi)
[08:48:02] <wikibugs>	 (03CR) 10Hnowlan: prometheus: add deployment label to appservers RED recording rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1293080 (https://phabricator.wikimedia.org/T249663) (owner: 10Hnowlan)
[08:50:06] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2153 (T426633)', diff saved to https://phabricator.wikimedia.org/P93190 and previous config saved to /var/cache/conftool/dbconfig/20260527-085005-fceratto.json
[08:50:17] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
[08:50:18] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "LGTM, though please consider also absenting mcrouter for puppet to do the cleanup instead of manually" [puppet] - 10https://gerrit.wikimedia.org/r/1278528 (https://phabricator.wikimedia.org/T422646) (owner: 10Andrew Bogott)
[08:50:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2170 (T426633)', diff saved to https://phabricator.wikimedia.org/P93191 and previous config saved to /var/cache/conftool/dbconfig/20260527-085024-fceratto.json
[08:50:29] <fabfur>	 !log depooling and installing haproxy-awslc on cp3074 and cp3066 (T419825) 
[08:50:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:34] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[08:51:16] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[08:51:28] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp3074.*
[08:51:40] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp3066.*
[08:51:41] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[08:51:46] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[08:51:51] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1290781 (https://phabricator.wikimedia.org/T426960) (owner: 10Krinkle)
[08:52:03] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[08:52:08] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
[08:52:23] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
[08:52:28] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
[08:52:46] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
[08:52:52] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[08:53:07] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[08:53:11] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[08:53:29] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[08:53:33] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
[08:53:58] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[08:54:00] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] hiera: using haproxy-awslc on cp3074,cp3066 [puppet] - 10https://gerrit.wikimedia.org/r/1289998 (https://phabricator.wikimedia.org/T419825) (owner: 10Fabfur)
[08:54:02] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
[08:54:04] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
[08:54:20] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
[08:54:40] <Emperor>	 !log restart swift on ms-fe2011 T360913
[08:54:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:44] <stashbot>	 T360913: Swift proxy server misbehaviour (no longer calling `accept`?) - https://phabricator.wikimedia.org/T360913
[08:55:25] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
[08:57:52] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170 (T426633)', diff saved to https://phabricator.wikimedia.org/P93193 and previous config saved to /var/cache/conftool/dbconfig/20260527-085751-fceratto.json
[08:59:22] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
[09:00:41] <wikibugs>	 (03PS3) 10Effie Mouzeli: scap: remove testservers 4 [puppet] - 10https://gerrit.wikimedia.org/r/1198019 (https://phabricator.wikimedia.org/T397498)
[09:01:44] <wikibugs>	 (03Abandoned) 10Effie Mouzeli: mw-mcrouter: use puppet defined image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054580 (owner: 10Effie Mouzeli)
[09:02:31] <logmsgbot>	 !log slyngshede@cumin1003 conftool action : set/pooled=yes; selector: name=cp6015.*
[09:02:38] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
[09:02:47] <logmsgbot>	 !log slyngshede@cumin1003 START - Cookbook sre.hosts.remove-downtime for cp6015.drmrs.wmnet
[09:02:47] <logmsgbot>	 !log slyngshede@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp6015.drmrs.wmnet
[09:02:58] <fabfur>	 !log repooling cp3074 and cp3066 (T419825)
[09:03:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:03] <stashbot>	 T419825: Test HAProxy 3.2 with AWS-LC libraries - https://phabricator.wikimedia.org/T419825
[09:03:09] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp3066.*
[09:03:12] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Update to kubernetes v1.31.14. [debs/kubernetes] (v1.31) - 10https://gerrit.wikimedia.org/r/1293087 (https://phabricator.wikimedia.org/T427065) (owner: 10Blake)
[09:03:16] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp3074.*
[09:03:25] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Remove pinned chart versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/1293750 (https://phabricator.wikimedia.org/T423251) (owner: 10JMeybohm)
[09:03:44] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[09:03:47] <wikibugs>	 ops-drmrs, DC-Ops: cp6015 network error - https://phabricator.wikimedia.org/T426968#11958478 (SLyngshede-WMF) I've done a few checks and there isn't any reason to reimage the host. I've removed the downtime and repooled the host.
[09:03:58] <wikibugs>	 ops-drmrs, DC-Ops: cp6015 network error - https://phabricator.wikimedia.org/T426968#11958479 (SLyngshede-WMF) Open→Resolved
[09:04:42] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2010.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[09:04:49] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] alerts: add transformations option [puppet] - 10https://gerrit.wikimedia.org/r/1291947 (https://phabricator.wikimedia.org/T424814) (owner: 10Filippo Giunchedi)
[09:04:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] toolforge: use alerts::deploy transformations [puppet] - 10https://gerrit.wikimedia.org/r/1291948 (https://phabricator.wikimedia.org/T424814) (owner: 10Filippo Giunchedi)
[09:05:00] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp6009 is OK: reload-vcl successfully ran 0h, 2 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[09:05:14] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp6015 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[09:05:42] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[09:05:44] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[09:08:00] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P93194 and previous config saved to /var/cache/conftool/dbconfig/20260527-090759-fceratto.json
[09:08:19] <wikibugs>	 (03PS1) 10Elukey: role::ml_k8s::staging::master: enable IPIP encapsulation [puppet] - 10https://gerrit.wikimedia.org/r/1294223 (https://phabricator.wikimedia.org/T420438)
[09:08:20] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:08:21] <wikibugs>	 (03PS1) 10Elukey: Set ml-staging-ctrl to the Maglev scheduler and fix stale options [puppet] - 10https://gerrit.wikimedia.org/r/1294224 (https://phabricator.wikimedia.org/T420438)
[09:08:23] <wikibugs>	 (03PS1) 10Elukey: role::ml_k8s::staging::worker: enable IPIP encapsulation [puppet] - 10https://gerrit.wikimedia.org/r/1294225 (https://phabricator.wikimedia.org/T420438)
[09:08:26] <wikibugs>	 (03PS1) 10Elukey: Set Maglev's scheduling for inference-staging and ingress [puppet] - 10https://gerrit.wikimedia.org/r/1294226 (https://phabricator.wikimedia.org/T420438)
[09:09:14] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:09:18] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 04 Aug 2026 03:33:57 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:10:29] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:10:57] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] cache::text: pipe caching for lw streaming API [puppet] - 10https://gerrit.wikimedia.org/r/1293746 (https://phabricator.wikimedia.org/T425680) (owner: 10Clément Goubert)
[09:11:29] <wikibugs>	 (03Merged) 10jenkins-bot: Remove pinned chart versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/1293750 (https://phabricator.wikimedia.org/T423251) (owner: 10JMeybohm)
[09:16:36] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2166.codfw.wmnet with OS trixie
[09:18:07] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P93196 and previous config saved to /var/cache/conftool/dbconfig/20260527-091806-fceratto.json
[09:19:59] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1203.eqiad.wmnet with OS trixie
[09:23:40] <icinga-wm>	 RECOVERY - orchestrator resolve cache non-FQDNs on dborch1002 is OK: OK: all orchestrator resolve cache entries are FQDNs https://wikitech.wikimedia.org/wiki/Orchestrator
[09:23:51] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es2051: repool after maintenance
[09:24:30] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2166: Migration of db2166.codfw.wmnet completed
[09:25:19] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1050: repool after maintenance
[09:25:20] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1050: repool after maintenance
[09:25:33] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.major-upgrade
[09:25:55] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool es1050: Upgrading es1050.eqiad.wmnet
[09:26:14] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1050: Upgrading es1050.eqiad.wmnet
[09:26:45] <wikibugs>	 (03PS2) 10Effie Mouzeli: mcrouter_wancache: swap mc1055 for mc1054 for trixie testing [puppet] - 10https://gerrit.wikimedia.org/r/1294216 (https://phabricator.wikimedia.org/T426044) (owner: 10Blake)
[09:27:00] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS trixie
[09:28:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2170 (T426633)', diff saved to https://phabricator.wikimedia.org/P93200 and previous config saved to /var/cache/conftool/dbconfig/20260527-092814-fceratto.json
[09:28:32] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1203: Migration of db1203.eqiad.wmnet completed
[09:28:36] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
[09:28:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2173 (T426633)', diff saved to https://phabricator.wikimedia.org/P93202 and previous config saved to /var/cache/conftool/dbconfig/20260527-092842-fceratto.json
[09:28:53] <wikibugs>	 (03PS4) 10Arnaudb: gitlab: use service name for upstream addr [puppet] - 10https://gerrit.wikimedia.org/r/1294219 (https://phabricator.wikimedia.org/T425441)
[09:28:53] <wikibugs>	 (03CR) 10Arnaudb: "That change will require a gitlab-ctl reconfigure (run by puppet), so it will trigger a short unavailability period. I suggest to merge it" [puppet] - 10https://gerrit.wikimedia.org/r/1294219 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb)
[09:30:44] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11958567 (Jclark-ctr) a:Jclark-ctr
[09:32:06] <wikibugs>	 (03CR) 10Blake: [C:03+2] Update to kubernetes v1.31.14. [debs/kubernetes] (v1.31) - 10https://gerrit.wikimedia.org/r/1293087 (https://phabricator.wikimedia.org/T427065) (owner: 10Blake)
[09:32:57] <wikibugs>	 (03CR) 10Arnaudb: [C:03+2] vrts: alerts for the new antispam pipeline [alerts] - 10https://gerrit.wikimedia.org/r/1293667 (https://phabricator.wikimedia.org/T402260) (owner: 10Arnaudb)
[09:34:48] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[09:34:59] <wikibugs>	 (03Merged) 10jenkins-bot: vrts: alerts for the new antispam pipeline [alerts] - 10https://gerrit.wikimedia.org/r/1293667 (https://phabricator.wikimedia.org/T402260) (owner: 10Arnaudb)
[09:36:10] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T426633)', diff saved to https://phabricator.wikimedia.org/P93203 and previous config saved to /var/cache/conftool/dbconfig/20260527-093609-fceratto.json
[09:36:18] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[09:36:56] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti105[5678] and decom ganeti102[3456] - https://phabricator.wikimedia.org/T424680#11958604 (MoritzMuehlenhoff) Open→Resolved All done
[09:37:01] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[09:37:16] <logmsgbot>	 !log bwojtowicz@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[09:38:04] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[09:38:19] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[09:41:46] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
[09:43:50] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[09:45:39] <wikibugs>	 SRE, collaboration-services, Infrastructure-Foundations, Mail, and 3 others: Replace Spamassassin with Rspam for VRTS on Postfix - https://phabricator.wikimedia.org/T402260#11958638 (ABran-WMF) In progress→Resolved Alerts have been merged, I'm marking this as `Resolved`, feel free to...
[09:46:04] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
[09:46:17] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P93206 and previous config saved to /var/cache/conftool/dbconfig/20260527-094616-fceratto.json
[09:46:38] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] mcrouter_wancache: swap mc1055 for mc1054 for trixie testing [puppet] - 10https://gerrit.wikimedia.org/r/1294216 (https://phabricator.wikimedia.org/T426044) (owner: 10Blake)
[09:47:07] <wikibugs>	 (03CR) 10Arnaudb: "the previous deployment calendar link is broken: https://wikitech.wikimedia.org/wiki/Deployments#Friday,_May_29" [puppet] - 10https://gerrit.wikimedia.org/r/1294219 (https://phabricator.wikimedia.org/T425441) (owner: 10Arnaudb)
[09:47:26] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[09:50:47] <wikibugs>	 (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1289349 (owner: 10PipelineBot)
[09:53:13] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1289349 (owner: 10PipelineBot)
[09:56:25] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P93208 and previous config saved to /var/cache/conftool/dbconfig/20260527-095624-fceratto.json
[09:58:30] <wikibugs>	 (03PS2) 10Jelto: miscweb: remove wmf-navigator public and private config from web container [deployment-charts] - 10https://gerrit.wikimedia.org/r/1294208 (https://phabricator.wikimedia.org/T414405)
[09:59:00] <wikibugs>	 (03PS5) 10Dzahn: tcpproxy: add support for gitlab-ssh [puppet] - 10https://gerrit.wikimedia.org/r/1282428 (https://phabricator.wikimedia.org/T425441)
[09:59:17] <wikibugs>	 (03PS8) 10Arnaudb: service: add gitlab-https and gitlab-ssh service to service catalog [puppet] - 10https://gerrit.wikimedia.org/r/1290684 (https://phabricator.wikimedia.org/T425441)
[09:59:17] <wikibugs>	 (03PS8) 10Arnaudb: lvs7003: add gitlab-ssh and gitlab-https [puppet] - 10https://gerrit.wikimedia.org/r/1291898 (https://phabricator.wikimedia.org/T425441)
[09:59:55] <wikibugs>	 (03PS1) 10STran: Deploy IRS Direct Reporting feature to enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294229 (https://phabricator.wikimedia.org/T427369)
[10:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1000)
[10:02:18] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1050.eqiad.wmnet with OS trixie
[10:03:42] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:03+2] rest-gateway: tighten rate limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1289992 (https://phabricator.wikimedia.org/T424821) (owner: 10Daniel Kinzler)
[10:04:40] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[10:05:09] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool es1050: repool after maintenance
[10:06:14] <wikibugs>	 (03Merged) 10jenkins-bot: rest-gateway: tighten rate limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/1289992 (https://phabricator.wikimedia.org/T424821) (owner: 10Daniel Kinzler)
[10:06:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T426633)', diff saved to https://phabricator.wikimedia.org/P93211 and previous config saved to /var/cache/conftool/dbconfig/20260527-100632-fceratto.json
[10:06:39] <wikibugs>	 (03CR) 10Muehlenhoff: "Looks good, but see comment inline about moving to 7.3.7" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1285804 (owner: 10Slyngshede)
[10:06:54] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
[10:07:02] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2174 (T426633)', diff saved to https://phabricator.wikimedia.org/P93212 and previous config saved to /var/cache/conftool/dbconfig/20260527-100701-fceratto.json
[10:08:39] <logmsgbot>	 !log daniel@deploy1003 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[10:08:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:08:47] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release kube-system/calico on k8s@codfw in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:10:00] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2166: Migration of db2166.codfw.wmnet completed
[10:10:01] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[10:10:08] <wikibugs>	 (03PS3) 10Jelto: miscweb: remove wmf-navigator public and private config from web container [deployment-charts] - 10https://gerrit.wikimedia.org/r/1294208 (https://phabricator.wikimedia.org/T414405)
[10:10:30] <logmsgbot>	 !log daniel@deploy1003 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[10:11:18] <wikibugs>	 (CR) Federico Ceratto: sre.mysql.upgrade: fix looping logic (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1291999 (https://phabricator.wikimedia.org/T420203) (owner: FNegri)
[10:13:16] <wikibugs>	 (03CR) 10LSobanski: "Just to confirm, there is no way of making port 22 work externally on the GitLab IP of the TCP proxies?" [puppet] - 10https://gerrit.wikimedia.org/r/1282428 (https://phabricator.wikimedia.org/T425441) (owner: 10Dzahn)
[10:14:01] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1203: Migration of db1203.eqiad.wmnet completed
[10:14:02] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[10:14:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T426633)', diff saved to https://phabricator.wikimedia.org/P93215 and previous config saved to /var/cache/conftool/dbconfig/20260527-101426-fceratto.json
[10:17:50] <wikibugs>	 (03CR) 10Muehlenhoff: "This should be handled by SRE Clinic duty with a dedicated task" [puppet] - 10https://gerrit.wikimedia.org/r/1293769 (https://phabricator.wikimedia.org/T423255) (owner: 10ArielGlenn)
[10:18:54] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] profile::rpkivalidator: Switch to firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/1293609 (owner: 10Muehlenhoff)
[10:19:07] <icinga-wm>	 RECOVERY - Check if Pybal has been restarted after pybal.conf was changed on lvs1020 is OK: OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[10:21:37] <logmsgbot>	 !log daniel@deploy1003 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[10:21:55] <wikibugs>	 ops-codfw, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: re-rack mc2055 (before Jun 9th) - https://phabricator.wikimedia.org/T427373 (jijiki) NEW p:Triage→High
[10:22:03] <logmsgbot>	 !log daniel@deploy1003 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[10:24:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P93217 and previous config saved to /var/cache/conftool/dbconfig/20260527-102434-fceratto.json
[10:27:57] <icinga-wm>	 RECOVERY - Check if Pybal has been restarted after pybal.conf was changed on lvs1019 is OK: OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[10:28:57] <wikibugs>	 (03PS1) 10Muehlenhoff: rpkivalidator: Fix up previous patch [puppet] - 10https://gerrit.wikimedia.org/r/1294238
[10:29:20] <logmsgbot>	 !log daniel@deploy1003 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[10:29:24] <wikibugs>	 (CR) Marostegui: sre.mysql.global-read-only Set all sections as RO/RW (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874) (owner: Federico Ceratto)
[10:29:41] <logmsgbot>	 !log daniel@deploy1003 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[10:32:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] rpkivalidator: Fix up previous patch [puppet] - 10https://gerrit.wikimedia.org/r/1294238 (owner: 10Muehlenhoff)
[10:34:33] <wikibugs>	 (CR) JMeybohm: miscweb: remove wmf-navigator public and private config from web container (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1294208 (https://phabricator.wikimedia.org/T414405) (owner: Jelto)
[10:34:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P93218 and previous config saved to /var/cache/conftool/dbconfig/20260527-103441-fceratto.json
[10:34:50] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[10:35:01] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2165: Upgrading db2165.codfw.wmnet
[10:35:20] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2165: Upgrading db2165.codfw.wmnet
[10:35:29] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[10:35:50] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1193: Upgrading db1193.eqiad.wmnet
[10:36:29] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1193: Upgrading db1193.eqiad.wmnet
[10:36:49] <wikibugs>	 (03PS1) 10Arnaudb: vrts: skip pint validation on active/passive alerts [alerts] - 10https://gerrit.wikimedia.org/r/1294240 (https://phabricator.wikimedia.org/T402260)
[10:36:52] <wikibugs>	 (03CR) 10Arnaudb: [C:03+2] vrts: skip pint validation on active/passive alerts [alerts] - 10https://gerrit.wikimedia.org/r/1294240 (https://phabricator.wikimedia.org/T402260) (owner: 10Arnaudb)
[10:38:00] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS trixie
[10:38:33] <wikibugs>	 (03Merged) 10jenkins-bot: vrts: skip pint validation on active/passive alerts [alerts] - 10https://gerrit.wikimedia.org/r/1294240 (https://phabricator.wikimedia.org/T402260) (owner: 10Arnaudb)
[10:39:08] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch rpki2003 to nftables [puppet] - 10https://gerrit.wikimedia.org/r/1294241
[10:39:32] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2165.codfw.wmnet with OS trixie
[10:41:16] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1294241 (owner: 10Muehlenhoff)
[10:44:20] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Anycast services - depool strategy in terms of BGP routing - https://phabricator.wikimedia.org/T420821#11958890 (10ayounsi)
[10:44:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T426633)', diff saved to https://phabricator.wikimedia.org/P93222 and previous config saved to /var/cache/conftool/dbconfig/20260527-104449-fceratto.json
[10:45:11] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
[10:45:19] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2176 (T426633)', diff saved to https://phabricator.wikimedia.org/P93223 and previous config saved to /var/cache/conftool/dbconfig/20260527-104518-fceratto.json
[10:46:04] <wikibugs>	 (03PS3) 10Hnowlan: prometheus: add deployment label to appservers RED recording rules [puppet] - 10https://gerrit.wikimedia.org/r/1293080 (https://phabricator.wikimedia.org/T249663)
[10:47:02] <wikibugs>	 (03PS1) 10STran: Set minimum edit count for skipcaptcha right to 10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973)
[10:50:34] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1050: repool after maintenance
[10:51:01] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+1] prometheus: add deployment label to appservers RED recording rules [puppet] - 10https://gerrit.wikimedia.org/r/1293080 (https://phabricator.wikimedia.org/T249663) (owner: 10Hnowlan)
[10:52:09] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
[10:52:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T426633)', diff saved to https://phabricator.wikimedia.org/P93225 and previous config saved to /var/cache/conftool/dbconfig/20260527-105235-fceratto.json
[10:53:03] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: lvs1019: reimage to move primary IP from private1-c-eqiad to private1-c7-eqiad vlan - https://phabricator.wikimedia.org/T405632#11958935 (10ayounsi)
[10:53:04] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: lvs1020: reimage to move primary IP from private1-d-eqiad to private1-d7-eqiad vlan - https://phabricator.wikimedia.org/T405630#11958936 (10ayounsi)
[10:56:36] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[10:56:46] <wikibugs>	 (03CR) 10Mszwarc: [C:03+1] Deploy IRS Direct Reporting feature to enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294229 (https://phabricator.wikimedia.org/T427369) (owner: 10STran)
[10:57:22] <wikibugs>	 (03PS2) 10Slyngshede: Update to CAS version 7.3.7 [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1285804
[10:57:41] <icinga-wm>	 PROBLEM - Host db2189 #page is DOWN: CRITICAL - Network Unreachable (10.192.16.180)
[10:57:52] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
[10:57:53] <wikibugs>	 SRE, Infrastructure-Foundations, netops, observability: Add Icinga check for SRX cluster status - https://phabricator.wikimedia.org/T271298#11958946 (ayounsi) Open→Declined We're not going to add more stuff to Icinga.
[10:57:57] <wikibugs>	 (CR) Slyngshede: Update to CAS version 7.3.7 (2 comments) [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/1285804 (owner: Slyngshede)
[10:58:07] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] Interface validators: allow for channelized port numbers on Juniper [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1293649 (https://phabricator.wikimedia.org/T427056) (owner: 10Cathal Mooney)
[10:58:14] <marostegui>	 !ack
[10:58:14] <sirenbot>	 8023 (ACKED)  Host db2189 (paged)
[10:58:46] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
[10:58:47] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release kube-system/calico on k8s@codfw in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[11:00:05] <jouncebot>	 mvolz: Time to snap out of that daydream and deploy Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1100).
[11:00:17] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db2189', diff saved to https://phabricator.wikimedia.org/P93226 and previous config saved to /var/cache/conftool/dbconfig/20260527-110016-marostegui.json
[11:00:33] <marostegui>	 jelto: db2189 went down, I will handle it
[11:00:33] <wikibugs>	 (03Merged) 10jenkins-bot: Interface validators: allow for channelized port numbers on Juniper [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1293649 (https://phabricator.wikimedia.org/T427056) (owner: 10Cathal Mooney)
[11:01:20] <jynus>	 marostegui: I get no login process on serial, unsure if rebooting or stuck, but no normal state
[11:01:30] <wikibugs>	 (03CR) 10Harroyo-wmf: [C:03+1] Set minimum edit count for skipcaptcha right to 10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: 10STran)
[11:01:32] <jynus>	 will log out so you can take it from there
[11:01:41] <jelto>	 Great thank you
[11:01:45] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
[11:01:45] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
[11:02:06] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
[11:02:10] <wikibugs>	 (03PS1) 10STran: Update Direct Reporting email [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294247 (https://phabricator.wikimedia.org/T427358)
[11:02:30] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1294247 (https://phabricator.wikimedia.org/T427358) (owner: 10STran)
[11:02:40] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply
[11:02:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P93227 and previous config saved to /var/cache/conftool/dbconfig/20260527-110242-fceratto.json
[11:02:51] <wikibugs>	 (CR) Mszwarc: [C:+1] Update Direct Reporting email [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294247 (https://phabricator.wikimedia.org/T427358) (owner: STran)
[11:02:58] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply
[11:03:29] <wikibugs>	 (PS1) Muehlenhoff: dbproxy: Remove unused public type [puppet] - https://gerrit.wikimedia.org/r/1294248 (https://phabricator.wikimedia.org/T149804)
[11:04:24] <wikibugs>	 (CR) Ayounsi: [C:+1] Switch rpki2003 to nftables [puppet] - https://gerrit.wikimedia.org/r/1294241 (owner: Muehlenhoff)
[11:05:16] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good!" [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/1285804 (owner: Slyngshede)
[11:05:17] <wikibugs>	 (CR) Dreamy Jazz: Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[11:05:43] <wikibugs>	 (CR) Slyngshede: [V:+2 C:+2] Update to CAS version 7.3.7 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/1285804 (owner: Slyngshede)
[11:05:49] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1294248 (https://phabricator.wikimedia.org/T149804) (owner: Muehlenhoff)
[11:06:06] <wikibugs>	 (PS1) Marostegui: db2189: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1294249
[11:06:12] <wikibugs>	 (PS1) Mvolz: Revert "citoid: pipeline bot promote" [deployment-charts] - https://gerrit.wikimedia.org/r/1294250
[11:06:22] <wikibugs>	 (CR) Mvolz: [C:+2] Revert "citoid: pipeline bot promote" [deployment-charts] - https://gerrit.wikimedia.org/r/1294250 (owner: Mvolz)
[11:06:29] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Change codfw dns hosts BGP peering to top-of-rack switch - https://phabricator.wikimedia.org/T376894#11958977 (ayounsi)
[11:07:57] <wikibugs>	 (CR) Marostegui: [C:+2] db2189: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1294249 (owner: Marostegui)
[11:08:37] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply
[11:08:41] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply
[11:08:47] <wikibugs>	 (Merged) jenkins-bot: Revert "citoid: pipeline bot promote" [deployment-charts] - https://gerrit.wikimedia.org/r/1294250 (owner: Mvolz)
[11:10:46] <wikibugs>	 (CR) Dreamy Jazz: [C:-1] "How will this interact with https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/8098b104f08cb1bc91c2ddde9f1f669f2c84ab47/wmf-conf" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[11:10:46] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply
[11:10:55] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply
[11:11:32] <jinxer-wm>	 FIRING: [2x] HelmReleaseBadStatus: Helm release kube-system/calico on k8s@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[11:11:39] <wikibugs>	 (CR) Dreamy Jazz: [C:-1] Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[11:12:08] <wikibugs>	 ops-codfw, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376 (Marostegui) NEW
[11:12:24] <wikibugs>	 ops-codfw, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376#11959011 (Marostegui) p:Triage→Medium
[11:12:33] <wikibugs>	 ops-codfw, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376#11959017 (Marostegui)
[11:12:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P93229 and previous config saved to /var/cache/conftool/dbconfig/20260527-111250-fceratto.json
[11:13:39] <wikibugs>	 (CR) Dreamy Jazz: [C:-1] Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[11:15:30] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS trixie
[11:16:17] <wikibugs>	 (CR) Dreamy Jazz: [C:-1] Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[11:17:32] <wikibugs>	 (CR) FNegri: sre.mysql.upgrade: fix looping logic (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1291999 (https://phabricator.wikimedia.org/T420203) (owner: FNegri)
[11:19:41] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2165.codfw.wmnet with OS trixie
[11:22:58] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T426633)', diff saved to https://phabricator.wikimedia.org/P93230 and previous config saved to /var/cache/conftool/dbconfig/20260527-112257-fceratto.json
[11:23:20] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
[11:23:28] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2188 (T426633)', diff saved to https://phabricator.wikimedia.org/P93231 and previous config saved to /var/cache/conftool/dbconfig/20260527-112327-fceratto.json
[11:23:55] <wikibugs>	 (PS1) Cathal Mooney: Interface validator: support channlized interface names [software/netbox-extras] - https://gerrit.wikimedia.org/r/1294256 (https://phabricator.wikimedia.org/T427056)
[11:24:20] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1193: Migration of db1193.eqiad.wmnet completed
[11:29:02] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission pc1013.eqiad.wmnet - https://phabricator.wikimedia.org/T427190#11959086 (Jclark-ctr) Open→Resolved
[11:29:17] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2165: Migration of db2165.codfw.wmnet completed
[11:30:14] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11959092 (Jclark-ctr)
[11:30:28] <wikibugs>	 (PS1) Muehlenhoff: profile::mariadb::proxy: Use Puppet types [puppet] - https://gerrit.wikimedia.org/r/1294258
[11:30:39] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11959094 (Jclark-ctr)  D6 U36
[11:31:16] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11959117 (Jclark-ctr)
[11:31:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188 (T426633)', diff saved to https://phabricator.wikimedia.org/P93235 and previous config saved to /var/cache/conftool/dbconfig/20260527-113142-fceratto.json
[11:32:22] <wikibugs>	 (CR) Ayounsi: [C:+1] Interface validator: support channlized interface names [software/netbox-extras] - https://gerrit.wikimedia.org/r/1294256 (https://phabricator.wikimedia.org/T427056) (owner: Cathal Mooney)
[11:33:41] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Trixie 13.5 point update - https://phabricator.wikimedia.org/T427072#11959124 (MoritzMuehlenhoff)
[11:36:18] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Interface validator: support channlized interface names [software/netbox-extras] - https://gerrit.wikimedia.org/r/1294256 (https://phabricator.wikimedia.org/T427056) (owner: Cathal Mooney)
[11:39:23] <wikibugs>	 (Merged) jenkins-bot: Interface validator: support channlized interface names [software/netbox-extras] - https://gerrit.wikimedia.org/r/1294256 (https://phabricator.wikimedia.org/T427056) (owner: Cathal Mooney)
[11:39:37] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops, decommission-hardware: decommission pc1014.eqiad.wmnet - https://phabricator.wikimedia.org/T427270#11959173 (Jclark-ctr) Open→Resolved
[11:40:06] <wikibugs>	 SRE, Infrastructure-Foundations, netops: GRE Interfaces statistics not being returned by Juniper MX via gnmi - https://phabricator.wikimedia.org/T403936#11959179 (ayounsi) Open→Resolved a:cmooney It's now showing up thanks to {T424683}  https://grafana.wikimedia.org/goto/dfnbnedrb28sg...
[11:40:55] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Traffic: Map video and other large files to 'low-priority' network Qos queue - https://phabricator.wikimedia.org/T410133#11959189 (cmooney) Open→Resolved a:cmooney We actually added a mechanism to do this late last year when we had some une...
[11:41:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P93237 and previous config saved to /var/cache/conftool/dbconfig/20260527-114149-fceratto.json
[11:49:04] <wikibugs>	 (PS1) Majavah: memcached: Improve absenting support [puppet] - https://gerrit.wikimedia.org/r/1294259 (https://phabricator.wikimedia.org/T427189)
[11:49:06] <wikibugs>	 (PS1) Majavah: prometheus: memcached_exporter: Improve absentability [puppet] - https://gerrit.wikimedia.org/r/1294260 (https://phabricator.wikimedia.org/T427189)
[11:49:08] <wikibugs>	 (PS1) Majavah: P:openstack: cloudweb: Absent memcached and mcrouter services [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189)
[11:51:08] <wikibugs>	 (PS2) Majavah: P:openstack: cloudweb: Absent memcached and mcrouter services [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189)
[11:51:19] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Traffic: Map internet-bound upload traffic to low-priority QoS queue - https://phabricator.wikimedia.org/T415649#11959238 (cmooney) Open→Declined I'm going to close this one.  I hadn't fully thought out the way we serve things currently.  `uplo...
[11:51:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P93239 and previous config saved to /var/cache/conftool/dbconfig/20260527-115157-fceratto.json
[11:53:42] <wikibugs>	 (PS3) Majavah: P:openstack: cloudweb: Absent memcached and mcrouter services [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189)
[11:55:30] <wikibugs>	 (PS4) Majavah: P:openstack: cloudweb: Absent memcached and mcrouter services [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189)
[11:56:38] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
[11:58:04] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8584/co" [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[11:58:24] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "is everything alright? /cc effie - ayounsi@cumin1003"
[11:58:29] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "is everything alright? /cc effie - ayounsi@cumin1003"
[11:58:44] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
[11:58:45] <XioNoX>	 looks like we're all good
[12:00:22] <wikibugs>	 (PS1) Matthias Mullie: MMV Carousel: Restore click-to-open for carousel thumbnails [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294264 (https://phabricator.wikimedia.org/T426225)
[12:01:07] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294264 (https://phabricator.wikimedia.org/T426225) (owner: Matthias Mullie)
[12:01:11] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
[12:01:26] <wikibugs>	 SRE-swift-storage, Thumbor: Gradually drop all thumbnails as a one-off clean up - https://phabricator.wikimedia.org/T379942#11959265 (Ladsgroup)
[12:02:06] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2188 (T426633)', diff saved to https://phabricator.wikimedia.org/P93242 and previous config saved to /var/cache/conftool/dbconfig/20260527-120205-fceratto.json
[12:02:47] <wikibugs>	 (CR) Marostegui: sre.mysql.upgrade: fix looping logic (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1291999 (https://phabricator.wikimedia.org/T420203) (owner: FNegri)
[12:04:21] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 10): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8586/c" [puppet] - https://gerrit.wikimedia.org/r/1294259 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[12:04:45] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
[12:04:53] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2212 (T426633)', diff saved to https://phabricator.wikimedia.org/P93243 and previous config saved to /var/cache/conftool/dbconfig/20260527-120452-fceratto.json
[12:05:12] <wikibugs>	 (PS2) STran: Set minimum edit count for skipcaptcha right to 10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973)
[12:07:03] <wikibugs>	 (CR) STran: Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[12:08:11] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 18): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8585/c" [puppet] - https://gerrit.wikimedia.org/r/1294260 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[12:09:38] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.dns.netbox
[12:09:48] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1193: Migration of db1193.eqiad.wmnet completed
[12:09:49] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[12:09:57] <wikibugs>	 SRE, ServiceOps new, ServiceOps-Upgrades-Hardware: rdb201[34] implementation tracking - https://phabricator.wikimedia.org/T418924#11959280 (jijiki) Stalled→In progress a:Clement_Goubert→jijiki
[12:10:24] <logmsgbot>	 cmooney@cumin1003 update-extras (PID 1314811) is awaiting input
[12:11:52] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.depool depool db2189: Test
[12:12:09] <logmsgbot>	 !log fceratto@cumin1003 END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2189: Test
[12:13:01] <wikibugs>	 (CR) Dreamy Jazz: Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[12:14:19] <wikibugs>	 SRE, ServiceOps new, ServiceOps-Upgrades-Hardware: rdb201[34] implementation tracking - https://phabricator.wikimedia.org/T418924#11959306 (jijiki)
[12:14:33] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1078 to eqiad - jclark@cumin1003"
[12:14:39] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1078 to eqiad - jclark@cumin1003"
[12:14:39] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:14:41] <wikibugs>	 (CR) Hnowlan: [C:+2] prometheus: add deployment label to appservers RED recording rules [puppet] - https://gerrit.wikimedia.org/r/1293080 (https://phabricator.wikimedia.org/T249663) (owner: Hnowlan)
[12:14:46] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Migration of db2165.codfw.wmnet completed
[12:14:47] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[12:15:00] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.dns.netbox
[12:18:39] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[12:18:59] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1192: Upgrading db1192.eqiad.wmnet
[12:19:33] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1078 to eqiad - jclark@cumin1003"
[12:19:38] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1192: Upgrading db1192.eqiad.wmnet
[12:19:38] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1078 to eqiad - jclark@cumin1003"
[12:19:38] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:19:53] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078
[12:19:56] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079
[12:19:59] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[12:20:10] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2164: Upgrading db2164.codfw.wmnet
[12:20:12] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1079
[12:20:19] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078
[12:20:24] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080
[12:20:25] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.network.peering with action 'configure' for AS: 36692
[12:20:27] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2164: Upgrading db2164.codfw.wmnet
[12:20:32] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077
[12:20:37] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080
[12:20:46] <logmsgbot>	 !log jclark@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1077
[12:21:07] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS trixie
[12:21:33] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Q3 :rack/setup/install cloudvirt refresh - https://phabricator.wikimedia.org/T425088#11959318 (Jclark-ctr) >>! In T425088#11924352, @fgiunchedi wrote: > @Jclark-ctr once  T426180 is resolved and hosts can be reimaged, please rack as follows >  > 1077 -> `C8` > 1078 -> `D5` > 1...
[12:21:34] <wikibugs>	 (PS3) STran: Set minimum edit count for skipcaptcha right to 10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973)
[12:21:45] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
[12:21:53] <wikibugs>	 (CR) STran: Set minimum edit count for skipcaptcha right to 10 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[12:22:05] <wikibugs>	 (PS1) Dpogorzelski: fix: ml changelogs [docker-images/production-images] - https://gerrit.wikimedia.org/r/1294268 (https://phabricator.wikimedia.org/T419722)
[12:22:29] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS trixie
[12:23:48] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Wikidata Platform Team, and 2 others: Q4:rack/setup/install dse-k8s-wdqs100[1-3] (formerly wdqs103[6-8]) - https://phabricator.wikimedia.org/T423314#11959334 (Jclark-ctr) I have updated server names, switchports and provisioned servers.  pending  puppet being updated  @BTu...
[12:24:01] <wikibugs>	 (CR) Dreamy Jazz: [C:+1] "Beyond the open question about throttle exempted IPs, this LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294243 (https://phabricator.wikimedia.org/T426973) (owner: STran)
[12:28:53] <Amir1>	 !log deleting binlogs older than a year
[12:28:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:18] <wikibugs>	 (PS1) Effie Mouzeli: aliases: swap rdb2007 with rdb2011 [puppet] - https://gerrit.wikimedia.org/r/1294270
[12:32:18] <wikibugs>	 (PS1) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924)
[12:32:26] <wikibugs>	 (PS1) Muehlenhoff: Blocklist more unused network protocols [puppet] - https://gerrit.wikimedia.org/r/1294272
[12:35:16] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
[12:35:29] <wikibugs>	 (PS2) Atsuko: httpd-cas: config option to disable httpd-cas [deployment-charts] - https://gerrit.wikimedia.org/r/1294257 (https://phabricator.wikimedia.org/T348763)
[12:37:19] <wikibugs>	 (CR) Elukey: [C:+2] Set pki-root1001 to role insetup [puppet] - https://gerrit.wikimedia.org/r/1294179 (https://phabricator.wikimedia.org/T416664) (owner: Elukey)
[12:37:36] <wikibugs>	 (PS1) Marostegui: installserver: Add pc1024 to UEFI array. [puppet] - https://gerrit.wikimedia.org/r/1294273
[12:37:37] <wikibugs>	 (CR) Atsuko: httpd-cas: config option to disable httpd-cas (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1294257 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[12:38:05] <wikibugs>	 (PS2) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924)
[12:40:23] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
[12:40:29] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:40:37] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
[12:41:08] <wikibugs>	 (CR) Marostegui: [C:+2] installserver: Add pc1024 to UEFI array. [puppet] - https://gerrit.wikimedia.org/r/1294273 (owner: Marostegui)
[12:43:07] <wikibugs>	 (PS1) Effie Mouzeli: ratelimit: replace rdb2009 with rdb2013 #1 [deployment-charts] - https://gerrit.wikimedia.org/r/1294274 (https://phabricator.wikimedia.org/T418924)
[12:43:09] <wikibugs>	 (PS1) Effie Mouzeli: radioscope: replace rdb2009 with rdb2013 [deployment-charts] - https://gerrit.wikimedia.org/r/1294275 (https://phabricator.wikimedia.org/T418924)
[12:44:13] <wikibugs>	 (PS1) Effie Mouzeli: rest-gateway: replace rdb2009 with rdb2013 [deployment-charts] - https://gerrit.wikimedia.org/r/1294276 (https://phabricator.wikimedia.org/T418924)
[12:44:14] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:44:20] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
[12:45:19] <wikibugs>	 (CR) Ayounsi: [C:+1] Blocklist more unused network protocols [puppet] - https://gerrit.wikimedia.org/r/1294272 (owner: Muehlenhoff)
[12:45:34] <wikibugs>	 (PS1) Effie Mouzeli: changeprop: replace rdb2009 with rdb2013 [deployment-charts] - https://gerrit.wikimedia.org/r/1294277 (https://phabricator.wikimedia.org/T418924)
[12:45:36] <wikibugs>	 (PS1) Effie Mouzeli: changeprop-jobqueue: replace rdb2009 with rdb2013 [deployment-charts] - https://gerrit.wikimedia.org/r/1294278 (https://phabricator.wikimedia.org/T418924)
[12:48:32] <wikibugs>	 (PS1) Effie Mouzeli: docker_registry:  replace rdb2009 with rdb2013 [puppet] - https://gerrit.wikimedia.org/r/1294279 (https://phabricator.wikimedia.org/T418924)
[12:49:00] <wikibugs>	 (PS2) Effie Mouzeli: radioscope: replace rdb2009 with rdb2013 #2 [deployment-charts] - https://gerrit.wikimedia.org/r/1294275 (https://phabricator.wikimedia.org/T418924)
[12:49:13] <wikibugs>	 (PS2) Effie Mouzeli: rest-gateway: replace rdb2009 with rdb2013 #3 [deployment-charts] - https://gerrit.wikimedia.org/r/1294276 (https://phabricator.wikimedia.org/T418924)
[12:52:26] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Trixie 13.5 point update - https://phabricator.wikimedia.org/T427072#11959440 (MoritzMuehlenhoff)
[12:52:58] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1294270 (owner: Effie Mouzeli)
[12:56:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:et-1/1/5 (Transport: cr2-codfw:et-0/1/4 (Lumen, 449169461) {#changeme_lumen_patch}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[12:57:21] <wikibugs>	 ops-codfw, DBA, DC-Ops: db2212 failed to reboot - https://phabricator.wikimedia.org/T427388#11959448 (FCeratto-WMF)
[12:57:59] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1192.eqiad.wmnet with OS trixie
[13:00:05] <jouncebot>	 Lucas_WMDE, urbanecm, and TheresNoTime: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1300).
[13:00:05] <jouncebot>	 aude, phuedx, mfossati, Tran, and matthiasmullie: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:08] <wikibugs>	 ops-codfw, DBA, DC-Ops: db2212 failed to reboot - https://phabricator.wikimedia.org/T427388#11959454 (FCeratto-WMF) There are no events in `getsel` after `06/13/2025 14:24:15`
[13:00:10] <matthiasmullie>	 o/
[13:00:13] <phuedx>	 o/
[13:00:18] <mfossati>	 o/
[13:00:21] <Lucas_WMDE>	 I can’t deploy, sorry – in a meeting
[13:00:40] <Tran>	 o/
[13:01:31] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2164.codfw.wmnet with OS trixie
[13:01:51] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-codfw:et-0/1/4 (Transport: cr2-eqiad:et-1/1/5 (Lumen, 449169461) {#3909}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[13:02:04] <matthiasmullie>	 While there is no deployer - mind if I get started with mfossati & my patches first?
[13:02:15] <wikibugs>	 (CR) Atsuko: [V:+2 C:+2] image: Flink 2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1293664 (https://phabricator.wikimedia.org/T412978) (owner: JavierMonton)
[13:02:46] <wikibugs>	 (CR) JavierMonton: [C:+1] flink-app - default to setting metrics.internal.query-service.port [deployment-charts] - https://gerrit.wikimedia.org/r/1268071 (https://phabricator.wikimedia.org/T421216) (owner: Ottomata)
[13:03:00] <Tran>	 might as well unless aude or phuedx are here?
[13:03:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:03:51] <mfossati>	 matthiasmullie: sounds good to me
[13:04:14] <matthiasmullie>	 I have started
[13:04:15] <phuedx>	 I'm here but I'm happy to go second
[13:04:17] <matthiasmullie>	 phuedx Tran do you need help backporting your patches?
[13:04:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mlitn@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1290781 (https://phabricator.wikimedia.org/T426960) (owner: Krinkle)
[13:04:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by mlitn@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294264 (https://phabricator.wikimedia.org/T426225) (owner: Matthias Mullie)
[13:04:27] <Tran>	 I can backport my own patch after phuedx
[13:04:34] <phuedx>	 Nope. I can self service w/ SpiderPig
[13:04:49] <matthiasmullie>	 Sweet. I'll ping you when I'm done!
[13:05:37] <matthiasmullie>	 mfossati: I'm pushing both our patches at the same time in the interest of time
[13:05:49] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1192: Migration of db1192.eqiad.wmnet completed
[13:05:57] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 99 days, 0:00:00 on db2212.codfw.wmnet with reason: failed to reboot T427388 T426633
[13:06:01] <stashbot>	 T427388: db2212 failed to reboot - https://phabricator.wikimedia.org/T427388
[13:06:19] <Krinkle>	 mfossati: thx for deploying that!
[13:06:24] <mfossati>	 matthiasmullie: sure!
[13:06:28] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[13:07:38] <wikibugs>	 (Merged) jenkins-bot: mmv: Fix missing or stale arrow and counter controls [extensions/MultimediaViewer] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1290781 (https://phabricator.wikimedia.org/T426960) (owner: Krinkle)
[13:07:41] <wikibugs>	 (Merged) jenkins-bot: MMV Carousel: Restore click-to-open for carousel thumbnails [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294264 (https://phabricator.wikimedia.org/T426225) (owner: Matthias Mullie)
[13:07:42] <mfossati>	 Krinkle: that made some waves :-)
[13:08:08] <logmsgbot>	 !log mlitn@deploy1003 Started scap sync-world: Backport for [[gerrit:1290781|mmv: Fix missing or stale arrow and counter controls (T426960)]], [[gerrit:1294264|MMV Carousel: Restore click-to-open for carousel thumbnails (T426225)]]
[13:08:14] <stashbot>	 T426960: Mediaviewer missing left/right arrows and X/Y counter is out of sync - https://phabricator.wikimedia.org/T426960
[13:08:14] <stashbot>	 T426225: Image Browsing: beta feature for rollout - https://phabricator.wikimedia.org/T426225
[13:10:07] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2164: Migration of db2164.codfw.wmnet completed
[13:10:45] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[13:11:11] <wikibugs>	 (CR) JMeybohm: [C:+1] "Thanks!" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1294268 (https://phabricator.wikimedia.org/T419722) (owner: Dpogorzelski)
[13:12:46] <wikibugs>	 (PS1) Muehlenhoff: Retire the Ubuntu mirror [puppet] - https://gerrit.wikimedia.org/r/1294284 (https://phabricator.wikimedia.org/T416707)
[13:13:13] <logmsgbot>	 !log mlitn@deploy1003 krinkle, mlitn: Backport for [[gerrit:1290781|mmv: Fix missing or stale arrow and counter controls (T426960)]], [[gerrit:1294264|MMV Carousel: Restore click-to-open for carousel thumbnails (T426225)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:13:19] <stashbot>	 T426960: Mediaviewer missing left/right arrows and X/Y counter is out of sync - https://phabricator.wikimedia.org/T426960
[13:13:20] <stashbot>	 T426225: Image Browsing: beta feature for rollout - https://phabricator.wikimedia.org/T426225
[13:13:38] <matthiasmullie>	 mfossati: it's on test servers - please check and confirm we're good to move forward!
[13:13:46] <mfossati>	 on it
[13:14:04] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393 (Papaul) NEW
[13:14:24] <wikibugs>	 (CR) Dpogorzelski: [C:+2] fix: ml changelogs [docker-images/production-images] - https://gerrit.wikimedia.org/r/1294268 (https://phabricator.wikimedia.org/T419722) (owner: Dpogorzelski)
[13:14:28] <wikibugs>	 (CR) Dpogorzelski: [V:+2 C:+2] fix: ml changelogs [docker-images/production-images] - https://gerrit.wikimedia.org/r/1294268 (https://phabricator.wikimedia.org/T419722) (owner: Dpogorzelski)
[13:14:51] <wikibugs>	 (PS1) Muehlenhoff: autoinstall: Stop using mirrors.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1294285 (https://phabricator.wikimedia.org/T416707)
[13:15:18] <mfossati>	 matthiasmullie: I couldn't quickly reproduce "zoom out a bit. this causes the left and right arrows to temporarily appear". All other bugs look fixed
[13:15:27] <wikibugs>	 (CR) Blake: site.pp: add rdb2013 and rdb2014 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[13:15:36] <matthiasmullie>	 ok, moving forward
[13:15:38] <logmsgbot>	 !log mlitn@deploy1003 krinkle, mlitn: Continuing with deployment
[13:15:40] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.depool depool db2189: Test
[13:15:56] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2189: Test
[13:16:51] <wikibugs>	 (CR) Bking: [C:+2] relforge: remove logstash (gelf) profile [puppet] - https://gerrit.wikimedia.org/r/1293809 (https://phabricator.wikimedia.org/T324335) (owner: Bking)
[13:16:53] <wikibugs>	 (PS2) Federico Ceratto: sre.mysql.pool: Support depooling unreachable hosts [cookbooks] - https://gerrit.wikimedia.org/r/1294265 (https://phabricator.wikimedia.org/T427381)
[13:18:20] <wikibugs>	 (PS3) Atsuko: httpd-cas: config option to disable httpd-cas [deployment-charts] - https://gerrit.wikimedia.org/r/1294257 (https://phabricator.wikimedia.org/T348763)
[13:18:47] <wikibugs>	 (PS1) Atsuko: eventstreams: new vendor modules check-in [deployment-charts] - https://gerrit.wikimedia.org/r/1294283 (https://phabricator.wikimedia.org/T348763)
[13:19:16] <wikibugs>	 (PS6) Atsuko: eventstreams: upgrade chart to ingress and idp [deployment-charts] - https://gerrit.wikimedia.org/r/1289357 (https://phabricator.wikimedia.org/T348763)
[13:19:28] <wikibugs>	 (PS5) Atsuko: eventstreams: copy eventstreams-internal to dse [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763)
[13:19:39] <wikibugs>	 (CR) Atsuko: "Done" [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[13:19:50] <wikibugs>	 (CR) Slyngshede: [V:+1 C:+2] R:cache::upload enable TCP Fast Open [puppet] - https://gerrit.wikimedia.org/r/1290678 (https://phabricator.wikimedia.org/T415454) (owner: Slyngshede)
[13:20:40] <wikibugs>	 SRE, ServiceOps new, ServiceOps-Upgrades-Hardware: wikikube-worker23[57-74] implementation tracking - https://phabricator.wikimedia.org/T418927#11959538 (Blake) if i'm ever looking at this task for history, the docs are [[ https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/Add_or_remove_nodes |...
[13:20:47] <wikibugs>	 SRE, ServiceOps new, ServiceOps-Upgrades-Hardware: wikikube-worker23[57-74] implementation tracking - https://phabricator.wikimedia.org/T418927#11959539 (Blake) In progress→Resolved
[13:21:31] <logmsgbot>	 !log mlitn@deploy1003 Finished scap sync-world: Backport for [[gerrit:1290781|mmv: Fix missing or stale arrow and counter controls (T426960)]], [[gerrit:1294264|MMV Carousel: Restore click-to-open for carousel thumbnails (T426225)]] (duration: 13m 23s)
[13:21:35] <matthiasmullie>	 phuedx - I'm done, over to you! (and thanks for letting me cut in front!)
[13:21:39] <stashbot>	 T426960: Mediaviewer missing left/right arrows and X/Y counter is out of sync - https://phabricator.wikimedia.org/T426960
[13:21:39] <stashbot>	 T426225: Image Browsing: beta feature for rollout - https://phabricator.wikimedia.org/T426225
[13:21:51] <matthiasmullie>	 mfossati: done!
[13:21:52] <phuedx>	 👍
[13:21:59] <wikibugs>	 (CR) Brouberol: [C:+2] Revert "idp/idp_test: temporarily rollback growthbook(-next) access to nda/wmf" [puppet] - https://gerrit.wikimedia.org/r/1293585 (owner: Brouberol)
[13:22:04] <mfossati>	 thanks matthiasmullie
[13:22:48] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by phuedx@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294217 (https://phabricator.wikimedia.org/T427092) (owner: Phuedx)
[13:25:56] <wikibugs>	 (CR) Atsuko: [V:+2 C:+2] "Acknowledged" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1293664 (https://phabricator.wikimedia.org/T412978) (owner: JavierMonton)
[13:27:52] <wikibugs>	 (PS1) Brouberol: idp_test: remove deprecated growthbook client_secret [labs/private] - https://gerrit.wikimedia.org/r/1294289
[13:28:02] <wikibugs>	 (CR) Brouberol: [C:+2] idp_test: remove deprecated growthbook client_secret [labs/private] - https://gerrit.wikimedia.org/r/1294289 (owner: Brouberol)
[13:28:07] <wikibugs>	 (CR) Brouberol: [V:+2 C:+2] idp_test: remove deprecated growthbook client_secret [labs/private] - https://gerrit.wikimedia.org/r/1294289 (owner: Brouberol)
[13:28:22] <wikibugs>	 (Merged) jenkins-bot: ext.wikimediaEvents: Add hoisting error detection test [extensions/WikimediaEvents] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294217 (https://phabricator.wikimedia.org/T427092) (owner: Phuedx)
[13:28:50] <logmsgbot>	 !log phuedx@deploy1003 Started scap sync-world: Backport for [[gerrit:1294217|ext.wikimediaEvents: Add hoisting error detection test (T427092)]]
[13:28:55] <stashbot>	 T427092: Run and synthetic A/A test that captures UA to investigate hoisting errors - https://phabricator.wikimedia.org/T427092
[13:30:45] <logmsgbot>	 !log phuedx@deploy1003 phuedx: Backport for [[gerrit:1294217|ext.wikimediaEvents: Add hoisting error detection test (T427092)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:31:20] <wikibugs>	 (CR) Mforns: "Code looks good to me! :-)" [alerts] - https://gerrit.wikimedia.org/r/1294113 (https://phabricator.wikimedia.org/T423920) (owner: JavierMonton)
[13:31:26] <aude>	 sorry i am late for the backport
[13:31:30] <aude>	 window
[13:31:59] <aude>	 i can do mine whenever everything else is done
[13:33:25] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1010:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:36:08] <phuedx>	 No errors in the console on a couple of different sites 👍
[13:36:13] <logmsgbot>	 !log phuedx@deploy1003 phuedx: Continuing with deployment
[13:38:31] <jinxer-wm>	 FIRING: [13x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:40:15] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.14 point update - https://phabricator.wikimedia.org/T426759#11959614 (MoritzMuehlenhoff)
[13:40:25] <logmsgbot>	 !log phuedx@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294217|ext.wikimediaEvents: Add hoisting error detection test (T427092)]] (duration: 11m 35s)
[13:40:31] <stashbot>	 T427092: Run and synthetic A/A test that captures UA to investigate hoisting errors - https://phabricator.wikimedia.org/T427092
[13:40:35] <phuedx>	 Tran: Over to you
[13:40:49] <Tran>	 👍 thanks, starting mine
[13:41:20] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by stran@deploy1003 using scap backport" [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294247 (https://phabricator.wikimedia.org/T427358) (owner: STran)
[13:41:27] <wikibugs>	 (CR) Mforns: [C:+2] html-enrichment: relax offset lag monitors [alerts] - https://gerrit.wikimedia.org/r/1294113 (https://phabricator.wikimedia.org/T423920) (owner: JavierMonton)
[13:43:03] <wikibugs>	 (Merged) jenkins-bot: html-enrichment: relax offset lag monitors [alerts] - https://gerrit.wikimedia.org/r/1294113 (https://phabricator.wikimedia.org/T423920) (owner: JavierMonton)
[13:43:25] <jinxer-wm>	 FIRING: [22x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:45:03] <wikibugs>	 (Merged) jenkins-bot: Update Direct Reporting email [extensions/ReportIncident] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294247 (https://phabricator.wikimedia.org/T427358) (owner: STran)
[13:45:30] <logmsgbot>	 !log stran@deploy1003 Started scap sync-world: Backport for [[gerrit:1294247|Update Direct Reporting email (T427358)]]
[13:45:35] <stashbot>	 T427358: Make direct reporting email subject lines unique enough to avoid VRT ticket threading - https://phabricator.wikimedia.org/T427358
[13:46:26] <wikibugs>	 (PS1) Elukey: profile::kafka::broker: add ACLs in a file [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528)
[13:46:55] <wikibugs>	 (CR) Elukey: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528) (owner: Elukey)
[13:48:37] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] memcached: Improve absenting support [puppet] - https://gerrit.wikimedia.org/r/1294259 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[13:51:18] <wikibugs>	 (CR) Filippo Giunchedi: "modules/profile/manifests/prometheus/ops.pp will need adjusting to pick up the class/define switch" [puppet] - https://gerrit.wikimedia.org/r/1294260 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[13:51:18] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1192: Migration of db1192.eqiad.wmnet completed
[13:51:19] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[13:51:47] <wikibugs>	 (CR) Filippo Giunchedi: "actually not true, nevermind" [puppet] - https://gerrit.wikimedia.org/r/1294260 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[13:52:17] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[13:52:24] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[13:52:51] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] prometheus: memcached_exporter: Improve absentability [puppet] - https://gerrit.wikimedia.org/r/1294260 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[13:53:19] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] P:openstack: cloudweb: Absent memcached and mcrouter services [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[13:53:25] <jinxer-wm>	 FIRING: [24x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:55:35] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2164: Migration of db2164.codfw.wmnet completed
[13:55:36] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[13:56:17] <logmsgbot>	 !log sukhe@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host durum5003.eqsin.wmnet with OS trixie
[13:57:21] <wikibugs>	 (CR) Kamila Součková: [C:+2] Remove k8s version from all services [deployment-charts] - https://gerrit.wikimedia.org/r/1273967 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[13:58:07] <wikibugs>	 (PS2) Elukey: profile::kafka::broker: add ACLs in a file [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528)
[13:58:16] <wikibugs>	 (CR) Elukey: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528) (owner: Elukey)
[13:58:25] <jinxer-wm>	 FIRING: [23x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:59:43] <wikibugs>	 (CR) Kamila Součková: [C:+2] ""Yes, that too" - there were two problems, the other one being a chart that isn't rendering (but to my surprise CI doesn't seem to mind). " [deployment-charts] - https://gerrit.wikimedia.org/r/1293757 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[14:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1400)
[14:00:40] <wikibugs>	 (PS1) Bking: relforge: Fix cumin alias [puppet] - https://gerrit.wikimedia.org/r/1294298 (https://phabricator.wikimedia.org/T427306)
[14:00:57] <wikibugs>	 (Merged) jenkins-bot: Remove k8s version from all services [deployment-charts] - https://gerrit.wikimedia.org/r/1273967 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[14:02:57] <logmsgbot>	 !log stran@deploy1003 stran: Backport for [[gerrit:1294247|Update Direct Reporting email (T427358)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:02:59] <wikibugs>	 (PS3) Elukey: profile::kafka::broker: add ACLs in a file [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528)
[14:03:03] <stashbot>	 T427358: Make direct reporting email subject lines unique enough to avoid VRT ticket threading - https://phabricator.wikimedia.org/T427358
[14:03:25] <jinxer-wm>	 FIRING: [25x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:04:00] <Tran>	 testing now
[14:05:54] <wikibugs>	 (CR) Elukey: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528) (owner: Elukey)
[14:05:58] <Tran>	 looks good, continuing
[14:06:03] <logmsgbot>	 !log stran@deploy1003 stran: Continuing with deployment
[14:06:06] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[14:06:15] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[14:06:27] <aude>	 I am ready to deploy mine when you are done (no hurry)
[14:06:57] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[14:07:08] <Tran>	 sorry mine had to do a scap rebuild but I'll ping you when mine's done.
[14:07:18] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db1178: Upgrading db1178.eqiad.wmnet
[14:07:57] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1178: Upgrading db1178.eqiad.wmnet
[14:08:06] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.major-upgrade
[14:08:17] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2163: Upgrading db2163.codfw.wmnet
[14:08:36] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2163: Upgrading db2163.codfw.wmnet
[14:09:14] <aude>	 sounds good
[14:09:54] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db1178.eqiad.wmnet with OS trixie
[14:10:54] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.reimage for host db2163.codfw.wmnet with OS trixie
[14:13:58] <wikibugs>	 (PS4) Jelto: miscweb: remove wmf-navigator public and private config from web container [deployment-charts] - https://gerrit.wikimedia.org/r/1294208 (https://phabricator.wikimedia.org/T414405)
[14:14:39] <wikibugs>	 (CR) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[14:14:53] <wikibugs>	 (CR) Jelto: miscweb: remove wmf-navigator public and private config from web container (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1294208 (https://phabricator.wikimedia.org/T414405) (owner: Jelto)
[14:15:38] <wikibugs>	 (PS3) Effie Mouzeli: site.pp: add rdb2013 and rdb2014 [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924)
[14:17:47] <wikibugs>	 (CR) Bking: [C:+2] relforge: Fix cumin alias [puppet] - https://gerrit.wikimedia.org/r/1294298 (https://phabricator.wikimedia.org/T427306) (owner: Bking)
[14:18:31] <logmsgbot>	 !log stran@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294247|Update Direct Reporting email (T427358)]] (duration: 33m 01s)
[14:18:37] <stashbot>	 T427358: Make direct reporting email subject lines unique enough to avoid VRT ticket threading - https://phabricator.wikimedia.org/T427358
[14:19:13] <Tran>	 aude: I'm done
[14:19:26] <aude>	 thank you!
[14:19:45] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by aude@deploy1003 using scap backport" [extensions/QuickSurveys] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1290924 (https://phabricator.wikimedia.org/T426457) (owner: Aude)
[14:20:06] <aude>	 mine is the QuickSurveys change (wmf3 only) and then a config change
[14:21:27] <wikibugs>	 (CR) Brouberol: [C:+1] httpd-cas: config option to disable httpd-cas [deployment-charts] - https://gerrit.wikimedia.org/r/1294257 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:22:19] <wikibugs>	 (CR) Brouberol: [C:+1] eventstreams: new vendor modules check-in [deployment-charts] - https://gerrit.wikimedia.org/r/1294283 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:22:26] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
[14:22:42] <wikibugs>	 (CR) Brouberol: [C:+1] eventstreams: upgrade chart to ingress and idp [deployment-charts] - https://gerrit.wikimedia.org/r/1289357 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:22:53] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: host reimage
[14:23:24] <wikibugs>	 (Merged) jenkins-bot: Make logging of title and page ID optional [extensions/QuickSurveys] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1290924 (https://phabricator.wikimedia.org/T426457) (owner: Aude)
[14:23:34] <wikibugs>	 (CR) Brouberol: eventstreams: copy eventstreams-internal to dse (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:23:55] <logmsgbot>	 !log aude@deploy1003 Started scap sync-world: Backport for [[gerrit:1290924|Make logging of title and page ID optional (T426457)]]
[14:23:59] <stashbot>	 T426457: QuickSurveys: Make it possible to run surveys without capturing page title - https://phabricator.wikimedia.org/T426457
[14:25:53] <wikibugs>	 (Merged) jenkins-bot: CI: Fix race condition [deployment-charts] - https://gerrit.wikimedia.org/r/1293757 (https://phabricator.wikimedia.org/T388969) (owner: Kamila Součková)
[14:26:17] <wikibugs>	 (PS1) Bking: relforge: update list of mandatory plugins [puppet] - https://gerrit.wikimedia.org/r/1294302 (https://phabricator.wikimedia.org/T427306)
[14:26:25] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[14:26:31] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[14:26:52] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[14:26:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1179 (T426633)', diff saved to https://phabricator.wikimedia.org/P93260 and previous config saved to /var/cache/conftool/dbconfig/20260527-142659-fceratto.json
[14:27:08] <wikibugs>	 (PS2) Bking: relforge: update list of mandatory plugins [puppet] - https://gerrit.wikimedia.org/r/1294302 (https://phabricator.wikimedia.org/T427306)
[14:27:39] <logmsgbot>	 !log aude@deploy1003 aude: Backport for [[gerrit:1290924|Make logging of title and page ID optional (T426457)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:28:10] <wikibugs>	 (CR) Bking: [C:+2] relforge: update list of mandatory plugins [puppet] - https://gerrit.wikimedia.org/r/1294302 (https://phabricator.wikimedia.org/T427306) (owner: Bking)
[14:28:15] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: host reimage
[14:29:24] <logmsgbot>	 !log aude@deploy1003 aude: Continuing with deployment
[14:29:27] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
[14:30:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1400)
[14:30:05] <jouncebot>	 Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1430)
[14:30:30] <wikibugs>	 (PS4) CWilliams: sre.mysql.pool: Add support for downtime [cookbooks] - https://gerrit.wikimedia.org/r/1289965 (https://phabricator.wikimedia.org/T426318)
[14:30:45] <wikibugs>	 (PS1) Muehlenhoff: mirrors: Disable tails mirror [puppet] - https://gerrit.wikimedia.org/r/1294306 (https://phabricator.wikimedia.org/T416707)
[14:33:33] <wikibugs>	 sre-alert-triage, Data-Platform-SRE: Alert in need of triage: PuppetFailure (instance an-test-client1002:9100) - https://phabricator.wikimedia.org/T427399 (LSobanski) NEW
[14:33:45] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
[14:34:16] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T426633)', diff saved to https://phabricator.wikimedia.org/P93262 and previous config saved to /var/cache/conftool/dbconfig/20260527-143416-fceratto.json
[14:34:20] <icinga-wm>	 RECOVERY - Host db2189 #page is UP: PING OK - Packet loss = 0%, RTA = 31.60 ms
[14:34:20] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376#11959812 (Jhancock.wm) working on it. might reboot a few times.
[14:34:21] <wikibugs>	 (CR) Elukey: profile::kafka::broker: add ACLs in a file (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528) (owner: Elukey)
[14:34:32] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s2 #page on db2189 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:34:32] <icinga-wm>	 PROBLEM - MariaDB Events s2 on db2189 is CRITICAL: CRITICAL - Failed to query events: ERROR 2002 (HY000): Cant connect to local server through socket /run/mysqld/mysqld.sock (2) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Event_Scheduler
[14:34:32] <icinga-wm>	 PROBLEM - mysqld processes on db2189 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[14:34:33] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s2 #page on db2189 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:34:34] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 #page on db2189 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting%23Incident_Response
[14:34:34] <icinga-wm>	 PROBLEM - MariaDB Event Scheduler s2 on db2189 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Event_Scheduler
[14:34:45] <fabfur>	 !ack
[14:34:45] <sirenbot>	 8024 (ACKED)  db2189 (paged)/MariaDB Replica SQL: s2 (paged)
[14:34:46] <sirenbot>	 8025 (ACKED)  db2189 (paged)/MariaDB Replica IO: s2 (paged)
[14:34:46] <sirenbot>	 8026 (ACKED)  db2189 (paged)/MariaDB Replica Lag: s2 (paged)
[14:34:58] <federico3>	 looking
[14:35:03] <icinga-wm>	 PROBLEM - MariaDB read only s2 on db2189 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[14:35:11] <fabfur>	 !incidents
[14:35:12] <sirenbot>	 8024 (ACKED)  db2189 (paged)/MariaDB Replica SQL: s2 (paged)
[14:35:12] <sirenbot>	 8025 (ACKED)  db2189 (paged)/MariaDB Replica IO: s2 (paged)
[14:35:12] <sirenbot>	 8026 (ACKED)  db2189 (paged)/MariaDB Replica Lag: s2 (paged)
[14:35:12] <sirenbot>	 8023 (RESOLVED)  Host db2189 (paged)
[14:35:21] <fabfur>	 thanks federico3 
[14:35:25] <logmsgbot>	 !log aude@deploy1003 Finished scap sync-world: Backport for [[gerrit:1290924|Make logging of title and page ID optional (T426457)]] (duration: 11m 30s)
[14:35:27] <federico3>	 indeed it came up just now
[14:35:29] <moritzm>	 server rebooted
[14:35:30] <stashbot>	 T426457: QuickSurveys: Make it possible to run surveys without capturing page title - https://phabricator.wikimedia.org/T426457
[14:35:50] <godog>	 https://phabricator.wikimedia.org/T427376#11959812 (Jhancock.wm) working on it. might reboot a few times.
[14:36:02] <godog>	 noticed this in the backlog
[14:36:15] <moritzm>	 nothing in system event log
[14:36:17] <moritzm>	 ah
[14:36:31] <icinga-wm>	 RECOVERY - mysqld processes on db2189 is OK: PROCS OK: 1 process with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[14:36:42] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by aude@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1290926 (https://phabricator.wikimedia.org/T426781) (owner: Aude)
[14:37:14] <moritzm>	 right, Manuel actually also mentioned it in the earlier handoff
[14:37:46] <wikibugs>	 (Merged) jenkins-bot: Re-enable ReadingLists QuickSurvey [mediawiki-config] - https://gerrit.wikimedia.org/r/1290926 (https://phabricator.wikimedia.org/T426781) (owner: Aude)
[14:38:11] <logmsgbot>	 !log aude@deploy1003 Started scap sync-world: Backport for [[gerrit:1290926|Re-enable ReadingLists QuickSurvey (T426781)]]
[14:38:16] <stashbot>	 T426781: Re-enable ReadingLists QuickSurvey - https://phabricator.wikimedia.org/T426781
[14:38:25] <jinxer-wm>	 FIRING: [24x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:38:57] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 99 days, 0:00:00 on db2189.codfw.wmnet with reason: crashed T427376
[14:39:01] <stashbot>	 T427376: db2189 crashed - https://phabricator.wikimedia.org/T427376
[14:39:05] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376#11959868 (FCeratto-WMF) (added a long downtime just in case)
[14:40:03] <logmsgbot>	 !log aude@deploy1003 aude: Backport for [[gerrit:1290926|Re-enable ReadingLists QuickSurvey (T426781)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:40:14] <wikibugs>	 (PS1) Bking: relforge: disable the disabling of security plugin [puppet] - https://gerrit.wikimedia.org/r/1294308 (https://phabricator.wikimedia.org/T427306)
[14:40:47] <wikibugs>	 (CR) Scott French: "Thanks for the reviews!" [puppet] - https://gerrit.wikimedia.org/r/1293789 (https://phabricator.wikimedia.org/T427312) (owner: Scott French)
[14:40:50] <wikibugs>	 (CR) Scott French: [C:+2] aptrepo: add component/php83 to bookworm-wikimedia [puppet] - https://gerrit.wikimedia.org/r/1293789 (https://phabricator.wikimedia.org/T427312) (owner: Scott French)
[14:42:29] <wikibugs>	 (CR) Scott French: [C:+2] package_builder: Use @distribution in the D04php hook (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1293790 (https://phabricator.wikimedia.org/T427312) (owner: Scott French)
[14:42:34] <logmsgbot>	 !log aude@deploy1003 aude: Continuing with deployment
[14:43:25] <jinxer-wm>	 FIRING: [24x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:43:55] <wikibugs>	 (CR) Atsuko: eventstreams: copy eventstreams-internal to dse (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:44:10] <wikibugs>	 (CR) Atsuko: [C:+2] httpd-cas: config option to disable httpd-cas [deployment-charts] - https://gerrit.wikimedia.org/r/1294257 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:44:17] <wikibugs>	 (CR) Atsuko: [C:+2] eventstreams: new vendor modules check-in [deployment-charts] - https://gerrit.wikimedia.org/r/1294283 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:44:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P93263 and previous config saved to /var/cache/conftool/dbconfig/20260527-144423-fceratto.json
[14:44:24] <wikibugs>	 (CR) Atsuko: [C:+2] eventstreams: upgrade chart to ingress and idp [deployment-charts] - https://gerrit.wikimedia.org/r/1289357 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:44:31] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1178.eqiad.wmnet with OS trixie
[14:44:42] <wikibugs>	 (CR) Bking: [C:+2] relforge: disable the disabling of security plugin [puppet] - https://gerrit.wikimedia.org/r/1294308 (https://phabricator.wikimedia.org/T427306) (owner: Bking)
[14:44:45] <wikibugs>	 (CR) Brouberol: [C:+1] eventstreams: copy eventstreams-internal to dse (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:45:14] <wikibugs>	 (PS4) Elukey: profile::kafka::broker: add ACLs in a file [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528)
[14:45:16] <wikibugs>	 (CR) BBlack: "Maybe this is better discussed either back in the phab task or in some other forum, because things get complicated.  But to address the th" [puppet] - https://gerrit.wikimedia.org/r/1282428 (https://phabricator.wikimedia.org/T425441) (owner: Dzahn)
[14:46:03] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1282286 (https://phabricator.wikimedia.org/T372892) (owner: Slyngshede)
[14:46:13] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/1294219 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[14:46:24] <wikibugs>	 (Merged) jenkins-bot: httpd-cas: config option to disable httpd-cas [deployment-charts] - https://gerrit.wikimedia.org/r/1294257 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:46:43] <logmsgbot>	 !log aude@deploy1003 Finished scap sync-world: Backport for [[gerrit:1290926|Re-enable ReadingLists QuickSurvey (T426781)]] (duration: 08m 32s)
[14:46:48] <stashbot>	 T426781: Re-enable ReadingLists QuickSurvey - https://phabricator.wikimedia.org/T426781
[14:46:57] <wikibugs>	 (PS5) CWilliams: sre.mysql.pool: Add support for downtime [cookbooks] - https://gerrit.wikimedia.org/r/1289965 (https://phabricator.wikimedia.org/T426318)
[14:47:00] <wikibugs>	 (CR) Elukey: profile::kafka::broker: add ACLs in a file (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1294294 (https://phabricator.wikimedia.org/T425528) (owner: Elukey)
[14:47:03] <wikibugs>	 (Merged) jenkins-bot: eventstreams: new vendor modules check-in [deployment-charts] - https://gerrit.wikimedia.org/r/1294283 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:47:05] <wikibugs>	 (Merged) jenkins-bot: eventstreams: upgrade chart to ingress and idp [deployment-charts] - https://gerrit.wikimedia.org/r/1289357 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:50:53] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/eventstreams-internal: apply
[14:51:09] <logmsgbot>	 !log atsuko@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/eventstreams-internal: apply
[14:51:36] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2163.codfw.wmnet with OS trixie
[14:52:06] <wikibugs>	 (CR) Majavah: [C:+2] Replace role::mariadb::ferm with profile::mariadb::firewall [puppet] - https://gerrit.wikimedia.org/r/1292033 (https://phabricator.wikimedia.org/T411089) (owner: JHathaway)
[14:53:25] <jinxer-wm>	 FIRING: [24x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:54:01] <logmsgbot>	 !log cwilliams@cumin1003 END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
[14:54:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P93264 and previous config saved to /var/cache/conftool/dbconfig/20260527-145430-fceratto.json
[14:55:42] <wikibugs>	 (PS6) Atsuko: eventstreams: copy eventstreams-internal to dse [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763)
[14:55:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on relforge1010:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:56:28] <jinxer-wm>	 FIRING: [2x] SystemdUnitCrashLoop: prometheus-wmf-elasticsearch-exporter-9400.service crashloop on relforge1008:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:58:02] <wikibugs>	 (CR) JavierMonton: stream: webrequest.page_view (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1290687 (https://phabricator.wikimedia.org/T426092) (owner: JavierMonton)
[14:58:02] <wikibugs>	 (CR) Atsuko: [C:+2] eventstreams: copy eventstreams-internal to dse [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[14:58:07] <wikibugs>	 (PS1) Effie Mouzeli: ratelimite: update homepage [deployment-charts] - https://gerrit.wikimedia.org/r/1294314 (https://phabricator.wikimedia.org/T426951)
[14:58:10] <wikibugs>	 (PS1) Trueg: Add wdqs namespace for the new deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1294315 (https://phabricator.wikimedia.org/T425007)
[14:58:25] <jinxer-wm>	 FIRING: [15x] SystemdUnitFailed: opensearch_2@relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:59:46] <wikibugs>	 (PS20) Majavah: firewall: Declare resources for both providers [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089)
[14:59:47] <wikibugs>	 (PS20) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[14:59:49] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2163: Migration of db2163.codfw.wmnet completed
[15:00:04] <wikibugs>	 (CR) Majavah: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[15:00:09] <wikibugs>	 (CR) Clément Goubert: [C:+1] ratelimite: update homepage [deployment-charts] - https://gerrit.wikimedia.org/r/1294314 (https://phabricator.wikimedia.org/T426951) (owner: Effie Mouzeli)
[15:00:10] <wikibugs>	 (PS1) Brouberol: data-platform: add alert on growthbook seat usage [alerts] - https://gerrit.wikimedia.org/r/1294316 (https://phabricator.wikimedia.org/T420694)
[15:00:20] <wikibugs>	 (Merged) jenkins-bot: eventstreams: copy eventstreams-internal to dse [deployment-charts] - https://gerrit.wikimedia.org/r/1289979 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:00:30] <wikibugs>	 (PS1) JavierMonton: stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1294318 (https://phabricator.wikimedia.org/T425624)
[15:02:00] <wikibugs>	 (CR) CI reject: [V:-1] data-platform: add alert on growthbook seat usage [alerts] - https://gerrit.wikimedia.org/r/1294316 (https://phabricator.wikimedia.org/T420694) (owner: Brouberol)
[15:03:19] <wikibugs>	 (PS2) Brouberol: data-platform: add alert on growthbook seat usage [alerts] - https://gerrit.wikimedia.org/r/1294316 (https://phabricator.wikimedia.org/T420694)
[15:03:25] <jinxer-wm>	 FIRING: [15x] SystemdUnitFailed: opensearch_2@relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:04:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T426633)', diff saved to https://phabricator.wikimedia.org/P93267 and previous config saved to /var/cache/conftool/dbconfig/20260527-150438-fceratto.json
[15:05:01] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
[15:05:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1220 (T426633)', diff saved to https://phabricator.wikimedia.org/P93268 and previous config saved to /var/cache/conftool/dbconfig/20260527-150508-fceratto.json
[15:05:48] <jinxer-wm>	 RESOLVED: PuppetFailure: Puppet has failed on relforge1010:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:06:26] <wikibugs>	 (CR) TChin: [C:+1] stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1294318 (https://phabricator.wikimedia.org/T425624) (owner: JavierMonton)
[15:07:04] <wikibugs>	 (CR) JavierMonton: [C:+2] stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1294318 (https://phabricator.wikimedia.org/T425624) (owner: JavierMonton)
[15:08:03] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:08:09] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:08:29] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:08:33] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:09:05] <cdanis>	 !log 💔cdanis@apt1002.wikimedia.org ~ 🕚☕ sudo -i reprepro --component main --restrict cidergrinder update trixie-wikimedia
[15:09:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:17] <wikibugs>	 (Merged) jenkins-bot: stream: webrequest-page-view [deployment-charts] - https://gerrit.wikimedia.org/r/1294318 (https://phabricator.wikimedia.org/T425624) (owner: JavierMonton)
[15:09:38] <wikibugs>	 (PS1) Jdlrobson: Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237)
[15:09:55] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:10:00] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:10:27] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:10:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2013.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:10:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2013.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:10:59] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:11:05] <logmsgbot>	 !log cwilliams@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Icinga wait failed during run
[15:11:14] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:11:34] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:11:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:12:18] <jinxer-wm>	 FIRING: [2x] PuppetFailure: Puppet has failed on relforge1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:12:23] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:12:41] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:13:20] <cdanis>	 !log 💙cdanis@cp5026.eqsin.wmnet ~ 🕚☕ sudo apt install lua5.4-ciderbloom lua5.4-ciderbloom-dbgsym
[15:13:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:14:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:15:11] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] memcached: Improve absenting support [puppet] - https://gerrit.wikimedia.org/r/1294259 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[15:16:19] <wikibugs>	 (PS3) JavierMonton: stream: webrequest.page_view [mediawiki-config] - https://gerrit.wikimedia.org/r/1290687 (https://phabricator.wikimedia.org/T426092)
[15:16:44] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] prometheus: memcached_exporter: Improve absentability [puppet] - https://gerrit.wikimedia.org/r/1294260 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[15:16:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:16:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:17:52] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] P:openstack: cloudweb: Absent memcached and mcrouter services [puppet] - https://gerrit.wikimedia.org/r/1294261 (https://phabricator.wikimedia.org/T427189) (owner: Majavah)
[15:18:03] <wikibugs>	 (CR) Hashar: jenkins: add firewall rule for new jenkins to gearman on legacy host (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1275537 (https://phabricator.wikimedia.org/T418521) (owner: Dzahn)
[15:19:23] <cdanis>	 !log 💙cdanis@cp4047.ulsfo.wmnet ~ 🕦☕ sudo apt install lua5.4-ciderbloom lua5.4-ciderbloom-dbgsym
[15:19:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:53] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS trixie
[15:19:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2012.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:19:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2012.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:20:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:21:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:22:14] <wikibugs>	 (CR) Federico Ceratto: [C:+1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1289965 (https://phabricator.wikimedia.org/T426318) (owner: CWilliams)
[15:22:18] <jinxer-wm>	 RESOLVED: [2x] PuppetFailure: Puppet has failed on relforge1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:22:34] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.remove-downtime for db1178.eqiad.wmnet
[15:22:35] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1178.eqiad.wmnet
[15:23:01] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376#11960059 (Jhancock.wm) @FCeratto-WMF  okay the error code we got was inconclusive. it could mean a lot of things including just out of date firmware. I've updated the bios and the idrac. I do see a cpu machine...
[15:23:11] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: db2189 crashed - https://phabricator.wikimedia.org/T427376#11960062 (Jhancock.wm) a:Jhancock.wm
[15:23:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:24:07] <wikibugs>	 (CR) Bking: [C:+1] data-platform: add alert on growthbook seat usage [alerts] - https://gerrit.wikimedia.org/r/1294316 (https://phabricator.wikimedia.org/T420694) (owner: Brouberol)
[15:24:41] <wikibugs>	 (PS2) Herron: alertmanager: add ml-task webhook [puppet] - https://gerrit.wikimedia.org/r/1294323
[15:24:57] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db1178: Recovering from failure in cookbook
[15:24:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2007.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2015.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:24:58] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2008.codfw.wmnet, wdqs2013.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:25:12] <icinga-wm>	 PROBLEM - Memcached on cloudweb2002-dev is CRITICAL: connect to address 208.80.153.41 and port 11000: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[15:25:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:25:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:26:00] <wikibugs>	 (CR) TChin: [C:+1] stream: webrequest.page_view [mediawiki-config] - https://gerrit.wikimedia.org/r/1290687 (https://phabricator.wikimedia.org/T426092) (owner: JavierMonton)
[15:26:47] <wikibugs>	 (CR) CI reject: [V:-1] Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[15:28:19] <wikibugs>	 (CR) Brouberol: [C:+2] data-platform: add alert on growthbook seat usage [alerts] - https://gerrit.wikimedia.org/r/1294316 (https://phabricator.wikimedia.org/T420694) (owner: Brouberol)
[15:28:49] <wikibugs>	 (PS1) Atsuko: Provision stream-internal.w.o [dns] - https://gerrit.wikimedia.org/r/1294326 (https://phabricator.wikimedia.org/T348763)
[15:29:14] <wikibugs>	 (CR) CI reject: [V:-1] Provision stream-internal.w.o [dns] - https://gerrit.wikimedia.org/r/1294326 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:30:11] <wikibugs>	 (PS1) Atsuko: trafficserver: enable stream-internal.w.o [puppet] - https://gerrit.wikimedia.org/r/1294327 (https://phabricator.wikimedia.org/T348763)
[15:30:43] <wikibugs>	 (CR) CI reject: [V:-1] trafficserver: enable stream-internal.w.o [puppet] - https://gerrit.wikimedia.org/r/1294327 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:30:49] <wikibugs>	 (CR) Brouberol: "recheck" [dns] - https://gerrit.wikimedia.org/r/1294326 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:32:22] <logmsgbot>	 !log cwilliams@cumin1003 END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2163: Migration of db2163.codfw.wmnet completed
[15:32:49] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2163: Migration of db2163.codfw.wmnet completed
[15:33:00] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2163: Migration of db2163.codfw.wmnet completed
[15:33:01] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
[15:33:10] <icinga-wm>	 PROBLEM - Memcached on cloudweb1003 is CRITICAL: connect to address 208.80.154.150 and port 11000: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[15:33:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:34:23] <wikibugs>	 (PS2) Atsuko: trafficserver: enable stream-internal.w.o [puppet] - https://gerrit.wikimedia.org/r/1294327 (https://phabricator.wikimedia.org/T348763)
[15:36:04] <wikibugs>	 (CR) Atsuko: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1294327 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:37:21] <wikibugs>	 (CR) Brouberol: [C:+1] Provision stream-internal.w.o [dns] - https://gerrit.wikimedia.org/r/1294326 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:37:39] <wikibugs>	 (CR) Brouberol: [C:+1] trafficserver: enable stream-internal.w.o [puppet] - https://gerrit.wikimedia.org/r/1294327 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[15:38:07] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - labweb-ssl_7443: Servers cloudweb1004.wikimedia.org are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:38:07] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - labweb-ssl_7443: Servers cloudweb1004.wikimedia.org are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:38:11] <icinga-wm>	 PROBLEM - Memcached on cloudweb1004 is CRITICAL: connect to address 208.80.155.117 and port 11000: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[15:38:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:39:08] <wikibugs>	 (PS1) Majavah: Revert "P:openstack: cloudweb: Absent memcached and mcrouter services" [puppet] - https://gerrit.wikimedia.org/r/1294333
[15:39:46] <wikibugs>	 (PS2) Majavah: Revert "P:openstack: cloudweb: Absent memcached and mcrouter services" [puppet] - https://gerrit.wikimedia.org/r/1294333
[15:40:12] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1220 (T426633)', diff saved to https://phabricator.wikimedia.org/P93274 and previous config saved to /var/cache/conftool/dbconfig/20260527-154011-fceratto.json
[15:40:22] <wikibugs>	 (CR) Elukey: [C:+1] alertmanager: add ml-task webhook [puppet] - https://gerrit.wikimedia.org/r/1294323 (owner: Herron)
[15:40:38] <wikibugs>	 (CR) Majavah: [V:+2 C:+2] Revert "P:openstack: cloudweb: Absent memcached and mcrouter services" [puppet] - https://gerrit.wikimedia.org/r/1294333 (owner: Majavah)
[15:41:06] <wikibugs>	 (CR) Clément Goubert: site.pp: add rdb2013 and rdb2014 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1294271 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[15:41:19] <wikibugs>	 (CR) Herron: [C:+2] alertmanager: add ml-task webhook [puppet] - https://gerrit.wikimedia.org/r/1294323 (owner: Herron)
[15:43:10] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:43:10] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:43:54] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:44:04] <wikibugs>	 ops-codfw, DC-Ops: Power Supply - Status - issue on aqs2011:9290 - https://phabricator.wikimedia.org/T427409 (phaultfinder) NEW
[15:44:05] <wikibugs>	 ops-codfw, DC-Ops: Power Supply - Status - issue on mc-gp2005:9290 - https://phabricator.wikimedia.org/T427410 (phaultfinder) NEW
[15:44:06] <wikibugs>	 ops-eqiad, DC-Ops: Power Supply - PS Redundancy - issue on dbproxy1024:9290 - https://phabricator.wikimedia.org/T427408 (phaultfinder) NEW
[15:45:09] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:45:17] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/webrequest-page-view-next: apply
[15:45:29] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:45:52] <wikibugs>	 (CR) CWilliams: [C:+2] sre.mysql.pool: Add support for downtime [cookbooks] - https://gerrit.wikimedia.org/r/1289965 (https://phabricator.wikimedia.org/T426318) (owner: CWilliams)
[15:46:41] <wikibugs>	 (CR) BCornwall: [C:+2] Revert "site: Set lvs1017 to insetup_noferm" [puppet] - https://gerrit.wikimedia.org/r/1286517 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[15:46:44] <wikibugs>	 (CR) BCornwall: [C:+2] Add lvs1017 to high-traffic1 [puppet] - https://gerrit.wikimedia.org/r/1286522 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[15:49:36] <wikibugs>	 (PS5) Andrew Bogott: designate: remove leftover mcrouter code [puppet] - https://gerrit.wikimedia.org/r/1278528 (https://phabricator.wikimedia.org/T422646)
[15:50:16] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#11960218 (Dzahn) We have already established a zuul naming pattern for existing VMs and "1-3" are in use.  Please use **zuul1004/zuul2004** and counting up from...
[15:50:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P93276 and previous config saved to /var/cache/conftool/dbconfig/20260527-155019-fceratto.json
[15:50:51] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.depool depool db2163: Testing cookbook
[15:51:11] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2163: Testing cookbook
[15:51:18] <wikibugs>	 (Merged) jenkins-bot: sre.mysql.pool: Add support for downtime [cookbooks] - https://gerrit.wikimedia.org/r/1289965 (https://phabricator.wikimedia.org/T426318) (owner: CWilliams)
[15:52:19] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp6016.drmrs.wmnet,cp[1112,1114].eqiad.wmnet,cp[5024,5031-5032].eqsin.wmnet} and A:cp
[15:52:23] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.mysql.pool pool db2163: Repooling after testing patch
[15:52:49] <wikibugs>	 (CR) Clément Goubert: "This and I011084cdc1fc4e850b74e28de0b5e52d5ee32175 should be done fairly close together (redioscope uses data from the rest-gateway rate l" [deployment-charts] - https://gerrit.wikimedia.org/r/1294276 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[15:52:58] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
[15:53:03] <logmsgbot>	 !log cwilliams@cumin1003 START - Cookbook sre.hosts.remove-downtime for db2163.codfw.wmnet
[15:53:03] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2163.codfw.wmnet
[15:53:19] <wikibugs>	 (CR) Clément Goubert: [C:+1] "Needs to be done fairly close to Iadd2b5525978ce8726c0ecb3aec5b484efb1b639 as redioscope uses the redis data from the rest-gateway ratelim" [deployment-charts] - https://gerrit.wikimedia.org/r/1294275 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[15:53:38] <wikibugs>	 SRE, Traffic, Patch-For-Review: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11960227 (BCornwall) Open→In progress
[15:53:40] <wikibugs>	 ops-codfw, SRE, DC-Ops, ServiceOps new, ServiceOps-Upgrades-Hardware: re-rack mc2055 (before Jun 9th) - https://phabricator.wikimedia.org/T427373#11960232 (Jhancock.wm) @jijiki we can rerack this in A3 no problem. I will be around most days from 1700 UTC to 2100 UTC. The days that work best f...
[15:55:14] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp6012 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[15:59:17] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
[16:00:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P93280 and previous config saved to /var/cache/conftool/dbconfig/20260527-160027-fceratto.json
[16:01:39] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11960266 (cmooney) That looks good to me @papaul good stuff.  If we use vlan IDs 512/522 I guess the plan would be to change the vlan i...
[16:01:59] <wikibugs>	 (PS1) Ottomata: EventStreams - Expose mediawiki.page_outlink_topic_prediction_change.v1 stream [deployment-charts] - https://gerrit.wikimedia.org/r/1294341 (https://phabricator.wikimedia.org/T427416)
[16:02:35] <wikibugs>	 SRE, Traffic, Patch-For-Review: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11960280 (BCornwall)
[16:03:16] <logmsgbot>	 !log brett@cumin2002 cookbooks.sre.cdn.roll-reboot finished rebooting cp6016.drmrs.wmnet
[16:03:30] <wikibugs>	 SRE, Traffic, Patch-For-Review: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11960281 (BCornwall)
[16:03:42] <wikibugs>	 (PS1) Sbisson: Allow disabling experiment for experienced editors (>=100 edits) [extensions/ArticleGuidance] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294342 (https://phabricator.wikimedia.org/T426871)
[16:04:17] <wikibugs>	 (PS1) Sbisson: Allow disabling experiment for experienced editors (>=100 edits) [extensions/ArticleGuidance] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294343 (https://phabricator.wikimedia.org/T426871)
[16:04:27] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp6015 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[16:04:30] <wikibugs>	 (CR) Hnowlan: [C:+2] tests/integration: readability improvements [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1293147 (https://phabricator.wikimedia.org/T385798) (owner: Hnowlan)
[16:04:35] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] "LGTM! Thank you for the patch!" [deployment-charts] - https://gerrit.wikimedia.org/r/1294341 (https://phabricator.wikimedia.org/T427416) (owner: Ottomata)
[16:04:56] <wikibugs>	 (CR) Ottomata: [C:+2] EventStreams - Expose mediawiki.page_outlink_topic_prediction_change.v1 stream [deployment-charts] - https://gerrit.wikimedia.org/r/1294341 (https://phabricator.wikimedia.org/T427416) (owner: Ottomata)
[16:05:06] <logmsgbot>	 !log sukhe@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum5003.eqsin.wmnet with OS trixie
[16:06:55] <wikibugs>	 (PS1) Sbisson: frwiki: restrict Article Guidance experiment to junior editors [mediawiki-config] - https://gerrit.wikimedia.org/r/1294344 (https://phabricator.wikimedia.org/T426871)
[16:07:01] <wikibugs>	 (Merged) jenkins-bot: EventStreams - Expose mediawiki.page_outlink_topic_prediction_change.v1 stream [deployment-charts] - https://gerrit.wikimedia.org/r/1294341 (https://phabricator.wikimedia.org/T427416) (owner: Ottomata)
[16:07:59] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [extensions/ArticleGuidance] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294342 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[16:08:20] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [extensions/ArticleGuidance] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294343 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[16:08:38] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294344 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[16:08:54] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:09:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:10:16] <logmsgbot>	 !log otto@deploy1003 helmfile [staging] START helmfile.d/services/eventstreams: apply
[16:10:25] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1178: Recovering from failure in cookbook
[16:10:26] <logmsgbot>	 !log otto@deploy1003 helmfile [staging] DONE helmfile.d/services/eventstreams: apply
[16:10:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1220 (T426633)', diff saved to https://phabricator.wikimedia.org/P93283 and previous config saved to /var/cache/conftool/dbconfig/20260527-161034-fceratto.json
[16:10:55] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
[16:11:02] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1224 (T426633)', diff saved to https://phabricator.wikimedia.org/P93284 and previous config saved to /var/cache/conftool/dbconfig/20260527-161101-fceratto.json
[16:11:43] <wikibugs>	 (CR) Ladsgroup: "Do you want me to leave it for Ceri to try or you're okay with me moving forward? 😄" [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[16:12:02] <wikibugs>	 (Merged) jenkins-bot: tests/integration: readability improvements [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1293147 (https://phabricator.wikimedia.org/T385798) (owner: Hnowlan)
[16:12:44] <logmsgbot>	 !log otto@deploy1003 helmfile [eqiad] START helmfile.d/services/eventstreams: apply
[16:13:15] <logmsgbot>	 !log otto@deploy1003 helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
[16:13:21] <logmsgbot>	 !log otto@deploy1003 helmfile [codfw] START helmfile.d/services/eventstreams: apply
[16:14:05] <logmsgbot>	 !log otto@deploy1003 helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
[16:15:34] <wikibugs>	 ops-codfw, DC-Ops: Power Supply - Status - issue on aqs2011:9290 - https://phabricator.wikimedia.org/T427409#11960354 (Jhancock.wm) Open→Resolved a:Jhancock.wm bad cpu. replaced
[16:16:42] <wikibugs>	 (PS3) BCornwall: Remove lvs1016, promote lvs1017 [puppet] - https://gerrit.wikimedia.org/r/1286523 (https://phabricator.wikimedia.org/T421421)
[16:16:42] <wikibugs>	 (PS4) BCornwall: Remove lvs1016 hieradata, demote to insetup_noferm [puppet] - https://gerrit.wikimedia.org/r/1286524 (https://phabricator.wikimedia.org/T421421)
[16:16:42] <wikibugs>	 (PS1) BCornwall: lvs: Set lvs1017 interface name [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421)
[16:17:02] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#11960361 (Dzahn) Are these machines supposed to replace the main zuul VMs zuul1001/2001?  I am missing the context a bit how we got to physical hardware being a...
[16:17:54] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T426633)', diff saved to https://phabricator.wikimedia.org/P93285 and previous config saved to /var/cache/conftool/dbconfig/20260527-161753-fceratto.json
[16:20:02] <wikibugs>	 (PS1) Dzahn: site: add zuul[12]00[4-9] with insetup role [puppet] - https://gerrit.wikimedia.org/r/1294347 (https://phabricator.wikimedia.org/T427353)
[16:20:36] <wikibugs>	 SRE-Access-Requests: Requesting access to releasers-mediawiki for matmarex, ariel, jgiannelos - https://phabricator.wikimedia.org/T427421 (ArielGlenn) NEW
[16:20:42] <wikibugs>	 (CR) CI reject: [V:-1] site: add zuul[12]00[4-9] with insetup role [puppet] - https://gerrit.wikimedia.org/r/1294347 (https://phabricator.wikimedia.org/T427353) (owner: Dzahn)
[16:21:11] <swfrench-wmf>	 jouncebot: nowandnext
[16:21:11] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 8 minute(s)
[16:21:11] <jouncebot>	 In 0 hour(s) and 8 minute(s): MediaWiki infrastructure (UTC late) (extended edition) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1630)
[16:21:14] <wikibugs>	 SRE-Access-Requests: Requesting access to releasers-mediawiki for matmarex, ariel, jgiannelos - https://phabricator.wikimedia.org/T427421#11960374 (ArielGlenn)
[16:21:33] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1017 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 (Connection refused) https://wikitech.wikimedia.org/wiki/PyBal
[16:21:50] <wikibugs>	 (CR) ArielGlenn: "Will https://phabricator.wikimedia.org/T427421 suffice?" [puppet] - https://gerrit.wikimedia.org/r/1293769 (https://phabricator.wikimedia.org/T423255) (owner: ArielGlenn)
[16:22:05] <wikibugs>	 ops-eqiad, DC-Ops: Power Supply - PS Redundancy - issue on dbproxy1024:9290 - https://phabricator.wikimedia.org/T427408#11960378 (Jclark-ctr) a:Jclark-ctr
[16:24:15] <wikibugs>	 (PS1) Dzahn: installserver: update partman for mixed VM/physical zuul machines [puppet] - https://gerrit.wikimedia.org/r/1294348 (https://phabricator.wikimedia.org/T427353)
[16:24:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:26:33] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1017 is CRITICAL: CRITICAL: 0 connections established with conf1007.eqiad.wmnet:4001 (min=12) https://wikitech.wikimedia.org/wiki/PyBal
[16:27:36] <wikibugs>	 ops-eqiad, SRE, collaboration-services, DC-Ops, Patch-For-Review: Repurpose ganeti102[3456] for Zuul migration - https://phabricator.wikimedia.org/T427353#11960386 (Dzahn) @thcipriani @dduvall Was this requested by you because the existing VMs are too limited?  Is the idea to replace (just) t...
[16:27:39] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, May 28 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1270986 (https://phabricator.wikimedia.org/T413331) (owner: Robertsky)
[16:28:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P93287 and previous config saved to /var/cache/conftool/dbconfig/20260527-162800-fceratto.json
[16:28:42] <wikibugs>	 ops-codfw, DC-Ops: Power Supply - Status - issue on mc-gp2005:9290 - https://phabricator.wikimedia.org/T427410#11960388 (Jhancock.wm) PSU is bad. don't have an easy replacement. opened a ticket with dell.
[16:29:26] <wikibugs>	 (CR) Dzahn: [C:+1] add new members of mw release working group to releasers-mediawiki [puppet] - https://gerrit.wikimedia.org/r/1293769 (https://phabricator.wikimedia.org/T423255) (owner: ArielGlenn)
[16:30:05] <jouncebot>	 swfrench-wmf: OwO what's this, a deployment window?? MediaWiki infrastructure (UTC late) (extended edition). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1630). nyaa~
[16:30:10] <swfrench-wmf>	 o/
[16:30:17] <wikibugs>	 (PS2) Scott French: profile::services_proxy::envoy: Add non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1293771
[16:30:19] <wikibugs>	 (PS2) Scott French: profile::services_proxy::envoy: Enable non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1293772
[16:30:20] <wikibugs>	 (PS2) Scott French: ProductionServices: Temporarily use shellbox in codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/1293774
[16:30:21] <wikibugs>	 (PS2) Scott French: ProductionServices: Temporarily use shellbox in eqiad [mediawiki-config] - https://gerrit.wikimedia.org/r/1293775
[16:30:21] <wikibugs>	 (PS2) Scott French: ProductionServices: Revert to discovery shellbox listeners [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776
[16:31:45] <swfrench-wmf>	 we'll be starting some maintenance shortly. please do not start any new MediaWiki deployments.
[16:33:05] <wikibugs>	 (PS3) Dzahn: jenkins: add firewall rule for new jenkins to gearman on legacy host [puppet] - https://gerrit.wikimedia.org/r/1275537 (https://phabricator.wikimedia.org/T418521)
[16:33:06] <wikibugs>	 (CR) Dzahn: jenkins: add firewall rule for new jenkins to gearman on legacy host (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1275537 (https://phabricator.wikimedia.org/T418521) (owner: Dzahn)
[16:33:34] <wikibugs>	 (CR) CDanis: [C:+1] profile::services_proxy::envoy: Add non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1293771 (owner: Scott French)
[16:33:46] <wikibugs>	 (CR) CDanis: [C:+1] profile::services_proxy::envoy: Enable non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1293772 (owner: Scott French)
[16:33:54] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:35:41] <wikibugs>	 (CR) Scott French: [C:+2] profile::services_proxy::envoy: Add non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1293771 (owner: Scott French)
[16:35:45] <wikibugs>	 (CR) Dzahn: [C:+2] jenkins: add firewall rule for new jenkins to gearman on legacy host [puppet] - https://gerrit.wikimedia.org/r/1275537 (https://phabricator.wikimedia.org/T418521) (owner: Dzahn)
[16:35:47] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11960461 (Papaul) @cmooney yes we will change the VLAN-id and rename the VLAN for rack 0603 during the switch migration. so it will be...
[16:35:54] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:35:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Power Supply - PS Redundancy - issue on dbproxy1024:9290 - https://phabricator.wikimedia.org/T427408#11960466 (Jclark-ctr) Open→Resolved
[16:36:14] <wikibugs>	 (CR) Scott French: [C:+2] profile::services_proxy::envoy: Enable non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1293772 (owner: Scott French)
[16:36:37] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11960468 (Papaul)
[16:37:51] <logmsgbot>	 !log cwilliams@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2163: Repooling after testing patch
[16:38:09] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P93290 and previous config saved to /var/cache/conftool/dbconfig/20260527-163808-fceratto.json
[16:40:25] <wikibugs>	 (PS3) Dzahn: add new members of mw release working group to releasers-mediawiki [puppet] - https://gerrit.wikimedia.org/r/1293769 (https://phabricator.wikimedia.org/T427421) (owner: ArielGlenn)
[16:40:33] <icinga-wm>	 PROBLEM - Check unit status of ipip-multiqueue-optimizer on lvs1017 is CRITICAL: CRITICAL: Status of the systemd unit ipip-multiqueue-optimizer https://wikitech.wikimedia.org/wiki/LVS%23IPIP_encapsulation_experiments
[16:41:48] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to releasers-mediawiki for matmarex, ariel, jgiannelos - https://phabricator.wikimedia.org/T427421#11960511 (Dzahn) Since the group approver has already +1ed and all users are existing shell users there is nothing to be done besides mergin...
[16:41:59] <logmsgbot>	 !log brett@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: Setting up
[16:43:16] <logmsgbot>	 !log brett@cumin2002 cookbooks.sre.cdn.roll-reboot finished rebooting cp1112.eqiad.wmnet
[16:43:58] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to releasers-mediawiki for matmarex, ariel, jgiannelos - https://phabricator.wikimedia.org/T427421#11960525 (Dzahn) Open→In progress
[16:44:11] <wikibugs>	 (CR) Dzahn: [C:+2] add new members of mw release working group to releasers-mediawiki [puppet] - https://gerrit.wikimedia.org/r/1293769 (https://phabricator.wikimedia.org/T427421) (owner: ArielGlenn)
[16:45:25] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: db2212 failed to reboot - https://phabricator.wikimedia.org/T427388#11960531 (Jhancock.wm) it halted in the boot and i had to pull the power entirely to get it to reboot and make it past post. There still isn't anything new in the event logs. Can I update the firmware...
[16:46:09] <wikibugs>	 (CR) Dzahn: [C:-2] "syntax error" [puppet] - https://gerrit.wikimedia.org/r/1294347 (https://phabricator.wikimedia.org/T427353) (owner: Dzahn)
[16:46:12] <wikibugs>	 (CR) CWilliams: "@Ladsgroup@gmail.com I am out until next week, so if it can wait until then I can give it a go, else proceed if it is blocking you" [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[16:46:30] <wikibugs>	 (PS2) Dzahn: installserver: update partman for mixed VM/physical zuul machines [puppet] - https://gerrit.wikimedia.org/r/1294348 (https://phabricator.wikimedia.org/T427353)
[16:47:43] <wikibugs>	 (CR) Dzahn: [C:+2] "[releases1003:~] $ id ariel" [puppet] - https://gerrit.wikimedia.org/r/1293769 (https://phabricator.wikimedia.org/T427421) (owner: ArielGlenn)
[16:47:49] <wikibugs>	 (PS3) Federico Ceratto: sre.mysql.pool: Support depooling unreachable hosts [cookbooks] - https://gerrit.wikimedia.org/r/1294265 (https://phabricator.wikimedia.org/T427381)
[16:48:17] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T426633)', diff saved to https://phabricator.wikimedia.org/P93291 and previous config saved to /var/cache/conftool/dbconfig/20260527-164815-fceratto.json
[16:48:29] <wikibugs>	 (CR) Clément Goubert: [C:+1] ratelimit: replace rdb2009 with rdb2013 #1 [deployment-charts] - https://gerrit.wikimedia.org/r/1294274 (https://phabricator.wikimedia.org/T418924) (owner: Effie Mouzeli)
[16:48:38] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1264.eqiad.wmnet with reason: Maintenance
[16:48:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1264 (T426633)', diff saved to https://phabricator.wikimedia.org/P93292 and previous config saved to /var/cache/conftool/dbconfig/20260527-164846-fceratto.json
[16:49:06] <wikibugs>	 (PS16) FNegri: sre.mysql.multiinstance_reboot: new cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1290806 (https://phabricator.wikimedia.org/T420203)
[16:49:53] <wikibugs>	 (CR) FNegri: "I refactored this to be a separate cookbook, as we discussed in I123f2c5c8a9aa3f52c5a29ed4d600b80781e46dc." [cookbooks] - https://gerrit.wikimedia.org/r/1290806 (https://phabricator.wikimedia.org/T420203) (owner: FNegri)
[16:50:06] <wikibugs>	 (CR) CWilliams: "Fine with me" [cookbooks] - https://gerrit.wikimedia.org/r/1291993 (https://phabricator.wikimedia.org/T420203) (owner: Federico Ceratto)
[16:50:26] <wikibugs>	 (CR) Atsuko: [C:+2] Provision stream-internal.w.o [dns] - https://gerrit.wikimedia.org/r/1294326 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[16:50:49] <wikibugs>	 (PS3) Scott French: ProductionServices: Temporarily use shellbox in eqiad [mediawiki-config] - https://gerrit.wikimedia.org/r/1293775
[16:50:49] <wikibugs>	 (PS3) Scott French: ProductionServices: Temporarily use shellbox in codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/1293774
[16:50:49] <wikibugs>	 (PS3) Scott French: ProductionServices: Revert to discovery shellbox listeners [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776
[16:51:25] <logmsgbot>	 !log atsuko@dns1004 START - running authdns-update
[16:51:26] <wikibugs>	 (CR) CDanis: [C:+1] ProductionServices: Temporarily use shellbox in eqiad [mediawiki-config] - https://gerrit.wikimedia.org/r/1293775 (owner: Scott French)
[16:51:31] <wikibugs>	 (PS2) Arlolra: Deploy PRV to 6 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1293805 (https://phabricator.wikimedia.org/T427331)
[16:52:50] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by swfrench@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1293775 (owner: Scott French)
[16:53:24] <logmsgbot>	 !log atsuko@dns1004 END - running authdns-update
[16:53:49] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to releasers-mediawiki for matmarex, ariel, jgiannelos - https://phabricator.wikimedia.org/T427421#11960573 (Dzahn) Users have been created / added to the group on `releases1003/releases2003`.  The currently active server is `releases2003`...
[16:54:07] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to releasers-mediawiki for matmarex, ariel, jgiannelos - https://phabricator.wikimedia.org/T427421#11960575 (Dzahn) In progress→Resolved a:Dzahn
[16:56:00] <wikibugs>	 (Merged) jenkins-bot: ProductionServices: Temporarily use shellbox in eqiad [mediawiki-config] - https://gerrit.wikimedia.org/r/1293775 (owner: Scott French)
[16:56:25] <logmsgbot>	 !log swfrench@deploy1003 Started scap sync-world: Backport for [[gerrit:1293775|ProductionServices: Temporarily use shellbox in eqiad]]
[16:58:04] <wikibugs>	 (CR) Atsuko: [C:+2] trafficserver: enable stream-internal.w.o [puppet] - https://gerrit.wikimedia.org/r/1294327 (https://phabricator.wikimedia.org/T348763) (owner: Atsuko)
[16:58:30] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Backport for [[gerrit:1293775|ProductionServices: Temporarily use shellbox in eqiad]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:00:50] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Continuing with deployment
[17:02:02] <wikibugs>	 (CR) Dzahn: [C:+2] installserver: update partman for mixed VM/physical zuul machines [puppet] - https://gerrit.wikimedia.org/r/1294348 (https://phabricator.wikimedia.org/T427353) (owner: Dzahn)
[17:04:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:05:09] <logmsgbot>	 !log swfrench@deploy1003 Finished scap sync-world: Backport for [[gerrit:1293775|ProductionServices: Temporarily use shellbox in eqiad]] (duration: 08m 44s)
[17:06:23] <wikibugs>	 (PS2) BCornwall: lvs: Set lvs1017 interface overrides [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421)
[17:06:23] <wikibugs>	 (PS4) BCornwall: Remove lvs1016, promote lvs1017 [puppet] - https://gerrit.wikimedia.org/r/1286523 (https://phabricator.wikimedia.org/T421421)
[17:06:23] <wikibugs>	 (PS5) BCornwall: Remove lvs1016 hieradata, demote to insetup_noferm [puppet] - https://gerrit.wikimedia.org/r/1286524 (https://phabricator.wikimedia.org/T421421)
[17:09:25] <jinxer-wm>	 FIRING: [9x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:09:58] <wikibugs>	 (CR) Ssingh: lvs: Set lvs1017 interface overrides (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[17:10:20] <wikibugs>	 (PS3) BCornwall: lvs: Set lvs1017 interface overrides [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421)
[17:10:20] <wikibugs>	 (PS5) BCornwall: Remove lvs1016, promote lvs1017 [puppet] - https://gerrit.wikimedia.org/r/1286523 (https://phabricator.wikimedia.org/T421421)
[17:10:20] <wikibugs>	 (PS6) BCornwall: Remove lvs1016 hieradata, demote to insetup_noferm [puppet] - https://gerrit.wikimedia.org/r/1286524 (https://phabricator.wikimedia.org/T421421)
[17:10:34] <wikibugs>	 (CR) BCornwall: lvs: Set lvs1017 interface overrides (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[17:10:36] <wikibugs>	 (CR) Ssingh: [C:+1] lvs: Set lvs1017 interface overrides [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[17:11:03] <wikibugs>	 (PS4) BCornwall: lvs: Set lvs1017 interface overrides [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421)
[17:11:39] <wikibugs>	 (CR) CWilliams: sre.mysql.pool: Support depooling unreachable hosts (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1294265 (https://phabricator.wikimedia.org/T427381) (owner: Federico Ceratto)
[17:11:46] <wikibugs>	 (CR) BCornwall: [C:+2] lvs: Set lvs1017 interface overrides [puppet] - https://gerrit.wikimedia.org/r/1294346 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[17:12:51] <jinxer-wm>	 FIRING: SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://eventstreams-internal.svc.codfw.wmnet:4992 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[17:13:55] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply
[17:14:03] <icinga-wm>	 PROBLEM - Check if Pybal has been restarted after pybal.conf was changed on lvs1016 is CRITICAL: CRITICAL: Service pybal.service has not been restarted after /etc/pybal/pybal.conf was changed (gt 1h). https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[17:14:44] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[17:14:44] <wikibugs>	 (CR) CWilliams: sre.mysql.global-read-only Set all sections as RO/RW (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1277076 (https://phabricator.wikimedia.org/T419874) (owner: Federico Ceratto)
[17:14:45] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[17:15:39] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[17:15:40] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[17:16:09] <icinga-wm>	 RECOVERY - Check unit status of ipip-multiqueue-optimizer on lvs1017 is OK: OK: Status of the systemd unit ipip-multiqueue-optimizer https://wikitech.wikimedia.org/wiki/LVS%23IPIP_encapsulation_experiments
[17:16:12] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[17:16:13] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[17:16:19] <wikibugs>	 (PS2) Dzahn: site: add zuul[12]00[4-9] with insetup role [puppet] - https://gerrit.wikimedia.org/r/1294347 (https://phabricator.wikimedia.org/T427353)
[17:16:20] <wikibugs>	 (PS2) Jdlrobson: Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237)
[17:16:31] <wikibugs>	 (PS1) Jdlrobson: Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294360 (https://phabricator.wikimedia.org/T427237)
[17:16:48] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[17:16:50] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[17:17:21] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[17:17:22] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[17:18:08] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[17:18:55] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:19:25] <jinxer-wm>	 FIRING: [9x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:21:10] <wikibugs>	 (PS21) Majavah: firewall: Declare resources for both providers [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089)
[17:21:10] <wikibugs>	 (PS21) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[17:21:10] <wikibugs>	 (PS1) Majavah: P:openstack: encapi: Fix type of firewall source port [puppet] - https://gerrit.wikimedia.org/r/1294361
[17:21:11] <wikibugs>	 (PS1) Majavah: firewall: client: Add missing src_ips parameter [puppet] - https://gerrit.wikimedia.org/r/1294362
[17:22:59] <wikibugs>	 (CR) CI reject: [V:-1] firewall: client: Add missing src_ips parameter [puppet] - https://gerrit.wikimedia.org/r/1294362 (owner: Majavah)
[17:23:29] <icinga-wm>	 PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy1003 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[17:23:45] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1017 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[17:23:54] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:23:55] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1017 is OK: OK: 12 connections established with conf1007.eqiad.wmnet:4001 (min=12) https://wikitech.wikimedia.org/wiki/PyBal
[17:24:17] <wikibugs>	 (PS4) Scott French: ProductionServices: Temporarily use shellbox in codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/1293774
[17:24:25] <jinxer-wm>	 FIRING: [9x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:24:49] <wikibugs>	 (CR) Majavah: [C:+2] P:openstack: encapi: Fix type of firewall source port [puppet] - https://gerrit.wikimedia.org/r/1294361 (owner: Majavah)
[17:25:04] <wikibugs>	 (CR) Majavah: [C:+2] firewall: Declare resources for both providers [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[17:25:14] <logmsgbot>	 !log brett@cumin2002 cookbooks.sre.cdn.roll-reboot finished rebooting cp1114.eqiad.wmnet
[17:25:44] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by swfrench@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1293774 (owner: Scott French)
[17:27:33] <wikibugs>	 (Merged) jenkins-bot: ProductionServices: Temporarily use shellbox in codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/1293774 (owner: Scott French)
[17:28:00] <logmsgbot>	 !log swfrench@deploy1003 Started scap sync-world: Backport for [[gerrit:1293774|ProductionServices: Temporarily use shellbox in codfw]]
[17:28:02] <wikibugs>	 (CR) Dzahn: [C:+2] site: add zuul[12]00[4-9] with insetup role [puppet] - https://gerrit.wikimedia.org/r/1294347 (https://phabricator.wikimedia.org/T427353) (owner: Dzahn)
[17:30:20] <wikibugs>	 (CR) Majavah: [C:+1] "that one has now been merged, so this is obsolete" [puppet] - https://gerrit.wikimedia.org/r/1289378 (https://phabricator.wikimedia.org/T411089) (owner: JHathaway)
[17:30:25] <wikibugs>	 (Abandoned) Majavah: Rename role::mariadb::ferm to role::mariadb::firewall [puppet] - https://gerrit.wikimedia.org/r/1289378 (https://phabricator.wikimedia.org/T411089) (owner: JHathaway)
[17:31:07] <icinga-wm>	 PROBLEM - Check if Pybal has been restarted after pybal.conf was changed on lvs1020 is CRITICAL: CRITICAL: Service pybal.service has not been restarted after /etc/pybal/pybal.conf was changed (gt 1h). https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[17:31:42] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Backport for [[gerrit:1293774|ProductionServices: Temporarily use shellbox in codfw]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:33:29] <icinga-wm>	 PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2002 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[17:34:25] <jinxer-wm>	 FIRING: [9x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:37:15] <wikibugs>	 (PS22) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[17:37:15] <wikibugs>	 (PS2) Majavah: firewall: client: Remove reference to nonexistent param [puppet] - https://gerrit.wikimedia.org/r/1294362
[17:38:48] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Continuing with deployment
[17:38:55] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8587/co" [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[17:39:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: push_cross_cluster_settings_9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:40:13] <wikibugs>	 (CR) Majavah: [V:+1] "This one is finally ready, I think." [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[17:40:19] <wikibugs>	 (CR) Dzahn: trafficserver: add a map for gitlab as a backend (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1290731 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[17:40:42] <cdanis>	 jouncebot: nowandnext
[17:40:42] <jouncebot>	 For the next 0 hour(s) and 19 minute(s): MediaWiki infrastructure (UTC late) (extended edition) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1630)
[17:40:42] <jouncebot>	 In 2 hour(s) and 19 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T2000)
[17:41:43] <wikibugs>	 (CR) Dzahn: "looks like this needs to wait for the port discussion to conclude" [puppet] - https://gerrit.wikimedia.org/r/1290684 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[17:42:51] <jinxer-wm>	 FIRING: [2x] SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[17:43:01] <logmsgbot>	 !log swfrench@deploy1003 Finished scap sync-world: Backport for [[gerrit:1293774|ProductionServices: Temporarily use shellbox in codfw]] (duration: 15m 01s)
[17:43:55] <wikibugs>	 (PS4) Scott French: ProductionServices: Revert to discovery shellbox listeners [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776
[17:47:54] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1278528 (https://phabricator.wikimedia.org/T422646) (owner: Andrew Bogott)
[17:49:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1264 (T426633)', diff saved to https://phabricator.wikimedia.org/P93293 and previous config saved to /var/cache/conftool/dbconfig/20260527-174900-fceratto.json
[17:51:28] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox: apply
[17:52:38] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
[17:52:39] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
[17:53:19] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
[17:53:21] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[17:53:45] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[17:53:46] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
[17:54:09] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[17:54:10] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[17:54:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: push_cross_cluster_settings_9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:54:39] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[17:54:40] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
[17:55:31] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
[17:56:09] <wikibugs>	 (CR) CDanis: [C:+1] ProductionServices: Revert to discovery shellbox listeners [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776 (owner: Scott French)
[17:57:20] <wikibugs>	 (CR) CDanis: [C:+2] ProductionServices: Revert to discovery shellbox listeners [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776 (owner: Scott French)
[17:57:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by swfrench@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776 (owner: Scott French)
[17:58:12] <wikibugs>	 (Merged) jenkins-bot: ProductionServices: Revert to discovery shellbox listeners [mediawiki-config] - https://gerrit.wikimedia.org/r/1293776 (owner: Scott French)
[17:58:39] <logmsgbot>	 !log swfrench@deploy1003 Started scap sync-world: Backport for [[gerrit:1293776|ProductionServices: Revert to discovery shellbox listeners]]
[17:59:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1264', diff saved to https://phabricator.wikimedia.org/P93294 and previous config saved to /var/cache/conftool/dbconfig/20260527-175908-fceratto.json
[18:00:36] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply
[18:00:40] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Backport for [[gerrit:1293776|ProductionServices: Revert to discovery shellbox listeners]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:00:58] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply
[18:01:00] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
[18:01:12] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
[18:01:14] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[18:01:26] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[18:01:27] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
[18:01:40] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[18:01:42] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
[18:02:00] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
[18:02:02] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-video: apply
[18:02:10] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
[18:03:31] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Continuing with deployment
[18:05:58] <wikibugs>	 (PS1) Eric Gardner: Carousel only on articles [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294370 (https://phabricator.wikimedia.org/T427336)
[18:07:16] <wikibugs>	 (PS1) Scott French: profile::services_proxy::envoy: Disable non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1294371
[18:07:26] <wikibugs>	 (CR) JHathaway: [C:+1] firewall: client: Remove reference to nonexistent param [puppet] - https://gerrit.wikimedia.org/r/1294362 (owner: Majavah)
[18:07:44] <logmsgbot>	 !log brett@cumin2002 cookbooks.sre.cdn.roll-reboot finished rebooting cp5024.eqsin.wmnet
[18:08:26] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.remove-downtime for lvs1017.eqiad.wmnet
[18:08:27] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1017.eqiad.wmnet
[18:09:03] <logmsgbot>	 !log swfrench@deploy1003 Finished scap sync-world: Backport for [[gerrit:1293776|ProductionServices: Revert to discovery shellbox listeners]] (duration: 10m 24s)
[18:09:07] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[18:09:16] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1264', diff saved to https://phabricator.wikimedia.org/P93295 and previous config saved to /var/cache/conftool/dbconfig/20260527-180915-fceratto.json
[18:10:05] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[18:10:22] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/mw-experimental: apply
[18:11:16] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
[18:11:50] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[18:12:28] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[18:12:39] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[18:13:14] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[18:13:23] <wikibugs>	 (CR) Scott French: [C:+2] profile::services_proxy::envoy: Disable non-discovery shellbox listeners [puppet] - https://gerrit.wikimedia.org/r/1294371 (owner: Scott French)
[18:16:46] <wikibugs>	 (CR) BCornwall: [C:+2] Remove lvs1016, promote lvs1017 [puppet] - https://gerrit.wikimedia.org/r/1286523 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[18:16:56] <wikibugs>	 (PS6) BCornwall: Remove lvs1016, promote lvs1017 [puppet] - https://gerrit.wikimedia.org/r/1286523 (https://phabricator.wikimedia.org/T421421)
[18:17:08] <wikibugs>	 (PS7) BCornwall: Remove lvs1016 hieradata, demote to insetup_noferm [puppet] - https://gerrit.wikimedia.org/r/1286524 (https://phabricator.wikimedia.org/T421421)
[18:18:04] <wikibugs>	 (PS3) Majavah: firewall: client: Remove reference to nonexistent param [puppet] - https://gerrit.wikimedia.org/r/1294362
[18:18:52] <wikibugs>	 (PS1) Ebernhardson: identity: Prune private ips from x-forwarded-for [extensions/CirrusSearch] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294373 (https://phabricator.wikimedia.org/T407432)
[18:19:14] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [extensions/CirrusSearch] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294373 (https://phabricator.wikimedia.org/T407432) (owner: Ebernhardson)
[18:19:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1264 (T426633)', diff saved to https://phabricator.wikimedia.org/P93296 and previous config saved to /var/cache/conftool/dbconfig/20260527-181923-fceratto.json
[18:19:33] <wikibugs>	 (CR) Majavah: [C:+2] firewall: client: Remove reference to nonexistent param [puppet] - https://gerrit.wikimedia.org/r/1294362 (owner: Majavah)
[18:19:41] <wikibugs>	 (CR) BCornwall: [C:+2] Remove lvs1016, promote lvs1017 [puppet] - https://gerrit.wikimedia.org/r/1286523 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[18:20:39] <swfrench-wmf>	 jouncebot: nowandnext
[18:20:39] <jouncebot>	 For the next 0 hour(s) and 9 minute(s): MediaWiki infrastructure (UTC late) (extended edition) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T1630)
[18:20:39] <jouncebot>	 In 1 hour(s) and 39 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T2000)
[18:20:41] <wikibugs>	 (PS1) Ebernhardson: Revert^2 "cirrus: AB test query suggester variants" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294374
[18:21:09] <wikibugs>	 (PS2) Ebernhardson: Revert^2 "cirrus: AB test query suggester variants" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294374 (https://phabricator.wikimedia.org/T407432)
[18:21:20] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294374 (https://phabricator.wikimedia.org/T407432) (owner: Ebernhardson)
[18:23:10] <logmsgbot>	 !log swfrench@deploy1003 Started scap sync-world: Helmfile-only deployment to clean up unused mesh listeners
[18:24:14] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Helmfile-only deployment to clean up unused mesh listeners synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:24:56] <wikibugs>	 (PS1) Catrope: auth: Mark the hidden token field used for reauth as skippable [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294375 (https://phabricator.wikimedia.org/T427398)
[18:25:07] <wikibugs>	 (PS1) Catrope: Fix lastAuthTimestamp hack [extensions/CentralAuth] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294376 (https://phabricator.wikimedia.org/T427398)
[18:25:20] <logmsgbot>	 !log swfrench@deploy1003 swfrench: Continuing with deployment
[18:26:08] <RoanKattouw>	 swfrench-wmf: Once you're done I would like to deploy fixes for the current train blocker, could you ping me when I'm good to go?
[18:26:36] <swfrench-wmf>	 RoanKattouw: will do! should be done in ~ 3-4m
[18:27:09] <icinga-wm>	 PROBLEM - pybal on lvs1016 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[18:27:11] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 (Connection refused) https://wikitech.wikimedia.org/wiki/PyBal
[18:28:08] <brett>	 ^Expected
[18:29:05] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1016 is CRITICAL: CRITICAL: 0 connections established with conf1007.eqiad.wmnet:4001 (min=12) https://wikitech.wikimedia.org/wiki/PyBal
[18:29:22] <logmsgbot>	 !log swfrench@deploy1003 Finished scap sync-world: Helmfile-only deployment to clean up unused mesh listeners (duration: 06m 12s)
[18:30:01] <swfrench-wmf>	 RoanKattouw: alright, I think the dust has settled. all yours!
[18:30:09] <icinga-wm>	 RECOVERY - pybal on lvs1016 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[18:30:11] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[18:30:18] <wikibugs>	 (PS2) Dr0ptp4kt: Reactivate wikimedia.de email addresses for GrowthBook SSO [deployment-charts] - https://gerrit.wikimedia.org/r/1294372 (https://phabricator.wikimedia.org/T418665)
[18:31:13] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by catrope@deploy1003 using scap backport" [extensions/CentralAuth] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294376 (https://phabricator.wikimedia.org/T427398) (owner: Catrope)
[18:31:14] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by catrope@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294375 (https://phabricator.wikimedia.org/T427398) (owner: Catrope)
[18:31:29] <wikibugs>	 (CR) Dr0ptp4kt: "I believe we may need this in order for the wikimedia.de account holders to SSO login, coupled with their other accoutrements in IDM and G" [deployment-charts] - https://gerrit.wikimedia.org/r/1294372 (https://phabricator.wikimedia.org/T418665) (owner: Dr0ptp4kt)
[18:32:49] <fabfur>	 !incidents
[18:32:49] <sirenbot>	 8024 (ACKED)  db2189 (paged)/MariaDB Replica SQL: s2 (paged)
[18:32:49] <sirenbot>	 8025 (ACKED)  db2189 (paged)/MariaDB Replica IO: s2 (paged)
[18:32:49] <sirenbot>	 8026 (ACKED)  db2189 (paged)/MariaDB Replica Lag: s2 (paged)
[18:32:50] <sirenbot>	 8023 (RESOLVED)  Host db2189 (paged)
[18:34:03] <icinga-wm>	 RECOVERY - Check if Pybal has been restarted after pybal.conf was changed on lvs1016 is OK: OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[18:34:05] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1016 is OK: OK: 12 connections established with conf1007.eqiad.wmnet:4001 (min=12) https://wikitech.wikimedia.org/wiki/PyBal
[18:35:08] <logmsgbot>	 !log joal@deploy1003 Started deploy [analytics/refinery@96cf761] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@96cf761f]
[18:37:13] <logmsgbot>	 !log joal@deploy1003 Finished deploy [analytics/refinery@96cf761] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@96cf761f] (duration: 02m 04s)
[18:39:00] <wikibugs>	 (Merged) jenkins-bot: Fix lastAuthTimestamp hack [extensions/CentralAuth] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294376 (https://phabricator.wikimedia.org/T427398) (owner: Catrope)
[18:39:08] <logmsgbot>	 !log joal@deploy1003 Started deploy [analytics/refinery@96cf761]: Regular analytics weekly train [analytics/refinery@96cf761f]
[18:40:14] <logmsgbot>	 !log joal@deploy1003 Finished deploy [analytics/refinery@96cf761]: Regular analytics weekly train [analytics/refinery@96cf761f] (duration: 01m 05s)
[18:42:55] <jinxer-wm>	 FIRING: [2x] PyBalBGPUnstable: PyBal BGP sessions on instance lvs1017 with peer 208.80.154.196 are failing #page - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://grafana.wikimedia.org/d/000000488/pybal-bgp?var-datasource=eqiad%20prometheus/ops&var-server=lvs1017 - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[18:43:07] <sukhe>	 ah wow, the page for this
[18:43:08] <sukhe>	 !ack
[18:43:09] <sirenbot>	 8028 (ACKED)  [2x] PyBalBGPUnstable lvs sre (lvs1017:9090 pybal 64600 eqiad)
[18:43:15] <sukhe>	 it's been ages
[18:43:27] <sukhe>	 expected, brett ^
[18:43:38] <brett>	 thanks
[18:44:03] <icinga-wm>	 PROBLEM - SSH on stat1008 is CRITICAL: Server answer: Exceeded MaxStartups https://wikitech.wikimedia.org/wiki/SSH/monitoring
[18:45:26] <wikibugs>	 (Merged) jenkins-bot: auth: Mark the hidden token field used for reauth as skippable [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294375 (https://phabricator.wikimedia.org/T427398) (owner: Catrope)
[18:45:54] <logmsgbot>	 !log catrope@deploy1003 Started scap sync-world: Backport for [[gerrit:1294376|Fix lastAuthTimestamp hack (T427398)]], [[gerrit:1294375|auth: Mark the hidden token field used for reauth as skippable (T427398)]]
[18:45:59] <stashbot>	 T427398: Unable to edit pages on Mediawiki namespace on 1.47.0-wmf.4, redirects to Verify your Identity page - https://phabricator.wikimedia.org/T427398
[18:47:03] <icinga-wm>	 RECOVERY - SSH on stat1008 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[18:47:45] <logmsgbot>	 !log catrope@deploy1003 catrope: Backport for [[gerrit:1294376|Fix lastAuthTimestamp hack (T427398)]], [[gerrit:1294375|auth: Mark the hidden token field used for reauth as skippable (T427398)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:49:30] <logmsgbot>	 !log catrope@deploy1003 catrope: Continuing with deployment
[18:49:54] <logmsgbot>	 !log brett@cumin2002 cookbooks.sre.cdn.roll-reboot finished rebooting cp5031.eqsin.wmnet
[18:51:56] <moritzm>	 lvs1017 is for some planned maintenance and will recover, or do I need to look into anything?
[18:52:19] <sukhe>	 moritzm: thanks but brett is on it and there should be no user-impact
[18:52:27] <moritzm>	 ok!
[18:53:36] <logmsgbot>	 !log catrope@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294376|Fix lastAuthTimestamp hack (T427398)]], [[gerrit:1294375|auth: Mark the hidden token field used for reauth as skippable (T427398)]] (duration: 07m 41s)
[18:53:40] <logmsgbot>	 !log joal@deploy1003 Started deploy [analytics/refinery@96cf761]: Regular analytics weekly train [analytics/refinery@96cf761f]
[18:53:41] <stashbot>	 T427398: Unable to edit pages on Mediawiki namespace on 1.47.0-wmf.4, redirects to Verify your Identity page - https://phabricator.wikimedia.org/T427398
[18:56:46] <wikibugs>	 (CR) Dzahn: [C:+1] gitlab: use service name for upstream addr [puppet] - https://gerrit.wikimedia.org/r/1294219 (https://phabricator.wikimedia.org/T425441) (owner: Arnaudb)
[18:58:41] <logmsgbot>	 !log joal@deploy1003 Finished deploy [analytics/refinery@96cf761]: Regular analytics weekly train [analytics/refinery@96cf761f] (duration: 05m 01s)
[18:59:19] <logmsgbot>	 !log joal@deploy1003 Started deploy [analytics/refinery@96cf761] (thin): Regular analytics weekly train THIN [analytics/refinery@96cf761f]
[19:01:28] <logmsgbot>	 !log joal@deploy1003 Finished deploy [analytics/refinery@96cf761] (thin): Regular analytics weekly train THIN [analytics/refinery@96cf761f] (duration: 02m 08s)
[19:04:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: push_cross_cluster_settings_9400.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:06:38] <wikibugs>	 (CR) C. Scott Ananian: [C:+1] Deploy PRV to 6 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1293805 (https://phabricator.wikimedia.org/T427331) (owner: Arlolra)
[19:08:02] <wikibugs>	 (PS1) Cathal Mooney: lvs1017: change configured set of BGP peers to top-of-rack siwtch [puppet] - https://gerrit.wikimedia.org/r/1294385 (https://phabricator.wikimedia.org/T421421)
[19:08:10] <wikibugs>	 (CR) Ladsgroup: "It's blocking community and a high ranking committee in the movement. So I push it forward." [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[19:08:39] <wikibugs>	 (CR) CI reject: [V:-1] lvs1017: change configured set of BGP peers to top-of-rack siwtch [puppet] - https://gerrit.wikimedia.org/r/1294385 (https://phabricator.wikimedia.org/T421421) (owner: Cathal Mooney)
[19:10:25] <wikibugs>	 (PS2) Cathal Mooney: lvs1017: change configured set of BGP peers to top-of-rack siwtch [puppet] - https://gerrit.wikimedia.org/r/1294385 (https://phabricator.wikimedia.org/T421421)
[19:11:41] <wikibugs>	 (PS1) Eevans: linked-artifacts: deploy hoarde v1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1294387 (https://phabricator.wikimedia.org/T414112)
[19:12:41] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1282764 (owner: Ayounsi)
[19:13:02] <wikibugs>	 (CR) BCornwall: [C:+1] lvs1017: change configured set of BGP peers to top-of-rack siwtch [puppet] - https://gerrit.wikimedia.org/r/1294385 (https://phabricator.wikimedia.org/T421421) (owner: Cathal Mooney)
[19:13:09] <wikibugs>	 (PS1) Dzahn: ci::firewall: srange and drange need to be arrays [puppet] - https://gerrit.wikimedia.org/r/1294388 (https://phabricator.wikimedia.org/T418521)
[19:15:35] <wikibugs>	 (CR) Eevans: [C:+2] linked-artifacts: deploy hoarde v1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1294387 (https://phabricator.wikimedia.org/T414112) (owner: Eevans)
[19:17:48] <wikibugs>	 (Merged) jenkins-bot: linked-artifacts: deploy hoarde v1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/1294387 (https://phabricator.wikimedia.org/T414112) (owner: Eevans)
[19:18:24] <wikibugs>	 (CR) Dzahn: [C:+2] ci::firewall: srange and drange need to be arrays [puppet] - https://gerrit.wikimedia.org/r/1294388 (https://phabricator.wikimedia.org/T418521) (owner: Dzahn)
[19:19:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: opensearch_2@relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:19:31] <wikibugs>	 (CR) BCornwall: [V:+1 C:+1] "PCC SUCCESS (CORE_DIFF 1 NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1294385 (https://phabricator.wikimedia.org/T421421) (owner: Cathal Mooney)
[19:20:16] <logmsgbot>	 !log eevans@deploy1003 helmfile [staging] START helmfile.d/services/linked-artifacts: apply
[19:20:17] <wikibugs>	 (PS1) DDesouza: miscweb: bump (design|research)-landing-page [deployment-charts] - https://gerrit.wikimedia.org/r/1294389 (https://phabricator.wikimedia.org/T344471)
[19:20:33] <logmsgbot>	 !log eevans@deploy1003 helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
[19:20:58] <wikibugs>	 (CR) Thcipriani: [C:+1] "One note here for @kharlan@wikimedia.org, you may want to clear out the main branch of any extraneous files since this will be checked out" [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[19:21:10] <wikibugs>	 (CR) BCornwall: [V:+1 C:+2] lvs1017: change configured set of BGP peers to top-of-rack siwtch [puppet] - https://gerrit.wikimedia.org/r/1294385 (https://phabricator.wikimedia.org/T421421) (owner: Cathal Mooney)
[19:24:02] <wikibugs>	 (CR) Ssingh: "Let's plan to merge tomorrow, 14:00 UTC." [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[19:24:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: opensearch_2@relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:27:55] <jinxer-wm>	 RESOLVED: [2x] PyBalBGPUnstable: PyBal BGP sessions on instance lvs1017 with peer 208.80.154.196 are failing #page - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://grafana.wikimedia.org/d/000000488/pybal-bgp?var-datasource=eqiad%20prometheus/ops&var-server=lvs1017 - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[19:27:58] <sukhe>	 nice
[19:28:01] <brett>	 hooray
[19:28:10] <wikibugs>	 (CR) Thcipriani: [C:-1] scap.cfg.erb: Add hcaptcha checkout in production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[19:29:34] <wikibugs>	 (PS1) Dzahn: CI: better naming; avoid using terms "new" and "legacy" [puppet] - https://gerrit.wikimedia.org/r/1294392
[19:30:45] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1288370 (https://phabricator.wikimedia.org/T423766) (owner: Pppery)
[19:31:07] <icinga-wm>	 PROBLEM - Check if Pybal has been restarted after pybal.conf was changed on lvs1020 is CRITICAL: CRITICAL: Service pybal.service has not been restarted after /etc/pybal/pybal.conf was changed (gt 1h). https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[19:31:26] <wikibugs>	 (CR) DDesouza: [C:+2] miscweb: bump (design|research)-landing-page [deployment-charts] - https://gerrit.wikimedia.org/r/1294389 (https://phabricator.wikimedia.org/T344471) (owner: DDesouza)
[19:31:39] <wikibugs>	 (PS1) Catrope: Permissions: Create wmf-officeit group on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294393
[19:32:08] <logmsgbot>	 !log brett@cumin2002 cookbooks.sre.cdn.roll-reboot finished rebooting cp5032.eqsin.wmnet
[19:32:08] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp6016.drmrs.wmnet,cp[1112,1114].eqiad.wmnet,cp[5024,5031-5032].eqsin.wmnet} and A:cp
[19:32:37] <wikibugs>	 (PS3) Cathal Mooney: ulsfo LVS: peer with the ToR switch [puppet] - https://gerrit.wikimedia.org/r/1282731 (https://phabricator.wikimedia.org/T408892) (owner: Ayounsi)
[19:32:38] <wikibugs>	 (PS3) Cathal Mooney: LVS BGP: peer with the gateway if no exception is set [puppet] - https://gerrit.wikimedia.org/r/1282764 (owner: Ayounsi)
[19:33:57] <wikibugs>	 (PS2) Dzahn: CI: better naming; avoid using terms "new" and "legacy" [puppet] - https://gerrit.wikimedia.org/r/1294392
[19:34:08] <wikibugs>	 (Merged) jenkins-bot: miscweb: bump (design|research)-landing-page [deployment-charts] - https://gerrit.wikimedia.org/r/1294389 (https://phabricator.wikimedia.org/T344471) (owner: DDesouza)
[19:34:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: opensearch_2@relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:35:52] <wikibugs>	 (PS4) Cathal Mooney: LVS BGP: peer with the gateway if no exception is set [puppet] - https://gerrit.wikimedia.org/r/1282764 (owner: Ayounsi)
[19:36:25] <wikibugs>	 (CR) Thcipriani: [C:+1] scap.cfg.erb: Add hcaptcha checkout in production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[19:36:27] <wikibugs>	 (PS2) Ahmon Dancy: scap.cfg.erb: Add hcaptcha checkout in production [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829)
[19:37:00] <wikibugs>	 (CR) Cathal Mooney: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282764 (owner: Ayounsi)
[19:37:05] <wikibugs>	 (CR) Ahmon Dancy: scap.cfg.erb: Add hcaptcha checkout in production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[19:37:09] <icinga-wm>	 PROBLEM - pybal on lvs1016 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[19:37:11] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 (Connection refused) https://wikitech.wikimedia.org/wiki/PyBal
[19:37:19] <wikibugs>	 (CR) Thcipriani: [C:+1] scap.cfg.erb: Add hcaptcha checkout in production [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[19:39:25] <jinxer-wm>	 RESOLVED: [5x] SystemdUnitFailed: opensearch_2@relforge-eqiad-small-alpha.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:41:01] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, May 27 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-ite" [mediawiki-config] - https://gerrit.wikimedia.org/r/1293819 (https://phabricator.wikimedia.org/T426614) (owner: Bartosz Dziewoński)
[19:42:05] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1016 is CRITICAL: CRITICAL: 0 connections established with conf1007.eqiad.wmnet:4001 (min=12) https://wikitech.wikimedia.org/wiki/PyBal
[19:42:22] <wikibugs>	 (CR) BCornwall: [C:+2] Remove lvs1016 hieradata, demote to insetup_noferm [puppet] - https://gerrit.wikimedia.org/r/1286524 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[19:43:06] <wikibugs>	 (PS5) Cathal Mooney: LVS BGP: peer with the gateway if no exception is set [puppet] - https://gerrit.wikimedia.org/r/1282764 (owner: Ayounsi)
[19:43:15] <wikibugs>	 (CR) Cathal Mooney: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282764 (owner: Ayounsi)
[19:45:46] <logmsgbot>	 !log dani@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[19:45:59] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 86062048 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:45:59] <logmsgbot>	 !log dani@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[19:46:00] <logmsgbot>	 !log dani@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[19:46:03] <icinga-wm>	 PROBLEM - Check if Pybal has been restarted after pybal.conf was changed on lvs1016 is CRITICAL: CRITICAL: Service pybal.service is not active. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[19:46:12] <logmsgbot>	 !log dani@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[19:46:13] <logmsgbot>	 !log dani@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[19:46:28] <logmsgbot>	 !log dani@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[19:46:48] <wikibugs>	 (CR) BCornwall: [V:+1 C:+2] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8598/co" [puppet] - https://gerrit.wikimedia.org/r/1286524 (https://phabricator.wikimedia.org/T421421) (owner: BCornwall)
[19:46:59] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 2736424 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:48:06] <logmsgbot>	 !log dani@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[19:48:19] <logmsgbot>	 !log dani@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[19:48:20] <logmsgbot>	 !log dani@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[19:48:36] <logmsgbot>	 !log dani@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[19:48:37] <logmsgbot>	 !log dani@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[19:48:56] <logmsgbot>	 !log dani@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[19:51:24] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
[19:57:39] <wikibugs>	 (PS8) Jdlrobson: Remove MinervaNightMode config after skin cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/1285523 (https://phabricator.wikimedia.org/T426689) (owner: HakanIST)
[20:00:05] <jouncebot>	 RoanKattouw, urbanecm, TheresNoTime, kindrobot, and cjming: OwO what's this, a deployment window?? UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T2000). nyaa~
[20:00:05] <jouncebot>	 stephanebisson, ebernhardson, Pppery, and MatmaRex: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:08] <Pppery>	 here
[20:00:09] <ebernhardson>	 \o
[20:00:12] <stephanebisson>	 o/
[20:00:17] <MatmaRex>	 hi
[20:00:48] <stephanebisson>	 I'm starting...
[20:00:57] <MatmaRex>	 i'm not a deployer, i'd appreciate if someone could ship my change. it's not risky, it can go out together with whatever else.
[20:02:02] <Jdlrobson>	 I can do it MatmaRex 
[20:02:21] <Jdlrobson>	 hmm looks like my deploy disappeared?
[20:02:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [extensions/ArticleGuidance] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294342 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[20:02:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [extensions/ArticleGuidance] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294343 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[20:02:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294344 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[20:04:28] <Jdlrobson>	 k ill do mine later since the deploy window looks very busy
[20:04:34] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.network.peering with action 'configure' for AS: 12355
[20:05:49] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12355
[20:06:38] <ebernhardson>	 i wonder sometimes if we need a second deploy window that works for west coast? I dunno if later (4pm?) would be reasonable
[20:08:33] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, May 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294393 (owner: Catrope)
[20:14:09] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
[20:16:01] <wikibugs>	 (Merged) jenkins-bot: Allow disabling experiment for experienced editors (>=100 edits) [extensions/ArticleGuidance] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294342 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[20:16:08] <wikibugs>	 (Merged) jenkins-bot: frwiki: restrict Article Guidance experiment to junior editors [mediawiki-config] - https://gerrit.wikimedia.org/r/1294344 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[20:16:55] <wikibugs>	 (Merged) jenkins-bot: Allow disabling experiment for experienced editors (>=100 edits) [extensions/ArticleGuidance] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294343 (https://phabricator.wikimedia.org/T426871) (owner: Sbisson)
[20:17:22] <logmsgbot>	 !log sbisson@deploy1003 Started scap sync-world: Backport for [[gerrit:1294342|Allow disabling experiment for experienced editors (>=100 edits) (T426871)]], [[gerrit:1294343|Allow disabling experiment for experienced editors (>=100 edits) (T426871)]], [[gerrit:1294344|frwiki: restrict Article Guidance experiment to junior editors (T426871)]]
[20:17:27] <stashbot>	 T426871: Enable AG experiment on phase 2 wikis - https://phabricator.wikimedia.org/T426871
[20:19:14] <logmsgbot>	 !log sbisson@deploy1003 sbisson: Backport for [[gerrit:1294342|Allow disabling experiment for experienced editors (>=100 edits) (T426871)]], [[gerrit:1294343|Allow disabling experiment for experienced editors (>=100 edits) (T426871)]], [[gerrit:1294344|frwiki: restrict Article Guidance experiment to junior editors (T426871)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:20:39] <icinga-wm>	 RECOVERY - Check if Pybal has been restarted after pybal.conf was changed on lvs1020 is OK: OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed. https://wikitech.wikimedia.org/wiki/PyBal%23Pybal_service_has_not_been_restarted
[20:20:54] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
[20:21:19] <logmsgbot>	 !log sbisson@deploy1003 sbisson: Continuing with deployment
[20:21:43] <logmsgbot>	 !log brett@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
[20:22:36] <wikibugs>	 SRE, Traffic, Patch-For-Review: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11961220 (BCornwall)
[20:25:33] <logmsgbot>	 !log sbisson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294342|Allow disabling experiment for experienced editors (>=100 edits) (T426871)]], [[gerrit:1294343|Allow disabling experiment for experienced editors (>=100 edits) (T426871)]], [[gerrit:1294344|frwiki: restrict Article Guidance experiment to junior editors (T426871)]] (duration: 08m 11s)
[20:25:39] <stashbot>	 T426871: Enable AG experiment on phase 2 wikis - https://phabricator.wikimedia.org/T426871
[20:25:39] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.decommission for hosts lvs1016.eqiad.wmnet
[20:25:56] <wikibugs>	 (PS1) Bking: OpenSearch: Add required config for bootstrapping a cluster [puppet] - https://gerrit.wikimedia.org/r/1294402 (https://phabricator.wikimedia.org/T427306)
[20:26:16] <stephanebisson>	 ok, I'm done. Over to you ebernhardson
[20:26:18] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1294402 (https://phabricator.wikimedia.org/T427306) (owner: Bking)
[20:27:13] <swfrench-wmf>	 !log reprepro include php8.3_8.3.31-1+wmf12u2 into component/php83 for bookworm-wikimedia - T427312
[20:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:18] <stashbot>	 T427312: Build PHP 8.3 packages for bookworm - https://phabricator.wikimedia.org/T427312
[20:29:04] <ebernhardson>	 stashbot: thanks!
[20:29:04] <stashbot>	 See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[20:29:36] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ebernhardson@deploy1003 using scap backport" [extensions/CirrusSearch] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294373 (https://phabricator.wikimedia.org/T407432) (owner: Ebernhardson)
[20:29:37] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ebernhardson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294374 (https://phabricator.wikimedia.org/T407432) (owner: Ebernhardson)
[20:30:38] <wikibugs>	 (Merged) jenkins-bot: Revert^2 "cirrus: AB test query suggester variants" [mediawiki-config] - https://gerrit.wikimedia.org/r/1294374 (https://phabricator.wikimedia.org/T407432) (owner: Ebernhardson)
[20:31:09] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.dns.netbox
[20:37:11] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
[20:38:20] <swfrench-wmf>	 !log reprepro include php-defaults_94+wmf12u1 into component/php83 for bookworm-wikimedia - T427312
[20:38:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:26] <stashbot>	 T427312: Build PHP 8.3 packages for bookworm - https://phabricator.wikimedia.org/T427312
[20:39:06] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
[20:39:07] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:39:08] <logmsgbot>	 !log brett@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs1016.eqiad.wmnet
[20:40:17] <wikibugs>	 ops-eqiad, DC-Ops, decommission-hardware: decommission lvs1016.eqiad.wmnet - https://phabricator.wikimedia.org/T427451#11961277 (BCornwall)
[20:40:34] <wikibugs>	 ops-eqiad, DC-Ops, decommission-hardware: decommission lvs1016.eqiad.wmnet - https://phabricator.wikimedia.org/T427451#11961281 (BCornwall)
[20:40:37] <wikibugs>	 SRE, Traffic: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11961282 (BCornwall)
[20:41:37] <wikibugs>	 SRE, Traffic: Revert lvs1017 Mellanox NIC to Broadcom - https://phabricator.wikimedia.org/T421421#11961284 (BCornwall) In progress→Resolved
[20:43:44] <wikibugs>	 (Merged) jenkins-bot: identity: Prune private ips from x-forwarded-for [extensions/CirrusSearch] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294373 (https://phabricator.wikimedia.org/T407432) (owner: Ebernhardson)
[20:43:54] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:43:56] <swfrench-wmf>	 !log reprepro include dh-php_5.5+wmf12u1 into component/php83 for bookworm-wikimedia - T427312
[20:44:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:01] <stashbot>	 T427312: Build PHP 8.3 packages for bookworm - https://phabricator.wikimedia.org/T427312
[20:44:14] <logmsgbot>	 !log ebernhardson@deploy1003 Started scap sync-world: Backport for [[gerrit:1294373|identity: Prune private ips from x-forwarded-for (T407432)]], [[gerrit:1294374|Revert^2 "cirrus: AB test query suggester variants" (T407432)]]
[20:44:19] <stashbot>	 T407432: Follow-up AB test of dym language model variants - https://phabricator.wikimedia.org/T407432
[20:46:07] <logmsgbot>	 !log ebernhardson@deploy1003 ebernhardson: Backport for [[gerrit:1294373|identity: Prune private ips from x-forwarded-for (T407432)]], [[gerrit:1294374|Revert^2 "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:47:33] <logmsgbot>	 !log ebernhardson@deploy1003 ebernhardson: Continuing with deployment
[20:48:27] <wikibugs>	 (CR) Kosta Harlan: scap.cfg.erb: Add hcaptcha checkout in production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[20:48:54] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:51:45] <logmsgbot>	 !log ebernhardson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294373|identity: Prune private ips from x-forwarded-for (T407432)]], [[gerrit:1294374|Revert^2 "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 30s)
[20:51:50] <stashbot>	 T407432: Follow-up AB test of dym language model variants - https://phabricator.wikimedia.org/T407432
[20:52:21] <ebernhardson>	 Pppery: you're up, configs should probably fit in 10min
[20:52:29] <Pppery>	 Not a deployer
[20:52:53] <ebernhardson>	 hmm, ok i can ship. Yours and MatmaRex's?
[20:53:18] <MatmaRex>	 sure. thanks
[20:53:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ebernhardson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1288370 (https://phabricator.wikimedia.org/T423766) (owner: Pppery)
[20:53:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ebernhardson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1293819 (https://phabricator.wikimedia.org/T426614) (owner: Bartosz Dziewoński)
[20:55:55] <wikibugs>	 (Merged) jenkins-bot: Allow Vector 2022 font size changes in namespace 100 for enwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1288370 (https://phabricator.wikimedia.org/T423766) (owner: Pppery)
[20:55:59] <wikibugs>	 (Merged) jenkins-bot: Fix case of 'commonsfinder' in $wgUrlProtocols [mediawiki-config] - https://gerrit.wikimedia.org/r/1293819 (https://phabricator.wikimedia.org/T426614) (owner: Bartosz Dziewoński)
[20:56:25] <logmsgbot>	 !log ebernhardson@deploy1003 Started scap sync-world: Backport for [[gerrit:1288370|Allow Vector 2022 font size changes in namespace 100 for enwiktionary (T423766)]], [[gerrit:1293819|Fix case of 'commonsfinder' in $wgUrlProtocols (T426614)]]
[20:56:31] <stashbot>	 T423766: Allow Vector-2022 font size changes in namespace 100 on the English Wiktionary - https://phabricator.wikimedia.org/T423766
[20:56:32] <stashbot>	 T426614: add "CommonsFinder://" custom scheme to $wgUrlProtocols for native app OAuth2 support - https://phabricator.wikimedia.org/T426614
[20:58:26] <logmsgbot>	 !log ebernhardson@deploy1003 matmarex, ebernhardson, pppery: Backport for [[gerrit:1288370|Allow Vector 2022 font size changes in namespace 100 for enwiktionary (T423766)]], [[gerrit:1293819|Fix case of 'commonsfinder' in $wgUrlProtocols (T426614)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:58:40] <Pppery>	 Looking
[20:58:46] <ebernhardson>	 thanks
[20:59:22] <Pppery>	 Looks good
[20:59:29] <ebernhardson>	 MatmaRex: yours look ok?
[20:59:34] <MatmaRex>	 ebernhardson: looks good, thanks
[20:59:50] <logmsgbot>	 !log ebernhardson@deploy1003 matmarex, ebernhardson, pppery: Continuing with deployment
[21:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T2100)
[21:04:03] <logmsgbot>	 !log ebernhardson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1288370|Allow Vector 2022 font size changes in namespace 100 for enwiktionary (T423766)]], [[gerrit:1293819|Fix case of 'commonsfinder' in $wgUrlProtocols (T426614)]] (duration: 07m 38s)
[21:04:09] <stashbot>	 T423766: Allow Vector-2022 font size changes in namespace 100 on the English Wiktionary - https://phabricator.wikimedia.org/T423766
[21:04:10] <stashbot>	 T426614: add "CommonsFinder://" custom scheme to $wgUrlProtocols for native app OAuth2 support - https://phabricator.wikimedia.org/T426614
[21:04:48] <wikibugs>	 (PS3) Ahmon Dancy: scap.cfg.erb: Add hcaptcha checkout in production [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829)
[21:04:58] <wikibugs>	 (CR) Ahmon Dancy: scap.cfg.erb: Add hcaptcha checkout in production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1287007 (https://phabricator.wikimedia.org/T403829) (owner: Ahmon Dancy)
[21:05:00] <ebernhardson>	 all set! deploy window complete
[21:06:05] <MatmaRex>	 thank you!
[21:09:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:14:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:19:56] <logmsgbot>	 !log arlolra@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[21:20:21] <logmsgbot>	 !log arlolra@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[21:20:22] <logmsgbot>	 !log arlolra@deploy1003 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[21:20:49] <logmsgbot>	 !log arlolra@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[21:23:54] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitCrashLoop: prometheus-wmf-elasticsearch-exporter-9200.service crashloop on relforge1008:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[21:24:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: opensearch-disable-readahead-relforge-eqiad.service on relforge1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:37:41] <logmsgbot>	 !log bking@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on relforge[1008-1010].eqiad.wmnet with reason: non-production environment
[21:43:06] <jinxer-wm>	 FIRING: [2x] SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy   - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[21:45:40] <wikibugs>	 (PS1) Eric Gardner: Exclude more content from selection [extensions/ReaderExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294432 (https://phabricator.wikimedia.org/T426308)
[21:52:00] <EricGardner>	 Heads up that I will be deploying two small patches in the readers window in about 10 minutes
[22:00:04] <jouncebot>	 Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260527T2200)
[22:00:36] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by egardner@deploy1003 using scap backport" [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294370 (https://phabricator.wikimedia.org/T427336) (owner: Eric Gardner)
[22:02:00] <Jdlrobson>	 EricGardner: me 2. 
[22:02:09] <Jdlrobson>	 EricGardner: are yours config only or backports? 
[22:02:31] <EricGardner>	 I'm doing backports
[22:03:07] <EricGardner>	 I have 2, just started the first one
[22:03:12] <wikibugs>	 (Merged) jenkins-bot: Carousel only on articles [extensions/MultimediaViewer] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294370 (https://phabricator.wikimedia.org/T427336) (owner: Eric Gardner)
[22:03:41] <logmsgbot>	 !log egardner@deploy1003 Started scap sync-world: Backport for [[gerrit:1294370|Carousel only on articles (T427336)]]
[22:03:46] <stashbot>	 T427336: Carousel: Limit the feature to article pages only - https://phabricator.wikimedia.org/T427336
[22:04:13] <Jdlrobson>	 Since they are both backports and scap takes a long time, mind if we bulk the next ones together? There is an issue with thumbnail rendering on all pages impacting readers so pretty important this goes out
[22:04:24] <Jdlrobson>	 (i was unable to find space in the earlier backport window)
[22:05:16] <EricGardner>	 I'd prefer to backport my patches separately. I can wait to do my second one until you are done
[22:05:36] <logmsgbot>	 !log egardner@deploy1003 egardner: Backport for [[gerrit:1294370|Carousel only on articles (T427336)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:05:46] <Jdlrobson>	 Okay that works. I can bundle https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1294322 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1294360 together. 
[22:09:32] <logmsgbot>	 !log egardner@deploy1003 egardner: Continuing with deployment
[22:10:07] <wikibugs>	 (CR) Cwhite: [C:-1] OpenSearch: Add required config for bootstrapping a cluster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1294402 (https://phabricator.wikimedia.org/T427306) (owner: Bking)
[22:10:26] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11961402 (Papaul)
[22:10:45] <icinga-wm>	 PROBLEM - Host ml-serve1014 is DOWN: PING CRITICAL - Packet loss = 100%
[22:12:21] <icinga-wm>	 RECOVERY - Host ml-serve1014 is UP: PING OK - Packet loss = 0%, RTA = 3.61 ms
[22:13:41] <logmsgbot>	 !log egardner@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294370|Carousel only on articles (T427336)]] (duration: 10m 00s)
[22:13:46] <stashbot>	 T427336: Carousel: Limit the feature to article pages only - https://phabricator.wikimedia.org/T427336
[22:14:20] <EricGardner>	 Jdlrobson: feel free to deploy your patches now
[22:14:31] <Jdlrobson>	 thanks EricGardner on it
[22:15:20] <wikibugs>	 (PS9) Jdlrobson: Remove MinervaNightMode config after skin cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/1285523 (https://phabricator.wikimedia.org/T426689) (owner: HakanIST)
[22:16:08] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by jdlrobson@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294360 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[22:16:08] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by jdlrobson@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[22:19:57] <wikibugs>	 (PS1) Catrope: passwordlessLogin: Limit conditional mediation to the main login form [extensions/OATHAuth] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294435 (https://phabricator.wikimedia.org/T427419)
[22:22:09] <wikibugs>	 (CR) Ladsgroup: Add config for conductwiki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[22:22:14] <wikibugs>	 (PS2) Ladsgroup: Add config for conductwiki [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984)
[22:22:19] <RoanKattouw>	 Jdlrobson: Do you mind if I tag along with another patch after you're done?
[22:22:20] <wikibugs>	 (CR) Ladsgroup: [V:+2 C:+2] Add config for conductwiki [puppet] - https://gerrit.wikimedia.org/r/1292346 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[22:23:06] <Jdlrobson>	 RoanKattouw: Eric is after me but you can go after.  I had a small cleanup patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1285523?usp=search) but that's not urgent. I can attempt that tomorrow.
[22:23:45] <RoanKattouw>	 OK, do what you need to do and then please ping me when you're both done
[22:24:12] <EricGardner>	 RoanKattouw: mine should be quick
[22:28:07] <wikibugs>	 (Merged) jenkins-bot: Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.3) - https://gerrit.wikimedia.org/r/1294360 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[22:30:37] <wikibugs>	 (CR) CI reject: [V:-1] Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[22:31:23] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by jdlrobson@deploy1003 using scap backport" [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[22:32:00] <Jdlrobson>	 :( flaky tests
[22:33:19] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Add conduct.wikimedia.org (T426984)
[22:33:24] <stashbot>	 T426984: Create Conductwiki wiki - https://phabricator.wikimedia.org/T426984
[22:34:14] <wikibugs>	 ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11961461 (Papaul)
[22:34:16] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Add conduct.wikimedia.org (T426984) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:35:25] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[22:36:08] <wikibugs>	 (Merged) jenkins-bot: Thumbnails are not being optimized in large mode [core] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294322 (https://phabricator.wikimedia.org/T427237) (owner: Jdlrobson)
[22:36:54] <Jdlrobson>	 Did Amir start a deploy...? 
[22:37:00] <RoanKattouw>	 Yes without being in this channel
[22:37:04] <RoanKattouw>	 I'm pinging him on Slack about this now
[22:37:14] <Jdlrobson>	 i think that just merged my change without testing o_o
[22:37:46] <RoanKattouw>	 Parallel deploys? How is that even possible with the deployment lock?
[22:37:53] <RoanKattouw>	 I guess scap backport only acquires the lock after the change merges?
[22:37:59] <Jdlrobson>	 hmm i dont know what happened to my deploys
[22:38:12] <EricGardner>	 Ok. I have to go right at 4pm so I will relinquish my spot in the queue. RoanKattouw: you are welcome to proceed once Jdlrobson is done
[22:38:19] <EricGardner>	 My thing is less urgent and we can do it tomorrow
[22:38:26] <Jdlrobson>	 EricGardner: want me to do yours if I have time?
[22:38:27] <RoanKattouw>	 Jdlrobson: Your deploy is waiting for Amir's to be done
[22:38:34] <RoanKattouw>	 22:36:16 concurrent prep is locked by ladsgroup (pid 3468335) on Wed May 27 22:32:23 2026; reason is "Add conduct.wikimedia.org (T426984)".
[22:38:35] <stashbot>	 T426984: Create Conductwiki wiki - https://phabricator.wikimedia.org/T426984
[22:38:37] <Jdlrobson>	 RoanKattouw Spiderpig says "All changes have been merged" but did not give me the chance to test
[22:39:05] <RoanKattouw>	 Yeah Spiderpig is paused, it will try again in 10 minutes to see if Amir's is done by then
[22:39:12] <Jdlrobson>	 urggh ok
[22:39:12] <RoanKattouw>	 If not, idk if it waits again or just fails
[22:39:15] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Add conduct.wikimedia.org (T426984) (duration: 07m 16s)
[22:39:29] <RoanKattouw>	 Aha, it finished and yours immediately resumed
[22:39:35] <logmsgbot>	 !log jdlrobson@deploy1003 Started scap sync-world: Backport for [[gerrit:1294360|Thumbnails are not being optimized in large mode (T427237)]], [[gerrit:1294322|Thumbnails are not being optimized in large mode (T427237)]]
[22:39:40] <stashbot>	 T427237: Regression: Thumbnails on content pages are not scaled for large preference without losing quality - https://phabricator.wikimedia.org/T427237
[22:39:45] <Jdlrobson>	 ok cool
[22:40:06] <logmsgbot>	 !log ladsgroup@cumin1003 START - Cookbook sre.mysql.sanitarium_restart
[22:40:06] <logmsgbot>	 !log ladsgroup@cumin1003 END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
[22:40:18] <logmsgbot>	 !log ladsgroup@cumin1003 START - Cookbook sre.mysql.sanitarium_restart
[22:40:20] <EricGardner>	 Jdlrobson the second patch I was going to deploy was https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ReaderExperiments/+/1294432
[22:40:42] <EricGardner>	 But I can do it tomorrow if you run out of time
[22:40:49] <Jdlrobson>	 EricGardner: np
[22:40:56] <Jdlrobson>	 we'll see what happens :)
[22:40:59] <EricGardner>	 thanks!
[22:41:01] <Jdlrobson>	 but yeh will deploy if i can
[22:41:29] <logmsgbot>	 !log jdlrobson@deploy1003 jdlrobson: Backport for [[gerrit:1294360|Thumbnails are not being optimized in large mode (T427237)]], [[gerrit:1294322|Thumbnails are not being optimized in large mode (T427237)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:42:24] <logmsgbot>	 !log jdlrobson@deploy1003 jdlrobson: Continuing with deployment
[22:43:04] <Amir1>	 I'm so sorry. I should have checked the window. It was late, I assumed everything is over
[22:43:13] <Amir1>	 If there is anything I can do to help, let me know
[22:43:26] <logmsgbot>	 ladsgroup@cumin1003 sanitarium_restart (PID 1976244) is awaiting input
[22:43:33] <RoanKattouw>	 Amir1: All good, your deploy was really fast and Spiderpig's locking mechanism worked perfectly
[22:43:45] <RoanKattouw>	 It paused Jon's deploy for 3 minutes and then automatically resumed when yours was done
[22:44:03] <Amir1>	 I was pushing a simple apache change
[22:44:11] <Amir1>	 glad it worked and sorry again
[22:45:02] <Jdlrobson>	 RoanKattouw: you can go now and then i'll try and fit Eric's in
[22:45:09] <Jdlrobson>	 (mine is just syncing now)
[22:46:30] <logmsgbot>	 !log jdlrobson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294360|Thumbnails are not being optimized in large mode (T427237)]], [[gerrit:1294322|Thumbnails are not being optimized in large mode (T427237)]] (duration: 06m 54s)
[22:46:34] <stashbot>	 T427237: Regression: Thumbnails on content pages are not scaled for large preference without losing quality - https://phabricator.wikimedia.org/T427237
[22:47:03] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by catrope@deploy1003 using scap backport" [extensions/OATHAuth] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294435 (https://phabricator.wikimedia.org/T427419) (owner: Catrope)
[22:49:47] <icinga-wm>	 PROBLEM - VRRP status on cr3-eqsin is CRITICAL: VRRP CRITICAL - 1 misconfigured interfaces, 0 inconsistent interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
[22:50:20] <wikibugs>	 (Merged) jenkins-bot: passwordlessLogin: Limit conditional mediation to the main login form [extensions/OATHAuth] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294435 (https://phabricator.wikimedia.org/T427419) (owner: Catrope)
[22:50:47] <logmsgbot>	 !log catrope@deploy1003 Started scap sync-world: Backport for [[gerrit:1294435|passwordlessLogin: Limit conditional mediation to the main login form (T427419)]]
[22:50:52] <stashbot>	 T427419: Unable to finish 2FA - https://phabricator.wikimedia.org/T427419
[22:52:38] <logmsgbot>	 !log catrope@deploy1003 catrope: Backport for [[gerrit:1294435|passwordlessLogin: Limit conditional mediation to the main login form (T427419)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:54:27] <logmsgbot>	 !log catrope@deploy1003 catrope: Continuing with deployment
[22:55:01] <logmsgbot>	 !log ladsgroup@cumin1003 END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
[22:58:36] <logmsgbot>	 !log catrope@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294435|passwordlessLogin: Limit conditional mediation to the main login form (T427419)]] (duration: 07m 49s)
[22:58:41] <stashbot>	 T427419: Unable to finish 2FA - https://phabricator.wikimedia.org/T427419
[22:58:47] <RoanKattouw>	 Jdlrobson: Mine is done, go ahead
[23:00:38] <Jdlrobson>	 thanks RoanKattouw
[23:01:15] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by jdlrobson@deploy1003 using scap backport" [extensions/ReaderExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294432 (https://phabricator.wikimedia.org/T426308) (owner: Eric Gardner)
[23:01:16] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by jdlrobson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1285523 (https://phabricator.wikimedia.org/T426689) (owner: HakanIST)
[23:02:11] <wikibugs>	 (Merged) jenkins-bot: Remove MinervaNightMode config after skin cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/1285523 (https://phabricator.wikimedia.org/T426689) (owner: HakanIST)
[23:02:27] <wikibugs>	 (PS1) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:03:35] <wikibugs>	 (CR) CI reject: [V:-1] Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[23:04:15] <wikibugs>	 (Merged) jenkins-bot: Exclude more content from selection [extensions/ReaderExperiments] (wmf/1.47.0-wmf.4) - https://gerrit.wikimedia.org/r/1294432 (https://phabricator.wikimedia.org/T426308) (owner: Eric Gardner)
[23:04:42] <logmsgbot>	 !log jdlrobson@deploy1003 Started scap sync-world: Backport for [[gerrit:1294432|Exclude more content from selection (T426308)]], [[gerrit:1285523|Remove MinervaNightMode config after skin cleanup (T426689)]]
[23:04:49] <stashbot>	 T426308: [Share Highlights] Share card display edge cases - https://phabricator.wikimedia.org/T426308
[23:04:50] <stashbot>	 T426689: Remove night mode flags in Minerva and Vector - https://phabricator.wikimedia.org/T426689
[23:05:34] <wikibugs>	 (PS2) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:06:26] <wikibugs>	 (CR) CI reject: [V:-1] Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[23:06:37] <logmsgbot>	 !log jdlrobson@deploy1003 jdlrobson, h2o, egardner: Backport for [[gerrit:1294432|Exclude more content from selection (T426308)]], [[gerrit:1285523|Remove MinervaNightMode config after skin cleanup (T426689)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[23:09:12] <logmsgbot>	 !log jdlrobson@deploy1003 jdlrobson, h2o, egardner: Continuing with deployment
[23:09:28] <Jdlrobson>	 ok lgtm
[23:10:37] <wikibugs>	 (PS3) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:11:27] <wikibugs>	 (CR) CI reject: [V:-1] Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[23:13:17] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs1011.eqiad.wmnet, wdqs1013.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[23:13:24] <logmsgbot>	 !log jdlrobson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1294432|Exclude more content from selection (T426308)]], [[gerrit:1285523|Remove MinervaNightMode config after skin cleanup (T426689)]] (duration: 08m 42s)
[23:13:28] <Jdlrobson>	 all done
[23:13:30] <stashbot>	 T426308: [Share Highlights] Share card display edge cases - https://phabricator.wikimedia.org/T426308
[23:13:30] <stashbot>	 T426689: Remove night mode flags in Minerva and Vector - https://phabricator.wikimedia.org/T426689
[23:16:17] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[23:21:50] <wikibugs>	 (PS4) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:22:40] <wikibugs>	 (CR) CI reject: [V:-1] Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[23:23:54] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:25:54] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:26:46] <wikibugs>	 (PS5) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:27:35] <wikibugs>	 (CR) CI reject: [V:-1] Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[23:30:04] <wikibugs>	 (Abandoned) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1293821 (owner: TrainBranchBot)
[23:39:37] <wikibugs>	 (PS6) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:39:38] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1294440
[23:39:38] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1294440 (owner: TrainBranchBot)
[23:40:41] <wikibugs>	 (CR) CI reject: [V:-1] Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984) (owner: Ladsgroup)
[23:53:03] <wikibugs>	 (PS7) Ladsgroup: Init conductwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1294438 (https://phabricator.wikimedia.org/T426984)
[23:54:47] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1294440 (owner: TrainBranchBot)
[23:59:57] <Amir1>	 jouncebot: nowandnext
[23:59:57] <jouncebot>	 No deployments scheduled for the next 6 hour(s) and 0 minute(s)
[23:59:57] <jouncebot>	 In 6 hour(s) and 0 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260528T0600)
[23:59:57] <jouncebot>	 In 6 hour(s) and 0 minute(s): Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260528T0600)