[00:32:35] <ryankemper>	 !log [WDQS] Restarted blazegraph on `wdqs101[1,3]`
[00:32:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:33:40] <ryankemper>	 !log [WDQS] Restarted blazegraph on `wdqs1014` as well. all 3 hosts were deadlocked
[00:33:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:07] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1240830
[00:39:07] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1240830 (owner: TrainBranchBot)
[00:53:52] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1240830 (owner: TrainBranchBot)
[00:56:04] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[01:08:51] <wikibugs>	 (CR) ArielGlenn: "This looks clearer for people skimming through the tests, and that's a good thing. Left some style comments for you." [deployment-charts] - https://gerrit.wikimedia.org/r/1239972 (owner: Daniel Kinzler)
[01:09:12] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240831
[01:09:12] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240831 (owner: TrainBranchBot)
[01:11:51] <wikibugs>	 SRE, MediaWiki-extensions-OAuth, MediaWiki-Platform-Team, MW-1.46-notes (1.46.0-wmf.16; 2026-02-17): Editing using OAuth 2 doesn’t work - https://phabricator.wikimedia.org/T417839#11634933 (Rtconner) Yes mine is working good too thank you.
[01:21:00] <wikibugs>	 (PS1) BryanDavis: Revert "extension-list: add a bogus extension to test l10n-update" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240832 (https://phabricator.wikimedia.org/T411516)
[01:27:26] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 23 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240832 (https://phabricator.wikimedia.org/T411516) (owner: BryanDavis)
[01:34:55] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1240831 (owner: TrainBranchBot)
[01:36:09] <zabe>	 jouncebot: nowandnext
[01:36:10] <jouncebot>	 No deployments scheduled for the next 5 hour(s) and 23 minute(s)
[01:36:10] <jouncebot>	 In 5 hour(s) and 23 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T0700)
[01:39:41] <wikibugs>	 (CR) Zabe: [C:+2] Start reading from new file tables on mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1239497 (https://phabricator.wikimedia.org/T416548) (owner: Zabe)
[01:40:37] <wikibugs>	 (Merged) jenkins-bot: Start reading from new file tables on mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1239497 (https://phabricator.wikimedia.org/T416548) (owner: Zabe)
[01:41:48] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1239497|Start reading from new file tables on mediawikiwiki (T416548)]]
[01:41:52] <stashbot>	 T416548: Start reading from file table on wmf production - https://phabricator.wikimedia.org/T416548
[01:44:00] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:1239497|Start reading from new file tables on mediawikiwiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[01:45:03] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[01:48:04] <jinxer-wm>	 FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[01:49:05] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1239497|Start reading from new file tables on mediawikiwiki (T416548)]] (duration: 07m 17s)
[01:49:09] <stashbot>	 T416548: Start reading from file table on wmf production - https://phabricator.wikimedia.org/T416548
[02:08:21] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:12:39] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:15:39] <wikibugs>	 (PS1) Dwisehaupt: Add spf records for civicrm and frmx hosts [dns] - https://gerrit.wikimedia.org/r/1240834 (https://phabricator.wikimedia.org/T417958)
[02:33:21] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:42:40] <wikibugs>	 SRE, MediaWiki-extensions-OAuth, MediaWiki-Platform-Team, MW-1.46-notes (1.46.0-wmf.16; 2026-02-17): Editing using OAuth 2 doesn’t work - https://phabricator.wikimedia.org/T417839#11635028 (Legoktm) Was any integration or regression test added for this?
[03:19:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:23:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:24:41] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:43:04] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[05:54:41] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:55:39] <jinxer-wm>	 FIRING: CoreBGPDown: Core BGP session down between cr2-eqord and cr1-eqiad (208.80.154.196) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=eqiad&var-device=cr2-eqord:9804&var-bgp_group=Confed_eqiad&var-bgp_neighbor=cr1-eqiad - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:00:39] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:08:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[06:10:39] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:12:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:13:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[06:14:41] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[06:16:24] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:17:09] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:21:24] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[06:58:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T0700)
[07:03:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:13:32] <logmsgbot>	 !log arnaudb@cumin1003 START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit1003.wikimedia.org
[07:18:02] <logmsgbot>	 !log arnaudb@cumin1003 END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit2003.wikimedia.org to gerrit1003.wikimedia.org
[07:18:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:19:17] <wikibugs>	 (PS1) Muehlenhoff: Run IDM spec tests on Bookworm/Trixie [puppet] - https://gerrit.wikimedia.org/r/1240840
[07:20:40] <wikibugs>	 (PS1) Muehlenhoff: lvs: Run spec tests on Bullseye [puppet] - https://gerrit.wikimedia.org/r/1240841
[07:24:41] <jinxer-wm>	 FIRING: [6x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:25:04] <jinxer-wm>	 FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[07:26:39] <jinxer-wm>	 FIRING: [3x] CoreBGPDown: Core BGP session down between cr2-eqord and cr1-eqiad (208.80.154.196) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[07:29:21] <wikibugs>	 (CR) Arnaudb: [C:+2] gerrit: resume replication on gerrit-spare [puppet] - https://gerrit.wikimedia.org/r/1240689 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[07:29:41] <jinxer-wm>	 FIRING: [6x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:31:39] <jinxer-wm>	 FIRING: [6x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[07:33:21] <jinxer-wm>	 FIRING: [6x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:33:55] <wikibugs>	 (PS1) Muehlenhoff: Remove access for mobrovac [puppet] - https://gerrit.wikimedia.org/r/1240842
[07:35:04] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[07:36:39] <jinxer-wm>	 RESOLVED: [6x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[07:37:46] <jinxer-wm>	 FIRING: [7x] GerritHAProxyBackendUnavailable: Gerrit backend is unavailable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable
[07:37:57] <jinxer-wm>	 FIRING: [2x] GerritHAProxyServiceUnavailable: Gerrit tcp-proxy (HAProxy) service gerrit_ssh is DOWN in codfw - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyServiceUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyServiceUnavailable
[07:38:42] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service gerrit2003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#gerrit2003:29418 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:42:46] <jinxer-wm>	 RESOLVED: [4x] GerritHAProxyServiceUnavailable: Gerrit tcp-proxy (HAProxy) service gerrit_ssh is DOWN in codfw - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyServiceUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyServiceUnavailable
[07:42:51] <jinxer-wm>	 RESOLVED: [14x] GerritHAProxyBackendUnavailable: Gerrit backend is unavailable for tcp-proxy (HAProxy) gerrit_ssh - https://wikitech.wikimedia.org/wiki/Gerrit/Operations#GerritHAProxyBackendUnavailable - grafana.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy?viewPanel=panel-15 - https://alerts.wikimedia.org/?q=alertname%3DGerritHAProxyBackendUnavailable
[07:43:42] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service gerrit2003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#gerrit2003:29418 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:43:55] <wikibugs>	 (PS2) Muehlenhoff: Remove access for mobrovac [puppet] - https://gerrit.wikimedia.org/r/1240842
[07:47:07] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove access for mobrovac [puppet] - https://gerrit.wikimedia.org/r/1240842 (owner: Muehlenhoff)
[07:47:24] <wikibugs>	 (CR) Brouberol: [C:+1] "LGTM!" [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568) (owner: Ryan Kemper)
[07:49:41] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:52:54] <wikibugs>	 (CR) Ryan Kemper: [C:+2] hadoop.reboot-workers: make host override smarter [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568) (owner: Ryan Kemper)
[07:55:32] <wikibugs>	 (PS9) Brouberol: elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[07:55:37] <wikibugs>	 SRE, Data-Platform-SRE (2026-02-13 - 2026-03-06), Patch-For-Review: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568#11635286 (RKemper) `an-test-worker*` done
[07:57:29] <wikibugs>	 (PS1) Muehlenhoff: Remove dotfiles of three absented users [puppet] - https://gerrit.wikimedia.org/r/1240847
[07:58:15] <wikibugs>	 (Merged) jenkins-bot: hadoop.reboot-workers: make host override smarter [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568) (owner: Ryan Kemper)
[07:59:08] <wikibugs>	 (PS1) Arnaudb: gerrit: gerrit-spare lfs-sync enable [puppet] - https://gerrit.wikimedia.org/r/1240846 (https://phabricator.wikimedia.org/T417246)
[07:59:12] <wikibugs>	 (PS1) Arnaudb: gerrit: fix gerrit1003 ssh fingerprint [puppet] - https://gerrit.wikimedia.org/r/1240848 (https://phabricator.wikimedia.org/T417246)
[07:59:55] <wikibugs>	 (CR) Arnaudb: [C:+2] gerrit: gerrit-spare lfs-sync enable [puppet] - https://gerrit.wikimedia.org/r/1240846 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[08:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T0800)
[08:01:46] <wikibugs>	 (CR) CI reject: [V:-1] elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[08:01:47] <wikibugs>	 (CR) Arnaudb: [C:+2] gerrit: fix gerrit1003 ssh fingerprint [puppet] - https://gerrit.wikimedia.org/r/1240848 (https://phabricator.wikimedia.org/T417246) (owner: Arnaudb)
[08:03:03] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove dotfiles of three absented users [puppet] - https://gerrit.wikimedia.org/r/1240847 (owner: Muehlenhoff)
[08:03:38] <wikibugs>	 (PS1) Brouberol: Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - https://gerrit.wikimedia.org/r/1240850
[08:09:53] <wikibugs>	 (CR) CI reject: [V:-1] Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - https://gerrit.wikimedia.org/r/1240850 (owner: Brouberol)
[08:13:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:15:39] <jinxer-wm>	 FIRING: [2x] CoreBGPDown: Core BGP session down between cr2-eqord and cr1-eqiad (208.80.154.196) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[08:16:05] <wikibugs>	 (PS2) Brouberol: Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - https://gerrit.wikimedia.org/r/1240850
[08:17:00] <wikibugs>	 (PS1) Muehlenhoff: Remove puppetmaster class and related classes [puppet] - https://gerrit.wikimedia.org/r/1240853 (https://phabricator.wikimedia.org/T365798)
[08:18:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:19:13] <logmsgbot>	 !log brouberol@cumin1003 START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
[08:20:03] <wikibugs>	 (PS3) Brouberol: Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - https://gerrit.wikimedia.org/r/1240850
[08:20:39] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[08:22:22] <icinga-wm>	 PROBLEM - BFD status on lsw1-e3-eqiad.mgmt is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:23:50] <wikibugs>	 (PS4) Brouberol: Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - https://gerrit.wikimedia.org/r/1240850
[08:27:37] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240853 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[08:28:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:29:08] <logmsgbot>	 !log brouberol@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
[08:29:22] <icinga-wm>	 RECOVERY - BFD status on lsw1-e3-eqiad.mgmt is OK: UP: 4 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:29:40] <wikibugs>	 (PS1) Muehlenhoff: Remove puppetmaster::base_repo [puppet] - https://gerrit.wikimedia.org/r/1240856 (https://phabricator.wikimedia.org/T365798)
[08:30:32] <wikibugs>	 (CR) CI reject: [V:-1] Use importlib.metadata instead of pkg_resources, now deprecated/removed. [software/spicerack] - https://gerrit.wikimedia.org/r/1240850 (owner: Brouberol)
[08:32:09] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[08:33:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:33:50] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06), Event-Platform: Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561#11635329 (Gehel)
[08:34:41] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:37:09] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[08:39:57] <wikibugs>	 (CR) Gehel: "LGTM, minus the build issue" [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[08:48:17] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240856 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[08:56:08] <wikibugs>	 (PS1) Muehlenhoff: Remove Globalsign/Digicert stub certs used by PCC [labs/private] - https://gerrit.wikimedia.org/r/1240865 (https://phabricator.wikimedia.org/T414955)
[08:57:38] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11635379 (Tacsipacsi) >>! In T414805#11632990, @Ladsgroup wrote: > A global rate limit effectively is not that different than full...
[08:58:31] <wikibugs>	 (PS1) Muehlenhoff: Remove various stub certs for now removed cergen certs [labs/private] - https://gerrit.wikimedia.org/r/1240866 (https://phabricator.wikimedia.org/T357750)
[08:59:54] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Remove Globalsign/Digicert stub certs used by PCC [labs/private] - https://gerrit.wikimedia.org/r/1240865 (https://phabricator.wikimedia.org/T414955) (owner: Muehlenhoff)
[09:00:15] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Remove various stub certs for now removed cergen certs [labs/private] - https://gerrit.wikimedia.org/r/1240866 (https://phabricator.wikimedia.org/T357750) (owner: Muehlenhoff)
[09:00:24] <wikibugs>	 (PS1) Muehlenhoff: Remove long obsoleted sni/star stub certs [labs/private] - https://gerrit.wikimedia.org/r/1240867
[09:10:18] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Remove long obsoleted sni/star stub certs [labs/private] - https://gerrit.wikimedia.org/r/1240867 (owner: Muehlenhoff)
[09:12:53] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Phase out cergen - https://phabricator.wikimedia.org/T357750#11635406 (MoritzMuehlenhoff) Open→Resolved cergen is fully undeployed from our infrastructure: All certificates have been mig...
[09:13:44] <wikibugs>	 (CR) Elukey: [C:+2] setup.py: Pin setuptools < 82.0.0 to make pkg_resources available. [software/spicerack] - https://gerrit.wikimedia.org/r/1240702 (owner: Blake)
[09:15:23] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, ServiceOps new, and 2 others: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11635412 (Volans) @Blake I think we should do that check inside the logic that fires up the cookbook, namely...
[09:26:15] <wikibugs>	 (PS3) Effie Mouzeli: x-wikimedia-debug-routing: add routing to mw-parsoid [puppet] - https://gerrit.wikimedia.org/r/1240750 (https://phabricator.wikimedia.org/T386246)
[09:28:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:28:59] <wikibugs>	 (CR) Fabfur: [C:+1] x-wikimedia-debug-routing: add routing to mw-parsoid [puppet] - https://gerrit.wikimedia.org/r/1240750 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[09:29:50] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] x-wikimedia-debug-routing: add routing to mw-parsoid [puppet] - https://gerrit.wikimedia.org/r/1240750 (https://phabricator.wikimedia.org/T386246) (owner: Effie Mouzeli)
[09:30:39] <jinxer-wm>	 FIRING: [2x] CoreBGPDown: Core BGP session down between cr2-eqord and cr1-eqiad (208.80.154.196) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:33:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:33:56] <hashar>	 jouncebot: nowandnext
[09:33:56] <jouncebot>	 For the next 22 hour(s) and 26 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T0800)
[09:33:56] <jouncebot>	 In 2 hour(s) and 26 minute(s): GitLab version upgrades (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T1200)
[09:34:00] <hashar>	 arnaudb: I am going to upgrade the CI Jenkins on contint1002
[09:35:39] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:36:29] <wikibugs>	 (CR) Elukey: hadoop.reboot-workers: make host override smarter (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568) (owner: Ryan Kemper)
[09:38:57] <arnaudb>	 ack !
[09:40:09] <hashar>	 I wanted to do it yesterday but well.. I ended up not having the time for it :b
[09:41:20] <hashar>	 !log Upgraded CI Jenkins from 2.528.3 to 2.541.2 # T417791
[09:41:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:44:46] <wikibugs>	 (PS2) Muehlenhoff: Remove puppetmaster class and related classes [puppet] - https://gerrit.wikimedia.org/r/1240853 (https://phabricator.wikimedia.org/T365798)
[09:44:50] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240853 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[09:44:58] <wikibugs>	 (PS10) Brouberol: elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[09:46:09] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:46:15] <wikibugs>	 (CR) Sergio Gimeno: [C:+1] "No objections, just a question." [mediawiki-config] - https://gerrit.wikimedia.org/r/1240694 (https://phabricator.wikimedia.org/T417422) (owner: Urbanecm)
[09:50:01] <wikibugs>	 (PS3) Muehlenhoff: Fix copy&paste errors in comments [software/spicerack] - https://gerrit.wikimedia.org/r/1240202
[09:51:09] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:51:24] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:53:06] <wikibugs>	 (CR) CI reject: [V:-1] elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[09:53:21] <jinxer-wm>	 FIRING: [4x] CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-1/0/1:0 (Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533 24ms 10Gbps wave) {#10180823000321:0}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[09:56:09] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-eqord (208.80.154.198) - group Confed_eqord - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[09:58:11] <wikibugs>	 (PS11) Brouberol: elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[10:00:38] <wikibugs>	 SRE, Infrastructure-Foundations, serviceops-radar, Puppet (Puppet 7.0): expose_puppet_certs:  Services will need to trust the new ca - https://phabricator.wikimedia.org/T340741#11635512 (taavi) Open→Resolved assuming this is done as all of the checkboxes have been checked
[10:02:53] <wikibugs>	 (CR) Elukey: [C:+1] Fix copy&paste errors in comments [software/spicerack] - https://gerrit.wikimedia.org/r/1240202 (owner: Muehlenhoff)
[10:05:59] <wikibugs>	 (PS1) Muehlenhoff: Remove Puppet 5 support from Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/1240877 (https://phabricator.wikimedia.org/T365798)
[10:08:11] <wikibugs>	 (CR) CI reject: [V:-1] elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[10:09:20] <wikibugs>	 (PS12) Brouberol: elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[10:10:48] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Next steps for Puppet 7 - https://phabricator.wikimedia.org/T330490#11635528 (MoritzMuehlenhoff)
[10:12:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:13:22] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11635530 (Nux) This would not change the calculations much, but there are [[ https://global-search.toolforge.org/?q=thumb%5C%2F.%5C...
[10:13:36] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[10:14:10] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[10:14:55] <wikibugs>	 (CR) CI reject: [V:-1] Remove Puppet 5 support from Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/1240877 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[10:14:56] <wikibugs>	 (PS1) Muehlenhoff: Remove bgpalerter spec test [puppet] - https://gerrit.wikimedia.org/r/1240878
[10:15:11] <wikibugs>	 (CR) Arnaudb: "the server is back in production, backups should be able to properly run again" [puppet] - https://gerrit.wikimedia.org/r/1240650 (owner: Jcrespo)
[10:15:30] <wikibugs>	 (CR) Arnaudb: [C:+2] Revert "backup: Temporarily ignore backup job failures from gerrit1003" [puppet] - https://gerrit.wikimedia.org/r/1240650 (owner: Jcrespo)
[10:15:43] <wikibugs>	 (PS2) Muehlenhoff: Remove bgpalerter spec test [puppet] - https://gerrit.wikimedia.org/r/1240878
[10:18:57] <wikibugs>	 (CR) Brouberol: [C:+1] elasticsearch_cluster: allow checking last reboot [software/spicerack] - https://gerrit.wikimedia.org/r/1235112 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[10:26:19] <wikibugs>	 (PS2) Muehlenhoff: Remove Puppet 5 support from Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/1240877 (https://phabricator.wikimedia.org/T365798)
[10:29:19] <wikibugs>	 (PS1) Hashar: zuul: pin spec tests to Bullseye (10) [puppet] - https://gerrit.wikimedia.org/r/1240879
[10:30:03] <wikibugs>	 (CR) CI reject: [V:-1] zuul: pin spec tests to Bullseye (10) [puppet] - https://gerrit.wikimedia.org/r/1240879 (owner: Hashar)
[10:31:58] <wikibugs>	 (PS2) Hashar: zuul: pin spec tests to Bullseye (10) [puppet] - https://gerrit.wikimedia.org/r/1240879
[10:33:04] <wikibugs>	 (CR) Majavah: [C:-1] "bullseye would be Debian 11, not 10" [puppet] - https://gerrit.wikimedia.org/r/1240879 (owner: Hashar)
[10:34:14] <logmsgbot>	 !log ammarpad@deploy2002 mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at [[phab:T417862]]' 'Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships' 'Wikimedia Foundation/Advancement/Community Growth/Community Investment and Partnerships' Ammarpad  # T417862
[10:34:18] <stashbot>	 T417862: Request to move translatable page: :meta:Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships - https://phabricator.wikimedia.org/T417862
[10:35:04] <wikibugs>	 (CR) Hashar: "I keep being confused by the doubled version system (names vs number) :-\  Thank you!" [puppet] - https://gerrit.wikimedia.org/r/1240879 (owner: Hashar)
[10:37:06] <wikibugs>	 (PS3) Hashar: zuul: pin spec tests to Bullseye (11) [puppet] - https://gerrit.wikimedia.org/r/1240879
[10:37:08] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on A:wikikube-staging-worker-codfw
[10:38:41] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS trixie
[10:38:48] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1240879 (owner: Hashar)
[10:38:50] <wikibugs>	 (CR) Muehlenhoff: [C:+2] zuul: pin spec tests to Bullseye (11) [puppet] - https://gerrit.wikimedia.org/r/1240879 (owner: Hashar)
[10:45:37] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Fix copy&paste errors in comments [software/spicerack] - https://gerrit.wikimedia.org/r/1240202 (owner: Muehlenhoff)
[10:45:59] <wikibugs>	 SRE-swift-storage, Data-Persistence, Prod-Kubernetes, ServiceOps new, and 5 others: Fix thumbor discovery records and make swift use them - https://phabricator.wikimedia.org/T397618#11635575 (MLechvien-WMF) @JTweed-WMF would you have inputs on how to triage this task?
[10:47:52] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[10:47:58] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[10:48:04] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[10:48:31] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[10:49:04] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, ServiceOps new, and 2 others: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11635578 (Blake) I think I'd be inclined to prefer the more-defensive option (maybe @Clement_Goubert has a pr...
[10:56:30] <wikibugs>	 (PS1) Joal: dse-k8s-eqiad: Add turnilo-next namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1240884 (https://phabricator.wikimedia.org/T416120)
[10:57:12] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
[10:58:22] <wikibugs>	 (PS2) Hnowlan: Revert "svg: refuse to generate SVGs larger than a particular size" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1218284 (https://phabricator.wikimedia.org/T411076) (owner: Muehlenhoff)
[10:58:32] <wikibugs>	 (CR) CI reject: [V:-1] Revert "svg: refuse to generate SVGs larger than a particular size" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1218284 (https://phabricator.wikimedia.org/T411076) (owner: Muehlenhoff)
[10:58:47] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T415786)', diff saved to https://phabricator.wikimedia.org/P88914 and previous config saved to /var/cache/conftool/dbconfig/20260220-105847-marostegui.json
[10:58:51] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[11:00:08] <wikibugs>	 (CR) Hnowlan: "This change has been reverted in another change and builds on master are now successful again" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1218284 (https://phabricator.wikimedia.org/T411076) (owner: Muehlenhoff)
[11:02:39] <wikibugs>	 (PS1) Joal: Add turnilo-next to dse-k8s-eqiad kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1240887 (https://phabricator.wikimedia.org/T416119)
[11:03:23] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
[11:03:50] <wikibugs>	 (PS1) Clément Goubert: service mesh: Add page-analytics listener [puppet] - https://gerrit.wikimedia.org/r/1240888 (https://phabricator.wikimedia.org/T411769)
[11:04:11] <wikibugs>	 (PS1) Muehlenhoff: civicrm: Run spec tests on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1240889
[11:04:42] <wikibugs>	 (PS3) Clément Goubert: wikifeeds: Add request definition for page analytics [deployment-charts] - https://gerrit.wikimedia.org/r/1220629 (https://phabricator.wikimedia.org/T411769) (owner: Jgiannelos)
[11:04:49] <wikibugs>	 (CR) Dreamy Jazz: [C:+1] IPReputation: Lower IPoid request and connect timeouts [mediawiki-config] - https://gerrit.wikimedia.org/r/1240769 (https://phabricator.wikimedia.org/T417910) (owner: Kosta Harlan)
[11:05:27] <wikibugs>	 (CR) Clément Goubert: wikifeeds: Add request definition for page analytics (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1220629 (https://phabricator.wikimedia.org/T411769) (owner: Jgiannelos)
[11:05:55] <wikibugs>	 (CR) Ayounsi: "I have no opinion on that" [puppet] - https://gerrit.wikimedia.org/r/1240878 (owner: Muehlenhoff)
[11:06:43] <wikibugs>	 (CR) CI reject: [V:-1] civicrm: Run spec tests on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1240889 (owner: Muehlenhoff)
[11:06:58] <wikibugs>	 (CR) CI reject: [V:-1] wikifeeds: Add request definition for page analytics [deployment-charts] - https://gerrit.wikimedia.org/r/1220629 (https://phabricator.wikimedia.org/T411769) (owner: Jgiannelos)
[11:08:08] <wikibugs>	 (PS1) Joal: Add helm chart for turnilo UI [deployment-charts] - https://gerrit.wikimedia.org/r/1240890 (https://phabricator.wikimedia.org/T416118)
[11:11:48] <wikibugs>	 (PS1) Joal: Add turnilo-next helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1240891 (https://phabricator.wikimedia.org/T416121)
[11:13:56] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P88915 and previous config saved to /var/cache/conftool/dbconfig/20260220-111355-marostegui.json
[11:15:55] <wikibugs>	 (PS1) Mszwarc: Ensure that sysops don't have '(oathauth-recover-for-user)' right [mediawiki-config] - https://gerrit.wikimedia.org/r/1240892 (https://phabricator.wikimedia.org/T417877)
[11:16:30] <wikibugs>	 (CR) Michael Große: [C:+1] [Growth] Force legacy validation of GrowthMentorList [mediawiki-config] - https://gerrit.wikimedia.org/r/1240694 (https://phabricator.wikimedia.org/T417422) (owner: Urbanecm)
[11:16:43] <wikibugs>	 (CR) Michael Große: [C:+1] [Growth] beta: Enable new GrowthMentorList validation on beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1240697 (https://phabricator.wikimedia.org/T417422) (owner: Urbanecm)
[11:17:19] <wikibugs>	 (PS2) Muehlenhoff: civicrm: Run spec tests on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1240889
[11:20:45] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS trixie
[11:21:07] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove bgpalerter spec test [puppet] - https://gerrit.wikimedia.org/r/1240878 (owner: Muehlenhoff)
[11:24:32] <wikibugs>	 SRE: Remove production data access for NDA expired user mobrovac - https://phabricator.wikimedia.org/T388030#11635678 (MoritzMuehlenhoff) Open→Resolved Access has been removed
[11:26:49] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS trixie
[11:26:57] <wikibugs>	 SRE, MediaWiki-extensions-OAuth, MediaWiki-Platform-Team, MW-1.46-notes (1.46.0-wmf.16; 2026-02-17): Editing using OAuth 2 doesn’t work - https://phabricator.wikimedia.org/T417839#11635687 (Tgr) Not for this incident specifically, but we are planning to update tests in the next few weeks ({T4...
[11:28:06] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[11:28:27] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[11:29:04] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P88916 and previous config saved to /var/cache/conftool/dbconfig/20260220-112903-marostegui.json
[11:29:52] <wikibugs>	 (PS1) Muehlenhoff: Record LDAP access for mikez [puppet] - https://gerrit.wikimedia.org/r/1240896
[11:33:47] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Record LDAP access for mikez [puppet] - https://gerrit.wikimedia.org/r/1240896 (owner: Muehlenhoff)
[11:38:54] <wikibugs>	 (PS3) Tiziano Fogli: thanos::rule: add ExecReload to the service unit [puppet] - https://gerrit.wikimedia.org/r/1239906 (https://phabricator.wikimedia.org/T414579)
[11:38:55] <wikibugs>	 (PS26) Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[11:40:32] <wikibugs>	 (PS1) Effie Mouzeli: mw-parsoid/experimental: update resources [deployment-charts] - https://gerrit.wikimedia.org/r/1240899
[11:44:12] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T415786)', diff saved to https://phabricator.wikimedia.org/P88917 and previous config saved to /var/cache/conftool/dbconfig/20260220-114412-marostegui.json
[11:44:14] <wikibugs>	 (CR) Clément Goubert: [C:+1] mw-parsoid/experimental: update resources [deployment-charts] - https://gerrit.wikimedia.org/r/1240899 (owner: Effie Mouzeli)
[11:44:17] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[11:44:29] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
[11:44:33] <wikibugs>	 (PS27) Tiziano Fogli: slothslos: add module to build and deploy sloth manifests [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579)
[11:44:38] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88918 and previous config saved to /var/cache/conftool/dbconfig/20260220-114437-marostegui.json
[11:44:56] <wikibugs>	 (CR) Tiziano Fogli: "The flattening step now happens before generation." [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: Tiziano Fogli)
[11:45:48] <wikibugs>	 (CR) Effie Mouzeli: [C:+2] mw-parsoid/experimental: update resources [deployment-charts] - https://gerrit.wikimedia.org/r/1240899 (owner: Effie Mouzeli)
[11:47:46] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
[11:47:48] <wikibugs>	 (Merged) jenkins-bot: mw-parsoid/experimental: update resources [deployment-charts] - https://gerrit.wikimedia.org/r/1240899 (owner: Effie Mouzeli)
[11:48:39] <wikibugs>	 (PS2) Majavah: cr-cloud: Move allow-public below deny-to-private-subnets [homer/public] - https://gerrit.wikimedia.org/r/970275
[11:49:07] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[11:49:40] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[11:49:48] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-experimental: apply
[11:50:24] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
[11:50:36] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[11:51:16] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[11:51:24] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[11:52:06] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[11:52:28] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, ServiceOps new, and 2 others: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11635763 (Clement_Goubert) I'd rather we be more defensive than not, especially if there is no strong enforce...
[11:54:19] <wikibugs>	 (CR) Ayounsi: [C:+1] cr-cloud: Move allow-public below deny-to-private-subnets [homer/public] - https://gerrit.wikimedia.org/r/970275 (owner: Majavah)
[11:54:57] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
[12:00:04] <jinxer-wm>	 FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[12:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T0800)
[12:00:05] <jouncebot>	 jelto, arnoldokoth, mutante, and arnaudb: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) GitLab version upgrades deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260220T1200).
[12:00:11] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 23 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240892 (https://phabricator.wikimedia.org/T417877) (owner: Mszwarc)
[12:07:43] <wikibugs>	 (CR) Clément Goubert: "Wouldn't they be relevant to beta, which is still on bare metal?" [puppet] - https://gerrit.wikimedia.org/r/1240720 (owner: Muehlenhoff)
[12:13:46] <wikibugs>	 (PS1) Muehlenhoff: puppetserver: Update two hooks to the variants from the puppetserver module [puppet] - https://gerrit.wikimedia.org/r/1240924 (https://phabricator.wikimedia.org/T365798)
[12:14:49] <wikibugs>	 (PS1) Hashar: jenkins: pin spec tests to Bullseye (11) to Bookworm (12) [puppet] - https://gerrit.wikimedia.org/r/1240925
[12:14:51] <icinga-wm>	 PROBLEM - Druid historical on an-druid1007 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.druid.cli.Main server historical https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[12:15:11] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS trixie
[12:16:33] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS trixie
[12:20:04] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[12:20:05] <wikibugs>	 (PS1) Ayounsi: nftables: define NETWORK_INFRA [puppet] - https://gerrit.wikimedia.org/r/1240931
[12:20:42] <wikibugs>	 (PS3) Muehlenhoff: pmacct: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/954287
[12:23:02] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:23:34] <wikibugs>	 (CR) JMeybohm: [C:+2] k8s.roll-reimage-nodes: Support exclusion of target OS version [cookbooks] - https://gerrit.wikimedia.org/r/1240725 (https://phabricator.wikimedia.org/T414417) (owner: JMeybohm)
[12:23:38] <wikibugs>	 (CR) JMeybohm: [C:+2] k8s.roll-reimage-nodes: Remove --puppet argument when calling reimage [cookbooks] - https://gerrit.wikimedia.org/r/1240755 (https://phabricator.wikimedia.org/T414417) (owner: JMeybohm)
[12:24:30] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM, merging" [puppet] - https://gerrit.wikimedia.org/r/1240925 (owner: Hashar)
[12:24:33] <wikibugs>	 (CR) Muehlenhoff: [C:+2] jenkins: pin spec tests to Bullseye (11) to Bookworm (12) [puppet] - https://gerrit.wikimedia.org/r/1240925 (owner: Hashar)
[12:24:43] <wikibugs>	 (CR) Ayounsi: [C:+1] "check experimental" [puppet] - https://gerrit.wikimedia.org/r/954287 (owner: Muehlenhoff)
[12:27:25] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:27:25] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:29:13] <wikibugs>	 (Merged) jenkins-bot: k8s.roll-reimage-nodes: Remove --puppet argument when calling reimage [cookbooks] - https://gerrit.wikimedia.org/r/1240755 (https://phabricator.wikimedia.org/T414417) (owner: JMeybohm)
[12:29:51] <wikibugs>	 (Merged) jenkins-bot: k8s.roll-reimage-nodes: Support exclusion of target OS version [cookbooks] - https://gerrit.wikimedia.org/r/1240725 (https://phabricator.wikimedia.org/T414417) (owner: JMeybohm)
[12:30:02] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:31:50] <icinga-wm>	 RECOVERY - Druid historical on an-druid1007 is OK: PROCS OK: 1 process with command name java, args org.apache.druid.cli.Main server historical https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[12:33:02] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:34:20] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on A:wikikube-staging-worker-eqiad
[12:35:54] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS trixie
[12:36:06] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
[12:37:47] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240924 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[12:39:53] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
[12:45:32] <wikibugs>	 (CR) Alex.sanford: [C:+1] "Just noting that this will also deploy https://gitlab.wikimedia.org/repos/sre/miscweb/security-landing-page/-/merge_requests/25 (Update Te" [deployment-charts] - https://gerrit.wikimedia.org/r/1240819 (https://phabricator.wikimedia.org/T415379) (owner: SBassett)
[12:46:28] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T415786)', diff saved to https://phabricator.wikimedia.org/P88922 and previous config saved to /var/cache/conftool/dbconfig/20260220-124627-marostegui.json
[12:46:32] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[12:46:45] <wikibugs>	 (CR) Brouberol: [C:+1] dse-k8s-eqiad: Add turnilo-next namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1240884 (https://phabricator.wikimedia.org/T416120) (owner: Joal)
[12:46:48] <wikibugs>	 (CR) Brouberol: [C:+2] dse-k8s-eqiad: Add turnilo-next namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1240884 (https://phabricator.wikimedia.org/T416120) (owner: Joal)
[12:47:07] <wikibugs>	 (CR) Brouberol: [C:+1] Add turnilo-next to dse-k8s-eqiad kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1240887 (https://phabricator.wikimedia.org/T416119) (owner: Joal)
[12:47:09] <wikibugs>	 (CR) Brouberol: [C:+2] Add turnilo-next to dse-k8s-eqiad kubeconfigs [puppet] - https://gerrit.wikimedia.org/r/1240887 (https://phabricator.wikimedia.org/T416119) (owner: Joal)
[12:51:06] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
[12:52:02] <wikibugs>	 SRE, serviceops, Wikibase GraphQL, Wikibase Reuse Team, Wikidata: Create a rewrite for the GraphQL endpoint on wikidata.org - https://phabricator.wikimedia.org/T417026#11635862 (LSobanski)
[12:54:00] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
[12:55:34] <wikibugs>	 SRE, serviceops, Wikibase GraphQL, Wikibase Reuse Team, and 2 others: Create a rewrite for the GraphQL endpoint on wikidata.org - https://phabricator.wikimedia.org/T417026#11635869 (taavi)
[13:01:01] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS trixie
[13:01:03] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on A:wikikube-staging-worker-codfw
[13:01:27] <wikibugs>	 (CR) Rsilvola: [C:+1] "lgtm" [deployment-charts] - https://gerrit.wikimedia.org/r/1240819 (https://phabricator.wikimedia.org/T415379) (owner: SBassett)
[13:01:36] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P88923 and previous config saved to /var/cache/conftool/dbconfig/20260220-130136-marostegui.json
[13:02:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:04:41] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:08:21] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:13:06] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS trixie
[13:15:16] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage1005.eqiad.wmnet with OS trixie
[13:16:45] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P88924 and previous config saved to /var/cache/conftool/dbconfig/20260220-131644-marostegui.json
[13:22:15] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 55711 bytes in 0.075 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:22:15] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 0.186 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:22:53] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:30:00] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
[13:31:53] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T415786)', diff saved to https://phabricator.wikimedia.org/P88925 and previous config saved to /var/cache/conftool/dbconfig/20260220-133152-marostegui.json
[13:31:58] <stashbot>	 T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[13:32:09] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[13:32:17] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1242 (T415786)', diff saved to https://phabricator.wikimedia.org/P88926 and previous config saved to /var/cache/conftool/dbconfig/20260220-133216-marostegui.json
[13:34:16] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
[13:40:04] <jinxer-wm>	 FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[13:50:12] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[13:51:18] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[13:52:20] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1005.eqiad.wmnet with OS trixie
[13:54:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[13:58:23] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.reimage for host kubestage1006.eqiad.wmnet with OS trixie
[14:03:15] <wikibugs>	 (PS1) Brouberol: idp_test: define turnilo_next client secret [labs/private] - https://gerrit.wikimedia.org/r/1240977
[14:03:24] <wikibugs>	 (CR) Brouberol: [C:+2] idp_test: define turnilo_next client secret [labs/private] - https://gerrit.wikimedia.org/r/1240977 (owner: Brouberol)
[14:03:29] <wikibugs>	 (CR) Brouberol: [V:+2 C:+2] idp_test: define turnilo_next client secret [labs/private] - https://gerrit.wikimedia.org/r/1240977 (owner: Brouberol)
[14:05:47] <wikibugs>	 (PS1) Brouberol: idp_test: define the turnilo_next service [puppet] - https://gerrit.wikimedia.org/r/1240979 (https://phabricator.wikimedia.org/T417990)
[14:08:55] <wikibugs>	 (CR) Joal: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1240979 (https://phabricator.wikimedia.org/T417990) (owner: Brouberol)
[14:09:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-esams and fe80::5e5e:ab00:d3d:83c7 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:09:31] <wikibugs>	 (CR) Brouberol: [C:+2] idp_test: define the turnilo_next service [puppet] - https://gerrit.wikimedia.org/r/1240979 (https://phabricator.wikimedia.org/T417990) (owner: Brouberol)
[14:12:47] <logmsgbot>	 !log jayme@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
[14:14:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-esams and fe80::5e5e:ab00:d3d:83c7 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:14:13] <wikibugs>	 (CR) Jgreen: [C:+1] Add spf records for civicrm and frmx hosts [dns] - https://gerrit.wikimedia.org/r/1240834 (https://phabricator.wikimedia.org/T417958) (owner: Dwisehaupt)
[14:19:15] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
[14:35:04] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[14:36:55] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1006.eqiad.wmnet with OS trixie
[14:36:58] <logmsgbot>	 !log jayme@cumin1003 END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on A:wikikube-staging-worker-eqiad
[14:43:02] <wikibugs>	 (CR) Jsn.sherman: "this totally works, but I wonder if we should create variables for revertrisk language agnostic default/fallback/whatever (naming is hard)" [mediawiki-config] - https://gerrit.wikimedia.org/r/1240672 (https://phabricator.wikimedia.org/T411485) (owner: Kgraessle)
[15:01:58] <wikibugs>	 (CR) Urbanecm: [Growth] Force legacy validation of GrowthMentorList (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1240694 (https://phabricator.wikimedia.org/T417422) (owner: Urbanecm)
[15:06:12] <wikibugs>	 (PS10) Arnaudb: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417998)
[15:08:21] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:08:56] <wikibugs>	 (CR) SBassett: "Yep." [deployment-charts] - https://gerrit.wikimedia.org/r/1240819 (https://phabricator.wikimedia.org/T415379) (owner: SBassett)
[15:09:10] <wikibugs>	 (CR) Herron: [C:+1] "Sweet! thank you!" [puppet] - https://gerrit.wikimedia.org/r/1239166 (https://phabricator.wikimedia.org/T414579) (owner: Tiziano Fogli)
[15:09:41] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:13:28] <wikibugs>	 (PS11) Hashar: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417998) (owner: Arnaudb)
[15:13:36] <wikibugs>	 (PS12) Hashar: gerrit: adapt httpd config to ATS [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417998) (owner: Arnaudb)
[15:15:58] <wikibugs>	 (CR) Hashar: "I have removed T417536 which was to deal with the incident of CI not being able to clone. It has been fully fixed by disabling the TCP con" [puppet] - https://gerrit.wikimedia.org/r/1240197 (https://phabricator.wikimedia.org/T417998) (owner: Arnaudb)
[15:16:28] <wikibugs>	 (PS1) Ssingh: P:bird::anycast: improve the code for IPv6 support (and automatically detect it) [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605)
[15:17:51] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Let's give that a shot next week, we can compare the generated set on a ferm node with an nftables node" [puppet] - https://gerrit.wikimedia.org/r/1240931 (owner: Ayounsi)
[15:18:02] <wikibugs>	 (CR) Ssingh: [V:+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8100/co" [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[15:22:46] <wikibugs>	 (PS1) Elukey: admin: set home dir for analytics-sre [puppet] - https://gerrit.wikimedia.org/r/1241004 (https://phabricator.wikimedia.org/T402512)
[15:23:12] <wikibugs>	 (CR) Elukey: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1241004 (https://phabricator.wikimedia.org/T402512) (owner: Elukey)
[15:23:39] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on kubestage2004 - https://phabricator.wikimedia.org/T416726#11636592 (Jhancock.wm) new part shipped last night
[15:23:43] <wikibugs>	 SRE, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in operations group for Rsilvola - https://phabricator.wikimedia.org/T418004#11636593 (Rsilvola)
[15:24:09] <wikibugs>	 (CR) Ssingh: [V:+1] "Yeah PCC reminds me where I am wrong. I forgot that that under a single FQDN, you can have both v4 and v6. Fixing that." [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[15:29:45] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1241004 (https://phabricator.wikimedia.org/T402512) (owner: Elukey)
[15:31:07] <icinga-wm>	 RECOVERY - Router interfaces on mr1-ulsfo is OK: OK: host 198.35.26.130, interfaces up: 37, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:34:29] <wikibugs>	 (CR) Elukey: [C:+2] admin: set home dir for analytics-sre [puppet] - https://gerrit.wikimedia.org/r/1241004 (https://phabricator.wikimedia.org/T402512) (owner: Elukey)
[15:38:21] <wikibugs>	 SRE, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11636635 (taavi)
[15:39:54] <wikibugs>	 SRE, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11636638 (taavi) +2 in deployment-charts is linked to the ability to actually deploy those changes, and as far as I can tell...
[15:42:21] <wikibugs>	 (PS2) Ssingh: P:bird::anycast: improve the code for IPv6 support (and automatically detect it) [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605)
[15:42:42] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11636656 (MatthewVernon)
[15:42:54] <wikibugs>	 (CR) CI reject: [V:-1] P:bird::anycast: improve the code for IPv6 support (and automatically detect it) [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[15:44:16] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster::base_repo [puppet] - https://gerrit.wikimedia.org/r/1240856 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:44:41] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster class and related classes [puppet] - https://gerrit.wikimedia.org/r/1240853 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:45:23] <wikibugs>	 (CR) JHathaway: [C:+1] puppetserver: Update two hooks to the variants from the puppetserver module [puppet] - https://gerrit.wikimedia.org/r/1240924 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:48:33] <wikibugs>	 SRE, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11636661 (Rsilvola) Ah, well this clarifies the situation. Thanks @taavi!
[15:59:04] <jinxer-wm>	 FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[16:04:13] <wikibugs>	 (PS1) Brouberol: define new httpd-cas image based on httpd including the cas module [docker-images/production-images] - https://gerrit.wikimedia.org/r/1241007 (https://phabricator.wikimedia.org/T417990)
[16:05:57] <wikibugs>	 (PS2) Brouberol: define new httpd-cas image based on httpd including the cas module [docker-images/production-images] - https://gerrit.wikimedia.org/r/1241007 (https://phabricator.wikimedia.org/T417990)
[16:07:52] <wikibugs>	 (PS3) Brouberol: define new httpd-cas image based on httpd including the cas module [docker-images/production-images] - https://gerrit.wikimedia.org/r/1241007 (https://phabricator.wikimedia.org/T417990)
[16:08:21] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:12:29] <wikibugs>	 (CR) Joal: [C:+1] "LGTM! thanks so much :)" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1241007 (https://phabricator.wikimedia.org/T417990) (owner: Brouberol)
[16:14:07] <wikibugs>	 (CR) Brouberol: [C:+2] define new httpd-cas image based on httpd including the cas module [docker-images/production-images] - https://gerrit.wikimedia.org/r/1241007 (https://phabricator.wikimedia.org/T417990) (owner: Brouberol)
[16:14:10] <wikibugs>	 (CR) Brouberol: [V:+2 C:+2] define new httpd-cas image based on httpd including the cas module [docker-images/production-images] - https://gerrit.wikimedia.org/r/1241007 (https://phabricator.wikimedia.org/T417990) (owner: Brouberol)
[16:20:00] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Ceph, DC-Ops: Q3:rack/setup/install apus-fe200[4-5] - https://phabricator.wikimedia.org/T416387#11636757 (MatthewVernon) OK, I understand the problem - these nodes are being UEFI booted, but the installer is setup for BIOS still.  Sorry, I've got confused be...
[16:24:30] <wikibugs>	 (CR) Mmartorana: [C:+2] Version bump security-landing-page values file [deployment-charts] - https://gerrit.wikimedia.org/r/1240819 (https://phabricator.wikimedia.org/T415379) (owner: SBassett)
[16:25:55] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: Remove mail alias/fork from dmarc-rua@wikimedia.org to dmarc@donate.wikimedia.org - https://phabricator.wikimedia.org/T417941#11636766 (Dzahn) @Jgreen I removed the dmarc@donate.wikimedia.org line from that alias.  I did not find any "dmarc-ruf@" alias in that p...
[16:26:57] <wikibugs>	 (Merged) jenkins-bot: Version bump security-landing-page values file [deployment-charts] - https://gerrit.wikimedia.org/r/1240819 (https://phabricator.wikimedia.org/T415379) (owner: SBassett)
[16:27:12] <wikibugs>	 (PS2) Dzahn: phabricator: disable dump job [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824)
[16:29:15] <wikibugs>	 (CR) Dzahn: [C:-1] "to my surprise this is noop in the compiler.." [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824) (owner: Dzahn)
[16:32:25] <wikibugs>	 (PS1) MVernon: installserver: use EFI booting for new apus frontends [puppet] - https://gerrit.wikimedia.org/r/1241010 (https://phabricator.wikimedia.org/T416387)
[16:33:21] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:41] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:35:56] <wikibugs>	 (PS1) Elukey: profile::puppetserver: rework and fix the analytics-sre config [puppet] - https://gerrit.wikimedia.org/r/1241012 (https://phabricator.wikimedia.org/T402512)
[16:36:58] <wikibugs>	 ops-eqiad, SRE, Data-Platform-SRE, DC-Ops: Q3:rack/setup/install druid-internal100[1-6] - https://phabricator.wikimedia.org/T417430#11636837 (RobH)
[16:37:44] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[16:42:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:43:33] <logmsgbot>	 jhancock@cumin2002 netbox (PID 4065054) is awaiting input
[16:45:29] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding moss-fe2021 to codfw - jhancock@cumin2002"
[16:45:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding moss-fe2021 to codfw - jhancock@cumin2002"
[16:45:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:46:06] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host moss-fe2021
[16:46:08] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host moss-fe2022
[16:46:19] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-fe2021
[16:46:21] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-fe2022
[16:46:37] <wikibugs>	 (PS1) Tiziano Fogli: meta-monitoring: add rewrite rule to redirect home to Wikitech [puppet] - https://gerrit.wikimedia.org/r/1241014 (https://phabricator.wikimedia.org/T417900)
[16:46:41] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host moss-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:47:00] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host moss-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:47:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[16:55:20] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012 (RobH) NEW p:Triage→Medium
[16:56:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012#11636926 (RobH)
[16:56:49] <wikibugs>	 (CR) Ssingh: codfw: add the following cp nodes (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1240784 (owner: CDobbins)
[16:57:10] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:57:17] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:09:53] <wikibugs>	 (PS3) Ssingh: P:bird::anycast: automatically detect IPv6 support [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605)
[17:11:20] <wikibugs>	 (CR) Ssingh: [V:+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8105/console" [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[17:12:14] <logmsgbot>	 !log sbassett@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[17:12:36] <dancy>	 🐦
[17:12:46] <logmsgbot>	 !log sbassett@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[17:13:05] <logmsgbot>	 !log sbassett@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[17:13:21] <logmsgbot>	 !log sbassett@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[17:13:37] <logmsgbot>	 !log sbassett@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[17:13:52] <logmsgbot>	 !log sbassett@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[17:13:55] <logmsgbot>	 !log sbassett@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[17:14:02] <logmsgbot>	 !log sbassett@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[17:14:05] <logmsgbot>	 !log sbassett@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[17:14:10] <logmsgbot>	 !log sbassett@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[17:14:29] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243#11637021 (Jhancock.wm)
[17:18:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host moss-fe2022.codfw.wmnet with OS bullseye
[17:18:28] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243#11637028 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host moss-fe2022.codfw.wmnet with OS bullseye
[17:18:38] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2022.codfw.wmnet with OS bullseye
[17:18:45] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243#11637029 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host moss-fe2022.codfw.wmnet with OS bullseye executed with errors: - mos...
[17:24:03] <wikibugs>	 SRE, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637038 (sbassett) (we just did this for Alex [T418015], if you'd like an example request to work from)
[17:27:40] <wikibugs>	 (PS4) Ssingh: P:bird::anycast: automatically detect IPv6 support [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605)
[17:28:02] <wikibugs>	 (CR) Ssingh: "PCC NOOP for a random selection of hosts in sudo cumin "C:bird%do_ipv6=true"" [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[17:28:40] <wikibugs>	 SRE-Access-Requests, Gerrit-Privilege-Requests: Request membership in wmf-deployment group for alex.sanford - https://phabricator.wikimedia.org/T418015#11637047 (Dzahn)
[17:28:56] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637048 (Dzahn)
[17:29:44] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests: Request membership in wmf-deployment group for alex.sanford - https://phabricator.wikimedia.org/T418015#11637063 (Dzahn) Being able to +2 in the repo should be combined with being able to actually deploy changes.  Which then turns it into a request...
[17:29:54] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637065 (Dzahn) Being able to +2 in the repo should be combined with being able to actually deploy...
[17:30:02] <wikibugs>	 (CR) Ssingh: [V:+1] "PCC SUCCESS (DIFF 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8106/console" [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[17:30:39] <wikibugs>	 (CR) Ssingh: "Adding a few relevant folks for the review." [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[17:30:49] <wikibugs>	 (CR) Ssingh: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1241003 (https://phabricator.wikimedia.org/T81605) (owner: Ssingh)
[17:40:53] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests: Request membership in wmf-deployment group for alex.sanford - https://phabricator.wikimedia.org/T418015#11637097 (ASanford-WMF) I am already in the `deployment` shell group - https://phabricator.wikimedia.org/source/operations-puppet/browse/product...
[17:45:46] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018 (RobH) NEW
[17:46:02] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11637114 (RobH)
[17:47:46] <mutante>	 !log gerrit added Alex Sanford to wmf-deployment group - already has deployment shell group T418015
[17:47:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:51] <stashbot>	 T418015: Request membership in wmf-deployment group for alex.sanford - https://phabricator.wikimedia.org/T418015
[17:48:01] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11637117 (RobH) @ayounsi or @papaul: Would one of you be best suited to provide the cable diagram and matrix so we know how/where each switch is conne...
[17:49:04] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests: Request membership in wmf-deployment group for alex.sanford - https://phabricator.wikimedia.org/T418015#11637118 (Dzahn) @ASanford-WMF Oh! I see, yes. Sorry about that. I just added you to the Gerrit group as requested.
[17:50:50] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests: Request membership in wmf-deployment group for alex.sanford - https://phabricator.wikimedia.org/T418015#11637119 (ASanford-WMF) Open→Resolved a:ASanford-WMF Great, thanks! 🙌
[17:54:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[18:14:04] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[18:40:25] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11637204 (RobH)
[18:42:55] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host moss-fe2021.codfw.wmnet with OS bullseye
[18:43:06] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243#11637213 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host moss-fe2021.codfw.wmnet with OS bullseye
[18:43:15] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2021.codfw.wmnet with OS bullseye
[18:43:24] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243#11637226 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host moss-fe2021.codfw.wmnet with OS bullseye executed with errors: - mos...
[18:47:57] <jinxer-wm>	 FIRING: ProbeDown: Service text-https:443 has failed probes (http_text-https_ip6) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:48:26] <sukhe>	 huh
[18:48:28] <sukhe>	 esams
[18:48:29] <sukhe>	 !ack
[18:48:30] <sirenbot>	 7461 (ACKED)  ProbeDown sre (2a02:ec80:300:ed1a::1 ip6 text-https:443 probes/service http_text-https_ip6 esams)
[18:48:30] <Amir1>	 here
[18:48:41] <Amir1>	 faster than the person oncall :D
[18:48:42] <sukhe>	 just ipv6 huh
[18:48:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:49:00] <sukhe>	 no not really
[18:49:01] <sukhe>	 ok
[18:49:08] <cdanis>	 sukhe: Amir1: we have rising NELs as well
[18:49:29] <cdanis>	 https://logstash.wikimedia.org/goto/74d8bb4b0159485a083635ceeb7941e9
[18:49:32] <sukhe>	 cdanis: Amir1: see noc@ email
[18:49:39] <Amir1>	 https://grafana.wikimedia.org/d/000000500/varnish-caching?orgId=1&from=now-1h&to=now&timezone=utc&var-cluster=$__all&var-site=esams&var-status=1&var-status=2&var-status=3&var-status=4&var-status=5&refresh=15m
[18:49:46] <Amir1>	 raw req is going down
[18:50:15] <wikibugs>	 (CR) Dwisehaupt: [C:+2] Add spf records for civicrm and frmx hosts [dns] - https://gerrit.wikimedia.org/r/1240834 (https://phabricator.wikimedia.org/T417958) (owner: Dwisehaupt)
[18:50:30] <jinxer-wm>	 FIRING: LibericaUnhealthyRealserverPooled: Liberica service text-httpslb6_443 has 4 unhealthy realservers pooled on lvs3010:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled - https://grafana.wikimedia.org/d/d70d14db-4a71-414d-8425-7a30d7127ca6/liberica-services?orgId=1&var-site=esams&var-service=text-httpslb6_443&var-instance=lvs3010 - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[18:50:39] <logmsgbot>	 !log dwisehaupt@dns1004 START - running authdns-update
[18:52:08] <wikibugs>	 (CR) Xcollazo: "At some other time, to disable they `absent`ed the job: I3fc05f5" [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824) (owner: Dzahn)
[18:52:11] <logmsgbot>	 !log dwisehaupt@dns1004 END - running authdns-update
[18:55:30] <jinxer-wm>	 FIRING: [8x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 6 unhealthy realservers pooled on lvs3008:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[18:56:51] <jinxer-wm>	 FIRING: SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://en.wikipedia.org/api/rest_v1 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=esams - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[18:57:10] <wikibugs>	 (CR) Dwisehaupt: [C:+1] "Looks good. Thanks!" [puppet] - https://gerrit.wikimedia.org/r/1240889 (owner: Muehlenhoff)
[18:57:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:57:58] <jinxer-wm>	 FIRING: NELHigh: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[18:58:10] <Amir1>	 !ack
[18:58:11] <sirenbot>	 7464 (ACKED)  NELHigh sre (thanos-rule@main tcp.timed_out)
[18:58:21] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:00:51] <icinga-wm>	 PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp3068 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[19:01:55] <icinga-wm>	 RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp3068 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2026-05-07 21:41:31 +0000 (expires in 76 days) https://wikitech.wikimedia.org/wiki/HTTPS
[19:02:13] <icinga-wm>	 PROBLEM - HAProxy HTTPS wikiworkshop.org ECDSA on cp3071 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[19:02:58] <jinxer-wm>	 FIRING: NELByCountryHigh: Elevated Network Error Logging events (tcp.timed_out from RU) - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELByCountryHigh
[19:03:29] <icinga-wm>	 RECOVERY - HAProxy HTTPS wikiworkshop.org ECDSA on cp3071 is OK: SSL OK - Certificate wikiworkshop.org contains all required SANs:Certificate wikiworkshop.org (ECDSA) valid until 2026-05-13 04:44:41 +0000 (expires in 81 days) https://wikitech.wikimedia.org/wiki/HTTPS
[19:06:38] <wikibugs>	 (PS1) Ssingh: config/sites: prepend_as_out to true for esams [homer/public] - https://gerrit.wikimedia.org/r/1241024
[19:07:07] <icinga-wm>	 PROBLEM - HAProxy HTTPS wikiworkshop.org ECDSA on cp3069 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[19:07:14] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.network.cf
[19:07:14] <logmsgbot>	 !log sukhe@cumin1003 END (FAIL) - Cookbook sre.network.cf (exit_code=1)
[19:07:32] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.network.cf
[19:07:32] <logmsgbot>	 !log sukhe@cumin1003 END (FAIL) - Cookbook sre.network.cf (exit_code=1)
[19:07:58] <jinxer-wm>	 FIRING: [3x] NELByCountryHigh: Elevated Network Error Logging events (tcp.timed_out from DE) - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELByCountryHigh
[19:08:23] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.network.cf
[19:08:23] <logmsgbot>	 !log sukhe@cumin1003 END (FAIL) - Cookbook sre.network.cf (exit_code=1)
[19:10:05] <icinga-wm>	 RECOVERY - HAProxy HTTPS wikiworkshop.org ECDSA on cp3069 is OK: SSL OK - Certificate wikiworkshop.org contains all required SANs:Certificate wikiworkshop.org (ECDSA) valid until 2026-05-13 04:44:41 +0000 (expires in 81 days) https://wikitech.wikimedia.org/wiki/HTTPS
[19:11:15] <wikibugs>	 (PS2) Ssingh: config/sites: prepend_as_out to true for drmrs [homer/public] - https://gerrit.wikimedia.org/r/1241024
[19:12:15] <icinga-wm>	 PROBLEM - HAProxy HTTPS wikipedia25.org ECDSA on cp3072 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS
[19:12:16] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.network.cf
[19:12:16] <logmsgbot>	 !log sukhe@cumin1003 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[19:12:27] <wikibugs>	 (PS1) Bernard Wang: Migrate default user preference configuration to Community Configuration [extensions/WP25EasterEggs] (wmf/1.46.0-wmf.16) - https://gerrit.wikimedia.org/r/1241026 (https://phabricator.wikimedia.org/T415355)
[19:12:52] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
[19:13:15] <icinga-wm>	 RECOVERY - HAProxy HTTPS wikipedia25.org ECDSA on cp3072 is OK: SSL OK - Certificate wikipedia25.org contains all required SANs:Certificate wikipedia25.org (ECDSA) valid until 2026-04-07 07:52:16 +0000 (expires in 45 days) https://wikitech.wikimedia.org/wiki/HTTPS
[19:13:19] <wikibugs>	 (CR) Ssingh: [V:+2 C:+2] config/sites: prepend_as_out to true for drmrs [homer/public] - https://gerrit.wikimedia.org/r/1241024 (owner: Ssingh)
[19:14:18] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11637304 (Shawn) Why are those 429 errors unpredictable?  All this afternoon LiveRC (real time monitoring tool of recent changes on...
[19:14:34] <logmsgbot>	 !log sukhe@cumin1003 END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site esams [reason: no reason specified, ]
[19:16:22] <logmsgbot>	 !log sukhe@cumin1003 START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
[19:16:24] <logmsgbot>	 !log sukhe@cumin1003 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: no reason specified, ]
[19:18:21] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:21:51] <jinxer-wm>	 RESOLVED: SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://en.wikipedia.org/api/rest_v1 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=esams - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[19:22:57] <jinxer-wm>	 FIRING: [3x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:22:58] <jinxer-wm>	 RESOLVED: [3x] NELByCountryHigh: Elevated Network Error Logging events (tcp.timed_out from DE) - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELByCountryHigh
[19:23:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:25:30] <jinxer-wm>	 FIRING: [16x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 4 unhealthy realservers pooled on lvs3008:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[19:27:57] <jinxer-wm>	 RESOLVED: [4x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:27:58] <jinxer-wm>	 RESOLVED: NELHigh: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[19:30:30] <jinxer-wm>	 RESOLVED: [16x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 4 unhealthy realservers pooled on lvs3008:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[19:39:24] <wikibugs>	 (PS1) Aklapper: ProdPasteBot: Call paste.edit instead of deprecated paste.create [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572)
[19:40:58] <wikibugs>	 (CR) Aklapper: "Note that this is how I imagine Python, and that this is untested." [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572) (owner: Aklapper)
[19:41:26] <wikibugs>	 (CR) CI reject: [V:-1] ProdPasteBot: Call paste.edit instead of deprecated paste.create [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572) (owner: Aklapper)
[19:47:04] <wikibugs>	 (CR) Aklapper: [C:-1] "Yeah, "self.phab" is not how to get the host URI here" [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572) (owner: Aklapper)
[19:53:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:54:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:55:19] <_joe_>	 !ack
[19:55:20] <sirenbot>	 7465 (ACKED)  [2x] ProbeDown sre (text-https:443 probes/service drmrs)
[19:55:31] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637409 (Dzahn) a:thcipriani Hi Tyler, they would need both shell "deployment" and gerrit "wmf...
[19:58:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:59:26] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637416 (Dzahn) Well.. wait.   I am saying that but it's actually just about access to deployment...
[19:59:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:01:04] <mutante>	 cccccbukvgbchetkjblbetcvjefrerkgdrthgbbdijvt
[20:06:32] <wikibugs>	 (PS6) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:06:41] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637421 (thcipriani) k8s deployment does require `deployment` for read access to the configs to us...
[20:08:01] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637423 (Dzahn) a:thcipriani→None Thanks for that confirmation and approval:)
[20:08:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:10:30] <jinxer-wm>	 FIRING: [4x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 5 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:10:50] <wikibugs>	 (PS7) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:13:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:13:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:14:26] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637434 (Dzahn) @Rsilvola So.. there are 2 things you need. The shell access to the deployment ser...
[20:14:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:15:07] <wikibugs>	 (PS8) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:15:30] <jinxer-wm>	 RESOLVED: [8x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 4 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:16:30] <jinxer-wm>	 FIRING: [4x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 2 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:17:06] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637440 (Dzahn)
[20:18:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:19:16] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11637442 (Shawn) I'm not sure if I understood properly the issue because I tried to fix LiveRC by changing size of icons but I stil...
[20:19:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:20:27] <wikibugs>	 SRE, SRE-Access-Requests, Gerrit-Privilege-Requests, Release-Engineering-Team, Security-Team: Request membership in wmf-deployment group for Rsilvola - https://phabricator.wikimedia.org/T418004#11637446 (Dzahn) @Rsilvola I copied the template for shell access requests into the ticket descript...
[20:21:30] <jinxer-wm>	 RESOLVED: [5x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 2 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:22:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:24:29] <wikibugs>	 (PS9) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:24:30] <jinxer-wm>	 FIRING: [4x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 6 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:26:51] <jinxer-wm>	 FIRING: [2x] SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://en.wikipedia.org/api/rest_v1  - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[20:28:21] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job probes/swagger in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:29:30] <jinxer-wm>	 FIRING: [8x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 6 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:31:51] <jinxer-wm>	 RESOLVED: [2x] SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://en.wikipedia.org/api/rest_v1  - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[20:32:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:33:21] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job probes/swagger in ops@drmrs - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:34:30] <jinxer-wm>	 RESOLVED: [8x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 6 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[20:34:56] <wikibugs>	 (PS10) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:37:26] <wikibugs>	 (PS11) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:45:33] <wikibugs>	 (PS12) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:49:59] <wikibugs>	 (PS13) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[20:57:11] <wikibugs>	 (CR) Dzahn: [C:-1] "That's true and thanks for the link! but $dump_job_ensure is supposed to be set by $dump being true or false." [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824) (owner: Dzahn)
[20:57:27] <wikibugs>	 (PS14) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[21:01:56] <wikibugs>	 (PS3) Dzahn: phabricator: disable dump job [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824)
[21:02:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:04:30] <jinxer-wm>	 FIRING: [4x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 5 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[21:05:31] <wikibugs>	 (PS15) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[21:06:51] <jinxer-wm>	 FIRING: [2x] SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://en.wikipedia.org/api/rest_v1  - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[21:08:21] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:09:30] <jinxer-wm>	 FIRING: [8x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 5 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[21:09:41] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:10:34] <logmsgbot>	 !log eevans@cumin1003 START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: no reason specified, ]
[21:10:37] <logmsgbot>	 !log eevans@cumin1003 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: no reason specified, ]
[21:10:45] <wikibugs>	 (PS16) CDobbins: varnish: clean up Content-Security-Policy header [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618)
[21:11:51] <jinxer-wm>	 RESOLVED: [2x] SwaggerProbeHasFailures: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://en.wikipedia.org/api/rest_v1  - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[21:12:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:13:03] <wikibugs>	 (CR) CDobbins: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240799 (https://phabricator.wikimedia.org/T117618) (owner: CDobbins)
[21:13:21] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:14:30] <jinxer-wm>	 RESOLVED: [6x] LibericaUnhealthyRealserverPooled: Liberica service gerrit-httpslb6_443 has 5 unhealthy realservers pooled on lvs6001:3003 - https://wikitech.wikimedia.org/wiki/Liberica#LibericaUnhealthyRealserverPooled  - https://alerts.wikimedia.org/?q=alertname%3DLibericaUnhealthyRealserverPooled
[21:21:43] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11637548 (Tacsipacsi) >>! In T414805#11636273, @Ladsgroup wrote: > The rate limit is at the edge layer so it doesn't know whether i...
[21:28:38] <wikibugs>	 (PS3) CDobbins: codfw: add the following cp nodes [puppet] - https://gerrit.wikimedia.org/r/1240784
[21:29:31] <wikibugs>	 (CR) Xcollazo: [C:+1] "Ah, I see it now." [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824) (owner: Dzahn)
[21:29:50] <wikibugs>	 (CR) CDobbins: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1240784 (owner: CDobbins)
[21:30:22] <wikibugs>	 (CR) Dzahn: [V:+1 C:+1] "Done" [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824) (owner: Dzahn)
[21:30:49] <wikibugs>	 (CR) Dzahn: [V:+1 C:+2] phabricator: disable dump job [puppet] - https://gerrit.wikimedia.org/r/1240778 (https://phabricator.wikimedia.org/T417824) (owner: Dzahn)
[21:33:21] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:38:33] <wikibugs>	 (CR) CDobbins: codfw: add the following cp nodes (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1240784 (owner: CDobbins)
[21:47:05] <wikibugs>	 (CR) CDobbins: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8119/co" [puppet] - https://gerrit.wikimedia.org/r/1240784 (owner: CDobbins)
[21:54:41] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[22:03:11] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on kubestage2004 - https://phabricator.wikimedia.org/T418027 (ops-monitoring-bot) NEW
[22:06:23] <wikibugs>	 (PS25) CDobbins: prometheus: add pooled host check [puppet] - https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)
[22:06:59] <wikibugs>	 (CR) CI reject: [V:-1] prometheus: add pooled host check [puppet] - https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: CDobbins)
[22:07:16] <wikibugs>	 (CR) CDobbins: prometheus: add pooled host check (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: CDobbins)
[22:09:41] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:12:14] <wikibugs>	 (PS26) CDobbins: prometheus: add pooled host check [puppet] - https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641)
[22:13:09] <wikibugs>	 (PS2) Aklapper: ProdPasteBot: Call paste.edit instead of deprecated paste.create [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572)
[22:13:21] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:13:42] <wikibugs>	 (CR) CDobbins: "I'm sorry, I don't remember the answer to this question, but I should be able to install it on a puppetserver, right?" [puppet] - https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: CDobbins)
[22:15:54] <wikibugs>	 (CR) CDobbins: [V:+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8120/console" [puppet] - https://gerrit.wikimedia.org/r/1219634 (https://phabricator.wikimedia.org/T406641) (owner: CDobbins)
[22:21:37] <wikibugs>	 (PS1) Eevans: Revert "config/sites: prepend_as_out to true for drmrs" [homer/public] - https://gerrit.wikimedia.org/r/1241040
[22:28:24] <wikibugs>	 (CR) BBlack: [C:+2] Revert "config/sites: prepend_as_out to true for drmrs" [homer/public] - https://gerrit.wikimedia.org/r/1241040 (owner: Eevans)
[22:29:07] <wikibugs>	 (CR) BBlack: [C:+1] "+1 :)" [homer/public] - https://gerrit.wikimedia.org/r/1241040 (owner: Eevans)
[22:29:41] <wikibugs>	 (CR) Eevans: [C:+2] Revert "config/sites: prepend_as_out to true for drmrs" [homer/public] - https://gerrit.wikimedia.org/r/1241040 (owner: Eevans)
[22:29:44] <wikibugs>	 (Merged) jenkins-bot: Revert "config/sites: prepend_as_out to true for drmrs" [homer/public] - https://gerrit.wikimedia.org/r/1241040 (owner: Eevans)
[22:35:01] <logmsgbot>	 !log eevans@cumin1003 START - Cookbook sre.network.cf
[22:35:07] <logmsgbot>	 !log eevans@cumin1003 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[22:35:41] <wikibugs>	 (CR) Dzahn: ProdPasteBot: Call paste.edit instead of deprecated paste.create (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572) (owner: Aklapper)
[22:38:26] <wikibugs>	 (CR) Paladox: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1240703 (owner: Muehlenhoff)
[22:39:10] <wikibugs>	 (CR) CI reject: [V:-1] Run Gerrit spec tests on Bullseye/Bookworm [puppet] - https://gerrit.wikimedia.org/r/1240703 (owner: Muehlenhoff)
[22:40:50] <wikibugs>	 (CR) Paladox: Run Gerrit spec tests on Bullseye/Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1240703 (owner: Muehlenhoff)
[22:55:22] <wikibugs>	 (PS1) Eevans: cassandra: enable use of Java 17 [puppet] - https://gerrit.wikimedia.org/r/1241042
[22:56:57] <wikibugs>	 (PS2) Eevans: cassandra: enable use of Java 17 [puppet] - https://gerrit.wikimedia.org/r/1241042
[22:57:31] <wikibugs>	 (PS3) Eevans: cassandra: enable use of Java 17 [puppet] - https://gerrit.wikimedia.org/r/1241042 (https://phabricator.wikimedia.org/T418010)
[22:58:08] <wikibugs>	 (CR) CI reject: [V:-1] cassandra: enable use of Java 17 [puppet] - https://gerrit.wikimedia.org/r/1241042 (https://phabricator.wikimedia.org/T418010) (owner: Eevans)
[23:02:38] <logmsgbot>	 !log eevans@cumin1003 START - Cookbook sre.network.cf
[23:02:38] <logmsgbot>	 !log eevans@cumin1003 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[23:03:24] <logmsgbot>	 !log eevans@cumin1003 START - Cookbook sre.network.cf
[23:03:25] <logmsgbot>	 !log eevans@cumin1003 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[23:06:37] <wikibugs>	 (CR) Aklapper: ProdPasteBot: Call paste.edit instead of deprecated paste.create (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1241027 (https://phabricator.wikimedia.org/T410572) (owner: Aklapper)
[23:44:07] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11637743 (Papaul) The first step was completed by remote hands yesterday but the port number and the cable ID's were not given to me so I just got the informati...
[23:47:30] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Spam filtering rules for mediawiki-api@lists.wikimedia.org failing - https://phabricator.wikimedia.org/T418028#11637744 (Quiddity) I sent this to that mailing-list's owner list last week: > You're getting a lot of spam at this mailing list, over the last few months. I wonder...