[06:34:17] FYI, doh6002 and durum6002 will alert for a bit to complete the reimages of Ganeti/B13
[06:47:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[06:52:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[07:03:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[07:08:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[07:12:23] FIRING: [2x] SLOMetricAbsent: haproxy-combined - https://slo.wikimedia.org/?search=haproxy-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[07:13:21] FIRING: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[07:14:02] FIRING: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[08:07:23] RESOLVED: [2x] SLOMetricAbsent: haproxy-combined - https://slo.wikimedia.org/?search=haproxy-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[08:08:21] RESOLVED: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[08:09:02] RESOLVED: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[08:13:57] Traffic, Maps, SRE: Allow Wikimedia Maps usage on Wikidata for Firefox (Browser extension) - https://phabricator.wikimedia.org/T398588 (Shisma) NEW
[08:18:03] durum5001 and doh6001 will briefly go down
[08:22:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[08:27:00] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[08:28:25] FIRING: SystemdUnitFailed: anycast-healthchecker.service on durum6001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:32:00] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[08:33:25] RESOLVED: SystemdUnitFailed: anycast-healthchecker.service on durum6001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:27:04] Traffic: Consider using the alternative chain of Google Trust Services certificates - https://phabricator.wikimedia.org/T398596 (Vgutierrez) NEW
[09:27:13] Traffic: Consider using the alternative chain of Google Trust Services certificates - https://phabricator.wikimedia.org/T398596#10971587 (Vgutierrez) p:Triage→Medium
[09:32:30] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Codfw: management down to racks D3 and D8 (switch port down) - https://phabricator.wikimedia.org/T398598 (cmooney) NEW p:Triage→High
[09:34:44] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Codfw: management down to racks D3 and D8 (switch port down) - https://phabricator.wikimedia.org/T398598#10971628 (cmooney)
[10:17:01] netops, Infrastructure-Foundations, netbox, SRE: Decom cookbook: delete virtual interfaces from device - https://phabricator.wikimedia.org/T398412#10971813 (Volans) Option 2 LGTM too
[10:22:37] netops, Infrastructure-Foundations, netbox, SRE: Decom cookbook: delete virtual interfaces from device - https://phabricator.wikimedia.org/T398412#10971847 (cmooney)
[10:59:20] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: VC link from asw2-c4-eqiad to asw2-c7-eqiad flapping - https://phabricator.wikimedia.org/T398612 (cmooney) NEW p:Triage→High
[13:17:48] doh6001/durum6001 will go down/alert for a bit
[13:17:56] moritzm: thanks, no worries
[13:28:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:33:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:35:40] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: VC link from asw2-c4-eqiad to asw2-c7-eqiad flapping - https://phabricator.wikimedia.org/T398612#10972622 (Jclark-ctr) @cmooney i am available to assist
[13:39:30] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:40:51] FIRING: FermMSS: Unexpected MSS value on 10.2.1.44:443 @ registry2004 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=misc - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[13:41:20] ^ reboot, so expected
[13:41:22] for registry2004
[13:44:30] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:45:51] RESOLVED: [2x] FermMSS: Unexpected MSS value on 10.2.1.44:443 @ registry2004 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=misc - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[13:56:43] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: VC link from asw2-c4-eqiad to asw2-c7-eqiad flapping - https://phabricator.wikimedia.org/T398612#10972687 (cmooney) @Jclark-ctr has replaced the optics both side of the link. Link is up and light levels healthy, we'll see how it goe...
[13:57:29] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: VC link from asw2-c4-eqiad to asw2-c7-eqiad flapping - https://phabricator.wikimedia.org/T398612#10972688 (Jclark-ctr) Replaced both optics no spares on site now at eqiad
[13:58:07] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: VC link from asw2-c4-eqiad to asw2-c7-eqiad flapping - https://phabricator.wikimedia.org/T398612#10972689 (Jclark-ctr) sr4 optics black handle @RobH
[14:12:37] Traffic, Page Content Service, Wikipedia-Android-App-Backlog, Content-Transform-Team (Work In Progress): [[2025 Coeur d'Alene shooting]] showing old version in Android app - https://phabricator.wikimedia.org/T398243#10972786 (MSantos)
[14:15:44] Traffic, collaboration-services, SRE: Document how to deploy changes to DNS repo without Gerrit working - https://phabricator.wikimedia.org/T336754#10972817 (ABran-WMF)
[14:37:24] Traffic, collaboration-services, SRE: Document how to deploy changes to DNS repo without Gerrit working - https://phabricator.wikimedia.org/T336754#10972892 (ssingh) Happy to collaborate on this, FWIW.
[15:10:50] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10973012 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=18b0e287-37ef-45af-8eea-bfbccbf5c316) set by vgutierrez@cumin1002 for 1 day, 0:00:00 on 1 ho...
[15:22:23] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10973044 (Vgutierrez)
[17:24:48] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: Link down between cr3-ulsfo and cr4-ulsfo - https://phabricator.wikimedia.org/T390731#10973383 (cmooney) FWIW I seen an interesting talk from the latest Nanog conference about "return loss" on shorter and faster links which can c...