[06:16:48] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510 (Papaul) NEW
[06:43:12] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511 (Papaul) NEW
[06:43:42] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11317386 (Papaul) p:Triage→Medium
[06:43:54] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11317387 (Papaul) p:Triage→Medium
[07:56:00] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11317482 (cmooney) @papaul looks good! Nothing jumping out at me as problematic in terms of the connectivity plan. I don't think it makes sense to use 40G tho...
[09:41:00] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar), WMF-NDA: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532 (LSobanski) NEW
[09:43:42] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar), WMF-NDA: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11317892 (LSobanski) p:Triage→High
[09:44:21] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar), WMF-NDA: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11317895 (LSobanski)
[10:14:25] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T407833#11318022 (cmooney) Open→Resolved I removed these additional sessions last week but got distracted and didn't come back to edi...
[12:08:43] FIRING: [4x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[12:13:43] FIRING: [18x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[12:18:43] FIRING: [18x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[12:23:43] FIRING: [18x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[12:28:43] RESOLVED: [18x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[13:18:28] Traffic, DC-Ops, ops-codfw: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549 (ssingh) NEW
[13:18:36] Traffic, DC-Ops, ops-codfw: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549#11318574 (ssingh) p:Triage→High
[13:35:59] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11318699 (LSobanski)
[13:39:42] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11318700 (Papaul) @cmooney thanks for the feedback, I will upgrade the diagram to match the 100G links between the core routers and the switches and the type of...
[14:27:21] Traffic, DC-Ops, ops-codfw, SRE: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549#11318894 (Jhancock.wm) logged into idrac and found following error. ` A critical diagnostic event occurred in the memory device at B2. Contact your service provider for assistance in...
[14:30:58] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11318918 (tappof) I saw the alerts on the ALERTS metric: https://w.wiki/FqSi . I think there was a silence rule in place, so you didn't get any notifications....
[14:31:46] Traffic, DC-Ops, ops-codfw, SRE: lvs2011 hardware issue after reboot - https://phabricator.wikimedia.org/T408549#11318932 (ssingh) Open→Resolved a:ssingh Thanks for the help @Jhancock.wm. Marking this as resolved for now.
[14:38:23] hello traffic friends - I have another haproxy config change to roll out to A:cp hosts. any concerns if I move ahead with that at some point soon?
[14:46:59] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378#11319051 (cmooney) >>! In T408378#11318918, @tappof wrote: > I saw the alerts on the ALERTS metric: https://w.wiki/FqSi . Ok thanks for that! That is a good...
[15:04:14] Traffic, SRE, FY2025-26 WE3.3 Engaging core audiences, Reader Experience Team (REx Sprint 8 [Q2 Oct 21-Nov 3]): [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#11319191 (Jdrewniak) When I talked to #traffic about this topic...
[15:29:18] * swfrench-wmf is doing this now
[16:02:54] * swfrench-wmf is done
[16:06:52] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: MAGRU power maint - CHG0262056 - October 29-30, 2025 - https://phabricator.wikimedia.org/T408589#11319592 (RobH) @netops & #traffic: I don't expect any impact from this according to the notification but just FYI!
[17:13:01] Traffic, SRE, FY2025-26 WE3.3 Engaging core audiences, Reader Experience Team (REx Sprint 8 [Q2 Oct 21-Nov 3]): [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#11320228 (CDanis) Sounds good to me @Jdrewniak ! Thanks :)
[17:25:47] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11320304 (RobH) [[ https://docs.google.com/spreadsheets/d/13ow4JxrsQdz8KSsdBBNwvlrAuGKo8OHWcnR4RhXTYc0/edit?usp=sharing | Google Sheet listing of all affect...
[17:38:27] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11320374 (RobH)
[17:38:34] Traffic: Containerize ncmonitor - https://phabricator.wikimedia.org/T408617 (BCornwall) NEW
[17:38:46] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11320386 (RobH)
[17:38:54] Traffic: Containerize ncmonitor - https://phabricator.wikimedia.org/T408617#11320388 (BCornwall) p:Triage→Low
[18:32:06] Traffic: ncmonitor should verify that DNSSEC is disabled in MarkMonitor - https://phabricator.wikimedia.org/T402961#11320592 (BCornwall) Open→In progress
[22:22:25] Traffic, Experimentation Lab, Essential-Work: Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11321481 (JVanderhoop-WMF)
[23:48:57] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11321730 (Papaul)