[06:39:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp5032:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[06:43:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:44:00] RESOLVED: PurgedHighBacklogQueue: Large backlog queue for purged on cp5032:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[06:48:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:38:40] FIRING: [2x] VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:58:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:50:42] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11253656 (elukey) All cp hosts (but 2056) have the latest bios+idrac and I've run the complete version of the provision cookbook to apply the whole set of BIO...
[09:02:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp3073:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=esams&var-instance=cp3073 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:07:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp3073:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=esams&var-instance=cp3073 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:51:56] netops, Infrastructure-Foundations, SRE: cr1-esams: MPC7E 3D 40XGE line card in slot 0 failure [Oct 2025] - https://phabricator.wikimedia.org/T406705 (cmooney) NEW p:Triage→High
[09:56:57] Traffic, Beta-Cluster-Infrastructure: Copy the Traffic team on alerts for deployment-cache* hosts - https://phabricator.wikimedia.org/T406650#11253944 (Ladsgroup) >>! In T406650#11252791, @bd808 wrote: >>>! In T406650#11252612, @bd808 wrote: >>>>! In T406650#11252502, @Aklapper wrote: >>> I don't know th...
[10:14:46] netops, Infrastructure-Foundations, SRE: cr1-esams: MPC7E 3D 40XGE line card in slot 0 failure [Oct 2025] - https://phabricator.wikimedia.org/T406705#11253997 (cmooney) JTAC Case 2025-1008-891506 raised.
[10:37:33] netops, Infrastructure-Foundations, SRE: cr1-esams: MPC7E 3D 40XGE line card in slot 0 failure [Oct 2025] - https://phabricator.wikimedia.org/T406705#11254135 (cmooney) Typical lackluster from Juniper. After finally looking at the logs they requested we re-seat the card, so I will work to create a r...
[11:34:01] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr2-eqiad: fan failure on left tray [Oct 2025] - https://phabricator.wikimedia.org/T406554#11254376 (Jclark-ctr) Replaced fan modular with spare from storage room. Pending tac ticket with juniper
[11:35:28] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr2-eqiad: fan failure on left tray [Oct 2025] - https://phabricator.wikimedia.org/T406554#11254391 (Jclark-ctr) Verified fan speeds compared with cr1 ` jclark@re0.cr1-eqiad> show chassis fan Item Status...
[11:40:11] Traffic, DNS, SRE: Migrate PDNS recursor config to use /etc/powerdns/recursor.d ? - https://phabricator.wikimedia.org/T389333#11254526 (MoritzMuehlenhoff) The config shipped in Debian trixie is already following that scheme and very minimal, Debian only ships this: ` dnssec: # validation: process...
[11:52:08] netops, Infrastructure-Foundations, SRE: cr1-esams: MPC7E 3D 40XGE line card in slot 0 failure [Oct 2025] - https://phabricator.wikimedia.org/T406705#11254678 (cmooney) Remote hands request CS3302125 has been raised to the Digital Realty staff on site in AMS9 Science Park.
[13:09:41] Traffic, Observability-Alerting, Patch-For-Review: Port DNS icinga checks to Alertmanager - https://phabricator.wikimedia.org/T384425#11255035 (ssingh)
[13:31:41] netops, Infrastructure-Foundations, SRE: cr1-esams: MPC7E 3D 40XGE line card in slot 0 failure [Oct 2025] - https://phabricator.wikimedia.org/T406705#11255104 (cmooney) Looks like the card re-seat did the trick: ` cmooney@re0.cr1-esams> show chassis fpc 0 detail Slot 0 information: State...
[14:04:51] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: Move lvs1020 link from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T404959#11255300 (VRiley-WMF) Okay, was looking at this issue a bit. There are currently two fiber cables involved with this process. After...
[14:20:50] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr2-eqiad: fan failure on left tray [Oct 2025] - https://phabricator.wikimedia.org/T406554#11255372 (cmooney) Open→Resolved Thanks @Jclark-ctr. As you say it seems the one that has gone in is the same model as came out....
[14:23:48] netops, Infrastructure-Foundations, SRE: cr1-esams: MPC7E 3D 40XGE line card in slot 0 failure [Oct 2025] - https://phabricator.wikimedia.org/T406705#11255395 (cmooney) Open→Resolved esams has been re-pooled and traffic levels have returned to normal for the site. closing this task now,...
[14:40:53] going to start the group1 rollout for rest.php migration in the next few minutes, it'll involve toggling puppet on A:cp for a minute - let me know if I should hold
[14:41:37] no issues from our end I think
[14:44:13] thanks!
[14:50:05] brett: sorry I should have updated you last night, we didn't touch that cable coming from lvs1020 as there was some complications with how the fibres were routed on site
[14:50:29] I think we will probably re-attempt it today, having fully worked out the current path and best way to change it
[14:51:30] TL;DR the fibre goes via an intermediary patch-panel which makes the change a little more involved, but still no big deal just the way it physically goes
[14:55:53] reenabled puppet
[15:05:47] Traffic, Phabricator, Release-Engineering-Team (Radar): Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661#11255569 (Aklapper) @Vgutierrez: Would you have any shareable opinions how to proceed? Can we do anything on our side here, a...
[15:08:56] thanks for the update!
[15:18:23] Traffic, DNS, SRE: Migrate PDNS recursor config to use /etc/powerdns/recursor.d ? - https://phabricator.wikimedia.org/T389333#11255648 (ssingh) Thanks @MoritzMuehlenhoff, that sounds like a good plan to me but leaving to @CDobbins for the final word.
[16:15:13] Traffic, Phabricator, Release-Engineering-Team (Radar): Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661#11255927 (Vgutierrez) Taking a second look at the curl reproducer, I'm seeing the following behavior: First request using: `...
[16:36:17] Traffic, Phabricator, Release-Engineering-Team (Radar): Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661#11256047 (Vgutierrez) following up on my last comment, the webm file size is 1660261 bytes, so a request asking for a range st...
[17:45:45] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11256390 (RobH)
[17:46:01] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11256391 (RobH)
[18:33:51] Varnish HTTP 30x Int responses, nicely fading away
[18:33:54] https://grafana.wikimedia.org/d/000000500/varnish-caching?orgId=1&from=now-30d&to=now&timezone=utc&var-cluster=$__all&var-site=$__all&var-status=3&refresh=15m&viewPanel=panel-7
[18:35:29] nice
[18:36:39] Last 2 days, showing yesterday very clearly plus annotation from Grafana DB: https://grafana.wikimedia.org/d/000000500/varnish-caching?orgId=1&from=now-2d&to=now&timezone=utc&var-cluster=$__all&var-site=$__all&var-status=3&refresh=15m&viewPanel=panel-7
[18:37:32] Hit/miss rate hasn't moved at all which surprises me slightly. Same for mw RED side. But I guess that's just from doing it slow enough over the weeks and today
[18:39:49] https://grafana.wikimedia.org/d/35WSHOjVk/application-servers-red-k8s?orgId=1&from=now-2d&to=now&timezone=utc&var-site=$__all&var-deployment=mw-web&var-method=GET&var-code=200&var-handler=php&var-service=mediawiki&refresh=1m
[18:41:46] Same for eg Memcached traffic behind MW which one might expect to see an increase: https://grafana.wikimedia.org/d/lqE4lcGWz/wanobjectcache-key-group?orgId=1&from=now-7d&to=now&timezone=utc&var-keygroup=sidebar&viewPanel=panel-71
[18:41:52] But nothing wild there
[18:53:39] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Main Rollout] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11256650 (BCornwall)
[18:54:32] Traffic, Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Patch-For-Review: Disable LVS paging for WDQS - https://phabricator.wikimedia.org/T406141#11256652 (ssingh) Thanks to a review by @bking, we have merged this. I think we can consider this as resolved. Thanks @LSobanski and...
[18:54:36] Traffic, Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Patch-For-Review: Disable LVS paging for WDQS - https://phabricator.wikimedia.org/T406141#11256653 (ssingh) Open→Resolved
[20:27:58] Traffic, DNS, SRE: Migrate PDNS recursor config to use /etc/powerdns/recursor.d ? - https://phabricator.wikimedia.org/T389333#11256913 (CDobbins) Thanks for the clarification. That sounds good, @MoritzMuehlenhoff. I have no objections to implementing this.
[21:08:20] Traffic, DC-Ops, ops-eqiad, SRE: eqiad row C/D Traffic host migrations - https://phabricator.wikimedia.org/T405623#11257052 (BCornwall) @RobH Thanks for soliciting the feedback! The cp hosts depooling schedule is fine. For DNS, we would prefer to depool these as well rather than unplug them live....
[21:21:35] Traffic, DNS, SRE: Migrate PDNS recursor config to use /etc/powerdns/recursor.d ? - https://phabricator.wikimedia.org/T389333#11257102 (MoritzMuehlenhoff) >>! In T389333#11256913, @CDobbins wrote: > Thanks for the clarification. That sounds good, @MoritzMuehlenhoff. I have no objections to implementi...
[23:11:26] Traffic, MediaWiki-Platform-Team (Radar), patch-welcome: Desktop view link on private wiki login required pages is broken - https://phabricator.wikimedia.org/T352798#11257353 (Jdlrobson-WMF) > Was this meant for a different task? Sorry for confusion. Having to handle Special:Badtitle in calls to `...