[05:38:50] netops, Infrastructure-Foundations: FPC1 Failure on cr1-esams - take 2 - https://phabricator.wikimedia.org/T403360#11137741 (ayounsi) Open→Resolved a:ayounsi Good news, it's still up.
[06:38:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5027 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:43:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5027 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:43:40] FIRING: [2x] VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5027 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:03:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5027 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:12:46] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11138219 (elukey) We discussed the use case during the I/F meeting and we don't have any concern. I guess that we'll create a n...
[10:42:06] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11138372 (Vgutierrez) @brouberol hey! it looks like we split airflow airflow instances by team and we don't currently have an i...
[10:47:40] Traffic, SRE, Wikidata, Wikidata-Query-Service: Find a solution for SPARQL federation that is blocked by stricter user agent policy enforcement - https://phabricator.wikimedia.org/T402959#11138378 (gmodena) >>! In T402959#11132802, @CDanis wrote: > Hi @Lydia_Pintscher , SRE can make some exceptio...
[13:04:38] netops, Traffic, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11138880 (ssingh) >>! In T300877#11130890, @ayounsi wrote: >> the idea is that static routes should help save us in that situation > > That would only...
[13:06:08] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11138896 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum2001.codfw.wmnet with OS trixie
[13:09:22] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11138915 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum4001.ulsfo.wmnet with OS trixie
[13:45:59] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11139191 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum2001.codfw.wmnet with OS trixie completed: - durum2001 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled...
[13:52:11] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11139224 (brouberol) We could do this, however there's a fixed operational cost induced by creating an Airflow instance, in ter...
[13:57:42] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11139256 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum4001.ulsfo.wmnet with OS trixie completed: - durum4001 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled...
[14:28:26] netops, Traffic, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11139484 (ayounsi)
[14:56:12] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11139713 (brouberol) Also, small nit: we split airflow instance by domain (ML, analytics, search, research, etc), as they tend...
[16:04:25] FIRING: SystemdUnitCrashLoop: dnsdist.service crashloop on doh3005:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[16:14:25] RESOLVED: SystemdUnitCrashLoop: dnsdist.service crashloop on doh3005:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[16:17:08] netops, Ganeti, Infrastructure-Foundations: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372#11140100 (ayounsi) magru Anchor is back online. It did require some remote help from the RIPE team, especially to configure v6. v4 worked out of the box, even without the...
[16:22:35] netops, Traffic, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11140127 (ssingh) Thanks for taking care of this @ayounsi! We will update this task when we are ready to remove the `eqiad` ones.
[16:30:13] Traffic, MediaWiki-Platform-Team (Radar), MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review: [Rollout Phase 1] Implement redirect-less mobile routing and enable for wikitech.wikimedia.org - https://phabricator.wikimedia.org/T401595#11140177 (Krinkle) >>! In T401595#11130418, @gerritbot...
[17:35:29] Traffic, MediaWiki-Platform-Team (Radar), User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510 (Krinkle) NEW
[17:38:34] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11140580 (ssingh)
[18:12:32] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11140689 (Krinkle)
[18:16:17] Traffic, MediaWiki-Platform-Team (Radar), MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review: [Rollout Phase 1] Implement redirect-less mobile routing and enable for wikitech.wikimedia.org - https://phabricator.wikimedia.org/T401595#11140707 (Krinkle)
[18:16:40] Traffic, MediaWiki-Platform-Team (Radar), MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review: [Rollout Phase 1] Implement unified mobile routing and enable on wikitech.wikimedia.org - https://phabricator.wikimedia.org/T401595#11140710 (Krinkle)
[19:35:07] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11140960 (Ottomata) platform_eng instance is becoming a catchall, perhaps that one would be okay here?
[20:31:20] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling needed between cages to eqiad 2025/6 switch refresh - https://phabricator.wikimedia.org/T402432#11141430 (wiki_willy) a:Jclark-ctr
[23:34:21] Traffic: Consider rate limiting non-standard thumbnail sizes - https://phabricator.wikimedia.org/T402792#11142006 (matmarex) @dschwen Hi, perhaps I can reach you this way (I tried GitHub and on-wiki). I would appreciate if you could have a look at https://github.com/dschwen/wikiminiatlas/pull/42, so that we...