[08:41:57] I'm switching the disk type of the doh/hcaptcha-proxy VMs in ulsfo as part of the routed Ganeti migration, there might be brief BGP alerts
[08:49:05] ack
[08:50:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh4002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=ulsfo&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[08:55:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh4002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=ulsfo&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[08:59:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh4001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=ulsfo&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[09:00:15] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh4001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=ulsfo&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[09:04:00] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh4001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=ulsfo&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[10:12:08] hi folks! for T306550 it'd be very helpful to get an answer for our planning purposes on whether the dumps service is feasible to host behind the CDN or not
[10:12:09] T306550: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550
[10:12:42] or at least LVS if not CDN?
[12:42:28] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11727260 (brouberol) {F73147012} We can see that sockets are no longer leaking after the NIC replace...
[12:42:39] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11727262 (brouberol) Open→Resolved a:brouberol
[12:43:21] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11727266 (brouberol) a:brouberol→BTullis
[13:35:27] Traffic, cloud-services-team, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11727489 (CDanis) >>! In T306550#11651360, @taavi wrote: > (FWIW, a major reason these connections are...
[13:45:33] Traffic, cloud-services-team, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11727553 (BBlack) Trying to pull these threads together for a bit of context. The question that isn't...
[13:54:19] Traffic: Varnish sends invalid Server-Timing header when device is enrolled in an experiment - https://phabricator.wikimedia.org/T420586 (phuedx) NEW
[13:58:34] Traffic: Varnish sends invalid Server-Timing header when device is enrolled in an experiment - https://phabricator.wikimedia.org/T420586#11727637 (BBlack) Yes, our VCL has this wrong, thanks for finding this! Patches incoming for the VCL examples/tests in the vmod, and for the production VCL.
[14:05:45] Traffic, SRE: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11727707 (cmooney) >>! In T366193#11713908, @ssingh wrote: >>> I think we should clean up stuff in the interim though since it will be a while before we can get our hands on the /24. I will need your help with...
[14:23:07] sukhe: I'm going to deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1254185
[14:30:20] thanks XioNoX
[14:44:29] netops, Traffic, Infrastructure-Foundations: esams/magru: 185.71.138.0/24 (wikidough) prefix not advertized - https://phabricator.wikimedia.org/T420342#11727945 (ayounsi) Open→Resolved Preferred path changed as expected: `name=esams 185.71.138.138/32 *[BGP/170] 00:00:03, MED 0, localpre...
[14:46:35] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit: Debug connection re-use on Gerrit's httpd causing Gerrit interface to be very slow - https://phabricator.wikimedia.org/T420189#11727954 (ABran-WMF) The `MaxRequestWorkers` bump will also help for [[ h...
[15:05:19] Traffic, Patch-For-Review: Varnish sends invalid Server-Timing header when device is enrolled in an experiment - https://phabricator.wikimedia.org/T420586#11728035 (BCornwall) Open→In progress
[15:05:27] Traffic, Patch-For-Review: Varnish sends invalid Server-Timing header when device is enrolled in an experiment - https://phabricator.wikimedia.org/T420586#11728037 (BCornwall) a:BBlack
[15:18:16] hello traffic friends - would anyone with some ATS expertise be available to chat for 10-15 minutes today? I'd like to double-check my reasoning behind a behavior change we're seeing.
[15:19:52] swfrench-wmf: our resident ATS expert is out on PTO. fabfur can you take this? ^ if not I can but I am in back to back meetings for a bit that I can't skip
[15:20:42] ah, got it - thanks, sukhe
[15:20:52] swfrench-wmf: write me, no prob!
[15:21:17] fabfur: thanks! I'll message you shortly
[15:27:09] netops, Traffic, Infrastructure-Foundations: esams/magru: 185.71.138.0/24 (wikidough) prefix not advertized - https://phabricator.wikimedia.org/T420342#11728162 (cmooney) Nice work!
[16:50:39] Traffic: Varnish sends invalid Server-Timing header when device is enrolled in an experiment - https://phabricator.wikimedia.org/T420586#11728819 (BBlack) In progress→Resolved This should be active everywhere now and fixed in live traffic. Re-open if not!
[17:07:36] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11728887 (BCornwall)
[17:38:50] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11729041 (BCornwall)