[07:34:03] Traffic, Prod-Kubernetes, serviceops, Kubernetes, Patch-For-Review: Handling inbound IPIP traffic on low traffic LVS k8s based realservers - https://phabricator.wikimedia.org/T352956#10902888 (akosiaris) I 've gone ahead and switch all of aux-k8s to MTU 1460. This time around, I went for a mo...
[08:02:51] Traffic, Liberica: Test katran forwarding plane on lvs1013 - https://phabricator.wikimedia.org/T395228#10902983 (Vgutierrez) Open→Resolved lvs1013 has been balancing ncredir@eqiad using katran since 2025-06-09 at 10:30
[08:13:28] Traffic, Liberica: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561 (Vgutierrez) NEW
[08:13:39] Traffic, Liberica: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10903031 (Vgutierrez) p:Triage→Medium
[08:25:23] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10903091 (Vgutierrez) p:Low→Medium
[08:34:05] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10903110 (Vgutierrez) all PoPs have been running liberica with IPIP encapsulation for a while now and we are already starting the migration to Katran there, so we c...
[08:49:55] Traffic: Import BingBot prefixes - https://phabricator.wikimedia.org/T395358#10903207 (Fabfur) Open→Resolved
[09:46:40] FIRING: VarnishChildRestarted: varnish-text restarted on cp5018 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp5018&datasource=eqsin%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[09:51:10] uh...
[09:56:27] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581 (Vgutierrez) NEW
[09:56:40] RESOLVED: VarnishChildRestarted: varnish-text restarted on cp5018 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp5018&datasource=eqsin%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[09:56:48] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10903505 (Vgutierrez) p:Triage→High
[10:00:24] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10903519 (Vgutierrez) This seems to be a well known issue reported on https://github.com/varnishcache/varnish-cache/issues/3856 and fixed in https://github.com/varnishcache/varnish-cache/commit/3beb5c8b90744319064b09e6c3729...
[10:07:38] Traffic, Movement-Insights, Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: NEW BUG REPORT: Investigate rise in May 2025 Reader metrics - https://phabricator.wikimedia.org/T395934#10903556 (Joe) One potential reason for this surge is linked to the activities of an actor, worki...
[10:37:40] Traffic, Experimentation Lab: EventGate: Investigate data loss during the SDS 2.4.11 Synthetic A/A Test experiment - https://phabricator.wikimedia.org/T396474#10903639 (phuedx) >>! In T396474#10902047, @dr0ptp4kt wrote: > 2. That the event loss now matches expectations. @phuedx probably best if you have...
[12:40:01] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10904311 (ayounsi)
[12:46:45] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10904328 (ayounsi)
[12:49:48] Traffic, RESTBase, RESTBase Sunsetting, serviceops, and 2 others: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557#10904336 (MSantos) @akosiaris this is ready to for traffic blocking. Please let us know if th...
[12:54:11] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10904369 (ayounsi)
[12:57:42] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10904383 (ayounsi)
[13:03:52] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10904444 (ayounsi)
[13:04:52] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10904451 (ayounsi) Open→Resolved a:ayounsi All done!
[13:09:14] Traffic, RESTBase, RESTBase Sunsetting, serviceops, and 2 others: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557#10904475 (akosiaris) >>! In T393557#10904336, @MSantos wrote: > @akosiaris this is ready to f...
[13:23:21] Traffic: Package, prepare the upgrade, and deploy ATS 10 in production - https://phabricator.wikimedia.org/T396611 (ssingh) NEW
[13:23:27] Traffic: Package, prepare the upgrade, and deploy ATS 10 in production - https://phabricator.wikimedia.org/T396611#10904566 (ssingh) p:Triage→Low
[13:42:53] o/ I have another ATS API change, in this case making a per-wiki rule an all-wikis rule. Would it suit to do that now-ish? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1149625
[13:43:17] thanks for checking, now is fine :)
[13:43:34] thanks!
[14:11:09] Traffic, Commons: Error "%error_body_content%" while editing Commons - https://phabricator.wikimedia.org/T372473#10904818 (Jeff_G)
[14:17:48] Traffic, Patch-For-Review: Deploy maxmind lookup configuration everywhere - https://phabricator.wikimedia.org/T395295#10904869 (Fabfur) Open→Resolved
[14:18:29] Traffic, SRE: haproxy is able to load the same GeoIP & IP-to-ASN data as Varnish does - https://phabricator.wikimedia.org/T329849#10904877 (Fabfur) Open→Resolved p:Triage→Medium a:Fabfur
[14:20:54] Traffic: Map ISPs in Maxmind db, used in turnilo/superset, to use in requestctl rule - https://phabricator.wikimedia.org/T392219#10904899 (Fabfur) In progress→Resolved
[14:24:25] Traffic, Hiddenparma: Requestctl should use x-provenance header - https://phabricator.wikimedia.org/T396621 (Fabfur) NEW
[14:27:13] netops, cloud-services-team, Infrastructure-Foundations, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10904952 (fnegri)
[14:35:30] Traffic: Create VTC tests for HAProxy - https://phabricator.wikimedia.org/T393770#10905017 (Fabfur) After some trials and investigation with varnishtest for haproxy I've noticed two main difficulties: - "native" haproxy support in VTC is limited to inline configuration: additional configuration files can be...
[14:38:05] fabfur: maybe we could embrace and extend httpbb 😇
[14:40:04] BTW, ATS uses https://bitbucket.org/autestsuite/reusable-gold-testing-system/src/master/
[14:41:00] if httpbb is already used at WMF I would give him a chance, to not introduce new tools
[14:41:03] opinions on this?
[14:42:09] vgutierrez: interesting, looking that too
[14:46:36] I'd explore the current state of the art tools, compare them and pick the best suited for our needs instead of following cargo cult
[14:46:52] (not that I have anything against httpbb per se)
[14:48:33] using something that's already used instead of introducing new tools or reinventing the wheel doesn't really seems cargo culting to me
[14:50:17] fabfur: it's still worth going through the existing VTCs and seeing what can be implemented trivially in httpbb, vs in autest
[14:51:01] boh looks valid tools
[14:53:43] TIL httpbb is SRE made :)
[14:55:15] lol
[14:55:24] it's r.zl's bb
[15:22:32] Traffic: Upgrade to ATS 9.2.10 - https://phabricator.wikimedia.org/T390912#10905209 (ssingh)
[15:31:32] Traffic: Replace Digicert TLS certs with Google Trust Services ones - https://phabricator.wikimedia.org/T395131#10905243 (Vgutierrez)
[15:31:46] Traffic: Replace Digicert TLS certs with Google Trust Services ones - https://phabricator.wikimedia.org/T395131#10905246 (Vgutierrez) Open→In progress
[15:53:40] FIRING: [3x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:56:39] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10905468 (BCornwall) a:BCornwall
[15:56:57] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10905472 (BCornwall)
[15:57:00] FIRING: [7x] PurgedHighEventLag: High event process lag with purged on cp5019:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:57:05] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10905473 (BCornwall) Open→In progress
[15:58:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:02:00] FIRING: [19x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:07:00] RESOLVED: [17x] PurgedHighEventLag: High event process lag with purged on cp5019:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:13:40] FIRING: [16x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:28:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:31:00] FIRING: [2x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:36:00] FIRING: [16x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:41:00] FIRING: [19x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:46:00] RESOLVED: [27x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:57:33] inflatador: sorry, it's been a day as you may have seen
[16:57:37] I need some time to eat lunch to get started
[16:59:04] we can start at 13:30 EDT if that's fine
[16:59:06] * sukhe afk for a bit
[16:59:18] sukhe sounds like I missed the "fun". ;( . I'm happy to push back 'till next week if that is better
[16:59:33] ryankemper ^^
[17:00:56] ack, i can make 13:30
[17:04:11] thanks, 13:30 it is
[17:08:40] FIRING: [9x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:23:40] FIRING: [16x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:28:40] FIRING: [15x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:31:42] * sukhe here
[17:33:37] sukhe: cool we're in https://meet.google.com/cem-ukov-vcv. but irc works fine too if preferred
[17:33:52] I think we're still sticking with this order of operations, i'm looking over the patches again https://phabricator.wikimedia.org/T143553
[17:34:18] oops meant to link https://phabricator.wikimedia.org/T143553#10861215
[17:35:25] ryankemper: yes, just give me two mins
[17:38:38] ryankemper: so you merged etcd data (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151308/) ?
[17:38:55] yup that's happening right now
[17:39:03] ok
[17:39:08] just realized i'm not sure which host(s) to run puppet on
[17:39:12] then yes, next is the service yaml change
[17:40:16] just run on puppetserver, that will call the relevant conftool bits
[17:40:54] checking as we go along! (skipping the google meet so that I can look here; I can join if required)
[17:41:05] let me verify this before you move on
[17:43:40] RESOLVED: [8x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:44:01] ok
[17:44:15] {"eqiad": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=search-psi"}
[17:44:18] {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=search-psi"}
[17:44:21] {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=search-omega"}
[17:44:24] {"eqiad": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=search-omega"}
[17:44:46] great, so onward to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151300 ?
[17:44:47] in fact let me join the call
[17:44:49] yes
[17:47:19] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10906005 (cmooney) Nice work @ayounsi! I would have thought dc-ops need to follow up with remote hands requests to remove the cables though?
[20:44:35] netops, Traffic, Infrastructure-Foundations: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731#10906579 (ayounsi) Yep, I opened those too: {T396603} {T396602} {T396601}