[06:18:41] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Add envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11747534 (ABran-WMF)
[06:22:18] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Add envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11747537 (ABran-WMF) >>! In T420909#11742909, @ABran-WMF wrote: > `gerrit-spare` now uses Envoy to expose its service to the CDN...
[06:28:23] Traffic, collaboration-services, Gerrit, Release-Engineering-Team, Patch-For-Review: gerrit: Add Envoy in Gerrit's stack - https://phabricator.wikimedia.org/T420909#11747543 (ABran-WMF)
[08:28:56] Traffic, SRE, Wikimedia-Incident: Dead links on https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Process_count - https://phabricator.wikimedia.org/T421207 (MatthewVernon) NEW
[09:05:58] Traffic, SRE, Documentation, Sustainability (Incident Followup): Dead links on https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Process_count - https://phabricator.wikimedia.org/T421207#11747923 (Aklapper) This ticket does not cover an incident.
[09:52:26] Traffic, DNS, Fundraising-Backlog, Infrastructure-Foundations, and 2 others: Add support for Brand Indicators for Message Identification (BIMI) for wiki mail - https://phabricator.wikimedia.org/T311685#11748061 (taavi)
[10:49:08] Can I get a sanity check for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1260624 ?
[10:49:26] Reapplying my syntax-errory lua change without the syntax error this time
[10:57:14] vgutierrez: tyvm <3
[11:14:38] Traffic, Wikimedia-Incident: Bad ATS config led to large volume of 5xx from RESTBase - https://phabricator.wikimedia.org/T421203#11748295 (Clement_Goubert) >>! In T421203#11747752, @Joe wrote: > The way I see it this was a case of "failure in depth". > > * We are still using restbase as the default fall...
[12:03:27] Traffic, Wikimedia-Incident: Bad ATS config led to large volume of 5xx from RESTBase - https://phabricator.wikimedia.org/T421203#11748500 (Vgutierrez) > There is no CI check for validity of lua scripts loaded in trafficserver; some have functional tests but a syntax error should have never passed CI, eve...
[12:38:28] claime: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1260667 should help next time :)
[12:39:22] vgutierrez: <3
[12:39:32] we already had CI tests for lua scripts
[12:39:38] the problem is that the config file was always mocked
[12:39:41] and never tested
[12:40:24] those tests are also run on the cp servers before applying changes to the lua scripts
[13:05:22] yeah that was a blind spot (always mocking the config)
[13:40:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on hcaptcha-proxy6001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:45:48] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on hcaptcha-proxy6001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:46:31] taavi: ^^ could be some issue related to your recent commit?
[13:46:41] Evaluation Error: The value '' cannot be converted to Numeric. (file: /srv/puppet_code/environments/production/modules/nftables/functions/ip_addr_stmt.pp, line: 37, column: 18)
[13:46:49] looking, thanks
[13:55:04] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11749076 (ayounsi) As a side note we will need to manually change the IPs of the routed ganeti nodes in rack 23 to the 10.128.1.0/24 subnet. Normal operation...
[14:00:08] fix is https://gerrit.wikimedia.org/r/c/operations/puppet/+/1260694, sorry about that
[14:07:32] netops, Infrastructure-Foundations: mr1-eqiad: move from OSPF to BGP - https://phabricator.wikimedia.org/T421238 (ayounsi) NEW
[14:43:45] netops, Infrastructure-Foundations: mr1-eqiad: move from OSPF to BGP - https://phabricator.wikimedia.org/T421238#11749273 (Papaul) @ayounsi yes I can setup for next week since this week we have the DC switch over.
[14:55:48] RESOLVED: [2x] PuppetZeroResources: Puppet has failed generate resources on hcaptcha-proxy6001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:22:23] Traffic, observability, SRE-SLO: Factor in pooled status for SLO measurements - https://phabricator.wikimedia.org/T420498#11749443 (hnowlan)
[15:24:07] Traffic: haproxy_client_healthcheck_ttfb histogram seems to be missing on upload - https://phabricator.wikimedia.org/T421246#11749463 (Vgutierrez) p:Triage→Medium
[16:03:45] Traffic, SRE, Documentation, Sustainability (Incident Followup): Dead links on https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Process_count - https://phabricator.wikimedia.org/T421207#11749639 (BCornwall) Open→In progress a:BCornwall
[16:03:51] Traffic, SRE, Documentation, Sustainability (Incident Followup): Dead links on https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Process_count - https://phabricator.wikimedia.org/T421207#11749643 (BCornwall) p:Triage→Low
[16:10:28] Traffic, SRE, Documentation, Sustainability (Incident Followup): Dead links on https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Process_count - https://phabricator.wikimedia.org/T421207#11749707 (BCornwall) In progress→Resolved The information on the page was not accurate and...
[16:22:32] Traffic, observability, SRE-SLO: Factor in pooled status for SLO measurements - https://phabricator.wikimedia.org/T420498#11749812 (RLazarus) We were actually just talking about this in the SLOs group last week (adding @Vgutierrez and @CDanis). In theory this is supposed to "Just Work": all our SLOs...
[16:36:15] Traffic, MW-1.46-notes (1.46.0-wmf.19; 2026-03-10), OKR-Work, Patch-For-Review, Test Kitchen (Experiment Platform Sprint 21): Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11749892 (Sfaci) > Hi @Sfaci: no issues...
[16:48:41] Traffic, observability, SRE-SLO: Factor in pooled status for SLO measurements - https://phabricator.wikimedia.org/T420498#11749956 (Vgutierrez) > For instance, a recent decommissioning of codfw cp nodes (T419753) left the legacy ats-be service unavailable and caused the (depooled!) nodes to increment...
[16:57:28] Traffic: haproxy_client_healthcheck_ttfb histogram seems to be missing on upload - https://phabricator.wikimedia.org/T421246#11749985 (Vgutierrez) this is a side effect of moving healthchecks on upload from healthcheck.wm.org to upload.wm.o in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164466
[17:27:25] Traffic, SRE: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup - https://phabricator.wikimedia.org/T411097#11750221 (ssingh) >>! In T411097#11742616, @MLechvien-WMF wrote: > #traffic do we know when we can do this cleanup? @BCornwall and I will be working on...
[17:43:42] Traffic, Data-Engineering, Data-Engineering-Radar, SRE: Add pageview information to turnilo's webrequest_sampled_live (is_pageview is always "-") - https://phabricator.wikimedia.org/T402612#11750292 (Ottomata)
[22:36:06] Wikimedia-Apache-configuration, glam2commons, IA Upload, MediaWiki-Core-Benchmarker, and 17 others: Help! - https://phabricator.wikimedia.org/T421325 (MilianoMir) NEW
[22:39:23] Wikimedia-Apache-configuration, glam2commons, IA Upload, MediaWiki-Core-Benchmarker, and 15 others: Help! - https://phabricator.wikimedia.org/T421325#11751977 (Seddon)
[23:12:19] Traffic, Beta-Cluster-Infrastructure, SRE, Release-Engineering-Team (Radar): Beta cluster seems to be extremely slow for logged in user during page navigation - https://phabricator.wikimedia.org/T267435#11752112 (bd808) Open→Declined let's stop chasing this one.