[00:37:35] Traffic, SRE, MediaWiki-Platform-Team (Radar): Have CDN edge set the `X-Request-Id` header for incoming external requests - https://phabricator.wikimedia.org/T221976#11146274 (Krinkle)
[00:42:12] Traffic, MediaWiki-Platform-Team (Radar), MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review: [Rollout Phase 1] Implement unified mobile routing and enable on wikitech.wikimedia.org - https://phabricator.wikimedia.org/T401595#11146279 (Krinkle)
[06:48:06] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: esams: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T403580#11146558 (MoritzMuehlenhoff) >>! In T403580#11142998, @ayounsi wrote: > @MoritzMuehlenhoff I tried to create the VM using `sudo cookbook sre.ganeti.m...
[07:00:25] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11146590 (elukey) >>! In T392851#11144994, @RobH wrote: > cp2045 has had the idrac, bios, and SSD firmware updated to latest revisions to match cp2043. > > P...
[08:53:34] Traffic, Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://phabricator.wikimedia.org/T403616#11146882 (SLyngshede-WMF) Open→Resolved p:Triage→High a:SLyngshede-WMF
[08:53:41] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: esams: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T403580#11146886 (ayounsi)
[09:02:47] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: esams: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T403580#11146920 (ayounsi) Open→Resolved The RIPE re-generated an image using the /32 and /128 netmask. The install went perfectly fine.
[12:37:14] netops, fundraising-tech-ops, Infrastructure-Foundations: Switch frack eqiad frdata-codfw NAT to frdata2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T403718 (Jgreen) NEW
[12:38:07] netops, fundraising-tech-ops, Infrastructure-Foundations: Switch frack eqiad frdata-codfw NAT to frdata2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T403718#11147607 (Jgreen)
[12:42:43] netops, fundraising-tech-ops, Infrastructure-Foundations: Reuse old payments-codfw LVS-DR IP for frmx2002 NAT - https://phabricator.wikimedia.org/T403719 (Jgreen) NEW
[12:52:53] netops, fundraising-tech-ops, Infrastructure-Foundations: Reuse old payments-codfw LVS-DR IP for frmx2002 NAT - https://phabricator.wikimedia.org/T403719#11147667 (Jgreen) p:Triage→Medium
[13:28:45] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11147869 (Papaul) I talked to @Jgreen on IRC about the schedule, there is a maintenance window during from September 22nd to the 26th so this will be a best time for the m...
[13:32:37] netops, fundraising-tech-ops, Infrastructure-Foundations: Reuse old payments-codfw LVS-DR IP for frmx2002 NAT - https://phabricator.wikimedia.org/T403719#11147894 (ayounsi) Open→Resolved nat added
[13:34:10] netops, fundraising-tech-ops, Infrastructure-Foundations: Switch frack eqiad frdata-codfw NAT to frdata2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T403718#11147901 (Jgreen)
[13:34:41] netops, fundraising-tech-ops, Infrastructure-Foundations: Switch frack eqiad frdata-codfw NAT to frdata2002.frack.codfw.wmnet - https://phabricator.wikimedia.org/T403718#11147903 (ayounsi) Open→Resolved a:ayounsi All good there too
[14:28:04] Traffic, RESTBase, RESTBase Sunsetting, serviceops, and 2 others: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557#11148179 (MSantos) >>! In T393557#10918979, @hnowlan wrote: > There is a larger chunk of work...
[14:28:10] Traffic, RESTBase, RESTBase Sunsetting, serviceops, and 2 others: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557#11148180 (MSantos) Open→Resolved a:MSantos
[14:46:15] netops, Infrastructure-Foundations, SRE: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845#11148288 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=829d4d0b-c9d0-4961-b07b-d12e8f1ac430) set by pt1979@cumin2002 for 2:00:00 on 1 host(s) and their...
[14:59:55] FIRING: SLOMetricAbsent: varnish-combined esams - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:00:14] FIRING: SLOMetricAbsent: varnish-combined esams - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:00:59] FIRING: SLOMetricAbsent: trafficserver-combined - https://slo.wikimedia.org/?search=trafficserver-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:05:59] FIRING: [2x] SLOMetricAbsent: haproxy-combined - https://slo.wikimedia.org/?search=haproxy-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:10:14] RESOLVED: SLOMetricAbsent: varnish-combined esams - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:10:59] RESOLVED: [2x] SLOMetricAbsent: haproxy-combined - https://slo.wikimedia.org/?search=haproxy-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:11:05] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11148443 (Jhancock.wm) @elukey it still fails with just the BIOS update. moving on to idrac and ssd updates.
[15:14:55] RESOLVED: SLOMetricAbsent: varnish-combined esams - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:17:15] Traffic, Observability-Metrics, SRE: Port Traffic dashboards to Thanos - https://phabricator.wikimedia.org/T302266#11148493 (Peachey88) Stalled→Resolved p:Unbreak!→Medium a:ssingh
[17:16:23] Traffic, Beta-Cluster-Infrastructure, SRE: Make varnish-frontend-restart work on Beta Cluster - https://phabricator.wikimedia.org/T299054#11149191 (Krinkle) I'm guessing the below has the same root cause, albeit on a deployment host, not a varnish host.
` krinkle@deployment-deploy04:~$ sudo tail -n1...
[17:23:53] Traffic: Add an Allow header on 405 responses - https://phabricator.wikimedia.org/T403767 (Vgutierrez) NEW
[18:09:24] Traffic, Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://phabricator.wikimedia.org/T403616#11149413 (bd808) Resolved→Open `lang=shell-session, counterexample bd808@deployment-cache-text08.deployment-prep.eqiad1:/e...
[18:28:47] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11149517 (Jhancock.wm) @elukey okay so what i did today in terms of firmware updates is: cp2044 BIOS, iDRAC, SSD cp2046 BIOS, iDRAC only cp2047 BIOS only...
[19:14:43] FIRING: [5x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:19:43] RESOLVED: [11x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:58:15] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11150066 (Krinkle) Proposed text for #Tech-News: > When browsing a wiki (like en.wikipedia.org), we respond in t...
[21:18:07] Traffic, Beta-Cluster-Infrastructure, SRE: Make varnish-frontend-restart work on Beta Cluster - https://phabricator.wikimedia.org/T299054#11150196 (bd808)
[21:18:57] Traffic, Beta-Cluster-Infrastructure, SRE: Make varnish-frontend-restart work on Beta Cluster - https://phabricator.wikimedia.org/T299054#11150202 (bd808)
[21:19:06] Traffic, Beta-Cluster-Infrastructure, SRE: Make varnish-frontend-restart work on Beta Cluster - https://phabricator.wikimedia.org/T299054#11150204 (bd808)
[21:35:49] Traffic, Beta-Cluster-Infrastructure: Puppet agent failure detected on instance deployment-cache-text08 in project deployment-prep - https://phabricator.wikimedia.org/T403616#11150304 (bd808) >>! In T403616#11149413, @bd808 wrote: > This is still happening because the guard condition added in https://ger...
[22:53:50] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845#11150505 (Papaul) mr1-ulsfo is now running BGP. All OSPF entries on mr1-ulsfo, cr3-ulsfo and cr4-ulsfo for the management network removed.