[06:47:32] netops, Infrastructure-Foundations: cr2-codfw - Host 0 ECC single bit parity error - https://phabricator.wikimedia.org/T371868 (ayounsi) NEW p:Triage→Low
[08:17:04] Traffic, Security-Team, WMF-General-or-Unknown, ContentSecurityPolicy, Patch-For-Review: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618#10044260 (TheDJ) >>! In T117618#10043609, @Vollbracht wrote: > What's the effect? Shall Wikimedia content be include...
[09:41:45] Traffic, conftool: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#10044438 (Joe) In progress→Resolved
[10:03:30] netops, Infrastructure-Foundations, SRE: cloudsw1-d5-eqiad instability Aug 6 2024 - https://phabricator.wikimedia.org/T371879 (cmooney) NEW p:Triage→High
[10:20:22] netops, Infrastructure-Foundations, SRE: cloudsw1-d5-eqiad instability Aug 6 2024 - https://phabricator.wikimedia.org/T371879#10044547 (cmooney)
[11:16:33] <_joe_> I'd like to try to add the requestctl integration in haproxy on cp4044 later today
[11:17:24] <_joe_> Do you think it's possible to depool it for a bit while I debug the stuff that inevitably we have got wrong.
[11:17:33] <_joe_> s/\./?/
[11:17:55] _joe_: yep
[11:18:06] just !log it please
[11:18:09] <_joe_> vgutierrez: ack, thanks
[11:18:12] <_joe_> ofc
[11:18:37] <_joe_> it will be later in the day anyways, I'm going to have lunch in a few
[11:18:44] ulsfo is fully pooled at the moment so we can live without a host for a while
[11:18:52] _joe_: enjoy your lunch :)
[11:18:54] <_joe_> ok
[13:20:53] FIRING: SystemdUnitCrashLoop: confd.service crashloop on cp4044:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:21:15] known
[13:27:46] thanks :)
[13:29:47] Traffic: Adding IP Addresses to SPF (Dayforce) - https://phabricator.wikimedia.org/T371304#10044985 (ssingh) Open→Resolved @APaul-WMF has confirmed Dayforce has validated this at their end.
[13:55:53] RESOLVED: SystemdUnitCrashLoop: confd.service crashloop on cp4044:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:56:10] <_joe_> that doesn't mean the actual problem is solved lol
[14:34:36] Traffic: Error message says "%error_body_content%" - https://phabricator.wikimedia.org/T371424#10045278 (CDobbins) In progress→Resolved The fix for this was deployed this morning. If this error reappears, please reopen this task or create a new one.
[15:12:00] btw traffic folks this haproxy config change is slow-rolling out now https://gerrit.wikimedia.org/r/1059126
[15:12:44] cool :)
[15:57:09] netops, Infrastructure-Foundations, SRE: cloudsw1-d5-eqiad instability Aug 6 2024 - https://phabricator.wikimedia.org/T371879#10045702 (cmooney) Just to update on the situation things remain stable since the changes earlier on. ` cmooney@cloudsw1-d5-eqiad> show bgp summary | match "^[0-9]" 10.64....
[16:58:20] netops, Traffic, Infrastructure-Foundations, SRE: Upgrade anycast-healthchecker to 0.9.8 (from 0.9.1-1+wmf12u1) - https://phabricator.wikimedia.org/T370068#10046062 (ssingh) Open→Resolved We have upgraded all DNS boxes, Wikimedia DNS and durum hosts to the latest version of anycast-he...
[19:14:25] FIRING: SystemdUnitFailed: varnishmtail@default.service on cp3070:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:17:15] 3070 huh
[19:17:45] wow ok
[19:20:50] huge spike in requests, restarted
[19:21:06] everything else looks good
[19:24:25] RESOLVED: SystemdUnitFailed: varnishmtail@default.service on cp3070:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed