[13:18:42] dpogorzelski: sorry about the lack of a response yesterday
[13:19:03] we will review today and then we can plan on the deploy. does tomorrow work? maybe same time
[13:38:25] FIRING: [2x] SystemdUnitFailed: anycast-healthchecker.service on doh1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:43:25] RESOLVED: [4x] SystemdUnitFailed: anycast-healthchecker.service on dns7001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:44:02] Traffic, Wikimedia-Site-requests: Request throttle exemption of IP addresses for ESEAP Conference 2026 - https://phabricator.wikimedia.org/T426295#11936353 (Robertsky) Open→Resolved Event is over. If there is feedback on any traffic issues, it will be after we have collated responses in the p...
[14:39:30] FIRING: [8x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh5004:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[14:44:15] FIRING: [8x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh5004:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[14:49:15] RESOLVED: [7x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[16:20:39] Can I get a consult on https://phabricator.wikimedia.org/T426323 / https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1287731 please?
[16:21:51] I understand no-cache doesn't mean "don't cache" but it's still weird to me that it worked with that configuration in the api-gateway and without Vary: Origin
[16:22:02] but something is apparently different with the rest-gateway?
[16:23:47] Maybe the fact we changed from a map with no plugins in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/dac3ebe1c134569fb673b4524a8ed961aca71c13%5E%21/#F0
[16:24:18] claime: both our experts on this topic are out but we can try. do you want us to follow up here or on the task?
[16:25:00] sukhe: I'll write up a summary on task, I was mostly posting here if someone was available to chat :)
[16:25:53] claime: I can try after lunch but otherwise fab will be back this week later and he is the best person to ask :)
[16:26:03] and thanks, we will follow up on the task then I guess
[16:26:24] <3
[16:31:56] Traffic, ContentTranslation, LPL Hypothesis, Security-Team, and 4 others: CX dashboard can't load page collections on some wikis (blocked by CORS) - https://phabricator.wikimedia.org/T426323#11937471 (Clement_Goubert) From what I can tell, this behaviour changed when we moved from the old `api-ga...
[16:32:20] sukhe: done, thanks for the future help :P
[17:13:38] Traffic, MediaWiki-Platform-Team (Radar): Error 429 for search queries and images in older browsers - https://phabricator.wikimedia.org/T425763#11937586 (ssingh) Hi @BrokenImages1234: Following up to check if this issue still persists for you?
[17:14:11] Traffic: images are not loading for some users (on the us west coast?) - https://phabricator.wikimedia.org/T425670#11937590 (ssingh) Hi: Following up to see if this issue still persists; I think perhaps not (see my comment above) since I think it was transient but please let us know.
[17:23:15] claime: noted, will read and respond, thanks!
[18:14:03] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11937988 (BCornwall)
[18:30:44] Traffic: Purge dhcpcd-base from traffic hosts - https://phabricator.wikimedia.org/T426224#11938011 (BCornwall) Open→Resolved Removed from A:cp, A:tcpproxy, A:durum
[19:57:25] FIRING: [4x] SystemdUnitFailed: haproxy.service on cp7003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:03:08] looking
[20:03:27] cert sync issues
[20:03:31] it self-corrected
[20:06:28] FIRING: KeyholderUnarmed: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[20:07:23] so touchy
[20:07:29] handled
[20:11:28] RESOLVED: KeyholderUnarmed: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[20:37:25] FIRING: [5x] SystemdUnitFailed: haproxy.service on cp7004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:40:09] FIRING: LVSHighRX: Excessive RX traffic on lvs3008:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[20:45:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs3008:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[20:48:11] ^cp7004 is happy too, the alerts are just too trigger-happy, it seems
[20:48:19] though the other traffic ones aren't me
[22:26:27] Traffic, DC-Ops, ops-codfw, SRE: Degraded RAID on lvs2012 - https://phabricator.wikimedia.org/T425890#11938571 (BCornwall) Resolved→Open p:Triage→High Sadly, it appears that the replacement drive is still problematic. From the latest boot log: ` May 11 15:21:16 lvs2012 kernel: md...
[22:36:25] Traffic, DC-Ops, ops-codfw, SRE: Degraded RAID on lvs2012 - https://phabricator.wikimedia.org/T425890#11938592 (BCornwall) I see the new disk as "Ready" instead of "Online" in iDRAC. I'm also noticing a discrepancy: lvs2014 has the virtual disks set to the "Write Back" policy while lvs2012 has "W...