[04:35:13] Traffic, SRE, Patch-For-Review: Move contact info detection at the edge to a lua module - https://phabricator.wikimedia.org/T414300#11974879 (Joe) Open→Resolved
[06:56:38] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11975082 (ops-monitoring-bot) Draining ganeti2045.codfw.wmnet of running VMs
[07:13:03] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11975129 (ops-monitoring-bot) Completed depooling of db2241 by fceratto@cumin1003: Depool for rack maintenance
[07:15:12] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11975135 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=55ec911b-df34-41d6-a145-ff36a55ba765) set by fceratto@cumin1003 for 2 days...
[07:23:54] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11975160 (ops-monitoring-bot) Starting pool of db2241 by fceratto@cumin1003: Depool for rack maintenance
[07:25:27] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11975162 (ayounsi)
[07:26:12] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975165 (ops-monitoring-bot) Depooled pc1021.eqiad.wmnet and pc2021.codfw.wmnet rack A3 maintenance - fceratto@cumin1003 - T427301
[07:26:48] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11975168 (ayounsi)
[07:26:49] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11975169 (ayounsi)
[07:26:56] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975170 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=45ec6ca5-e7dc-4aac-829a-479be0c8c095) set by fceratto@cumin1003 for 2 days, 0:00:00 on 1 host(...
[07:29:00] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975171 (ops-monitoring-bot) Completed depooling of db2158 by fceratto@cumin1003: rack A3 maintenance
[07:29:43] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975176 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=dd2e4787-ea9c-4ac3-a947-eed9b2dfef8b) set by fceratto@cumin1003 for 2 days, 0:00:00 on 1 host(...
[08:09:19] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11975294 (ops-monitoring-bot) Completed pooling of db2241 by fceratto@cumin1003: Depool for rack maintenance
[08:59:22] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975451 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=79ff0806-933d-49e7-8c67-71ba7d45bc8b) set by fceratto@cumin1003 for 2 days, 0:00:00 on 1 host(...
[08:59:44] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975452 (FCeratto-WMF)
[08:59:55] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11975453 (FCeratto-WMF)
[09:01:12] Is now a good time to merge my ATS change patch stack? I will disable puppet on all but the two cp nodes I hit, test there, then re-enable
[09:01:22] netops, Infrastructure-Foundations, SRE: cr2-drmrs unexpected reboot - https://phabricator.wikimedia.org/T427600#11975473 (cmooney) {F86180058}
[09:01:35] I have three patches to merge, I will do them one by one with this procedure for each
[09:01:58] Tell me in the next 10 minutes if it's not a good time, otherwise I'll go through with it
[09:02:10] Patches are https://gerrit.wikimedia.org/r/c/operations/puppet/+/1293699 and following
[09:06:04] claime: As good a time as any. I think I am the traffic team this morning, so nothing else is really going on :-)
[09:12:33] ack
[12:10:03] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11976107 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7ff06b6f-10ee-45fc-a0d8-f9e445c2a726) set by ayounsi@cumin1003 for 1:00:00 on 27 host(s) and t...
[12:11:16] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11976109 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2cc85cc5-a5e3-45c9-b39d-9775610ef6e4) set by ayounsi@cumin1003 for 1:00:00 on 3 host(s) and th...
[14:02:26] netops, DBA, Infrastructure-Foundations, ServiceOps new: codfw: rack A3 maintenance - https://phabricator.wikimedia.org/T427301#11976595 (ayounsi) Open→Resolved a:ayounsi A3 switch upgrade went fine, ~11min switch downtime plus a few more for the interfaces to come back up. We ha...
[14:03:37] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11976599 (ayounsi)
[14:50:23] Traffic, Commons, DBA, SRE: Unable to save edits or delete pages on Commons – database lag - https://phabricator.wikimedia.org/T402749#11976839 (Ademola) Implementation is ready and tested. For batches under 200 files, maxSimultaneousReq is now set to 4; larger batches remain at 2. Source cod...
[14:54:27] netops, Infrastructure-Foundations: Nokia: implement maintenance mode - https://phabricator.wikimedia.org/T419673#11976869 (ayounsi)
[15:30:01] netops, Infrastructure-Foundations, SRE, SRE-tools: Netbox - PuppetDB audit 2021-11 - https://phabricator.wikimedia.org/T295762#11977272 (ayounsi) Open→Resolved a:ayounsi No need to keep that old parent task open.
[15:33:26] netops, Infrastructure-Foundations, SRE: Occasional high ICMP probe response from codfw to cr1-drmrs - https://phabricator.wikimedia.org/T315645#11977334 (cmooney) Stalled→Declined
[15:53:36] netops, DC-Ops, Infrastructure-Foundations, netbox, SRE: Avoid ghost hosts on the network - https://phabricator.wikimedia.org/T306007#11977548 (ayounsi) Open→Resolved a:ayounsi Looks like the current provisioning process with the `Port with no description on access switch` ale...
[16:08:15] netops, Infrastructure-Foundations, SRE: Adjust "port with no description on access switch" alert - https://phabricator.wikimedia.org/T353364#11977663 (ayounsi) Open→Resolved I've renamed it to `Interface UP for 7 days with no description` when migrating the alert to AlertManager. Please...
[16:32:04] Traffic, ServiceOps new, ServiceOps-Services-Oids, Bot detection and mitigation (WE4.2 hCaptcha editing trial), and 2 others: hCaptcha: Stop using urldownloader for health checks of the secure-api.js file - https://phabricator.wikimedia.org/T421464#11977962 (kostajh) Open→Resolved
[17:46:24] Traffic, Security-Team, WMF-General-or-Unknown, ContentSecurityPolicy, Patch-For-Review: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618#11978368 (ssingh) >>! In T117618#11974306, @sbassett wrote: > Revisiting this over the past month, it looks like we'...
[17:48:02] Traffic, Security-Team, WMF-General-or-Unknown, ContentSecurityPolicy, Patch-For-Review: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618#11978388 (sbassett) >>! In T117618#11978368, @ssingh wrote: > @sbassett: Thanks for the patch; confirming that the p...
[20:29:09] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11979041 (jijiki)
[20:38:43] FIRING: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=esams&var-instance=cp3071&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:48:43] RESOLVED: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=esams&var-instance=cp3071&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[21:20:17] Traffic: Create metrics/monitoring of fifo-log-demux - https://phabricator.wikimedia.org/T345939#11979312 (BCornwall) a:BCornwall→None
[21:21:03] Traffic: Clean up Varnish VCL - https://phabricator.wikimedia.org/T370200#11979313 (BCornwall) In progress→Resolved Way too broad, this can either be reopened later if we want to revisit this or we can create better-phrased, smaller tasks
[21:25:23] Traffic: Clean up purged release branches - https://phabricator.wikimedia.org/T410530#11979324 (BCornwall) Open→Resolved They were holdovers - comparing the branches showed that the commits were all merged in. I've deleted them.
[21:26:02] Acme-chief, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Warning about /etc/acmecerts/unified contents during puppet run on deployment-cache-text08 & deployment-cache-upload08 - https://phabricator.wikimedia.org/T399419#11979326 (BCornwall) a:BCornwall→Vgutierrez
[21:26:08] Acme-chief, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Warning about /etc/acmecerts/unified contents during puppet run on deployment-cache-text08 & deployment-cache-upload08 - https://phabricator.wikimedia.org/T399419#11979328 (BCornwall) In progress→Stalled