[00:06:09] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11321757 (Papaul)
[04:41:43] FIRING: [2x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:46:43] FIRING: [14x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:56:43] FIRING: [14x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[05:01:43] RESOLVED: [14x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[05:03:43] Traffic, Hiddenparma, SRE, Patch-For-Review: Integrate code from the private repository into the CDN - https://phabricator.wikimedia.org/T404826#11322007 (Joe) Open→Resolved
[07:49:16] netops, Infrastructure-Foundations, SRE: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11322154 (fgiunchedi) Stalled→Open
[09:35:11] Hi sukhe , the new druid-public-coordinator service on LVS is ready to deploy today when you have the time to sync. Here is the stack of patches
[09:35:11] 1198500: DNS: Add druid-public-coordinator record | https://gerrit.wikimedia.org/r/c/operations/dns/+/1198500
[09:35:11] 1198498: LVS: etcd data for druid-public-coordinator | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198498
[09:35:11] 1198499: LVS: Add druid-public-coordinator to service list | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198499
[09:35:11] 1199256: druid: add druid-coordinator to druid public worker role | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1199256
[11:08:03] initial check looks good BTW
[13:29:03] stevemunene: happy to look shortly. when are you planning to get this done?
[13:32:52] Thanks sukhe , currently on the tail end of a druid service restart so anytime within the hour or later today would work. I can get started with merging the dns changes in the meantime
[13:40:47] ok
[13:42:20] let us know if we can help
[13:42:48] since you do a lot of these, there might be value in you doing them and we can help with that, or you can let us know if you prefer us to do it for you (the LVS restarts)
[13:51:21] Ack, will ping you when about to start
[14:24:43] FIRING: [4x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:24:52] ChrisDobbins901_: can you have a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1199534 and/or suggest a test host for the pcc? I'm pretty sure my change doesn't touch prod hosts but I also don't want to break DNS!
[14:27:20] ack, doing it now
[14:29:43] FIRING: [12x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:33:49] andrewbogott: it looks good to me. any dnsbox host should work for PCC
[14:34:43] FIRING: [12x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:34:46] sorry, can you give me a specific hostname? I'm confused by the word 'recursor' not appearing anywhere in site.pp
[14:36:03] ah, you ran it already. thanks!
[14:37:11] you're welcome! I ran it just as a reality check for myself. sorry about the lack of clarity on my part
[14:37:34] np, I just hadn't scrolled to the bottom yet
[14:39:43] RESOLVED: [12x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[16:12:57] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Compile a list of "canonical" thumbnail sizes - https://phabricator.wikimedia.org/T408715 (MatthewVernon) NEW
[16:13:25] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062#11323972 (MatthewVernon)
[16:14:49] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Compile a list of "canonical" thumbnail sizes - https://phabricator.wikimedia.org/T408715#11323977 (MatthewVernon) p:Triage→High
[16:20:43] Traffic, Patch-For-Review: varnishtests are broken with podman - https://phabricator.wikimedia.org/T408202#11324013 (BCornwall) Open→Resolved
[16:47:54] netops, fundraising-tech-ops, Infrastructure-Foundations: Downgrade pfw1-codfw to Junos 23.4R2-S3 - https://phabricator.wikimedia.org/T393996#11324213 (Papaul) a:Papaul
[16:51:20] netops, fundraising-tech-ops, Infrastructure-Foundations: Downgrade pfw1-codfw to Junos 23.4R2-S3 - https://phabricator.wikimedia.org/T393996#11324235 (Papaul) @Dwisehaupt hello yes we can do this during the maintenance windows in November. Any day you prefer for that week? Thank you
[16:56:00] FIRING: [3x] PurgedHighEventLag: High event process lag with purged on cp5022:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:57:55] oh that's always fun
[17:01:00] FIRING: [14x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:06:00] FIRING: [17x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:11:00] FIRING: [22x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:16:00] RESOLVED: [28x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:28:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp5022:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5022 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:29:39] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Compile a list of "canonical" thumbnail sizes - https://phabricator.wikimedia.org/T408715#11324423 (MatthewVernon)
[17:33:00] FIRING: [14x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:38:00] FIRING: [14x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:43:00] RESOLVED: [20x] PurgedHighEventLag: High event process lag with purged on cp5019:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:50:00] FIRING: [2x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:55:00] RESOLVED: [3x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[18:08:32] netops, fundraising-tech-ops, Infrastructure-Foundations: Downgrade pfw1-codfw to Junos 23.4R2-S3 - https://phabricator.wikimedia.org/T393996#11324582 (Dwisehaupt) @Papaul How does Wednesday 11/5 work? That would be good for us. Thursday would be a good alternative also.
[18:41:08] hello traffic friends - I have yet another haproxy config change to roll out to cp hosts. any concerns / conflicts if I move ahead with that soon?
[18:41:23] hi. none but check with brett once ^
[18:41:37] All good!
[18:41:58] awesome, thank you both!
[18:45:16] * swfrench-wmf is doing
[18:58:20] Traffic, SRE, FY2025-26 WE3.3 Engaging core audiences, Reader Experience Team (REx Sprint 8 [Q2 Oct 21-Nov 3]): [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#11324749 (SToyofuku-WMF) DB config: https://noc.wikimedia.org/d...
[19:02:04] brett: I see you just merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198430
[19:02:26] I have puppet disabled across all A:cp and was just about to run a rolling run-puppet-agent
[19:02:44] do you want me to hold?
[19:02:53] swfrench-wmf: Nah, it's not urgent
[19:02:58] I was just gonna wai
[19:03:07] wait
[19:03:23] It's a minor patch
[19:03:26] my rolling run-puppet-agent will apply that change across the fleet :)
[19:04:03] how would you like me to proceed?
[19:04:30] please do proceed
[19:04:42] it won't affect your work
[19:05:35] brett: yeah, agreed that it won't conflict with anything I'm doing. I was more checking in case you wanted to test anything :)
[19:05:45] nope, all good! Roll puppet away
[19:06:15] * swfrench-wmf thumbs up
[19:22:16] brett: done
[19:22:37] Thanks!