[01:33:36] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11381926 (Papaul)
[02:11:24] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11381965 (Papaul)
[02:25:43] FIRING: [13x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:30:43] FIRING: [30x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:35:43] FIRING: [30x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:40:43] RESOLVED: [30x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[05:32:47] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11382104 (Papaul)
[06:18:26] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11382134 (Papaul)
[06:28:38] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11382156 (Papaul) @cmooney @ayouns I update the task with all the IPV4 and IPV6 addresses for the links, irb's and loopbacks. Please review and let me know i...
[08:20:12] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Measure request frequency of thumbnail sizes - https://phabricator.wikimedia.org/T410304#11382376 (MatthewVernon)
[08:50:32] netops, Traffic, Infrastructure-Foundations: POPs LVS : remove public vlan trunking - https://phabricator.wikimedia.org/T367732#11382492 (ayounsi) a:ssingh @ssingh started working on this with https://gerrit.wikimedia.org/r/1206424 in {T410047} boldly assigning the task to him :)
[08:53:25] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11382501 (ayounsi) See also {T367732}
[10:08:47] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11382907 (fgiunchedi)
[10:12:53] netops, Infrastructure-Foundations, SRE: Audit and verify all cloudcephosd have their primary interface tagged and access to cloud-storage vlan - https://phabricator.wikimedia.org/T409690#11382933 (fgiunchedi) Something else I forgot: I'm assuming codfw also is applicable in this case? i.e. these hos...
[10:13:23] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11382935 (fgiunchedi)
[12:15:01] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Measure request frequency of thumbnail sizes - https://phabricator.wikimedia.org/T410304#11383363 (MatthewVernon) A couple of notes on extracting thumbnail size from `uri_path` - a [[ https://phabricator.wikimedia.org/T360589...
[13:07:20] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11383539 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002...
[13:17:27] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11383550 (SLyngshede-WMF) @ssingh The dnsdist shipping with Trixie is already a version 1.9, is there any reason for us to package our own version?
[13:48:12] netops, Infrastructure-Foundations, Toolforge, tools-infrastructure-team: Plan networking for Toolforge-on-Metal experiment - https://phabricator.wikimedia.org/T407140#11383672 (cmooney) Ok thanks @fgiunchedi for the info. I think that seems doable. As per the sub-task about a VRF I think that...
[14:00:00] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11383719 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for...
[15:26:45] Traffic, SRE, Patch-For-Review: Meta query about why we map 31.13.103.0/24 to US - https://phabricator.wikimedia.org/T409735#11384056 (SLyngshede-WMF) Script and tooling https://gitlab.wikimedia.org/slyngshede/meta-geomap I'll move it to the SRE namespace after a review
[15:40:39] Hello. I have an `lvs_setup` patch that I believe is ready to go: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1199763
[15:42:51] btullis: looks good. we will need to do https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers after this
[15:43:05] but the Traffic meeting is starting shortly. can we do this tomorrow perhaps, or can we do it later for you?
[15:44:25] netops, Traffic, Infrastructure-Foundations, SRE: Cleaning up Puppet and Netbox VLAN sub-ints on edge sites - https://phabricator.wikimedia.org/T410411 (ssingh) NEW
[15:44:27] Either option is fine by me. There's absolutely no rush and I don't mind if you merge and restart pybal at your convenience.
[15:44:40] ok thanks, we can do it later
[15:45:08] Ack, many thanks.
[15:48:29] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Measure request frequency of thumbnail sizes - https://phabricator.wikimedia.org/T410304#11384208 (Ladsgroup) At the scale we are talking, they won't make any dent in the stats.
[15:48:37] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11384209 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin1003 for...
[15:49:09] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Cleaning up Puppet and Netbox VLAN sub-ints on edge sites - https://phabricator.wikimedia.org/T410411#11384212 (ssingh) p:Triage→Low
[16:17:15] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11384428 (BCornwall) @SLyngshede-WMF I would guess "yes", since we modify the source ([[ https://gitlab.wikimedia.org/repos/sre/dnsdist/-/blob/bookworm-wikimedia/debian/patches/0001-Remove-topQueries-during-the-build-process...
[16:17:56] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Measure request frequency of thumbnail sizes - https://phabricator.wikimedia.org/T410304#11384438 (MatthewVernon) It's about 0.5% difference in count of 250, which isn't a vast amount, but it's not nothing. And the ranking of...
[16:45:44] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11384628 (Papaul) I took a look at xe-1/0/8 as you mentioned it was cp5002 and i saw dns5004 and just to realized that this task has been open since 2020 5 years ago so now on por...
[18:04:36] Traffic, Security-Team, WMF-General-or-Unknown, ContentSecurityPolicy, Patch-For-Review: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618#11385134 (sbassett) Per the 2025-11-18 meeting, next steps are: # @ssingh and #traffic to clean up and optimize [[...
[18:50:09] netops, Traffic, Infrastructure-Foundations: POPs LVS : remove public vlan trunking - https://phabricator.wikimedia.org/T367732#11385383 (ssingh) Related: T410411.
[18:51:24] netops, Traffic, Infrastructure-Foundations: POPs LVS : remove public vlan trunking - https://phabricator.wikimedia.org/T367732#11385400 (ssingh) >>! In T367732#11382492, @ayounsi wrote: > @ssingh started working on this with https://gerrit.wikimedia.org/r/1206424 in {T410047} boldly assigning the t...
[18:55:13] hello traffic friends - now that the codfw portion of the etcd work in T352245 is complete, any concerns or conflicts if I were to move codfw pybals back to conf2004 in the next hour or so?
[18:55:13] T352245: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245
[18:55:25] swfrench-wmf: +1
[18:55:55] awesome, thank you
[18:56:01] I'll keep y'all posted here
[19:15:48] now that the train has rolled, I'll get this started in a few minutes. same exact procedure as yesterady.
[19:15:59] *yesterday
[19:31:56] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11385630 (RobH) Day 6 Update: * 31 hosts moved today, 77 hosts remain * got directions from Clement on how to move wikikube hosts effectively, moved half...
[19:44:16] {{done}}
[19:44:40] <3
[19:54:05] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455 (cmooney) NEW p:Triage→Low
[19:57:04] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ulsfo switch refresh - https://phabricator.wikimedia.org/T410456 (RobH) NEW p:Triage→Medium
[19:57:41] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ulsfo switch refresh - https://phabricator.wikimedia.org/T410456#11385835 (RobH) Open→Invalid dupe of T410456
[19:58:24] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ulsfo switch refresh - https://phabricator.wikimedia.org/T410456#11385838 (RobH) Invalid→Resolved dupe of T408510
[19:59:35] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11385845 (RobH)
[19:59:52] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11385849 (BCornwall)
[20:00:56] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11385855 (BCornwall)
[20:02:10] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11385856 (RobH)
[20:03:11] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11385859 (ssingh)
[20:03:37] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-ulsfo: ulsfo switch refresh - https://phabricator.wikimedia.org/T410456#11385866 (Aklapper) →Duplicate dup:T408510
[20:03:40] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11385868 (Aklapper)
[20:04:59] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11385870 (RobH)
[20:05:06] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455#11385874 (cmooney)
[20:09:35] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455#11385892 (cmooney)
[20:11:59] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455#11385900 (cmooney)
[20:13:35] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455#11385909 (cmooney)
[20:26:07] hi folks. cp2043.codfw.wmnet is still sending daily debmonitor emails to root@
[20:26:38] ack, will look into it
[20:30:10] taavi: That host is currently the testbed for infra foundations while they work on the new server hardware... I'm not sure the appropriate way forward
[20:31:17] maybe it is setting "profile::debmonitor::client::ensure: 'absent'" in Hiera for that host
[20:31:37] elukey: ^ Okay to do this?
[20:35:33] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455#11385997 (cmooney)
[20:44:04] Traffic, Commons: Error: 503, Backend fetch failed - https://phabricator.wikimedia.org/T410201#11386022 (RoyZuo) @ssingh Request served via cp3066 cp3066, Varnish XID 920193864 Error: 503, Backend fetch failed at Tue, 18 Nov 2025 20:41:41 GMT
[20:53:38] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11386097 (BCornwall) @MatthewVernon Looks like you were a maintainer of pcre3 in Debian before it was axed in Trixie. Sadly, we're in need of that package for trafficserver (the current patches for enabling pcre2 support are...
[22:27:22] Traffic, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Fix Hive event.development_network_probe table - https://phabricator.wikimedia.org/T400360#11386412 (Ahoelzl) Open→Resolved
[23:27:16] netops, Infrastructure-Foundations, SRE: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455#11386635 (cmooney) To try to verify what happened here I tried to make the same change in netbox-next, (with [[ https://netbox-next.wikimedia.org/dcim/devices/6359/ | this ]] being th...
[23:32:07] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062#11386657 (MatthewVernon)