[11:09:45] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Compile a list of "canonical" thumbnail sizes - https://phabricator.wikimedia.org/T408715#11378426 (TheDJ) >>! In T408715#11332556, @AntiCompositeNumber wrote: > It's historically been easy for applications to generate their...
[11:18:39] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062#11378451 (TheDJ)
[11:53:12] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11378556 (ayounsi) > Once the router change is done, therefore, we need to somehow adjust the netmask on all the existing hosts on the v...
[12:01:18] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11378569 (ayounsi) a:Papaul @papaul is that something you could look into ? Is there a way to disable the NIC's LLDP through the BIOS menu ? Maybe some solution from the la...
[12:54:49] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11378749 (ayounsi) > I personally prefer to use the first (ok second) address in each v6 subnet as the gateway, i.e. 2a02:ec80:400:1::1/64 Sounds good to me....
[13:01:02] netops, Traffic, Infrastructure-Foundations, SRE: Transport link saturation not alerting - https://phabricator.wikimedia.org/T409330#11378763 (ayounsi) a:ayounsi My bad ! I turned them off after adding the transit/peering saturation alerts. Forgetting transport and core links.... I'll take ca...
[14:37:19] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11379093 (Papaul) @ayounsi yes I can look into it. Thanks.
[14:44:28] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11379107 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin10...
[14:45:56] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379108 (ssingh) >>! In T410047#11374122, @cmooney wrote: > @ssingh I made a patch and can kick off the changes in Netbox and on the ro...
[14:49:25] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379134 (ayounsi) You can use 198.35.26.5/28. It's marked as reserved for infra, but we don't need it (and we will even less need it af...
[14:50:49] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379141 (cmooney) >>! In T410047#11379108, @ssingh wrote: > My plan for now to unblock the hCaptcha work was to decommission one of the...
[14:52:31] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379156 (ssingh) >>! In T410047#11379134, @ayounsi wrote: > You can use 198.35.26.5/28. It's marked as reserved for infra, but we don't...
[14:53:09] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379157 (ssingh) >>! In T410047#11379141, @cmooney wrote: >>>! In T410047#11379108, @ssingh wrote: >> My plan for now to unblock the hC...
[14:56:36] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379165 (cmooney) >>! In T410047#11379157, @ssingh wrote: > Yeah, good point about the LVS IPs since we no longer need them given Liber...
[15:05:34] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11379188 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin10...
[15:10:08] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: No free IPs on public1-ulsfo vlan (Nov 2025) - https://phabricator.wikimedia.org/T410047#11379212 (ssingh) >>! In T410047#11379165, @cmooney wrote: >>>! In T410047#11379157, @ssingh wrote: >> Yeah, good point about the LVS IP...
[15:29:48] Traffic, DC-Ops, ops-eqiad, SRE: eqiad row C/D Traffic host migrations - https://phabricator.wikimedia.org/T405623#11379313 (RobH) @BCornwall and @ssingh: We chatted about this last week, can we schedule this work to move dns1006 tomorrow, Tuesday November 18th at 9AM Pacific / 5PM GMT?
[15:35:04] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Netbox Cable report - incorrectly parsing Nokia power supplies - https://phabricator.wikimedia.org/T410073#11379334 (LSobanski) p:Triage→Medium a:ayounsi
[15:36:03] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11379339 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 f...
[15:37:21] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11379344 (LSobanski) p:Triage→Medium
[15:50:03] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11379401 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 f...
[16:40:35] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11379750 (MoritzMuehlenhoff) >>! In T409860#11372373, @ssingh wrote: > `hcaptcha-proxy3001` wor...
[17:06:32] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11379965 (fgiunchedi)
[17:08:48] netops, Infrastructure-Foundations, SRE: Audit and verify all cloudcephosd have their primary interface tagged and access to cloud-storage vlan - https://phabricator.wikimedia.org/T409690#11379974 (fgiunchedi) Thank you @cmooney ! FYI as per Andrew we really only care about cloudcephosd1035 through c...
[17:15:23] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380025 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin1003 for...
[17:21:42] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380072 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin10...
[17:39:27] Traffic, DC-Ops, ops-eqiad, SRE: eqiad row C/D Traffic host migrations - https://phabricator.wikimedia.org/T405623#11380223 (BCornwall) @robh Works for me. Mind adding a calendar invite? Thanks.
[17:46:48] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380279 (ssingh) >>! In T409860#11379750, @MoritzMuehlenhoff wrote: >>>! In T409860#11372373,...
[17:58:16] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Measure request frequency of thumbnail sizes - https://phabricator.wikimedia.org/T410304 (MatthewVernon) NEW
[17:59:37] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Measure request frequency of thumbnail sizes - https://phabricator.wikimedia.org/T410304#11380408 (MatthewVernon)
[18:12:48] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380462 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 f...
[18:17:15] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380482 (MoritzMuehlenhoff) >>! In T409860#11380279, @ssingh wrote: > Thanks! Can you share t...
[18:34:26] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin10...
[18:40:35] hello traffic - ahead of some etcd maintenance tomorrow, I'll be temporarily pointing PyBals in codfw at conf2006 [0]. any concerns if I do that today/soon, rather than waiting until right before the maintenance?
[18:40:35] [0] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1203556
[18:41:28] swfrench-wmf: today should be fine. I don't think there is any work planned on our end that should interfere
[18:41:32] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380553 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin10...
[18:43:55] sukhe: great, thank you! I'll aim to get started in the next 15 minutes or so, then. I'll flag in -sre about the anticipated icinga check noise (i.e., between puppet run and restart).
[18:44:02] thanks!
[19:04:45] Traffic, Commons: Error: 503, Backend fetch failed - https://phabricator.wikimedia.org/T410201#11380634 (ssingh) Tagging Traffic for this is perfectly fine, thanks @A_smart_kitten. @RoyZuo: can you confirm this is still happening? I am asking that to see if the timing matched with some operational issue...
[19:20:51] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380697 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 f...
[19:22:02] [non-urgent] is there already a task for the persistent `PYBAL CRITICAL - CRITICAL - k8s-ingress-dse_30443: Servers dse-k8s-worker2002.codfw.wmnet are marked down but pooled` alert spam?
[19:22:02] makes it kind of slow to wait for a green icinga board in sre.loadbalancer.restart-pybal :)
[19:30:50] optional: select the alerts in icinga web UI and click "reschedule next service" check to make it faster :p
[19:32:18] mutante: heh, yeah - so, I do that for ones that _will_ succeed. alas, that dse-k8s-worker2002 nonsense never will :)
[19:33:57] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380794 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 f...
[19:34:29] swfrench-wmf: I see.. hmm. how about downtime cookbook with an expiration time of 10 years:)
[19:34:59] lol
[19:35:14] swfrench-wmf: sadly no. the issue is that the host is down but marked pooled. we have pinged the relevant team for it.
[19:35:57] just redirect the alerting
[19:36:06] sukhe: thanks! I'll bother them directly about it
[19:36:33] we have two options, either depool the host (we can't do it ourselves) or just ignore the alert and remind them again. we have been doing more of the latter
[19:36:36] so, unfortunately, these are icinga checks that would be hard / awkward to make per-service
[19:36:41] and they are looking into it, so there's that
[19:37:34] yeah the other issue is that in Q3, hopefully, there is no more Pybal anyway so there's that
[19:37:41] or edit the alerting rules so that monitoring system pings them for you and then switch to ignoring it
[19:37:56] the reason I bring this up is to clarify why we are not really spending much time on it for now
[19:38:40] sukhe: yeah, I was imagining that in a liberica world, these are prom alerts with labels indicating the affected service, so this becomes a non-problem (or at least a silence-able one)
[19:39:20] yeah, it's really a non-problem there in a way as far as Liberica admin goes
[19:39:25] *admin work
[19:42:24] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11380805 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin10...
[20:29:01] Traffic, Infrastructure-Foundations, SRE, vm-requests: eqiad/codfw/esams/ulsfo/eqsin/drmrs/magru: 2 VM request for hCaptcha proxy (bird/anycast), total of 14 - https://phabricator.wikimedia.org/T409860#11381013 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 f...
[23:07:28] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11381580 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host sretest2004.codfw.wmnet with OS trixie
[23:51:40] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11381762 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host sretest2004.codfw.wmnet with OS trixie completed: - sretest2004 (**PASS**)...