[04:11:02] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Standardize management routers interfaces - https://phabricator.wikimedia.org/T421674#12021945 (Papaul) @ayounsi @cmooney i was looking at moving the mgmt interface to irb.900 and I noticed on all the mr's there is the default DHCP...
[05:00:26] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: Standardize management routers interfaces - https://phabricator.wikimedia.org/T421674#12021991 (Papaul)
[05:02:43] netops, Infrastructure-Foundations: codfw: upgrade routers (2026) - https://phabricator.wikimedia.org/T417871#12021993 (Papaul) @BCornwall FYI we are planning on doing the cr2-eqdfw Junos upgrade next week on Wednesday June 24th at 10:00am CT. Thanks
[05:37:35] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: Standardize management routers interfaces - https://phabricator.wikimedia.org/T421674#12022005 (ayounsi) Indeed, we can remove that DHCP pool.
[06:01:11] netops, homer, Infrastructure-Foundations, SRE, Patch-For-Review: Homer should abort on filter rules applied on non-existent or disabled interfaces - https://phabricator.wikimedia.org/T428886#12022047 (ayounsi) yeah me neither, it's a tradeoff (engineering time/impact) that I think is accepta...
[06:27:13] Traffic, Data-Platform-SRE, Data-Engineering (Q4 FS25/26 April 1st - June 30st): Provide a scheduled data download service from Google Cloud Storage - https://phabricator.wikimedia.org/T427457#12022104 (ayounsi) We discussed the HTTP Proxy vs. urldownloader during the I/F meeting and you can go ahead...
[06:34:25] Traffic, Infrastructure-Foundations, SRE: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12022111 (ayounsi) Thanks for the great writeup. For the advantages and drawbacks you listed, my preference would go to (3) liberica/pybal. To me the ru...
[06:43:28] Traffic, Infrastructure-Foundations, SRE: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12022117 (MoritzMuehlenhoff) Let's wait until Liberica is available and then go with 3.
[10:05:45] netops, Infrastructure-Foundations, ServiceOps new, Patch-For-Review: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12022835 (cmooney)
[10:14:01] netops, Infrastructure-Foundations, ServiceOps new, Patch-For-Review: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12022852 (FCeratto-WMF) For the `db*` hosts they can be indeed depooled as described.
[10:20:02] netops, Infrastructure-Foundations, ServiceOps new, Patch-For-Review: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12022872 (cmooney) >>! In T428020#12022852, @FCeratto-WMF wrote: > For the `db*` hosts they can be indeed depooled as described. Thanks for the confirmatio...
[10:57:04] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12023045 (MoritzMuehlenhoff)
[12:07:05] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12023355 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f96964dd-ded4-440c-9105-9a1b97d2144f) set by cmooney@cumin1003 for 1:00:00 on 5 host(s) and their servi...
[12:10:54] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12023377 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1dff57c0-efff-46db-8f85-423d33775bce) set by cmooney@cumin1003 for 1:00:00 on 29 host(s) and their serv...
[12:30:04] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#12023473 (jcrespo)
[12:40:40] netops, Cloud-VPS, Infrastructure-Foundations, SRE, tools-infrastructure-team: Upgrade cloudsw1-e4-eqiad - https://phabricator.wikimedia.org/T429013#12023550 (fgiunchedi) Indeed the recent rack redundancy testing has shown we are resilient to the loss of one rack, for all hosts but cloudvirts...
[12:41:13] netops, Cloud-VPS, Infrastructure-Foundations, SRE, tools-infrastructure-team: Upgrade cloudsw1-f4-eqiad - https://phabricator.wikimedia.org/T429014#12023556 (fgiunchedi) See my update at https://phabricator.wikimedia.org/T429013#12023550 since it applies equally here
[12:51:13] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12023612 (cmooney) Switch upgrade was successful, all hosts are back online and being repooled. @MatthewVernon if you want to check thanos-be2006 please do, it's back doing norm...
[13:04:18] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12023706 (MatthewVernon) Thanos-swift cluster looks good, thanks.
[13:13:08] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Install new MPC10E-10C line cards on cr1-eqiad and cr2-eqiad slot 0. - https://phabricator.wikimedia.org/T426343#12023730 (cmooney)
[13:13:20] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Install new MPC10E-10C line cards on cr1-eqiad and cr2-eqiad slot 0. - https://phabricator.wikimedia.org/T426343#12023732 (cmooney) >>! In T426343#12016920, @Papaul wrote: > @cmooney I took a look at the steps all look good to me for...
[13:17:14] Hey!
Another question on urldownloader vs webproxy: T429338
[13:17:15] T429338: Standardize on a single web proxy hostname - https://phabricator.wikimedia.org/T429338
[13:17:52] We had a similar discussion in T427457, where the recommendation is to use urldownloader for the moment. But I'm not clear on why, so I don't know if the same reasoning applies here as well.
[13:17:53] T427457: Provide a scheduled data download service from Google Cloud Storage - https://phabricator.wikimedia.org/T427457
[13:18:32] Traffic, Data-Platform-SRE (2026-06-05 - 2026-06-26): Standardize on a single web proxy hostname - https://phabricator.wikimedia.org/T429338#12023746 (Gehel)
[13:19:26] Traffic, Data-Platform-SRE (2026-06-05 - 2026-06-26): Standardize on a single web proxy hostname - https://phabricator.wikimedia.org/T429338#12023751 (Gehel)
[13:19:52] netops, fundraising-tech-ops, Infrastructure-Foundations: Move WMF5520's switch ports to frack-eqiad-administration vlan - https://phabricator.wikimedia.org/T429340 (Jgreen) NEW
[13:33:43] gehel: we can take a look in a bit, thanks
[13:45:31] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#12023875 (cmooney)
[13:46:55] gehel: yes there have been various discussions happening about this over the last week
[13:47:55] I think we'll arrive at a better setup, easier to understand than now, but there are a few things to be ironed out
[13:53:35] Traffic, Data-Platform-SRE (2026-06-05 - 2026-06-26): Standardize on a single web proxy hostname - https://phabricator.wikimedia.org/T429338#12023924 (cmooney) There are also the hcaptcha-proxy VMs too. In terms of the original point I believe there are differences between the configuration across these...
[13:58:36] Traffic, Infrastructure-Foundations, SRE: Scaling urldownloaders by adding redundancy and load balancing - https://phabricator.wikimedia.org/T429175#12023981 (cmooney) Option 3 is fine I think. The only thing I'd say against it is with my network hat on I'm always trying to minimise hops, especially...
[14:49:35] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, SRE: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#12024308 (Papaul)
[15:14:50] topranks/sukhe : Thanks for the help! No emergency here at all, but cleaning up the mess on our side will be nice!
[15:32:29] gehel: Even though it totally sounds like it's Traffic's domain, I'd say that nobody "owns" the proxy situation - it was created way back when, even before Traffic existed.
[15:33:16] This lack of ownership is probably why we have the disparate hostnames :)
[15:54:44] netops, homer, Infrastructure-Foundations, SRE, Patch-For-Review: Homer should abort on filter rules applied on non-existent or disabled interfaces - https://phabricator.wikimedia.org/T428886#12024751 (cmooney) >>! In T428886#12019220, @taavi wrote: > I'm not a huge fan of relying on the exac...
[17:07:30] netops, Infrastructure-Foundations: codfw: upgrade routers (2026) - https://phabricator.wikimedia.org/T417871#12025355 (BCornwall) Thanks for the heads up, @Papaul! Is there anything you require of traffic during that time?
[17:31:47] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12025477 (cmooney) Open→Resolved All done here.
[17:32:05] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#12025486 (cmooney)
[17:42:35] Traffic, Infrastructure-Foundations: eqsin: re-image rack 604 servers on new vlan - https://phabricator.wikimedia.org/T428229#12025682 (BCornwall)
[17:54:36] netops, Infrastructure-Foundations, SRE: cr2-esams rpd failure after enabling bgp 'graceful-shutdown' (June 2026) - https://phabricator.wikimedia.org/T429386 (cmooney) NEW p:Triage→Low
[18:51:11] netops, Infrastructure-Foundations, SRE: Network device tls certs: alerting niggles - https://phabricator.wikimedia.org/T429242#12026032 (cmooney) So something odd is going on here. If I query thanos only two Nokia devices in eqiad are currently showing a //probe_ssl_earliest_cert_expiry// value: `...
[18:51:42] netops, Infrastructure-Foundations, SRE: Blackbox probe for TLS cert expriy failing on multiple eqiad SR-Linux nodes - https://phabricator.wikimedia.org/T429242#12026037 (cmooney)
[19:35:06] netops, Infrastructure-Foundations, SRE: Blackbox probe for TLS cert expriy failing on multiple eqiad SR-Linux nodes - https://phabricator.wikimedia.org/T429242#12026310 (cmooney) Open→Resolved Alight small gap when gnmic had to reconnect but other than that lsw1-c4-eqiad is back working...
[19:52:02] Traffic, Infrastructure-Foundations: eqsin: re-image rack 604 servers on new vlan - https://phabricator.wikimedia.org/T428229#12026403 (BCornwall)