[02:33:31] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10169923 (Papaul)
[02:35:08] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10169924 (Papaul)
[02:46:09] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10169925 (Papaul)
[07:33:17] Traffic: Test liberica BGP support - https://phabricator.wikimedia.org/T375464 (Vgutierrez) NEW
[07:33:27] Traffic: Test liberica BGP support - https://phabricator.wikimedia.org/T375464#10170092 (Vgutierrez) p:Triage→Medium
[07:33:44] ^^ XioNoX, topranks when you have the chance I'd need some help from you with that one
[07:37:24] vgutierrez: sure, a bit later today
[07:38:01] yeah, no rush
[07:46:29] netops, Infrastructure-Foundations, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10170101 (ayounsi) From JTAC: > [...] after engaging further resources we have been requested to attempt a full chassis reboot and check if the issue persists before proceeding with the...
[08:38:43] Traffic: Test liberica BGP support - https://phabricator.wikimedia.org/T375464#10170262 (ayounsi) > it should be enough to add lvs1013 to the list of eqiad lvs_neighbors in homer or do we need something on top of that? In eqiad they're still configured manually, otherwise we now just have to flip the BGP fl...
[08:39:07] vgutierrez: replied, but in short let me know when to add the session on the CR side
[08:40:26] cool
[08:40:28] thx XioNoX
[08:46:08] XioNoX: BTW, now that you rebooted cr3-ulsfo should we repool ulsfo?
[08:46:30] router seemed to be ok without traffic since the last crash
[08:47:38] vgutierrez: I prefer to keep monitoring a bit longer, and we can repool later today
[08:47:52] XioNoX: cool, ping me when you're ready
[08:56:20] Traffic, collaboration-services, SRE, Patch-For-Review, Release-Engineering-Team (Radar): implement anti-abuse features for GitLab (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#10170311 (Jelto) I reviewed the [throttling in the past 7 days](https://grafana.wikimedia...
[09:56:48] Traffic, collaboration-services, SRE, Patch-For-Review, Release-Engineering-Team (Radar): implement anti-abuse features for GitLab (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#10170568 (Jelto)
[13:18:44] XioNoX: looking good so far?
[13:19:27] let me check
[13:21:26] sukhe: so far it has been going fine, so +1 to repool
[13:21:59] XioNoX: thanks! and thanks for following up so quickly
[13:22:01] will pool shortly
[13:24:30] there will be lots of eyes around today, so don't hesitate to depool if it shows signs of issues. If it happens again, we will have to replace the router.
[13:25:42] yeah! (also on on-call)
[13:37:59] sukhe: are you repooling? should do I?
[13:38:08] oh.. you already done it
[13:38:09] cool
[13:38:15] vgutierrez: just did
[13:38:22] hopefully it goes well
[13:38:43] aaand you just jinxed it ;P
[13:39:52] can't even blame fabfu.r, he is not around :(
[13:47:28] thank you all for making that happen :)
[13:47:46] swfrench-wmf: thanks to XioNoX but yeah, ulsfo is pooled :)
[13:49:19] looking good so far at ~4k rps
[13:49:56] frees up codfw a bit as expected: https://grafana.wikimedia.org/goto/Ugg69BgHg?orgId=1
[13:52:33] a bit? 50% of its load
[13:52:48] 40-50%
[13:52:56] it's an important chunk
[13:54:04] I meant bit as in figure of speech but point taken :]
[13:55:32] ;P
[13:56:11] vgutierrez: in some cultures, understatement can be a kind of emphasis ;)
[13:57:05] what's the overused quote anyway, "underpromise overdeliver" something? so say 4k rps and maybe you get 8k
[13:57:07] my L8 is thick as a brick, ignorant of those subtleties.. I'll ask for a firmware upgrade but it seems highly improbable that it will happen at some point
[13:57:46] so you can blame me :)
[14:39:52] XioNoX, topranks: BTW, dunno if you remember our rp_filter talk, but it looks like disabling/relaxing it for ipip0 and "all" is enough
[14:40:17] it looks like the kernel applies the most restrictive one between the special setting "all" and the interface one
[14:42:43] yeah.. "The max value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}."
[14:53:28] good to know!
[15:38:57] Traffic, MW-on-K8s, serviceops, SRE, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#10172078 (Krinkle)
[15:39:00] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: lvs2013: move uplink to lsw1-c2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370927#10172072 (Papaul) Open→Resolved This is done
[15:45:23] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: Update codfw LVS connectivity to support new LSW in rows C & D - https://phabricator.wikimedia.org/T370635#10172089 (Papaul) Open→Resolved a:Papaul This is also done we can resolve.
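Note on the rp_filter exchange above ([14:39:52] to [14:42:43]): the quoted kernel documentation sentence says the value used for an interface is the maximum of the `all` setting and the per-interface setting. A minimal sketch of that rule follows, assuming the usual sysctl meanings (0 = no source validation, 1 = strict, 2 = loose). The helper names `effective_rp_filter` and `read_rp_filter` are hypothetical and only for illustration; `ipip0` is the interface mentioned in the chat.

```python
# Sketch of the "max of all and interface" rule for rp_filter.
# Helper names are hypothetical; they are not part of any tool mentioned in the log.

RP_FILTER_MODES = {0: "no source validation", 1: "strict", 2: "loose"}


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """Return the value used for source validation on an interface.

    Per the kernel documentation sentence quoted in the chat, this is the
    max of conf/all/rp_filter and conf/<interface>/rp_filter.
    """
    return max(all_value, iface_value)


def read_rp_filter(name: str) -> int:
    """Read an rp_filter sysctl from procfs (Linux only); name is 'all' or an interface."""
    with open(f"/proc/sys/net/ipv4/conf/{name}/rp_filter") as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    # Relaxing only the interface is not enough while "all" is still strict.
    print(RP_FILTER_MODES[effective_rp_filter(all_value=1, iface_value=0)])
    # Relaxing both "all" and the interface disables validation on it.
    print(RP_FILTER_MODES[effective_rp_filter(all_value=0, iface_value=0)])
```

This is consistent with the observation in the chat that both `all` and `ipip0` had to be relaxed: with `all` left at 1, setting `ipip0` to 0 would still yield strict validation on that interface.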
[15:48:06] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#10172117 (Papaul) Open→Resolved a:Papaul This is done we are tracking the decom in https://phabricator.wikimedia.org/T375419 and https://...
[15:48:15] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate codfw row C & D database hosts to new Leaf switches - https://phabricator.wikimedia.org/T370852#10172128 (Papaul) Open→Resolved a:Papaul This is done
[15:50:12] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10172139 (Papaul)
[16:52:06] Traffic, Data-Persistence, SRE, SRE-swift-storage, and 6 others: Change default image thumbnail size - https://phabricator.wikimedia.org/T355914#10172323 (ovasileva) p:High→Medium Potential next steps in addition to the tickets above is QTE preparation and test cases.
[17:37:29] Traffic, Prod-Kubernetes, serviceops, Kubernetes: Reverse DNS for k8s pods IPs - https://phabricator.wikimedia.org/T344171#10172491 (CDanis) As discussed at the k8s SIG today: * There were some doubts about if Calico can advertise just a subset of Service ClusterIP range, or if it was all-or-not...
[21:24:24] Traffic, Movement-Insights: Investigating unique devices traffic data - https://phabricator.wikimedia.org/T375562 (Hghani) NEW
[21:56:34] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-d-codfw switch stack - https://phabricator.wikimedia.org/T375419#10173348 (Papaul)
[22:06:50] Traffic: Remove RSA certificates from puppet - https://phabricator.wikimedia.org/T375569 (BCornwall) NEW