[08:46:46] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11365310 (fgiunchedi)
[11:17:36] netops, Infrastructure-Foundations, SRE: Row C traffic outage Nov 11 2025 - https://phabricator.wikimedia.org/T409800#11365890 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e41d36ab-ea9e-437e-a0db-341d018dedf6) set by cmooney@cumin1003 for 2:00:00 on 2 host(s) and their services w...
[11:19:19] moritzm: looks like you have had puppet disabled on sretest1003 for a while and now it's sending rootspam for an expired debmonitor certificate, mind if I re-enable that?
[11:20:03] sure thing!
[11:20:39] it'll also be reimaged some time this week
[11:22:31] thanks, doing
[11:28:17] thanks
[12:48:02] netops, Infrastructure-Foundations, SRE: Row C traffic outage Nov 11 2025 - https://phabricator.wikimedia.org/T409800#11366307 (BTullis)
[13:01:14] SRE-tools, DBA, Infrastructure-Foundations: Improve database master switchover script - https://phabricator.wikimedia.org/T200306#11366383 (Marostegui) I am going to merge this into {T200306}
[13:01:26] SRE-tools, DBA, Infrastructure-Foundations: Improve database master switchover script - https://phabricator.wikimedia.org/T200306#11366385 (Marostegui) Sorry I mean into {T409926}
[13:53:21] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11366599 (Andrew) OSD nodes up through 1034 are scheduled for decom in 2026. Unless there's an urgent port shortage, we should only retcon 1035 and...
[14:44:36] netops, cloud-services-team, Infrastructure-Foundations, Toolforge: Create new VRF and networks for Toolforge-on-Metal - https://phabricator.wikimedia.org/T409309#11366850 (cmooney) So a few things emerged after the call today: * We need a device that will provide NAT (IPv4) and firewalling (IPv...
[15:07:48] CAS-SSO, cloud-services-team, Cloud-VPS, Infrastructure-Foundations: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11367003 (taavi)
[15:44:24] moritzm: for https://phabricator.wikimedia.org/T409860, should we file separate tasks for the approvals, or is one enough? (cc Raine)
[15:45:48] one task is perfectly fine, I've just +1d it
[15:46:14] thanks!
[15:47:11] for the group placement, we should just do a rough count I am assuming of the capacity; anything to keep in mind?
[15:50:23] I left a comment on task with a proposal, the count can be a little misleading since we started to refresh ganeti nodes with 128G instead of 64G, so the average RAM capacity differs a little depending on how old the servers are
[15:50:42] and left a note wrt Bird and routed ganeti, is wikidough on Bookworm or Trixie?
[15:51:01] if the latter, then I'd first need to respin the patched Bird deb for trixie
[15:52:50] wikidough is still on bookworm for now.
pdns-rec 5 was the blocker but we did that and need to test it there, so will eventually move to trixie
[15:53:11] yeah, so in this case though that's why I suggested we just go with bookworm for now and reimage to trixie later
[15:53:26] because I didn't want it to seem like that is a blocker because it's not in a way
[15:53:57] re: routed ganeti in magru/esams, thanks, already in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1204073
[15:55:18] basically bird 2.17.1+branch.mq.bgp.multilisten.c47b08a1524c-cznic.1 not in trixie is the only blocker for the new hcaptcha VMs (we already packaged anycast-hc as that is running on durum nodes that are trixie)
[16:01:32] sorry, mentally mixed up wikidough and hcaptcha, this was actually meant in reference to the new nodes
[16:01:54] let me see how we can get the bird-enabled build for trixie tomorrow
[16:02:20] the current deb was provided from their branch and the plan was to move to 2.18 once released (since that will have all changes merged in)
[16:02:31] moritzm: no worries, don't stress about it please! that's why I didn't mention it :P
[16:02:38] we can always reimage later
[16:02:48] but shouldn't be too complicated, I
[16:02:54] will give it a quick shot tomorrow
[16:03:18] we are reimaging to insetup now anyway so we can do that later
[16:03:48] ok!
[16:04:46] the current hcaptcha nodes are trixie but they are not using bird so doesn't apply!
[16:11:49] ah, ok! then there's no trixie blocker in terms of routed ganeti for them
[16:12:45] yeah for the existing four ones. for the ones in the task above, they are going to use bird, so that blocker is there
[16:24:10] ok, I'll have a look at the bird build tomorrow, then
[17:39:59] moritzm: for magru and esams, which group (if any) do we need for makevm?
asking due to routed ganeti
[17:41:04] (or actually just for esams I assume)
[18:11:02] +++
[18:58:40] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11368174 (RobH) Day 3 (Monday) : No Migrations, catch up day for other tasks for both Rob and John. Tuesday Holiday doesn't count Day 4 Update (Wednesday):...
[19:00:32] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11368177 (RobH)
[19:00:36] netops, Infrastructure-Foundations, SRE: Row C traffic outage Nov 11 2025 - https://phabricator.wikimedia.org/T409800#11368178 (RobH)
[19:01:23] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11368182 (RobH)
[19:47:38] moritzm: do you still aspire to look at https://phabricator.wikimedia.org/T409328 or should I take another stab?
[22:14:03] andrewbogott: I had a quick look yesterday and the CAS part looks all fine in the logs
[22:14:44] I'm missing some context on the finer details of the intended setup and I currently have some more pressing things to look at
[22:15:34] so please take another stab, otherwise I'll try to make some time for it next week