[07:40:02] marostegui: can I start reboots on s7?
[07:51:59] Yep
[08:17:19] Morning folks, can I get +1s on two changes, please? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1278274 to restore 3 nodes to the rings and start draining the next 2; and https://gitlab.wikimedia.org/repos/data_persistence/swift-ring/-/merge_requests/18 to teach the ring manager about the new racks
[08:19:18] thanks m.arostegui :)
[09:28:15] Hi, I'm back
[09:31:48] hi :)
[13:23:25] jynus: there are pending DNS changes for db2141 and I saw you decommissioned it
[13:23:26] can I merge?
[13:23:43] it is ongoing
[13:23:57] the script is running now
[13:24:09] I will have it locked until it finishes
[13:24:27] I'm running the decommissioning cookbook on a different host
[13:24:31] not me personally, the ongoing script
[13:24:41] so not sure at what stage you are
[13:24:44] and now it wants to merge the DNS changes for your host as well
[13:24:52] what are the changes you are shown?
[13:25:00] yep, go ahead
[13:25:13] mine will do a second pass anyway
[13:25:31] -20 1H IN PTR db2141.mgmt.codfw.wmnet.
[13:25:31] -43 1H IN PTR db2141.codfw.wmnet.
[13:25:31] -db2141 1H IN A 10.192.32.43
[13:25:31] -db2141 1H IN A 10.193.0.20
[13:25:36] all good
[13:25:57] thanks
[13:26:49] 2026-04-28 13:26:37,044 [INFO] Nothing to commit!
[13:26:54] So it is smart enough
[13:28:48] or not transactional enough, depending on pov :D
[13:30:06] it is dns, based on udp, good enough if it doesn't unalive itself :-D
[13:36:06] I cannot delete an instance because of the locks table
[13:36:11] from zarcillo
[13:36:25] and that will likely break the grafana generation script
[13:36:44] federico3: let me know what has to be amended at https://wikitech.wikimedia.org/wiki/MariaDB/Decommissioning_a_DB_Host
[13:38:06] and most likely tell you why fks are evil in a web environment
[13:38:50] should I delete cascade or delete the constraint?
[13:43:08] I'm confused... because of the locks table?
[13:44:28] yep, deletion from the section_instances table fails
[13:44:36] ah yes, there's a fk on section_instances
[13:44:41] CONSTRAINT `locks_ibfk_1` FOREIGN KEY (`instance`) REFERENCES `section_instances` (`instance`)
[13:44:55] delete cascade or drop fk, your call
[13:45:00] yes we can remove the fk
[13:45:22] that way it will conserve historical records
[13:45:25] cascade is ok
[13:45:39] ha ha, which one of the 2 do you prefer :-D, ok with both
[13:45:48] I don't think there's much value in keeping old locks around
[13:45:55] cascade will remove data; dropping the fk will conserve data but remove the integrity check
[13:46:17] ok, then will keep the fk (you can remove it in the future and just remove the records referencing it)
[13:46:47] but if you keep it you may want to complete the docs on the link above with the cascade deletion
[13:47:53] it is a bit weird to reference section_instances and not instances anyway
[13:50:02] indeed, I'm removing the constraint, after all it's not really useful
[13:51:00] that is not urgent, I deleted the references, which unblocked the run of the prometheus generation script on the prometheus hosts (my only worry)
[15:40:00] urandom, Emperor: can you please look into https://phabricator.wikimedia.org/T424674 in the next two days? The old intermediate expires on Sunday, and since a lot of European people are out on Friday for Labour Day, it would be good to have this wrapped up by Thursday for a final sanity check
[15:42:20] ah, that's quite short notice
[15:42:46] I'll try and look at it for my things tomorrow
[15:46:18] ok!
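[Editor's note: a minimal MariaDB sketch of the two options weighed above. The table and constraint names come from the CONSTRAINT line quoted at 13:44:41; the actual zarcillo schema may differ.]

    -- Option 1 (the call made at 13:50:02): drop the FK. Old lock rows
    -- are conserved, but the integrity check is gone.
    ALTER TABLE locks DROP FOREIGN KEY locks_ibfk_1;

    -- Option 2: recreate the FK with ON DELETE CASCADE, so deleting an
    -- instance from section_instances silently removes its rows in locks.
    ALTER TABLE locks DROP FOREIGN KEY locks_ibfk_1;
    ALTER TABLE locks
      ADD CONSTRAINT locks_ibfk_1
      FOREIGN KEY (instance) REFERENCES section_instances (instance)
      ON DELETE CASCADE;

[Either way, the point made at 13:47:53 stands: the FK references section_instances rather than instances, so it blocks instance cleanup rather than protecting it.]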
[16:06:45] moritzm: stupid Q> why can't profile::tlsproxy::envoy::cfssl_label: just be updated in hieradata/common/profile/tlsproxy/envoy.yaml and thus rolled out everywhere?
[16:08:51] it would mean checking whether 40ish services still work properly, so the blast radius is really large
[16:09:33] and we need to have it done really soon, so this is the more realistic way
[16:09:55] manuel, if around: do you mark T424028 when it's done on our side, or do you wait for dc ops to resolve the subtask?
[16:09:55] T424028: Decommission db2141-db2152 - https://phabricator.wikimedia.org/T424028
[16:14:23] Emperor: we have also been warning people at the SRE meeting for a couple of weeks about this work :)
[16:16:10] the change is really just setting a hiera key, and then the puppet agent takes care of everything. We had some issues with some other systems, but everything should work fine now; still, it is surely safer to do it one host at a time with puppet disabled etc., at least for sensitive clusters
[16:16:27] let us know if you need reviews etc.
[16:18:37] elukey: thanks, I'd missed the "you need to do X" part of the process, alas. I'll try and get it done tomorrow, will need a +1 (or 3) then.
[16:20:40] just add me and Luca as reviewers, we'll be reviewing these with short turnaround
[16:21:17] ack, ta
[16:33:05] also please ping us if you encounter any issues etc.
[16:33:12] hopefully it will be a very quick rollout
[16:33:14] thanks!
[17:30:45] jynus: I tick it as done from our side, as there's nothing else left for me
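[Editor's note: for context, the global change asked about at 16:06:45 would be a one-line hiera edit along these lines. The key and file path are named in the chat; the label value is hypothetical. The objection at 16:08:51 is that this would apply to every service using the profile on its next puppet run, hence the host-by-host rollout instead.]

    # hieradata/common/profile/tlsproxy/envoy.yaml
    # Hypothetical value: pointing every host at the new intermediate's
    # cfssl signer label in one commit, picked up on the next agent run.
    profile::tlsproxy::envoy::cfssl_label: "new_intermediate_label"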