[06:59:29] greetings
[09:46:40] hello!
[09:54:21] any opinions where to publish this T411590 from? the data is relatively easy to generate from any puppetized host, it just needs a web server capable of serving some puppet-generated json files. run it on a tiny new vm in cloudinfra? re-use some existing vm in cloudinfra (enc? download?)?
[09:54:22] T411590: Publish machine-readable version of Cloud VPS IP space - https://phabricator.wikimedia.org/T411590
[10:01:10] download was my first thought, but T367593 made me wonder if we shouldn't be putting more stuff in there
[10:01:11] T367593: Replace 'download' cloud-vps project after we support per-tool object storage - https://phabricator.wikimedia.org/T367593
[10:03:22] taavi: new vm in cloudinfra sounds good to me, but I'm also fine with reusing another vm
[10:04:10] yeah, both options seem fine
[10:04:41] "download" is probably not where I would look if I wanted to find this in 1 year... "cloudinfra" seems more apt, but I guess you can always find it by grepping puppet :)
[10:06:05] I'll do a new VM then, we can call it meta.wmcloud.org or something
[10:06:20] (bikeshedding for the name welcome now, but not in an hour when I've written all the code)
[10:07:37] lol, meta sounds good
[10:26:51] +1 :)
[10:46:46] sorry, an hour was too conservative of an estimate. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214475 (+ its parents)
[10:49:09] and https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/285
[10:54:13] LGTM
[10:57:54] does that apply to the parent gerrit patches as well?
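The log doesn't show the actual layout of the puppet-generated JSON that ends up served from meta.wmcloud.org, but as a rough illustration of what a machine-readable IP-space export might look like, here is a minimal sketch. The field names ("description", "prefixes") and the function name are hypothetical, not taken from the real patches; the RFC 5737/3849 documentation ranges stand in for the real Cloud VPS space.

```python
import ipaddress
import json


def render_ip_space(ranges):
    """Render a list of CIDR strings as a machine-readable JSON document.

    This is a sketch only: the real puppet-generated file may use a
    completely different structure and field names.
    """
    prefixes = []
    for cidr in ranges:
        net = ipaddress.ip_network(cidr)  # validates the CIDR string
        prefixes.append({
            "prefix": str(net),
            "version": net.version,
            "num_addresses": net.num_addresses,
        })
    return json.dumps(
        {"description": "Cloud VPS IP space (example)", "prefixes": prefixes},
        indent=2,
        sort_keys=True,
    )


# Documentation ranges only, not the real address space:
print(render_ip_space(["192.0.2.0/24", "2001:db8::/32"]))
```

A consumer (say, a tofu manifest or a monitoring job) could then fetch and parse this file instead of hard-coding address ranges, which is the appeal of publishing it from a single puppetized host.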
[10:59:49] (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214474, that is)
[11:05:30] looking; my bad, I missed the parent
[11:06:44] yes, all good
[11:06:45] thx
[11:19:31] it works: https://meta.wmcloud.org/
[11:19:54] nicely done \o/
[12:13:56] similar thing but for toolforge: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214493 . pcc shows an empty array, but I suspect that is an artifact of how PCC and PuppetDB work
[12:23:58] yep, tested on toolsbeta and it pulls the puppetdb data fine
[12:26:08] taavi: lgtm, +1d
[12:26:13] ty
[12:27:35] hmm, I just realized we could use the same mechanism to publish lists of, say, the cloud vps proxies or the metricsinfra prometheus hosts, and then pull that data into tofu manifests to adjust specific security group rules
[14:43:23] I'm going to need to delete/recreate the toolforge etcd nodes; has anyone built one recently? It looks like the procedure is 1) make the node by hand 2) add it to the cluster with wmcs.toolforge.k8s.etcd.add_node_to_cluster, but I wonder if I'm missing some automation for step 1.
[14:44:37] andrewbogott: cookbook wmcs.toolforge.add_k8s_etcd_node
[14:45:50] oh, I see, it's in a different place in the cookbook tree
[14:45:51] thx
[14:46:48] does that also handle clustering, or do I need to run the cluster command afterwards?
[14:47:15] try it and see?
[14:47:23] but probably not
[14:48:17] also please update https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Upgrade_etcd once you've done that
[14:48:54] ok! Dropping the clusters down to two nodes for now, may send some warnings
[14:51:15] andrewbogott: did you intend to run only the cookbook that removes the node from some (but not all) settings, without removing the node itself?
[14:53:07] also also, T375217 seems like a wildly incorrect task for that
[14:53:08] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[14:53:43] Well... I can create some short-lived subtasks if you want :)
[14:53:52] I'm deleting the node so I can reimage the cloudvirt it lives on
[14:55:42] T375217 is tagged cloud-vps, and the task title and description list hardware, not VMs in some tenant project
[14:55:47] plus the docs I just linked to link to T361237
[14:55:47] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[14:56:25] if you want to delete a node I would recommend running the delete cookbook, not the one you just used, which does only a tiny part of that
[14:57:49] ...I can't follow what you're saying about the linked task.
[14:57:58] What I'm doing today is: upgrading cloudvirtlocal hosts to Trixie.
[14:58:02] To do that I need to drain them
[14:58:05] which is what I'm doing
[14:59:26] (or, at least, I read 'wmcs.toolforge.k8s.etcd.depool_and_remove_node' as doing that, is that wrong?)
[14:59:54] wait, what, is that a different thing than wmcs.toolforge.remove_k8s_etcd_node?
[15:00:34] also, ignore what I said about the task, I somehow got the impression you were upgrading the etcd nodes themselves
[15:00:40] ok, I figured :)
[15:00:55] I'm not sure about the competing cookbooks, I assumed that since one explicitly mentioned depooling, that was the one I wanted
[15:01:07] but also it failed, so... now I need to understand that
[15:01:29] aha, those cookbooks are the exact same thing but with slightly different names
[15:02:15] if you're sure they're the same, let's get rid of one of them!
[15:03:17] by "the same" I mean remove_k8s_etcd_node is literally just importing k8s.etcd.depool_and_remove_node and exporting it with a different name
[15:03:45] weird
[15:36:32] godog: hey, I am working on this one: https://phabricator.wikimedia.org/T410989
[15:36:58] just in the process of disabling the switch interfaces in Netbox now, after which I will run homer to disable the ports on the switches
[15:37:03] am I ok to proceed?
[15:37:24] topranks: sweet! yes, I'm fairly sure we're all set, you shouldn't see link on those interfaces
[15:37:42] ok cool, I wasn't sure of the status host-side, I'll double check that first then
[15:38:00] thank you, yes, host-side the interface is meant to be down
[15:44:33] godog: hmm, quite a few of them seem to be up
[15:44:35] https://phabricator.wikimedia.org/P86381
[15:45:48] I think this may be a quirk of the DAC cable or NIC
[15:46:06] looking at, for instance, the first one for cloudcephosd1035: on the host it shows hard DOWN, but the switch sees a link
[15:48:08] even stranger, the switch knows a MAC on the port
[15:48:17] I think that's due to the NIC trying to do LLDP or something
[16:01:17] godog: I am going to ignore the switch side and focus on the hosts
[16:01:29] from that point of view we are good on all of them apart from cloudcephosd1052
[16:01:30] https://phabricator.wikimedia.org/P86382
[16:01:39] I'll continue and shut the links for all but that one
[16:15:24] FYI folks, pushing the config to the switches now
[18:56:45] * dhinus off
[19:01:00] andrewbogott: you may get some diffs in puppet-merge from me with vlan ids
[19:01:05] you can go ahead with those
[19:01:23] I think I got in before that
[19:01:26] want me to merge again?
[19:02:10] woo, quoting bugs! merged
[19:21:40] Regarding T361237, is there any reason not to skip ahead to Trixie? I've created one Trixie node for toolsbeta and it seems to be clustering fine (toolsbeta-test-k8s-etcd-27.toolsbeta.eqiad1.wikimedia.cloud)
[19:21:41] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[21:38:46] andrewbogott: have you done something recently that might have caused the tools.db.svc.eqiad.wmflabs DNS record to just disappear?
[21:40:35] re trixie: does etcd support skipping from 3.3 (bullseye) to 3.5 (trixie)? IIRC that blocked skipping the last time
[21:41:49] I'm rebuilding the dns servers. If for some reason a record was present in pdns but not present in designate... Otherwise no.
[21:41:59] I'm afk but will be back in 10
[21:44:03] https://phabricator.wikimedia.org/P86393 recursor issue, it seems?
[21:45:19] .wmflabs is straight up just missing from `forward_zones`??
[21:58:28] I tried to fix that in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214656 but can't figure out how to get the ruby template to work correctly, so for now I'm instead reverting: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214657
[22:02:42] no no, don't revert that, that will break everything immediately
[22:02:46] that's required on trixie
[22:02:54] sorry, just got back
[22:04:27] sigh, reverting the revert then
[22:04:46] thanks
[22:05:19] do you have a paste of what that template rendered to, in yaml?
[22:05:25] otherwise I can wait and see
[22:05:59] which template?
[22:06:49] I mean, what did https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214656 produce?
[22:07:01] the PCC fails to compile
[22:07:05] oh, ok
[22:13:29] I'm working on a patch, will be a few
[22:13:57] andrewbogott: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214656 PS5 seems to work fine
[22:16:57] commented, I think that concatenation needs to happen conditionally
[22:17:05] although I'm relatively sure that the edge case I'm thinking of doesn't exist
[22:17:42] good catch, updated and running PCC again
[22:19:09] I'm happy if pcc is happy
[22:19:47] https://puppet-compiler.wmflabs.org/output/1214656/7788/
[22:20:34] seems ok to me
[22:20:39] looks right, although I wouldn't mind if it gave us a little more context :/
[22:21:29] go ahead and merge, I'll apply on the cloudservice node
[22:22:05] already running
[22:24:25] I can resolve .wmflabs names again
[22:24:28] thanks!
[22:24:44] I'm looking at the eternal 'Info: Loading facts' and wondering why no one else voted for the 'puppet is now super slow' papercut
[22:24:53] Oh great! Thanks for noticing + the patch
[22:25:04] do you have a minute to talk about etcd, or would you like to go?
[22:25:30] a minute, sure, but not much more :-P
[22:26:10] so here is me on a trixie host checking to see if the pool is working...
[22:26:27] https://www.irccloud.com/pastebin/hLoEcI2R/
[22:26:41] but does that actually show that the /cluster/ is working, or just that each individual node can take a write/read?
[22:28:21] that's 2x nodes running 3.3.25 and one running 3.5.16
[22:29:21] ok, anyway, you should go, I don't have a burning need to upgrade etcd before tomorrow :)
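On the closing question above — whether per-node writes/reads demonstrate that the *cluster* is healthy, or only that each node accepts requests — one way to make the distinction concrete is to write a test key through a single member and then read it back from each of the *other* members: only reads served by members you never wrote to demonstrate replication. The per-member reads could be gathered with something like `etcdctl get` pointed at each member's endpoint in turn (exact flags vary between etcd versions, so check your `etcdctl` first). A minimal sketch of the decision logic, with made-up endpoint names:

```python
def replication_demonstrated(written_via, reads, expected_value):
    """Decide whether a set of per-member reads demonstrates replication.

    written_via:    the single endpoint the test key was written through
    reads:          mapping of endpoint -> value observed for the test key
                    (None if the key was missing on that member)
    expected_value: the value that was written

    Returns True only if every *other* member returned the written value.
    A loop that writes AND reads on each node separately never exercises
    this path, which is the gap taavi is pointing at.
    """
    others = {ep: v for ep, v in reads.items() if ep != written_via}
    return bool(others) and all(v == expected_value for v in others.values())


# Hypothetical three-member cluster: the key was written via etcd-1 only.
reads = {
    "etcd-1": "canary",
    "etcd-2": "canary",
    "etcd-3": "canary",
}
print(replication_demonstrated("etcd-1", reads, "canary"))  # True: replicated
```

For the cluster-wide view, etcd v3's own tooling (`etcdctl member list`, `etcdctl endpoint health`) is the more standard check, and would also surface the mixed 3.3.25/3.5.16 membership mentioned above.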