[06:36:39] greetings
[08:07:01] hello!
[13:57:37] did someone work on/reimage cloudcephosd1052 or shall I do that now?
[13:58:16] I didn't do it and I don't think anyone else did
[13:59:04] I will then :)
[13:59:13] thanks!
[14:00:23] unrelated: is anyone but me maintaining a git repo that gets periodically rebased against a different upstream repo? I'm wondering if there's a good way to do that without making gerrit angry
[14:01:06] Obviously the one-patch-at-a-time model doesn't work with that, and gerrit doesn't like me pushing something that contains differently-hashed versions of patches it already knows about. Maybe --force is the only option
[14:32:16] andrewbogott: we do something vaguely similar for pywikibot, but I'm not sure if it's a good workflow https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Pywikibot_image
[14:33:00] so if you're doing a merge, that means that all of the wmcs-specific patches are buried deep in the git history right?
[14:33:39] I'm not even sure, but I have to do one this week as part of clinic duty :D
[14:33:42] that would make gerrit happy I guess. I like being able to see our hacks as patches that are applied on top of the upstream though...
[14:39:07] the pywikibot repo has a separate branch for toolforge hacks, so you can do something like `git log stable..toolforge --no-merges` to see the local commits
[14:39:56] huh, I didn't know about --no-merges
[14:40:46] but yeah, if it's a long-lived branch IMHO it should definitely be getting merge commits, not rebases + force pushes
[14:40:55] ok
[14:41:26] I think my preference would be to just make a fresh, rebased branch every time I do this. But that leaves it unclear what the latest branch is for any devs who come after me
[14:42:02] I have lost a non-trivial amount of time trying to figure out which branch I should be modifying in those horizon repos
[14:43:11] yeah
[14:43:37] ok, so you think: do everything on main, just merge upstream onto main and don't worry about cluttered history
[14:45:41] ?
[14:46:27] my preference is that the local repo has two branches: one as a 'clean' upstream branch, which can (depending on the upstream branching model) be either fast-forwarded or merged from upstream, and a local branch with all of our hacks into which the upstream branch is merged when updating
[14:46:37] so similar to what I have on the toolforge pywikibot repo :-)
[14:47:01] what's the advantage of having an upstream branch rather than just adding upstream as an origin?
[14:48:51] instantly seeing which upstream version we're running without having to set it as an origin, knowing which upstream branch we're running, not having to dig for when the last merge was, etc
[14:49:39] * andrewbogott wonders if he can get there without starting over
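(A rough sketch of the two-branch workflow described above, using the stable/toolforge branch names mentioned for the pywikibot repo; the upstream remote name, URL, and upstream branch below are illustrative placeholders, not taken from any actual WMCS repo:)

    # add the real upstream project as a second remote (name and URL are placeholders)
    git remote add upstream https://example.org/upstream/project.git
    git fetch upstream

    # keep a 'clean' local branch that only tracks upstream,
    # fast-forwarded (or merged) whenever updating
    git checkout stable
    git merge --ff-only upstream/main

    # merge the refreshed upstream branch into the branch carrying the local hacks
    git checkout toolforge
    git merge stable

    # show only the local commits, hiding the merge traffic from upstream
    git log stable..toolforge --no-merges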
[16:26:35] andrewbogott: tried provisioning the cluster again this morning. same problems. :( it looks like kubelet keeps crashing
[16:26:51] weird
[16:26:55] one very odd thing i noticed is that it's trying to use docker as the container runtime
[16:26:59] did you start with a template copied from a working deployment?
[16:27:04] and failing to connect because docker isn't running
[16:27:20] i'm specified containerd as the runtime and verified that the heat script has that set correctly
[16:27:31] *i've* specified
[16:27:49] andrewbogott: no, the template is my creation as well
[16:28:00] via tofu
[16:28:00] ok
[16:28:24] * andrewbogott looks for a working template
[16:29:47] looks like our templates also specify containerd. Of course I'm not 100% sure that they actually /use/ containerd...
[16:30:22] here's a sample:
[16:30:26] https://www.irccloud.com/pastebin/H6jmryX8/
[16:30:49] welllll here's the whole thing
[16:30:53] https://www.irccloud.com/pastebin/YQ7jMIkD/
[16:31:09] nice
[16:31:19] i'll mine against that and look for clues
[16:31:19] does anything jump out there as different?
[16:31:23] *diff* mine
[16:31:29] I'm seeing that with 'openstack coe cluster template show ab05a0c0-e6f8-4162-b56f-9b2d00e6ee7f'
[16:32:06] One other thing is that a while ago we were seeing some kind of leak where heat would misbehave after running for a long while. So I'm going to restart all the services now just in case (although I don't think it was the same failure as what you're seeing)
[16:33:04] done
[16:45:03] i don't see any glaring differences between the two, some minor version differences and the cinder volume driver in my template
[16:45:13] https://www.irccloud.com/pastebin/L7P5wdEM/
[16:45:53] * dhinus logging off, enjoy your weekends
[16:48:17] oh wait, mine is missing the kube_tag label
[16:48:24] that seems... important
[16:48:41] dduvall: it ought to be able to talk to cinder but can you try removing that bit for science?
[16:49:01] oh yeah, I was looking at that too. It should use a default, but... life is uncertain
[16:58:09] i think that might have been it! the `kube_tag` label
[16:58:32] kubelet has at least started doing things
[16:59:17] https://www.irccloud.com/pastebin/l8YvMgk0/
[17:00:38] thanks for the help, andrewbogott
[17:00:45] 'help'
[17:01:08] I'm kind of surprised that magnum lets you leave out a tag that's required, but... it's pretty hands-off as you've noticed.
[17:01:16] lmk if the cluster comes up all the way!
[17:01:59] :D yeah it definitely feels a bit more like an incantation than a well-defined interface, but it seems to be rolling now
[17:03:43] it appears to have come up. i have new tofu errors to address but that was expected
[17:13:19] nice
[17:33:19] Magnum can in theory support multiple cluster types. Most of the interesting config ends up in labels as a result, as this is the generic key:value store passed to the backend implementations. https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/blob/main/main.tf?ref_type=heads#L62-82 is a known working config and also has some comments showing where I found the settings I'm using.
[17:52:54] bd808: thanks! i did base my config all of what you have there, but somehow i missed the `kube_tag` label
[17:54:36] config *on* what you have. typing is hard today
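(For reference, the settings discussed above are all Magnum cluster-template labels; a minimal sketch of comparing against a working template and creating one with the kube_tag and containerd labels via the CLI might look like the following. Every name and value here is a placeholder; the known-good settings are in the tofu config linked at 17:33:19:)

    # inspect an existing, working template to compare labels against
    openstack coe cluster template show ab05a0c0-e6f8-4162-b56f-9b2d00e6ee7f

    # create a new template; kube_tag pins the Kubernetes version and
    # container_runtime selects containerd instead of docker
    openstack coe cluster template create example-k8s-template \
        --coe kubernetes \
        --image example-fedora-coreos-image \
        --external-network example-external-net \
        --flavor example.flavor \
        --master-flavor example.flavor \
        --network-driver calico \
        --volume-driver cinder \
        --labels kube_tag=v1.23.3,container_runtime=containerd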