[07:59:05] 🎉
[07:59:24] and my guess is the log-inventory being set on `all` hosts was the reason `Gathering facts` was dying out
[09:55:54] I wrote a summary of the experiment I have done with Zuul+K8S over the last two days
[09:56:07] thanks to corvus' assistance I got something ready last night. I wrote a summary at https://phabricator.wikimedia.org/T395826#10883037
[09:57:20] I originally thought the jobs could mention the image they wanna run, which could have been used to then `docker run` that image in a VM having Docker
[09:57:35] but with Kubernetes there is no such thing (short of doing docker in docker maybe?)
[09:58:00] so the legacy images would have to be filed as labels in the nodepool config
[10:10:06] ----
[10:36:42] corvus: about Ansible attempting to create /root/.ansible/tmp . I think that is because the Kubernetes driver does not provide a `username`, and thus `ansible_user` is not set and probably defaults to `root`.
[10:36:58] it turns out the zuul.conf has `executor.default_username = root`
[10:37:05] I have changed it to nobody and I have removed the rootdir mounts :)
[10:37:22] (summarized as a next comment on the same task)
[10:37:32] the Kubernetes driver does not let one set a username
[16:03:07] i think we can probably add specifying the username via the k8s nodepool driver. i think that makes sense to bring it to parity with the other drivers.
[16:04:32] hashar: if it were important to specify the image name in a job variable, that could be accomplished using the nodepool namespace feature. but as we've seen, there's a lot of extra work needed for that, and we lose streaming logs. so as long as the images aren't changing *very* often, it's probably manageable to add them as labels in nodepool.
[16:04:55] note with nodepool-in-zuul, the labels will be defined in zuul, so that process will get a little bit easier (but will stay substantially the same).
[16:17:01] for the user name, I guess I could have solved it by setting `ansible_user_name: nobody` in the job
[16:18:29] i think it's just `ansible_user` and zuul will override that, so we should fix the k8s driver
[16:21:22] +1 :)
[16:21:47] I am happy to have found that one since it was really bugging me last night
[16:22:41] I am sure a lot of our images can be converted to ansible playbooks running in a more or less generic worker image
[16:23:06] yeah, that's a more typical usage
[16:25:20] I explored a bit using Magnum to spin up a k8s cluster, but there are too many unknowns
[16:25:36] how so?
[16:26:18] lots of clicks in Horizon; I got something started but I don't know how to access it :)
[16:26:27] so I can't really configure nodepool to point to it
[16:26:37] I talked about it with bd808 (we are in a team meeting)
[16:27:27] I guess I will polish up the design doc with the findings from the last couple of days
[16:27:29] ( https://docs.google.com/document/d/1WWdc137zcKyTECEO6wktOmS2cuKK6iLhvonviuEz1tY/edit?tab=t.0 )
[16:29:04] adding more abstraction to the abstraction: I think we would want to build something like https://gitlab.wikimedia.org/cloudvps-repos/deployment-prep/tofu-provisioning to make managing Kubernetes with Magnum a gitops operation orchestrated by OpenTofu. #devops #abstraction #confusing
[16:30:05] I did create one at https://horizon.wikimedia.org/ngdetails/OS::Magnum::Cluster/3281dc1f-d4f9-4a81-b403-cc7e6977545a
[16:30:18] it is in "CREATE_IN_PROGRESS"
[16:41:41] I pasted the wrong docs link. The design doc is https://docs.google.com/document/d/1Y-0nX_0n0AymZ3N-KDRIwuy6V-WXb_EhEP11O0vvAek/edit
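
As a concrete illustration of filing the legacy images as labels (the 09:58 and 16:04 messages above), here is a minimal nodepool.yaml sketch for the Kubernetes driver; the provider name, context, label name, and image path are hypothetical placeholders, not the actual config:

```yaml
# Minimal sketch of a Nodepool Kubernetes provider, assuming one label
# per legacy CI image. All names below are made up for illustration.
labels:
  - name: ci-bullseye-php74

providers:
  - name: wmcs-k8s
    driver: kubernetes
    context: wmcs-k8s-context        # kubeconfig context to target
    pools:
      - name: main
        labels:
          - name: ci-bullseye-php74
            type: pod                # one pod per node request
            image: docker-registry.wikimedia.org/releng/ci-bullseye-php74:latest
```

The `type: namespace` variant of a label is the "namespace feature" corvus mentions: the job receives a whole namespace and can spawn its own pods (and so pick its own image), at the cost of the extra plumbing and the loss of streaming logs noted above.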
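The executor-side fix from the 10:36-10:37 messages, as a zuul.conf excerpt; since the Kubernetes driver supplies no username, this executor-wide default is what ends up as `ansible_user` for the pods:

```ini
# zuul.conf excerpt (sketch): with no username coming from the k8s
# driver, this default becomes ansible_user; it was root, now nobody.
[executor]
default_username = nobody
```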
[17:01:59] I am attempting a diagram at https://docs.google.com/drawings/d/19N6a4sP85TNmk84-bnFt57GXne0-e6g_CvIEKq7oXHw/edit :)
[17:09:44] dduvall: if we only have a few projects left on Gerrit + PipelineLib, would it make sense to move them to GitLab + Kokkuri?
[17:10:01] I am assuming the migration is straightforward
[17:10:31] that would be one path, sure
[17:10:41] my guess is some projects did not migrate because they'd rather stick to Gerrit
[17:10:58] right
[17:11:09] and it would be nice to serve those folks with a pipelinelib replacement there
[17:22:21] ----
[17:22:42] on another topic, there is mention in the doc of a "trusted worker node"
[17:22:48] and I don't know what it means :)
[17:24:13] I have 3 projects using pipelinelib still, and all 3 because I'd rather not move to gitlab :)
[17:27:38] sounds like something we could dig into on an upcoming call
[18:39:35] I have added an architecture diagram to the design doc
[19:27:08] TIL `sleep inf`
[19:27:17] special-cased to do an infinite loop that produces numbers
[19:28:42] and `sleep infinity`, which ends up the same for some reason I can't determine
[19:39:54] strtold(3): An infinity is either "INF" or "INFINITY", disregarding case.
[20:39:40] ----
[23:37:06] looking at scrollback re: pipelinelib and some digging: there are 55 pipelinelib repos still, including all the machine learning stuff that produces a bunch of images from a single repo. Grepping around git shows none are using the parallel execution things, so nothing *too* fancy: producing an image, passing in some envvars, publishing and tagging. The only fancy stuff is letting users
[23:37:08] define what images to produce and what to name them... and parsing the config file.
[23:38:28] kokkuri does a lot of this stuff too, modulo parsing a config file, since we have a .gitlab-ci.yml (which lets people define stages).
[23:38:51] I wonder if there's some opportunity for unification of our two systems here.
[23:39:27] dduvall: ^ pipeline repo stats and hopefully semi-coherent rambling :)
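
For comparison with the pipelinelib feature list above, a rough sketch of how a kokkuri-based .gitlab-ci.yml covers the same build/publish/tag ground; the include path, template name, and variable names are from memory of kokkuri's conventions and should be treated as illustrative rather than exact:

```yaml
# Rough sketch of a kokkuri-based .gitlab-ci.yml; template and variable
# names are quoted from memory and may not match kokkuri exactly.
include:
  - project: 'repos/releng/kokkuri'
    file: 'includes/images.yaml'

build-and-publish:
  stage: publish
  extends: .kokkuri:build-and-publish-image
  variables:
    BUILD_VARIANT: production            # which blubber variant to build
    PUBLISH_IMAGE_TAG: '${CI_COMMIT_TAG}'
  rules:
    - if: $CI_COMMIT_TAG
```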