[06:26:15] We have a pending project creation request: T425909
[06:26:16] T425909: Request creation of wikitts VPS project - https://phabricator.wikimedia.org/T425909
[07:39:16] komla: I think that should be pending a clarification after https://phabricator.wikimedia.org/T425804#11907981
[07:40:11] also with the new context from the cloudvps request I'm wondering what prevents them to make the already existing setup in ml-lab to be exposed
[07:40:20] without having to recreate envs
[08:06:45] volans: I was wondering the same thing
[08:07:40] tho from their "urgency" of having the cloud vps project created this week, I doubt they are willing to wait for the support to be implemented in toolforge
[08:17:16] morning
[08:38:26] volans: the existing ml-lab machine is a bare metal host, is it possible/acceptable to expose services running there?
[08:39:09] to run a 12G container in toolforge, I'm not sure what we would need, but I think it might be tricky
[08:39:39] so +1 from me to use cloud vps
[08:41:39] I see a.ndrew mentioned that this was discussed yesterday in the infra meeting, what was the outcome?
[08:50:45] dhinus: I don't know what's the specific setup of ml-lab, but if they use it as a dev/lab test host they should also have a way to expose services there I guess :)
[08:54:25] yeah that's true :) but they wrote other teams currently do not have access, so maybe they use SSH tunnels or something similar?
[08:54:58] I fear that, but that's easily fixable with existing puppet code :D
[08:56:07] right now I'm not fully remembering all the details but I think we said that if they can split into a bunch of small instances on toolforge that would be ok?
There was a discussion about k8s scheduler to understand how that might affect available slots in the fleet
[08:56:24] fear confirmed: https://wikitech.wikimedia.org/wiki/Machine_Learning/ML-Lab#SSH_tunnel
[08:56:27] :D
[08:57:04] :P
[09:04:26] I'm asking also because AFAIK ml-lab has GPUs
[09:06:16] (yep, dual gpu)
[09:07:28] it might not be used for this project but in general things that run there might not be easily replicable elsewhere
[09:27:23] here we go again: https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem
[10:16:34] the list of compromised packages seems limited to mostly niche ones, I don't think we're likely to be affected
[12:29:31] can I get a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286337?
[13:48:31] My response https://phabricator.wikimedia.org/T425804#11907981 was written while also talking about a different topic in a meeting so may need additional clarification. I was going to add a sentence like "have you tried using more, smaller workers?" but it looks like their whole point is that their performance improves when they consolidate onto fewer workers.
[14:50:24] andrewbogott: do you want to try asking that question in the task? we could also try to suggest the route volans was describing above (exposing the ml-lab instance)
[14:51:42] sure unless you want to :)
[14:52:38] I'll happily leave that to you :)
[14:53:30] (personally I would probably just approve the cloud-vps request, but I agree there might be better solutions)
[15:19:44] dhinus: if they want to scale horizontally (more small pods) can I volunteer you to help them with their config? I would have to learn how to do that before I could show them how :)
[15:20:21] `toolforge jobs run --replicas N`
[15:21:00] andrewbogott: sure you can tag me, but also what taavi said :)
[15:21:53] heh, I know nothing of their design but maybe that'll do it!
[15:23:25] ok, done
[15:23:25] https://phabricator.wikimedia.org/T425804#11913283
[16:55:42] andrewbogott:
[16:55:42] > Successfully built magnum_cluster_api-0.36.6-cp38-abi3-linux_x86_64.whl
[16:56:14] had an another look of the packaging from a few days ago and finally found why it wasn't able to build even when pulling rust deps from cargo directly
[16:56:38] yeah, I can build the wheel, it's the debian packaging that I'm fighting with... making some progress there though
[16:56:44] what were you running into with the wheel build?
[16:57:04] no, that's /inside/ the debian build that I got it to produce a wheel
[16:57:09] oh, cool!
[16:58:21] but the thing i was running into was that dh-python was also doing its own thing for blocking network access which i didn't know about
[16:58:41] so i just tried to run sbuild with its network blocking functionality disabled and was very confused when the connection error didn't change
[16:58:53] oh, it was dh-python! I knew /something/ was blocking the network
[16:59:06] sounds like you have gone down the same path as me, so far :)
[16:59:24] eventually I found https://manpages.debian.org/trixie/dh-python/pybuild.1.en.html#ENVIRONMENT
[16:59:46] now this is failing on dh-python's attempt to run tests, give me a second and I'll have a deb :P
[17:00:23] nice!
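Editor's sketch of why the connection error at 16:58:41 did not change when sbuild's own network blocking was turned off. It assumes, based on a reading of the pybuild(1) ENVIRONMENT section linked at 16:59:24, that the block is applied by pointing `http_proxy` and `https_proxy` at an address where nothing listens. The proxy address `127.0.0.1:9`, the helper name `fetch_is_blocked`, and the example URL are illustrative assumptions, not taken from the log or from the package.

```python
import os
import urllib.request

# Assumption: an unreachable local proxy (port 9 is the "discard" port,
# where normally nothing is listening).
BLOCKING_PROXY = "http://127.0.0.1:9/"
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy")


def fetch_is_blocked(url: str, timeout: float = 5.0) -> bool:
    """Return True if fetching url fails while the blocking proxy is set."""
    saved = {name: os.environ.get(name) for name in PROXY_VARS}
    os.environ["http_proxy"] = BLOCKING_PROXY
    os.environ["https_proxy"] = BLOCKING_PROXY
    os.environ.pop("no_proxy", None)  # make sure nothing bypasses the proxy
    try:
        # ProxyHandler() without arguments reads the proxy settings from
        # the environment, as most download tools do.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler())
        opener.open(url, timeout=timeout)
        return False
    except OSError:
        # URLError, ConnectionRefusedError and timeouts are all OSError.
        return True
    finally:
        # Restore the caller's environment.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


if __name__ == "__main__":
    print(fetch_is_blocked("http://pypi.org/simple/"))
```

Because the failure comes from an environment variable rather than from the build chroot's network configuration, toggling the sandbox's network access alone would not be expected to change the error, which matches what was reported above.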
[17:00:27] the patch I'm going down is:
[17:00:47] override_dh_clean:
[17:00:47] dh_clean
[17:00:47] mkdir -p .cargo
[17:00:47] cargo vendor > .cargo/config.toml
[17:01:15] which avoids having rust download things at build time but I don't know that it's necessarily better that way
[17:01:27] and I'm not to the finishline anyway
[17:02:02] s/patch/path/
[17:04:05] doing that in dh_clean feels not-very-nice
[17:04:22] could do a repack if you wanted to avoid loading that at build time, but also feels pretty meh
[17:04:52] yeah, I don't think it's obviously better/different
[17:05:09] and /something/ is still cleaning the Cargo.toml.orig files even after dh_clean finishes
[17:05:49] hm, does repack happen before clean?
[17:06:01] I guess I could just remove clean entirely :/
[17:22:03] ok, now I'm caught up to where you are, failing on tests
[17:28:12] are you seeking to fix the tests, or skip them?
[17:36:13] skip them, wasn't as trivial of a fix as i hoped
[17:36:21] ok, https://gitlab.wikimedia.org/repos/cloud/deb/magnum-cluster-api builds on unstable now
[17:52:56] Thanks! Is that a fork of zigo's thing at https://salsa.debian.org/openstack-team/services/magnum-cluster-api.git or did you start the debian parts from scratch?
[17:53:32] from scratch
[17:55:05] ok
[18:11:05] taavi: would you expect it to build on Trixie as well? I'm getting the cryptic "dpkg-genchanges: warning: 'since' option specifies non-existing version"
[18:11:15] If not, will the package you built /install/ on Trixie?
[18:11:52] lemme see
[18:12:13] for the latter: no, it has native extensions built against glibc+etc. on unstable, those won't work on trixie
[18:17:52] oh, actually I guess on Trixie the .deb gets built, it's only the changes that fail. So I can probably load this into reprepro as it is
[18:24:49] seems like it needs cargo from backports at least
[18:25:07] and that repo needs some metadata fixes before uploading to apt.wm.o
[18:26:20] ok.
I'm trying a few tests with a by-hand install but won't put it on reprepro for now
[18:39:17] andrewbogott: pushed a debian/trixie-wikimedia branch that should be good enough to get you started
[18:39:33] thx, will try