[14:13:11] <volans>	 andrewbogott: I'm not sure I get your comment in https://phabricator.wikimedia.org/T422801#11981516, are those fedora hosts puppetized?
[14:13:33] <andrewbogott>	 no
[14:13:43] <volans>	 those are magnum?
[14:13:46] <andrewbogott>	 yeah
[14:13:49] <andrewbogott>	 but they're just examples.
[14:14:07] <andrewbogott>	 I think that cumin needs to know in advance what username to use when contacting VMs and I don't know how it will know.
[14:14:13] <volans>	 and they live in the same dns namespace as the other hosts and don't have a common hostname prefix/suffix?
[14:14:34] <andrewbogott>	 yes
[14:14:38] <volans>	 that's bad :/
[14:14:55] <andrewbogott>	 well... we might be able to predict the host names most of the time
[14:15:01] <andrewbogott>	 Assuming that magnum doesn't change them :)
[14:15:14] <andrewbogott>	 I guess having something that mostly works will still be better than what we have now
[14:15:16] <volans>	 can we force a prefix/suffix?
[14:15:27] <volans>	 via magnum config I mean
[14:15:34] <andrewbogott>	 good question, that would help
[14:16:02] <andrewbogott>	 the other thing is that sometimes the distro for a given workload changes (e.g. magnum VMs will soon switch from fc to ubuntu or maybe debian)
[14:16:19] <volans>	 same hostname?
[14:16:22] <andrewbogott>	 if we can coerce them to have predictable hostnames when that happens then we're good, but I'm not 100% sure
[14:16:45] <andrewbogott>	 I
[14:17:21] <volans>	 maybe it could be easier to instead set up the root user at provisioning time?
[14:17:28] <andrewbogott>	 I'll read some more config docs. I think I was hung up on wanting an all-purpose solution; if we just add a bunch of special-case lines to the ssh config (and don't mind doing more in the future) then this might be possible-ish.
[14:18:55] <andrewbogott>	 for ssh config, does it have to be prefix/suffix or can we match something in the /middle/ of the hostname?
[14:20:32] <volans>	 https://man7.org/linux/man-pages/man5/ssh_config.5.html#PATTERNS
[14:21:46] <volans>	 to be tested but *foo*.*.eqiad1.wikimedia.cloud should potentially work
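(Editor's note: a minimal sketch for trying out candidate patterns before touching a real ssh config. It assumes only the two wildcards described in the PATTERNS section linked above, `*` for zero or more characters and `?` for exactly one, and ignores negation and comma-separated pattern lists. The function name and the sample hostnames are hypothetical; only ssh itself can confirm how a given pattern behaves.)

```python
import re


def ssh_pattern_matches(pattern: str, hostname: str) -> bool:
    """Approximate ssh_config wildcard matching for a single pattern.

    '*' matches zero or more characters, '?' matches exactly one,
    and every other character is matched literally.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, hostname) is not None


if __name__ == "__main__":
    candidates = [
        # Hypothetical hostnames, for illustration only.
        "paws-foo-node-1.paws.eqiad1.wikimedia.cloud",
        "kube-9d4oa-5w4q6-qg7nv.paws.eqiad1.wikimedia.cloud",
    ]
    for host in candidates:
        hit = ssh_pattern_matches("*foo*.*.eqiad1.wikimedia.cloud", host)
        print(host, "matches" if hit else "does not match")
```

Because `*` also matches dots and hyphens, a generic pattern such as `*-node-*` would match any VM that happens to use that naming scheme, which is the collision discussed in the next few messages.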
[14:22:36] <andrewbogott>	 ok, so we could match on *-node-* although then we'll trip over users who use the same naming scheme
[14:22:49] <andrewbogott>	 I guess I need to decide if I'm trying to make this work for all magnum workers or just paws+magnum
[14:23:20] <volans>	 node is too generic, has to be something we inject
[14:23:32] <andrewbogott>	 yeah, would be a lot better
[14:25:52] <andrewbogott>	 hm, the new driver uses a different naming scheme, examples "kube-9d4oa-5w4q6-qg7nv" and "kube-9d4oa-default-worker-n6q7c-cqwtb-bg8pt"
[14:26:01] <andrewbogott>	 But again, don't think that kube-* is specific enough
[14:27:46] <volans>	 yep
[14:27:58] <volans>	 how does it provision stuff? does it run cloud-init?
[14:29:42] <volans>	 are those VMs tagged in any way in Nova?
[14:31:32] <andrewbogott>	 at the moment it's using ignition and not cloud-init
[14:31:44] <andrewbogott>	 good question about nova tags, let's see...
[14:34:51] <andrewbogott>	 no special tags. But of course we can identify them based on the image name they're based off of (which I have been naming "*for-magnum" lately)
[14:35:27] <andrewbogott>	 There's a real maze of tasks about arbitrary worker naming, I haven't gotten to the end yet to tell what was decided :)
[14:36:53] <volans>	 I don't see much in the example config besides specifying a specific name
[14:37:05] <volans>	 but I think it's per cluster, not a global prefix
[14:37:29] <andrewbogott>	 for hostnames, this would likely be a cluster-api setting and not a magnum setting.
[14:37:47] <andrewbogott>	 since the cluster-api driver creates different hostnames, I assume that magnum is not specifying them
[14:37:58] <andrewbogott>	 sorry, this is confusing because we stand athwart two different backends right now
[14:38:23] <andrewbogott>	 I need to run a test and make sure that the root user is actually disabled with the latest driver -- that will take a few minutes.
[14:39:27] <volans>	 ok, the opposite approach could be to have a script that queries openstack APIs and generates the ssh config bits for those hosts with explicit hostnames, it could run every hour or so
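(Editor's note: a rough sketch of the generator idea above, under stated assumptions. It does not call any OpenStack API; the server list is passed in as plain data, and the `Server` type, the `render_ssh_config` name, and the sample hostnames are all hypothetical. Selecting VMs by an image name ending in "for-magnum" follows the naming mentioned earlier in this log, and the `debian` login is used here only as an illustrative default.)

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Server:
    """Hypothetical record for one VM, as a periodic job might collect it."""
    fqdn: str
    image_name: str


def render_ssh_config(
    servers: Iterable[Server],
    image_suffix: str = "for-magnum",  # assumption: image naming convention
    user: str = "debian",              # assumption: login user on these VMs
) -> str:
    """Emit one explicit 'Host' stanza per VM built from a matching image."""
    stanzas = []
    for server in sorted(servers, key=lambda s: s.fqdn):
        if server.image_name.endswith(image_suffix):
            stanzas.append(f"Host {server.fqdn}\n    User {user}\n")
    return "\n".join(stanzas)


if __name__ == "__main__":
    inventory = [
        Server("kube-example-worker.paws.eqiad1.wikimedia.cloud", "debian-for-magnum"),
        Server("ordinary-vm.paws.eqiad1.wikimedia.cloud", "debian-12"),
    ]
    print(render_ssh_config(inventory))
```

Listing explicit hostnames avoids relying on name patterns at all, at the cost of the config lagging behind newly created VMs until the next run.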
[14:43:18] <andrewbogott>	 ok yeah, these worker nodes use 'debian@'
[14:44:20] <andrewbogott>	 a simpler path forward is to just fix this for /our/ magnum clusters: paws and quarry. Those will have predictable hostnames, and we can just treat third-party clusters as cuminless black boxes.
[14:44:31] <andrewbogott>	 Not ideal but still a big improvement on what we have today.
[14:50:40] <dhinus>	 can I get a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1297114?
[14:51:16] <dhinus>	 I'm still trying to lower the binlog write volume (see task), but we need to recoup some space and this seems an easy way
[14:52:02] <andrewbogott>	 done
[14:52:07] <dhinus>	 thanks!
[19:20:28] <andrewbogott>	 Amir1: should we make you a vps project to run your tiff compressor?  If this is going to be a forever thing then it should probably live in prod k8s someplace but if you're also blocking the upload of uncompressed tiffs then we can run it in cloud-vps for as long as it takes and then shut down.
[19:20:56] <andrewbogott>	 It's also possible to grant extra-big ram quotas to toolforge tools, within reason.
[19:42:40] <Amir1>	 andrewbogott: I was actually about to create a ticket for requesting a cloud vps project for it but *gestures at the state of the world and wikimedia*
[19:43:00] <Amir1>	 one of my top priority things to do today
[19:43:28] <andrewbogott>	 there is a lot happening :)  If you open a ticket and then bug me again in the morning we should get you set up (might not have a second +1 until then)
[19:44:03] <Amir1>	 Sure. Thanks
[21:31:39] <andrewbogott>	 taavi, a question for you to answer whenever: I need to add a security group rule to every Trove VM for cumin access. I can either do that by making a little timer that just polls and adds them, /or/ I can add a second management network to the Trove project, attach that and a new IP to each trove VM and add the security group to that.  Trove does not implement automatic security groups for the primary interface, only for 
[21:31:40] <andrewbogott>	 management.
[21:32:00] <bliviero>	 andrewbogott: for magnum clusters, isn't it possible that releng CI might move back and use internal clusters for their needs?   and in that case, what would we do w/ cumin?
[21:32:05] <andrewbogott>	 I'm leaning poll-and-add because that's something I can whip up in no time and it applies retroactively. But here's your chance to say you hate that and will help with the latter instead :)
[21:32:48] <andrewbogott>	 bliviero: When you say 'internal clusters' you're talking about them moving /to/ cloud-vps, or away from cloud-vps?
[21:35:44] <andrewbogott>	 If you're asking about how my paws-specific cumin solution would help with CI clusters... the answer is that as long as we're talking about a handful of long-lived use cases in known projects, we can probably special-case each of them. It's inelegant, but I don't think there's any elegant solution for cumin to know what kind of VM a given VM is beforehand.
[21:36:02] <andrewbogott>	 But yeah, if we special-case known uses then that will always leave some random magnum workers unreachable.
[21:37:58] <bliviero>	 andrewbogott: moving /to/
[21:38:13] <andrewbogott>	 ok, then everything else I said applies :)
[21:38:46] <bliviero>	 andrewbogott: is there a way to interrogate some systems that would divulge the names of the active VMs in that "pool"?   is that technically what you are doing?
[21:39:55] <andrewbogott>	 kind of. Cumin already has an openstack backend that aggregates things based on openstack tenant. So that could be expanded to know other things about kinds of VMs.
[21:40:35] <andrewbogott>	 But historically Riccardo has wanted cumin to remain a universal tool without a lot of knowledge about our particular openstack implementation. I don't think we can really keep with that and also add a bunch of special querying.
[21:41:06] <andrewbogott>	 But we might be able to do something generic-ish using VM tags outside of the VM name.
[21:41:34] <andrewbogott>	 I'm trying to avoid that (and adding distinctive naming strings) because the engine that actually creates the VMs is a bit of a black box and I'm happy with it staying that way :)
[21:43:31] <andrewbogott>	 Basically: Everything is possible if you're willing to hook into third-party code but then we pay the maintenance price for that for ages. I'm still hoping for a solution that uses existing configs.
[22:25:15] <Amir1>	 andrewbogott: Sorry it took long but it's here T428102
[22:25:15] <stashbot>	 T428102: Request creation of tiff compression VPS project - https://phabricator.wikimedia.org/T428102