[08:43:43] morning. toolsdb replication is lagging, I'm on it
[08:50:59] T398170
[08:51:03] T398170: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170
[09:08:47] thanks!
[09:11:39] dhinus: iirc you were interested in reviewing patches to the cloudvps tofu provider?
[09:11:54] taavi: sure, send them my way :)
[09:29:20] This weekend tools-static got stuck again, and I left a silly script running to restart it whenever it found it in D state, and it restarted it ~12 times (it did not check if it had been in that state for long though)
[09:37:10] dhinus: got a warning that toolsdb replication is broken, are you looking into it?
[09:53:45] dcaro: looking
[09:54:07] I thought I fixed it and took a break, but apparently something is still wrong
[09:56:44] ack, let me know if I can help
[09:57:00] I'm tracking it here T398170
[09:57:01] T398170: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170
[09:57:47] hmm... wouldn't it be cool a tool that suggests what indices to add to toolsdb dbs? xd
[10:04:06] yep, though I'm sure people could find other ways to break things :D
[10:05:27] hahaha, yep
[10:07:51] ha I think I found what I did wrong, I'll document it on the task
[10:08:10] then I'll have to recreate the replica from scratch to make sure it's in sync
[10:08:28] 👍
[10:28:17] Hmm... should I add paws to the list of 'team=wmcs' flagged projects? Yes right?
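A minimal sketch of the check the 09:29:20 message says the restart script was missing: only act once a process has stayed in D state for a while. Everything here is an assumption for illustration (the `DStateWatcher` name, the threshold, and reading the state from a `/proc/<pid>/stat` line); it is not the script that was actually running.

```python
from dataclasses import dataclass, field
from typing import Dict


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


@dataclass
class DStateWatcher:
    """Hypothetical helper: flags a pid only after a sustained D state."""

    threshold_s: float
    first_seen: Dict[int, float] = field(default_factory=dict)

    def observe(self, pid: int, state: str, now: float) -> bool:
        """Record one sample; return True when a restart looks warranted."""
        if state != "D":
            # The process recovered, so forget any earlier D sighting.
            self.first_seen.pop(pid, None)
            return False
        started = self.first_seen.setdefault(pid, now)
        if now - started >= self.threshold_s:
            # Reset so the next poll does not trigger another restart.
            self.first_seen.pop(pid, None)
            return True
        return False
```

A polling loop would call `observe()` with the current time on each sample and run the restart only when it returns True; the restart action itself is left to the caller.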
[10:28:27] wait, it's there already xd
[10:36:30] * dcaro lunch
[11:09:28] review needed: https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/57
[11:09:49] * dhinus lunch
[11:16:28] *stamp*
[11:28:45] ohhh, this is cool
[11:28:48] https://usercontent.irccloud-cdn.com/file/LNwvMIFL/image.png
[11:29:00] gitlab shows uncovered chunks of code
[11:29:38] yes, I spent all my effort setting that up to track coverage and then never actually wrote proper tests
[11:44:19] toolsbeta probe alerts are me, I missed that it has hiera overriding the default nameservers
[11:45:28] ack
[12:58:52] fyi. for tools-static, nginx restart seems to be enough :)
[12:59:03] (when it gets stuck in D state)
[13:13:54] dhinus: you that have macos, the lima-kilo vm is running as arm64 arch? (or emulated amd64?)
[13:14:21] the OS is using arm64, but then inside it you can run both arm and amd binaries
[13:14:41] so the emulation is used only when running specific amd binaries (like the ones that we install)
[13:14:46] ahhh, there's an issue that the misctools package is amd64 arch only (the rest are any arch)
[13:15:09] so when we apt install, it fails to pull it
[13:15:26] right makes sense, if you find a binary instead of an apt package, it should work
[13:15:53] I thought debian's support for arm was quite thorough, I'm surprised they don't package that one. or is it a package we build in wmf?
[13:15:59] interesting, can you force to install even if the arch is not the same?
[13:16:05] it's our package xd
[13:16:16] https://gerrit.wikimedia.org/r/plugins/gitiles/labs/toollabs/+/refs/heads/master/misctools/
[13:16:19] then it makes sense :P yes I think you can try force-installing it
[13:16:40] the binary itself should work, it's just apt complaining (I think)
[13:28:12] hm. in theory that specific package could be cross compiled with relatively little effort, but that's opening a rather massive rabbit hole :P
[13:30:03] yep, we don't need arm64 packages in prod, probably just tweaking the lima-kilo for mac-os to install the amd leaves the 'special case' in the dev env
[13:31:17] we don't need those at the moment...
[13:31:52] are we still planning to migrate the toolforge cli to golang at some point? and have that installable by our users?
[13:38:08] I added a couple of technical questions, but does anyone have objections to this project request? https://phabricator.wikimedia.org/T397861 <- serving dumps via bittorrent
[13:41:06] also: shall I reboot those two stuck tools nfs workers?
[13:43:43] moving the misctools to gitlab with the rest
[13:43:44] https://gerrit.wikimedia.org/r/c/labs/toollabs/+/1165027
[13:47:00] dcaro: heh, I just created T398202
[13:47:05] T398202: Migrate misctools package to GitLab - https://phabricator.wikimedia.org/T398202
[13:47:15] please use the opportunity to rename the repo (and the source package) from `toollabs` to `misctools`
[13:47:41] I called it `toolforge-misctools` kind of like the rest of packages
[13:48:26] that works too
[13:48:40] are you renaming the binary package as well?
[13:58:01] not yet no, I'd start just moving the repo for now
[15:29:05] So if nobody has any more input, I'll send the beta email and add the link in the toolforge help page ... in 30min xd
[15:53:28] taavi: I think I need the keyholder password for enc-1.cloudinfra-codfw1dev, can you document its location on https://wikitech.wikimedia.org/wiki/Keyholder#WMCS_projects_passphrases? And I guess the equivalent for eqiad1.
[15:54:40] andrewbogott: the eqiad1 equivalent is listed there as wmcs_openstack_instance_puppet_user, I would guess the codfw1dev one is in a similar location
[15:55:15] ah, I see, was using the wrong search string. I'll document if the codfw1dev isn't obvious
[15:55:16] ty
[15:55:53] although it looks to have a pre-puppet7 path, so you want to adjust that
[15:57:48] /srv/git/labs/private/passwords.txt is the puppet7 path I think
[15:58:41] that sounds right
[16:09:08] Beta announced \o/
[16:12:13] \o/
[16:12:20] 🥳
[16:13:01] taavi: one hangup with my designate refactor is that the service user doesn't have rights to edit puppet config, it's rejected by the enc policy check.
[16:13:23] Maybe that's an argument for a new role, or for letting 'designateadmin' change puppet settings, or... to just not do this?
[16:13:44] I could also give my service role member rights in every project, which I guess would still be better than using novaadmin but only barely
[16:14:20] hmm. on its own, out of those I would prefer a new role
[16:14:31] but I guess designate also needs to speak to the proxy api?
[16:14:37] so that'd be one more role as well, if we go that road
[16:14:59] hm, yeah
[16:15:01] crap
[16:15:02] * dcaro back in a bit
[16:15:31] I can also just give the designate service user the same rights as novaadmin, and it's still a little better than using the shared account...
[16:15:56] I guess member-of-every-project would solve both the puppet and the proxy issue
[16:16:18] so... two new roles? Or inherited membership?
[16:19:42] hmmmm
[16:20:12] * andrewbogott trying inherited roles now just to unbreak things
[16:20:19] how does the account talk to other openstack services? or does it just not talk to anything else?
[16:21:22] It uses the 'service' role.
[16:21:45] Well, actually, the docs say to give it 'admin' so I did, but I need to double check that it can't do all those things with service instead.
[16:21:53] that is, service and admin on the service project
[16:22:09] what about adding the service role to proxy/enc policies?
[16:23:54] So, like, having service role on any project allows manipulation of puppet and proxies on any other project?
[16:24:15] I think in that case I'd rather it be admin, and keep 'service' for mostly just validating keys.
[16:24:38] wait are those roles in a different project than what the user is acting on?
[16:27:00] ...
[16:27:17] So maybe I'm lumping two things together at once.
[16:27:51] mostly I'm confused why this seemingly isn't a problem for any other openstack service->service interactions
[16:29:00] ok, so...
[16:29:40] For the most part, services have service users for limited purposes: checking the validity of a user token, and (sometimes) launching long-running actions that might take longer than the user token's lifespan.
[16:29:48] But most activities are actually done /as/ the user.
[16:30:08] Because designate-sink runs out of band, it's a different problem. It does things itself, rather than as the user (with current design)
[16:30:40] So in a perfect world, designate would have /two/ service accounts: one to do normal stuff (like check user tokens for synchronous operations)
[16:31:00] and one to do actual actions in a project, like, deleting a proxy for a deleted instance.
[16:31:10] So -- that's why it's different.
[16:31:32] As for the solution: we can imitate those two service accounts by using multiple roles, or we can actually have two accounts.
[16:32:14] (OR we could stop using designate-sink and refactor everything into synchronous calls but that will involve hacking designate code quite a bit and generally be complicated)
[16:33:16] ok, that clears up a lot of things
[16:34:33] I think out of the options, a separate account + new roles is my favourite solution
[16:35:08] ok. I think that's the most correct as well, although not the easiest.
[16:35:42] So after lunch I'll set up a 'designatesink' account and, I guess, proxyadmin and encadmin roles? <- I welcome naming suggestions
[16:35:52] We don't currently manage role or role assignment via tofu do we?
[16:37:08] no, but I also don't think it'd be too difficult to at least manage the roles there
[16:37:45] ok. Would be nice for documentation purposes
[21:25:11] Look what I've found, I'll check tomorrow https://registry.terraform.io/providers/grafana/grafana/latest/docs
[21:51:15] d.caro: neat! Hopefully you don't end up blocked on T377372
[21:51:15] T377372: Document how to authenticate a bot account through CAS-SSO - https://phabricator.wikimedia.org/T377372
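A rough sketch of the "separate account + new roles" option settled on around 16:34-16:35, modelled as a toy permission check. The account and role names ('designatesink', proxyadmin, encadmin) are the tentative ones floated in the chat; the 'designate' account name, the action names, the "*" wildcard standing in for inherited assignments, and the whole table layout are assumptions for illustration and do not reflect the real Keystone or policy file format.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical policy table: action -> roles allowed to perform it.
POLICY: Dict[str, FrozenSet[str]] = {
    "token:validate": frozenset({"service"}),
    "proxy:delete": frozenset({"proxyadmin"}),
    "enc:edit": frozenset({"encadmin"}),
}

# Hypothetical assignments: account -> project -> roles.
# "*" stands in for a role inherited by every project.
ASSIGNMENTS: Dict[str, Dict[str, Set[str]]] = {
    # Ordinary service account: only validates tokens, only in 'service'.
    "designate": {"service": {"service"}},
    # Out-of-band sink account: acts inside any project, but only on
    # proxies and puppet (enc) config.
    "designatesink": {"*": {"proxyadmin", "encadmin"}},
}


def roles_for(account: str, project: str) -> Set[str]:
    """Roles held on a project, including inherited ("*") ones."""
    per_project = ASSIGNMENTS.get(account, {})
    return per_project.get(project, set()) | per_project.get("*", set())


def is_allowed(account: str, project: str, action: str) -> bool:
    """True when the account holds at least one role the action accepts."""
    return bool(roles_for(account, project) & POLICY.get(action, frozenset()))
```

With this split, `is_allowed("designatesink", "someproject", "proxy:delete")` passes while the same call for the plain 'designate' account does not, which is the separation the two-account design is after.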