[08:30:33] does anyone know how to reload the SAL web service? https://sal.toolforge.org/ is erroring out
[08:30:39] I have restarted stashbot, but that is not related :D
[08:31:04] hashar: let me see
[08:31:29] the irc bot seems to work at least :)
[08:31:43] hashar: restarted it
[08:32:39] arturo: could you add the commands you ran to the SAL page at https://wikitech.wikimedia.org/wiki/Tool:SAL ? :)
[08:32:44] that might help the future me
[08:32:56] or paste them here and I will do it
[08:33:37] sure
[08:34:05] * arturo feels weird about editing wikitech from a new account, disconnected from the old one...
[08:40:08] hashar: https://wikitech.wikimedia.org/w/index.php?title=Tool%3ASAL&diff=2231898&oldid=1942572
[08:40:57] amazing thank you so much!
[08:44:57] np
[08:56:49] I merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/543, how do I deploy it?
[08:57:06] the docs are a bit incomplete https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Quota_management
[09:03:03] dhinus: you need to use the toolforge deploy cookbook for maintain-kubeusers
[09:03:49] we usually use the deploy cookbook referencing the MR branch, before merging to main
[09:04:06] but since the patch was already merged, you can run the cookbook for the main branch
[09:04:57] something like this
[09:05:04] aborrero@cloudcumin1001:~$ sudo cookbook wmcs.toolforge.component.deploy --cluster-name tools --component maintain-kubeusers --git-branch main
[09:05:05] the diagram at https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/blob/main/README.md?ref_type=heads seems to imply you should merge first
[09:05:57] in that diagram, we are in the bottom section, and the merge step is the last bit
[09:06:05] ah rigth!
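A minimal sketch of the invocation given at 09:05:04, written as a hypothetical Python helper (`build_deploy_cmd` is an illustrative name, not part of the cookbooks repo). It uses only the three flags that appear in the log; the cluster name `tools` and the component `maintain-kubeusers` come from the same message.

```python
import shlex


def build_deploy_cmd(component: str, branch: str, cluster: str = "tools") -> list[str]:
    """Hypothetical helper: compose the component.deploy cookbook command.

    Only the flags seen in the log (--cluster-name, --component, --git-branch)
    are used; nothing else about the cookbook's interface is assumed.
    """
    return [
        "sudo", "cookbook", "wmcs.toolforge.component.deploy",
        "--cluster-name", cluster,
        "--component", component,
        "--git-branch", branch,
    ]


if __name__ == "__main__":
    # After a merge that was not deployed from its MR branch, deploy from main.
    print(shlex.join(build_deploy_cmd("maintain-kubeusers", "main")))
```

In the usual flow described at 09:03:49 and 10:05:42, `branch` would be the MR's source branch, deployed before merging, so that `main` keeps matching what runs in the cluster.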
[09:06:08] *right
[09:06:27] sorry
[09:07:06] no big deal, the only thing is that now the deploy repo doesn't reflect what's in the cluster, that's why the deploy cookbook needs to be run for the main branch
[09:08:24] running the cookbook...
[09:11:43] "Exception: No merge requests found for branch main for project Toolforge deploy"
[09:11:55] I think it did the deploy though
[09:18:41] I opened T376254
[09:18:41] T376254: component.deploy cookbook fails for branch "main" - https://phabricator.wikimedia.org/T376254
[10:05:42] Yep, the idea is to deploy and make sure it works before getting the changes merged in main
[10:05:50] (so main always has whatever is deployed)
[10:06:35] the error is because it tries to report on the MR, and there isn't one for main; should be an easy fix 👍
[10:11:21] this should do it https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1077336
[10:11:23] * dcaro lunch
[10:18:46] arturo: I'm trying to apply the default SG, but something is off: tofu plan shows SG id 7311cc68-8c19-4a26-8ed5-aefc01a43184, but in Horizon I see f5aa41c6-48f7-443d-97c2-762e7e2d958a
[10:19:06] if I modify the URL I can see both
[10:19:17] https://horizon.wikimedia.org/project/security_groups/7311cc68-8c19-4a26-8ed5-aefc01a43184/
[10:19:20] https://horizon.wikimedia.org/project/security_groups/f5aa41c6-48f7-443d-97c2-762e7e2d958a/
[10:19:33] mmm
[10:19:34] but here I only see one https://horizon.wikimedia.org/project/security_groups/
[10:20:25] let me see
[10:20:51] this is literally the first time this is happening, so it is OK if it fails
[10:21:22] for the catalyst-dev project I only see 1 sec group
[10:21:24] https://usercontent.irccloud-cdn.com/file/VAN9pkRa/image.png
[10:21:32] (with no rules)
[10:21:34] yes I see the same
[10:21:36] I deleted the rules
[10:21:44] but when I run tofu apply it's trying to modify a different ID
[10:21:59] let me try the tofu apply cookbook again
[10:22:26] https://phabricator.wikimedia.org/P69444
[10:23:57] mmm
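The 10:06:35 diagnosis of the cookbook failure can be modelled as below. This is a hypothetical sketch, not the actual change in https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1077336: `find_mr_for_branch`, `report_deploy` and the branch-to-MR mapping are illustrative names. The idea is simply to treat "no MR for this branch" as a normal case instead of raising.

```python
from typing import Optional


def find_mr_for_branch(branch: str, open_mrs: dict[str, int]) -> Optional[int]:
    """Hypothetical lookup: return the MR id whose source branch is `branch`, or None."""
    return open_mrs.get(branch)


def report_deploy(branch: str, open_mrs: dict[str, int]) -> str:
    """Hypothetical post-deploy step that reports back on the MR, if there is one."""
    mr_id = find_mr_for_branch(branch, open_mrs)
    if mr_id is None:
        # Deploying straight from main (or any branch without an MR): skip the MR
        # comment rather than failing with "No merge requests found for branch main",
        # since the deploy itself has already happened at this point.
        return f"deployed {branch}; no merge request to report on"
    return f"deployed {branch}; reported on MR !{mr_id}"


if __name__ == "__main__":
    mrs = {"my-feature": 1}  # hypothetical branch -> MR id mapping
    print(report_deploy("main", mrs))
    print(report_deploy("my-feature", mrs))
```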
[10:26:12] look at this weird paste
[10:26:15] https://www.irccloud.com/pastebin/GPWaSxHD/
[10:26:52] this URL of the paste looks better https://www.irccloud.com/pastebin/raw/GPWaSxHD
[10:27:28] looks like there are 2 "default" SGs in that project? that should not happen
[10:27:46] oh project_id is different
[10:27:57] there are two, but one doesn't even show up in the list of SGs
[10:28:14] is it something to do with our custom project-ids that we want to be a name and not an id?
[10:28:34] one has project_id=catalyst-dev, the other has project_id=7209100e0e744a4fbdf447534d4eb825
[10:28:46] oh, that's a very good catch
[10:28:59] I remember a.ndrew made some changes to how we manage ids
[10:29:05] but I don't remember the details
[10:29:27] yeah, basically we should start to see projects whose id is no longer the name, but a UUID or similar
[10:29:38] I thought I added support for this in the tofu code
[10:29:59] I'm going to lunch, back in 1 hour or so :)
[10:29:59] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/blob/main/modules/secgroups/main.tf?ref_type=heads#L83
[10:30:07] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/blob/main/modules/secgroups/main.tf?ref_type=heads#L59 <--- here is the bug
[10:30:31] instead of `each.key`, it should be `var.projects[each.key].id`
[10:30:36] to get the project id, not the project name
[10:30:37] right!
[10:31:47] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/78
[10:32:38] approved!
[10:32:51] * dhinus lunch
[10:33:21] thanks, will merge it and see what happens
[10:38:40] worked as expected
[11:34:40] dhinus: this reminded me that we should get https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075859 merged
[12:16:16] arturo: I'm checking the SGs for catalyst-dev and they still both exist, is that expected because of how we manage project ids?
[12:16:34] I guess we would need to manually clean up the rogue one
[12:16:54] can we though? because it's a "default" one. will the CLI let you delete it?
[12:17:06] I can try I guess
[12:17:36] I think so
[12:17:54] I believe it is only created/enforced by Neutron at project creation time
[12:18:11] yep, it's gone
[12:18:13] sudo wmcs-openstack security group delete 7311cc68-8c19-4a26-8ed5-aefc01a43184
[12:19:34] check again in a few minutes just in case :-P
[12:19:39] true :D
[12:20:00] maybe we should also try creating an instance and checking that the right SG is linked
[12:20:23] yeah
[12:20:44] and even better, create another project and see what happens with the default SG
[12:21:07] true
[12:22:27] re: the patch to remove the keystone hook, did you understand why PCC shows no changes to the python files?
[12:24:10] no. I asked in #-sre, but got no reply so far
[12:25:16] ack
[12:42:03] dhinus: I just got confirmation that the diff won't show up
[12:48:11] arturo: ok, let's merge it! I +1d it
[12:48:18] thanks!
[12:51:05] merged, so that's another reason to exercise the project creation flow
[12:53:27] I'm chasing a wikireplicas issue; you can test it yourself if you want, or I can do it later
[12:54:31] ok I'll do it myself, thanks!
[13:51:32] arturo: the files in /usr/lib/python3/dist-packages/wmfkeystonehooks are still there (as expected), shall we clean them manually?
[13:51:57] I'm not sure if they are also linked from somewhere
[13:52:20] wait, some files are probably unrelated to SGs
[13:52:55] maybe I should understand how keystone hooks actually work :)
[14:09:37] the py files should still be present. My patch just changed the logic in the py file
[14:12:40] you might have to restart services (or even remove the python caches in the file system)
[15:09:05] quick review? https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1076978
[15:09:49] +1'd
[15:10:31] quick review here: https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1077372
[15:16:03] arturo: after adding my +1 I realized you could also use "tofu plan -detailed-exit-code"
[15:16:22] not critical, I'm also fine with leaving your string check
[15:17:49] ok
[15:23:06] another quick review https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1076978
[15:28:35] dhinus: I think I'm ready now to create test projects: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/82
[15:31:09] oh, bad paste, the last review was for https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1076980 xd, sorry
[15:46:13] There will be some slightly broken bits on new project creation until T376220 is handled.
[15:46:14] T376220: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220
[16:42:22] oh, it slipped my mind, I'll do it tomorrow if nobody gets to it earlier
[16:48:56] * dcaro off
[16:49:18] !remindme tomorrow at 10:00am to do T376220
[16:49:19] T376220: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220
[16:49:27] xd, just trying
[19:07:23] there are CI builds failing because the process can't resolve `gerrit.wikimedia.org` for some reason. I feel like there might be an issue with the WMCS recursor or the instances' DNS cache/resolver
[19:07:46] https://phabricator.wikimedia.org/T374830
[19:51:55] hashar: with a.ndrewbogott on sabbatical and R.ook on family leave I'm not sure who to poke to look at the resolvers. I'm not seeing a persistent systemic failure at the moment. `dig +noall +answer gerrit.wikimedia.org` is working for me from Toolforge instances as an example.
[20:26:41] What's the easiest way to map a wikitech user account to any toolforge/similar accounts the user may have had?
[20:27:57] And relatedly... can we do wildcard lookups on email in ldap?
[20:31:52] Reedy: I would guess an LDAP lookup on `cn` for the first question. For the second you can use globs in an LDAP query like `ldap 'mail=bdavis@*'`. The `ldap` I'm using there is the alias from https://wikitech.wikimedia.org/wiki/User:BryanDavis/LDAP#Easy_CLI_queries
[20:34:09] https://contact.toolforge.org/ is pretty good at finding accounts too if you have a username.
[20:43:50] if they're "no groups found", that presumably suggests they've not logged into any sort of shell account, right?
[20:44:21] that seems like a reasonable theory, yeah.
[20:50:18] that ldap query worked a treat too, cheers :)