[00:15:26] andrewbogott: (when you have time to think about this) I think I'd like to make the roles for T399488 tomorrow so I can keep moving on figuring out how to set up this k8s cluster. I know how to do it, but I'd like your approval before I do.
[00:15:26] T399488: Add k8s_admin, k8s_developer, and k8s_viewer roles expected by default Magnum config for Kubernetes auth using Keystone auth - https://phabricator.wikimedia.org/T399488
[10:03:15] I restarted cloudvirt1073 to test T399212
[10:03:16] T399212: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212
[10:03:28] the host is in the "maintenance" aggregate, but when it came back it fired an alert
[10:03:30] Neutron neutron-openvswitch-agent on cloudvirt1073 is down
[10:05:27] I think the alerts don't cross-check
[10:05:39] (might be doable with a promql I think)
[10:06:28] what do you mean cross-check? anyway the metric is back to normal, the alert will resolve in 20m
[10:10:18] the "cloudvirt.safe_reboot" cookbook sets a silence, but then it's removed too soon... maybe the min_over_time[20m] is not really necessary in the alert
[10:11:30] it was introduced in https://gerrit.wikimedia.org/r/c/operations/alerts/+/805783
[10:14:57] I do indeed see a lot of those alerts on 2025-06-25 where there was a mass-reboot of cloudvirts
[10:21:00] I created T399705
[10:21:01] T399705: [wmcs-cookbooks] cloudvirt.safe_reboot triggers NeutronAgentDown alert - https://phabricator.wikimedia.org/T399705
[10:48:32] I meant that the alert checking whether the openvswitch agent is up or down does not cross-check the aggregate the host is in
[12:42:21] hmpf... the foxtrot-ldap image uses buster and the repos are not available anymore, I'm trying to use a newer image instead but the company that generated the previous one does not do so anymore
[12:42:31] so using the bitnami one, the configuration setup is different
[12:42:38] I got some stuff working, but
[12:42:49] I'm having issues trying to load the openssh-lpk schema
[12:43:10] (the one that adds the ldapPublicKey field I think)
[12:43:26] does anyone have experience messing with that?
[13:03:05] I think I advanced some... we don't use the nis.ldif default schema, but rfc2307bis or something
[13:18:39] got a bit further.... had to change sudo.schema to sudo.ldif too
[13:28:08] dealing now with the security rules...
[13:28:29] getting
[13:28:33] https://www.irccloud.com/pastebin/VLe5cwRg/
[13:29:12] tried already several incantations of olcDatabase, with `{N}`, `hdb`, no luck
[13:42:53] huh, simply removing that header seems to work
[13:43:31] dcaro: I never played with LDAP so I have no idea about this :/
[13:43:47] are you testing in lima-kilo?
[13:43:54] yep 🤞
[13:44:11] I was able to make the setup pass without errors, seeing now if it works in lima-kilo
[13:46:08] okok, scripts to add users are failing to connect xd, looking
[13:54:39] this works \ol
[13:54:42] `I have no name!@foxtrot-ldap-7f67f5f57d-x5msh:/$ /opt/bitnami/openldap/bin/ldapwhoami -D "cn=admin,dc=wikimedia,dc=org" -H ldap://127.0.0.1:1389 -w admin`
[13:54:45] so something is something xd
[13:54:57] I think the port it was using before was different
[13:59:02] almost there! I was able to add a user, looking now to run the tests and such
[13:59:12] no idea if it's "proper" though
[13:59:36] bd808: I created the roles for T399488; I'm going to be out for some of the morning but please lmk if there are other immediate blockers.
[13:59:37] T399488: Add k8s_admin, k8s_developer, and k8s_viewer roles expected by default Magnum config for Kubernetes auth using Keystone auth - https://phabricator.wikimedia.org/T399488
[14:08:33] got this working
[14:08:39] https://www.irccloud.com/pastebin/7HedTwBD/
[14:08:45] ansible still gets stuck though
[14:30:22] I think I got it working...
[14:37:52] thank you andrewbogott
[15:50:36] bd808: did it work?
[16:27:33] andrewbogott: I got this far: T399731
[16:27:33] T399731: Cloud VPS project member (admin role) unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731
[16:27:54] I have the role on a couple of accounts, but haven't had time to try to use it yet.
[16:29:20] I started reading openstack config and think I see that `openstack role add` not working is expected
[16:34:43] if you see some noise in #-cloud-feed, I resolved a few tasks that were due to alerts firing during the incident last week
[16:35:37] we need to file a dcops task for the failed hard drive on cloudcephosd1013, I'll do that tomorrow
[16:36:35] * dhinus offline
[17:36:33] dhinus: 1013 is scheduled for decom so we shouldn't spend much time worrying about the drive
[17:46:17] bd808: at the moment does anyone have access to the zuuldevopsbot account besides you?
[17:47:59] Also, I'm confused by the bug title 'project member (admin role)' -- zuuldevopsbot doesn't have an admin role, right, just 'member'?
[17:49:29] * dcaro off
[17:49:31] cya tomorrow!
[18:20:56] andrewbogott: mostly me + gitlab ci. the "(admin role)" is because most of us don't remember that "member" is the powerful role name.
[18:21:29] ok. I'm going to grant that user the real 'admin' role but scoped to that particular project. Then I want you to see if you can grant roles within the project but NOT in other projects :D
[18:21:58] (scoping is a new feature that appeared in keystone after we designed our model so I haven't used it much)
[18:23:09] bd808: try now?
[18:23:26] Also, I guess I should have asked: is this something you need a process for because it's going to happen a lot, or just a one-off setup thing?
[18:25:50] `openstack role add --user bd808 --project c26d9d326bdf464fa1025939ded7e5a2 k8s_developer` seemed to still fail silently.
[18:26:08] hm
[18:27:16] for the zuul project I think the answer to how often will this be needed is "not often". For other magnum clusters in the future, unclear but authn seems important.
[18:28:09] There are other ways I could think about this like replacing the default role mapping with mappings based on the member and reader roles
[18:28:49] or extending the horizon dashboard to allow assigning these roles
[18:29:33] I don't think it is an urgent problem for sure. I can hack around things for the foreseeable future
[18:41:46] I think what you are doing is an example of https://phabricator.wikimedia.org/T396016, I'm taking the chance to do a bit of research.
[18:42:08] are you using clouds.yaml to set up the zuuldevopsbot user?
[18:42:42] If yes, can you try changing your keystone endpoint port to 25357 and try again?
[18:45:32] keystone has an is_admin check which is cryptic, it could mean the role, or the port, or both
[19:54:18] I haven't been using clouds.yaml, but I can try that endpoint switch.
[19:55:04] (envvar management is easier than file management in the place I'm doing most of this work)
[20:03:22] andrewbogott: with OS_DEBUG=1 and OS_AUTH_URL=https://openstack.eqiad1.wikimediacloud.org:25357/v3 I can see `{"error":{"code":403,"message":"You are not authorized to perform the requested action: identity:create_grant.","title":"Forbidden"}}`
[20:19:24] andrewbogott: a private paste with lots of OS_DEBUG=1 details at https://phabricator.wikimedia.org/P79264. I can help you get a working test environment with this user if it would be helpful too.
[21:03:11] * andrewbogott is surprised/wonders what that endpoint is for then
[21:30:31] bd808: I'm reading https://docs.openstack.org/keystone/latest/admin/service-api-protection.html for the 30th time and don't see any acknowledgement about who can manage role assignment on a project. So now I don't even know why it works for novaadmin. Maybe your eyes will pick out something that mine can't find
[21:33:32] "Users with admin on a project shouldn’t be able to manage things outside the project because it would violate the tenancy of their role assignment (this doesn’t apply consistently since services are addressing this individually at their own pace)." -- makes some sense
[21:33:58] it does, but you're trying to manage things /inside/ the project, specifically, role assignment on the project.
[21:34:17] Also, I got my test here to work just now so now I just need to figure out why it doesn't work for you...
[21:34:21] where are you running your commands?
[21:34:25] how do we look up the binding for identity:create_grant?
[21:34:49] I am using a Docker container on my laptop as the runtime.
[21:35:29] Specifically the container that `make shell` in https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning provides.
[21:36:23] ok! So I think running on your laptop is the issue.
[21:36:30] https://wikitech.wikimedia.org/wiki/Help:Using_OpenStack_APIs#Application_Credentials <- see first non-callout sentence
[21:37:11] auth is via application credentials
[21:38:11] restricted or unrestricted?
[21:38:55] let me open up Horizon as the ZuulDevOpsBot user and see...
[21:39:46] typically 'unrestricted' means 'you can wish for more wishes'
[21:39:56] I don't know that that's sufficient but I'm pretty sure it's necessary
[21:40:06] "Unrestricted: yes"
[21:40:20] which I now remember is needed to make Magnum work.
[21:41:07] oh! That credential only has the member and reader roles bound to it
[21:41:53] I bet that means that a user is supposed to get a new set of creds any time they are given new roles.
[21:42:05] let me try making a new set of creds...
[21:43:09] hmmm... in the application credential create dialog I only see the member and reader roles
[21:43:24] so the new admin role is hidden from that UI I guess?
[21:44:13] "You may select one or more roles for this application credential. If you do not select any, all of the roles you have assigned on the current project will be applied to the application credential." -- I guess I can try that
[21:45:26] When I did it, it proposed 'admin' as a role to add to the app creds.
[21:45:31] Of course, they still don't work for role assignment...
[21:46:58] admin did not magically show up in the bound roles list once the creds were created. :/
[21:48:39] andrewbogott: PEBCAK. I was in the zuul3 project rather than the zuul project when I made the new credentials. Trying again now that I can see all the roles.
[21:49:01] (horizon UI could really use some help)
[21:52:37] I can't make it work even when I have 'admin' and 'unrestricted' app creds. Either it's explicitly disallowed to do this with app creds, or I'm making some obvious mistake. I can do it on a cloudcontrol with a 'normal' user if I give that user admin on a project.
[21:52:42] {"error":{"code":403,"message":"You are not authorized to perform the requested action: identity:create_grant.","title":"Forbidden"}}
[21:53:32] the cloudcontrol way uses the user's password instead of app credentials?
[21:54:43] yeah
[21:55:04] I'm not getting a 403 though, I'm getting silent failure (and I see forbidden messages in the keystone logs but not on the cli)
[21:55:33] `export OS_DEBUG=1` is how I see the 403 error message
[21:56:16] I don't know if that passes through the wmcs-openstack wrapper and sudo or not
[22:00:25] https://www.irccloud.com/pastebin/Pu2bXFom/
[22:00:31] silent fail still
[22:00:50] oh wait, wrong project name!
[22:01:15] ok, when I type the right things it works.
[22:01:17] So this is possible!
[22:01:25] with app creds
[22:02:23] works with the role name too (my bad example above used the ID)
[22:26:51] andrewbogott: it worked! I had yet another PEBKAC with switching app credentials. In the envvar way I'm using them right now I switched the envvar that tofu uses, but not the envvar that the openstack cli uses.
[22:27:07] I owe you some boring toil :)
[22:50:19] great!
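
(Editor's sketch, not part of the log.) The final problem above was that the application credential used by tofu had been switched while the one read by the openstack CLI had not. A minimal way to catch that kind of drift is to compare the two IDs before running anything. `OS_APPLICATION_CREDENTIAL_ID` is the variable the openstack CLI reads for application-credential auth; `TF_VAR_application_credential_id` is a hypothetical name standing in for whatever variable the tofu setup actually uses, which the log does not state.

```python
import os

def credential_mismatches(
    env,
    cli_id_var="OS_APPLICATION_CREDENTIAL_ID",
    # Hypothetical name: the log does not say which variable tofu reads.
    other_id_var="TF_VAR_application_credential_id",
):
    """Return a list of human-readable problems found in the mapping env."""
    problems = []
    cli_id = env.get(cli_id_var)
    other_id = env.get(other_id_var)
    if not cli_id:
        problems.append(f"{cli_id_var} is not set")
    if not other_id:
        problems.append(f"{other_id_var} is not set")
    # Only compare when both are present; otherwise the messages above suffice.
    if cli_id and other_id and cli_id != other_id:
        problems.append(
            f"{cli_id_var} and {other_id_var} point at different credentials"
        )
    return problems

if __name__ == "__main__":
    for problem in credential_mismatches(os.environ):
        print("WARNING:", problem)
```

Running it in the same shell as the `openstack` and `tofu` commands prints one warning per problem and nothing when both variables are set and agree. It compares IDs only and never prints the secret.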