[06:56:15] neat
[07:01:15] T374002
[07:01:15] T374002: codfw1dev: rabbitmq is not working because some auth failures - https://phabricator.wikimedia.org/T374002
[08:39:21] terminator layouts are a blessing
[08:39:24] https://usercontent.irccloud-cdn.com/file/GcTPJPmo/image.png
[09:02:16] dhinus: are you around?
[10:11:17] arturo: I am now :)
[10:15:16] I'm having a fun time with tofu. I renamed a resource which was being imported, and caused a cascade of disconnections between the state and the openstack db
[10:16:38] is that on codfw or eqiad?
[10:19:26] codfw1dev
[10:19:43] the state discrepancy is now solved
[10:20:13] now I'm trying for openstack to actually bind the router interface to the router with a hard reset via this https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/35
[10:20:39] i.e., I will destroy the interface and then revert the patch to re-create it again
[10:21:15] have you tried destroying using the CLI, and recreating with "tofu apply"?
[10:21:28] good idea, let me try that
[10:24:03] mmm
[10:24:14] Can we delete a single resource via the CLI?
[10:24:30] I would expect so
[10:24:37] but it might depend on the resource
[10:26:46] tofu plan -destroy --target 'module.router_interfaces.openstack_networking_router_interface_v2.router_interface["cloudinstances2b-gw-flat"]'
[10:26:53] that also works
[10:27:03] yeah, it works
[10:27:10] I'll replace `plan` with `destroy`
[10:27:20] with "CLI" I was actually referring to the openstack CLI
[10:27:26] ah!
[10:27:30] but tofu CLI also works!
[10:27:36] might even be easier
[10:28:01] if I delete from the openstack DB, I guess tofu will detect it is missing, right?
that's the main point
[10:29:14] yes, that's what I expect
[10:30:44] sudo wmcs-openstack router remove port cloudinstances2b-gw 8ac30351-a7f6-4f50-ac66-d991db17369c
[10:30:51] looks good
[10:30:55] ok, run that
[10:31:02] now let's do tofu apply
[10:31:26] ok, it detects the change
[10:31:29] https://www.irccloud.com/pastebin/2mM5isvO/
[10:32:01] :-(
[10:32:15] error https://www.irccloud.com/pastebin/7jAChRjW/
[10:32:41] hmm why 409 conflict?
[10:32:51] Port already has an attached device
[10:32:52] I'll remove both the port attachment and the port itself from openstack
[10:33:24] probably that's the issue
[10:35:03] mmmm
[10:36:25] using the openstack CLI, removing the port attachment from the router also removes the port itself
[10:37:18] ok, new theory
[10:37:32] the tofu error is because when the port is created, it contains this
[10:37:36] # device_id_router_name: cloudinstances2b-gw
[10:37:36] # device_owner: network:ha_router_replicated_interface
[10:38:03] which conflicts with the actions being taken via the router_interfaces resource, which also establishes ownership
[10:38:25] so I think we need to:
[10:38:51] what is a "router_interface"?
[10:38:53] 1) clean up the extra ownership data from tofu data
[10:38:54] 2) remove the ports manually from openstack
[10:38:54] 3) re-run tofu apply
[10:39:14] router_interface is the resource they have in tofu to manage port attachment to routers
[10:39:56] ok, so if you remove the port attachment AND the port itself from openstack CLI, that should clear everything?
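The targeted command at 10:26:46 only works because the resource address is single-quoted: the `for_each` key brings `[`, `]` and `"` into the argument, which the shell would otherwise interpret. A minimal Python sketch that builds such a command with `shlex.join` so the quoting is always right; the helper name `tofu_target_cmd` is hypothetical, and it uses the `-target=ADDRESS` spelling of the flag rather than the `--target ADDRESS` form typed in the log.

```python
import shlex


def tofu_target_cmd(subcommand: list[str], address: str) -> str:
    """Return a shell-safe `tofu ... -target=ADDRESS` command line.

    Addresses of for_each instances contain [ ] and double quotes, so
    shlex.join() wraps that argument in single quotes for the shell.
    """
    argv = ["tofu", *subcommand, f"-target={address}"]
    return shlex.join(argv)


# Address copied from the log.
ADDRESS = (
    "module.router_interfaces.openstack_networking_router_interface_v2"
    '.router_interface["cloudinstances2b-gw-flat"]'
)

# Dry run first, then the real targeted destroy.
print(tofu_target_cmd(["plan", "-destroy"], ADDRESS))
print(tofu_target_cmd(["destroy"], ADDRESS))
```

Checking the dry run first, as was done in the log, keeps a typo in the address from destroying something unexpected.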
[10:41:44] the last tofu apply failed because it successfully created the port, but then failed to create the "interface" because the attachment was still present in the openstack db
[10:42:02] (or at least that's my understanding)
[10:43:04] yeah, when the port is created by tofu, it contains ownership information (device attachment info), so when tofu tries next to attach it to the router, it fails because the info is already present
[10:43:19] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/36
[10:45:04] I'm still confused, isn't the result of this MR equivalent to running "openstack remove port-attachment"?
[10:46:29] not really, because tofu also attaches the port via the router_interfaces resource, which is untouched by the MR
[10:46:51] let's check the plan for this MR before applying
[10:47:28] the plan looks correct, because I have deleted the port and the port attachment from the openstack DB
[10:47:34] so tofu will re-create everything
[10:48:03] with `device_owner = (known after apply)` instead of it being set explicitly (wrong) by us
[10:48:18] ok makes sense!
[10:48:28] approved
[10:48:36] merging and applying
[10:48:41] this is probably because when importing resources it hardcodes more data than it actually needs
[10:48:52] yes
[10:49:10] 100% my fault. I had this doubt several times: shall I import this data field, or let openstack/tofu figure it out
[10:49:30] it can be confusing yes
[10:50:10] it worked this time! 🎉
[10:50:14] nice one!
[10:50:22] qr-db4d1c30-20 UNKNOWN 172.16.129.1/24
[10:50:30] and the interface was created in the neutron router
[10:50:52] what is UNKNOWN?
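The 409 and its fix in MR 36 can be illustrated with a toy model. This is not the real Neutron or OpenTofu API: `ToyNeutron`, `SERVER_OWNED`, the port name and the router ID are hypothetical placeholders. It assumes, as the discussion concludes, that a port created with import-pinned ownership fields already counts as attached, so the later router_interface step conflicts; dropping those fields (leaving them `(known after apply)`) lets the attach succeed.

```python
class ToyNeutron:
    """Tiny in-memory stand-in for Neutron ports (NOT the real API)."""

    def __init__(self) -> None:
        self.ports: dict[str, str] = {}  # port name -> device_id ("" = free)

    def create_port(self, name: str, device_id: str = "") -> None:
        # A device_id set at creation time marks the port as already attached.
        self.ports[name] = device_id

    def add_router_interface(self, router_id: str, port: str) -> str:
        if self.ports[port]:
            return "409 Conflict: Port already has an attached device"
        self.ports[port] = router_id
        return "200 OK"


# Hypothetical list of fields the server should own once the port is attached.
SERVER_OWNED = {"device_id", "device_owner"}


def strip_server_owned(attrs: dict[str, str]) -> dict[str, str]:
    """Drop fields that should stay `(known after apply)` in the config."""
    return {k: v for k, v in attrs.items() if k not in SERVER_OWNED}


def apply(port_attrs: dict[str, str], router_id: str) -> str:
    """One toy `tofu apply`: the port resource, then the router_interface."""
    neutron = ToyNeutron()
    neutron.create_port(port_attrs["name"], port_attrs.get("device_id", ""))
    return neutron.add_router_interface(router_id, port_attrs["name"])


imported = {
    "name": "cloudinstances2b-gw-flat",  # hypothetical port name
    "device_id": "ROUTER-UUID",          # placeholder pinned by the import
    "device_owner": "network:ha_router_replicated_interface",
}
print(apply(imported, "ROUTER-UUID"))                      # 409 Conflict: ...
print(apply(strip_server_owned(imported), "ROUTER-UUID"))  # 200 OK
```

The general lesson matches the 10:49:10 remark: when importing, pin only what you actually want to manage and let the provider compute the rest.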
[10:51:09] https://i.gifer.com/2Gb.gif
[10:51:36] that output is from `ip -br a+
[10:51:36] :D
[10:51:42] `ip -br a`
[10:52:04] because these are virtual interfaces, I think the linux kernel doesn't know if they are UP or DOWN, so therefore UNKNOWN
[10:52:06] ah ok
[10:52:26] full output here:
[10:52:29] https://www.irccloud.com/pastebin/aeignNkX/
[10:52:49] in neutron, a virtual router is instrumented as a linux netns
[10:53:09] inside the netns you can see all the interfaces the virtual router has
[10:53:20] with the addresses
[10:53:31] you can see there the gateway addresses, and the floating IP addresses
[10:54:01] nice
[12:35:48] topranks: finally got a VM running on the vxlan virtual subnet. It is not reachable via ssh because whatever firewall somewhere, but if you are interested, you can play with it via console
[12:36:57] arturo: nice!
[12:37:15] I'm fairly busy today but send me the details when you can and I'll have a look when I've a minute
[12:37:32] sure, thanks
[12:37:34] if we could get two of them to test traffic between them (as well as just to the outside) that would be even better :)
[12:37:44] ok
[14:17:07] dhinus: I'm about to merge this https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/37 we will see in a bit again how tofu behaves for renaming openstack resources
[14:18:16] ok! I expect it to delete and recreate, but we'll see
[14:20:32] https://www.irccloud.com/pastebin/WJMBU8hZ/
[14:20:41] can I get a few folks to log out and in to horizon.wikimedia.org?
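The `ip -br a` lines discussed around 10:51 have a simple layout: interface name, operational state (UP, DOWN, UNKNOWN, ...), then zero or more addresses. A small Python sketch that parses that output and builds the command to run it inside a virtual router's namespace. It assumes the Neutron L3 agent's usual `qrouter-<router UUID>` namespace naming; the helper names and the router ID are hypothetical, and actually running the command needs root on the network node.

```python
from dataclasses import dataclass


@dataclass
class Iface:
    name: str
    state: str        # operstate column: UP, DOWN, UNKNOWN, ...
    addrs: list[str]  # zero or more CIDR addresses


def parse_ip_brief(output: str) -> list[Iface]:
    """Parse `ip -br a` output: name, state, then space-separated addresses."""
    ifaces = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        ifaces.append(Iface(fields[0], fields[1], fields[2:]))
    return ifaces


def router_netns_cmd(router_id: str) -> list[str]:
    """argv listing a Neutron router's interfaces from inside its netns.

    Assumes the `qrouter-<router UUID>` namespace naming convention.
    """
    return ["ip", "netns", "exec", f"qrouter-{router_id}", "ip", "-br", "a"]


sample = "qr-db4d1c30-20   UNKNOWN        172.16.129.1/24"  # line from the log
for iface in parse_ip_brief(sample):
    print(iface.name, iface.state, iface.addrs)
```

On a network node the argv could be passed to `subprocess.run(..., capture_output=True, text=True)` and its stdout fed to `parse_ip_brief`; the `qr-` entries are the router's internal gateway ports.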
[14:20:51] this one seems to be a race condition, creation of the new network started before the deletion of the old one
[14:21:16] arturo: yes, that's probably something that they forgot to encode in the provider
[14:21:39] it should work with a re-run
[14:21:59] dhinus: I'll follow your earlier suggestion, delete from the openstack CLI, then tofu apply again
[14:22:36] andrewbogott: clicked sign out, then a magic redirection logged me in again, I guess I have the IDP cookie
[14:22:49] yeah, the logout behavior is a bit weird
[14:22:55] if you have a live idp session
[14:23:21] I guess you can't log out at all, and that's maybe the point
[14:23:55] andrewbogott: worked for me, I was not logged in and I logged in successfully
[14:23:57] through IDP
[14:24:07] awesome
[14:24:15] sign out is broken, similar to arturo
[14:24:19] I guess those two days of it not working in labtest are paying off.
[14:24:39] Apparently sign out is broken for all SSO services so I'm going to open a general task for the SSO people to work on, and consider my part in this finished :)
[14:24:58] <3
[14:25:04] 🎉
[14:26:08] dhinus: I did not need to delete anything by hand, because nothing was present. A second tofu run created the resources just fine.
[14:26:17] thank you for testing! I'm sure we'll find more weirdness with this but it's not 100% broken
[14:26:54] arturo: good
[14:30:00] working with IaC for openstack, feels like flying compared to using the CLI directly ...
[14:30:57] <3
[14:45:37] I think the thing that is confusing about the signout on Horizon right now is that the logout action returns you to a URL that immediately redirects back into the SSO flow. It would be more typical/intuitive to land on a "you are logged out" page where you had to click to visit a URL that put you back into the SSO flow.
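Why a rename in MR 37 meant delete-and-recreate, and why a second run fixed the race, can be sketched with a toy planner. This is a simplified model, not OpenTofu internals: state and config map resource addresses to cloud objects, so a renamed address appears as one destroy plus one unrelated create, and nothing forces the destroy to happen first. The cloud-side uniqueness constraint is modelled, hypothetically, as a unique name, and the addresses are placeholders. (OpenTofu's `moved` block exists to declare a pure rename so that no destroy/create is planned at all.)

```python
def plan(state: dict[str, str], config: dict[str, str]) -> tuple[list[str], list[str]]:
    """Diff by resource address. Returns (to_destroy, to_create)."""
    to_destroy = sorted(set(state) - set(config))
    to_create = sorted(set(config) - set(state))
    return to_destroy, to_create


def apply(state: dict[str, str], config: dict[str, str],
          cloud: set[str], create_first: bool = False) -> list[str]:
    """One toy apply; mutates state and cloud, returns error messages.

    create_first=True mimics the race seen in the log, where the new
    network was created before the old one was deleted.
    """
    to_destroy, to_create = plan(state, config)
    creates = [("create", a) for a in to_create]
    destroys = [("destroy", a) for a in to_destroy]
    steps = creates + destroys if create_first else destroys + creates
    errors = []
    for op, addr in steps:
        if op == "destroy":
            cloud.discard(state.pop(addr))
        elif config[addr] in cloud:  # hypothetical uniqueness constraint
            errors.append(f"create {addr}: {config[addr]!r} already exists")
        else:
            cloud.add(config[addr])
            state[addr] = config[addr]
    return errors


state = {"openstack_networking_network_v2.old_name": "lan-flat"}  # placeholders
config = {"openstack_networking_network_v2.new_name": "lan-flat"}
cloud = {"lan-flat"}
print(plan(state, config))
print(apply(state, config, cloud, create_first=True))  # first run: one create error
print(apply(state, config, cloud, create_first=True))  # re-run: []
```

The first run still completes the destroy, so the re-run has only the create left, which is consistent with the 14:26:08 observation that nothing needed deleting by hand.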
[14:47:29] Basically POST /auth/logout/ should not 301 to /auth/login/
[14:48:59] Alternately GET /auth/login/ could require some interaction to turn into POST /auth/login/ where the SSO flow starts.
[14:49:33] should logging out of horizon also log you out of the SSO?
[14:51:13] I'm told that this behavior is similar with other SSO services (which I haven't noticed because who logs out of web sites?)
[14:51:32] Anyway I've created https://phabricator.wikimedia.org/T374123 and dropped it on the SSO team. If they implement a logout page it's very easy to point horizon at it.
[14:52:26] I think it's low priority due to a strong belief that no one ever clicks on the 'log out' button in Horizon unless I ask them to :)
[14:53:06] Woo Horizon login via SSO, thank you all so much, it works flawlessly for me (and so is much better than the old system). <3
[14:54:45] +1
[14:54:50] It's only better because 2fa is temporarily disabled, don't get used to it :)
[14:54:57] arturo: I think that if logging out of an OIDC client triggered logging out at the OIDC provider as a side effect that would be even more confusing for folks. Imagine logging out of a random service that authed against okta triggering an okta logout
[14:56:03] bd808: then maybe the sign out button in horizon should be disabled completely? The point being: shall one be allowed into horizon if not into the SSO as well? I guess the answer is no
[14:56:05] andrewbogott: I don't think it will be much worse when idp.mw.o adds 2FA. That session should live for quite a while.
[14:56:43] arturo: removing the sign out button would be possible
[14:56:49] bd808: is it longer than 7 days?
[14:57:11] Lasting for more than a session would be an improvement, IME.
[14:57:21] andrewbogott: I don't know, but I would guess not.
[14:57:44] Huh, in theory Horizon already supported 7-day sessions with 'remember me' checked. But maybe that only ever worked for me?
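The flow proposed at 14:45 to 14:48 (logout lands on a "you are logged out" page, and only an explicit POST to /auth/login/ starts SSO) can be sketched as a framework-agnostic request handler. This is a toy, not Horizon or Django code; `handle`, `Response` and `SSO_START_URL` are hypothetical, and the IdP logout link is the idp.wikimedia.org/logout URL mentioned in this conversation, offered as an option rather than triggered automatically.

```python
from dataclasses import dataclass, field

LOGIN_PATH = "/auth/login/"
LOGOUT_PATH = "/auth/logout/"
IDP_LOGOUT_URL = "https://idp.wikimedia.org/logout"  # from the conversation
SSO_START_URL = "https://sso.example/authorize"      # hypothetical placeholder


@dataclass
class Response:
    status: int
    body: str = ""
    headers: dict[str, str] = field(default_factory=dict)


def handle(method: str, path: str, session: dict) -> Response:
    """Toy router for the proposed logout/login flow."""
    if method == "POST" and path == LOGOUT_PATH:
        session.clear()  # end only the local session; the IdP session survives
        return Response(200, (
            "You are logged out of Horizon. "
            f'<a href="{LOGIN_PATH}">Log in again</a> or '
            f'<a href="{IDP_LOGOUT_URL}">also log out of SSO</a>.'
        ))
    if method == "GET" and path == LOGIN_PATH:
        # Require an explicit click before entering the SSO flow.
        return Response(200, (
            f'<form method="post" action="{LOGIN_PATH}">'
            "<button>Log in via SSO</button></form>"
        ))
    if method == "POST" and path == LOGIN_PATH:
        return Response(302, headers={"Location": SSO_START_URL})
    return Response(404)


session = {"user": "someone"}
resp = handle("POST", LOGOUT_PATH, session)
print(resp.status, session, "Location" in resp.headers)
```

Because neither the logout response nor the GET on the login page redirects, a live IdP session no longer bounces the user straight back in, while logging out of the IdP stays an explicit choice.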
[14:57:55] * andrewbogott will try not to be too curious about the past
[14:58:06] * bd808 has had not problems with horizon session duration for years
[14:58:16] *has not had
[14:58:20] arturo: I'm going to stay attached to the dream that is T374123 and leave things as is for now unless you feel strongly
[14:58:20] T374123: IDP/SSO logout behavior is weird - https://phabricator.wikimedia.org/T374123
[14:59:19] andrewbogott: The "remember me" box never worked for me, but maybe I was Doing It Wrong™ somehow.
[14:59:26] andrewbogott: as I said before, the weirdness here is on your client side where it puts you immediately back into the SSO auth flow. It isn't a CAS/OIDC provider problem IMO
[14:59:31] my point is that the sign out button in horizon doing nothing is perfectly fine, because it is no longer in control of the session
[14:59:38] oh dang, James_F, sorry for the years of suffering.
[15:00:00] I just kept a window open on Horizon for days at a time, which worked OK.
[15:00:10] the logout is in charge of the local session, just not the remote session which again is 100% how SSO works
[15:00:18] bd808: yeah, but I can give Horizon a 'go hear on logout' url, I just don't know what url to use currently.
[15:00:34] *here
[15:00:51] andrewbogott: any url that doesn't redirect into the SSO flow automatically
[15:00:51] andrewbogott: idp.wikimedia.org/logout is to log out of SSO
[15:01:12] yeah, I thought about just dumping people into the wikitech front page but that was very weird
[15:01:20] RhinosF1: we don't want to log out, though, just provide the option of logging out
[15:01:24] logging out of horizon should not log you out of idp
[15:01:32] * andrewbogott -> meeting
[15:01:55] andrewbogott: I guess you need a page in the middle of logout asking people if they want to also log out of SSO then
[15:02:05] arturo: meeting time :)
[15:02:09] omw
[15:02:11] RhinosF1: yes, that's what that task is for
[16:03:12] * arturo offline
[16:06:42] Raymond_Ndibe: you around to sync on the k8s upgrade?
[16:09:06] I'm around. Though I missed most of the initial toolsbeta update while on that meeting with Deja
[16:10:30] I'm in the upgrade sync meeting. can you send me the link if you are another?
[16:13:21] dcaro: want me to drain cloudvirts for https://phabricator.wikimedia.org/T374043 or is it too soon?
[16:19:28] andrewbogott: feel free yes, I'm doing ceph right now
[16:19:44] 'k
[16:28:44] arturo: you still around?
[16:29:54] dcaro: we are here: meet.google.com/eki-iaia-ahz
[17:20:17] * dhinus offline
[17:53:52] so current status of toolsbeta, 3 control nodes, all with 8G of ram, one on 1.27, tests passing, no errors on kyverno or controller-manager :)
[17:54:13] with that stable I'm clocking out
[17:54:18] cya on monday!
[17:54:20] * dcaro off