[07:48:25] morning!
[08:10:49] morning :)
[08:15:35] I'm going to start messing around with toolsbeta, to try to figure out what's what
[08:16:03] ok! do you need me to do something?
[08:18:44] o/
[08:20:31] not really, I'm reading what Raymond_Ndibe left from friday, will probably try to take the 1.27 control node out of the pool, see if that help, then take another one out of the pool (ex. test 2x1.26 controls, 1x1.27+1x1.26 controls) to see if it's the version that seems to have issues with
[08:20:56] if you want I'm happy to collab on it
[08:25:21] dcaro: I want to do a couple of other things first, will ping you
[08:30:46] ack, let me know if you need running tests on toolsbeta to avoid me getting confused by the unexpected noise
[08:31:30] I will not be doing anything on toolsbeta
[08:32:14] ack
[08:32:16] thanks
[08:33:32] hmm, I'm seeing tests being run right now :/
[08:35:21] oh, maybe it's a leftover that just retriggers
[08:36:08] I think raymond was seeing something similar, thought that I was running tests (but I wasn't)
[08:36:12] I'm also available if you need any help with toolsbeta
[08:48:08] I think that the issue we are seeing is because we have a mix of versions in the controllers, and the new ones add that `controller-uid` label that 1.26 does not know and fails to cleanup
[08:48:09] https://github.com/kubernetes/kubernetes/pull/114930
[08:48:40] from the release notes of 1.27
[08:48:47] `Pods owned by a Job now uses the labels batch.kubernetes.io/job-name and batch.kubernetes.io/controller-uid. The legacy labels job-name and controller-uid are still added for compatibility. (#114930, @kannon92)`
[09:00:06] I'm going to try and leave the 1.27 controller by itself, see if it can cleanup those jobs
[09:02:18] yep, that cleaned up all the jobs
[09:02:24] https://www.irccloud.com/pastebin/LGIzD5Wf/%20
[09:02:51] it might be a transient bug that happens only while upgrading controllers, that nobody noticed
[09:04:19] (so if you upgrade your control node fast enough, nobody notices)
[09:16:29] good catch 👍
[09:16:33] okok, I'm restoring the 'unstable' state of the cluster, added a note in the etherpad for raymond to pick up when he starts the day
[09:17:05] * dcaro back to hand-holding ceph
[09:17:20] thanks :)
[09:50:17] dhinus: I'm working on this https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/38
[09:51:49] arturo: looking
[09:54:17] is the plan failing?
[09:54:29] yes
[09:54:36] there is some weirdness with the imports
[09:54:46] I guess because auth and how special designate is
[09:55:06] so, zones are attached to certain projects
[09:55:43] but I guess tofu runs as the admin project, so when importing, it tries to import into the wrong project
[09:56:03] I think the provider somewhat knows this, because there is a note here
[09:56:03] https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/resources/dns_zone_v2#import
[09:56:15] in which they import using `zone_id/project_id`
[09:56:45] but I'm not sure if I'm representing that well in the tofu code
[09:57:29] error being
[09:57:31] https://www.irccloud.com/pastebin/RRSduN0M/
[09:58:37] oh, if you are running plans and such, I'd appreciate testing https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1070020
[09:58:52] ok
[09:59:52] the import id is being rendered correctly, apparently, like this
[09:59:53] module.zones.openstack_dns_zone_v2.zone["0-29.57.15.185.in-addr.arpa."]: Refreshing state... [id=7796b60e-7bc6-4cb4-9e4d-dae039d3f912/cloudinfra-codfw1dev]
[10:06:52] maybe I need to grant tofu additional roles in the projects
[10:11:20] arturo: maybe we can try using TF_LOG=1 to see the exact API call that is failing
[10:13:09] oh now they have levels, so it should be TF_LOG=DEBUG
[10:17:43] dhinus: something like this
[10:17:46] https://www.irccloud.com/pastebin/4pRegyQQ/
[10:19:31] I think this is related to how designate works with auth
[10:19:33] see this
[10:19:34] /v2/zones/66b8ae44-8ddc-4921-95fe-1708635a8646 returns 404, it could be as you said because the token is for the wrong project, but I'm not sure
[10:19:36] https://www.irccloud.com/pastebin/ZKG8GPCH/
[10:20:08] right yeah I remember I had that problem when using the CLI
[10:37:16] the provider docs mention "(requires an assigned user role in target project)", but it seems odd that you need to assign it for each project
[10:39:56] related: https://github.com/terraform-provider-openstack/terraform-provider-openstack/pull/1254
[10:41:18] seems to confirm that "you should also add a role association to that project"
[10:43:25] I have added that already
[10:43:27] https://www.irccloud.com/pastebin/Prl7cr5b/
[10:43:49] the second entry is the main role assignment in the 'default' domain
[10:43:59] which should give RW access on all openstack
[10:44:17] the first entry is a dedicated role assignment I created in the cloudinfra-codfw1dev project
[10:44:36] (as admin)
[10:52:02] hmm looking at the source code it seems like the separator is : and not /
[10:52:07] unless I'm missing something
[10:52:10] https://github.com/terraform-provider-openstack/terraform-provider-openstack/pull/1087/files#diff-8bc2016912b20c219ba8af0ad68dce18332d54d69ae4a9e5e57047d9b4b79155R23
[10:52:22] "Allow import from different project with id:project_id"
[10:52:38] ok, let me try that
[10:52:40] good catch
[10:52:57] maybe they changed it later
[10:52:57] but also, that seems to be an error condition. Why I didn't get that?
[10:53:04] but we don't have the latest version of the provider
[10:53:16] * dcaro lunch
[10:53:24] this patch is from 2020
[10:53:43] yes this patch uses :
[10:53:47] but the latest version uses /
[10:54:04] ok
[10:54:51] it works!
[10:55:23] 🎉 dhinus
[10:56:13] yeayyy
[10:56:28] I think it's actually an error in the docs
[10:56:43] the code still uses : in the latest versions, but the docs were updated here by mistake https://github.com/terraform-provider-openstack/terraform-provider-openstack/commit/e1a17df3c9eccc8bde0cefc378f5d9de72cf61df
[10:57:20] oh I see
[10:57:24] this should be reported
[10:57:29] I'll send a PR yes
[10:57:33] would you like to report / send a patch?
[10:57:35] great!
[10:58:21] register it here https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/Opensource_Contributions
[10:58:26] :-P
[10:58:33] ok :)
[10:58:53] * dhinus lunch
[11:35:42] I have created an object storage container in testlabs. what would be the easiest way to get S3-style credentials for it? alternatively, could I just set it to public? I'm not going to store anything sensitive in there
[12:17:40] blancadesal: I guess it can be public, no problem
[12:29:09] hmm, I think public refers to just read access though so that's no good
[12:29:42] oh I see
[13:59:44] there seems to have been a blip on the openstack exporter stats, as in they are missing for a bit of time
[13:59:48] enough to trigger an alert
[13:59:52] https://www.irccloud.com/pastebin/q4L6MQS0/
[14:00:14] could the nat changes do that?
[14:00:21] or anyone rebooted the exporter or similar?
[14:03:08] not me
[14:04:09] somebody logged a "connection loss" in #-cloud so maybe there was a larger issue?
[14:05:12] maybe, that would discard the exporter thingie
[14:06:37] mmm
[14:07:12] I don't see how the change I merged could do that
[14:07:25] but it is definitely a network change
[14:07:51] so I would stay suspicious to any disruptions
[17:13:35] * dcaro off
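The Job label change quoted at 08:48 (kubernetes/kubernetes#114930) can be sketched as a minimal Python model of label selectors. This is an illustrative toy, not Kubernetes API code; the label values (`myjob`, `abc-123`) are made up, but the label keys match the release note: 1.27-created pods carry both the new `batch.kubernetes.io/*` labels and the legacy ones, while pods created before the upgrade carry only the legacy labels.

```python
# Labels a Job controller stamps on its pods, per the 1.27 release note above.
legacy_labels = {"job-name": "myjob", "controller-uid": "abc-123"}
namespaced_labels = {
    "batch.kubernetes.io/job-name": "myjob",
    "batch.kubernetes.io/controller-uid": "abc-123",
}
# A pod created by a 1.27 controller carries both sets (for compatibility).
pod_labels_127 = {**legacy_labels, **namespaced_labels}


def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())


# A legacy-only selector (what an older controller would build) still matches
# a 1.27-created pod, because the legacy labels are kept:
assert selector_matches(legacy_labels, pod_labels_127)

# The reverse does not hold: a pod created by a 1.26 controller lacks the
# namespaced labels, so a selector that requires them finds nothing:
assert not selector_matches(namespaced_labels, legacy_labels)
```

This asymmetry is one way a mixed-version control plane can leave jobs uncleaned until a single version is left in charge, which matches what was observed at 09:02.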
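The separator confusion resolved at 10:52-10:56 (the provider code splits the import ID on `:`, while the docs show `/`) can be sketched with a hypothetical helper; `split_import_id` is not the provider's real function, just a model of the behaviour discussed:

```python
def split_import_id(import_id: str, sep: str):
    """Split an import ID into (zone_id, project_id).

    Hypothetical helper mirroring the discussion above: PR #1087 introduced
    the ':' separator, while the current docs page shows '/'.
    """
    if sep in import_id:
        zone_id, project_id = import_id.split(sep, 1)
        return zone_id, project_id
    return import_id, None


# The ID format from the docs, split the way the code actually splits (':'):
# no ':' is found, so the whole string is treated as the zone ID and no
# project scope is applied -- consistent with the 404 seen in the plan.
zone_id, project = split_import_id(
    "7796b60e-7bc6-4cb4-9e4d-dae039d3f912/cloudinfra-codfw1dev", ":"
)
assert project is None

# The ':' format from PR #1087 parses as intended:
zone_id, project = split_import_id(
    "7796b60e-7bc6-4cb4-9e4d-dae039d3f912:cloudinfra-codfw1dev", ":"
)
assert project == "cloudinfra-codfw1dev"
```

In other words, with the `/` form the provider never sees a project ID at all, which is why switching the separator made the import work.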
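The 404 discussed at 10:19 can be modelled as project-scoped visibility: the working assumption in the log is that a Designate zone is only visible to a token scoped to its owning project, so an admin-scoped request for a zone owned by `cloudinfra-codfw1dev` comes back 404. A toy model of that assumption (the zone name used here is a placeholder):

```python
# Toy model of project-scoped zone lookup; keys are (project, zone_id).
zones = {
    ("cloudinfra-codfw1dev", "66b8ae44-8ddc-4921-95fe-1708635a8646"): {
        "name": "example-zone",  # placeholder, not the real zone name
    },
}


def get_zone(token_project: str, zone_id: str):
    """Return the zone if the token's project owns it, else None (HTTP 404)."""
    return zones.get((token_project, zone_id))


# Admin-scoped token: zone not visible -> the 404 seen in the log.
assert get_zone("admin", "66b8ae44-8ddc-4921-95fe-1708635a8646") is None

# Token scoped to the owning project: zone found.
assert get_zone("cloudinfra-codfw1dev", "66b8ae44-8ddc-4921-95fe-1708635a8646")
```

This is only a sketch of the hypothesis in the conversation; the fix that actually landed was the role assignment in the target project plus the `:` import separator.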