[07:29:31] greetings
[07:38:42] checking the alerts, I see OpenstackAPIResponse still firing, though that's a 12h average; latency has recovered per the dashboards
[07:40:06] not sure what happened, though for sure things improved after the rabbitmq cluster rebuild a.ndrewbogott did
[07:42:45] opentofu-infra-diff.service also failed to talk to the openstack APIs at that time, I'll kick it off again
[07:52:36] filed T418444 for investigation
[07:52:37] T418444: Increased openstack latency and rabbitmq cluster rebuild - https://phabricator.wikimedia.org/T418444
[09:04:37] morning
[09:05:12] morning
[09:06:23] rabbit has been flaky many times :/, it has never been clear what the issue is (or if it's a single issue)
[09:07:59] heh, still looking into it, though in this case it seems to be automated restarts due to cert renewal
[09:08:45] https://phabricator.wikimedia.org/P89033 still unclear to me though why openstack (and rabbitmq?) freaked out
[09:12:29] nova-api logs are kinda noisy
[09:17:24] it's not good that the three servers went down though, I think it does not cope well when they go out bit by bit
[09:18:06] (it seems to start trying to clean up on the leftover nodes)
[09:20:00] hah, it == nova or rabbit?
[09:20:10] nova-api
[09:20:33] ack, thank you for taking a look
[09:21:04] it also seems it was not able to connect to rabbitmq02 until 21:02, 30 min later (looking at nova-api.service on cloudcontrol1007)
[09:21:29] but there's so much noise from connection errors :/
[09:22:38] rabbitmq01 also seems to have been disconnected from 20:34->21:04
[09:23:08] that would mean it was only connected to rabbitmq03; if any other service was only on the others it might have been a split brain
[09:30:48] interesting
[09:31:18] I'll be looking into it deeper later today FWIW, though by no means should that stop anyone from looking!
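[editor's note: the per-broker disconnection windows discussed above (rabbitmq01 down 20:34->21:04, rabbitmq02 until 21:02) could be extracted from the nova-api logs rather than eyeballed. A minimal sketch of that idea follows; the log format, the sample lines, and the message wording are made up for illustration (loosely modeled on oslo.messaging-style connection errors), and only the broker hostnames come from the chat.]

```python
import re
from datetime import datetime

# Hypothetical sample lines -- NOT the real nova-api log format, just an
# illustration of the "first error .. reconnect" window extraction.
SAMPLE_LOG = """\
2025-11-20 20:34:02 ERROR oslo.messaging AMQP server on rabbitmq01:5671 is unreachable
2025-11-20 20:35:10 ERROR oslo.messaging AMQP server on rabbitmq02:5671 is unreachable
2025-11-20 21:02:05 INFO oslo.messaging Reconnected to AMQP server on rabbitmq02:5671
2025-11-20 21:04:11 INFO oslo.messaging Reconnected to AMQP server on rabbitmq01:5671
"""

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?:ERROR|INFO) \S+ "
    r"(?P<event>AMQP server|Reconnected to AMQP server) on "
    r"(?P<host>[\w.-]+):\d+"
)

def disconnection_windows(log_text):
    """Return {host: [(down_since, up_at), ...]} per broker.

    A window opens at the first connection error for a host and closes
    at the next reconnect; repeated errors in between are ignored.
    """
    open_since = {}  # host -> timestamp of first error in the current outage
    windows = {}     # host -> list of closed (down_since, up_at) windows
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        host = m.group("host")
        if m.group("event") == "AMQP server":      # connection error
            open_since.setdefault(host, ts)
        elif host in open_since:                   # reconnected
            windows.setdefault(host, []).append((open_since.pop(host), ts))
    return windows
```

On the sample above this reports rabbitmq01 down from 20:34 to 21:04 and rabbitmq02 from 20:35 to 21:02, matching the kind of windows described in the chat.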
[09:31:57] I'll just have a quick look, will leave anything I find in the task
[09:32:04] (*interesting)
[09:32:05] ack thank you
[11:33:58] * dcaro lunch
[16:52:10] I think I've found what's hogging the objects on the harbor side
[16:52:16] https://www.irccloud.com/pastebin/jqQGB1Gj/
[16:52:30] now I have to find out why those are not cleaned up xd
[16:52:56] lol
[16:53:00] self DoS
[16:53:47] it's not us xd
[16:53:52] but kinda yep
[16:55:03] wait, it might be us, looking
[16:55:35] (there was another tool used by a user that ran the functional tests too)
[16:56:11] yep, it's this one
[17:00:33] I think it just keeps reusing and pushing the same image over and over (so creating layers), that's the functional tests not cleaning up after themselves I think
[19:25:40] * dcaro off
[19:25:50] p
[19:32:17] cya!
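[editor's note: the harbor situation above (the same image pushed over and over, leaving untagged layers behind) is the classic case a registry retention policy handles. A minimal sketch of the selection logic follows; the artifact records, field names, and `keep_days` threshold are invented for illustration, and a real setup would use Harbor's own retention policies plus garbage collection rather than hand-rolled code.]

```python
from datetime import datetime, timedelta, timezone

def cleanup_candidates(artifacts, now, keep_days=7):
    """Return digests of untagged artifacts older than keep_days.

    Mirrors a simple retention rule: anything still tagged is kept,
    untagged artifacts are kept only within the grace window.
    """
    cutoff = now - timedelta(days=keep_days)
    return [
        a["digest"]
        for a in artifacts
        if not a.get("tags") and a["push_time"] < cutoff
    ]

# Hypothetical artifact records, shaped loosely like registry API output.
now = datetime(2025, 11, 21, tzinfo=timezone.utc)
artifacts = [
    {"digest": "sha256:aaa", "tags": ["latest"],
     "push_time": now - timedelta(days=30)},   # tagged: keep
    {"digest": "sha256:bbb", "tags": [],
     "push_time": now - timedelta(days=30)},   # untagged, old: delete
    {"digest": "sha256:ccc", "tags": [],
     "push_time": now - timedelta(days=1)},    # untagged, recent: keep
]
```

Running `cleanup_candidates(artifacts, now)` on this sample selects only `sha256:bbb`, the old untagged artifact, which is the kind of leftover the functional tests were producing.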