[07:27:03] volans: has something recently changed within debmonitor? I just saw this: https://phabricator.wikimedia.org/P12972
[07:28:04] marostegui: it might be a re-occurrence of T199911
[07:28:05] T199911: Systemd session creation fails under I/O load - https://phabricator.wikimedia.org/T199911
[07:28:16] Ah, I wasn't aware of that task
[07:29:01] Thanks volans
[07:29:14] marostegui: we could apply the toil::systemd_scope_cleanup class to those hosts too
[07:29:25] so that they will be cleaned up automatically
[07:29:50] volans: Shall I create a task for that? Those are "owned" by Jaime, so I would wait for him to make that call
[07:30:17] that would be great
[07:30:33] I didn't even code the bandaid, that was fil.ippo :)
[07:30:36] I will do that, thank you
[07:30:37] but feel free to add me too
[07:30:43] Sure thing
[07:30:57] that class is applied to all swift hosts fwiw
[07:33:44] marostegui: the 5:51 one totally matches full disk utilization according to grafana
[07:34:06] so yeah I think it's most likely that
[07:34:14] Yeah, I was just checking that and adding it to the task
[07:34:29] I was checking if it was the small disk (which I assume)
[07:37:15] the failures after that in the log are "expected"
[07:37:53] volans: I have created the task, so jaime can double check how to follow it up
[07:39:07] k, thx a lot
[07:39:22] no, thank you! :*
[07:39:26] grazie
[13:08:59] we just had a massive spike in errors
[13:10:42] <_joe_> recovered or not?
[13:11:26] <_joe_> can anyone else check logstash?
[13:11:33] I am checking it
[13:11:37] Looks like it is recovering now
[13:11:52] <_joe_> it has already recovered
[13:12:00] <_joe_> it was a flurry of jobs not being enqueued
[13:12:21] yeah, looks so
[13:12:33] mostly on commons and dewiki