[07:53:58] elukey: I will check if normal jobs are being very delayed
[07:55:19] and what we have done in the past is split the jobrunner cluster further, i.e. depool a few servers from the videoscaler group, to allow normal job processing
[07:58:14] is there also a way to reduce the amount of transcode jobs running per host?
[07:58:21] I mean limit the concurrency
[08:01:26] not that I recall, what I think we should do is
[08:01:32] depool a few jobrunners
[08:01:52] and kill ffmpeg on the busiest servers
[08:02:03] and find what/who is uploading
[08:04:22] effie: do we need more people in from your team (i.e. page) or is the situation under control?
[08:04:50] let me do these first aid actions first
[08:04:53] and then we will see
[08:06:46] ack lemme know if you need me
[08:27:14] elukey: thanks for the ping, I think things are under control for now
[08:30:21] effie: nice! Can you write a one sentence summary in here about what was done so people know?
[08:30:35] I am opening a task
[08:30:50] perfect
[08:31:44] but basically, even though we have some rate limits on the edge, they probably weren't enough
[08:32:27] basically I killed ffmpeg on the 2-3 busiest servers, and depooled some more from the videoscaler cluster, to make room for notmal jobs
[08:32:35] normal*
[08:32:57] ack thanks
[08:33:11] and I pushed more traffic to the sole-jobrunner ones
[08:33:25] now those videoscaling jobs will be retried
[08:34:03] so it is a bandaid
[08:55:15] going afk, ping me on the phone if needed!