[13:04:32] Hi, I have this fun problem: the container for my hourly bash job goes zombie from time to time and breaks people's workflow. https://phabricator.wikimedia.org/T385203 I can't put a health check on a bot. Is there something else I can do?
[13:04:59] even a daily restart would be fine
[13:10:04] Amir1: this would be it https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/129 xd, not released yet, but it's on my "immediate todo list"
[13:10:14] do you know why it goes zombie?
[13:10:26] (if it's not NFS/IO issues, then there might be a way to work around it)
[13:10:55] thanks! I have no idea why the container goes zombie
[13:11:05] but timeout is a fine solution to me
[13:36:53] !log tools.quickcategories toolforge envvars create TOOL_EXPECTED_DATABASE_ERROR 'The tool is temporarily non-functional due to database maintenance.' && webservice restart
[13:36:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[13:40:00] Amir1: I'll try to get it done next week, if you can wait that'd be great :)
[13:43:44] yeah, it happens from time to time. It's not urgent
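
Note on the timeout idea discussed above: until the jobs-api change is released, one possible stopgap is to put the time limit inside the job script itself with GNU coreutils `timeout`. This is only a sketch and not the jobs-api feature from the merge request. The `run_with_limit` helper name is hypothetical, and `sleep 5` stands in for the real bot command. It will likely not help if the process is stuck in uninterruptible I/O (the NFS case raised at 13:10:26), since such a process cannot be killed until the I/O returns.

```shell
#!/bin/bash
# Hypothetical helper, not part of Toolforge: run a command under a time limit.
run_with_limit() {
    local limit="$1"
    shift
    local status=0
    # GNU coreutils `timeout` sends SIGTERM once the limit is reached,
    # then SIGKILL 10 seconds later if the command is still running.
    timeout --kill-after=10s "$limit" "$@" || status=$?
    # `timeout` exits with 124 when the command timed out after SIGTERM.
    if [ "$status" -eq 124 ]; then
        echo "job exceeded ${limit} and was terminated" >&2
    fi
    return "$status"
}

# Demo: a 5-second sleep stands in for a hung bot run (assumption).
demo_status=0
run_with_limit 1s sleep 5 || demo_status=$?
echo "exit status: ${demo_status}"   # prints "exit status: 124"
```

In an hourly job the limit would be something shorter than the schedule interval (for example `55m`), so a hung run is cleared before the next one starts.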