[17:03:01] Hello 0lly! Do we have any scripting or alerting that could output a process list in the alert email body? We're having a lot of problems with resource hogs (ref T372416) and I was wondering if we have anything that could speed up firefighting until we have cgroups in place
[17:03:01] T372416: Implement cgroups for users' JupyterHub environments in order to mitigate resource contention on the stat servers - https://phabricator.wikimedia.org/T372416
[17:38:57] inflatador: None comes to mind at the moment, but let me dig through our docs and think about a way to do it (if possible).
[17:40:21] denisse: ACK, thanks. I don't think node exporter has this feature, unfortunately
[17:53:25] FIRING: [9x] SystemdUnitFailed: alertmanager-irc-relay.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:53:41] ^ That's me, taking a look.
[17:58:25] RESOLVED: [4x] SystemdUnitFailed: prometheus-icinga-exporter.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
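
Regarding the 17:03 question about getting a process list into an alert email body: the log does not settle on an approach, so the following is only a rough sketch of one possibility, not an existing tool. It assumes a Linux host with procps `ps`, and the function names (`top_processes`, `format_body`, `collect`) are hypothetical. It parses `ps` output, keeps the heaviest CPU consumers, and renders a plain-text table that a separate mail or alerting hook could paste into a message.

```python
import subprocess


def top_processes(ps_output: str, limit: int = 5) -> list[dict]:
    """Parse `ps -eo pid,user,pcpu,pmem,comm` output, heaviest CPU users first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue  # ignore lines that do not match the expected columns
        pid, user, cpu, mem, comm = parts
        rows.append({"pid": int(pid), "user": user,
                     "cpu": float(cpu), "mem": float(mem), "comm": comm})
    rows.sort(key=lambda r: r["cpu"], reverse=True)
    return rows[:limit]


def format_body(rows: list[dict]) -> str:
    """Render the rows as a fixed-width table suitable for a plain-text email."""
    lines = [f"{'PID':>7} {'USER':<12} {'%CPU':>6} {'%MEM':>6} COMMAND"]
    for r in rows:
        lines.append(f"{r['pid']:>7} {r['user']:<12} "
                     f"{r['cpu']:>6.1f} {r['mem']:>6.1f} {r['comm']}")
    return "\n".join(lines)


def collect(limit: int = 5) -> str:
    """Run ps on the local host (assumes procps) and return the formatted table."""
    out = subprocess.run(
        ["ps", "-eo", "pid,user,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return format_body(top_processes(out, limit))
```

Calling `collect()` on the affected host returns the table as a string. How that string would reach the alert email (a wrapper script, a cron job, or something else) is left open here, since nothing in the log confirms a mechanism.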