[08:09:32] a lot of SLA end miss in alerts :( [08:13:10] so it seems to me that the cluster was busy and that webrequest coords are running late [18:02:08] PROBLEM - Hadoop NodeManager on analytics1054 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [18:05:52] RECOVERY - Hadoop NodeManager on analytics1054 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [18:14:04] !log re-run webrequest-load-coord-text 26/04/2020T16 via Hue [18:14:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:14:18] !log restart nodemanager on analytics1054 - failed due to heap pressure [18:14:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log