[05:06:59] <wikibugs>	 Analytics, Pageviews-API: Add wikimania.wikimedia.org to pageview definition - https://phabricator.wikimedia.org/T216525 (Billinghurst) Did this get discussed at any point?
[12:04:20] <icinga-wm>	 PROBLEM - Webrequests Varnishkafka log producer on cp3053 is CRITICAL: connect to address 10.20.0.53 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[12:18:20] <icinga-wm>	 RECOVERY - Webrequests Varnishkafka log producer on cp3053 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[14:04:40] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1087 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[14:09:08] <joal>	 excuse me, I think I hammered too hard with spark :( --^
[17:56:46] <elukey>	 usercache/joal/appcache/application_1583418280867_12912 , Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[17:56:49] <elukey>	 :P
[17:57:38] <elukey>	 very weird though that the systemd unit still sees yarn as up
[17:57:44] <elukey>	 meanwhile no process is around
[17:58:12] <elukey>	 !log restart hadoop-yarn-nodemanager on an-worker1087
[17:58:13] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:58:38] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1087 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[18:01:22] <elukey>	 ah ok I just checked the systemd unit
[18:01:31] <elukey>	 Restart=no
[18:01:50] <elukey>	 ok that explains it
[18:02:08] <elukey>	 maybe the thought is to avoid auto-restarting and force people to check
[18:02:17] <elukey>	 but it doesn't make a lot of sense to me
[18:02:44] <elukey>	 same thing in bigtop
[18:02:49] <elukey>	 anyway, afk again :)
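Editor's note on the Restart=no exchange above: a minimal sketch of how one might read the restart policy out of a systemd unit file. The unit text, the ExecStart path, and the helper names restart_policy and never_restarts are hypothetical; only the Restart=no setting comes from the discussion. It assumes the unit parses as INI, which holds for simple units but not for every systemd syntax feature.

```python
import configparser

# Hypothetical unit text for illustration; only "Restart=no" is taken
# from the conversation, everything else is an assumption.
SAMPLE_UNIT = """\
[Unit]
Description=Hadoop YARN NodeManager (illustrative)

[Service]
ExecStart=/usr/bin/example-nodemanager
Restart=no
"""


def restart_policy(unit_text: str) -> str:
    """Return the Restart= value from [Service].

    systemd treats a missing Restart= as "no", so that is the fallback.
    """
    # strict=False tolerates repeated keys, interpolation=None keeps
    # "%" specifiers in values from being interpreted.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd key names are case-sensitive
    parser.read_string(unit_text)
    return parser.get("Service", "Restart", fallback="no")


def never_restarts(unit_text: str) -> bool:
    """True when systemd will not bring the service back on its own."""
    return restart_policy(unit_text) == "no"


if __name__ == "__main__":
    print(restart_policy(SAMPLE_UNIT))
    print(never_restarts(SAMPLE_UNIT))
```

With Restart=no, a NodeManager that dies from an OutOfMemoryError stays down until someone restarts it by hand, which matches the manual restart logged at 17:58:12.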
[20:25:12] <wikibugs>	 Quarry, DBA, Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (Mike_Peel) I'm now getting the normal 'killed' message for going over 30 minutes, rather than the MySQL error. So perhaps things are now back to normal?