[04:21:49] RECOVERY - Check the last execution of monitor_refine_mediawiki_events on an-coord1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:18:35] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1004 is CRITICAL: connect to address 10.64.5.104 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[13:58:31] the issue was due to a big python process
[13:58:36] tizianop: hello :)
[13:58:46] hi!
[13:58:58] I just sent an email, I had to kill your python script, stat1004 was a bit under stress
[13:59:09] (ssh was barely working..)
[13:59:37] I saw it, sorry for that :/
[13:59:57] no problem! it happens! :)
[14:00:47] going afk again, please feel free to restart your work, just a bit more gentle on stat1004 :)
[14:00:56] I was streaming from hadoop to a python script with a pipe command and it filled the ram
[14:01:02] it's 21GB
[14:01:18] any chance I can copy it to the local disk?
[14:01:35] yeah sure, 21G is fine
[14:01:39] please copy it in your home dir
[14:01:46] ok! thanks!
[14:01:47] it is in the /srv partition that is huge
[14:01:56] /dev/mapper/stat1004--vg-data 7.2T 1.7T 5.1T 26% /srv
[14:02:03] :)
[14:02:15] great!
[14:02:22] (your home is symlinked to /srv/home/etc..)
[14:02:29] o/
[14:20:59] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1004 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
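
Note on the piped Python job discussed above: the log does not show the script itself, so the following is only a minimal sketch of one way to read piped input without holding all 21GB in RAM. It assumes line-oriented, tab-separated input and a per-key tally. The function name `count_first_field` and the choice of aggregation are illustrative, not what the original script did.

```python
import sys
from collections import Counter
from typing import Iterable


def count_first_field(lines: Iterable[str], sep: str = "\t") -> Counter:
    """Tally the first field of each line.

    Only the tallies are kept, so memory grows with the number of
    distinct keys rather than with the size of the input.
    """
    counts: Counter = Counter()
    for line in lines:  # a file object yields one line at a time
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        counts[line.split(sep, 1)[0]] += 1
    return counts


def main() -> None:
    # Avoid sys.stdin.read() or readlines() here: both load everything at once.
    for key, n in count_first_field(sys.stdin).most_common(10):
        print(f"{key}\t{n}")


if __name__ == "__main__":
    # Only consume stdin when something is actually piped in.
    if not sys.stdin.isatty():
        main()
```

Saved under a hypothetical name such as `count_keys.py`, it could be fed with something like `hdfs dfs -cat /path/to/data | python3 count_keys.py`, where the path is a placeholder. Copying the file to /srv first, as agreed in the log, and reading it from there works the same way.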