[07:15:41] razzi: https://phabricator.wikimedia.org/T273026 (for the ifup issue)
[07:18:49] also fixed an-test-presto, it is sadly logging steadily that it cannot connect to the coordinator
[07:23:07] (CR) Elukey: [C: +1] "Great Dan!" [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/655749 (https://phabricator.wikimedia.org/T233336) (owner: Milimetric)
[09:05:09] Analytics, SRE: archiva artifact links point to 127.0.0.1 - https://phabricator.wikimedia.org/T164993 (hashar) It works indeed, thank you @elukey. I am so happy when some old tasks get easily fixed :]
[15:23:59] PROBLEM - Presto Server on an-test-presto1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[16:10:07] RECOVERY - Presto Server on an-test-presto1001 is OK: PROCS OK: 1 process with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:48:59] joal: o/
[17:49:19] I am going to disable datanode partition alarms for hadoop backup, some workers already have partitions filling up
[17:49:57] we still have ~100TB of space left on HDFS, just writing this warning since I noticed the backup script is still in progress
[18:02:54] also I found the presto issue!
[18:03:01] https://github.com/prestodb/presto/pull/15655
[18:03:11] just tested on the test cluster, it works
[20:40:24] elukey: Thanks for disabling alerts! And also thanks for finding the presto bug - man you're a machine :)