[00:42:47] PROBLEM - Hadoop NodeManager on analytics1039 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[00:59:05] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CRITICAL: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: LOST
[01:15:11] (PS1) GoranSMilovanovic: init [analytics/wmde/WDCM-WikipediaSemantics-Dashboard] - https://gerrit.wikimedia.org/r/474519
[01:16:14] (CR) GoranSMilovanovic: [V: 2 C: 2] init [analytics/wmde/WDCM-WikipediaSemantics-Dashboard] - https://gerrit.wikimedia.org/r/474519 (owner: GoranSMilovanovic)
[04:40:50] Does anyone have advice on attempting to map countries to languages?
[04:41:04] I started out by using UN demographics data
[04:42:04] http://data.un.org/Data.aspx?d=POP&f=tableCode:27
[04:42:45] but there are some problems: e.g. there is only one "Norwegian" and one "Punjabi", and there is a language called "Arabic and French" spoken in Morocco and Algeria
[04:43:18] I'd be satisfied with just being able to confidently say "this is the top primary language spoken in this country"
[04:52:58] actually that data source is obviously missing some countries
[05:07:09] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[05:37:17] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is OK: OK
[06:38:41] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:08:49] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is OK: OK
[08:33:44] ah snap, analytics1039 again
[08:34:10] another disk failure
[08:35:43] mmmm no wait, something is off, not sure why yarn died in there
[08:37:13] RECOVERY - Hadoop NodeManager on analytics1039 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[08:37:37] !log restart yarn on analytics1039 - not clear why the process failed (nothing in the logs, no other disks failed)
[08:37:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:37:50] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[08:44:44] !log re-run webrequest-load-wf-text-2018-11-17-23 via Hue
[08:44:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:45:36] Quarry, Google-Code-in-2018: Create api health point for monitoring - https://phabricator.wikimedia.org/T205151 (rafidaslam) a: rafidaslam I'll work on this
[08:56:20] so the failed webrequest job is 0006284-181112144035577-oozie-oozi-W
[08:56:27] but I don't see why refine failed
[08:56:39] in the Hue summary there is a generic Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [2]
[08:56:56] but then I don't find anything in the corresponding Yarn app logs
[08:56:59] (it succeeded)
[08:58:23] but an1039 failed 10 minutes before this job failed, that can't be a coincidence :D
[08:59:24] ah ok now I see why an1039 failed, it was logged in the ops channel
[08:59:29] the root partition had filled up
[09:46:01] all good now, going afk again :)
[12:19:50] Quarry: "/run//meta", "/rev//meta", "/query//meta" endpoints throw internal server error (code 500) when the id doesn't exist - https://phabricator.wikimedia.org/T209783 (rafidaslam)
[12:20:04] Quarry: "/run//meta", "/rev//meta", "/query//meta" endpoints throw internal server error (code 500) when the id doesn't exist - https://phabricator.wikimedia.org/T209783 (rafidaslam) a: rafidaslam I'll work on this
[12:36:29] (PS1) Rafidaslam: app.py: Handle meta endpoints when the specified id doesn't exist [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474530 (https://phabricator.wikimedia.org/T209783)
[14:28:23] (PS1) Rafidaslam: Add "/health/summary/v1/" API endpoint [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474532 (https://phabricator.wikimedia.org/T205151)
[18:29:08] (CR) Zhuyifei1999: "I recommend doing:" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474530 (https://phabricator.wikimedia.org/T209783) (owner: Rafidaslam)
[18:34:41] (CR) Zhuyifei1999: [C: -1] Add "/health/summary/v1/" API endpoint (3 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474532 (https://phabricator.wikimedia.org/T205151) (owner: Rafidaslam)
[20:57:13] (CR) Framawiki: [C: -1] "Agree with zhuyifei." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474530 (https://phabricator.wikimedia.org/T209783) (owner: Rafidaslam)
[20:57:55] Quarry, Patch-For-Review: Handle meta endpoints when the specified id doesn't exist - https://phabricator.wikimedia.org/T209783 (Framawiki) p: Triage>Low
[21:04:39] Quarry, Google-Code-in-2018, Patch-For-Review: Create api health point for monitoring - https://phabricator.wikimedia.org/T205151 (Framawiki)
[21:23:03] (CR) Framawiki: [C: -1] "Good start!" (4 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474532 (https://phabricator.wikimedia.org/T205151) (owner: Rafidaslam)
[21:24:28] (CR) Framawiki: [C: -1] Add "/health/summary/v1/" API endpoint (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/474532 (https://phabricator.wikimedia.org/T205151) (owner: Rafidaslam)
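[Editor's note] The patch under review at https://gerrit.wikimedia.org/r/474530 (T209783) deals with the "/run/<id>/meta", "/rev/<id>/meta" and "/query/<id>/meta" routes returning a 500 when the id doesn't exist. The change itself isn't visible in this log, so the snippet below is only a minimal sketch of the general fix, assuming Quarry's Flask app; the route shape and the query_meta() lookup helper are hypothetical, not the actual Quarry code.

```python
# Minimal sketch only -- not the actual Quarry patch.
# Assumes a Flask app and a hypothetical query_meta() lookup that
# returns None when the id is unknown.
from flask import Flask, abort, jsonify

app = Flask(__name__)


def query_meta(query_id):
    """Hypothetical lookup; the real code reads from Quarry's database."""
    return None  # pretend the id doesn't exist


@app.route('/query/<int:query_id>/meta')
def api_query_meta(query_id):
    meta = query_meta(query_id)
    if meta is None:
        # Without this guard, using the missing object below raises an
        # exception and the client sees a 500; an explicit 404 is the
        # expected answer for a nonexistent id.
        abort(404)
    return jsonify(meta)
```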
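[Editor's note] T205151 asks for an API health endpoint for monitoring, and the "/health/summary/v1/" patch (https://gerrit.wikimedia.org/r/474532) is still collecting review comments in this log, so its final shape isn't shown here. The sketch below only illustrates what such an endpoint typically looks like, again assuming Flask; the database_reachable() check and the response fields are hypothetical.

```python
# Minimal sketch of a monitoring health endpoint -- not the reviewed patch.
from flask import Flask, jsonify

app = Flask(__name__)


def database_reachable():
    """Hypothetical check; a real endpoint would ping Quarry's database."""
    return True


@app.route('/health/summary/v1/')
def health_summary():
    ok = database_reachable()
    status = 200 if ok else 503
    # Monitoring (e.g. Icinga) can alert on the HTTP status code alone.
    return jsonify({'database': 'ok' if ok else 'error'}), status
```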