[00:07:15] hey Ironholds, can you come on a hangout?
[00:07:25] sure
[00:07:40] just send me a link
[00:08:09] folks, see you on wednesday!
[00:10:08] bye mforns
[00:10:20] bye
[00:10:22] https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14
[00:41:35] hey Ironholds
[00:41:41] can I summon you again?
[00:42:22] god, I need to run, bbl
[00:42:26] DarTar, sure. can you get an actual room?
[00:42:27] okay
[04:30:09] (PS1) Nuria: Adding stubs for pageview MVP [analytics/dashiki] - https://gerrit.wikimedia.org/r/172480
[05:03:18] YuviPanda, it's nice to know -en has not got /less/ stupid
[05:03:27] Ironholds: yeah
[05:03:45] that was just...
[05:03:45] sigh
[15:15:12] ottomata, you alive?
[15:16:06] yup hiya
[15:17:54] ottomata, yay! So I'm going to make one final stab at getting RJDBC and Hive to talk to each other.
[15:18:47] Suppose you were on stat1002 and using JDBC to connect to hive. What hostname/port would you point it at?
[15:23:57] hm
[15:24:08] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLs
[15:24:13] if the default port is 10000
[15:24:14] then
[15:24:31] jdbc:hive2://analytics1027.eqiad.wmnet:10000/
[15:25:13] i see something listening on analytics1027 on 10000
[15:25:14] ta!
[15:25:14] probably that
[15:25:39] ja that's it.
[15:25:48] ...holy crap
[15:27:46] it worked! Sort of?
[15:28:46] it's claiming the only table is "temp_view" which contains site, os, unique_users.
[15:28:48] ottomata, thoughts?
[15:29:08] hm, what database did you choose?
[15:29:35] yeah that's the default database
[15:29:36] aha, it's using the default database
[15:29:41] do
[15:29:44] hmn. I tried to pass in dbname.
[15:29:50] jdbc:hive2://analytics1027.eqiad.wmnet:10000/wmf_raw
[15:29:50] ?
[15:29:59] sensible!
[15:30:19] IT WORKS
[15:30:22] okay, final test..
[15:31:07] Error while compiling statement: FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.apache.hcatalog.data.JsonSerDe not found))
[15:31:18] OO
[15:31:19] hm
[15:31:20] yes. hm
[15:31:23] hmn. I assume the JsonSerDe isn't being automatically included because of the way the connection is being made?
[15:31:27] yes
[15:31:30] oh
[15:31:34] can you issue hive queries?
[15:31:34] Let's see what happens if I append the "manually include that JAR" line to the start of the query.
[15:31:37] yup
[15:31:38] snap
[15:31:38] yeah
[15:31:39] exactly
[15:32:09] dammit, I can't find the code for that. It used to live on wikitech.
[15:33:34] ottomata?
[15:34:11] finding..
[15:34:55] ADD JAR file:///usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.2.jar,;
[15:34:56] i think
[15:34:59] maybe with quotes
[15:35:00] oh, no comma
[15:35:03] ADD JAR file:///usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.2.jar;
[15:35:13] ja no quotes
[15:35:40] ta
[15:36:22] Error while processing statement: file:///usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.2.jar; does not exist
[15:39:11] hm
[15:39:18] i just did it in hive query
[15:39:24] try it without the file://
[15:41:43] kk
[15:42:56] hmn; nothing :(
[15:43:06] let's see if I can add the auxpath argument when creating the connection object.
[15:45:55] same thing without the file?
[15:46:04] yup
[15:46:05] Ironholds: give me the whole set of commands you are working with up to this point, and I will try too
[15:46:11] sure!
[15:46:15] you'll need to install some libs ;p
[15:46:22] oh, ha
[15:46:27] or I could just sudo to you :p
[15:48:41] ottomata, https://gist.github.com/Ironholds/97daabd45b0b66e56f05
[15:48:43] hah
[15:49:21] oh
[15:49:28] Ironholds: try doing it in two queries
[15:49:55] dbSendQuery(conn, "ADD JAR usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.2.jar;")
[15:49:59] then your real query
[15:51:08] I did; it objects that there's nothing to return and also claims the file doth not exist
[15:54:12] hm
[15:54:19] nothing to return is fine
[15:54:29] on its own without file:// it complains it doesn't exist?
[15:55:56] OH
[15:55:58] Ironholds
[15:56:01] you are missing the leading /
[15:56:05] aha!
[15:56:09] just remove two of the slashes from file:///
[15:56:10] not 3 :)
[15:56:22] nope, same complaint
[15:58:05] OH
[15:58:06] hm
[15:58:09] try it with your classpath thingee
[15:58:14] instead of doing add jar or auxpath
[15:58:17] it might do the same thing...
[16:01:20] hm nope
[16:05:14] AH
[16:05:16] Ironholds:
[16:05:18] no semicolon?
[16:05:32] yeah!
[16:05:38] maybe.
[16:05:48] this looks like progress
[16:05:48] dbSendQuery(conn, "ADD JAR file:///usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.2.jar")
[16:05:58] without the ; at the end of the filename
[16:06:42] hmn
[16:06:47] ohhh
[16:06:48] that'd make sense
[16:06:54] haven't got the real query to work yet though
[16:07:29] oop
[16:07:30] I may have
[16:07:35] let's see if anything happens
[16:07:43] (Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask)
[16:07:44] nope
[16:07:45] yeah
[16:07:49] new error, though! :D
[16:07:51] how do I print the result of a query?
[16:07:57] to_fetch
[16:07:58] ?
[16:20:44] ottomata, yep
[16:20:46] ah, hangon
[16:20:49] no, to_fetch is the object
[16:20:52] <- == =
[16:21:03] you want fetch(to_fetch, -1)
[16:21:35] ottomata, does the query-monitor register these?
[16:23:09] ?
[16:23:18] ottomata, the weird thing behind a firewall
[16:23:27] it will if the job actually gets submitted
[16:23:31] don't think i've seen that happen yet
[16:23:31] aha
[16:24:01] hmn
[16:24:05] so far it's just interacting with hive server
[16:24:05] maybe some kind of permissions error?
[16:24:09] and metadata
[16:24:09] lack of specifying the username
[16:24:14] hmm, don't think so yet
[16:24:18] worth a try though
[16:24:44] "ironholds" and NULL I assume
[16:26:15] * Ironholds tests
[16:26:29] well, it's doing SOMEthing
[16:26:48] OOH
[16:26:53] oh! for me too (i'm selecting from a different table)
[16:26:57] IT WORKS
[16:26:58] i think!
[16:26:59] * Ironholds dances
[16:27:01] yep!
[16:27:02] yeah!
[16:27:04] just retrieved a data.frame of data
[16:27:07] it worked from webrequest?
[16:27:12] YEP
[16:27:14] We have R/Hive integration!
[16:27:16] awesooOOOOMe
[16:27:19] :D
[16:27:21] * Ironholds dances
[16:27:30] okay, I'm gonna buy some smokes, integrate this with WMUtils and send an email
[16:27:32] * Ironholds nods firmly
[16:27:39] Ironholds: btw, do you get how to create your own external tables on top of your own data?
[16:27:42] e.g., sampled-1000?
[16:27:51] oh, yeah, I've done it before
[16:27:54] ok cool.
[16:28:17] (I actually don't think that's the answer here. But I think we should experiment with distributing tasks /over/ the cluster, totally)
[16:30:34] aye cool, just making sure you knew how
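
Editor's note: the log above seems to settle on three things. The database name goes in the path of the JDBC URL, ADD JAR is sent as its own statement before the real query, and statements sent over JDBC carry no trailing semicolon. The sketch below only illustrates those string-handling rules and does not connect to anything. It is written in Python rather than the R used in the chat, the helper names hive_jdbc_url and jdbc_statements are hypothetical, and "SELECT 1" is a stand-in for the real query, which the log never shows.

```python
from typing import List


def hive_jdbc_url(host: str, port: int = 10000, database: str = "") -> str:
    """Build a HiveServer2 JDBC URL.

    An empty database leaves the path blank, which per the log lands in
    Hive's default database.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


def jdbc_statements(script: str) -> List[str]:
    """Split a Hive script into separate statements without trailing ';'.

    Naive split on ';' (assumes no semicolons inside string literals).
    """
    return [part.strip() for part in script.split(";") if part.strip()]


# Host, port and database are the ones mentioned in the chat.
url = hive_jdbc_url("analytics1027.eqiad.wmnet", database="wmf_raw")
print(url)

script = """
ADD JAR file:///usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.2.jar;
SELECT 1;
"""

# Each statement would be sent in its own call, ADD JAR first.
for statement in jdbc_statements(script):
    print(statement)
```

In the R session from the log, each string produced this way would presumably go through its own dbSendQuery(conn, ...) call, with fetch(to_fetch, -1) pulling the rows of the final query. The log also hints that passing a username when connecting was what let the job finally run, though that is not confirmed there.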