[13:17:26] good morning analytics! happy vote day :)
[13:18:56] hey milimetric
[13:30:18] hey average_drifter! just made myself some eggs
[13:34:49] bacon and toast ?
[13:58:19] yo ottomata, average_drifter
[13:58:36] morning
[13:58:45] man the storage place doesn't open til 9.30
[13:58:48] that threw a wrench in my early morning storage plans
[13:58:51] moving plans*
[13:59:01] gonna be a late work night for me!
[15:31:53] * drdee is wishing that ottomata was around
[17:47:14] wow, dead quiet today :)
[17:48:24] HIIIIII
[17:53:54] yoooooooo
[17:54:07] i am making headway with oozie ottomata
[17:54:31] see https://app.asana.com/0/828917834272/2378327096958
[17:55:08] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:56:17] mornin
[17:56:33] hey!
[17:56:38] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:57:16] oooo subtasks!
[17:57:18] i am in.
[17:57:25] yeah! asana has gotten a lot more appealing lately
[17:57:36] drdee: do you handle subtasks when you make your report?
[17:57:54] not yet
[17:58:02] ps. i am in the hangout
[17:58:19] (i say just in case we see that race condition from yesterday(
[17:58:21] )
[17:58:30] it's happening again
[17:59:47] is there an easy way to provoke the generation of a new target and append it to the "all" target from within configure.ac ?
[18:03:45] uhhhhh???? no clue
[18:03:51] ottomata: this is the current error i am struggling with:
[18:03:52] E0803: IO error, org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back. See the nested exceptions for details on the errors that occurred. FailedObject: org.apache.oozie.WorkflowJobBean@21397218
[18:40:49] ottomata, so any thoughts about the openjpa exception?
[18:41:34] brb a moment
[18:41:39] aight
[18:41:43] (what's openjpa?)
[18:44:44] java persistence
[18:44:53] dependency of oozie
[18:51:44] ottomata: this is the actual error:
[18:51:46] Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Data truncation: Data too long for column 'proto_action_conf' at row 1 {prepstmnt 978613927 INSERT INTO WF_JOBS (id, app_name, app_path, conf, group_name, parent_id, run, user_name, bean_type, auth_token, created_time, end_time, external_id, last_modified_time, log_token, proto_action_conf, sla_xml, start_time, status, wf_instance) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
[18:51:47] ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=1406, state=22001]
[18:52:30] I looked in the oozie database at WF_JOBS; the column type for 'proto_action_conf' is TEXT, which has a max size of 64KB
[18:53:11] i could do an ALTER TABLE and change TEXT to MEDIUMTEXT, which has a max size of 64MB
[18:53:11] i mean 16MB
[18:53:17] try?
[18:55:18] yeah i looked at that too
[18:55:23] TADAAAA
[18:55:25] IT WORKS!
[18:55:33] http://analytics1001.eqiad.wmnet:8888/oozie/list_oozie_workflow/0000003-121106173454484-oozie-oozi-W/
[18:55:47] whoa that's it?
[18:55:56] i had looked at that yesterday and decided that it probably wasn't my problem
[18:55:56] amazing!
[18:56:06] i was doubtful that the job conf was over 64K
[18:56:18] but hey, cool!
[18:56:18] hm
[18:56:24] so that should be puppetized then too, hmmm
[18:56:53] i would call this a bug :D
[18:57:08] i didn't think it would work to be honest
[18:57:20] yeah totally
[18:57:30] you wanna submit that upstream somehow?
[18:57:45] yeah i'll open a ticket :)
[18:58:06] this is the fix: ALTER TABLE WF_JOBS CHANGE proto_action_conf proto_action_conf MEDIUMTEXT;
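
(For reference, the column inspection described at 18:52 can be reproduced roughly like this; a sketch, where the metastore database name and account are assumptions, not taken from the log:)

    # check the offending column in the oozie metastore (db/user names are guesses)
    mysql -u oozie -p oozie -e "SHOW COLUMNS FROM WF_JOBS LIKE 'proto_action_conf';"
    # TEXT caps the serialized job conf at 64KB; MEDIUMTEXT raises that to 16MB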
[18:58:21] ok cool
[18:58:38] will work on puppetizing all these changes now
[18:58:46] cool
[18:59:54] hmm, drdee, the oozie.service.JPAService.jdbc.driver stuff wasn't set to mysql?
[18:59:55] it should have been...
[19:00:35] no it wasn't
[19:00:47] that is strange, I am explicitly doing that in puppet, hmmmmm
[19:04:14] ohh. openjpa is the annotations lib. yeah, it's awesome.
[19:04:19] drdee, re "oozie.service.AuthorizationService.security.enabled"
[19:04:23] oops
[19:04:23] no
[19:04:29] "the proxy server stuff should end up in httpfs.xml, not core-site.xml"
[19:04:33] which proxy server stuff?
[19:04:45] i mean proxy user
[19:04:45] okay, i think i just had a really really good iead.
[19:04:47] *idea.
[19:04:50] the stuff I put in httpfs-site was the httpfs proxy user
[19:04:51] brb, writing things down.
[19:04:52] just like the hue proxy user
[19:05:03] core-site has the webhdfs proxy user
[19:05:31] and oozie-site has the oozie proxy users
[19:05:32] iirc, i tried core-site.xml first but that didn't work; putting it in httpfs did work
[19:05:33] right?
[19:05:46] which bit?
[19:05:49] there are several proxy user configs
[19:05:54] the oozie proxy user
[19:05:55] yes
[19:05:56] httpfs.proxyuser.hue.hosts
[19:06:02] hadoop.proxyuser.hue.hosts
[19:06:09] oozie.service.ProxyUserService.proxyuser
[19:06:18] oh, hm
[19:06:20] i think i did the first two
[19:06:33] but it could be that only the first is required
[19:06:48] the oozie proxy user is commented out in oozie-site
[19:06:53] the first two I did as well
[19:07:18] httpfs.proxyuser in httpfs-site.xml, and hadoop.proxyuser in core-site.xml
[19:07:19] the hadoop.proxyuser is the one for webhdfs
[19:07:36] ah there are also oozie/hue proxy users
[19:07:41] oozie.service.ProxyUserService.proxyuser.hue.hosts
[19:07:50] those are in oozie-site.xml
[19:14:54] louisdang: it seems that we just got oozie to work, you wanna give it a spin as well?
[19:15:01] drdee, sure
[19:15:04] drdee, what's this mean?
[19:15:09] The specified scratchDir is unusable: /usr/lib/oozie/oozie-server-0.20/work/Catalina/localhost/_
[19:15:25] if you look in the comments then you can find the fix
[19:15:30] two temp folders were missing
[19:15:44] don't see it
[19:16:09] Puppetize the following:
[19:16:09] sudo mkdir -p /usr/lib/oozie/oozie-server/work/Catalina/localhost/oozie
[19:16:09] sudo mkdir -p /usr/lib/oozie/oozie-server/work/Catalina/localhost/_
[19:16:09] chown oozie:oozie /usr/lib/oozie/oozie-server/work/Catalina/localhost/oozie
[19:16:09] chown oozie:oozie /usr/lib/oozie/oozie-server/work/Catalina/localhost/_
[19:16:27] it's the first comment in the task
[19:16:28] ok cool
[19:16:55] ohhh, i need to click '3 more comments'
[19:16:55] to see them
[19:16:57] ok cool
[19:17:26] that is kind of a weird spot for that stuff, hmm
[19:17:28] ok
[19:22:28] ok finally managed to get the udp-filters package with the binary inside it
[19:22:34] nice
[19:22:49] did you fix the manpage installation problem as well?
[19:22:53] yes
[19:23:05] how?
[19:23:20] I can't remember, I did a lot of stuff
[19:23:20] help2man was needed, I didn't have it locally
[19:23:41] there were some path problems
[19:23:51] package name problems in debian/control
[19:23:56] but.. I'm going to hit git review now
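
(A minimal Puppet sketch of the "Puppetize the following" comment quoted above; the paths and ownership come straight from the chat, while the resource layout is an assumption and the parent directories are assumed to already exist:)

    file { ['/usr/lib/oozie/oozie-server/work/Catalina/localhost/oozie',
            '/usr/lib/oozie/oozie-server/work/Catalina/localhost/_']:
      ensure => directory,
      owner  => 'oozie',
      group  => 'oozie',
    }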
[19:24:22] the debianize.sh was updated as well
[19:24:27] I pushed that to the repo on github
[19:24:46] used dpkg -x to test if all the needed stuff was in there
[19:25:59] oh wait, I forgot something :(
[19:26:05] forgot to replace the version in the manpage
[19:26:35] so there are a couple of places that need the version replaced
[19:26:42] * configure.ac
[19:26:43] * source files
[19:26:45] * manpage
[19:26:52] * debian/control
[19:28:37] oh no, not debian/control
[19:32:58] help2man
[19:33:06] uhm, where should I replace the manpage version ?
[19:37:59] average_1rifter: ideally we would have one canonical place for the version
[19:38:44] drdee: yes, the one generated in debianize.sh is basically used everywhere
[19:39:30] cool
[19:39:34] ottomata, there is still an impersonation error with oozie:
[19:39:35] Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: oozie is not allowed to impersonate hdfs
[19:39:49] hm
[19:39:53] so maybe that proxy user statement needs to be put in all three config files
[19:40:23] hm, maybe core site needs to know about the oozie user
[19:40:49] or it's because we only had it active on 1 node
[19:40:55] and not the other nodes.....
[19:41:34] the oozie job got killed after 66%
[19:44:28] hm
[19:44:53] let's just first puppetize it and run it on all the nodes and then run the job again
[19:45:12] yeah, working on it, gotta get my VM back to a puppet-testable state
[19:45:23] it was all funky after I messed with the hadoop conf pseudo stuff
[19:45:41] brb, getting some coffee
[19:45:52] i should do that toooooo hmmm
[20:02:40] back!
[20:02:48] that was a pretty good idea, i think.
[20:02:52] i will type it up tonight.
[20:04:07] haha, ok
[20:04:12] going to make us wait for it, eh?
[20:04:34] i think i came up with a pretty good idea for how to do an elastic block store in a non-homogenous, high-variance environment but still (critically) maintain the RF constraint
[20:05:02] (the secret is calculating current RF as a sum of probabilities that the block is available)
[20:05:17] another question.. uhm, how can I specify where udp-filter.1 should go uhm..
[20:05:18] (RF?)
[20:05:26] I want to tell it to go in
[20:05:32] debian/udp-filters/usr/share/man/man1/
[20:05:35] Replication Factor
[20:05:41] (ah)
[20:05:47] i think there is a debian/man file
[20:05:50] or maybe just debian/files
[20:06:01] i really love ronn, http://rtomayko.github.com/ronn/ronn-format.7.html average_1rifter
[20:06:10] because i am in love with markdown.
[20:06:18] and someday i will marry it.
[20:06:49] ronn or markdown?
[20:07:16] markdown.
[20:07:23] it's just so elegant!
[20:10:03] W: GPG error: http://apt.wikimedia.org precise-wikimedia Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 09DBD9F93F6CD44A
[20:10:03] how can I fix that ?
[20:10:13] I need to do that locally uhm
[20:10:36] yeah, you need to make it so that you have a gpg key in your keyring
[20:10:50] I have a gpg key right now
[20:10:53] and that gpg key's ID needs to match exactly the ID string you put in the changelog
[20:10:57] but does it need to be signed ?
[20:11:06] OH pffff
[20:11:12] sorry, I should read your message closer
[20:11:24] that is a different problem, that is apt not being happy with the apt gpg key
[20:11:43] i think you can import the wmf key with apt-key somehow
[20:11:49] not sure how though
[20:11:49] I didn't know apt had a gpg key until now. well I suspected sometimes but now it's obvious
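
(A sketch of the apt-key import being guessed at here: 09DBD9F93F6CD44A is the key ID from the NO_PUBKEY warning above, while the keyserver choice is an assumption, since apt.wikimedia.org itself refuses hkp connections, as the attempt below shows:)

    # fetch the archive key from a public keyserver and hand it to apt
    gpg --keyserver keyserver.ubuntu.com --recv-keys 09DBD9F93F6CD44A
    gpg --export --armor 09DBD9F93F6CD44A | sudo apt-key add -
    sudo apt-get update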
[20:11:49] shouldn't really matter, right?
[20:12:35] it can't install packages from that source because of the gpg key
[20:12:52] packages from apt.wikimedia.org don't show up on my machine in aptitude search
[20:13:05] can I tell it to ignore the gpg check ?
[20:14:18] ottomata: https://gist.github.com/0421e4d3a872716dd9b5
[20:14:21] ottomata: these are the keys I have right now
[20:14:51] i think you can
[20:14:53] hmmm
[20:15:03] in the sources file maybe?
[20:15:22] http://wiki.debian.org/SecureApt#How_to_find_and_add_a_key
[20:15:26] I need to do that for apt.wikimedia.org
[20:15:32] sure
[20:15:39] but I don't know what to run
[20:16:18] user@garage:~/wikistats/udp-filters$ gpg --keyserver apt.wikimedia.org --recv-keys 55BE302B
[20:16:21] gpg: requesting key 55BE302B from hkp server apt.wikimedia.org
[20:16:23] ?: apt.wikimedia.org: Connection refused
[20:16:26] gpgkeys: HTTP fetch error 7: couldn't connect: Connection refused
[20:16:31] gpg: no valid OpenPGP data found.
[20:16:33] gpg: Total number processed: 0
[20:16:39] I don't know which key to get
[20:18:59] yeah, hm, i dunno either
[20:19:04] ok
[20:32:49] ottomata: how can I make packages of libanon and libcidr for my x86 machine please ?
[20:33:09] ummm, if all goes well
[20:33:13] just run dpkg-buildpackage
[20:33:13] ?
[20:34:57] user@garage:~/wikistats/libcidr$ git remote add origin ssh://spetrea@gerrit.wikimedia.org:29418/analytics/libcidr.git
[20:35:03] user@garage:~/wikistats/libcidr$ git pull origin master
[20:35:03] fatal: '/analytics/libcidr.git': not a Gerrit project
[20:35:09] libcidr is not packaged ? I know libanon is
[20:35:18] and I thought libcidr was there also
[20:36:02] it is here:
[20:36:03] https://github.com/wmf-analytics/libcidr
[20:36:16] i know, not consistent, I've been waiting for gerrit replication stuff to fix that
[20:38:01] for libcidr dpkg-buildpackage worked, thanks
[20:38:20] but for libanon it didn't
[20:38:47] oh sorry
[20:39:15] ottomata: https://gist.github.com/c990370444a4776f02e9
[20:39:15] hm, wait what?
[20:39:15] hmmmm
[20:39:54] ahhhh, i think I was building the packages for that in branches
[20:40:08] can I have your debian/ for libanon ?
[20:40:08] i was trying to follow the git-buildpackage stuff,
[20:40:59] i thought for sure I documented this somewhere...
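
(The local build loop in play here, condensed; a sketch, the flags are standard dpkg-buildpackage options and the .deb filename is illustrative:)

    dpkg-buildpackage -us -uc         # build without signing source/changes
    dpkg-buildpackage -d              # -d skips the build-dependency check
    dpkg -x ../libanon_*.deb tmp/     # unpack the result to verify its contents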
[20:41:00] hang on
[20:41:28] ok
[20:42:04] you should do this
[20:42:28] I need it because I'm building udp-filter packages locally and dh_shlibdeps requires those dependency packages to be on the system even though I do dpkg-buildpackage -d (and -d means ignore dependencies)
[20:42:28] yeah it is there
[20:42:30] it is in a branch, called debian
[20:42:35] git checkout debian
[20:42:36] alright, switching to branch
[20:42:42] what I was doing with this
[20:42:47] i had modified the source
[20:42:57] to get this to build with newer versions of openssl
[20:43:36] * drdee is reading
[20:43:43] anyway, there is a debian branch, in which you work on debian packaging
[20:43:46] when you are ready to build a package
[20:44:06] ok, got packages for both libcidr and libanon for my architecture now
[20:44:20] you create a new branch from debian, named after the version you are going to build
[20:44:21] I was struggling with just the libs installed on my system, but packages were actually needed
[20:44:28] then modify the changelog in that branch
[20:44:44] i did this because I was building this for both lucid and precise
[20:47:34] dschoon: "elastic block store in a non-homogenous, high-variance environment", which environment are you referring to?
[20:47:45] none that exists :)
[20:48:40] ottomata, has puppet run with the new config changes?
[20:48:54] no, just about done fixing my vm, and I found a few bugs in my puppet stuff along the way
[20:49:02] k
[20:49:09] i don't want to commit the new puppet stuff unless I can test it
[20:49:47] k
[20:51:23] phew, ok cool, finally got that bit fixed, um, drdee
[20:51:30] re tomcat apr
[20:51:42] you compiled and installed manually?
[20:51:53] no, just apt-get
[20:52:23] apt-get libtcnative-1
[20:52:23] ahh mmk cool
[20:55:09] ok great got the build going locally again
[20:55:20] now I can continue on fixing that manpage
[20:56:19] nice
[21:00:37] drdee
[21:00:39] is this
[21:00:44] NoSuchFieldError: IS_SECURITY_ENABLED
[21:00:44] fixed by adding
[21:00:53] oozie.service.AuthorizationService.security.enabled
[21:00:53] true
[21:00:53] ?
[21:01:17] mmmmmmmmmm
[21:01:20] i don't think so
[21:01:35] because the IS_SECURITY_ENABLED is a Tomcat / Catalina error
[21:01:43] what is that about? anything?
[21:01:46] ok
[21:01:49] i just saw it in the logs
[21:01:51] it is a todo
[21:01:57] ooooh ok
[21:01:57] thought it was something specific
[21:01:58] not even sure if it's a real issue
[21:02:06] but i thought it was important enough to make a note
[21:05:14] ok cool
[21:05:17] running puppet now!
[21:06:51] aight
[21:07:26] cool, things still work :)
[21:07:29] at least, my sleep job does
[21:07:42] making coffee
[21:10:42] drdee: does this look right ? https://gist.github.com/e0aa0fbf5869d8a20eae
[21:11:28] average_1rifter: yes that looks pretty good to me
[21:11:44] alright then, I will do the git review now
[21:12:47] ok seems that the version has automagically made its way into the manpage
[21:13:27] drdee: please review https://gerrit.wikimedia.org/r/32137
[21:14:05] what is the output of lintian?
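
(drdee's branch workflow above, condensed into commands; a sketch, with 0.5-1 as a stand-in version string:)

    git checkout debian                 # packaging work lives in the 'debian' branch
    git checkout -b 0.5-1 debian        # new branch off debian, named after the version to build
    dch -v 0.5-1 "build for precise"    # update debian/changelog in that branch
    dpkg-buildpackage -us -uc           # build; repeat per distro (lucid, precise)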
[21:14:58] moment please
[21:16:15] drdee: https://gist.github.com/5dd40ff4cc6a9f61e699
[21:17:10] let's fix those errors as well
[21:17:12] ok
[21:19:54] done: https://gerrit.wikimedia.org/r/#/c/32137/
[21:23:05] drdee: https://gerrit.wikimedia.org/r/#/c/32137/1/configure.ac <--- debianize.sh takes care of those
[21:23:43] but I will write in the code "VERSION_DEBIANIZE_PLACEHOLDER"
[21:23:55] cool
[21:24:19] drdee, i restarted hadoop master stuff
[21:24:19] it's a bit confusing to have different version numbers all over the place
[21:24:19] back to this error
[21:24:20] This request requires HTTP authentication ().
[21:24:24] yes
[21:24:41] This request requires HTTP authentication (). --> where?
[21:24:58] i see that in the hue job designer when I submit a job
[21:25:08] is that because I set that flag to true?
[21:25:18] yeah i think so :D
[21:25:23] oozie.service.AuthorizationService.security.enabled
[21:25:25] hah
[21:25:32] why did I set it to true?
[21:25:33] will check if that is so
[21:25:41] the reason i wanted to turn it on is to prevent people from killing each other's oozie jobs
[21:25:56] aye
[21:25:56] oh
[21:26:06] right, we don't want people to kill our core jobs
[21:26:07] why is there an /etc/oozie/conf/hadoop-conf.xml file?
[21:26:07] hmm
[21:26:19] i don't know
[21:26:31] hm so
[21:26:31] we had this error yesterday
[21:26:39] and we fixed it by passing some flag on the cli to oozie
[21:26:45] do you remember that?
[21:26:49] which error?
[21:26:53] the nullpointer
[21:26:55] ?
[21:27:00] no, requires auth
[21:27:00] right?
[21:27:08] on the CLI?
[21:27:15] -auth SIMPLE
[21:27:22] right?
[21:27:32] that was to fix the null pointer exception
[21:27:36] ah hmm
[21:27:47] auth error is probably due to impersonation
[21:28:01] lemme double check that
[21:28:01] try su - oozie 'command line'
[21:28:22] i haven't been able to get oozie to run from the cli yet
[21:28:31] i always get that stupid auth error
[21:30:13] yup think so
[21:30:13] User: oozie is not allowed to impersonate otto
[21:46:17] ottomata, can i help you?
[21:47:39] i got it!
[21:47:40] i think i need you to verify
[21:47:46] you said there was a job that died at some point?
[21:50:42] yup
[21:51:40] in hue or on the cli?
[21:51:46] hue
[21:52:06] ok, running a sample job right now
[21:52:41] k watching
[21:52:57] brb food
[21:53:38] still running
[21:53:48] ohh you can watch it?
[21:53:48] cool
[21:53:50] can I add my name in debian/copyright ? at "This work was packaged for Debian by"
[21:54:03] well, watching logs
[21:54:13] are you running it now?
[21:54:22] yes
[21:54:34] average_1rifter: let me think
[21:54:36] ……….
[21:54:39] ……….
[21:54:46] of course!!!!!
[21:54:49] :)
[21:54:52] you wrote it, you get the credits
[21:55:01] thanks :)
[21:55:38] ottomata: http://analytics1001.eqiad.wmnet:8888/oozie/list_oozie_workflow/0000001-121106214409934-oozie-oozi-W/
[21:56:00] das good!
[21:56:59] where it says "It was downloaded from" inside debian/copyright
[21:56:59] we need a URL
[21:57:14] but I don't know what URL we should have there (this would fix a lintian error)
[21:58:01] https://gerrit.wikimedia.org/r/p/analytics/upd-filters.git
[21:58:23] wait
[21:58:28] yes ?
[21:58:35] use copyright-sample from the root folder
[21:58:44] that one is already tuned, IIRC
[21:58:57] same applies to the other sample file
[21:59:04] drdee, why 50% foreverrrrr?
[21:59:12] i don't knoowwwwww......
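
(The log never spells out what "i got it!" refers to, but the "oozie is not allowed to impersonate otto" error at 21:30 is the one governed by hadoop's proxyuser settings for the oozie user; the usual core-site.xml entry looks roughly like this, a sketch, and the wildcard values are assumptions, not confirmed by the log:)

    <property><name>hadoop.proxyuser.oozie.hosts</name><value>*</value></property>
    <property><name>hadoop.proxyuser.oozie.groups</name><value>*</value></property>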
[21:59:22] we need ganglia and JMX monitoring :D
[21:59:41] it seems way too slow
[22:00:03] drdee: these are the contents of copyright-sample http://bit.ly/RH4Jlq
[22:00:12] so it's missing the URL
[22:00:17] but I can use the one you told me above
[22:00:32] and fill in the copyright-sample and copy it to debian/copyright
[22:00:35] okay, can you use this template
[22:00:40] yes
[22:00:56] and replace those placeholders with values from debianize.sh?
[22:01:00] yes
[22:01:04] hmm, drdee, hadoop says the job finished and succeeded
[22:01:11] hue and oozie think it is still running
[22:01:34] odd
[22:01:42] hmm,
[22:01:45] brb phone
[22:02:16] it is comfortingly familiar to see NPE pop up in #analytics with frequency these days
[22:02:24] ahhh yes. the java world. so predictable!
[22:06:14] back
[22:06:24] NPE?
[22:06:38] null-pointer exception
[22:07:10] dschoon is just trolling right now :D
[22:07:23] i mean, it *is* predictable
[22:07:39] as that's surely the most common exception you'll see
[22:07:46] followed by the dreaded OOM!
[22:07:46] ha
[22:13:06] ottomata, so how can we debug this issue?
[22:14:32] ah, phone again, one sec
[22:29:16] sorry
[22:29:18] yeah, goooooood question!
[22:33:34] hmm yeah my sleep jobs failed too
[22:33:49] for the same reasons
[22:34:03] Unknown hadoop job [job_1352238269504_0001] associated with action
[22:34:50] drdee:
[22:34:50] http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201203.mbox/%3C92308400.32028.1332178177590.JavaMail.tomcat@hel.zones.apache.org%3E
[22:36:30] * drdee is reading
[22:37:21] https://issues.apache.org/jira/browse/MAPREDUCE-4033
[22:37:51] thanks, i was looking for that
[22:39:02] it might be a config error
[22:39:10] the bug was closed as invalid
[22:39:31] hm
[22:39:43] i wonder if it is that job.tracker.url thing we set to make hue happy
[22:39:56] it does seem like the jobs run successfully though
[22:40:02] that stupid thing again
[22:40:09] the logs look like they query over and over again for job status
[22:40:15] and then eventually realize that the job doesn't exist in hadoop anymore
[22:40:24] but i think they are trying to query the job.tracker.url
[22:40:26] not sure, reading
[22:40:27] in JIRA, comment 3 seems to be interesting
[22:41:55] what is 'AM' in that comment, do you know?
[22:44:38] administration maintenance?
[22:45:41] hmmm, i think this is not the job tracker
[22:45:48] they are talking about the job history server
[22:45:49] where can we find a log file like the one that is attached to the JIRA ticket?
[22:47:25] a few places, i'm looking in the oozie web gui
[22:47:34] http://analytics1001.eqiad.wmnet:11000/oozie/
[22:47:40] click on the job, then the job log tab
[22:48:00] also in /var/log/oozie/oozie.log
[22:50:29] i think we need the actual hadoop log
[22:52:00] oh for the job
[22:53:39] es
[22:53:39] yes
[22:56:50] 2012-11-06 21:45:33,570 WARN [AsyncDispatcher event handler] org.mortbay.log: Job end notification to http://analytics1001.eqiad.wmnet:11000/oozie/callback?id=0000000-121106214409934-oozie-oozi-W@root-node&status=SUCCEEDED& failed
[22:56:54] java.net.UnknownHostException: analytics1001.eqiad.wmnet
[22:56:55] ah ha!
[22:57:00] interesting
[22:57:05] hmmm, where would it get that url?
[22:57:19] that can only come from our browsers
[22:57:19] hmmmmmmmmmm
[22:57:20] hm!
[22:57:21] hmmm
[22:57:49] oh that is an ajax async thing? weird
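
(The UnknownHostException above comes from Oozie's job-end notification callback; the host in the callback URL is whatever the Oozie server believes its own FQDN to be. A quick way to chase that down, anticipating what the log uncovers below; getent is just the standard way to see /etc/hosts beating DNS:)

    hostname -f                                # oozie defaults OOZIE_HTTP_HOSTNAME to this
    getent hosts analytics1001.eqiad.wmnet     # resolves via /etc/hosts first
    grep OOZIE_HTTP_HOSTNAME /usr/lib/oozie/bin/oozie-sys.sh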
[22:57:49] weird, i dunno
[22:57:49] that could be me
[22:57:54] that's in my job file
[22:57:57] for my sleep job
[22:58:04] job log file
[22:58:04] ohh really
[22:58:18] i think that is the problem somehow though
[22:58:27] it's trying to tell oozie -> hue that the job has finished
[22:58:33] but the url is totally wrong
[22:58:34] hmmm, weird, going to try something
[22:59:06] but why is the url totally wrong? this part looks fine to me: http://analytics1001.eqiad.wmnet:11000/oozie
[22:59:22] no, analytics1001.eqiad.wmnet is a tricky thing we are doing for the vpn
[22:59:35] so it should be localhost:11000?
[22:59:36] there is no real an01 .eqiad name
[22:59:41] ok
[22:59:47] no, it should be analytics1001.wikimedia.org
[23:00:03] uhmmmm do this:
[23:00:07] sudo /etc/init.d/oozie restart
[23:00:19] it outputs a whole bunch of stuff
[23:00:25] maybe it says something about the server name
[23:02:10] Setting OOZIE_BASE_URL: http://analytics1001.eqiad.wmnet:11000/oozie
[23:02:21] :)
[23:02:32] Setting OOZIE_HTTP_HOSTNAME: analytics1001.eqiad.wmnet
[23:04:23] hmmm oo
[23:04:28] interesting
[23:05:27] i am trying to figure out where this happens
[23:05:47] /usr/lib/oozie/bin/oozie-sys.sh
[23:05:51] ./bin/oozie-sys.sh: export OOZIE_HTTP_HOSTNAME=`hostname -f`
[23:06:11] OOZIE_HTTP_HOSTNAME : The host name Oozie server runs on. Default value is the output of the command hostname -f
[23:06:17] --fqdn
[23:06:20] weird
[23:06:20] ok
[23:06:29] OHHH because /etc/hosts overrides dns
[23:06:40] can fix
[23:06:40] AWESOME!
[23:07:04] oh man, we are really taming this beast!
[23:07:46] hmmm, ok, links might get funky in vpn connected sites now though
[23:08:12] seems normal, ok we'll see
[23:08:51] fire up another job?
[23:11:20] mmmm i just fired up a job and it got killed with a new error:
[23:11:20] java.lang.NoClassDefFoundError: org/apache/pig/Main
[23:11:37] let's stick to the hive sample job
[23:11:50] also got killed
[23:12:04] java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
[23:12:07] i just tried the shakespeare job
[23:12:14] there is a class path error now
[23:12:14] eh?
[23:12:20] shakespeare and sleep job worked!
[23:12:22] for me
[23:12:27] lucky you
[23:12:34] try the hive or pig job
[23:13:04] (from http://analytics1001.eqiad.wmnet:8888/oozie/list_workflows/)
[23:14:02] hm
[23:14:33] yeah it seems like the job isn't making it into the cluster correctly? there isn't a yarn log file
[23:14:46] should be a simple path issue, or not?
[23:15:04] maybe, yeah
[23:15:35] ok the 'Shell' job does work
[23:16:17] oh hm
[23:16:51] so far the 'pig', 'hive' and 'sqoop' jobs don't work
[23:18:58] the 'forks' job also works, so afaict, all non-java oozie workflow jobs work
[23:22:55] hmmmm
[23:22:57] weird
[23:23:54] i think i know it
[23:24:24] does oozie in hue know that the oozie user libs are in hdfs://user/oozie/share/libs
[23:27:36] hmmmmmmmmm HMMMMMMMMMMM idea
[23:28:35] can you try again?
[23:28:51] check line 244 in oozie/oozie-site.xml
[23:29:05] i can paste as well
[23:29:06] /user/${user.name}/share/lib
[23:29:09] System library path to use for workflow applications.
[23:29:10] This path is added to workflow application if their job properties sets
[23:29:11] the property 'oozie.use.system.libpath' to true.
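
(The paste above lost its XML tags; in a stock oozie-site.xml the property reads roughly as follows. The property name here is the standard Oozie one matching that description text; it is inferred, not visible in the paste:)

    <property>
      <name>oozie.service.WorkflowAppService.system.libpath</name>
      <value>/user/${user.name}/share/lib</value>
    </property>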
[23:29:19] so it substitutes user.name
[23:29:29] but it should always be 'oozie'
[23:29:34] yeah that's what i'm trying
[23:29:35] i just set it to oozie manually
[23:29:38] and restarted both oozie and hue
[23:29:49] try now (I wonder if hue is filling in its own user name)
[23:29:53] (which you probably wondered as well :) )
[23:30:11] yup :)
[23:30:19] or the name of the currently logged in user
[23:30:43] nope
[23:30:48] got killed
[23:31:18] same error: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
[23:32:01] poopers
[23:32:24] ok, grabbing some takeout, be back in a min to keep hammering away
[23:32:48] man we are sooooo close
[23:33:00] takeout is waiting, so close indeed!
[23:33:16] alright bon appétit!
[23:34:29] CDH4.1.2 now available
[23:34:49] it fixes the oozie cli null pointer issue
[23:54:14] hmmm
[23:55:35] mmmmmm indeed
[23:57:21] i didn't see your change to /user/${user.name}/share/lib in oozie-site.xml
[23:57:30] so i did it myself again
[23:58:16] hm
[23:59:05] bwerrr
[23:59:15] yeah wha? I changed that
[23:59:15] did you change it?
[23:59:17] i don't see your change!
[23:59:21] ???
[23:59:30] on an01?
[23:59:30] in /etc/oozie/oozie-site.xml
[23:59:30] yes
[23:59:37] /etc/oozie/conf/oozie-site.xml
[23:59:46] yes that's what i meant
[23:59:51] ah but there is one in /etc/oozie
[23:59:53] probably not used
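
(For reference, the workaround being tried in this last thread: pin the share lib to the oozie user and opt jobs into it. A sketch assembled from values quoted in the log; whether this alone unblocks the pig/hive sample jobs is not confirmed here:)

    <!-- /etc/oozie/conf/oozie-site.xml -->
    <property>
      <name>oozie.service.WorkflowAppService.system.libpath</name>
      <value>/user/oozie/share/lib</value>
    </property>

    # job.properties, per workflow that needs the shared pig/hive jars
    oozie.use.system.libpath=true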