[00:09:59] pgehres: there should be two extra fields [00:10:07] a language accept http header [00:10:15] so accept stuff like en-us [00:10:32] and x-carrier header, that should be 2 or 4 characters long [00:10:33] iirc [00:10:54] dschoon: this is not about zero, this applies to all frontend cache servers [00:11:02] drdee: okay, i can confirm that i am seeing the new fields [00:11:16] just trying to figure out why why regexes still work ... [00:11:20] ok [00:11:24] let me know if you have issues [00:11:36] brb in 90 minutes [00:11:36] my only issue is that I am not having issues [00:11:46] cool! [00:13:16] aha! python match is ignoring the new fields since they are at the end [00:13:17] phew [00:13:26] * pgehres is not crazy [00:15:46] so, the only issue that I see is that cp1043.wikimedia.org seems to still be on the old format [00:51:08] drdee: why is the graph Erik attached to the email so spiky ? [00:51:44] spikey ? dunno the correct way to write it [00:52:24] in any case, I would expect it to have no spikes [00:53:11] I saw this in some other graphs related to req/minute [01:39:53] http://www.regexper.com/ [14:21:34] http://stat1.wikimedia.org/spetrea/new_pageview_mobile_reports/r3/pageviews.html [14:21:39] hello [14:21:44] I have a problem with color ramps [14:21:53] if anyone has dealt with this please let me know [14:22:12] the problem is basically I need to describe graphically the delta between two months [14:22:22] so the delta is basically a percentage of increaase or decrease [14:22:58] and now the problem is that there is some code in wikistats related to the color ramp, I had a look at it, it's like 400 lines.. uhm, trying to make it simple [14:23:09] so I can understand it also.. [14:23:38] if you know some module that does color ramps and it allows me to set the spectrum (like for example I want just green,red and white in the spectrum) [14:23:45] please let me know [14:24:02] it doesn't matter what language the module is in, I can port the code [14:24:31] the link above shows the spectrum I have and the color ramp [14:26:41] in the meantime I'll try to modify the code to get the spectrum needed [15:01:21] morning [15:02:26] hey average_drifter [15:02:32] d3 does color ramps very well [15:04:34] morning [15:04:36] colorScale = d3.scale.linear().domain([-110,90]).range(['green','red','white']) would make a scale that maps that domain to green, red, and white [15:05:03] then you can go colorScale(inputVal). You can try porting the code from just that part of d3: https://github.com/mbostock/d3/wiki/Quantitative-Scales#wiki-linear [15:05:20] hi otto [15:05:40] i saw last night that your script worked but not for you? [15:06:36] still not working for me [15:06:57] i can run it as diederik, but he has the wrong kraken.jar so it doesn't work, i'm thikning maybe my new kraken.jar has a problem [15:07:30] hey milimetric [15:07:42] milimetric: I have to have a look at d3 [15:07:56] i'll help you find the code for that scale [15:08:18] https://github.com/mbostock/d3/blob/master/src/scale/linear.js [15:08:51] * average_drifter is having a look [15:10:45] so it looks like that's mostly setup code, and then the actual scaling is done by this: https://github.com/mbostock/d3/blob/master/src/scale/polylinear.js or by this: https://github.com/mbostock/d3/blob/master/src/scale/bilinear.js [15:17:23] milimetric: where is the implementation of interpolate ? 
[15:17:28] milimetric, since you asked, and I got no other brain bouncers here...:) [15:17:33] I have grepped through the d3 code and found multiple matches [15:17:35] so, I compiled a new kraken.jar file [15:17:51] It has a pig UDF called GeoIpLookup.java [15:17:52] average_drifter: I'm looking too [15:18:07] otto, which repo is this committed to? [15:18:07] I modified it so that it returns a 7th field for continent [15:18:11] kraken [15:18:32] https://github.com/wmf-analytics/kraken/blob/master/src/main/java/org/wikimedia/analytics/kraken/pig/GeoIpLookup.java [15:18:43] pig has a local and mapred mode [15:18:48] local mode works with files on your local filesystem [15:18:52] https://github.com/mbostock/d3/blob/master/src/core/interpolate.js [15:18:55] mapred mode is default and works with hdfs filesystem [15:18:57] average_drifter ^^ [15:19:04] milimetric: thanks ! [15:19:22] otto, gotcha [15:19:44] so, I just tried running diederiks blog.pig script [15:19:53] with his version of kraken.jar (the old one) [15:19:54] and then mine [15:19:56] it works with his [15:19:58] mine gives [15:20:12] java.io.FileNotFoundException: File does not exist: null [15:20:16] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787) [15:20:20] in mapred mode [15:20:39] but my new kraken.jar will work in local mode [15:20:55] hm, so it's just fetching the file from hadoop that fails [15:21:08] (note, the blog.pig script is not using the new continent field, its not even using the GeoIpLookup UDF) [15:21:18] right [15:21:33] can you run blog.pig with your new jar? [15:21:45] so it seems that my modification, or maybe my recompilation broke something [15:21:51] not in mapred mode [15:21:53] local mode is fine [15:21:56] hmmmm, maybe, hmmm [15:22:11] maybe when I compiled it didn't get the correct hadoop deps and now the hadoop stuff doesn't work [15:24:46] is it too complicated for me to build and run the pig script? [15:25:46] no, shouldn't be, it was pretty easy for me [15:25:51] checkout kraken somewhere [15:25:53] run [15:26:00] checkout on an01 probably [15:26:01] then run [15:26:07] mvn compile [15:26:09] mvn package [15:26:15] that shoudl give you a kraken.jar in target/ [15:26:20] cool, trying [15:26:32] make some working directory somewhere, and cp the things you need there: [15:26:48] so doing it locally is not going to work because of access problems, right? [15:26:58] you'll need a file to work on, but it will [15:28:20] cp kraken/target/kraken.jar ./pigdir [15:28:20] cp /home/otto/pig/krakensrc/blog0.log ./pigdir [15:28:20] cp /home/diederik/geoip-1.2.5.jar ./pigdir [15:28:20] cp /usr/share/GeoIP/GeoIP{,City}.dat ./pigdir [15:28:37] cp /home/otto/krakensrc/piggybank.jar ./pigdir [15:28:46] cd pigdir [15:28:59] hadoop fs -put GeoIP.dat [15:28:59] hadoop fs -put GeoIPCity.dat [15:32:26] morning guys [15:32:40] morning drdee [15:32:42] averag_drifter, milimetric, ottomata [15:33:02] hm so I'm not allowed to ssh into an01 [15:33:09] is kripke an ok place to try? [15:33:16] to try what? [15:33:22] naw that won't work [15:33:25] compiling kraken and running pig [15:33:26] you can't ssh into an01? [15:33:31] analytics1001.wikimedia.org [15:33:38] drdee, my problem is not missing jars or .dat files [15:33:42] my new kraken.jar doesn't work [15:33:46] ok [15:33:53] did you push your commit? [15:33:53] yeah, i get permission denied [15:34:01] i can compile kraken.jar [15:34:26] yes [15:34:29] hm [15:34:34] yeah, try it [15:34:41] maybe my key isn't on an01? 
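A minimal sketch of the kind of script being tested here, to make the local vs. mapred distinction concrete; the file names and the shortened field list are illustrative, not the actual blog.pig:

    -- Illustrative only. Run "pig -x local test.pig" to read blog0.log from the local
    -- filesystem; plain "pig test.pig" uses the default mapred mode, so the input and the
    -- GeoIP .dat files must first be pushed to HDFS (the hadoop fs -put steps above).
    REGISTER 'kraken.jar';
    REGISTER 'geoip-1.2.5.jar';
    REGISTER 'piggybank.jar';

    -- the real webrequest line has more fields (including the new accept-language and
    -- x-carrier headers mentioned earlier); this truncated schema is an assumption
    LOG_FIELDS = LOAD 'blog0.log' USING PigStorage(' ')
        AS (hostname:chararray, sequence:long, timestamp:chararray,
            request_time:chararray, ip:chararray, http_status:chararray);

    GROUPED = GROUP LOG_FIELDS BY http_status;
    COUNTS  = FOREACH GROUPED GENERATE group AS http_status, COUNT_STAR(LOG_FIELDS) AS hits;

    STORE COUNTS INTO 'blog_status_counts' USING PigStorage('\t');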
[15:34:45] yeah lemme see about that [15:35:11] yup, you are not on them, lemme fix that [15:42:01] ottomata, 1 sec working on it [15:43:28] ok on an01:~/kraken_new.jar; try that one [15:48:08] drdee, same error when I use that jar [15:48:14] yes me too [15:48:26] at least i can replicate it now :) [15:48:42] are we compilling with some wrong version of hadoop deps [15:48:48] that is keeping the new .jar from working with hadoop? [15:48:51] cause it works in local mode [15:51:00] no [15:51:08] i think i know the problem [15:51:11] oh? [15:51:23] read line 321 https://github.com/wmf-analytics/kraken/blob/25ec7e8f360c1e87fc3b0b83758dbb8a03b7d94e/src/main/java/org/wikimedia/analytics/kraken/pig/GeoIpLookup.java [15:51:51] hm [15:51:55] we have no support for both ip4 and ip6 addresses [15:52:07] however the getCacheFiles() always tries to load both db's [15:52:15] let me fix this [15:52:19] hm, ok [15:52:26] and we only supply 1 db [15:52:31] so it's trying to load 'null' [15:52:45] that doesn't work obviously [15:52:59] i mean, i guess that makes sense, but this was happening before too [15:53:01] I didn't change that bit [15:53:12] and I'm testing with your blog.pig script, not my new continent one [15:53:13] i know [15:53:18] and blog.pig doesn't even use GeoIpLookup [15:53:51] but it's the getCacheFiles() that distributes the dat file(s) to all nodes [15:53:58] and that's always required by all functions [15:54:04] yea, no i mean, what you are saying seems to make sense [15:54:11] for this error, and it is worth looking into [15:54:15] but, this worked before, and not now [15:54:18] and that didn't change at all [15:54:34] unless, your .jar was made before the ip6dat cache file stuff was added or something [15:54:39] exactly [15:54:45] that's what happened [15:54:50] the old kraken.jar did not have this et [15:54:54] ah [15:54:55] hold on [15:54:58] I think I forgot to modify getCacheFile for the ip6dat support [15:55:03] yes:) [15:55:06] i am fixing it now [16:00:26] I'm mostly done with the Funnel UDF too: https://gist.github.com/4443809 [16:00:59] ottomata, fix tested and pushed [16:01:09] cool louisdang! reading it right now [16:01:31] ottomata, you can copy an:/home/diederik/kraken_new.jar that one works now [16:02:02] ok cool [16:02:55] sorry for all the hassle ;) [16:03:21] yeah sorry about that, my mistake [16:03:47] louisdang; that looks sweet, let's push it and try to come up with some unit-tests [16:03:53] ok [16:06:14] pushed [16:06:58] drdee, i'm still getting the same error [16:07:49] * drdee is puzzled [16:08:45] can you reproduce? [16:08:46] the problem is with the load statement isn't it? [16:09:02] Message: java.io.FileNotFoundException: File does not exist: null [16:09:02] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787) [16:09:09] i just ran the pig script with the new kraken_jar and it works [16:09:18] did you update your pig script to use the new kraken.jar ? [16:09:44] ok, if it works for you then I must be doing something wrong [16:09:51] i copied the new kraken.jar to my cwd [16:10:05] pig script is just doing [16:10:06] REGISTER 'kraken.jar' [16:10:52] but where is your pig script? [16:11:08] i'm currently using [16:11:17] sorry where do you run your pig script from? 
[16:11:22] /home/otto/pig/krakensrc/c/blog.pig [16:11:29] cwd == /home/otto/pig/krakensrc/c [16:11:45] just to be sure, I cped your kraken_new.jar as is [16:11:51] but that folder has still the old kraken.jar [16:11:53] and then modified blog.pig to register that [16:11:56] where? [16:12:06] /home/otto/pig/krakensrc/c [16:12:17] /home/otto/pig/krakensrc [16:12:19] /c [16:12:30] i'm in a new dir to try to isolate things [16:12:36] and now i'm registering kraken_new.jar anyway [16:12:45] in pig/krakensrc/c/blog.pig [16:12:50] ok [16:13:17] can i try it? [16:13:22] drdee, did you update getCountryCode too or just GeoIPLookup? [16:13:22] ja please [16:13:37] oh yeah, blog.pig uses getCountryCode, not GeoIpLookup [16:14:10] script is running from /home/otto/pig/krakensrc/c [16:14:18] are you running blog.pig? [16:14:22] no continent.pig [16:14:29] aye, I saw you edited that just now [16:14:36] i was about to try that, since it uses GeoIpLookup [16:14:42] blog.pig is what I was testing with [16:14:47] it doesn't use GeoIpLookup [16:14:47] oohhhhhhhhhh [16:15:01] i thought you were looking at the continent stuff [16:15:11] i was trying to use as few new pieces as possible [16:15:25] and we knew that blog.pig worked for you in the past, and I was gettin gthe error with it [16:15:26] hold onn [16:15:33] but I betcha you need to fix the other UDFs that load geoip files too [16:16:01] yeah, continent.pig looks like it is working for me [16:16:07] since it uses your fixed GeoIpLookup [16:16:22] jasaa, works, cool [16:16:29] blog.pig also works forme [16:18:34] hmmmm [16:18:36] ottomata, so all is good? [16:18:38] but this would need fixed too, no? [16:18:38] https://github.com/wmf-analytics/kraken/blob/master/src/main/java/org/wikimedia/analytics/kraken/pig/GetCountryCode.java#L52 [16:18:52] continent.pig works for me, and that's all i'm using right now, so all is good for me [16:18:59] but I think you should make the same fixes in your other UDFs [16:19:09] aight [16:19:11] or mabye even abstract out the Geo .dat using UDfs [16:19:16] so you don't ahve duplicated functions [16:19:40] class PigGeocoder extends EvanFunc ... [16:19:47] class GetCountryCode extends PigGeocoder [16:19:49] or something [16:20:00] class GeoIpLookup extends PigGeocoder [16:20:00] yes that's even better, thanks for the tip [16:20:16] i will open an issue [16:21:11] drdee, as soon as I get this cleaned up a bit, can you help me write an oozie job for mobile continent stuff? [16:21:24] i need to know how to do it, and we want to get something up so that milimetric can make limn automatically graph the results [16:21:27] YES! [16:21:48] but i am also having an oozie issue, with the coordinator part [16:21:54] i'm running continent pig on 2013 mobile logs now, just as a test [16:21:54] so we can work on that together [16:21:59] if that works then i'll clean up a bit then we can do that [16:22:01] let me know when you ned me [16:22:07] ok, prob in 20 or 30 mins [16:22:13] perfect [16:22:17] * drdee is getting coffee [16:22:39] milimetric, just so I can start thinking about it, what format do you need these files to be in? [16:23:56] drdee, should we put the new kraken.jar (and maybe geoip-1.2.5.jar) at /libs in hdfs? 
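To make the continent.pig discussion easier to follow, a hypothetical sketch of how the GeoIpLookup UDF might be wired into a script; only the class path and the fact that it now emits continent code and continent name as its last two fields are confirmed above, so the constructor argument, the input path, and the other output field names are assumptions:

    -- Hypothetical usage sketch; the argument and output schema of GeoIpLookup are assumed.
    REGISTER 'kraken.jar';
    REGISTER 'geoip-1.2.5.jar';

    -- assumed: the argument names the GeoIP database distributed to the nodes via getCacheFiles()
    DEFINE GeoIpLookup org.wikimedia.analytics.kraken.pig.GeoIpLookup('GeoIPCity.dat');

    LOG_FIELDS = LOAD '/wmf/raw/webrequest/webrequest-mobile/*/part*' USING PigStorage(' ')
        AS (hostname:chararray, sequence:long, timestamp:chararray, ip:chararray);

    -- the first six output field names are placeholders; the last two are the new fields
    GEO = FOREACH LOG_FIELDS GENERATE
              timestamp,
              FLATTEN(GeoIpLookup(ip)) AS (country_code:chararray, country_name:chararray,
                                           region:chararray, city:chararray,
                                           latitude:double, longitude:double,
                                           continent_code:chararray, continent_name:chararray);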
[16:24:51] ottomata, dschoon was working on the map file format and I think he tweaked it a bit [16:24:55] yes we should have one canonical place to put that stuff [16:25:15] but for the timeseries format, it's just basic csv, I'll paste an example [16:25:32] /libs already has some of those files [16:25:39] i'd rather it be /wmf/lib maybe [16:25:47] ok [16:25:48] not sure if I created /libs or if someone else did [16:25:51] or if it is in use right now [16:25:56] http://dev-reportcard.wmflabs.org/data/datafiles/rc/rc_comscore_region_uv.csv [16:25:58] how about, we make it /wmf/lib [16:26:00] i think you did [16:26:07] and I copy the new files into /wmf/lib, and just start using that [16:26:08] oozie has /user/oozie/share/libs [16:26:10] i'll leave /libs for now [16:27:01] /user/oozie/share/lib [16:27:02] not plural [16:27:04] but yeah [16:28:09] milimetric, if we have time data in the timestamp as well [16:28:11] is that ok? [16:28:13] what format should that be? [16:28:23] dschoon wanted mobile continents to be per hour [16:28:39] I think it would still parse ok [16:28:48] 2008/07/01 HH:MM:SS [16:28:48] ? [16:28:48] if not I'd just fix the parsing because we need hours yea [16:28:52] yeah, that works [16:29:11] ok cool [16:29:31] hm, drdee, I'm going to add continent name here too :p [16:29:34] as an 8th field [16:29:48] country name comes back, why not continent name too [16:45:01] out to get some stuff, bb in 30m [17:18:54] back [17:19:08] welcome! [17:19:13] :D [17:32:09] ottomata, this is very cool: https://github.com/twitter/ambrose [17:32:25] can we install that :D :D [17:32:57] i think I did once…. [17:33:04] but yeah, could puppetize [17:33:07] adding it to my todo [17:33:10] k [17:40:11] http://www.slideshare.net/hortonworks/new-features-in-pig-011 [17:40:17] cool stuff is coming! [17:41:57] oo looks nice [17:42:04] is cube basically a shortcut for group by? [17:42:09] group by and count? [17:42:30] it looks like that, i have to study it a bit more in detail [17:42:58] i think it creates multidimensional group by and count [17:52:22] morning [17:52:26] morning [17:53:01] did we get all of pgehres|away's questions answered last night? [17:53:32] about the log cutover? [17:53:51] in the future, it'd probably be wise to defer such changes to the morning the next day, so everyone has a chance to react [17:57:40] dschoon, define morning :) [17:57:54] not 8pm EST [17:58:26] actually it's real close to the UTC midnight [17:58:48] which is helpful for erik [17:58:49] z [17:58:58] well, true [17:59:02] but we have a lot of consumers [17:59:11] it was kinda disconcerting to see FR all frantic [17:59:27] why were they frantic, i thought they said we could do it after the fundraiser? [17:59:38] yes, and we communicated this many times [18:00:31] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90?authuser=1 [18:25:11] drdee, ops puppet things are slow atm, what's up with your oozie thing? [18:26:14] hmm, wait weird, ok [18:26:22] the snmptrap is getting to nagios fine [18:26:32] when I run it 'Last Update' changes [18:26:33] but [18:26:37] it isn't changing the status [18:27:38] hm [18:28:18] oops, wrong chat [18:28:20] meant that for ops [19:19:57] hmmmm, drdee, or louisdang, either of you got a sec to help me with a pig script? [19:20:21] ottomata, sure [19:20:31] i'm basically trying to do this [19:20:32] http://stackoverflow.com/questions/11578815/pivoting-in-pig [19:22:12] ottomata, does the given answer work?
[19:23:05] something isn't working right [19:23:15] https://gist.github.com/4446225 [19:25:41] i think the problem is that it is filtering on the original count input, rather than the current group [19:26:03] in that FOREACH, is there a way to refer to the current bag? [19:26:12] (i'm not even sure if I am using the correct terms) [19:28:14] group? [19:28:46] let me check [19:31:55] I'm not familiar with using FILTER statements in nested foreach [19:32:45] is it related to: https://issues.apache.org/jira/browse/PIG-1798? [19:33:26] hm, not sure but I don't think so [19:33:45] i think i'm just doing somethign wrong... [19:36:59] oh [19:37:22] what's the count column called in the original load statement ottomata? [19:37:55] I think it col1.value should be col1.total for example [19:38:19] (sorry, in meeting) [19:38:32] k [19:58:32] I'm pumped [19:58:44] just fixed my vimrc , damn molokai and xterm-256color [19:59:07] solarized is awesome for gvim .. but not very sure about the console.. [19:59:25] maybe an if statement in the .vimrc makes sense to select the right colorscheme depending on gvim/vim [20:15:53] louisdang [20:16:03] its called value, but it can be named whatever [20:16:07] COUNT = FOREACH GROUPED GENERATE [20:16:07] FLATTEN(group) AS (hour, continent_name), [20:16:07] COUNT_STAR($1) as value PARALLEL 1; [20:16:10] ottomata, ok [20:16:46] yeah, i'm changed the main text color to something a little brighter/darker for the teminal [20:16:54] not sure If i like it that much [20:26:54] drdee, can help, my head is in this pig thing at the moment though [20:26:56] how's oozie? [20:27:10] i am still stuck with the coordinator [20:27:19] maybe you can try it with a different user account [20:27:36] or maybe it's permission thing in Hue, i have no clue to be honest [20:33:50] louisdang: [20:33:51] https://gist.github.com/4446964 [20:33:58] what is different between these two examples? [20:34:03] drdee you can look too :) [20:34:08] looking [20:34:46] brb [20:37:06] mmmmmm and i assume that the stack overflow example works? [20:38:12] yes, [20:38:16] that gist shows the result [20:38:42] that's me running it [20:39:35] i'll add dump count and dump input in that gist [20:39:36] to make it more clear [20:39:49] got it [20:39:58] is Value a Pig keyword? [20:40:11] because in the stack overflow example it is capitalized and in your code it is not [20:45:26] erg, internet was so good here most of the day [20:46:15] drdee, no [20:46:17] it is a named field [20:47:08] ok [20:49:21] ok updated gist [20:49:21] https://gist.github.com/4446964 [20:49:27] with more of the scripts [20:50:08] you can see that grp and GROUPED_COUNT are of the same data hierarchy [20:50:28] yup [20:50:43] COUNT is also a pig function so maybe not good idea to call your variable like that [20:50:53] a key (Id in grp, hour in GROUPED_COUNT) and then a bag [20:50:53] hmmmm [20:50:55] oh hmmm [20:50:57] maybe yeah [20:50:57] ok [20:53:53] drdee, you might be right about that, it still isn't workign but it is happier atm [20:55:09] new error? [20:55:20] no, just empty results [20:55:25] ok :) [21:05:02] ungghh this should be workingnnnnnggggg [21:12:03] lemme see if cafe internet is working again [21:13:14] ja better [21:13:44] ergh, dunno why this isn't working, drdee, i'm going to walk outside for a second, then let's look at oozie [21:13:54] sounds good [21:24:02] oook [21:24:05] coordinator joooob [21:24:41] what's that mean? 
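For reference, the pivot being attempted in the gist above generally looks like this in Pig; the relation and field names here are illustrative, and the count column is renamed to avoid clashing with the COUNT keyword, as suggested above:

    -- COUNTS is assumed to have the shape (hour, continent_name, hits), i.e. the output of
    -- the earlier GROUP/COUNT_STAR step with the value column renamed.
    BY_HOUR = GROUP COUNTS BY hour;

    PIVOTED = FOREACH BY_HOUR {
        -- inside a nested FOREACH, COUNTS refers to the bag of tuples belonging to the
        -- current group, so these FILTERs run per hour rather than against the whole relation
        africa = FILTER COUNTS BY continent_name == 'Africa';
        europe = FILTER COUNTS BY continent_name == 'Europe';
        GENERATE group AS hour,
                 SUM(africa.hits) AS africa_hits,
                 SUM(europe.hits) AS europe_hits;
    };

If the string literals in the FILTERs do not exactly match the continent_name values the UDF emits (case or spelling), the nested bags come back empty and the sums are null, which is one plausible explanation for the empty results mentioned above.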
:p [21:25:03] drdee [21:25:12] yo [21:25:29] go to http://hue.analytics.wikimedia.org/oozie/ [21:25:35] and click on Coordinators [21:25:59] the coordinator is the actual scheduler, the workflow describes what (which script etc.) the coordinator describes when [21:26:12] hm ok [21:26:40] so what' up? [21:26:48] it doesn't run? [21:26:52] when it is supposed to? [21:27:18] right so create a new coordinator [21:27:32] k [21:27:41] as workflow use Pig job blog.wikimedia.org [21:27:51] k [21:28:00] then go down to 'data' [21:28:06] and click on 'Datasets' [21:28:07] frequency? [21:28:12] whatever [21:28:18] this is just to debug [21:28:22] do hourly [21:29:16] create an input dataset, that reads data from /wmf/raw/webrequest/webrequest_blog/ [21:29:24] and an output dataset that stores it somewhere [21:29:54] then back in the coordinator page [21:30:26] add the input dataset and output dataset that you just created [21:30:44] there will be 2 pulldown menu's [21:30:51] one is empty and one has the dataset names [21:30:59] select the dataset and do save [21:31:03] you will get an error [21:33:09] name [21:33:12] this field is required? [21:33:14] yes [21:33:18] hm [21:33:26] but it's not being populaed [21:33:36] so that's why i said i think it's a bug [21:33:50] or a config error but that just seems unlikely [21:34:39] hmm [21:34:43] The inputs and outputs of the workflow must be mapped to some data. Click Add and select a dataset from the Dataset drop-down menu and map it to one variable of your workflow. [21:34:56] does the workflow have to define input parameter variables? [21:35:18] yes [21:38:37] i mean, maybe the workflow itself has to define variables that get popluated in that dropdown list [21:38:43] but we aren't defining any? [21:40:15] drdee, in this case, why does it need inputs and outputs [21:40:22] isn't the input hardcoded in your blog.pig script? [21:40:51] and you are defining output in the pig action [21:40:53] output=/user/diederik/blog/temp_results [21:41:40] no the input is not hardcoded in blog.pig [21:41:53] you have to specify an input and output param on the CLI [21:42:27] but you could be right that I misunderstanding the 'name' thing although why wouldn't they call it 'param'? :) [21:43:36] dunno, [21:43:45] if you are using the blog.pig script in your homedir, it is hardcoded [21:44:02] LOG_FIELDS = LOAD '/wmf/raw/webrequest/webrequest-blog/*' USING PigStorage(' ') AS ( [21:44:11] also, you might wnat to import [21:44:18] /wmf/raw/webrequest/webrequest-blog/*/part* [21:44:30] there are some empty _SUCCESS files in the import directories that aren't part of the data [21:45:56] aight, ok that's stupid let met fix that [21:48:31] ok copied new blog.pig to hdfs [21:48:53] but wait, didn't you ahve this working? and its not really blog.pig if you don't hardcode the input :p [21:50:28] right it's more generic [21:50:34] let's try again [21:50:40] hmmm, drdee [21:50:40] http://hue.analytics.wikimedia.org/oozie/list_oozie_coordinator/0000005-121220185229624-oozie-oozi-C [21:50:57] mmmmmm [21:50:58] indeed [21:51:37] the log says no action to complete [21:51:42] buuut, it has the datasets there, right? [21:51:47] explain this to me [21:51:53] what is a dataset to a coordinator? [21:51:59] how does that info get passed to your pig action [21:52:02] or whatever action? 
[21:53:10] i am even surprised that a coordinator is running [21:53:26] these are all good questions which we both are trying to figure out :D [21:53:46] i just tried to edit the coordinator using the updated workflow [21:53:53] but the name drop down menu is still empty [21:55:14] a dataset (AFAIK) defines an input/output param that is required by the workflow [21:55:46] ok, where are those defined in the workflow? [21:56:05] in the pig script using the $input and $output variable [21:56:34] how are those passed to the pig script? [21:56:43] check http://hue.analytics.wikimedia.org/oozie/edit_action/114 [21:56:57] look at params [21:56:58] ok right, then you don't need any datasets then, right? [21:57:05] those are manually entered in the action [21:57:09] you do, for the workflow it's only a single run [21:57:18] the coordinator is for recurring runs [21:57:20] so how does it get passed from coordinator to workflow to action [21:57:23] ? [21:57:25] dunno [21:57:33] that's what i am trying to figure out as well [21:57:51] i think there is a 2nd step after saving the coordinator (which we can't) [21:57:59] that allows you to map the dataset to a param [21:58:08] i saved the coordinator somehow [21:58:44] i don't see that anywhere though [21:58:49] are you sure? [21:58:59] of? if I saved it? [21:59:05] yes that you saved it [21:59:05] http://hue.analytics.wikimedia.org/oozie/edit_coordinator/38 [21:59:14] oh, I saved it with no inputs or outputs [21:59:23] dunno why the submitted coordinator shows them [21:59:29] mmmm [22:00:21] can you make your coordinator 'is_shared' ? (under advanced) [22:00:52] done [22:01:41] do you know the difference between an action param and an action argument? [22:01:52] semantics???? :D [22:01:54] drdee, dschoon, the deploy script is still in the works, I apologize. I'm being very slow and methodical because I'm getting rid of unneeded code, planning for the future, etc. [22:02:03] no worries :) [22:02:06] I'll stop work tomorrow no matter what [22:02:19] it's an important piece of infrastructure [22:02:21] ottomata, i have an idea [22:02:22] naw, they are different selections in the ui [22:02:37] when I use those terms, its a matter of context usually [22:02:47] if i'm defining a func and talking about the variables there, I call them parameters [22:02:53] but if I am calling a func, I call them arguments [22:03:13] buuut, i dunno what t hey mean in a oozie actionm [22:04:13] drdee, reading this page [22:04:18] looks like they use arguments for pig script input [22:04:18] http://archive.cloudera.com/cdh/3/oozie/WorkflowFunctionalSpec.html#a3.2.3_Pig_Action [22:04:31] -param [22:04:31] INPUT=${inputDir} [22:04:31] -param [22:04:31] OUTPUT=${outputDir}/pig-output3 [22:06:24] ok [22:10:19] check http://hue.analytics.wikimedia.org/oozie/edit_coordinator/6 [22:10:24] that is an example coordinator [22:10:46] it does not have a pulldown menu [22:10:52] just a text input field [22:12:28] hmm, ok but why are you trying to use datasets if you don't know how they are used? :p [22:12:43] you defined the input and output in your pig action [22:13:21] Creating a Coordinator [22:13:22] To create a coordinator: [22:13:23] Click the Create button at the top right of the Action Chooser. [22:13:25] In the Name field, type a name. [22:13:26] In the Workflow drop-down list, choose a workflow that the coordinator will schedule. [22:13:27] In the Frequency area, specify how often the workflow will be scheduled and how many times it will run. [22:13:28] Click Save. 
The coordinator editor opens. Proceed with Editing a Coordinator. [22:13:32] >>>> In the Name field, type a name. [22:13:39] so why do we get a pulldown menu? [22:14:08] ottomata: you defined the input and output in your pig action [22:14:26] well that was just the first step in making sure that the oozie workflow actually works [22:14:52] okay how about this: [22:15:09] hue detects that the pig script uses more than 1 variable [22:15:23] but has a bug in populating the pulldown menu [22:15:56] the 'Creating a Coordinator' stuff came from https://ccp.cloudera.com/display/CDH4DOC/Oozie+Editor+and+Dashboard#OozieEditorandDashboard-Coordinators [22:16:41] i don't think hue is going to detect if pig uses parameters [22:16:48] i mean, ja it might be a hue bug [22:16:57] ignore my comment about the name stuff [22:16:59] but i think you are right to try to get it to work as is [22:17:07] that refers to the name of the coordinator [22:17:42] The inputs and outputs of the workflow must be mapped to some data. Click Add and select a dataset from the Dataset drop-down menu and map it to one variable of your workflow. [22:17:43] If no datasets exist, follow the procedure in Creating a Dataset. [22:17:44] Select a dataset from the Dataset drop-down menu. [22:17:45] Click Save. [22:18:03] so that actually says nothing about the name pulldown menu [22:18:36] >>>>> and map it to one variable of your workflow. [22:18:51] so the workflow needs to define the input and output variable [22:19:00] and that should show up in the name pulldown menu [22:19:05] (name is really a stupid name) [22:19:22] oh it worked! [22:19:50] i mean, i got the coordinator to run the pig script with hardcoding the params in the pig action [22:19:58] so, at least we got that far :) [22:20:41] ah ha! [22:20:43] got it [22:20:47] you have to parameterize your arguments [22:20:50] so by adding [22:20:56] input=${INPUT} [22:20:59] in the pig acation [22:21:07] GOT IT [22:21:08] the name drop down was populated [22:21:19] we both figured it out at the same time [22:21:20] D [22:21:21] :D [22:21:25] OUTPUT=${output} [22:21:46] in the pig script i use $input and $output so i guess it should be lowercase [22:22:09] AWESOME!!!!! [22:23:01] okay coordinator save [22:23:01] d [22:23:22] arrgghghh [22:23:34] ${HOUR} is not a parameter [22:24:06] http://hue.analytics.wikimedia.org/oozie/list_oozie_coordinator/0000140-121220185229624-oozie-oozi-C [22:25:09] THAAAAANKS OTTOMATA [22:26:06] well no [22:26:08] i mean [22:26:19] you should use [22:26:24] output=${WHATEVER_YOU_WANT} [22:26:35] output=${OUTPUT} is what I was using [22:26:47] but yeah, that hour thing looks annoying [22:26:53] might ahve to change the way hadoop kafka is importing [22:27:15] yeah the hour thing is definitely annoying [22:27:22] oh, well, actually [22:27:28] when you submit the job [22:27:35] milimetric: i pushed maps [22:27:35] it asks you for unknown params [22:27:42] only supports fill for now [22:27:49] still needs the hover box [22:27:54] i will debug this :D [22:27:57] but that's easy. it's like legend. [22:28:10] cool [22:28:25] you'll need to pull both develop and reportcard-data feature/d3 [22:28:47] good stuff [22:28:47] i'll merge now [22:28:53] test first :) [22:28:58] i have a meeting in a few, so i'll be back after that. [22:28:59] bbl [22:29:43] cool, can't wait to play with this. 
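To make the parameter fix concrete, this is roughly what a parameterized blog.pig-style script and its wiring look like; the field list and the sample aggregation are illustrative:

    -- $input and $output are Pig parameters, substituted at launch time
    LOG_FIELDS = LOAD '$input' USING PigStorage(' ')
        AS (hostname:chararray, sequence:long, timestamp:chararray, uri:chararray);

    BY_HOST = GROUP LOG_FIELDS BY hostname;
    HITS    = FOREACH BY_HOST GENERATE group AS hostname, COUNT_STAR(LOG_FIELDS) AS hits;

    STORE HITS INTO '$output';

    -- On the command line the same substitution is done with:
    --   pig -param input='/wmf/raw/webrequest/webrequest-blog/*/part*' \
    --       -param output=/user/diederik/blog/temp_results blog.pig
    -- In the Hue pig action, adding parameters of the form
    --   input=${INPUT}
    --   output=${OUTPUT}
    -- is what exposes INPUT and OUTPUT as workflow variables, so the coordinator's dataset
    -- "Name" drop-down has something to bind to.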
I won't until I'm done deploying [22:34:15] ok cool, glad we got that far drdee [22:42:35] ottomata, so i don't think i can actually really run the coordinator [22:42:53] because we cannot parameterize the hour component in the filename of the imported files from kafka [22:43:34] right, that's a prob [22:43:52] what is ${MINUTE} then? [22:43:55] daily minute number? [22:49:00] no, i think just between 0 and 60 [22:49:38] really? naw can't be [22:49:45] how else could you choose a time? [22:49:53] not sure if i understand you [22:50:08] oh because you can't parametrize the hour [22:50:17] you have to do it through the $MINUTE? [22:50:27] could be the case, dunno [22:50:31] i will google a bit [22:55:25] k [23:14:02] ottomata: http://oozie.apache.org/docs/3.2.0-incubating/client/apidocs/src-html/org/apache/oozie/client/CoordinatorJob.html [23:14:07] oozie does know HOUR [23:15:03] ah cool [23:16:26] oh actually Hue also works with HOUR [23:16:34] the fact that it asks for a small input is a small known bug [23:16:35] http://grokbase.com/t/cloudera/hue-user/12bacst98y/hue-oozie-coodinator [23:16:44] and will be fixed soon [23:17:14] you just have to leave the HOUR field empty [23:17:59] I get this error right now: [23:17:59] Coord Action Input Check Error: org.apache.oozie.service.HadoopAccessorException: E0904: Scheme [null] not supported in uri [/wmf/raw/webrequest/webrequest-blog/2013-01-03_16.22] [23:30:03] ottomata: we ran into another oozie bug: https://issues.apache.org/jira/browse/OOZIE-1144 [23:30:11] which has been fixed in trunk [23:42:13] aye hm [23:42:14] ok [23:42:18] i'm outttiiieee [23:42:21] ttyl boys