[01:26:38] dschoon yes that's what i am saying, blame ottomata, i also told him not to name that file like that
[01:27:07] louisdang, we are still stuck with hue
[01:27:31] we tried running the samples provided by hue and those didn't run either
[01:27:51] maybe we should start from scratch with a new hue.ini file
[13:31:08] morning!
[13:44:51] morning milimetric, average_drifter
[13:44:52] morning drdee
[13:53:04] average_drifter, time to build a debian package?
[14:21:05] morning drdee
[14:21:10] yoyo
[14:21:10] squashing some commits into debianize now
[14:21:21] uhm we might have a small problem
[14:21:37] so basically we had a VERSION oneliner that would produce the next version
[14:21:53] uhm, the problem is that it collides with the current one
[14:21:53] yup
[14:21:55] not sure why that is
[14:22:33] are you guys having issues with gmail?
[14:23:11] works for me
[14:23:16] gmail I mean
[14:24:03] k
[14:26:09] quoting from david's email:
[14:26:09] "To at least pair this with some good news: I'm comfortable promising a full-stack Limn release before the December meeting, with all the new d3 goodness, new edit UI, and new options. We should also have bar and area chart-types available. If we're lucky, a world-geo map type, but santa only brings presents to good kids"
[14:27:34] milimetric ^^
[14:28:16] oh good
[14:28:16] when was this?
[14:28:54] I'm sorry if I was a space cadet and missed the email, am I cc-ed?
[14:32:28] drdee ^^
[14:50:32] phew, morning all, late morning on my half day :p
[14:50:32] guess i'll be here the rest of the day
[14:51:46] yo ottomata, have you installed the geoip libs on the cisco's?
[14:51:59] i am looking for the geoip db's on an01
[14:52:02] but can't find them
[14:53:02] /usr/share/GeoIP
[14:56:20] ty
[14:56:23] and about Hue........
[14:57:11] maybe roll back the change you made regarding the tasktracker
[14:58:54] hm, what change was that?
[15:02:24] drdee: can I change the automatic version-generating code to include the revision count?
[15:02:41] average_drifter, sure
[15:02:41] drdee: should we subscribe to semver.org?
[15:02:46] yes
[15:02:47] I mean to their versioning scheme
[15:02:50] yes
[15:03:04] ottomata, we set the task tracker thing to no value or something like that
[15:04:48] hey louisdang, i am having some issues with your page view script
[15:05:04] drdee, what is it?
[15:06:04] did we have a specific gist for the page view code?
[15:06:58] no, just email
[15:07:27] https://gist.github.com/747186677a8f7c4b92b5
[15:07:32] the error is on line 28
[15:07:57] column 56> Syntax error, unexpected symbol at or near ','
[15:08:14] also, i mavenized the whole kraken source code
[15:08:20] so you might want to resync with wmf-analytics/kraken
[15:08:24] I see, this isn't my newest version
[15:08:35] ohhh argg
[15:08:36] yes I've done that
[15:08:46] cool
[15:08:54] can you add your version to the gist?
[15:09:20] i also put your code in a new package: org.wikimedia.analytics.kraken.pig
[15:10:43] ok
[15:10:43] drdee, https://gist.github.com/1ae782d76288ed8e3c4a
[15:11:27] it looks like Pig doesn't like reusing variable names when you group by
[15:11:53] so renaming the relation fixed that
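A minimal sketch of the renaming fix louisdang describes, with hypothetical relation and field names (the actual script is in the gists above):

    -- Reusing one alias as both input and output of the GROUP is the
    -- kind of name reuse Pig chokes on:
    --   views = GROUP views BY language;
    -- Giving the grouped relation a fresh name avoids the collision:
    views   = LOAD 'sampled.log' AS (language:chararray, n:long);
    by_lang = GROUP views BY language;
    totals  = FOREACH by_lang GENERATE group AS language, SUM(views.n) AS total;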
[15:13:22] ottomata, did you install the geoip stuff on all cisco's?
[15:14:58] should be
[15:15:39] drdee, what does the time stamp look like again? Like: 2012-11-02T06:00Z?
[15:16:38] louisdang: 2012-01-25T18:41:25.356
[15:17:09] ottomata….
[15:17:09] weird, pig complains it cannot find the geoip db
[15:17:09] ok, the timestamp converter might not work
[15:17:43] drdee, are you using the akela GeoIP UDF?
[15:17:49] I had to modify it to get it to work with HDFS
[15:18:00] i am using your script :)
[15:18:27] org.wikimedia.analytics.kraken.pig.GeoIpLookup('/usr/share/GeoIP/GeoIP.dat');
[15:18:33] ok
[15:20:21] drdee, for the first argument in toMonth change that to: 'yyyy-MM-dd'T'HH:mm:ss.SSS' to make sure it works with your timestamp format
[15:21:53] drdee, "yyyy-MM-dd\'T'HH:mm:ss.SSS
[15:21:58] drdee, "yyyy-MM-dd\'T\'HH:mm:ss.SSS
[15:36:20] ok
[15:38:01] louisdang: java.lang.RuntimeException: could not instantiate 'org.wikimedia.analytics.kraken.pig.ConvertDateFormat' with arguments '[yyyy-MM-ddTHH:mm:ss.SSS, yyyy-MM]'
[15:38:27] it looks like it's missing the ' ' around the T
[15:41:29] got it, those ' need to be escaped else it won't work either
[15:41:42] ok
[15:41:48] i still have the java.io.FileNotFoundException: File does not exist: /usr/share/GeoIP/GeoIP.dat
[15:41:59] which is very strange
[15:42:05] is it in HDFS?
[15:42:07] because the file does exist, could it be a permission issue
[15:42:15] no
[15:42:17] it's local fs
[15:42:23] should it be on hdfs?
[15:42:37] if you're using mapreduce mode yeah
[15:43:03] or maybe putting file:// in front of it will use the local file
[15:43:20] since in mapreduce mode the default context is hdfs
[15:43:53] java.lang.IllegalArgumentException: Resource name must be relative
[15:43:59] when trying file://
[15:44:32] yeah I don't know exactly how to use a local file in mapreduce mode
[15:44:57] can you put it in hdfs and change the argument?
[15:50:46] the original akela script said to put it in hdfs too
[16:00:57] drdee, it looks like you have to modify the UDF to use the distributed cache to use the file locally: http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html and http://stackoverflow.com/questions/5106679/how-do-i-read-static-files-in-a-pig-udf
[16:07:05] i remember having that problem with the akela stuff when I was playing with it
[16:07:21] I had a really hard time getting it to load the GeoIP stuff properly
[16:07:52] and was getting really confused between local file load and distributed cache load
[16:07:52] yeah it's not easy :)
[16:07:52] but I tried tons of variations
[16:07:52] i think that's when I stopped working on it :p
[16:07:59] yeah I did too, modifying the UDF seemed to work for me
[16:08:12] as long as I had the dat file in HDFS
[16:08:57] but I think it could be inefficient in a real system if you have a bunch of map tasks accessing the same file
[16:09:46] i am not getting this
[16:09:46] so you can modify the UDF to use the distributed cache as it says here: http://ofps.oreilly.com/titles/9781449302641/writing_udfs.htm
[16:10:10] if i enter an absolute path it says Resource name must be relative
[16:12:01] can you load it into HDFS?
[16:12:07] the file is both on hdfs and local
[16:12:19] did you change the argument to the relative path on HDFS?
[16:12:34] say 'GeoIP.dat'
[16:13:54] or './GeoIP.dat'
[16:13:54] File does not exist: GeoIP.dat
[16:13:59] but how would it know where to find it?
[16:14:13] I just put it in my user folder
[16:14:17] user/louis/
[16:14:31] and used 'GeoIP.dat' and that seemed to work
[16:15:59] okay it's running, but this is ridiculous
[16:16:45] it assumes a relative path from your home folder on hdfs
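Putting the two fixes together, the relevant script lines end up looking something like this (a sketch; the jar name and UDF aliases are hypothetical, while the class names, the date patterns, and the relative 'GeoIP.dat' argument are the ones quoted above):

    -- the dat file has to be uploaded to the user's HDFS home folder first:
    --   hadoop fs -put /usr/share/GeoIP/GeoIP.dat GeoIP.dat
    REGISTER 'kraken.jar';  -- hypothetical jar name
    -- single quotes inside a Pig string literal need backslash escapes,
    -- so the literal T in the date pattern is written \'T\':
    DEFINE toMonth org.wikimedia.analytics.kraken.pig.ConvertDateFormat('yyyy-MM-dd\'T\'HH:mm:ss.SSS', 'yyyy-MM');
    -- a relative path is resolved against the HDFS home folder (e.g. /user/louis):
    DEFINE geoIp org.wikimedia.analytics.kraken.pig.GeoIpLookup('GeoIP.dat');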
[16:16:54] okay it's not running anymore
[16:16:55] job job_1351787518765_0040 has failed! Stop running all dependent jobs
[16:17:10] ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1351787518765_0040_m_000016_2 Info:Container killed by the ApplicationMaster.
[16:18:02] yeah I got that too, put languages.txt in HDFS and change the argument of ParseWikiUrl to 'languages.txt'
[16:18:30] thx
[16:18:47] and languages should be where?
[16:19:18] one sec, didn't work for me
[16:19:18] I put it in my user folder though, like GeoIP.dat
[16:31:05] drdee, there's a fix for this but it involves modifying ParseWikiUrl, can I send you the new code?
[16:31:21] just send a pull request ;)
[16:33:39] ok
[16:34:22] I get an error using maven: error: generics are not supported in -source 1.3
[16:37:57] when is that error happening?
[16:38:16] compiling
[16:40:47] are you compiling with java 7 again?
[16:41:54] louisdang: see v
[16:42:02] http://stackoverflow.com/questions/7597950/maven-error-generics-are-not-supported-in-source-1-3-i-am-using-1-6
[16:46:08] louisdang, does that solve the problem?
[16:46:24] no
[16:48:43] :)
[16:49:13] did you update the pom.xml?
[16:49:44] would just editing and saving it work?
[16:51:25] the pom you mean?
[16:51:34] yes that would work
[16:52:02] hold on i can fix it
[16:52:05] then you have to sync, okay?
[16:52:09] ok
[16:55:03] louisdang: done
[16:55:09] fetch the new pom.xml and try again
[17:00:35] haaaangout!
[17:00:36] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:02:48] working
[17:07:58] drdee: can you please explain this part from a version of the debianize.sh https://gist.github.com/ceb8145f99dec64356b0
[17:08:21] sure, give me 5 minutes
[17:14:57] I am late for no good reason.
[17:14:57] Thanks Muni.
[17:15:00] Are we still a-scrummin?
[17:15:47] just finished
[17:15:55] average_drifter i just replied to your gist question
[17:16:12] ah well.
[17:16:23] yesterday i did amazing things that i guess y'all will never know about
[17:16:41] but dschoon milimetric let's talk limn in https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:16:44] one sec
[17:16:47] now i am getting coffee
[17:16:49] k
[17:18:00] drdee: oh so you mean if the repo dir has a different name than the actual repo name
[17:18:10] drdee: so you want the repo name
[17:18:18] drdee: right?
[17:18:31] assuming that the repo name matches the package name, yes
[17:22:59] drdee, just ran puppet on an01
[17:23:08] buncha hive changes, eh?
[17:23:24] i hope you puppetized them :)
[17:23:56] nope
[17:24:01] but I got them from the changes diff
[17:24:06] so I can
[17:24:14] k
[17:28:48] yeah, drdee, i will puppetize these now, but go ahead and try your query
[17:28:49] in hue
[17:28:50] see if it works
[17:29:04] YES LET'S DO IT
[17:29:04] oh, q
[17:29:12] hive.intermediate.compression.codec
[17:29:19] hokay
[17:29:19] back
[17:29:19] org.apache.hadoop.io.compress.SnappyCodec
[17:29:20] do I need to puppetize snappy now?
[17:29:24] hiiiii dschoon!
[17:29:35] hola
[17:29:43] drdee, milimetric did you want to chat?
[17:29:51] i'm in that hangout
[17:29:51] ottomata, no
[17:29:51] k
[17:29:57] snappy is already built into hadoop
[17:29:59] ok
[17:30:23] it was just the snappy cli stuff that you were manually installing, right?
[17:30:30] yup
[17:33:27] OTTOMATA IT IS WORKING!!!!!
[17:33:38] yeehaawwwww
[17:33:44] thanks louisdang for his confs
[17:33:53] I used those to slowly check a few pieces til I found that
[17:34:25] cool
[17:34:30] what was wrong?
[17:39:07] hue gets really confused when mapred.job.tracker isn't set to a real url
[17:39:13] even though we aren't running a jobtracker (since we are using yarn)
[17:39:35] drdee and I thought we had tried this before, but we used a value of 'ignoreme' instead of a real url like you have
[17:39:35] i don't exactly know why
[17:39:47] but putting the real url there (even though there is no jobtracker) works
[17:40:04] I see
[17:40:20] drdee: https://github.com/wmf-analytics/kraken/pull/6
[17:42:00] drdee, do I need to create the hive_metrics database?
[17:42:16] yes please do
[17:42:30] did you create it manually?
[17:43:04] probably
[17:43:05] hm, actually, it doesn't exist
[17:43:06] hm
[18:08:56] drdee, can I test hive metrics somehow?
[18:09:10] just run a hive query
[18:09:24] it should store metrics in mysql
[18:14:24] louisdang: got a new version of the ParseWikiUrl function?
[18:17:27] drdee, yes, sent a pull request
[18:18:14] ok
[18:20:34] dschoon, milimetric, got kicked out again but i think we were done
[18:20:49] ok
[18:21:56] drdee, do I have to create the tables too?
[18:22:25] i thought that the jdbc string had an option like createTables=true
[18:22:33] i've seen create database, hmm
[18:23:30] have you actually seen metrics inserted yet? or are you just guessing? :p
[18:23:46] did you test the hive changes that you made?
[18:26:01] well we were still trying to get hue to work :)
[18:26:06] haha, ok
[18:26:22] louisdang: you can give hue a try again, it seems that ottomata has solved the problem
[18:26:34] were these changes that you made in an attempt to make hue work?
[18:26:37] should I keep these
[18:26:38] ?
[18:26:58] louisdang: but the pig script is still giving an error: Unable to recreate exception from backed error: AttemptID:attempt_1351787518765_0048_m_000035_1 Info:Container killed by the ApplicationMaster.
[18:27:05] ottomata, what changes?
[18:27:21] hm
[18:27:21] https://gist.github.com/4002920
[18:27:26] drdee, is there anything more in the log?
[18:28:04] ottomata, no, those were just hive optimizations and customization options
[18:28:28] should we keep them?
[18:28:47] yes please :)
[18:29:00] i spent the whole weekend reading up on them
[18:29:16] ok cool
[18:29:53] i'm fine with most of them, not so sure about the stats ones… just cause I haven't gotten them to actually do anything yet
[18:29:53] and i'm not having much google luck
[18:30:52] dschoon: pushed
[18:31:41] cool. will check it out
[18:36:07] louisdang: ERROR 2078: Caught error from UDF: org.wikimedia.analytics.kraken.pig.ParseWikiUrl, Out of bounds access [String index out of range: 3]
[18:36:26] yo doooods, i need to get some food, i'll be back on in like an hour
[18:36:38] happy snacking!
[18:37:08] me snacking too
[18:38:54] drdee, I see the problem, fixing it
[18:40:12] thx
[18:43:23] ottomata: any word on progress on the dells?
[18:46:45] milimetric: you need to check in some files
[18:46:45] 404 on jquery.custom
[18:47:18] my bad - pushed
[18:47:26] forgot to git add .
[18:48:58] s'all good.
[18:49:33] okay, let's see what we can get done.
[18:50:37] drdee, try this: https://github.com/wmf-analytics/kraken/pull/7
[19:53:19] baaaaack, things are far in a beach town :/
[19:57:45] dschoon, you around?
[20:00:23] meeting
[20:00:23] brb
[20:00:55] ok
[20:03:39] louisdang: trying again
[20:03:48] alright
[20:04:20] seems to look good!
[20:04:30] 48 maps completed
[20:04:36] nice
[20:04:46] yes very nice and all thanks to you!
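For orientation, the pieces debugged above — the escaped timestamp pattern, the lookup files resolved relative to the HDFS home folder, and distinct relation names around the GROUP — fit together roughly like this; a sketch only, with a hypothetical input schema and aliases (the real script lives in wmf-analytics/kraken):

    REGISTER 'kraken.jar';  -- hypothetical jar name
    DEFINE toMonth  org.wikimedia.analytics.kraken.pig.ConvertDateFormat('yyyy-MM-dd\'T\'HH:mm:ss.SSS', 'yyyy-MM');
    DEFINE parseUrl org.wikimedia.analytics.kraken.pig.ParseWikiUrl('languages.txt');
    DEFINE geoIp    org.wikimedia.analytics.kraken.pig.GeoIpLookup('GeoIP.dat');

    -- hypothetical three-field log layout, for illustration only:
    logs    = LOAD 'example.log' AS (ip:chararray, timestamp:chararray, url:chararray);
    views   = FOREACH logs GENERATE toMonth(timestamp) AS month,
                                    parseUrl(url)      AS project,
                                    geoIp(ip)          AS country;
    grouped = GROUP views BY (project, month, country);
    counts  = FOREACH grouped GENERATE FLATTEN(group) AS (project, month, country),
                                       COUNT(views)   AS pageviews;
    STORE counts INTO 'pageview_counts';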
[20:04:49] have you given hue a spin?
[20:04:53] it should be working now
[20:05:02] yes the hive query works
[20:05:51] do you think you can fix the pig shell on hue?
[20:07:31] that should work
[20:07:36] what is the error you get?
[20:08:03] 2012-11-02 20:07:49,863 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-cdh4.1.1 (rexported) compiled Oct 16 2012, 12:27:24
[20:08:03] 2012-11-02 20:07:49,863 [main] INFO org.apache.pig.Main - Logging error messages to: /dev/null
[20:08:03] 2012-11-02 20:07:49,917 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. /bin/false/.pig_history (Not a directory)
[20:08:03] Details at logfile: /dev/null
[20:08:25] grumble
[20:08:59] ottomata, can you help louisdang troubleshoot this pig shell error in hue?
[20:11:26] ok
[20:11:51] oh interesting
[20:11:52] hm, ok
[20:18:25] louisdang, what are you running?
[20:19:07] right now?
[20:19:21] a hive script through beeswax
[20:19:36] you are having trouble with pig shell, right?
[20:19:49] i want to reproduce
[20:19:58] ok
[20:20:39] i'm using pig through http://analytics1001.wikimedia.org:8888/shell/create?keyName=pig
[20:21:15] right
[20:21:29] that happens to you when you just load the shell?
[20:22:43] yes
[20:23:30] hmm
[20:23:42] that's weird
[20:23:42] ohhhhhh
[20:23:42] interesting
[20:23:42] i know
[20:24:40] try now
[20:27:22] ottomata, I get Error creating temp dir in hadoop.tmp.dir /tmp/hadoop-louisdang due to Permission denied now
[20:28:03] hmm
[20:28:54] and now?
[20:29:18] works now
[20:29:27] yay cool
[20:30:11] thanks ottomata
[20:31:37] average_drifter: ping
[20:32:39] brb
[20:36:35] drdee: pong
[20:36:47] drdee: uhm, I'm struggling with the debianization
[20:36:54] what's the problem?
[20:37:37] merged debianize.sh in udp-filters with the one in debianize/
[20:39:01] k
[20:39:07] i spent a bit of time making the debian/ directory work, and I wasn't using debianize.sh to do that
[20:39:07] i don't think
[20:39:10] but maybe this was too long ago
[20:40:49] yeah i think so, average_drifter and i worked on this a couple of weeks ago
[20:41:23] drdee, i've got all those changes to hive-site.xml puppetized except for the stats ones
[20:41:30] ok
[20:41:31] well, i mean
[20:41:45] they are usable in puppet, but off by default
[20:41:48] if we figure out what that is and how to make it work, we can turn it on
[20:42:00] i will try to get the metrics stuff to work in the coming days
[20:42:37] ok
[20:49:06] drdee, did you fix feedback.pig?
[20:49:29] i first want to run the page views stuff
[20:49:36] but you can try it yourself now :)
[20:49:48] just upload the files using hue to your home folder and give it a spin
[20:50:45] ok
[20:50:57] drdee, I need permission to access the file browser
[20:51:08] 1 sec
[20:51:50] try again
[20:53:07] drdee, works
[20:56:45] louisdang: geocoding does not seem to work, i only see either '--' as output or a couple of spaces
[20:58:55] ok let me try it on hue
[20:59:11] since the output is fine using example.log on my computer
[20:59:27] are the month, languages, isMobile fine?
[20:59:44] yes, those look good
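One way to isolate a problem like this is to run a handful of addresses through the lookup UDF by itself, away from the rest of the script (a sketch; ips.txt is a hypothetical file with one IP address per line, uploaded to the HDFS home folder):

    REGISTER 'kraken.jar';  -- hypothetical jar name
    DEFINE geoIp org.wikimedia.analytics.kraken.pig.GeoIpLookup('GeoIP.dat');
    ips     = LOAD 'ips.txt' AS (ip:chararray);
    located = FOREACH ips GENERATE ip, geoIp(ip) AS country;
    DUMP located;

If the dump also shows '--', the problem is in the dat file or the lookup itself rather than in the page view script.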
[21:00:02] drdee, it could be that we're using different dat files. I'm using GeoIPCity.dat from maxmind's website
[21:00:03] where did you get yours?
[21:00:50] we should also sort the output: project, month, site, country
[21:01:07] project is the language?
[21:01:11] yes
[21:01:26] we have a subscription, i used the general GeoIP.dat
[21:01:43] ok
[21:01:45] i assumed that they had the same interface
[21:02:02] yeah I think they would
[21:02:03] can it work with both types of files?
[21:05:41] I don't know. Should we try the newer GeoIP API?
[21:07:47] what do you mean?
[21:11:13] drdee, right now the count.pig script is registering: REGISTER 'geoip-1.2.5.jar'
[21:11:29] but there's a geoip-1.2.8 on maxmind's website
[21:12:01] i don't think it really matters
[21:25:00] dschoon, still meeting?
[21:28:41] drdee, can you try running getLocation() on an address using the GeoIP API and see what you get?
[21:31:58] yes, ottomata
[21:34:17] hi, ok, i'm responding to your email
[21:34:18] about varnish log format
[21:38:48] drdee: as jeff_green went home for the day, I was hoping I could get one of y'all to set up a 1:1 gather on meta for the two strings "RecordImpression?" and "BannerRandom?"
[21:39:24] it'll be a firehose... but I only need it to run for, say, 5 minutes
[21:41:19] okay, i need to eat a food
[21:41:26] (meeting over) i will be back soon.
[21:54:58] back
[21:55:16] ottomata left?
[21:55:33] seems so
[22:02:40] drdee, I can't tell what's wrong with the geocoding right now
[22:03:40] ok, i'll look into it as well
[22:04:26] to check whether the problem is with the API or not, you can try running some of the addresses through getLocation()
[22:18:02] drdee, I ran the hive query you originally asked for btw
[22:25:28] cool
[22:56:24] all righty folks, enjoy your weekend!
[22:56:32] i am outta here
[22:57:08] see ya man
[22:57:09] you too