[13:02:57] morning average_drifter
[13:05:49] morning drdee
[13:06:34] yoyo
[13:11:00] I was at a hackathon this weekend.
[13:11:08] I teamed up with some guys to make a Perl app
[13:11:24] where? in Bucharest?
[13:11:27] yeah
[13:11:43] cool! who organized the hackathon?
[13:11:47] Teamnet
[13:12:27] The Perl web app was supposed to be something where you put a newsletter in HTML and hit submit, the newsletter gets sent to some predefined Y!/Gmail/AOL/Live e-mails, and then it would start, through Selenium, a browser on a Windows machine and check the inboxes of those e-mails
[13:12:50] and then it would take screenshots of the newsletter as seen through the eyes of a potential subscriber
[13:13:13] so a guy wrote the part of the thing that was doing the selenium stuff
[13:13:36] I collaborated with them and gave some guidance, and wrote a mock in Mojolicious that would be the web front-end
[13:14:01] nice! did you finish it?
[13:14:17] and when I tried to start his selenium Perl script with `perl script.pl &`, Mojolicious's morbo would say "Can't access port 3000, already in use", and then time ran out
[13:14:23] we didn't finish, we didn't have a working app
[13:14:24] it was 24h
[13:14:54] 4 men and 24h .. well.. maybe next time
[13:15:33] There were 200 contestants, of which ~6 girls
[13:16:03] But what our team wanted to do was basically litmus.com
[13:16:49] i've got two nice tasks scheduled for this week
[13:16:55] oh cool
[13:17:04] tell me more please
[13:17:35] 1) run / create a separate instance of wikistats on stat1 using the edits sampled files as input
[13:18:21] it should run as ordinary wikistats but save the output somewhere else, then ottomata can help get wikistats.wikimedia.org/editors/ online
[13:18:28] and we would have stats only for editors
[13:19:08] i think you already wrote the functionality to run wikistats with a config specified on the command line
[13:19:15] yes
[13:19:23] using it for development locally
[13:19:30] so this should be quite straightforward and maybe we can start working on that first today
[13:19:39] yep
[13:19:55] 2) deprecate the shell-out stuff for geocoding
[13:20:11] we run the existing sampled files through udp-filter and append the country to each line
[13:20:25] this would lead to a massive performance improvement
[13:20:56] cool
[13:20:56] if we do this then we need one more new feature for udp-filters:
[13:21:16] some log lines have a value for the x-forwarded-for header
[13:21:48] if that's the case then the ip for geocoding should be fetched from this field and not from the regular ip field
[13:22:00] and that ip address should be used for geocoding
[13:22:03] does that make sense?
[13:22:13] yes
[13:22:28] in the x-forwarded-for, there can sometimes be 1, 2 or even 3 ip addresses.
[13:22:37] and which of them do we geocode?
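[editor's note] For reference: by the de facto X-Forwarded-For convention the header reads "client, proxy1, proxy2" — each proxy appends the peer it received the request from, so the original client is the first entry (the "last field" guess later in this log is the part drdee says needs confirming). A minimal Python sketch of the pick; names and the fallback parameter are illustrative only, since udp-filter itself is C and isn't reproduced here:

    import re

    # Plausible-IPv4 check; purely illustrative, not udp-filter's parser.
    IPV4 = re.compile(r'^\d{1,3}(?:\.\d{1,3}){3}$')

    def ip_to_geocode(x_forwarded_for, socket_ip):
        """Pick the address to geocode: first syntactically valid entry in
        X-Forwarded-For, falling back to the socket-level ip otherwise."""
        for part in x_forwarded_for.split(','):
            candidate = part.strip()
            if IPV4.match(candidate):
                return candidate
        return socket_ip

    # e.g. ip_to_geocode('203.0.113.7, 10.64.0.1', '91.198.174.192') -> '203.0.113.7'

In production you would likely also skip private/reserved ranges, since X-Forwarded-For values are client-supplied and spoofable.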
[13:28:33] good morning :)
[13:28:56] good morning from impending doom land!
[13:32:08] ottomata, moooorning
[13:32:17] are you leaving NY?
[13:32:32] good morning milimetric?
[13:32:36] no way man!
[13:32:46] yeah, it's still morning-ish
[13:32:46] okay!
[13:32:55] psh, so far so good. bone dry in the basement
[13:33:02] Irene filled us up about a foot last year
[13:33:08] the panda pit invited me to a zone A evacuation 3-day-long slumber party
[13:33:09] all buckled up? enough food / water / blankets?
[13:33:15] thinking of going…heheh
[13:33:19] :D
[13:33:30] i think i have no concept of the potential consequences
[13:33:31] i got about 20 pounds of rice, we're good for like a year
[13:33:44] ha, i got a latte from the cafe on my street
[13:33:45] i think i will get like 1% of what you guys are going to get
[13:33:48] and half a loaf of apple cake I baked last night
[13:34:17] yeah, baking is key, we made a couple loaves of bread. Smell of yeast is comforting
[13:34:39] what time is landfall?
[13:34:44] i think tonight
[13:36:03] http://www.nytimes.com/interactive/2012/10/26/us/hurricane-sandy-map.html?partner=rss&emc=rss
[13:36:20] heh, was just about to paste that
[13:36:27] d3!
[13:36:31] it was supposed to come closer to toronto (of course at much much less strength)
[13:36:45] but it seems to go back east sooner
[13:36:47] it's really funny how it's heading directly for Philly
[13:37:13] ahhh it's going to miss us!
[13:37:26] i think milimetric is going to have front row seats
[13:37:26] max winds have been upgraded quite a bit, that's a little troubling
[13:37:40] it was only supposed to be 60-70 last night, now it's 80+
[13:38:09] did you close off your windows?
[13:38:11] i hope i can make it to the standup so you guys can see the tree dance behind me :)
[13:38:18] i mean with wood?
[13:38:39] we don't have any shutters, no; if it gets bad we'll just go into the basement
[13:41:23] what? brooklyn's flooded already ottomata? https://twitter.com/greenpainting/status/262894932648402944/photo/1/large
[13:41:38] haha, in red hook, amazing
[13:44:18] man, i dunno if I can make it to the slumber party!
[13:44:30] haha, the guys at the panda pit have canoes! I wonder if they'll use them
[14:05:07] average_drifter: you wanna first start with running wikistats for just editors?
[14:06:19] ottomata, you wanna help me debug some more hadoop permission issues?
[14:06:26] (basically still the same problem as last week)
[14:12:25] drdee: yeah
[14:12:45] drdee: but I have some pending unclosed reviews in udp-filters, so could I do those first?
[14:12:52] also
[14:12:54] 15:22 < drdee> in the x-forwarded for, there can be sometimes 1, 2 or even 3 ip addresses.
[14:12:57] 15:22 < average_drifter> and which of them do we geocode ?
[14:13:01] yes let's finish those first
[14:13:41] x-forwarded, we have to look up the exact documentation of the field, i think it's the last field that is the original ip that we need
[14:13:48] but we need to confirm that
[14:15:12] drdee, sorry, yeah certainly
[14:15:23] i'm going to change locations real quick, then I am all yours, be back on in 10ish
[14:15:40] okidoki
[14:16:08] drdee, ottomata, if we wanted a test site for the reportcard how would we get the domain name set up?
[14:35:27] hokay hey drdee what can we do?
[14:37:41] right
[14:38:00] milimetric: i think we already have a dev-reportcard.wmflabs.org running
[14:38:02] on kripke
[14:38:12] oh, cool
[14:38:15] ottomata, so it's about not being able to run hive queries in hue
[14:38:27] which leads to a FileNotFound error
[14:38:45] i guess we need to study the error log in more detail...
[14:39:13] i'll log in to hue as the hdfs user and see if that works
[14:40:18] ok, how do I reproduce?
[14:40:20] what do I do?
[14:40:26] go to hue
[14:40:32] go to beeswax
[14:40:41] clone 'my saved query'
[14:40:43] run it
[14:41:12] clone you can do in 'saved queries'
[14:41:55] i am also running a pig job which is almost done
[14:42:01] so it might take a couple of minutes
[14:44:09] is it running for you?
[14:45:07] i get Caused by: java.io.FileNotFoundException: /tmp/hue/hive_2012-10-29_07-41-30_875_8893723402349132014/-mr-10001/45dde67b-8a73-4ab4-aff6-efd5416a9870 (No such file or directory)
[14:46:39] this thread might be the same issue:
[14:46:40] https://groups.google.com/a/cloudera.org/group/hue-user/tree/browse_frm/month/2012-06/208ca4e57906eae7?rnum=11&lnk=nl
[14:46:52] (i just did a quick scan)
[14:49:12] ah, here is some info
[14:49:21] the path it is listing is local on the namenode, not in hdfs
[14:51:22] conf issue?
[14:51:30] no, i don't think so
[14:51:35] not sure yet
[14:51:40] permissions look fine i think
[14:51:50] it does create that directory, so it had the ability to read/write
[14:51:58] dunno what that file is supposed to be
[14:52:58] me neither
[14:55:34] hmmmmmmmmmmmm actually i'm not sure about that
[14:55:58] there is /tmp/hue/hue_20121029074141_1177770c-06d0-4799-b3ab-0421fd35ade9.log on the local fs
[14:59:08] you have a different log file that you are debugging
[14:59:12] yeah that is different per job
[14:59:23] i mean *different* as in different semantics
[14:59:53] mine seems to be a hive error
[14:59:57] so maybe the hive user cannot write in /tmp/hue
[15:00:08] hm
[15:00:22] /tmp/hue/hive_2012-10-29_07-41-30_875_8893723402349132014/-mr-10001/45dde67b-8a73-4ab4-aff6-efd5416a9870
[15:00:30] right
[15:00:41] in hdfs
[15:01:31] but it's weird though
[15:02:21] what if we make the hue and hive system accounts part of the hadoop group
[15:03:50] i think i know it
[15:03:50] it's a bug
[15:03:56] this path exists: /tmp/hue/hive_2012-10-29_07-41-30_875_8893723402349132014/
[15:04:12] yeah, shouldn't matter, the perms are fine, I'm happy to chmod 777 them to try
[15:04:20] yeah, i'm sure it does
[15:04:20] then the missing part is -mr-10001
[15:04:21] because the jobconf.xml is created
[15:04:25] don't chmod 777
[15:04:35] let me explain
[15:04:35] so -mr-10001 is missing
[15:04:45] -local-10002 and -local-10003 are present on hdfs
[15:05:06] ok
[15:05:09] the reason is this: the job is so small that Hive is falling back to local mode instead of distributed mode
[15:05:28] that's why the folders are called local
[15:05:35] oh hm
[15:05:39] but somehow it still expects mr folders (distributed mode)
[15:05:54] the way to check this is to load more data in revision and run the query again
[15:06:04] if it works then it's a bug in hive
[15:06:48] small jobs run much faster in local mode so it's a way to optimize performance
[15:06:54] there might be config values
[15:06:57] reading this
[15:06:58] https://issues.apache.org/jira/browse/HIVE-1408
[15:07:05] this is my theory :) counter theories ? :D
[15:07:34] right
[15:07:38] set hive.exec.mode.local.auto=false
[15:07:40] and run the query again
[15:07:52] maybe you can even specify that in the query itself
[15:07:54] let me try
[15:08:05] you can do it in the Hive cli
[15:08:16] btw, it would be good to know if this fails in hive w/o hue
[15:08:31] note that this feature is disabled by default.
[15:08:36] i think it's enabled
[15:09:44] nope
[15:09:44] hive.exec.mode.local.auto=false
[15:09:45] hive.exec.mode.local.auto.input.files.max=4
[15:09:45] hive.exec.mode.local.auto.inputbytes.max=134217728
[15:10:12] if you run the query it says:
[15:10:12] Job running in-process (local Hadoop)
[15:10:56] hmmm
[15:11:00] interesting, maybe hue sets that?
[15:11:12] try explicitly turning it off in your query
[15:11:39] can you run multiple statements in hue?
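[editor's note] For context on the "local Hadoop" messages: Hive's auto local mode (HIVE-1408) runs a job in-process when its input looks small enough. A rough Python sketch of the decision under the settings pasted above — the real check lives in Hive's query planner, so treat the exact conditions as an approximation:

    # Approximate shape of Hive's auto local-mode check (HIVE-1408).
    LOCAL_AUTO = False            # hive.exec.mode.local.auto
    MAX_FILES = 4                 # hive.exec.mode.local.auto.input.files.max
    MAX_BYTES = 134217728         # hive.exec.mode.local.auto.inputbytes.max (128 MB)

    def runs_in_process(n_input_files, input_bytes, n_reducers):
        """True when Hive would print 'Job running in-process (local Hadoop)'."""
        return (LOCAL_AUTO
                and n_input_files <= MAX_FILES
                and input_bytes <= MAX_BYTES
                and n_reducers <= 1)

With auto=false the cli should never pick local mode, which is why the message persisting below suggests something else — the YARN detection issue found later — was overriding the setting.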
[15:12:54] in hive, yes, you can
[15:13:02] but not in hue
[15:13:02] yeah
[15:13:10] so i just tried your query in the hive cli
[15:13:13] w/o setting that
[15:13:14] it says
[15:13:15] Job running in-process (local Hadoop)
[15:13:23] then with setting auto=false
[15:13:23] still:
[15:13:23] Job running in-process (local Hadoop)
[15:15:08] but, it works in hive, but not in hue
[15:15:11] either way
[15:15:15] so something is wrong with hue
[15:16:51] mmmm
[15:17:18] i am trying this in hive as the hue user
[15:17:18] and it works
[15:17:20] sudo -u hue hive
[15:18:11] the tmp dirs that hive uses look way different than hue's
[15:18:28] /tmp/hadoop-hue/mapred/local/ ...
[15:26:09] drdee, I know someone told me something about kripke at some point but how do I find out more info without bugging you guys? How do you guys know what wiki page to look up? Just save bookmarks?
[15:26:38] sort of :D
[15:28:29] ottomata, if we just copy the revision file a couple of times on hdfs and then try again...
[15:28:59] go right ahead :) i dunno if it will make a diff, but worth a try
[15:29:47] although there are 70 files in that folder
[15:30:03] ok so where's kripke?
[15:30:29] wait
[15:30:37] hey, milimetric
[15:30:42] kripke is in labs
[15:30:42] on labs
[15:30:57] kripke.pmtpa.wmflabs
[15:30:59] ottomata, check http://analytics1001.eqiad.wmnet:8888/filebrowser/view/user/hive/warehouse/revision#
[15:31:02] cool, thanks
[15:31:11] so it sees only 1 file
[15:31:11] but there are 70
[15:31:34] ok wait i just have to move them
[15:31:35] hold on
[15:32:09] done
[15:32:15] let's run the query again
[15:33:01] crap "Job running in-process (local Hadoop)"
[15:33:08] is hue set to run in local mode or something?
[15:37:33] i dunnooooo, hive says that from the cli too
[15:41:10] this seems to be the exact same problem
[15:41:11] https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/gHVq9C5H6RE
[15:42:19] try adding this to hive-site.xml
[15:42:19] <property>
[15:42:21] <name>mapreduce.jobtracker.address</name>
[15:42:22] <value>ignorethis</value>
[15:42:22] </property>
[15:42:28] and restart hive
[15:42:43] this might be exactly a YARN issue
[15:45:05] ottomata, are you adjusting hive-site.xml?
[15:47:03] no, but will, i was reading that page
[15:47:47] k
[15:52:35] it seems to be running now!
[15:52:46] so hard to read the log in hue with the screen refreshing all the time
[15:53:43] super annoying
[15:53:55] can't they just do a simple ajax call
[15:54:12] but the fix worked?
[15:54:40] i've got some config changes i would like to make to hive today
[15:54:58] so if i do that then we have a whole bunch to puppetize this afternoon, sounds good?
[15:56:37] yes! your log looks good
[15:56:40] WHIIHHHHAAAAAA
[15:57:16] ottomata ^^
[15:57:51] yeah that sounds good
[15:58:09] dunno what the job is doing though
[15:58:37] it's just running a count query
[16:03:37] i mean, it's taking a while
[16:08:38] i think there were still two jobs running that i started
[16:08:44] they should probably be killed
[16:09:32] i only see the one job running in the application list
[16:10:19] drdee, just msg'ed you
[16:23:53] ottomata, can you create a table 'hive_metrics' in mysql for the hive user?
[16:25:28] sure
[16:25:35] should that be puppetized on install?
[16:25:39] i mean, you can do this too :)
[16:26:20] is this part of hive_metastore?
[16:27:20] ottomata, no i can't anymore because you took away my root account on mysql :D
[16:30:23] you have sudo though :)
[16:32:28] true :)
[16:32:36] is your hive query still running?
[16:42:53] drdee, it is just going
[16:45:43] ottomata, there is a hadoop job in a limbo state but I can't kill it
[16:46:23] mine?
[16:46:25] i will kill it
[16:46:38] oh yeah, the cli isn't really responding
[16:46:38] hm
[16:46:58] it's one of the hue jobs we started this morning
[16:47:16] oh
[16:47:27] this one, right?
[16:47:27] http://analytics1001.eqiad.wmnet:8088/cluster/app/application_1351260490464_0133
[16:47:37] this is the one I ran after I changed that hive-site setting
[16:47:37] it's still going
[16:47:40] yes
[16:47:54] i don't think it's running well
[16:48:06] there is no tracker url
[16:49:52] yeah
[16:50:06] i think setting the... hehe
[16:50:06] hehe
[16:50:10] that's because we set this:
[16:50:10] mapreduce.jobtracker.address
[16:50:10] ignorethis
[16:50:20] hmmmmmhmmmmmmmmmm
[16:50:35] that's a very bad trade-off
[16:50:44] maybe hue was trying to run MRv1 jobs
[16:50:49] still, that job should not run for one hour
[16:50:54] no, it's not working
[16:50:55] for sure
[16:52:51] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[16:57:04] ottomata, i did update the hive-site.xml so feel free to puppetize it, although we should test saving the metrics to mysql
[16:57:24] ok, i like how you write down the changes I need to make in asana
[16:57:40] ok, i killed the proc and restarted the resource manager and hive-server2
[16:58:29] i still don't think hue is running right, maybe we need to go back to mrv1
[16:58:35] which would suck
[16:58:44] i think hue is trying to submit mrv1 jobs
[16:58:44] maybe we can change that
[16:59:14] but i read hue.ini (/etc/hue/hue.ini)
[16:59:17] and there it just talks about YARN
[16:59:29] would upgrading to CDH4.1 fix it?
[16:59:49] ha, maybe, that is on my todo
[17:00:00] it's also a great test for puppet
[17:00:04] oh, standup time, lemme get headphones
[17:00:13] k
[17:00:27] average_drifter: you wanna join https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:00:33] erosen, milimetric ^^
[17:27:24] oy.
[17:27:52] when i sent that email i was thinking, "i will wildly overestimate, because Planning Fallacy. so i will say 10:30, and surely be out of this line by 10."
[17:27:54] MEANWHILE, 10:30 ROLLS AROUND
[17:28:24] anyway. here i am, milimetric drdee ottomata
[17:28:31] should i be needed for anything :)
[17:28:46] morning
[17:29:08] morning
[17:30:16] ottomata: what're you workin on? anything i can help with?
[17:30:30] otherwise i am going to finish figuring out what's wrong with my avro script
[17:30:55] i'm working on creating a puppet hdfs file provider
[17:31:10] dschoon, i added some more comments to the avro etherpad doc
[17:31:15] awesome
[17:31:26] i'm going to toss that up on the wiki once i'm done
[17:31:32] i wanna make sure the schemas work, hence the avro test
[17:32:11] totally
[17:35:11] ottomata, two quick questions:
[17:35:17] 1) what is the status of the two cisco machines?
[17:35:26] 2) how is ganglia coming along?
[17:35:56] both of those are stuck on me needing ops help
[17:36:11] people help me for a while, get stumped, and then move on to other things
[17:36:11] until I poke them again
[17:36:11] also, i had a brief conversation with kelly (from legal) in the hall on friday, just to confirm the privacy policy discussion was on hiatus until things settled down for them
[17:36:34] she said that they hoped to pick that up in maybe a week or two, so we ought to get to work revising the draft.
[17:40:27] ok, although i see our role primarily as giving feedback, not revising the draft ourselves
[17:40:41] right, yes.
[17:40:51] but we were asked to help with the non-legal language
[17:41:10] as the document functions both to explain things to the community and to lay things out legally.
[17:41:13] sure, we can wordsmith a couple of sentences
[17:41:26] and also to make sure things make sense technically
[17:41:27] obv
[17:41:30] anyway
[17:41:46] just a reminder so that doesn't fall off the map. i plan on trying to get to it at the end of the week
[18:30:33] hey drdee, did you want to jump on the meeting with jmorgan?
[18:30:47] yup
[19:22:17] python unicode support, i hate you so much.
[19:51:33] drdee: got log2avro.py to work. the cache servers seem to have a rather liberal idea of what their output format looks like.
[19:52:02] you mean our cache servers?
[19:52:02] i've been seeing '-' for just about every field at some point, from http_method to response_size to response_time
[19:52:05] yes.
[19:52:17] this is what i have been trying to fix for months now
[19:52:25] it's like whack-a-mole
[19:52:31] yes.
[19:52:31] you fix one issue
[19:52:36] the next one pops up
[19:52:59] this is also *why* we need to move to a tab separator before drinking from the firehose
[19:53:02] else it will just bork
[19:53:28] i have my script replacing invalid numeric values with -1.
[19:53:45] is it on github?
[19:53:51] actually, i found that if you split on ' ' and limit to 13 splits, everything lines up.
[19:53:57] no, the first successful run is still going.
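[editor's note] A minimal sketch of the parse just described: split on single spaces capped at 13 splits (14 columns), so spaces inside the final field can't shift columns, and coerce '-' placeholders in numeric columns to -1. The field names and positions below are assumptions for illustration; log2avro.py itself isn't reproduced here:

    # Sketch of the '-'-tolerant parse: split(' ', 13) yields at most 14 fields.
    NUMERIC = {'sequence', 'timestamp', 'response_time',
               'response_status', 'response_size'}
    FIELDS = ['hostname', 'sequence', 'timestamp', 'response_time', 'ip',
              'response_status', 'response_size', 'http_method', 'url',
              'peer', 'mime_type', 'referer', 'x_forwarded_for', 'user_agent']

    def parse_line(line):
        parts = line.rstrip('\n').split(' ', 13)   # at most 14 columns
        record = dict(zip(FIELDS, parts))
        for key in NUMERIC & set(record):
            try:
                record[key] = int(float(record[key]))
            except ValueError:                     # '-' or other junk -> -1
                record[key] = -1
        return record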
[19:54:39] looks like it should be done pretty soon
[19:54:41] maybe.
[19:54:55] I'm going to go out for 1h, bbl
[19:55:54] *grumble*
[19:56:04] i should have run it with pypy. i'd probably be done by now.
[19:58:27] i think it's going to take at least another half an hour :/
[19:58:28] brb
[19:58:32] lunch time, i think.
[21:11:06] dschoon you around?
[21:14:29] yarhg arygh arygh arygh
[21:14:31] my puppet stuff does not work!
[21:14:36] i have tried 3 different ways
[21:14:43] and the best way should work!
[21:14:49] but i dunno how to make it woorrrrkrkkkk
[21:14:57] and documentation is very sparse here
[21:14:59] super yargh
[21:15:17] in better news, leslie just helped me figure this out:
[21:15:18] http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[21:15:20] AH
[21:15:20] no, they are down
[21:22:22] :)
[21:22:25] i mean :(
[21:22:59] yeah it stops working after a few minutes, don't know why
[21:23:04] we thought we had it back up
[21:23:04] dunno now
[21:26:48] hihi
[21:26:54] milimetric: back
[21:27:10] so, avro
[21:27:17] at least my experiment was a success.
[21:27:56] k, lemme know when you're happy with your avro work
[21:28:00] we can deploy test-reportcard
[21:28:07] word.
[21:29:32] the difference between the avro file and the plaintext is about 1.7%
[21:29:35] which is fantastic.
[21:30:15] the differences are:
[21:30:50] - the numeric fields (timestamp, IP, response_size, response_time, response_status) are encoded as such
[21:31:05] - the http_method field is encoded as an enum
[21:31:30] - every record gains 4 bytes for the product code
[21:31:48] - several empty fields are included (uid, ua_flags, carrier, metadata, tags)
[21:32:45] so even despite all that overhead for future data, the numeric conversion saves a huge amount -- almost 10MB on a 350MB logfile
[21:33:18] i'll check all this crap into the kraken repo after i clean it up
[21:33:43] cool, please also have a look at some of the suggestions i made in the avro doc
[21:33:55] checking out now, might be back later tonight
[21:33:58] right. looking now.
[21:38:27] my remarks were particularly about the referer, user agent string and timestamp
[21:57:44] drdee: i replied to the timestamp thread a bunch of times over the weekend.
[21:57:45] did you not get those messages?
[21:58:55] (The same comments applied to storing the IP address as an int or byte[4].)
[22:11:09] goddamn pypy is fast.
[22:11:56] running my avro script using python 2.7 took 6x longer
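[editor's note] To make the size numbers above concrete: Avro writes int/long fields as zigzag varints and a byte[4] IP as raw bytes, so the numeric encoding alone accounts for most of the ~10MB saved. A small stdlib-only Python illustration — the actual kraken schema isn't reproduced here, and the sample values are just plausible stand-ins:

    import socket

    def avro_int_len(n):
        """Bytes Avro spends encoding integer n (zigzag map, then 7-bit varint)."""
        z = (n << 1) ^ (n >> 63)      # zigzag: signed -> unsigned
        length = 1
        while z > 0x7F:               # 7 payload bits per varint byte
            z >>= 7
            length += 1
        return length

    ts = 1351260490                   # a late-Oct-2012 unix timestamp, seconds
    print(avro_int_len(ts), 'bytes vs', len(str(ts)), 'chars as text')      # 5 vs 10

    ip = '208.80.154.224'
    print(len(socket.inet_aton(ip)), 'bytes vs', len(ip), 'chars as text')  # 4 vs 14

Roughly halving every numeric column this way, across millions of records, is consistent with the ~10MB / 350MB saving reported above even with the extra empty fields and the 4-byte product code.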