[01:15:29] drdee: when is the august report card coming out? http://reportcard.wmflabs.org/ [02:03:26] the august report is out, except it still says july [02:12:21] ok - https://wikimediafoundation.org/w/index.php?title=Template:Reports-en&diff=84365&oldid=83946 [02:12:50] (although strictly speaking http://reportcard.wmflabs.org/graphs/pageviews_mobile_target and http://reportcard.wmflabs.org/graphs/active_editors_target are still not quite up to date ;) [02:45:47] you have to poke howie for those numbers, we don't have them [12:38:39] morning average_drifter [12:45:36] hey drdee [12:46:41] yoyoyoyo [12:48:44] one more small feature we need: collector should have a command line option to specify where to store the output :( [12:49:32] alright [12:49:33] (applies primarily to where to store the berkeley db's) [12:50:30] is the git tutorial today? [12:50:39] yes [12:50:42] where? [12:50:56] #git-gerrit [12:51:03] now? [12:51:28] http://bit.ly/US19bQ [12:51:32] this should resolve to your local hour [12:51:54] ty [12:52:09] you're welcome :) [12:52:35] allright [12:52:39] what we could do [12:52:41] is to go to labs [12:53:01] fetch your recoded udp-filter stuff in a new branch [12:53:11] and start building the debian package to see if it works [12:53:18] meanwhile we can wait for ottomata to do his review [12:53:23] one approved, we mege [12:53:24]  [12:53:27] we merge [12:53:33] and then we also have the debian script ready [12:53:36] good plan? [12:56:50] yes [12:57:19] ok so I have this on my list [12:57:36] 1) merge changes for run.sh from /25408 we talked about yesterday [12:57:47] 2) add collector output directory switch for cmdline [12:58:06] 3) debianize on labs [12:58:26] 4) go through review of Diederik of wikistats and make a new git review [12:58:33] please feel free to re-prioritize [12:59:07] excellent! [12:59:12] and i will start with 3) [12:59:23] we can do that now, shared screen [12:59:23] ? [13:00:51] hold on, let me first make one phone call [13:00:57] ok [13:20:16] ready [13:52:17] morning! [13:52:22] drdee, it looks like the figured people might have found their problem [13:52:38] morning [13:52:47] yes 404's due to caching issues, right? [13:52:55] yeah [13:53:15] so. i was going to get up and run the pig stuff this morning (i was having a problem, was goign to figure it out) [13:53:16] but now... [13:53:17] hm [13:53:21] we should be able to confirm that quickly by counting the 404's :D [13:53:29] aye hm [13:53:45] ohh and for 2011 the fundraising campaign used both B11 and C11 [13:53:56] that's why the count was so low [13:54:18] well, also they weren't really running in it september [13:54:23] there are def way more B11s in nov [13:54:45] k [13:55:18] ottomata: I just looked on the code review [13:55:29] ja? [13:55:41] ottomata: yes, the match_interal_traffic is splitting stuff, putting '\0' stuff in the string to split it into parts [13:55:58] ottomata: so I don't want to affect what happens after that [13:56:04] ottomata: so that's why I'm creating a copy [13:56:26] aye cool [13:56:36] average_drifter: did you add sufficient checks to prevent buffer overflows [13:56:37] tis merged! 
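[Editor's note: a rough sketch of the "fetch the patch set into a new branch and try the Debian build" step planned above. The gerrit change number for the recoded udp-filter is never stated in the log, so it is left as a placeholder, and the build flags are just the usual unsigned-build options, not anything specified here.]

```bash
# fetch the patch set into its own local branch (git-review's -d does the fetch + checkout)
git review -d NNNNN        # NNNNN = the udp-filter change number, not given in the log

# then try building the Debian package from the repo's debian/ directory to see if it works
dpkg-buildpackage -us -uc -b
```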
[13:57:05] still building meld [13:57:06] drdee: I should do strncmp wherever needed instead of strcmp and also strncpy instead of strcpy [13:57:14] yes, good idea [13:58:31] I watched this explanation about buffer overflows recently https://vimeo.com/22550600 [13:58:48] it does explain how they can be crafted but not very much about preventing them [13:59:18] ty [13:59:20] it has voiceover so the guy goes through all the steps, it was interesting to see that [14:12:26] just wanted to say that in that video at 12:18 he shows the actual shellcode used [14:12:35] the vid is 18m so it's quite long [14:15:11] the shellcode there was probably crafted using nasm(or some other assembler) and hexdump to show the bytecode [14:17:18] very interesting [14:17:29] still installing meld [14:17:37] dear lord that has many dependencies [14:19:02] it does but it is worth it (if installation is succesful :) [14:24:15] average_drifter... [14:24:27] so i just fetched the patch set into a new branch [14:24:34] cool [14:24:47] mergetool [14:25:54] are yoy watching? [14:25:56] yes [14:26:11] show me :) [14:26:20] a new meld will be fired up for each file to be merged [14:27:02] soo uh [14:27:07] can you open a console please ? [14:29:48] that was the wrong direction [14:29:53] now we have the older version [14:29:54] anyways it's fine [14:30:03] wait [14:30:06] so that's just one file [14:30:18] but there were multiple files which had to be merged [14:31:13] yes [14:31:15] hold on [15:28:24] changing locs, be back in a bit [16:03:31] ottomata; is all the fundraising data loaded in kraken [16:03:33] ? [16:09:10] * milimetric is grabbing lunch [16:31:13] back [16:32:42] me oto [16:32:50] well [16:32:51] me too [16:32:52] and [16:32:53] me otto [16:43:41] and me toooo [16:50:31] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [17:23:02] drdee, ready? [17:23:11] yeah [17:23:12] I got https://gerrit.wikimedia.org/r/#/c/26474/ through https://gerrit.wikimedia.org/r/#/c/26480/ [17:23:45] yep [17:24:06] so you can accept all of them except one [17:24:13] (i'd go with one of the ones where i push a file) [17:26:10] ok [17:26:37] i will do a -1 on https://gerrit.wikimedia.org/r/#/c/26475/ [17:27:17] k, cool [17:27:35] ok all have been accepter except for 475 [17:27:45] so basically all that means for me is that this commit wasn't merged to origin/master yes? [17:28:05] and the other ones are not merged either [17:28:08] hang on, gotta restart because my sip client sux [17:28:12] oh really? [17:28:13] ok [17:28:23] because they are dependent on each other [17:28:23] brb [17:28:32] i gotcha [17:34:41] ooo, drdee: [17:34:42] https://github.com/linkedin/datafu [17:36:26] cool cool cool [17:38:21] drdee, i'm back but i'll try to listen to the call [17:39:02] this seems way too convoluted to me [17:39:07] i am out [17:39:16] and getting coffee :) [17:39:41] i'm gonna skip the gerrit thing after all [17:44:04] bah! [17:44:04] http://www.cloudera.com/blog/2012/10/cdh4-1-now-released/ [17:44:11] the day I installed the first version! [17:44:11] hehe [17:44:40] whoaaa [17:44:41] Quorum based storage – Quorum-based Storage for HDFS provides the ability for HDFS to store its own NameNode edit logs, allowing you to run a highly available NameNode without external storage or custom fencing. [17:44:43] nice! [17:45:43] ncie [17:46:12] yeah super nice! [17:46:23] this was the main reason we wanted DSE in the first place [17:46:30] now that CDH has it [17:46:32] coooool! [17:54:18] drdee - good call. 
I just wasted 30 minutes of my life connecting to whatever the heck sip is [17:54:26] epic fail [17:56:16] drdee when you get back ping me so I can ammend the commit properly and git review again [18:00:03] back [18:01:52] milimetric ^^ [18:04:24] cool, so git commit -a --amend [18:04:30] I'm trying to figure out how to pass it the specific commit [18:04:48] yes that's the crucial part [18:05:19] maybe git rebase ^ --interactive [18:07:21] http://git-scm.com/book/ch6-1.html [18:07:22] reading the docs on that now [18:10:09] drdee - I don't understand rebase, the docs are like 10 pages [18:10:21] reading (this may take a while) [18:17:00] GREAT walkthrough of git rebase further down the page: http://blog.jacius.info/2008/6/22/git-tip-fix-a-mistake-in-a-previous-commit/ [18:44:02] drdee, wanna help me figure out why my pig stuff isn't working? [18:44:10] most certainly [18:44:45] ok, so the comments here are helpful [18:44:46] https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/eval/geoip/GeoIpLookup.java [18:44:55] on how to use the GeoIPLookup thing [18:44:57] from akela [18:45:25] here's my error: [18:45:25] ERROR 1200: Pig script failed to parse: [18:45:26] Failed to generate logical plan. Nested exception: java.lang.RuntimeException: could not instantiate 'com.mozilla.pig.eval.geoip.GeoIpLookup' with arguments '[GeoIPCity.dat]' [18:46:45] yep, got it and what is your command line? [18:47:26] https://gist.github.com/3828956 [18:47:37] i've tried several different ways of passing the .dat file [18:47:38] in DSE [18:47:44] it worked when I specified the local filesystem path [18:48:01] /usr/share/GeoIP/GeoIPCity.dat [18:48:08] but I think things have changed in the newer version of pig i'm using [18:48:13] like the instructions there say [18:48:22] it expects it to be in my /user/otto/GeoIPCity.dat folder [18:48:24] and it is [18:50:02] if I use the local filesystem path [18:50:06] I get much farther in the script [18:50:09] all the way to the DUMP command [18:50:26] so the script will parse with /usr/share/GeoIP/GeoIPCity.dat [18:50:30] and the job will start [18:50:32] k [18:50:34] but it dies [18:50:40] :[ [18:50:45] 2012-10-03 18:49:23,588 [Thread-3] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:otto (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /usr/share/GeoIP/GeoIPCity.dat [18:50:45] 2012-10-03 18:49:23,588 [Thread-3] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - Job2819459054452218790.jar got an error while submitting [18:50:45] java.io.FileNotFoundException: File does not exist: /usr/share/GeoIP/GeoIPCity.dat [18:51:11] is it looking locally or on HDFS? [18:51:30] i don't know, on the DSE pig stuff [18:51:32] it was def local [18:51:34] that's how I got it to work [18:51:47] but, those docs say it looks in HDFS [18:51:53] try putting in HDFS? [18:51:59] yeah tried that too [18:52:35] if I use an HDFS or relative path [18:52:40] the script will not parse [18:52:47] if I use an absolute local filesystem path [18:52:50] it parses, but the job dies [18:53:22] strange.... [18:53:40] what version of pig had DSE and what version does cDH4 have? [18:53:56] we are currently running Apache Pig version 0.9.2-cdh4.0.1 (rexported) [18:53:58] and dse? ummm [18:54:04] i dunno [18:56:43] i can't really tell without having it installed somewhere... 
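[Editor's note: a minimal sanity-check sketch for the GeoIpLookup problem being debugged above, assuming (per the akela comments quoted later) that the UDF expects the file's basename in the HDFS home directory. Paths are the ones mentioned in channel; the relative destination and the version check are just the obvious first things to verify, not steps actually run here.]

```bash
# confirm the .dat file is in the HDFS home dir under the exact basename passed to the UDF
hadoop fs -put /usr/share/GeoIP/GeoIPCity.dat GeoIPCity.dat   # relative path lands in /user/otto/
hadoop fs -ls GeoIPCity.dat

# confirm which Pig version this cluster runs, since DSE and CDH4 apparently differ
pig -version
```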
[18:58:14] :) [18:58:20] i am googling [18:58:22] but no luck soy ar [18:58:31] i am peeking into DSE .debs [18:58:34] peaking* [18:58:37] peeking peaking [18:58:38] peek [18:58:39] peak [18:58:40] peek [18:58:51] peek [18:58:53] . [18:59:28] ./usr/share/dse/pig/lib/pig-0.8.3.jar [18:59:52] and from the commit message in akela: [18:59:53] https://github.com/mozilla-metrics/akela/commit/432c02a153789c28409902863aa1b8dec5be065f [18:59:55] oops [19:00:00] https://github.com/mozilla-metrics/akela/commit/432c02a153789c28409902863aa1b8dec5be065f [19:00:03] ack! [19:00:09] Add the getCacheFiles method from Pig 0.9 [19:03:59] yup [19:04:05] that looks like the culprit [19:10:27] ottomata: http://www.jarvana.com/jarvana/view/org/dspace/dependencies/dspace-geoip/1.2.3/dspace-geoip-1.2.3-javadoc.jar!/com/maxmind/geoip/LookupService.html [19:11:02] if we would add this line: [19:11:03] String fileName = getClass().getResource("/GeoIP.dat").toExternalForm().substring(6); [19:11:25] to line 64 in src/main/java/com/mozilla/pig/eval/geoip/GeoIpLookup.java [19:11:28] would that solve it? [19:14:44] or [19:14:46] @param filename Basename of the GeoIP Database file. Should be located in your home dir in HDFS [19:14:55] did you put it in your home dir on HDFS? [19:14:58] yes [19:15:09] grumble [19:16:12] did you put GeoIPCity.dat in your home folder or GeoIP.dat? [19:16:16] City [19:16:19] that is weird though [19:16:24] try the other one [19:16:27] the docs don't actually seem to do what the cdoe looks like it does [19:16:32] :) [19:16:36] no that won't make a difference, I'm passing the argument in [19:16:42] see ok [19:16:47] lookupService = new LookupService(lookupFilename); [19:16:56] right? that comes from directly what I pass in [19:16:57] but [19:17:00] cacheFiles.add(lookupFilename + "#" + lookupFilename); [19:17:23] i find this weird [19:17:24] acheFiles.add(lookupFilename + "#" + lookupFilename); [19:17:26] that is supposed to tell pig that it should 'cache' the given filename from the local filesystem (i'm pretty sure') and name it with the value that comes after the # [19:17:37] but I don't thikn I need to cache this file [19:17:42] since it is already deployed on all the machines [19:17:51] or, hmmm [19:17:52] haah [19:17:53] i bet [19:17:57] if I make a local directory [19:18:09] /user/local/GeoIPCity.dat [19:18:10] it might work [19:18:15] that is dumb, but lemme try [19:19:23] grr nope [19:19:25] ould not instantiate 'com.mozilla.pig.eval.geoip.GeoIpLookup' with arguments '[GeoIPCity.dat]' [19:19:34] hmm, ok i'm going to try to edit the java code and see if I can fix it... [19:19:40] don't really know what I'm doing, but mayyybe [19:34:47] ottomata: could you copy oxygen:/a/squid/404.log to kraken? [19:35:42] zat a request from jeff green? [19:35:48] yes [19:36:15] and another simple pig request: count URL's and sort them by frequency [19:36:25] (no geocoding :D ) [19:37:36] hmmmm, would that be more urgent? i'm having trouble with my udf at the moment [19:37:42] think that doesn't require a udf? [19:37:50] nope it doesn't [19:39:05] request URLs? [19:40:11] YES [19:40:21] i mean yes without the capitals [19:42:45] loading 404.log into hadoop now [19:44:00] k [19:50:00] just got in [19:50:04] woo, STUFF [19:50:05] I HAVE STUFF [19:51:32] WHAT KIND OF STUFF? [19:52:27] THE KIND THAT USED TO BE IN STORAGE [19:52:30] LIKE AN AERON [19:52:34] AND A 30" MONITOR [19:52:53] THAT"S AWESOME! 
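[Editor's note: a sketch of the load step mentioned just above, plus a quick local spot check of the "count URLs and sort by frequency" request before the Pig job runs. The oxygen path is the one quoted in channel; the HDFS destination directory and the URL field number are guesses, not confirmed by the log.]

```bash
# copy the log off oxygen and load it into HDFS (destination path is illustrative)
scp oxygen:/a/squid/404.log .
hadoop fs -put 404.log /user/otto/logs/404.log

# quick local frequency count, assuming whitespace-delimited squid lines with the
# request URL in field 9 (field position is an assumption)
awk '{print $9}' 404.log | sort | uniq -c | sort -rn | head -20
```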
[19:54:31] drdee, re 404 uri counts [19:54:37] got it, but there are a lot of lines [19:54:38] yes sir [19:54:42] you want me to truncate? [19:54:50] the uris with fewer hits? [19:55:02] just count every URL [19:55:06] yeah I did that [19:55:18] how many uniques ar ether? [19:55:24] are there? [19:56:02] but the file i7278207s [19:56:03] oops [19:56:04] 7278207 [19:56:29] can haz cdh4? [19:56:34] yessuh you can [19:56:44] sorry, i should have told that you should only count URL's with BannerControl [19:56:50] haha [19:56:52] geez [19:56:56] whoopsie [19:57:35] okay. i desperately need a shower [19:57:38] of this i am certain [19:57:43] as i smell like moving. [19:57:44] oof [19:57:49] thankfully, i do not feel like vom [19:57:53] BannerController [19:57:53] which is a vast improvement over yesterday [19:57:54] ? [19:58:01] brb guys [19:58:22] yes count any url that contains the string BannerControl [19:58:53] BannerController [19:58:56] ler [19:58:58] right? [19:59:17] for example [19:59:17] http://en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&303-4 444496 [19:59:50] yep looks good [20:00:27] that's better [20:00:30] 543 uniques [20:00:52] can you email me that list? [20:01:57] https://gist.github.com/3829470 [20:02:07] ty [20:06:35] super cool right! [20:06:55] 14 seconds to process 6.5 GB data! [20:07:03] BAM [20:15:58] ottomata, feel like running another MR job? (this time probably using hadoop streaming and udp-filter)? [20:16:13] using the 1:1 banner impression data [20:17:01] counts for 404 vs 200/302 [20:17:41] and possibly geocoded as well [20:20:16] that sounds exciting! [20:21:40] yeah bring it on, as long as I don't have to UDF geocode [20:21:42] this thing is annoying [20:22:06] can I use the days we talked about? [20:22:11] 9-30 and 10-01? [20:22:15] i loaded those in already [20:22:27] and for that, I do'nt think I need udp-filter [20:22:31] I already have the status as a field in pig [20:22:34] I can just filter on those and count [20:22:35] great [20:22:42] okay let's start with that [20:22:54] ok, so you want two numbers? [20:23:08] 404 count and 200+302 count? [20:23:14] yes as a start and assuming that geocoding is not possible right now [20:23:19] yes [20:23:40] i'm going to get you three numbers, 200 and 302 separate [20:23:41] that'll be easier [20:23:44] and we can just add it afterwards [20:23:49] k [20:29:40] hey dschoon, check this: [20:29:42] http://www.cloudera.com/blog/2012/10/cdh4-1-now-released/ [20:29:46] i saw your email [20:29:48] i mean [20:29:49] yeah! [20:29:56] i agreed with asher [20:30:04] that namenode spof was not that big a deal [20:30:10] it was maintenance that i care about [20:30:12] with dse [20:31:17] aye, thus far it has been a bit harder to set things up [20:31:20] than with DSE [20:31:22] more services to start, etc. [20:33:53] but! [20:33:57] it's nice to have those changes [20:39:31] aye ya [20:49:30] dudes does gerrit just keep emailing you over and over until you fix the review? [20:50:07] because it's doing that to me so I'm trying to figure out if it's singled me out in some sort of hazing the new guy process [20:51:16] it likes you :) [20:53:34] how about this low tech solution: just make another commit on the release branch and submit all of them again? [20:54:20] gerrit is great at helping [20:54:24] see how it's helping you? [20:54:26] no [20:54:26] lol [20:54:29] it's a regular little helper [20:54:38] because then you have a new change set in gerrit [20:54:39] it's just a little... 
dumb [20:54:51] oh really? so what happens to the old one? [20:54:52] i would have said "very dumb" but whatevs [20:55:19] in your particular case that's very annoying because the other 4 commits will not be merged [20:55:20] the old changeset can't be deleted or cancelled by me? [20:55:30] that's why in gerrit you don't want dependent changesets [20:55:57] so i don't think your idea is going to fly with gerrit [20:56:33] nah that's crazy talk, the idea shall fly! [20:56:59] good luck! [20:57:00] I'm just surprised by how smart I have to be to defeat its dumbness [20:57:56] remember: "A fool may ask more questions in an hour than a wise man can answer in seven years." [20:58:14] (here the fool is gerrit) [21:05:41] drdee: since the TCP part is mentioned to not be very stable and stuff in the collector, do we really need it ? [21:05:53] drdee: I mean, maybe it's used by some systems or components [21:06:00] yes, just leave it there [21:06:03] drdee: but since the majority of stuff is geared towards udp [21:06:06] oh alright [21:06:15] the tcp stuff is only for testing IIRC [21:06:20] ok [21:14:09] drdee: ok, I was able to amend via git rebase -i HEAD~2 [21:14:13] BUT [21:14:21] :) [21:14:27] that squashed my two merge commits I think [21:14:28] there is a BUT? [21:14:45] so I did git review and it said it processed the updates [21:14:49] can you check it out? [21:15:15] I'm hoping that it just updated the commits I already submitted as opposed to making a new set because it didn't give me any addresses [21:15:44] :) [21:15:52] * average_drifter smiles [21:15:55] oooh, look at that - fixed https://gerrit.wikimedia.org/r/#/c/26475/ [21:16:50] awsome! [21:16:53] it merged [21:16:57] oh f*** yea [21:17:00] wooooot! [21:17:02] maybe you can write this down :) [21:17:06] yeah totally WOOT WOOT [21:17:19] np, so the only weird part is figuring out what to tell it to rebase to [21:17:24] right [21:17:35] like I did git rebase -i HEAD~2 and that got the last 4 commits!! [21:17:38] wt*? [21:17:48] any idea what that's about? [21:18:04] isn't git rebase -i HEAD~2 only considering the last 2 commits ? [21:18:11] you would THINK so [21:18:25] git is subtle [21:18:28] lol [21:18:43] so yeah, basically I did git rebase -i HEAD~x and varied x until I was happy [21:18:57] k, I'll write it up [21:19:15] so what was x? [21:19:48] 2 [21:19:55] hehe [21:19:57] lol [21:20:48] isn't' there a syntax that just provide the short sha to rebase? [21:22:17] yeah, that breaks horibly in our case [21:22:24] because the SHA is from a different branch [21:22:49] I'm putting this on a wiki so you're welcome to try it and update the HEAD~x part [21:22:50] :) [21:23:37] oo, somehow I filled up an03's HDFS space!! [21:23:47] http://analytics1001.wikimedia.org:50070/dfsnodelist.jsp?whatNodes=LIVE [21:24:26] wow [21:24:34] replication factor? 
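[Editor's note: a condensed version of the rebase-then-review flow that finally merged change 26475 above, the one milimetric says he will write up. This is a sketch, not the exact commands used; the rebase depth is whatever reaches the commit that needs fixing, which is the "varied x until I was happy" part.]

```bash
# mark the commit that needs fixing as "edit" in the interactive rebase;
# HEAD~2 here just means "deep enough to reach it"
git rebase -i HEAD~2

# ...make the fix, then fold it into that commit; keeping the Change-Id line in the
# commit message is what lets gerrit treat this as a new patch set on the existing change
git add -A
git commit --amend
git rebase --continue

# push the amended commit (and anything rebased on top of it) back to gerrit
git review
```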
[21:26:18] yeah 3 [21:26:33] drdee, ugh, it still failed to merge the two commits that got lost in the rebase [21:26:39] the 10 of banner logs that I loaded in from nov 2011 are 2TB [21:26:52] sent you an example [21:27:01] brb [21:27:08] annnd +500Gb for the days you asked me to load today [21:27:23] not sure why just an03 is full, and not others [21:27:31] ok, now I have more of a reason to get those other machines up [21:27:32] :) [21:27:38] i don't really want to delete any of this [21:28:14] but hmm, it might actually interfere with my status counts job [21:28:15] grrrrr [21:28:19] getting failed maps [21:28:55] erggghhhhhhhh, i'm going to remove those 10 days from last year [21:37:13] yeah, hbase would automatically resplit things for you [21:37:19] but i think you have to do it manually with hdfs [21:38:26] this is why most people break up big files into small chunks [21:38:35] but i'm not as familiar with hadoop as i once was [21:38:43] so poke around at the CLI docs [21:39:14] drdee [21:39:18] the status counts job isrunning [21:39:23] i gotta go soon, i'm not sure i'll be back on to check it today [21:39:25] it is here [21:39:26] http://analytics1001.wikimedia.org:8088/proxy/application_1349195921521_0030/mapreduce/job/job_1349195921521_0030 [21:39:35] you can wait til it is done [21:39:37] then check the output at [21:39:51] /user/otto/logs/banner1/status_counts_20120930-20121001 [21:39:55] you should be able to run [21:40:00] hadoop fs -cat /user/otto/logs/banner1/status_counts_20120930-20121001/part* [21:40:02] to see the output [21:40:05] i *think* it will work [21:40:09] Can anyone help me get a web directory on stat1? [21:40:10] sweet [21:40:32] halfak, I would say yes, but I gotta leave now [21:40:40] if you want, send me an email with what you need and I will see what I can do in the morning [21:40:45] otto@wikimedia.org [21:40:51] Will do. Thanks [21:41:58] drdee: should the output dir be created automatically if it isn't there ? [21:42:31] nah, just exit with an explanation [21:42:35] ok [21:42:48] really? [21:42:56] ...why not just create it? [21:43:25] because what happens when you don't have permission? [21:43:35] you still have to exit and explain the error [21:43:38] THEN you exist :P [21:43:41] *exit [21:45:24] it just seems lame to exit when you can easily fix the problem. [21:46:08] drdee, my writeup though you see that email? I don't think the first changeset was resolved [21:46:09] http://www.mediawiki.org/wiki/User:DAndreescu/GitFlowGerrit [21:46:43] dschoon, it is the collector daemon we don't need to make this a shiny piece of software, it just needs to run [21:48:50] hokay [21:48:52] good point [21:54:03] milimetric: yeah so the problem is that gerrit still expects to merge the V1 of the change set but it has become V2 [21:54:11] it almost sounds like a gerrit bug…..[( [21:55:00] ok, so how do I just cancel the change set? [21:55:21] blasphemy! gerrit has no bugs! [21:55:24] :D [21:55:32] you can abandon a changeset [21:55:41] just click the big abandon button [21:55:51] but not sure what happens to the change sets that are dependent on it [21:56:06] oh! i bet this is the voodoo chicken bug. first you need to find a chicken [21:56:28] then you kill it in halal style, and spread the blood out on the gerrit machine [21:56:45] wait, no. i'm thinking of something else. 
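[Editor's note: once the status-counts job above finishes, the "we can just add it afterwards" step for 200+302 can be done straight from the part files. This assumes each output line is a status code and a count separated by whitespace or a tab, the usual layout for Pig STORE output, which is not confirmed in the log.]

```bash
hadoop fs -cat /user/otto/logs/banner1/status_counts_20120930-20121001/part* \
  | awk '$1 == 200 || $1 == 302 { ok += $2 }
         $1 == 404              { nf += $2 }
         END { print "200+302:", ok; print "404:", nf }'
```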
[21:56:46] nm [21:57:21] as you can see milimetric, dschoon is our staunchest defender of gerrit [21:57:30] the flag bearer [21:57:34] ya he seems to really love it [21:57:41] i definitely have never argued that we should abandon it and use github instead [21:57:50] and surely my team rues the day i moved limn there [21:57:58] lol [21:58:55] ok so one way to fix this would be to just start the release branch and then squash your commits into one big commit [21:59:08] do you guys hate that? [21:59:20] yes, but then you have the massive review at the end [21:59:24] most folks at WMF hate that [21:59:54] well it'd still be a massive review at the end unless you do git-review as you go [21:59:56] OOOH! [22:00:02] i personally find it really unfortunate that we rebase at all [22:00:04] it loses history [22:00:14] yeah, i don't like it either [22:00:35] (my oooh was misplaced, nvm) [22:03:16] if we're trying to do a git-review for each commit then I'd think gerrit should commit to develop [22:03:28] gerrit commits?! [22:03:35] holy chirst that would be terrifying [22:04:32] guys guys! [22:04:40] i totally forgot i have this excellent soundsystem! [22:04:43] I AM SO EXCITED [22:04:55] WE ARE ALSO VERY EXCITED FOR YOU!!!! [22:05:04] MOVING IS LIKE CHRISTMAS [22:05:08] YOU FIND SO MUCH STUFF [22:05:14] SOME OF IT YOU EVEN LIKE [22:20:40] dschoon: we are launching too many mappers on that job, we need to increase the HDFS block to 256Mb or maybe even 512 (it is right now 64mb [22:21:13] really? [22:21:23] because quantity of mappers has never been a problem [22:21:29] it just means the shuffle is expensive [22:21:32] oh. [22:21:37] we're low on disk, aren't we? [22:21:48] no if a mapper runs for 5 seconds than that's bad for perofmrance [22:21:55] ehh. [22:22:00] yes, sometimes. [22:22:01] quantity of mappers is a problem [22:22:12] but changing the blocksize means resplitting everything [22:22:16] it should be between 10-100 (depending on mapper) per node [22:22:16] i'd wait until otto is back [22:22:24] of course, i am just saying [22:22:34] yeah, 40/node was roughly what the original paper said, iirc [22:23:16] we have launched about 2500 mappers right now and we are half way the job [22:23:23] eh. [22:23:29] i agree a bigger block would help. [22:23:39] 256 sounds good [22:23:45] we also need more space :( [22:24:37] most definitely [22:25:04] hdd tshirt! [22:26:47] yes [22:26:56] you should finish the kraken logo [22:27:03] i should [22:27:18] but i havne't come up with a design i'm satisfied with [22:47:32] drdee: https://gerrit.wikimedia.org/r/26554 [22:47:34] drdee: please review [22:51:12] average_drifter: done, https://gerrit.wikimedia.org/r/#/c/26554/ [22:51:22] drdee: wow that was fast [22:51:57] drdee: yes you're right [22:52:09] drdee: thanks [22:53:45] drdee: sorry about the gerrit crapstorm. I'll let it be for now and try to tackle it again after I've used it a bit more [22:54:12] i plan on sticking with git flow [22:54:34] k, i'm off to dinner - nite [22:54:39] laterz [23:01:10] dschoon: are you going to watch the presidential debates tonight? [23:02:21] is that tonight? [23:02:22] i should. [23:02:42] yup
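[Editor's note: on the block-size change discussed above (64 MB to 256 MB to cut the number of short-lived mappers): the block size only applies to newly written files, so data already in HDFS would need to be rewritten, which is the "resplitting everything" caveat raised in the log. A sketch using the Hadoop 2.x / CDH4 property name; file and directory names are illustrative.]

```bash
# write new files with 256 MB blocks (268435456 bytes); dfs.blocksize is the Hadoop 2.x
# property name, and a cluster-wide default would go in hdfs-site.xml
hadoop fs -D dfs.blocksize=268435456 -put banner-impressions.log /user/otto/logs/banner1/

# existing files keep their old 64 MB blocks until rewritten, e.g. copied with distcp
hadoop distcp -D dfs.blocksize=268435456 /user/otto/logs/banner1 /user/otto/logs/banner1-256m
```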