[00:20:15] mk
[00:43:33] dschoon you around?
[00:43:39] i am
[00:44:21] what's up, milimetric
[00:44:22] ?
[00:44:24] ok, I'm a little stuck. I'm trying to trigger an event from ChartView and handle it from its children. But I suspect its children are getting created later or something
[00:44:50] wanna look at a screen share quick?
[00:45:15] I figure I could kinda feel my way around it but you'll probably see what's wrong quicker
[00:46:31] dschoon^
[00:46:50] sure.
[00:46:56] you should checkpoint and push first
[00:47:07] then i can read along if things are unclear
[00:48:57] k, pushed and sent link
[01:44:56] woo
[12:57:57] morning!
[13:16:48] morning milimetric
[13:16:56] hey drdee
[13:17:09] if you haven't seen food matters, you should
[13:18:19] what is that?
[13:18:23] hey average_drifter
[13:23:28] hey drdee_
[13:23:37] yo
[13:23:57] drdee_: just talked with Erik, he reverted some of my changes in the git repo, working on getting them back in there
[13:24:37] what do you mean exactly? did you submit it through gerrit?
[13:31:30] milimetric, do you have experience with putty & windows?
[13:31:51] yeah, I've used it before
[13:32:39] could you maybe reach out to erik z and try to get his private/public key setup to work
[13:32:43] oh, btw, ghetto quality version of the documentary Food Matters: http://vimeo.com/37938865
[13:32:58] i tried it a couple of times but it's still not perfect
[13:33:06] oh, that part basically is broken as far as I can tell
[13:33:12] i sat with two other very smart people for like two days and we couldn't get it
[13:33:23] can he authenticate with just his password on http link?
[13:35:01] I'll reach out though, np
[13:40:55] or maybe we can do it with the three of us
[13:44:50] i know it's hard, we have succeeded on some other machines so maybe the three of us can fix this
[13:52:39] dan, https://plus.google.com/hangouts/_/0486b5bae27aed045d21381f6919ef84d03efd66?pqs=1&hl=en&authuser=1
[13:52:55] i mean milimetric ^^
[14:44:59] drdee: http://www.garage-coding.com:8010/builders/udp-filters-builder
[14:45:05] drdee: udp-filters CI
[14:45:13] i like it!
[14:45:18] :)
[14:45:59] ottomata, gerrit setup question
[14:46:09] i cloned wikistats on stat1
[14:46:26] but obviously under my username
[14:46:28] yessuhhhh
[14:46:44] but we also want erik and stefan to be able to push / pull that repo
[14:46:53] what's the best way to set it up
[14:47:02] a special wikistats gerrit account?
[14:47:33] and give the private key to me, erik and stefan?
[14:47:48] wellll we had this problem with gerrit-stats too, right?
[14:47:54] * average_drifter thinks of a post-update hook on the gerrit server that would update the wikistats_git on stat1
[14:48:00] probably yes, a special account, we could always just use the stats user
[14:48:13] but yeah, a script that is allowed to sudo to stats and git pull
[14:48:20] that you guys are allowed to run
[14:48:26] or something
[14:48:27] that's how we did it with gerrit stats
[14:48:34] i could try to write something more generic
[14:48:55] that would take a list of allowed places to pull or something
[14:49:12] k
[14:49:21] do you really need to push from that repo too?
[14:49:23] or just pull
[14:49:23] ?
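
The "script that is allowed to sudo to stats and git pull" idea above could look roughly like the sketch below. This is only an illustration: the whitelist file, script path, group name, and sudoers rule are all hypothetical, not the actual gerrit-stats setup.

    #!/bin/bash
    # Hypothetical wrapper: pull only repos listed in a whitelist, intended to be
    # run as the stats user via sudo. Paths and names are made up for illustration.
    ALLOWED=/etc/git-pull-whitelist        # one absolute repo path per line
    REPO="$1"
    if ! grep -qxF "$REPO" "$ALLOWED"; then
        echo "refusing to pull: $REPO is not whitelisted" >&2
        exit 1
    fi
    cd "$REPO" && git pull --ff-only
    # A sudoers entry (also hypothetical) would then let the analytics group run it:
    #   %analytics ALL=(stats) NOPASSWD: /usr/local/bin/git-pull-allowed
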
[14:49:35] maybe we need one generic analytics gerrit user account for this kind of stuff
[14:49:41] i think the stats user is fine
[14:49:54] right now we do need push as we are configuring wikistats on stat1
[14:50:06] but i can just do the manual push myself for the coming days
[14:50:11] aye
[14:50:12] don't expect to need push in the long run
[14:50:24] stats user is fine
[14:50:44] ok
[14:50:46] added it to my todo
[14:51:34] * average_drifter runs to grab some lunch
[14:52:13] ty
[14:52:24] man after my own heart. Names meals by their order in the day, not the time
[14:54:15] drdee! yesterday I learned that there IS a multicast udp2log stream!
[14:54:27] meaning we don't need the log2udp relay to get the firehose
[14:54:38] really?
[14:54:42] yeah, notpeter set it up a while ago
[14:54:44] and oxygen is actually using it
[14:54:52] because i know i tried to set it up with mark early this year
[14:54:55] one of the socat procs I killed yesterday while trying to fix packet loss was this
[14:54:59] right before you arrived
[14:55:00] and that brought down the site
[14:55:06] oxygen has a socat process
[14:55:13] that receives the udp2log stream from all the sources
[14:55:19] okay! that's very cool
[14:55:19] like udp2log usually does on emery and locke
[14:55:27] and then broadcasts it to a multicast addy
[14:55:35] udp2log-oxygen is started up with --multicast 233.58.59.1
[14:55:38] which means that, for example
[14:55:40] on an11
[14:55:46] if I start up udp2log the same way
[14:55:47] it gets the firehose
[14:55:55] WOOOOOOOOWWWW
[14:56:00] AMAZING!
[14:56:09] yeah, I wish I knew that
[14:56:18] especially since I caused udp2log on oxygen to stop for hours yesterday
[14:56:21] until asher told me that
[14:56:29] oh quick question, can i copy files from stat1 to an01?
[14:56:37] yeah, should be able to
[14:56:46] an01.eqiad.wmnet?
[14:56:49] no
[14:56:54] analytics1001.wikimedia.org
[14:56:56] ohhhhh
[14:56:57] ok
[14:56:58] ty
[14:57:14] btw, the an** names only exist in our local /etc/hosts file
[14:57:22] so that only resolves from our local servers
[14:57:49] duhhhh
[14:57:50] ty
[15:18:59] hey ottomata, i am running out of diskspace on an01 when copying my fixed sample archived files
[15:21:31] haha, yeah, hehe
[15:21:40] where are you copying them?
[15:21:46] ah /a
[15:21:47] ?
[15:21:48] i see
[15:22:12] there's actually a lot more space avail there, I just haven't partitioned the drives
[15:22:28] i could lvm them all up together and give you lots of space
[15:23:28] i was trying my home folder
[15:23:35] oh
[15:23:52] oh all of the sampled logs are in /a/logs/sampled too
[15:23:54] did you know that?
[15:24:02] nope
[15:24:03] i will make a new separate partition
[15:24:08] with the 3 remaining drives
[15:24:16] shall i copy my files to that location?
[15:24:18] should give us another i dunno, 800GB of space
[15:24:22] well there's only 49G avail there
[15:24:23] one sec
[15:32:10] ok, drdee
[15:32:19] /a now has 800GB free
[15:32:25] i'm copying over a buncha files from /a_orig
[15:32:32] that will take up around 200GB of it
[15:32:42] once /a_orig is copied, i'll add it to the volume group in /a
[15:32:46] and give it another 200G of space
[15:32:51] so we should have a 1TB workspace on an01
[15:49:50] coool, drdee, I have udp2log on an11 consuming from multicast and then producing into kafka, and then a console consumer on an17 just consuming the stream from kafka
[15:52:57] RIGHT ON!
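
A rough sketch of the "lvm them all up together" approach mentioned above, for pooling the unpartitioned drives into one large /a and growing it later. The device names, volume group name, and filesystem choice are assumptions, not the actual an01 layout.

    # Hypothetical device names; adjust to whatever the spare drives really are.
    pvcreate /dev/sdc /dev/sdd /dev/sde           # mark the 3 remaining drives as LVM physical volumes
    vgcreate vg_a /dev/sdc /dev/sdd /dev/sde      # pool them into one volume group
    lvcreate -l 100%FREE -n a vg_a                # one logical volume using all the free space
    mkfs.ext4 /dev/vg_a/a
    mount /dev/vg_a/a /a
    # Later, once /a_orig has been copied off its drive, grow the same volume
    # instead of making yet another partition:
    pvcreate /dev/sdb
    vgextend vg_a /dev/sdb
    lvextend -l +100%FREE /dev/vg_a/a
    resize2fs /dev/vg_a/a                         # grow the ext4 filesystem to match
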
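The multicast-to-Kafka pipeline described just above might be wired up along these lines. Treat it as a sketch: the port, topic name, broker and zookeeper hosts, the udp2log config syntax, and the Kafka CLI flags are assumptions (the flags in particular vary between Kafka versions).

    # On an11: subscribe udp2log to the multicast firehose (the --multicast flag and
    # group address come from the conversation above; the other flags are assumed).
    udp2log --multicast 233.58.59.1 -p 8420 --config-file /etc/udp2log/kafka
    # In that config file, a pipe rule (assumed syntax) hands every line to a
    # Kafka console producer:
    #   pipe 1 /usr/local/kafka/bin/kafka-console-producer.sh --broker-list an12:9092 --topic webrequest
    # On an17: read the stream back out of Kafka to verify it is flowing.
    kafka-console-consumer.sh --zookeeper an13:2181 --topic webrequest
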
[16:20:50] "The dirty truth is, though, that many analysts and scientists spend as much time or more working with mere megabytes or gigabytes of data: a small sample pulled from a larger set, or the aggregated results of a Hadoop job, or just a dataset that isn’t all that big (like, say, all of Wikipedia, which can be squeezed into a few gigs without too much trouble)." http://strata.oreilly.com/2012/10/matlab-r-julia-languages-for-data-analysis.ht
[16:39:20] ottomata: that's hot!
[16:39:33] yeah cool eh!
[16:51:10] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[16:55:31] drdee: question about geocoding anonymity
[16:55:39] shoot
[16:56:01] i'm setting it up to write to a database, and I'm looking at the way it deals with the total number of edits for the top K cities in a country
[16:56:31] it currently takes the number of edits from each of the top 10 cities and then divides that by the number of edits from the largest city
[16:56:41] and then multiplies by 10
[16:56:56] aight
[16:57:09] I'd like to make it more interpretable
[16:57:23] what do you think about just having the number of edits for a city
[16:57:32] but putting a lower threshold on the number of edits from a city required to report
[16:57:55] the reason we did this was that way you could not identify the editors but still would have the ability to rank order cities
[16:58:08] cool
[16:58:18] so what about dividing by total edits
[16:58:20] and then multiplying by ten
[16:58:27] i guess you can reconstruct it
[16:58:36] given the total edits
[16:58:47] but you can also reconstruct it in the current state
[17:01:43] sorry, but i'm curious, what are you doing with the cities?
[17:07:38] http://www.nytimes.com/interactive/2012/10/15/us/politics/swing-history.html?hp
[17:07:56] Latest Mike Bostock visualization
[17:21:03] hokay, heading into the office
[17:21:05] be there in a few
[17:25:34] Nixon and Reagan - the most decisive presidential victories ever. By far. Huh.
[17:38:16] ottomata, can you add in your cdh4 puppet script that the following file needs to be downloaded from oracle: mysql-connector-java-5.1.22.tar.gz
[17:38:30] i am trying to get swoop to work
[17:38:38] swoop -> sqoop
[17:38:43] yeah, sorta, i can puppetize that, but that won't be part of cdh4, will it?
[17:40:18] it has to be, because that lib is required for sqoop
[17:40:22] and swoop is part of cdh4
[17:40:33] stupid autocorrect
[17:40:43] so swoop does not work out of the box right now
[17:40:47] arrrgh
[17:40:48] sqoop
[17:40:50] sqoop
[17:40:51] sqoop
[17:42:15] the file needs to be installed in /usr/lib/sqoop/lib/
[17:43:39] um, hm, ok will look into that
[17:43:48] uhhhhh, is there a .deb?
[17:43:57] i will look around
[17:44:16] for now i'll make it an asana task, else i will forget :)
[17:45:53] BAM BAM
[17:46:01] first sqoop import successfully completed!
[17:46:11] this is f*** cool
[17:46:54] coz now we can join shit :)
[17:47:24] cool, so into hive?
[17:48:48] yeah you can assign that asana task to me
[17:48:53] that's cool
[17:50:31] no this was straight into hdfs
[17:50:44] i'll try hive as well
[17:51:26] louisdang: i got a double booking, can i call you a bit later this afternoon, let's say around 2:30?
[17:51:34] yeah sure
[17:53:13] what does the file look like on hdfs?
[17:53:18] ottomata, there is a deb package
[17:53:18] i'm curious!
[17:53:24] updating asana
[17:54:30] ottomata, it's four files just like the output of a regular MR job
[17:56:06] yeah just curious how it maps table data to file data. csvs?
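
To make the two geocoding normalizations discussed above concrete, here is a toy calculation with made-up numbers (500 edits in a country's largest city, 125 in another top-10 city, 2000 in the country overall):

    # Current scheme: each city's edits divided by the largest city's edits, times 10.
    echo "scale=2; 10 * 125 / 500" | bc      # -> 2.50  (a rank-preserving score, not a raw count)
    # Proposed alternative: each city's share of the country's total edits.
    echo "scale=4; 125 / 2000" | bc          # -> .0625 (easier to interpret, though the raw
                                             #    count is recoverable given the total)
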
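The sqoop setup being worked through here generally comes down to two steps: put the MySQL JDBC driver on Sqoop's classpath, then run the import. A minimal sketch follows; the database host, user, target directory, and delimiter are placeholders, and only the /usr/lib/sqoop/lib/ path and the connector tarball name come from the conversation above.

    # 1. Give Sqoop the MySQL JDBC driver it needs (jar name inside the tarball assumed).
    tar xzf mysql-connector-java-5.1.22.tar.gz
    sudo cp mysql-connector-java-5.1.22/mysql-connector-java-5.1.22-bin.jar /usr/lib/sqoop/lib/
    # 2. Import one table straight into HDFS as tab-separated text.
    sqoop import \
      --connect jdbc:mysql://DB_HOST/enwiki \
      --username USER -P \
      --table recentchanges \
      --target-dir /user/USER/enwiki \
      --as-textfile \
      --fields-terminated-by '\t'
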
[17:56:08] tsvs i guess?
[17:56:12] can I -cat one
[17:56:13] ?
[17:56:18] where'd you import to?
[17:57:08] /user/diederik/enwiki
[17:58:02] so is that a particular table?
[17:58:05] mmm interesting, not sure if the format is correct
[17:58:11] yes, that is the recent changes table
[17:58:23] maybe it's binary
[17:58:26] yeah looks like it
[17:58:37] i have to look into it a bit more
[17:58:37] or um, ascii hex encoded binary
[17:58:42] btw
[17:58:43] but import itself does work
[17:58:55] pig can load binary data, if you know the structure
[17:59:11] yup
[18:23:32] it appears we need some daleks in the kitchen
[18:23:42] DE-SCAL-INATE
[18:23:42] DE-SCAL-INATE
[18:23:45] isn't that what they say?
[18:26:57] was?
[18:27:06] when?
[18:27:13] wer?
[18:27:29] wark!
[18:28:34] louisdang, what is your skype handle?
[18:28:43] just louisdang?
[18:28:47] drdee, louis.p.dang
[18:29:02] invite sent
[18:32:22] the prologue to taleb's new book is so good. http://less.ly/msc/books/nassim_taleb-antifragile_prologue.pdf
[18:44:22] dschoon, I just pushed my amazing invisible rendering
[18:44:48] I'm running low on fuel so I'll wander around for a while but you're welcome to take a peek. Everything I mean to render is rendering but nothing's showing
[18:45:04] and by render of course I mean attach to DOM
[18:45:43] aiight
[18:45:51] i'll check it out in a bit. working with ottomata atm
[19:06:18] gotta go run an errand pretty far away, I'll be back online in about two hours.
[19:07:58] coolio
[19:08:02] i'll email any notes
[19:20:55] ottomata, can i copy the fixed sampled log files to hdfs and overwrite the ones in /usr/otto/logs ?
[19:21:25] sure
[19:23:50] copying...
[19:36:19] why isn't there a progress meter when copying data to hdfs....
[19:49:34] ottomata, how long did it take you to copy the files to hdfs?
[19:51:50] long time
[19:52:04] like hours? days?
[19:52:59] oh it just finished :)
[20:00:32] ummm, 4 hours?
[20:02:03] for me it was much faster
[20:02:05] about 30 minutes
[20:38:07] do we host etherpad lite somewhere?
[20:38:25] people have experimented with it.
[20:38:34] i was planning on setting it up on kripke.
[20:38:40] so much better.
[20:52:06] BAM BAM BAM BAM
[20:52:16] swoop import into hive successful
[20:52:30] sqoop, i said sqoop sqoop
[20:52:58] now the only problem is that most fields are binary :)
[20:53:15] ottomata, dschoon ^^
[20:53:25] nice!
[20:56:44] yes but how to unmangle all the binary fields
[20:57:28] heh
[20:57:35] binary?
[20:57:38] which files?
[20:57:40] *fields
[21:01:34] lots of them, i was using the recent changes table from enwiki and a lot of columns in mediawiki have column type varbinary
[21:02:07] so fields like title, comment, ip, they are all varbinary in mediawiki
[21:16:18] ah
[21:41:18] back!
[22:30:28] ottomata, where did you put the script to count per hour the traffic data?
[22:30:38] i got a CLI version of snappy installed on an01
[22:31:01] it's called snzip
[22:31:07] and snunzip
[22:33:24] ottomata, quick question
[22:33:30] where did you put the script to count per hour the traffic data?
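
One hedged way to "unmangle" the varbinary fields discussed above is to tell Sqoop up front how to map those columns when importing into Hive. The column names (rc_title, rc_comment, rc_ip) are an assumption based on the MediaWiki recentchanges schema, and the connection details are placeholders.

    # Sketch only: force the varbinary columns to strings on the way into Hive.
    sqoop import \
      --connect jdbc:mysql://DB_HOST/enwiki \
      --username USER -P \
      --table recentchanges \
      --hive-import --hive-table enwiki_recentchanges \
      --map-column-java rc_title=String,rc_comment=String,rc_ip=String \
      --map-column-hive rc_title=STRING,rc_comment=STRING,rc_ip=STRING
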
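On the missing progress meter complained about above: hadoop fs -put really does run silently, so one workaround (the paths here are just examples) is to poll the destination's size from another shell while the copy runs.

    # Start the copy in the background (or in another terminal)...
    hadoop fs -put /a/logs/sampled /user/diederik/logs/sampled &
    # ...and watch the bytes land on HDFS every 30 seconds.
    watch -n 30 hadoop fs -du -s /user/diederik/logs/sampled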