[00:16:10] tnegrin, I just realised two things
[00:16:22] just two?
[00:16:25] first, Grace was not around for that time I amused myself rewriting pop songs to refer to our team.
[00:16:38] no
[00:16:39] Second, I owe you all Katy Perry's "this is how we 'doop". I'm like three quarters of the way through the rewrite.
[00:16:44] No need to thank me.
[00:17:13] Grace is elizabethan, you’re more modern
[00:17:15] it’s all good
[00:17:21] I'm going to form a band with halfak and leila and tour the world's CS conferences doing gigs.
[00:17:28] Like filk music, but for researchers
[00:17:38] We're gonna call ourselves We R CHIentists
[00:17:48] just stop
[00:17:50] ;)
[00:17:53] hey, last time I checked I was the only team-member who knew Latin.
[00:18:00] I think I get the medieval claim here. Hmph.
[00:18:08] show don’t tell
[00:18:39] don't forget DarTar, Ironholds
[00:18:39] * halfak looks on amused
[00:18:42] sic, dominus
[00:18:44] * Ironholds bows
[00:19:00] leila, he's gonna be the singer. Dario Tenorelli.
[00:19:02] hey, Ironholds, were you telling me about a bug that impacted article counts?
[00:19:04] I've got it aaaall worked out.
[00:19:19] hmn. I don't think so? Except the redirect problem, of course.
[00:19:24] oh, and URL encoding *spits*
[00:21:08] I’ll play the timpani
[00:21:17] DarTar, are you in the meeting?
[00:21:28] am I?
[00:21:32] if not, let's do a Hangout, we have 40 min and I can't find kaldari
[00:21:38] and we need to test Nulls
[00:21:54] do we know if the code is live?
[00:21:59] I don't
[00:22:03] :(
[00:22:18] hang on, let me check with max
[00:22:57] DarTar, the code is out
[00:22:59] I sent you an invite
[00:23:08] k
[00:27:30] * Ironholds cackles
[00:27:35] my entire standup entry is a series of haikus
[00:36:03] What? No shuff?
[00:36:09] How did order come?
[00:37:02] I shuff'd!
[00:37:12] OK. *wipes brow*
[00:37:24] halfak, I don't suppose your C++ foo taught you if variadic functions could have arguments passed to them via an intermediary variadic function?
[00:37:40] so, var_fun_1 = function(...){}
[00:37:57] var_fun_2 = function(...){var_fun_1(...){} for a stupid and deliberately broken example
[00:38:50] I don't see anything wrong with that example other than unbalanced curlies
[00:39:26] Oh. Wait. one sec
[00:40:28] Ba. No workie.
[00:40:29] http://stackoverflow.com/questions/3530771/passing-variable-arguments-to-another-function-that-accepts-a-variable-argument
[00:41:15] perfect!
[00:41:19] that's a valid workaround. Thanks :)
[00:43:59] :D
[00:47:15] DarTar, it's live. enjoy it.
[00:47:31] IT’S ALIVE
[00:48:45] leila: can we capture a timestamp as a cutoff for the real(tm) data?
[00:49:09] yup. I made a note
[00:49:30] will update docs with that info, DarTar
[00:49:52] danke
[01:13:17] geolookup is almost done, lovely people
[01:49:48] YESS
[01:49:51] we have IPv6 geolookup
[02:26:53] eeeheeee
[02:26:56] hey, leila? :D
[02:27:24] is geolocating a million IPv4 and IPv6 IPs in 4 seconds, retrieving country, city, country ISO, region name, continent and connection type good? :D
[02:27:28] * Ironholds dances
[02:28:12] that's why you seem ecstatic...
[02:28:16] I'd say that's pretty good, Ironholds. ;-)
[02:28:22] * Guerillero is impressed
[02:28:37] how are you doing it Ironholds?
[02:28:57] Oh, Ops had me guinea-pig the new Geolocation API
[02:29:07] and because of the format they're using, you can retrieve pretty much every field in one query
[02:29:11] codebase is....vun moment.
[02:29:36] https://github.com/Ironholds/rgeoip
[02:30:08] cool! thanks!
[02:31:23] I'm going to get it into a more useful format and send an email :). The documentation and convenience functions are lacking.
[02:34:52] A list containing vectors that contain strings. That is the C++ I know!
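[Note on the 00:37 variadic question] A minimal sketch of the usual C-style workaround, as described in the Stack Overflow question linked at 00:40:29: a function taking "..." cannot hand its arguments straight to another "..." function, so the real work goes into a twin that accepts a va_list, and both variadic entry points call that twin. The names var_fun_1 and var_fun_2 mirror the chat; var_fun_1_v, the printf-style format argument and the "[via 2] " prefix are assumptions made purely for illustration, not anything taken from rgeoip.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical va_list twin of var_fun_1: does the real work on an
// argument list that some caller has already started with va_start.
std::string var_fun_1_v(const char* fmt, va_list args) {
    char buf[256];
    std::vsnprintf(buf, sizeof buf, fmt, args);  // truncates past 255 chars
    return std::string(buf);
}

// The ordinary variadic entry point: a thin wrapper around the twin.
std::string var_fun_1(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string out = var_fun_1_v(fmt, args);
    va_end(args);
    return out;
}

// The intermediary: it cannot call var_fun_1(fmt, ...) with its own "...",
// so it starts a va_list and passes that to the twin instead.
std::string var_fun_2(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string out = "[via 2] " + var_fun_1_v(fmt, args);
    va_end(args);
    return out;
}
```

Calling var_fun_2("%d-%s", 7, "x") routes the 7 and the "x" through to the shared worker. In C++11 and later, a variadic template can forward its parameter pack directly, which avoids va_list altogether.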
[02:43:31] well, if I was doing it Properly I'd make a list of vectors of strings, transposed
[02:43:41] so, each list entry would contain all country codes, all city codes, etc, etc.
[02:43:47] that way it's easily convertible into a data.frame
[02:43:56] but that's a pain, and converting it into a df on the other end isn't that much of a pain
[02:45:35] * Guerillero nods
[02:46:26] Your C++ looks good, btw.
[02:46:34] thank you! It's fairly mucky, but it works.
[02:49:32] that really isn't a problem, until someone else tries to adapt it for another purpose
[02:51:15] * Guerillero forgot existed yesterday on my exam and wasted time writing an absolute value function
[02:52:10] hahah
[02:52:18] aaand now it won't install on stat1002. That's weird.
[06:25:43] evening leila :)
[06:25:54] evening Ironholds. how's it going?
[06:25:54] I have got the geoip library to install on stat1002/3.
[06:26:06] it turned out to be a stupid bug in 1.0.2 that was fixed in 1.0.3
[06:26:14] ah! cool!
[06:26:15] thus causing endless confusion because stat1002 has .2 and I have .3
[06:26:26] so I'm just finishing off documentation, and then I'll declare an initial release
[06:26:33] and try to work out how to submit things to CRAN
[06:27:15] and maybe get some rest
[06:27:30] maybe!
[06:27:38] but I'm not paid to sleep
[06:27:56] haha!
[14:55:15] morning, party people!
[15:51:03] quiddity, to answer a question from ages ago:
[15:51:17] my Erdos-Bacon number is 9
[15:54:36] (this is of no relevance or importance, but you asked!)
[16:02:41] quiddity, and my Erdos-Bacon-Sabbath is 13!
[16:38:02] mornin' Ironholds.
[16:39:02] morning! :)
[17:09:10] :>
[17:13:26] aaargh
[17:31:35] Ironholds, leila: DarTar and I are going to miss standup today.
[17:31:53] okay, halfak. thanks for letting us know.
[17:32:21] Totally. My items should be up to date. :)
[17:33:22] halfak, how come? :(
[17:33:24] I have awesome news!
[17:34:18] Working on some strategy stuff with Toby.
[17:34:23] Boo. Sorry to miss it.
[18:29:54] yo halfak :)
[18:30:38] hey ottomata! I ran over the snappy files last night :) I've got a working setup. I'll need to run again in order to time it though.
[18:30:58] ok cool, do you mind if I move the .xml.bz2 file into a directory in /user/halfak/hadoopin/diffengine?
[18:30:59] at
[18:31:14] /mnt/hdfs/user/halfak/hadoopin/diffengine/simplewiki-20141122-pages-meta-history-xml-bz2 or something
[18:31:14] or
[18:31:21] maybe i can move each file into a directory under that name
[18:31:38] Sure. Either way, rock on. Just let me know what you did when you are done.
[18:31:47] ok cool
[18:31:54] i want them in subdirs so I can put hive tables on them
[18:31:58] I'm still working on the rhyme and reason of my hdfs directories. Currently a mess :/
[18:32:02] +1
[18:32:08] ja, it always will be :)
[18:32:09] heheh
[18:37:59] halfak:
[18:38:00] simplewiki-20141122-pages-meta-history/
[18:38:00] ├── avro-bz2
[18:38:01] ├── avro-snappy
[18:38:01] ├── json-bz2
[18:38:01] └── xml
[18:38:42] +1 Also, nice ASCII :)
[18:38:47] tree
[18:38:48] :)
[18:39:02] tree -d simplewiki-20141122-pages-meta-history/
[21:22:27] Ironholds, we can't clone Github repo on stat1002, right?
[21:22:50] I mean, we theoretically could, but I don't know how git and proxies interact
[21:22:59] the proxy advice otto gave for future-stat1003 applies to stat1002 now.
[21:23:21] I see. I think the point was to set it in a way that we can't do these kind of things on it?
[21:23:43] really?
[21:23:49] I thought the point was to avoid a public IP address
[21:23:59] owwww. right!
[21:23:59] like, you can still have interactions, but stat100* has to begin them
[21:24:11] it won't just accept random pings on 443 or 80.
[21:24:15] got it. you're right Ironholds
[21:24:16] yeah, actually, the github thing i find really annoying too
[21:24:19] there should be a way...
[21:24:26] if only github didn't force https
[21:24:40] lemme figure it out then, and let you know, cuz it's easier if we can
[21:24:47] probably
[21:24:47] git config --global http.proxy %HTTP_PROXY%
[21:25:03] dunno if that works with https though
[21:25:03] hmm
[21:25:04] trying :)
[21:26:10] oh
[21:26:12] that totally works
[21:26:15] awesome!
[21:26:20] doh!
[21:26:22] * Ironholds downloads all the things
[21:26:23] leila, ironholds: git config --global http.proxy http://webproxy.eqiad.wmnet:8080
[21:26:26] cool!
[21:26:55] urggh
[21:26:58] ALMOST DONE WITH GEOLOCATION
[21:26:59] * Ironholds cries
[21:27:07] haa! thanks ottomata! this makes life much easier
[21:27:12] https://github.com/Ironholds/rgeoip look at this pile of shit. Nobody should have to write in that many languages.
[21:28:56] where is halfak when I need him!?
[21:28:57] :)
[21:29:18] ottomata, what do you need him for?
[21:29:19] ^d, what
[21:29:22] just to chat :)
[21:29:23] check it!
[21:29:23] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/xmldumps#Results
[21:29:28] i think parquet is going to be very good for us
[21:29:42] WOAH
[21:29:43] <^d> Ironholds: I was trying to see if a nick was taken :(
[21:29:45] that is a very naive example, but pretty good
[21:29:46] <^d> Sadly, it was.
[21:29:52] can we apply that to the pageviews too?
[21:29:54] cumulative cpu time is the most relevant metric
[21:29:55] ^d, why are you butters? ;p
[21:30:02] webrequest, yeah, that would be part of fabled ETL
[21:30:10] this experimentation should inform that
[21:30:24] aha
[21:30:33] The Legend Of ETL
[21:30:36] heheh
[21:30:39] needs a new name
[21:30:41] actually, we want to call that
[21:30:42] refinement!
[21:30:43] :)
[21:30:45] <^d> Ironholds: Best character in south park, duh.
[21:30:49] refined webrequests :)
[21:31:06] does that make pageviews smelting?
[21:31:23] ottomata, at this rate we're going to have to call it unobtanium refining
[21:31:27] because it doesn't exist ;p
[21:31:47] hah
[21:31:55] fabled!
[21:32:04] refabling?
[21:32:06] haha
[21:32:26] we should totally have a project for that
[21:32:42] phabricator for things we're going to do, and fablericator for things we would love to do if we had Inf money and time
[21:33:07] haha
[21:33:31] okay! I am DONE.
[21:33:38] you know what? I think I might have a celebratory drink tonight
[21:33:52] in two weeks, I've written and released three different FOSS libraries in two languages.
[21:34:22] that's worth a toast
[21:34:28] :)
[21:36:35] HMMM, i betcha I could make camus write camus or avro files
[21:36:36] instead of json
[21:36:42] for raw (unfabled) webrequests
[21:36:43] hmmm
[21:36:53] parquet or avro files*
[21:36:54] hm
[21:40:19] whoa, this is cool too
[21:40:26] (i am talking here because, hm, where else?!)
[21:40:26] http://blog.cloudera.com/blog/2014/08/new-in-cdh-5-1-hdfs-read-caching/
[21:40:43] allows you to specify files to cache in memory (or via a job too)
[21:40:57] so, if you were going to run a bunch of jobs or hive queries on some files, you could pre cache them
[21:41:00] i think anyway.. :)
[21:42:49] anyone got a windows machine I can test on?
[21:47:48] wait, resolved it.
[22:16:01] I think this month's FOSS contributions may have genuinely removed any incentive I have to write large amounts of code, for the time being.
[22:35:47] im creating a tool that is going to read RCstream, and then try and look up every diff from the revision IDs in the RCstream, i'm worried that even if i spawn enough threads the APIs will reject me for reading too quickly, will this be an issue?
[22:59:47] notconfusing, probably not, but it's kind of a dick thing to do
[23:00:00] the general rule is as long as you wait for query1 to finish before launching query2, we don't care
[23:00:17] multithreading, though, is asking for trouble. Even if it doesn't actively prohibit you, you shouldn't be doing it without a REALLY good reason
[23:00:25] (such as "you're google and this is how your crawler works")
[23:02:07] eugh. I think I might have actually burnt out my brain.
[23:19:55] leila, can I ask a favour? ;)
[23:20:05] you can always ask Ironholds. ;-)
[23:20:15] if I send you a .gz, can you install it on your local and tell me what it does if it blows up?
[23:20:32] I can do it in exactly 1 hour 40 min
[23:20:50] yay!
[23:20:55] sent :D
[23:21:03] :D
[23:39:02] so Ironholds, are you trying to blow up my machine?
[23:39:03] D
[23:39:13] no, I'm testing some C++
[23:39:22] one blocker is that the test is "can I bind the dependent library into the package"
[23:39:44] since I have already installed the dependent library on my machine, I can't tell if removing calls to it breaks things
[23:40:40] oh, wait. I see.
[23:41:45] I think I may have fixed it.
[23:42:04] I did tar -xzf rgeoip_0.5.0.tar.gz with no complaint
[23:42:12] oh, no
[23:42:22] R and then install.packages("rgeoip_0.5.0.tar.gz")
[23:42:29] only, I have sent a new version! Suggest testing that instead :D
[23:43:43] output in the email
[23:43:47] ta!
[23:45:05] leila, okay, one final version? :D
[23:45:12] k ;-)
[23:45:44] sent!
[23:46:42] send output
[23:47:07] aha!
[23:47:16] okay, how many carved wooden things do I owe you for one more test? ;)
[23:47:31] if it's one, nothing
[23:47:41] if it doesn't get resolved, we should do it later
[23:47:55] gotcha
[23:47:59] ooh, this is interesting! new bugs
[23:48:12] let's do it later :)
[23:48:52] ooki. :-)
[23:58:59] aaaah I think I got it working
[23:59:01] * Ironholds falls over
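[Note on the 23:00 rule for API clients] A minimal sketch of "wait for query1 to finish before launching query2", assuming a blocking fetch function supplied by the caller. The name fetch_diffs_sequentially and the fetch callback are hypothetical; no real RCstream or MediaWiki API call is shown here, so the callback stands in for whatever HTTP client the tool ends up using.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Looks up one diff per revision ID, strictly one request at a time.
// `fetch` is a hypothetical blocking call: it returns only once the
// response for that revision has arrived, so the next request cannot
// start before the previous one has finished.
std::vector<std::string> fetch_diffs_sequentially(
    const std::vector<long>& rev_ids,
    const std::function<std::string(long)>& fetch) {
    std::vector<std::string> diffs;
    diffs.reserve(rev_ids.size());
    for (long id : rev_ids) {
        diffs.push_back(fetch(id));  // blocks until this query completes
    }
    return diffs;
}
```

Because there is a single loop and no threads, at most one query is in flight at any moment, which is the behaviour the rule asks for.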