[00:11:26] there is some, linked off the Kraken page, louisdang
[00:11:35] specifically about event-logging
[00:11:41] there's a bit more on the hardware page
[00:11:56] https://www.mediawiki.org/wiki/Analytics/2012-2013_Roadmap/Hardware
[00:12:11] but that is a little dated.
[00:12:24] i haven't had the time to braindump where we're at currently
[00:12:35] but i'm the guy who has architected most of it, so you're welcome to pester me
[00:12:40] i'm dsc@wikimedia.org
[00:13:51] I know a lot of that is about hardware, but I think you can get at least a sketch of what the diagram means: https://upload.wikimedia.org/wikipedia/mediawiki/3/38/Kraken_flow_diagram.png
[00:15:38] the logging overview is https://www.mediawiki.org/wiki/Analytics/Kraken/Request_Logging
[00:17:24] ok
[00:31:40] dschoon: what is that diagram made with ?
[00:34:31] average_drifter: you just missed dschoon
[00:35:06] not sure what he used, but I'm guessing it's a Mac app of some sort
[00:36:53] oh cool
[00:37:08] it looked pretty and that's why I was asking
[00:50:34] I don't know if you guys are still working, but I just received my lab account
[01:04:50] anyone do programming stuff? I'd like a Wikimedia related iPhone app that brings together Wikinews, Wikipedia and Commons stuff. :)
[01:34:14] purplepopple: I do programming but uh
[01:34:28] purplepopple: not so sure about iPhone apps
[01:36:44] purplepopple: how about building one with phonegap ?
[01:36:56] purplepopple: http://www.youtube.com/watch?v=wOH4aGows40
[01:37:28] purplepopple: but I don't really understand what it is you want built.. unless you write it down, perhaps make some sketches on paper and take some shots of them ..
[01:40:17] purplepopple: looks like Wikipedia is already using phonegap http://www.youtube.com/watch?v=wOH4aGows40#t=2m45s
[01:40:23] purplepopple: except I don't know what they use it for
[01:43:53] Um
[01:44:12] average_drifter: I want to be able to take a topic and then have similar things.
[01:45:00] So link Wikinews articles for say Skiing that primarily display, then an ability to pull up a select list of Wikipedia articles, and then pull up images from certain categories.
[01:45:59] It would have been really useful during the Paralympics.
[01:48:05] drdee , dschoon : I don't know if you guys are still working, but I just received my lab account
[02:14:57] louisdang, excellent!
[02:15:34] can I get access to the project?
[02:15:52] tomorrow if it's too late today
[02:21:56] jeremyb: can you give the loginviashell right to user 'louisdang' in labs?
[02:37:05] purplepopple: if you have pen and paper, make a drawing, I'd like to have a look
[11:29:14] Toolserver still having issues? :(
[12:47:45] morning!
[12:48:22] morning milimetric
[12:48:31] 1 sec i got a nice link for you
[12:48:38] :)
[12:48:54] www.mail-archive.com/dri-devel@lists.sourceforge.net/msg39091.html
[12:49:12] it's from the master himself
[12:49:14] oooh, git talk. sexy
[12:49:19] I love git talk in the morning
[12:49:21] i know you like it
[12:49:36] (feel free to change the topic :D)
[12:49:40] oh badass. The daddy of git
[12:50:08] yes, so i usually consider his opinion slightly longer :D
[12:50:24] hey drdee
[12:50:31] but actually his 'rules' make total sense
[12:50:36] yo average_drifter
[12:50:44] that perl one-liner was scary
[12:50:45] can I past 6 lines here ?
[12:50:49] *paste
[12:50:53] sure
[12:50:57] user@garage:~/wikistats/udp-filters$ perl -e 'sub diff_consec {return `git diff HEAD~$_[0] HEAD~$_[1] src/udp-filter.c 2>&1 | perl -ne "/^[+-][^+-]|fatal/&&print"` }; sub rel2sha1{return `git rev-parse HEAD~$_[0]`}; $i=0; while(1){$i++; $diff=diff_consec($i,$i+1); last if $diff =~ /fatal/; if($diff =~ /VERSION(.*)/){ $sha1=rel2sha1($i); chomp $sha1; print $sha1." => ".$1."\n"; }; }; '
[12:51:03] cae3acdc74c505d0bbf4bdd956dddcf92e8bdb5e => _NUMBER 0.3.0 //Please keep incrementing this after each bug or new feature
[12:51:07] 50e5f68a37b692e1923b01d58a9be1d9047a9924 => _NUMBER 0.2.6
[12:51:09] 56281eb814c7de9369e8c1bc9d43f4d80e560dd5 => _NUMBER 0.2.4
[12:51:12] 4d46c81bd05bb6c0a432c96f9d25cc04b1fb2923 => _NUMBER 0.2.3
[12:51:14] 8d8b30dffd2941d308ce3973d1b4d06d3dc99080 => _NUMBER 0.2.0
[12:51:15] it says what sha1 commits were responsible for VERSION changes
[12:51:21] this is the output
[12:51:21] nice
[12:51:47] milimetric: did you see average_drifter's 'one-liner'?
[12:51:59] okay so let's tag those commits
[12:52:16] ok
[12:52:27] yeah, saw it
[12:52:52] didn't understand it. should I try?
[12:52:58] just run it on a repo
[12:53:07] probably no results
[12:53:07] oh ok. I was scared
[12:53:28] it will search all your commits for when the string 'VERSION' was changed
[12:54:01] right, but that doesn't exist in reportcard-data
[12:54:11] or limn
[12:54:15] well you can change the string :)
[12:54:28] is that a standard you guys have? VERSION_?
[12:54:37] actually, we should have a decent versioning system for limn as well
[12:55:22] the SHA doesn't help me personally because you can't use it for anything useful if you're always crossing branches like I do
[12:55:40] git reflog is pretty good though
[12:55:50] milimetric: are sha1 unique across branches ? I think they are right ?
[12:55:59] yep, they are
[12:56:21] but git rebase -i someSHA doesn't work if you have merges in there
[12:56:25] it gives some weird error
[12:56:56] really?
[12:56:58] that's surprising
[12:57:28] yeah, that's why I had to guess X in git rebase -i HEAD~X
[12:57:52] milimetric: I iterate over X in the oneliner, so it guesses for me :)
[12:58:01] lemme tell you, X is not friggin intuitive at all. I used 2 to get the last 4 commits in my case. ?!
[12:58:45] yep, my innocent python eyes adjusted enough to see that bit (X guessing)
[13:00:18] milimetric, wait
[13:00:30] I'm not going anywhere :)
[13:00:36] i changed the second patch set of the 5 that you submitted
[13:00:48] so maybe that's why it was '2'
[13:01:09] well, I mean before I ever did git review
[13:01:10] i was already worried that you were about to leave
[13:01:51] nope, never, you'd have to kick me out
[13:03:38] :D
[13:04:00] average_drifter ready for some debian packaging?
[13:04:01] yeah, so what Linus says is what everyone on the web echoes
[13:04:10] drdee: a moment please
[13:04:12] or the other way around :)
[13:04:26] but for us that basically means we're never allowed to rebase to please gerrit
[13:04:46] well you can rebase your *shit*
[13:04:48] of course, by echoes I meant his opinion
[13:04:58] drdee: got a question
[13:05:00] but this is what puzzles me about gerrit
[13:05:03] yeah, but you can't do that for gerrit's purpose
[13:05:06] drdee: if I tag these with the version in the source code...
[13:05:13] it seems to violate some of git's assumptions / decisions
[13:05:20] drdee: do we have the guarantee that the next version will be bigger ?
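For readers who, like milimetric, found the one-liner hard to decode: below is a sketch of the same logic as a plain shell loop. It is a readability translation, not a drop-in replacement; `src/udp-filter.c` comes from the original one-liner, everything else is paraphrase.

```bash
#!/bin/bash
# Walk back through history, diffing each commit against its parent in
# src/udp-filter.c, and print the newer commit's SHA1 whenever a line
# mentioning VERSION changed. The loop stops once HEAD~(i+1) no longer
# exists and git prints a "fatal" error instead of a diff.
i=1
while true; do
    diff=$(git diff "HEAD~$i" "HEAD~$((i + 1))" -- src/udp-filter.c 2>&1)
    case "$diff" in *fatal*) break ;; esac
    # changed lines only (a single leading + or -), first VERSION hit
    line=$(printf '%s\n' "$diff" | grep -E '^[+-][^+-]' | grep -m 1 'VERSION')
    if [ -n "$line" ]; then
        echo "$(git rev-parse "HEAD~$i") => ${line#*VERSION}"
    fi
    i=$((i + 1))
done
```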
[13:05:23] because you'd be off on some feature branches, merge them to dev (with other people's stuff) and then you can't rebase your feature branches
[13:05:41] average_drifter: no, not yet,
[13:05:53] that's the part that we have to copy from webstatscollector
[13:06:12] gerrit doesn't violate anything, it's just too simplistic about the way it expects people to do collaborative editing. So it stops git from being as easy and powerful as it should be
[13:06:15] VERSION=`git describe | awk -F'-g[0-9a-fA-F]+' '{print $1}' | sed -e 's/\-/./g' `
[13:06:28] i do think gerrit violates git
[13:06:28] drdee: does that look at the current tags and make the next one bigger ?
[13:06:40] git commit --amend as standard workflow is not right
[13:07:06] particularly because git was all about having a trail of who did what when
[13:07:11] oh, that part isn't gerrit's philosophy, I look at that as a bug
[13:07:14] and you are not supposed to change that
[13:07:25] you should be allowed to submit a second patch set to resolve a previous patch set
[13:07:34] something like git review --amend patchSet1
[13:07:46] fork gerrit? :)
[13:08:04] just press the fork button
[13:09:00] average_drifter you have to push your tags first
[13:09:14] something like git push --tags
[13:09:17] or whatever
[13:10:33] oh, interesting. Gerrit is written in Java. And it's on google code not github as far as I can see
[13:12:29] woa, I wonder what this magic sauce does: https://github.com/fbzhong/git-gerrit
[13:12:52] git gerrit apply 123
[13:12:52] git gerrit push
[13:16:35] that seems much better than git-review
[13:16:38] (on first sight)
[13:21:48] yeah, take a look, see if it's good. I gotta focus on d3 for a bit, trying to understand how it layers everything isn't easy
[13:22:15] brb, restarting
[13:25:10] that's a fast reboot
[13:25:55] mooooooorning oooooottoooooomaaaataaaaaa
[13:27:24] morning
[13:29:13] drdee: tags made
[13:29:19] attomatically :)
[13:29:23] drdee: tags pushed
[13:29:28] let me pull
[13:29:35] ok
[13:29:40] yep got them
[13:30:20] to the lab cave?
[13:32:23] yes
[13:32:33] a wordplay relating to the batcave :)
[13:33:00] indeed
[13:33:53] very sweet, no gcc warnings/errors and all tests are green
[13:35:30] let me give you an invite for mikogo
[13:35:53] drdee: ok
[13:37:49] are you logged in?
[14:20:28] brb guys
[16:03:01] grabbing lunch
[16:35:38] drdee: nope, i don't have that power
[16:35:51] ok, ty, who has btw?
[16:36:01] i think just sysops
[16:36:08] err
[16:36:13] no, i think maybe cloudadmin
[16:36:26] ask #wikimedia-labs ;)
[16:42:25] mornin kids
[16:42:35] which power, drdee?
[16:44:22] morning dschoon
[16:44:32] ninja turtle power?
[16:44:35] dschoon: the power to grant shell
[16:44:36] word.
[16:44:38] ahh
[16:44:45] yeah, no idea which bit that is
[16:44:52] it magically solved itself
[16:44:57] the wondrous world of labs
[16:45:01] labs! made of magic.
[16:45:06] yup
[16:45:07] well we still don't know which bit it is
[16:45:38] anyways louisdang is now a member of the hadoop project in labs, so he can start playing
[16:45:59] cool, i'll start looking around
[16:46:08] what's happening there?
[16:46:23] so read carefully the access tutorial on labs console regarding setting up SSH access
[16:46:28] is there also a cassandra?
[16:46:46] did we make a final decision on kafka and friends?
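An aside on average_drifter's question about the VERSION= line quoted above: `git describe` doesn't invent the next version, it derives one from the nearest tag, so the number grows as commits accumulate on top of that tag. A small sketch of what the pipeline produces (the tag and hash here are made up for illustration):

```bash
# assume the nearest annotated tag on this branch is v0.3.0, two commits back
git describe
# -> v0.3.0-2-g1a2b3c4   (tag, commits since the tag, abbreviated hash)

# the pipeline from the log drops the "-g<hash>" suffix and turns the
# remaining dashes into dots, so each new commit yields a larger version
VERSION=$(git describe | awk -F'-g[0-9a-fA-F]+' '{print $1}' | sed -e 's/\-/./g')
echo "$VERSION"
# -> v0.3.0.2
```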
[16:47:29] we're still pending on working out the client situation there
[16:47:45] if you are interested in writing zk support for the C++ client
[16:47:50] by all means, please do so :)
[16:47:55] client for what?
[16:48:11] for sending data from the cache servers
[16:49:14] i'm not following
[16:49:22] for which system is this?
[16:49:26] bookkeeper?
[16:49:27] Kafka.
[16:49:40] so, kafka is the current first choice then?
[16:49:45] Kafka uses Zookeeper to coordinate and load-balance the producer clients.
[16:49:52] yeah, there's a whole big doc on the wiki
[16:50:05] a kafka wiki or wikitech or ?
[16:50:36] sec
[16:50:47] https://www.mediawiki.org/wiki/Analytics/Kraken/Request_Logging
[16:56:19] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[16:58:04] er, hangout just crashed for me.
[16:58:56] i have my checkin with robla
[16:59:28] actually...need to handle a situation first
[16:59:33] hopefully just 5 min
[16:59:34] ok
[17:00:09] ps. robla, i'll be in the office around noon.
[17:00:38] ottomata ^^
[17:02:06] i have an errand to run today, and it can only be done before 1pm, and before i leave for vacation tomorrow :P
[17:02:29] okay, it keeps dumping me
[17:03:17] i dunno what the deal is
[17:03:57] dschoon, do you think I'll be able to play with Maven + Eclipse + Storm before you head out for vacation?
[17:04:02] yeah
[17:04:05] can you get the tutorial done before then?
[17:04:06] cool
[17:04:11] that would be cool indeed
[17:04:13] i should have something up at the end of today
[17:04:17] ok cool
[17:04:29] i want to also make a little sample application
[17:04:35] that you can just check out of gerrit
[17:04:39] nice
[17:04:39] which has all the parts working
[17:04:46] and i'll write up the steps to do that via Eclipse
[17:05:14] and maybe it will embed prolog or something.
[17:05:18] (looking at you, gerrit)
[17:07:01] oh, mediawiki! so neither of my guesses
[17:07:11] yeah, that's where all our stuff is
[17:07:26] the root of all the kraken docs can be found there
[17:14:19] doh, erosen
[17:14:33] copy paste error
[17:14:36] hehe
[17:14:39] cool
[17:14:40] when I edited the filters back on the 19th
[17:14:41] 18th
[17:14:44] the orange coasts
[17:14:50] orange ivory coast logs
[17:14:55] are in digi-malaysia file
[17:14:57] i am fixing
[17:15:42] cool
[17:15:45] thanks
[17:16:16] if you need to extract them out
[17:16:19] i can show you how
[17:16:23] you can use udp-filter
[17:20:20] okay, gotta go run an errand.
[17:20:29] back around noon, i think. sigh.
[18:54:49] milimetric, what's cooking?
[19:01:58] ottomata....
[19:02:03] question about whoami
[19:02:13] you are drdee
[19:02:18] :D
[19:02:19] you're welcome!
[19:02:23] thanks!
[19:02:46] yes but my question does make sense :)
[19:03:29] actually, I might be able to give a useful answer to a question about whoami
[19:03:56] i thought whoami /groups would show all group memberships for current user
[19:04:01] but it doesn't
[19:05:00] "/groups" looks very Windows
[19:05:39] try just "groups drdee" or whatever
[19:06:11] yes that works
[19:06:12] ty
[19:06:16] no prob
[19:07:16] louisdang: how are things with labs?
[19:07:57] I looked around a bit
[19:08:53] is pig installed on master?
[19:09:09] no
[19:09:18] but you also might want to upgrade hadoop first
[19:09:28] ok
[19:09:28] not sure which version is installed, best to install CDH4
[19:09:37] but ssh access is working?
[19:09:43] yes
[19:09:51] i have access to master, slave and rds
[19:10:41] ottomata?
[19:18:59] milimetric?
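For the record, the trap drdee hit: `whoami /groups` is Windows syntax. On Linux, `whoami` prints only the username, and group membership is a separate command:

```bash
whoami          # prints just the current username, e.g. drdee
groups drdee    # lists the groups a user belongs to
id -Gn drdee    # same information, via id
```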
[19:19:07] average_drifter?
[19:23:30] I am here
[19:23:38] * average_drifter raises hand
[19:28:02] cool
[19:33:52] ohhhh shit son, we got Capri Sun in the office
[19:59:01] milimetric|lunch, just fyi, i'm about
[19:59:14] oops
[19:59:15] lol
[19:59:38] word.
[19:59:54] yeah, I figured out why the double click wasn't registering
[20:00:01] it's because the brush was catching it and not bubbling it up
[20:00:07] ahh
[20:00:52] so i hacked it for now to just toggle the brush, but I'll think of something better
[20:01:14] I'm working on zooming by using .data(filtered data).transition()
[20:01:36] I hypothesize that's better than transforms
[20:02:10] i would even go so far as to say that's where I was going with my first approach but had forgotten halfway there :)
[20:02:45] hm
[20:02:51] you'll have to fill me in
[20:03:00] will do
[20:03:04] but you could also just register a dblclick handler on the brush.
[20:03:25] true
[20:04:15] but - that's not overly complicated :)
[20:04:33] hehe
[20:04:44] limn: we promise complexity, and definitely deliver!
[20:11:04] brb, restarting
[20:11:07] hey guys
[20:11:12] q about rc_page_request_data
[20:11:31] is there a place that translates the project/language code into the column name in the csv?
[20:11:38] i.e.
[20:11:58] i want to know what http://(..)\.wikipedia\.org
[20:12:08] i want the matches for the 6 columns
[20:12:13] 6 language names
[20:12:19] in that file
[20:12:26] drdee maybe you know?
[20:13:37] don't really understand your question
[20:15:14] ottomata ^^
[20:15:27] um
[20:15:31] so this
[20:15:32] http://reportcard.wmflabs.org/graphs/pageviews
[20:15:49] i want to translate 'Japanese', 'Spanish', etc
[20:15:58] into the corresponding project languages in the domains
[20:15:59] so
[20:16:02] h
[20:16:03] oh
[20:16:03] why?
[20:16:04] English => en
[20:16:13] using it to match and count
[20:16:15] i think that's hardcoded in the translation scripts
[20:16:18] in a pig script
[20:16:19] that's fine
[20:16:25] i only need the ones that we generate for reportcard
[20:16:27] so those 6
[20:16:32] get the yaml file
[20:16:39] is this right?
[20:16:39] en|ja|es|de|ru|fr
[20:16:42] it isn't in the yaml file
[20:16:46] or check the old_to_new files in the reportcard project
[20:16:46] but i don't see why you would want to do that
[20:17:05] how else am I going to extract the project_language from the url?
[20:17:14] i'm just trying to duplicate reportcard data here
[20:17:50] https://gerrit.wikimedia.org/r/gitweb?p=analytics/reportcard/data.git;a=tree;f=old_rc_new;h=4af6c9f63a00644cab9e6b118d8939b85b751a0f;hb=HEAD
[20:17:54] somewhere in one of those
[20:17:58] maybe?
[20:18:15] i can look more closely if you want
[20:18:16] why not count each subdomain
[20:18:23] but i'm digging through maven atm
[20:18:31] i'm looking
[20:18:36] and then post process the results file and get the projects that you want
[20:18:39] you want me to count all?
[20:18:41] hmmm, ok
[20:18:45] that's a more generic solution
[20:19:04] don't know why we should restrict ourselves to some arbitrary selected projects
[20:19:36] you guys are doing that with reportcard, no?
[20:20:09] yes
[20:20:22] i agree with drdee
[20:20:30] tho it'd be good to have a mapping somewhere
[20:20:33] that's canonical
[20:20:36] definitely
[20:20:44] you could extract it from ezachte's stuff on stats
[20:20:50] but that should be fetched somewhere out of mediawiki
[20:21:00] so you fix the whole internationalization stuff as well
[20:21:09] heh.
[20:21:12] some day.
[20:21:21] they have it hardcoded
[20:21:24] that's canonical
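A sketch of the generic approach drdee argues for — count every language subdomain from the sampled logs and map codes to display names ('en' => 'English') in post-processing, instead of hardcoding the six reportcard languages. The URL field position below is an assumption about the sampled log layout, not taken from the actual pig script:

```bash
# tally requests per wikipedia language subdomain; $9 as the URL field
# is an assumption, adjust to the actual column in the sampled logs
awk '{
    url = $9
    if (url ~ /\.wikipedia\.org/) {
        sub(/^https?:\/\//, "", url)    # drop the scheme
        sub(/\..*$/, "", url)           # keep the first label, e.g. "en"
        count[url]++
    }
}
END { for (lang in count) print lang "\t" count[lang] }' sampled-1000.log
```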
[20:21:21] they have it hardcoded [20:21:24] that's canonical [20:24:52] dschoon, joke's on us, brush doesn't like it when you try to tell it how to handle its clicks [20:24:57] heh [20:25:19] out of curiosity, does the click ever make it to the div below the svg? [20:38:25] sweet. my old maven stuff still works. [20:49:53] cooool [20:49:56] ok, running: [20:49:57] https://github.com/wmf-analytics/kraken/blob/master/src/pig/monthly_subdomain_counts.pig [20:50:01] woo [20:50:03] http://analytics1001.wikimedia.org:8088/proxy/application_1349459834725_0024/mapreduce/job/job_1349459834725_0024 [20:50:08] this is running on all sampled1 data [20:50:16] nov 2011 - oct 2012 [20:50:24] sorry [20:50:27] sampled1000 [20:50:39] nice [20:52:34] brb [21:44:53] raw raw raw [21:44:57] it finished, but the numbers do not look right [21:45:43] can you paste them in gist? [21:47:11] ah i just realized I had left the en|ja|.. stuff in [21:47:32] https://gist.github.com/3842613 [21:47:33] but still [21:47:46] its missing too many recrods [21:48:02] there is only one entry for 2012-01 [21:48:04] en [21:48:07] no other languages? [21:48:09] that is not right [21:48:20] ohhh it is not sorted [21:48:21] hehe [21:48:46] that looks better [21:48:51] still need to do all subdomains [21:48:53] but those are the 6 [21:50:10] um, hey [21:50:14] rc_page_requests.csv [21:50:17] has 9 column headers [21:50:20] but only 8 columns [21:50:41] those columns are ignored [21:50:46] you have to look at the yaml file [21:50:56] https://gerrit.wikimedia.org/r/gitweb?p=analytics/reportcard/data.git;a=blob;f=datafiles/rc_page_requests.csv;h=6db9dbd35b370106aaf0ecda7ff3c43162d06276;hb=HEAD [21:52:20] it outputs both total and all projects, that's the same header actually [21:52:36] so there should be only 8 columns [21:53:38] oh reportcard sums them into total? [21:53:45] what's the difference between total and all projects? [21:55:00] ha welp, for english 2012-09 [21:55:06] numbers are way off: [21:55:19] reportcard: 9165370101 [21:55:19] pig: 25288316000 [21:55:24] (i scaled pig up) [21:58:12] drdee, ottomata, dschoon: you guys are all on the wmfresearch list, aren't you? [21:58:18] don't thikn so [21:58:19] yup [21:58:23] and HaeB? [21:58:31] dunno [21:58:41] i am [21:58:51] ottomata: k, I'll go ahead and add you if you don't mind (it's pretty low volume) [21:58:56] ottomata, all projects and total is the same as i said [21:59:18] cool [21:59:26] but why are there 2 headers for one column? [21:59:43] that's a bug [22:00:09] in reportcard? [22:00:16] or limn? [22:00:20] i mean. [22:00:24] no in the python scripts generating those files [22:00:36] drdee / ottomata, was just writing you an email by coincidence - sent ;) [22:00:54] wouldn't limn/dygraphs not read it properly then? [22:01:21] no it ignores it completely [22:01:24] it reads the yaml file [22:01:35] oh, yaml…oh and then chooses the columns? [22:01:36] right [22:01:38] hmm, right? [22:01:38] hmm [22:01:58] yes [22:02:29] but pig is off because you don't follow erik zachtes business logic [22:02:39] so you have to look for text/html in the mimetype [22:03:17] probably only count 200/302 status codes [22:04:06] but before spending a whole time on this, it makes more sense to capture erik's business logic in a flow diagram [22:04:17] and then use that flow diagram to write the pig script [22:04:50] haha, ok [22:04:59] um, does that exist? does someone konw that? 
[22:05:36] well it exists in my head, in ez's head and in the perl scripts
[22:05:46] so it desperately needs to be written down :D
[22:06:20] but you can make the two fixes that i suggested
[22:06:26] you should come much closer
[22:07:43] hm
[22:07:46] ok
[22:08:37] drdee: I'm trying to install CDH4, but I can't format the HDFS as another user, it says: Sorry, user louisdang is not allowed to execute '/usr/bin/hdfs dfs' as hdfs on i-0000007a.pmtpa.wmflabs.
[22:08:59] hmm
[22:09:15] hmm, you should be able to, hmmm,
[22:09:17] but
[22:09:18] try to sudo
[22:09:23] oh
[22:09:24] ottomata, can louisdang use your puppet script to install CDH4?
[22:09:38] yes, but it probably isn't worth setting it up
[22:10:35] ok
[22:11:32] why do you need to run hdfs dfs?
[22:12:08] to format?
[22:12:13] yeah
[22:12:18] oh you are running this?
[22:12:18] sudo -u hdfs hdfs namenode -format
[22:12:21] do you have sudo perms?
[22:12:24] i think you do, right?
[22:12:27] can you sudo -s
[22:12:27] he should
[22:12:27] sudo -s
[22:12:29] to root?
[22:12:29] yes, that's the command I used
[22:12:35] it probably won't let you sudo -u without special perms
[22:12:36] so
[22:12:37] sudo -s
[22:12:37] sudo -u
[22:12:38] then try to run it
[22:12:42] sudo -u hdfs hdfs namenode -format
[22:12:49] ok I'll try that
[22:13:45] yes that works. didn't think I had root
[22:14:05] louisdang, so that machine is a VM
[22:14:16] that's why you have root
[22:14:25] oh ok
[22:14:33] if it breaks you fix it or we delete the instance :D
[22:17:09] alright, I'll be more adventurous then
[22:23:47] ok, trying again, still only selecting those few languages
[22:23:55] but filtering for 302, 202, and text/html
[22:31:32] all right guys have a great loooooooong weekend!
[22:31:39] :)
[22:34:57] have fun
[22:47:09] bye drdee
[22:47:13] have a great weekend
[22:54:02] hm
[22:54:07] closer:
[22:54:07] 2012-09
[22:54:08] pig: 6944272000
[22:54:08] limn: 9165370101
[22:54:31] but now I have fewer!
[22:54:33] growl
[22:54:33] hehe
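To collect the HDFS-format workaround from above in one place: as ottomata guessed, a regular account usually can't `sudo -u` to an arbitrary user unless sudoers explicitly allows it, but a root shell can, so becoming root first gets around the error:

```bash
# direct attempt from a regular account fails on the labs VM:
sudo -u hdfs hdfs namenode -format
# -> Sorry, user louisdang is not allowed to execute ... as hdfs

# become root first (permitted on the VM), then drop to the hdfs user
sudo -s
sudo -u hdfs hdfs namenode -format   # formats the HDFS namenode as hdfs
```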