[11:07:03] so you would like a benchmark on an N270 atom ? [11:07:13] oops wrong channel... [13:55:59] I'm at the dentist [13:56:04] they have wireless [13:56:12] it's boring [14:01:26] hey average_1rifter [14:01:32] I LOVE dentists in Romania [14:01:50] milimetric: you do ? [14:01:52] milimetric: hi [14:01:56] totally [14:02:09] milimetric: why's that ? dentists in .us are not as skilled ? [14:02:14] a guy here charged me 2000 to mess up my wisdom tooth extraction [14:02:17] my dentist is a nice blonde lady in her 40s [14:02:20] I was infected for two months [14:02:35] wow hmm [14:02:47] Then I went to another guy who charged me 600 to x-ray my teeth and give me some antibiotics that didn't work [14:03:14] hmm that's a lot [14:03:17] I came to Romania and for $100 (private dentist, no insurance) she saw me three times and nursed me completely back to health [14:03:20] with no dangerous drugs [14:04:03] oh she said the antibiotics that they gave me in the US were a classic mistake - waaay too strong [14:04:36] anyway - just know that doctors there overall are doing it because they love what they do and not to get paid. And that's pretty rare [14:05:12] (that's just one anecdote but I've been to many great doctors there and many shitty ones here) [14:10:46] * YuviPanda is going to a dentist for the first time after ~10 years in about 2 weeks [15:13:56] morning guys [15:14:24] milimetric, can you email me instructions on how to set up a full working instance of limn on my local machine? [15:19:51] morning ottomata [15:25:14] morning! [15:25:40] drdee, quick check about priorities: [15:25:48] yo [15:25:52] when paravoid is available [15:25:56] i'm going to work on puppet [15:25:58] when not [15:26:08] oozie automating [15:26:13] s'ok? [15:26:15] drdee my IRC isn't pinging me anymore - I'll get those instructions up on the Readme [15:26:15] yes [15:26:24] (dropping proxy/access priority) [15:26:43] the reason i am asking is i want to demo a kraken workflow using ssh tunneling and limn on my machine [15:27:07] so i can point limn to localhost for a datafile [15:27:11] that should work right? [15:27:47] actually [15:27:58] drdee, before oozie, should be a cron to copy generated reports to stat1, right? [15:28:01] stat1001 [15:28:06] so that limn can graph them? [15:28:20] the things from /wmf/public/ ? [15:28:23] right [15:28:27] oh yeah I could just sync the whole thing [15:28:28] that'd be fine [15:28:29] yes good idea [15:36:09] ottomata, can you paste one more time the command for ssh tunneling? i will write it down now :) [15:36:33] i should write a script and put it in kraken repo :) [15:36:54] ssh -v bast1001.wikimedia.org -L 8081:analytics1027.eqiad.wmnet:8888 [15:36:56] that's for hue [15:39:10] do y'all know how to put tunnel commands in the .ssh/config? [15:39:22] ottomata: ^ [15:40:36] in .ssh/config, naw never done that before [15:40:42] i think i have it [15:40:47] i'll send it along when it works [15:42:19] ottomata: i think i've got the config file working [15:42:29] but I'm getting a public key denied message [15:42:41] did I lost access as a result of the security thing? [15:42:53] s/lost/lose/ [15:44:29] don't think so, just use bast1001.wikimedia.org though, to be safe [15:44:36] instead of analytics1001 [15:44:43] that's what I'm doing, i think [15:44:45] hm [15:44:50] can you just ssh into that? [15:44:51] i just copied the line you sent to diederik [15:45:04] no [15:45:19] ssh erosen@bast1001.wikimedia.org [15:45:20] Permission denied (publickey). [15:45:42] drdee: install instructions for Limn: https://github.com/milimetric/limn#install [15:46:00] hmmm, weird [15:46:03] well try analytics1001 then :p [15:46:22] yeah that works [15:46:44] thanks for the help
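The .ssh/config question above never gets answered in the log. For reference, the one-off -L command ottomata pasted can be expressed as a config entry roughly like this; it's a minimal sketch, and the host alias hue-tunnel is illustrative:

    # in ~/.ssh/config
    Host hue-tunnel
        HostName bast1001.wikimedia.org
        LocalForward 8081 analytics1027.eqiad.wmnet:8888

With that in place, `ssh -N hue-tunnel` opens the same tunnel, and http://localhost:8081/ reaches hue as before.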
[15:51:46] cool visualization: http://www.guardian.co.uk/world/interactive/2013/feb/12/state-of-the-union-reading-level [15:55:30] erosen: ping [15:57:15] geohacker: hey there [15:58:01] erosen: I've cleaned up whatever dumps you shared and have started tinkering with them [15:58:21] nice [15:58:29] was wondering whether we can find the rest of them :-/ [15:58:41] edit size, interesting [15:58:55] also edits by geography, sounds a bit tricky now. [15:59:00] do you mean the number of bytes of an edit on average? [15:59:07] erosen: exactly. [15:59:28] that will show us how productive the edits were. [15:59:34] indeed [15:59:42] i don't have such a dataset sitting around myself [15:59:51] but I suspect that some of ErikZ's files do [15:59:58] let me look around wikistats [16:00:08] that will be awesome. [16:00:49] just looked and I suspect I was misremembering [16:01:02] however, one approach would be to use number of edits during a period [16:01:08] and change in the article namespace total size [16:01:14] over that same period [16:01:21] and just compute the average yourself [16:01:35] hmm that sounds good. [16:01:43] now erikz [16:01:55] 's size field isn't updated for a lot of the wikis [16:02:02] yeah. [16:02:05] but I believe i have that data on gp-dev [16:02:10] okay. [16:02:17] let me look [16:02:21] cool [16:02:42] erosen: also, do you think I should scrape out the rest from stats? or can we find the dumps somewhere? [16:02:52] i [16:02:58] let me look a bit right now [16:03:11] awesome. thank you so much! [16:03:11] I suspect that there is a csv sitting somewhere on a server which I use [16:03:26] and it would be good to expose more numbers through the gp-dev dashboard [16:04:17] most of the code that I'm going to be writing will use d3. [16:04:30] awesome [16:04:31] I'm sure we can hook that up to the dashboard [16:04:46] :) [16:05:07] not to push a framework on you, but there is already a map functionality in the dashboard software we are developing, limn [16:05:32] great. didn't know that. [16:05:36] though I suspect you might want more customization [16:05:56] let me take a look [16:06:25] milimetric: do you have the link to the limn map handy? [16:06:39] which one, the old fabian one? [16:06:48] no the new dechoon one [16:06:53] http://reportcard.wmflabs.org/graphs/editors_by_geo [16:06:55] I'm checking github [16:06:57] i mean of that data [16:07:19] geohacker: that is the one i was thinking of [16:07:51] neat [16:08:26] geohacker: looks like I have the total size handy, but for some reason I am having trouble finding the link [16:09:12] erosen: uh oh. :-/ [16:09:25] should be okay [16:10:21] that map is neat. [16:10:40] do we have some way to do it at the country level? [16:10:51] I'm going to be doing that. [16:10:57] if I get the data. [16:11:17] drdee, erosen, if you pull kraken [16:11:22] bin/ktunnel [16:11:27] map edits by geography in the Indian states. [16:11:32] easy use: [16:11:33] ktunnel hue [16:11:40] will open a tunnel to hue, etc. [16:11:44] oh nice [16:11:52] danke [16:12:39] thanks otto [16:12:41] ja [16:12:47] it'll print the url you can navigate to as well; [16:12:48] when you use it
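The bin/ktunnel script itself isn't shown in the log. A minimal sketch of such a wrapper, assuming it is a shell script; only the hue mapping (bast1001, local port 8081, analytics1027:8888) is confirmed above, the namenode entry and its host/port are purely illustrative:

    #!/bin/bash
    # open an ssh tunnel to a kraken web UI via the bastion host
    case "$1" in
      hue)      port=8081; remote=analytics1027.eqiad.wmnet:8888 ;;
      namenode) port=8082; remote=analytics1027.eqiad.wmnet:50070 ;;  # illustrative
      *)        echo "usage: ktunnel <hue|namenode>" >&2; exit 1 ;;
    esac
    # print the url you can navigate to, then hold the tunnel open
    echo "http://localhost:${port}/"
    exec ssh -N bast1001.wikimedia.org -L "${port}:${remote}"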
[16:14:03] so, hm [16:14:05] milimetric [16:14:20] if limn/reportcard was running on a prod host [16:14:22] stat1001 for example [16:14:29] it could access data in kraken without having to copy it off [16:14:31] in /wmf/public [16:14:52] geohacker: bad news about the db size, looks like the files are no more up to date than wikistats [16:14:56] we need to puppetize limn :) [16:15:09] someone was working on .deb, right? [16:15:11] that's the first step [16:15:15] http://gp-dev.wmflabs.org/graphs/indic_language_db_size [16:16:30] geohacker: ^^ [16:16:47] erosen: checking [16:18:05] i can run the analysis myself without too much trouble, but I'm not sure what sort of schedule i can commit to [16:18:27] erosen: eh. why does it say zero beyond a point from 2011? [16:18:42] the script must have stopped collecting the data [16:18:48] darn [16:19:02] it seems to match the point in http://stats.wikimedia.org/EN/TablesWikipediaHI.htm where the numbers disappear for the db size column [16:19:20] I just guessed that. [16:19:56] i've made graphs like this before: http://gp-dev.wmflabs.org/graphs/ar_bytes_monthly [16:20:00] what do we do? [16:20:19] just never made it automatic [16:20:30] oh nice. [16:21:02] i can take a crack at adding graphs for all of the indic languages [16:21:04] erosen: do you think we can run the scripts specifically for indic? [16:21:20] yeah, i can at least [16:21:27] that would be fantastic. [16:21:29] it uses the databases at the moment [16:21:37] do you have a toolserver account by chance? [16:21:46] i'm so sorry for this trouble. [16:21:46] erosen: nope. [16:21:50] no worries, it's all things that should be done eventually [16:22:24] I can create an account and request access. [16:23:04] milimetric, erosen, I forget what our status was on oozie + limnification [16:23:08] not sure it is entirely worthwhile [16:23:15] milimetric, did you build something cool into limn to graph tsvs? [16:23:17] something like that? [16:23:20] ottomata: that last comment was for geohacker [16:23:24] (aye) [16:23:30] erosen: I'll do what you suggest. [16:23:37] ottomata: i think dan did [16:23:48] tsv is graphable [16:23:50] nothing special [16:23:54] works just like csvs [16:24:03] it needs datasource yamls, right? [16:24:05] milimetric: does it pivot though? [16:24:09] the "cool" thing I built is the ability to pivot [16:24:11] but those only need to be generated once? [16:24:12] riiiight [16:24:13] nice [16:24:15] but that's just hardcoded [16:24:20] like - it pivots on column 5 [16:24:21] :D [16:24:25] ok, can we try something then? [16:24:25] hehe [16:24:31] i have a .tsv of webrequest loss hourly [16:24:34] in kraken [16:24:47] it's got a buncha columns [16:24:51] yes, but is it super urgent? Right now I found two annoying bugs that got introduced somewhere in the last month [16:24:55] and I'm git bisecting my heart out [16:24:56] nawww, not at all [16:25:10] just trying to figure out the best way to make this data available to limn [16:25:21] ok, then I promise I'll give you my undivided as soon as I figure these bugs out [16:25:24] real quick though, do you think I should just wait until limn is productionizable (.deb + puppet)? [16:25:40] i wouldn't have to think about this if so, since it would be able to access the files in kraken from stat1001 [16:25:41] drdee hasn't updated me on when that's going to happen
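For context on the pivot behavior milimetric describes above (hardcoded to pivot on column 5): the idea is to turn a long-format file, where one column names the series, into one series per distinct value of that column. A generic Python sketch of the transform, not limn's actual code; the column indices are illustrative:

    import csv
    from collections import defaultdict

    def pivot_tsv(path, key_col=0, pivot_col=5, value_col=6):
        """Turn long-format TSV rows into {series_name: {key: value}}."""
        series = defaultdict(dict)
        with open(path) as f:
            for row in csv.reader(f, delimiter='\t'):
                # each distinct value in the pivot column becomes its own series
                series[row[pivot_col]][row[key_col]] = row[value_col]
        return series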
[16:26:12] drdee, average_drifter - where's debianization on the list? It's starting to hold things up [16:26:29] oh he said average_drifter's doin dentist stuff for a few days [16:26:32] right [16:26:33] milimetric: I am working on the package as we speak [16:26:35] as we speak [16:26:42] I started writing the files in debian/ [16:26:44] cool, thanks! [16:26:45] dude take it easy :) [16:26:45] in a separate branch [16:26:55] no need to work while you're recovering from tooth stuff [16:26:59] I went to the doctor and they scheduled me for next week btw [16:27:15] oh, ok [16:27:38] thanks average_drifter! [16:27:43] erosen: so how do we go about this? [16:28:02] i'm going to generate some graphs for all of the indic languages [16:28:04] ottomata: yeah, so let's just plan on debianization and puppetization being done sometime relatively soon [16:28:12] it should be pretty easy to do as a one-off [16:28:24] erosen: okay. great. [16:28:26] and if you need them to be regularly updated, we can do that later [16:28:41] erosen: sounds good. [16:46:04] first push [16:46:18] far away from the first package, but will have one soon [16:46:28] limn branch is called debianization [16:46:34] feel free to keep an eye on it [16:49:12] geohacker: here is the hindi graph http://gp-dev.wmflabs.org/graphs/hi_bytes_daily [16:49:21] should be able to do the others easily now [16:50:28] erosen: checking [16:51:12] sync-ing git log with debian changelog [16:51:19] wow there's a lot of commits in the limn repo [16:51:37] erosen: awesome. this looks good. [16:51:55] erosen: is this a lot of manual work? or do you have a workflow? [16:52:18] it's a workflow [16:52:29] perfect. [16:52:32] i had a script which was parameterizable by language [16:52:44] awesome [16:52:56] so I just set it up to take in a list of languages and hooked it up to my dashboard deploy setup [16:53:59] great. [16:54:23] erosen: so I'm checking edit size from the etherpad. [16:54:30] ottomata: can you please help me with the htaccess thingie ? [16:54:32] cool [16:54:45] ottomata: I have a directory /home/spetrea/www/private on stat1 [16:54:53] ottomata: I created a .htaccess and a .htpasswd for it [16:55:01] oh yeah, sorry, gimmee fiiiiive minutes [16:55:02] sorry [16:55:03] will help [16:55:18] ottomata: but I also need someone who has access to configure Apache and restart it so I can use my private dir [16:55:21] ok thanks [16:56:29] erosen: this is the size of the db daily right? [16:56:49] yeah let me tell you a bit about how it works (I'll include this in the graph description) [16:57:02] okay. [16:57:24] basically it iterates over the revisions and looks at the changes in the size of the article in bytes [16:57:41] it then integrates those per-revision changes into the total size as of a point in time [16:57:48] the graph I gave you has daily resolution [16:58:18] however it also generates a monthly value: http://gp-dev.wmflabs.org/graphs/hi_bytes_monthly [16:58:20] erosen: hmm. I might have to aggregate it monthly. [16:58:25] ah cool [16:58:57] and I have a month-to-month change graph too: http://gp-dev.wmflabs.org/graphs/hi_bytes_added_monthly [17:00:07] erosen: what are those negative values? [17:00:07] geohacker: also a quick note about something I noticed: these numbers can be negative if a lot of content or articles were deleted [17:00:09] hehe [17:00:15] ah :D [17:00:23] I guess we both noticed the same things [17:00:32] *fist pump* [17:00:43] *fist pump*, indeed
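For reference on average_drifter's .htaccess question above: basic-auth protection of a directory looks roughly like this (the realm name is illustrative; the paths follow the ones given in the log). Note that the Apache site config must also allow overrides, e.g. AllowOverride AuthConfig, for the .htaccess to take effect, which is why someone with access needs to touch the Apache config and restart it:

    # /home/spetrea/www/private/.htaccess
    AuthType Basic
    AuthName "private"
    AuthUserFile /home/spetrea/www/private/.htpasswd
    Require valid-user

The password file itself is created with `htpasswd -c /home/spetrea/www/private/.htpasswd someuser`. (As it turns out later in the log, drdee decides to skip HTTP exposure entirely and just fetch the files over ssh.)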
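erosen's description of the total-size computation maps to a simple cumulative sum: iterate revisions in time order, take each revision's byte delta, and integrate the deltas into a running total sampled per day. A minimal Python sketch of just that logic (the real script, total_size.py, is linked later in the log; this is only the shape of it), which also shows why the totals can dip, since deletions contribute negative deltas:

    def total_size_by_day(revisions):
        """revisions: iterable of (date, byte_delta) pairs, ordered by time."""
        totals = {}
        running = 0
        for date, delta in revisions:
            running += delta         # deletions make delta negative
            totals[date] = running   # keep the last cumulative value per day
        return totals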
[17:01:17] the other languages are still churning, but shouldn't be more than another 20min [17:01:25] perfect. [17:02:15] erosen: I think if we take the daily dump, that will reflect the edit size. right? [17:02:28] for that day [17:02:51] i don't think i follow [17:02:59] what do you mean by daily dump? [17:03:11] erosen: sorry. I meant the daily graph. [17:03:39] the differences in the daily_SIZE graph should give you the amount of content added [17:03:49] I can also add a daily bytes added graph [17:03:53] if that is useful [17:04:07] erosen: yes. perfect. [17:04:33] k, one sec [17:15:05] brb [17:19:51] geohacker: here is the daily graph for Hindi: http://gp-dev.wmflabs.org/graphs/hi_bytes_added_daily [17:26:35] milimetric: Do you have another limn repo? [17:28:59] erosen: back now. hmm that table is empty. am I missing something? [17:29:03] hmm [17:29:11] it shouldn't be [17:29:17] erosen: http://gp-dev.wmflabs.org/graphs/hi_bytes_added_daily [17:29:41] oo interesting [17:29:45] :-/ [17:29:47] i must have updated it just now [17:29:54] it was working in my browser [17:30:20] one sec [17:30:26] sure :) [17:31:28] weird, now it works [17:31:34] for me at least [17:31:42] geohacker: can you check (same url) [17:33:03] geohacker: also, all of the other languages finished, I am rerunning so as to generate a daily bytes added graph (which will be much faster because the queries are cached from the first run) [17:33:04] checking [17:33:23] geohacker: i added a list of the languages I am using to your etherpad document [17:33:40] mornin [17:33:49] erosen: works now. [17:34:32] great [17:34:40] hey preilly - sorry was at lunch [17:34:46] We work on limn via forks [17:34:57] so wikimedia/limn is the only repository that matters [17:35:00] milimetric: What is limn sanity ? [17:35:02] we deploy from there [17:35:31] milimetric: I understand the fork and pull request model, I was asking about another repo that I heard about [17:35:34] oh, heh, that was just a proof of concept that d3 and knockout could be used in a very simple way [17:35:41] erosen: saw the etherpad. perfect. I just have to change URL language cod right? [17:35:48] *code. [17:35:50] milimetric: where does that live? [17:36:10] it's on my milimetric github account, but it was never meant to see light of day - perhaps I should delete [17:36:23] all the limn-relevant work I did got merged into limn proper [17:36:55] * preilly — Forking milimetric/limn-sanity [17:36:58] I just kind of like the name and was thinking I could use it for other proof of concepts [17:37:13] geohacker: yup, just replace the language code in urls [17:37:14] hahaha, it really probably doesn't even work at this point [17:37:43] geohacker: all of the languages should now have a daily_bytes_added graph/data [17:37:53] preilly: you interested in working on limn? :) [17:38:00] geohacker: what were the remaining dimensions you were interested in again? [17:38:21] erosen: the ones at the bottom of the pad [17:38:23] milimetric: nope I'm just reviewing code [17:38:42] geohacker: so, edits by geography? [17:38:45] or all of the ones from wikistats [17:38:55] erosen: all of them, I just updated it. [17:39:03] k [17:39:07] preilly: ok, then. Let me know if you have any questions
[17:39:08] god i love aesop rock. [17:39:24] milimetric: will do — thanks [17:39:27] fyi, kraigparkinson -- i'll be in the office after scrum [17:40:01] dschoon: have you seen Ian perform live? [17:40:21] i missed him when i saw rjd2 back in 2010 :( [17:40:31] erosen: do we have the edit size monthly as well? [17:40:38] dschoon: he is live in Oakland, CA on Apr 22 [17:40:50] geohacker: do you mean bytes added? [17:41:09] erosen: oh yes. sorry. I keep going back to what we wrote in the pad. [17:41:09] if so, yes: http://gp-dev.wmflabs.org/graphs/hi_bytes_added_monthly [17:41:09] dschoon: here is the link: http://www.thenewparish.com/event/195433-aesop-rock-rob-sonic-dj-oakland/ [17:41:32] cool, thanks, preilly [17:41:41] geohacker: however something weird is going on with that graph I just linked [17:41:59] erosen: yes. just noticed. [17:42:01] :-/ [17:42:04] ya [17:42:31] erosen: firebug tells me that n is undefined, formatters.mod.js line 60 [17:42:45] yeah [17:43:46] i suspect it has to do with negative values or something funny [17:43:54] uh oh [17:43:56] milimetric: have a sec for a limn q? [17:44:26] you can still grab the csv at least: http://gp-dev.wmflabs.org/data/datafiles/gp/hi_bytes_added_monthly.csv [17:44:29] just not as pretty [17:44:35] hi [17:44:44] erosen: i bet this is somehow callout-node related [17:44:46] erosen, what's up [17:45:00] we're seeing an unexpected behavior here: http://gp-dev.wmflabs.org/graphs/hi_bytes_added_monthly [17:45:02] i think i've mostly excised formatters.js otherwise [17:45:02] in favor of d3.format [17:45:08] gotcha [17:45:39] erosen: also the link to the csv in the graph page has two ../gp/gp/.. [17:45:51] yeah, i noticed that [17:45:51] ooo [17:45:57] i think i know what is happening [17:46:04] milimetric: don't worry about it [17:46:34] geohacker: this is a result of a hack, a sed command i run on the data files. one sec [17:47:13] erosen: sure. [17:50:16] geohacker: fixed, i think [17:50:23] bleh. stupid wifi. [17:50:25] erosen, sorry. what was the issue you were seeing? [17:50:36] erosen: checking [17:51:22] erosen: awesome. [17:51:25] thanks! [17:51:57] erosen: should we start looking at the other items in the list? [17:52:04] dschoon: it was a js error which had to do with a formatter and some "toFixed" method [17:52:24] dschoon: but it turns out it was just because the datasource was pointing at the wrong path for the datafile [17:52:27] geohacker: sure [17:53:14] erosen: also, if we can document this somewhere, at least the workflow, I think that will make things easier next time. [17:53:40] ahh. yeah, toFixed is a method of Number, so you're right that it was probably n == null [17:54:25] fyi messing with namenode conf stuff [17:54:30] things funky for a sec [17:54:32] geohacker: i'm a fan of documentation, but I'm not sure I know what you mean [17:54:44] geohacker: do you mean the code that generates these graphs? [17:54:57] erosen: both the code and the process. [17:55:01] I'm planning on setting them up to run on a cron job [17:55:09] erosen: right. [17:55:11] i can show you the code though, in case you are interested [17:55:26] would love to take a peek! [17:55:40] erosen: are these erik's perl scripts? [17:55:55] I might have seen them. [17:55:57] these are my python scripts which interact with the databases [17:56:11] erosen: ah, clean. would love to see them. [17:56:34] * geohacker is not a fan of perl. never. [17:57:08] hehe
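The toFixed error milimetric diagnoses above is the classic null-guard problem: calling a Number method on undefined data. A generic sketch of the fix in JavaScript, not limn's actual formatters.mod.js, just the shape of a defensive formatter:

    // bail out before calling Number methods on null/undefined/NaN
    function safeFixed(n, digits) {
        if (n == null || isNaN(n)) { return ''; }
        return Number(n).toFixed(digits == null ? 2 : digits);
    }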
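The doubled ../gp/gp/.. path erosen mentions fixing with a sed run over the data files would be something like this one-liner; the actual command and file paths aren't in the log, so both are hypothetical:

    sed -i 's|/gp/gp/|/gp/|g' datasources/*.yaml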
[17:57:29] here it is on gerrit: https://gerrit.wikimedia.org/r/gitweb?p=analytics/global-dev/dashboard.git;a=blob;f=db_size/total_size.py;h=8705e3f98cb36ef7f8e43a2146bc9f1fc13b1837;hb=HEAD [17:59:22] erosen: thanks. this looks neat. [18:00:22] feel free to extend or ask questions; if you do get an account on toolserver or when wikimedia labs get the slave dbs running, you should be able to run it on any mediawiki db [18:00:30] ahoy ahoy. https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc [18:02:55] erosen: that sounds exciting. I'll request toolserver access. [18:06:42] erosen: next is number of new user accounts. [18:07:50] i think that number is handy [18:09:15] geohacker: here is the new editors column from ErikZ: http://gp-dev.wmflabs.org/graphs/indic_language_new_editors [18:10:21] erosen: brilliant. [18:10:30] just updated the etherpad. [18:10:53] erosen: next is number of articles. [18:16:14] average_drifter [18:16:21] what is it you want to keep in private/ [18:16:21] ? [18:17:21] Can't seem to jump back into the hangout. Is it done? [18:17:57] yeah [18:18:14] k [18:18:20] Thanks [18:18:51] geohacker: sorry for the delay [18:19:03] ErikZ has a new articles column [18:19:24] http://gp-dev.wmflabs.org/graphs/indic_language_new_articles_per_day [18:19:30] but it is a little complicated [18:20:00] it is the average number of new articles per day for each month [18:20:03] erosen: checking. [18:20:29] it is meant to normalize months with different lengths [18:20:59] erosen: oh okay. [18:21:29] erosen: if the monthly is not available, I can aggregate them. [18:21:46] ottomata: private files [18:22:06] ottomata: I currently have .zip files with passwords in /home/spetrea/www/ [18:22:13] geohacker: i think those numbers are monthly [18:22:14] ottomata: and drdee told me I should put them somewhere safe [18:22:26] they just represent the average per-day value for that month [18:22:29] ottomata: so I thought about having a passworded /home/spetrea/www/private [18:22:39] what passwords? [18:22:59] ottomata: well basically you see some zips in /home/spetrea/www right ? [18:23:08] erosen: yes. that's perfect. [18:23:14] well actually guys never mind [18:23:14] ottomata: well those zips have passwords because they contain squid logs and stuff which have IPs in them [18:23:19] great. quick progress [18:23:44] average_drifter, ottomata, let's do it different [18:23:49] erosen: yes! this is awesome. just updated the pad. [18:23:49] just leave the files on stat1 [18:23:55] and ping me when they are ready [18:24:01] i will ssh and get them myself [18:24:13] drdee: so not exposed through http, just on stat1 [18:24:23] ok [18:24:25] yes [18:24:29] erosen: I think the number of articles can be calculated out of the new articles data. [18:24:38] erosen: next is page views. [18:24:51] geohacker: i was thinking that too. the only snag is deleted articles [18:25:07] erosen: oh I missed that. [18:25:23] hrmph. [18:26:00] but i found a column in an erikZ csv which has num articles [18:26:36] should have a graph in a second [18:26:51] awesome. [18:28:06] geohacker: here it is: http://gp-dev.wmflabs.org/graphs/indic_language_num_articles [18:28:19] * geohacker clicks. [18:29:08] erosen: great. \o/ next is page views. [18:29:47] geohacker: also for a definition of what an article is in those counts, see: http://stats.wikimedia.org/EN/TablesWikipediaHI.htm [18:29:58] and ctrl-f "Articles (excl. redirects)" [18:30:14] or here: http://www.mediawiki.org/wiki/Analytics/Metric_definitions
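The per-day normalization erosen describes above is just a division by month length, e.g. in Python:

    import calendar

    def per_day_average(monthly_total, year, month):
        # normalize a monthly count by the number of days in that month
        days_in_month = calendar.monthrange(year, month)[1]
        return monthly_total / float(days_in_month)

So a 31-day month and a 28-day month with the same monthly total yield different per-day rates, which is the point of the normalization.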
redirects)" [18:30:14] or here: http://www.mediawiki.org/wiki/Analytics/Metric_definitions [18:30:17] drdee: are you in the office? [18:30:24] yes i am [18:30:31] hi (wave) [18:30:44] erosen: noted. [18:30:47] thanks. [18:30:48] wanna come to 6 for the analyst scrum? [18:30:51] sure [18:30:53] I have 62 booked [18:30:59] on my way [18:31:03] k cool [18:31:35] geohacker: so for page views, i don't have a good answer yet [18:31:55] i know that such data gets used in graphs like this one: http://reportcard.wmflabs.org/graphs/pageviews [18:32:04] milimetric: any ideas on where this data comes from? [18:32:36] hmm [18:32:38] Erik Zachte sends me a zip and I convert it into limn-friendly erosen [18:32:43] k [18:32:55] drdee - any idea where EZ gets his pageviews data that he sends me monthly? [18:33:17] also - something I've been meaning to ask for a while: [18:33:27] my hope is that it lives somewhere on stat1, which I can parse for the dashboard [18:33:31] everyone: when are we going to move EZ's reportcard metrics to our new system? [18:33:42] DarTar: you in the hangout? [18:33:55] erosen, milimetric: I found these - http://dumps.wikimedia.org/other/ [18:34:07] erosen: joining in a sec [18:34:18] cool, just checking i was at the right place [18:34:30] actually, YuviPanda sent me there. [18:34:47] geohacker: so there is definitely this: http://dumps.wikimedia.org/other/pagecounts-raw/ [18:34:53] that is the main source of article level page counts [18:35:48] erosen: hmm. that requires quite a bit of cleaning up. [18:35:52] you could aggregate by project to get the number you want [18:36:24] yes. [18:36:37] but there must be an aggregate number somewhere [18:38:31] milimetric: ok I got somethin in place [18:41:26] erosen: any luck? [18:41:32] awesome average_drifter [18:41:34] let's chat [18:42:02] geohacker: sorry, in a short meeting [18:42:47] erosen: sorry. no worries. [18:43:11] happens to be with diederik / drdee, who will know more, hopefully [18:59:04] milimetric: aww, thin lines, love it :) [18:59:26] :) DarTar: I had that from the start, but the code wasn't pushed to dev [18:59:46] area graphs aren't stacked and descriptions aren't there [18:59:56] but I have a whole file of notes on all the details [18:59:57] I have a pretty crazy day today (last day in the office, flying out to BOS tomorrow), so it'll probably need to wait until the weekend [19:00:02] so don't go doing anything with it before we talk [19:00:06] yeah, no problem [19:00:18] just let me know if you want to make changes and we'll talk :) [19:00:30] Monday the office is closed, do you want to set up a time to talk on Tue? [19:05:03] erosen: I really have to hit the sack now. Do you think we can figure out the rest tomorrow? [19:05:30] and this is awesome. we just have two more to go! thank you so much! [19:05:40] yeah [19:05:49] i actually just found a file with the per project page counts [19:06:00] should be able to get those graphing in a a few min [19:06:00] awesome [19:06:10] but we can totally check in tomorrow [19:06:15] perfect [19:06:19] tomorrow then. [19:06:31] thank you so much erosen! 
[19:17:27] milimetric: re: performance of the big hairy hourly registration plot, removing smoothing on Chrome looks ok (3 secs to load, tolerable but still slower than the dygraphs version) but on Safari it freezes my browser for more than 10 seconds, which is not good [19:17:58] DarTar: yeah, it's part of that bug in Safari I mentioned before [19:18:04] kk [19:18:17] when they draw paths they have a very bad O(n) algorithm for adding *each* segment to the path [19:18:40] so that basically kills performance I think with any SVG until they grab the patch that Chrome submitted [19:27:03] milimetric: ic [19:28:25] geohacker: just in case you're still around: http://gp-dev.wmflabs.org/graphs/indic_language_page_views [19:45:44] erosen, i pushed the first version of the pig cidr function [19:45:49] nice [19:46:15] you invoke the function by supplying a comma-separated list of CIDRs [19:46:24] interesting [19:46:25] i mean you register the function [19:46:32] gotcha [19:46:34] and then you call Cidr(ipaddress) [19:46:37] nice [19:46:41] it will return true/false [19:47:42] not that this isn't useful, but can you remind me why we need to do cidr matching with the new x-cs setup? [19:48:13] to do pre-launch zero analysis [19:50:30] gotcha [19:50:49] just fyi, i have the code to this in python [19:51:10] ok didn't know that :( [19:51:18] sorry about that [19:51:25] i mean this will be much faster i'm sure [19:51:36] it was very slow in python [19:51:54] i also have a json object which maps from carriers to cidr ranges [19:51:58] it took me about 1.5 hours plus test case so it was not like a huge time waste [19:52:18] so we can turn that into a java map and do the lookup with a UDF, which will be nice [19:52:21] oh nice [19:52:28] where is the json file? [19:52:38] erosen: PM'ing you with a SQL access issue, when you have a sec [19:53:08] k [20:09:40] ok erosen, same issue here, user can only see log + information_schema [20:09:56] cool [20:10:00] * DarTar dropping a line to PY [20:12:39] I do see the DB when using the regular research user tho [20:15:11] interesting [20:15:20] must just be a permissions issue [20:21:50] i'ma eat some lunch [20:22:15] then milimetric lmk if you still need anything from me to get unblocked? [20:22:24] brb [20:22:40] I'm ok dschoon, starting puppetizing [20:33:04] can I ask somethin about Travis CI ? [20:33:26] can you assume they have git installed on the machine you're running tests on ? [20:33:36] also, do they have an IRC chan ? [20:34:14] uhm, also can you install stuff on there, like for example maybe you have some deps (maybe stuff that needs gcc to compile) [20:38:29] average_drifter: I don't know all the answers, but their docs are pretty good http://about.travis-ci.org/docs/ [20:41:56] gonna skim through them [20:41:58] dschoon: thanks [20:42:56] milimetric: aiight. i imagine you'll have to do some wrestling with npm and node; if you want to talk through any of that i'm happy to help [20:58:58] average_drifter: Travis CI gives you a full VM to do basically whatever you want. Based on the "language" you declare that your project has, the VM comes pre-loaded with most everything you need. If you need other stuff, you can install as part of scripts that you define in .travis.yml [20:59:41] average_drifter: you can look at the Limn .travis.yml for an example, but Travis already has everything we need in the Node-specific VM
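A minimal .travis.yml for a Node project along the lines dschoon describes; the node version and commands are illustrative, not Limn's actual file:

    language: node_js
    node_js:
      - "0.8"
    install:
      - npm install
    script:
      - npm test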
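Going back to the pig cidr UDF drdee pushed earlier in the log: usage would look roughly like the sketch below. The jar name, class path, and data schema are all assumptions for illustration; what the log confirms is only that the function is registered with a comma-separated list of CIDR ranges and returns true/false for an IP address:

    -- register the kraken UDF jar (name is an assumption)
    REGISTER 'kraken-pig.jar';
    DEFINE Cidr org.wikimedia.analytics.Cidr('208.54.0.0/16,10.0.0.0/8');

    -- load some webrequest logs (path and schema illustrative)
    logs = LOAD '/wmf/raw/webrequest' USING PigStorage('\t')
           AS (ip:chararray, url:chararray);

    -- keep only requests whose IP falls in one of the CIDR ranges
    zero = FILTER logs BY Cidr(ip);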
[21:37:54] drdee_, dschoon, I'm writing a wikipage / email with a summary of the existing system and things we'd like to change [21:38:10] do you think it is bad to mention that we are saving raw IP addresses on mediawiki.org? [21:38:33] *blinks* [21:38:35] oh. [21:38:51] heh, i thought for a moment you were saying we were storing them in a wiki [21:38:57] which was like, wait wait. we're not THAT incompetent. [21:39:01] hahah, nono [21:39:09] just mentioning the fact on a public wiki page [21:39:29] consult robla [21:40:10] he's on a retreat! [21:40:14] oh, true. [21:40:20] go ahead and leave that off [21:40:21] erik M is too, right? [21:40:26] no, moeller is here [21:40:28] oh ok [21:40:31] i will ask him when i go back downstairs [21:40:34] ok cool, danke [21:40:41] yeah just ask if it is ok to mention it [21:42:06] i think if you want to use more vague language ("data sanitization is not yet satisfactory") that'd be fine [21:43:46] ok cool [21:57:13] ottomata: I am doing a dummy's guide to puppet :) http://docs.puppetlabs.com/learning/ral.html [21:57:28] when do you think you'll have time to puppetize Limn with me? [21:57:35] I speak broken puppet now ! [21:59:09] we got .deb? [21:59:11] :) [22:00:34] yep, the .deb is almost ready [22:00:46] average_drifter is finishing it up [22:01:11] ottomata: do you know what version of node is in our apt? [22:01:34] i believe limn requires >= 0.8.x [22:02:24] i think we looked into that [22:02:30] and whatever is necessary is available [22:02:35] (may have changed since I last looked though) [22:03:06] heh, i feel like that step is always an ordeal for unclear reasons [22:04:45] dschoon [22:04:58] ? [22:04:59] average_drifter added the chris lea deb [22:05:04] *nod* [22:05:15] if it works, great :D [22:06:03] of nodejs [22:06:03] so it's the latest and it gets packaged with Limn [22:06:03] rather - referenced [22:06:58] ok dudes [22:06:59] http://www.mediawiki.org/wiki/Analytics/Kraken/Overview [22:07:01] Feedback please [22:07:13] i gotta peace out sometime soon, and I'd like to send this to ops before I do [22:07:16] although, hm, I could send it tomorrow [22:07:19] morning [22:10:23] dschoon, drdee [22:10:26] drdee_ ^ [22:10:29] looking it over [22:10:32] oh sorry, danke [22:10:42] but i think it'd be better not to rush if you're on your way out the door [22:11:16] yeah, don't worry [22:11:18] let's send it tomorrow [22:11:22] okay. [22:11:29] unless you guys ok it, don't rush it though [22:11:36] i'll be online, gotta prep for some guests [22:11:41] gotta run home, etc. [22:11:43] but i'll be around [22:11:54] cool [22:12:32] puppet is very nice [22:13:04] yeah, it's super great, it's got its annoyances (doesn't everything), some people say chef is better, but overall it's really nice [22:13:24] ottomata, do you want us to edit directly? [22:13:25] it's really cool when you can abstract things really elegantly [22:13:43] for little things, yeah i think, for bigger things, sure, but maybe let me know so I know where to look? [22:16:55] ok, be back in a bit, but mostly afk [22:16:55] lataaass! [23:01:52] drdee_: can you grant me write access to limnpy? [23:02:07] don't think so [23:02:09] preilly: you're probably the person to talk to about this ^^ [23:02:10] ask preilly [23:02:13] on it [23:10:25] erosen: what repo?
[23:10:35] wikimedia/limnpy [23:11:03] erosen: done [23:11:07] thanks [23:11:31] erosen: wait what is your GitHub username? [23:11:38] embr [23:11:45] erosen: yeah okay done [23:11:45] seems to have worked [23:11:53] did you add a team? [23:12:19] looks like analytics is on there now, at least [23:12:20] thanks
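As a coda on the puppetization thread: the log never shows the eventual manifest, but a first sketch of a limn puppet class might look like this. The package names, resource layout, and service are all assumptions (the real setup was to come out of the .deb work average_drifter describes above), so treat it as a starting point rather than the actual module:

    # a hypothetical starting point for puppetizing limn
    class limn {
      # node comes from the chris lea repo / wmf apt, per the discussion above
      package { 'nodejs':
        ensure => present,
      }

      # the limn .deb being built in the debianization branch
      package { 'limn':
        ensure  => present,
        require => Package['nodejs'],
      }

      service { 'limn':
        ensure  => running,
        enable  => true,
        require => Package['limn'],
      }
    }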