[00:59:25] milimetric, I think I fixed all the problems in the script that is run by cron, but I still can't see the graphs being updated after over one hour
[01:03:37] jgonera: I think the problem is that the cron job that runs generate.py doesn't get monitored in any way
[01:04:08] I'm almost certain that YuviPanda set that up, but he says one of us did
[01:04:13] milimetric, yeah, how did we get that stack trace last time?
[01:04:22] heh, christian did it manually
[01:04:26] he just ran generate.py
[01:04:41] i thought he had found some logs but he hadn't
[01:05:41] if you want to see stacktraces from cronjobs, redirect stderr to some file on disk
[01:05:55] generate.py 2>/tmp/somestacktrace.txt
[01:06:07] maybe this helps
[01:06:17] * average had similar problems not long ago when setting up a cron job
[01:07:41] jgonera, milimetric ^^
[01:07:54] right
[01:08:10] oh but jgonera, things are fine: http://mobile-reportcard.wmflabs.org/graphs/monthly-contributions
[01:09:13] ok, did you need to create a virtualenv manually somewhere, average?
[01:09:46] milimetric, not everywhere: http://mobile-reportcard.wmflabs.org/graphs/edits-monthly-unique-editors
[01:10:13] that graph looks fine too
[01:10:19] 0 for november?
[01:10:20] you're just looking for it to have Nov. data, right?
[01:10:35] no, it has 807 for other namespaces
[01:10:42] and 6.74k for main namespace
[01:10:58] jgonera: well, if you want a self-contained thing, you can use virtualenv, sure
[01:11:05] ah - when using external URLs, limn doesn't cache-bust
[01:11:19] that's probably why; your browser might just be holding on to the old datafiles
[01:11:19] I'm not familiar with it, so I didn't fix it in case I made it worse
[01:12:12] jgonera: but cron has its little annoyances and throws various fits, so there can be cron-specific problems, like the script expecting stuff to be in some relative path that isn't there, or expecting ENV variables that aren't set because it's running under cron
[01:17:15] about what milimetric said, the browser holding on to the old json file
[01:17:21] grep -lir "http://mobile-reportcard.wmflabs.org/graphs/monthly-contributions.json" ~/.cache/google-chrome/Default/Cache/ | xargs rm
[01:17:30] that deletes the cache files for that json
[01:17:38] :)
[01:17:43] or "clear cache"
[01:17:51] yeah but that clears all the cache..
[01:17:55] :P
[01:29:01] uh, milimetric, average, I thought that Ctrl+Shift+R would get rid of this
[01:29:36] hah, it does in Firefox, but not in Chrome
[01:29:37] thanks
[01:32:43] right, Chrome caches *harder* to maximize fun
[01:32:47] you gotta clear manually, yep
[14:08:19] hey ottomata, qchris_away, milimetric
[14:15:55] yyooyo
[14:26:17] ah - sorry average, was ignoring you for email
[14:26:20] hi
[14:26:43] I don't know if you've seen the latest patchset on my change
[14:26:54] but it's implemented and now I just have to add tests
[14:27:04] if you want to help, you're welcome
[14:29:11] while doing it, I think I fixed the CSRF handling, so now all our forms are actually using it. This was never a priority because we don't expect people to hijack metric configurations :) but it's good to have
[14:29:42] so I removed "fake_csrf", which should break a bunch of tests that I'm going to fix
[14:29:56] but take a look and feel free to review / comment if you see anything busted
[14:37:25] Hi average
[15:02:56] Is hangout broken?
[15:02:57] google hangouts having problems?
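A minimal sketch of the kind of crontab entry average is describing: run generate.py from an absolute working directory, use a virtualenv's interpreter so no extra ENV setup is needed under cron, and redirect stderr so stack traces end up in a log file. The paths (/srv/limn-data, /var/log/limn-generate.log) and the hourly schedule are illustrative assumptions, not taken from the log.

    # Hypothetical crontab entry (paths and schedule are placeholders):
    # cd to an absolute directory, run generate.py with the virtualenv's
    # python, and append stdout and stderr to a log file so stack traces
    # from cron runs are not lost.
    0 * * * * cd /srv/limn-data && ./venv/bin/python generate.py >> /var/log/limn-generate.log 2>&1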
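On the cache-busting point, the usual workaround is to append a throwaway query parameter to the datafile URL so the browser treats it as a fresh resource. A rough sketch; the nocache parameter name is just an illustration of the idea, not anything limn actually does.

    # Hypothetical manual cache-bust: request the datafile with a unique
    # query string so any cached copy is bypassed.
    curl "http://mobile-reportcard.wmflabs.org/graphs/monthly-contributions.json?nocache=$(date +%s)"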
[15:02:58] yep
[15:03:06] i'll text toby that we're gonna do standup here
[15:03:16] it's better
[15:03:18] try again
[15:03:29] milimetric:
[15:33:35] uh oh
[15:41:48] gotta run to pick up my car, back in just a few mins
[15:42:24] ottomata: you mean like literally?
[15:43:12] ottomata: you're strong
[15:44:57] ha
[16:44:34] ottomata: Could I steal half an hour or so today to talk about serving geowiki through apache?
[16:45:33] i'm looking at it right now!
[16:45:39] Ah. Great!
[16:47:37] so, one thing that i've found a little easier than manually using source => puppet:///private ...
[16:47:48] hmm
[16:47:54] hmm actually
[16:47:54] nm
[16:48:34] hm, yeah nm, i like it
[16:49:18] hm ok
[16:49:36] ?
[16:51:26] hey milimetric, is Sprint Planning not happening today?
[16:51:27] that's odd
[16:51:43] that's what Toby said, yeah
[16:51:47] during standup
[16:51:58] that we have no sprint planning
[16:52:09] we can use the time to task out cards if you guys all agree
[16:52:35] I thought we'd use the time to work on discussing cards. Yes :-)
[16:52:51] ok let's discuss cards
[16:53:48] should we do it in +2h ?
[16:53:51] i like scala quaranta and machiavelli
[16:54:27] average: I think that was the time ... yes.
[16:55:05] would you guys like to move it up?
[16:55:27] i can try to get a hold of toby and see when he's available
[16:55:43] or should i schedule a tasking meeting for us at the same time?
[16:55:58] I'd prefer to schedule a tasking meeting at the same time.
[17:36:00] https://www.udacity.com/course/ud617
[17:36:05] Introduction to Hadoop
[17:36:22] gonna watch all of that
[17:46:00] ottomata, you win my 'analyst of the day' award
[17:46:57] haha
[17:46:58] :)
[17:51:05] jesus christ, what format is this file?!
[17:51:27] binary!
[17:51:29] it is maxmind
[17:51:33] you gotta use their tools/libraries to use it
[17:51:47] what are you trying to do?
[17:52:16] try this
[17:52:18] /usr/bin/geoiplookup
[17:52:45] Ironholds: ^
[17:56:45] I was going to take some IPs and run 'em through; bah. Okay, let's see what libraries they have...
[18:00:14] well, geoiplookup will work, if you don't want to code
[18:01:22] point!
[18:03:29] if it's automagically streaming everything in, that'll be a pain
[18:04:33] oh, sweet
[18:04:37] there may actually be an R library.
[18:09:48] ...there is, but it requires compiled C code.
[18:10:36] wait, what's streaming?
[18:10:40] what are you trying to do?
[18:11:16] where are your IPs coming from? what format are they in?
[18:11:19] grab a list of IPs from the db, geolookup each one, save the resulting data, repeat
[18:11:29] (I'm not crazy, toby said it was okay)
[18:11:32] ah
[18:11:39] actually it looks like this should be relatively trivial once I write the setup
[18:11:49] I can just send a shell command from inside the language to handle it. hrm.
[18:11:55] yeah you can do that, you'll have to code for that i think, ah, you could probably script it with geoiplookup, but that would be annoying
[18:11:59] you could do that
[18:11:59] sure
[18:11:59] I'll experiment tomorrow and see what happens. yay, 20 percent day.
[18:12:07] what are you scripting in?
[18:12:15] php?
[18:12:29] R (don't worry, I promise python is on my to-do list. I'll get on it as soon as R stops being interesting, and as soon as our Pandas version works)
[18:14:37] ha ok
[18:25:00] ottomata, next question; how often do these dbs update?
[18:25:10] every tuesday, as is the norm for their paid dbs, or..?
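Since geoiplookup came up as the no-code option, here is a minimal sketch of batch-geolocating a list of IPs with it. It assumes one IP per line in a plain text file; ips.txt and geo_results.txt are hypothetical names, and the same loop could just as well be launched from inside R via system().

    # Hypothetical batch lookup: read IPs one per line, run geoiplookup on
    # each, and save "ip<TAB>result" lines for later analysis.
    while read -r ip; do
        printf '%s\t%s\n' "$ip" "$(geoiplookup "$ip")"
    done < ips.txt > geo_results.txt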
[18:26:39] ummm, yeah probably, if maxmind releases a new one, these will be updated shortly after
[18:26:44] i think puppet tries to download new files weekly
[18:26:47] so ja
[18:26:47] something like that
[18:27:06] sweet; ta :)
[18:27:17] I'll set the cron job to run on thursdays
[18:58:00] (PS8) Milimetric: Async cohort upload UI and fixes for validation [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/93081
[21:02:09] brb, going to train station
[21:16:31] back
[21:32:42] halfak, Ironholds: any suggestion for a small or mid-size wiki with a historically high volume of deletions?
[21:33:10] Not sure what the problem is.
[21:33:14] Is the problem the deletions?
[21:33:17] Or the analysis of them?
[21:33:24] Or maybe an explanation as to why?
[21:33:24] The third-party documentation doesn't exist
[21:34:09] halfak: the context is the analysis of discrepancies between active editor data generated by including or excluding the archive table
[21:34:24] i.e. my mail from last night
[21:34:28] Ahh. I see.
[21:35:03] I'm still not sure what I might suggest.
[21:36:27] On one hand, I'm concerned about counting spammers as active editors. On the other hand, in Enwiki, most deletions are due to lack of notability.
[21:36:50] Do you have a sense for how much of the deletions can be chalked up to bad-faith editing?
[21:36:53] DarTar, hmn.
[21:37:18] DarTar, I guess you could just iterate through the slaves measuring archive table size and revision table size and, well, looking for discrepancies ;p
[21:37:33] brb
[21:39:04] I'm generating the same historical series that I shared last night (dump vs revision vs revision ∪ archive) for a bunch of different projects on top of enwiki
[21:39:38] I want people to be able to discuss what deviation from the dump-based definition is acceptable
[21:39:40] qchris_away, milimetric: I have a lead on who you might get in the loop about the API
[21:39:58] oh yeah?
[21:40:04] yeah
[21:40:44] and also drafting something like this, which hopefully can be generalized to other (editor) metrics: https://meta.wikimedia.org/wiki/Research:Monthly_active_editors#Principles
[21:40:48] I'm planning on adding footnotes explaining each Y and N
[21:41:03] (average - this is when you tell us who your lead is or start asking for money :) )
[21:42:02] that's awesome DarTar (the principles)
[21:42:12] milimetric: it's a start :)
[21:42:28] also, what's a "deletion" vs. a revert?
[21:42:35] milimetric: I rarely if ever mention the word money
[21:42:45] no! I was joking!
[21:43:04] but yeah, it sounded like a ransom
[21:43:12] I'm gonna give details in like 20s
[21:43:17] np :)
[21:45:36] Ironholds: k, so I'm currently generating en, de, fr, es, ru, ar, it, pt (and waiting to get the dump-based series from Erik Z)
[21:45:52] awesome!
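A rough sketch of the per-wiki comparison Ironholds suggests above: count rows in revision versus archive and look at the gap. The host name, the short list of wikis, and direct mysql access to the replicas are all illustrative assumptions, not details from the log.

    # Hypothetical loop over a few wikis: count live vs. deleted revisions
    # and print them side by side to spot wikis with heavy deletion volume.
    for db in enwiki dewiki frwiki; do
        rev=$(mysql -h db-replica.example -N -e "SELECT COUNT(*) FROM ${db}.revision")
        arc=$(mysql -h db-replica.example -N -e "SELECT COUNT(*) FROM ${db}.archive")
        echo "$db revision=$rev archive=$arc"
    done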
[21:47:27] so you can clone the mediawiki-core repo
[21:47:37] and then run this weird one-liner on it
[21:47:39] git log --name-only --format="%an||%H" | perl -MData::Dumper -ne 'BEGIN{$is_api_file ={}; $is_api_file->{$_}=1 for split(/\n/,`grep -lir "extends ApiBase"`); $count={}; $author=""; $was_he_involved_in_api_code=0; }; @parts = split(/\|\|/,$_); if(@parts==2){ if($author && $was_he_involved_in_api) { $count->{$author}++; }; $author=$parts[0]; $was_he_involved_in_api=0; next; }; next unless $_ !~ /^\s*$/; chomp; $file = $_; $was_he_involv
[21:47:52] It will give you the top 10 API committers
[21:47:52] average hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/10
[21:48:02] qchris_away: ^^
[21:48:31] woa
[21:48:37] good work
[21:48:57] we should make sure we cc those people along with analytics-l when Christian sends the message
[21:49:06] milimetric: first it greps through mediawiki-core for files that have "extends ApiBase", which tells you which files belong to the API; all those classes derive from ApiBase
[21:49:09] (if you want, just generate the list and send it to him in email)
[21:49:17] yeah, I kind of see that :)
[21:49:39] and then it just counts their commits that are API-related
[21:58:30] average: Thanks!
[22:06:51] halfak: yt?
[22:06:57] we're hanging out
[22:07:23] Yes. Sorry. Got stuck dealing with a couple of bug reports.
[22:07:55] milimetric: e-mail sent
[22:08:35] thanks average!
[22:36:43] (PS9) Milimetric: Async cohort upload UI and fixes for validation [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/93081
[22:41:15] average: https://gerrit.wikimedia.org/r/#/c/93081/
[22:41:17] ready for review
[22:41:29] if i'm out tomorrow and you're not playing with hive, you can review it
[22:41:37] but if you're doing other stuff I'll just self-review
[22:41:53] (PS10) Milimetric: Async cohort upload UI and fixes for validation [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/93081
[22:43:28] nite everyone - I'm offline tomorrow
[22:43:38] halfak: just one thing I'd like to ask you
[22:43:59] Sure.
[22:44:01] if you need 15 mins of distraction, the SQL review thing
[22:44:06] (in your inbox)
[22:44:33] I want to make sure I'm not missing something and I could really use an extra pair of eyeballs
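The Perl one-liner quoted above is cut off in the log, so here is a readable sketch of the same idea under the same assumptions: a mediawiki-core checkout, with "API file" meaning any file containing "extends ApiBase". Note that it counts an author once per commit per API file rather than once per commit, so it only approximates the one-liner's tally.

    # Hypothetical readable version: list files that extend ApiBase, then
    # count how often each author appears in the git history of those files.
    cd mediawiki-core
    grep -lir "extends ApiBase" . |
    while read -r f; do
        git log --format='%an' -- "$f"
    done | sort | uniq -c | sort -rn | head -10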