[01:25:29] Ironholds: I wonder, is there anything against making aggregate 'edits from country per month/week' data publicly available?
[01:25:36] YuviPanda, yes
[01:25:51] after thresholding to remove smaller numbers
[01:26:12] Ironholds: go on
[01:26:13] probably not. but reid is an information theory nerd and we're involving him to come up with k-anonymity and i-anonymity strategies for geodata
[01:26:22] so he should probably handle this ;p
[01:26:28] I see
[01:26:29] oh
[01:26:29] well
[01:26:46] I'll still keep writing this under the assumption they'll all be puppetized and public one day ;)
[01:30:50] YuviPanda, okies :D
[01:32:05] lzia: hey!
[01:32:07] am here now
[02:16:54] Ironholds: jesus, just checking checkuser tables was blazing fast, one join on recentchanges to filter out bots and it is fucking slow
[02:22:09] denormalize all the tables
[02:23:35] YuviPanda, why are you joining on recent changes to remove bots?
[02:23:43] join on user_groups
[02:24:02] Ironholds: well, bot accounts don't necessarily have to make only bot edits...
[02:24:07] but sure, doing otherwise is stupid....
[02:24:11] harrumpf, fine
[02:24:48] should I use a subquery like AND user_id NOT IN (SELECT ug_user FROM enwiki_p.user_groups WHERE ug_group = 'bot')
[02:24:54] that sounds bad
[02:25:04] (was picking off J-Mo's http://quarry.wmflabs.org/query/310)
[02:26:36] no, subqueries is how you do
[02:27:10] the alternate way of doing it is a left join where ug_group ='bot' AND ug_user=NULL or someshit.
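The two bot-filtering approaches mentioned above (a NOT IN subquery versus a LEFT JOIN anti-join) can be compared in a minimal sketch. It assumes an in-memory SQLite database rather than the real enwiki_p replica; the table names `demo_user` and `demo_user_groups` and all rows are hypothetical, and only the `ug_user` / `ug_group` column names come from the conversation. Note that in the anti-join form the group filter goes in the ON clause and the NULL test is written `IS NULL`, since `= NULL` never matches.

```python
import sqlite3

# Toy in-memory stand-ins for the user and user_groups tables.
# Table names and rows are made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_user (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE demo_user_groups (ug_user INTEGER, ug_group TEXT);
    INSERT INTO demo_user VALUES (1, 'Alice'), (2, 'ExampleBot'), (3, 'Carol');
    INSERT INTO demo_user_groups VALUES (2, 'bot'), (3, 'sysop');
""")

# Form 1: exclude bots with a NOT IN subquery.
not_in_sql = """
    SELECT user_id
    FROM demo_user
    WHERE user_id NOT IN (
        SELECT ug_user FROM demo_user_groups WHERE ug_group = 'bot'
    )
    ORDER BY user_id
"""

# Form 2: exclude bots with a LEFT JOIN anti-join.
# The 'bot' filter sits in the ON clause so that users in other groups
# (or in no group) still produce a row with NULLs on the right side.
left_join_sql = """
    SELECT u.user_id
    FROM demo_user AS u
    LEFT JOIN demo_user_groups AS ug
        ON ug.ug_user = u.user_id AND ug.ug_group = 'bot'
    WHERE ug.ug_user IS NULL
    ORDER BY u.user_id
"""

non_bots_subquery = [row[0] for row in conn.execute(not_in_sql)]
non_bots_left_join = [row[0] for row in conn.execute(left_join_sql)]
print(non_bots_subquery, non_bots_left_join)
```

Both queries return the same non-bot users on this toy data. Which one is faster on a real replica depends on the database engine and its query planner, so this sketch says nothing about performance.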
[02:27:16] * Ironholds forgets precise syntax
[02:29:42] Ironholds: heh, that is faster
[02:30:10] I guess the subquery gets cached
[02:30:29] tole you
[02:33:04] Ironholds: tada, pushed the entire runner infrastructure and an 'edits per country' processor function
[02:33:18] Ironholds: I can invoke it thus:
[02:33:18] python run.py --wiki enwiki --start-date 2014-07-01 --end-date 2014-08-01 edits_per_country
[02:33:28] and can add other processor functions like 'edits_per_country' easily
[02:33:37] nice!
[02:34:22] Ironholds: and date handling is also abstracted out this way, and it stores results in an sqlite file I can explore (I got sqlite3 installed on stat machines), and I can run things one after the other and they'll accumulate in one place :)
[02:34:53] * Ironholds patpats
[02:35:02] some day I will know how to code as well as program ;p
[02:35:03] enwiki|labs|edits|
[02:35:16] I'm detecting Labs IPs separately too :)
[02:36:06] Ironholds: you should start doing ops work, it helped :)
[02:37:29] Ironholds: I've to join on another table now to detect mobile edits tho
[02:37:58] well, I think I might actually just learn python from the ground up
[02:38:05] I got exposed to decorators yesterday. blew my mind.
[02:38:50] Ironholds: heh :D Decorators are fun :) I use them a lot
[02:39:05] especially @property, which does both decorator *and* __get__ trickery
[02:39:22] err
[02:39:25] __getattr__
[02:41:37] heh
[02:43:29] Ironholds: ruby actually has way more mind blowing dynamic capabilities
[02:43:37] * YuviPanda should check it out again at some point in the future
[02:43:48] ewww
[02:49:17] Ironholds: heh :)
[02:54:37] Ironholds: similar simpler trick to detect mobile edits?
[02:54:50] I guess I'll just have to LEFT JOIN on change_tag
[02:54:56] or tag_summary
[02:55:06] and then CASE WHEN on LIKE "%mobile%"
[02:55:24] LIKE %mobile% THEN 'mobile' ELSE 'desktop' CASE END or whatnot
[02:55:30] since if it's !mobile or NULL...desktop.
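The LEFT JOIN plus CASE WHEN idea for labelling mobile edits can be sketched as follows. This is an illustration under stated assumptions, not the query that was actually run: it uses an in-memory SQLite database, the tables `demo_revision` and `demo_tag_summary` and their rows are hypothetical, and the `ts_rev_id` / `ts_tags` column names are assumed for the tag table. The CASE expression closes with a plain `END`, and an untagged revision (NULL after the LEFT JOIN) falls through to 'desktop' because `NULL LIKE '%mobile%'` is not true.

```python
import sqlite3

# Toy in-memory tables; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_revision (rev_id INTEGER PRIMARY KEY);
    CREATE TABLE demo_tag_summary (ts_rev_id INTEGER, ts_tags TEXT);
    INSERT INTO demo_revision VALUES (10), (11), (12);
    INSERT INTO demo_tag_summary VALUES (10, 'mobile edit'), (11, 'visualeditor');
""")

# LEFT JOIN keeps revisions without any tag row (ts_tags is NULL for them).
# A NULL or non-matching tag string falls through to the ELSE branch.
platform_sql = """
    SELECT r.rev_id,
           CASE WHEN ts.ts_tags LIKE '%mobile%'
                THEN 'mobile'
                ELSE 'desktop'
           END AS platform
    FROM demo_revision AS r
    LEFT JOIN demo_tag_summary AS ts
        ON ts.ts_rev_id = r.rev_id
    ORDER BY r.rev_id
"""

# Each result row is a (rev_id, platform) pair, so dict() builds a lookup.
platform_by_rev = dict(conn.execute(platform_sql).fetchall())
print(platform_by_rev)
```

Revision 12 has no tag row at all and revision 11 has a non-mobile tag; both end up labelled 'desktop', which matches the "if it's !mobile or NULL...desktop" reasoning.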
[02:56:16] sigh
[02:59:30] works, still slowwwww
[03:02:42] so, writing png-versus-jpeg checkers for my image composition library
[03:02:50] realised functions are uncompiled/non-binary objects in R.
[03:02:59] realised I can just return the pertinent function under a new name and call it done
[03:04:54] Ironholds: so, I already have desktop and mobile edits per country, I guess I can just grab internet-accessible-population in the same thing and then start making graphs :)
[03:05:41] showoff ;p
[03:05:42] cool!
[03:07:38] Ironholds: :D
[03:07:43] Ironholds: can you point me to the ITU data?
[03:08:50] Ironholds: I'm also surprised by the magnitude by which mobile edits are smaller than desktop
[03:09:23] http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx have fun diving :D
[03:09:26] oh yeah, it's like 2%
[03:10:11] Ironholds: wait, data from 2005?!
[03:10:12] ugh
[03:11:35] oh, it goes up to 2012
[03:11:39] but fuck, stupid .xls
[03:14:12] :D
[03:14:17] you know, R can read those
[03:14:19] * Ironholds smiles sweetly
[03:14:22] well, so can python
[03:14:29] :P
[03:14:50] Ironholds: do you know you can run Python *in* Excel? ;)
[03:16:30] ...oh god WHY
[03:18:07] Ironholds: BECAUSE YOU CAN THAT'S WHY
[03:18:16] Ironholds: and do you know you do that by embedding Python in... VB.NET?
[03:21:08] ...
[03:21:16] you know I was having a nice evening until 5 minutes ago
[03:21:24] ooh, and I get a bloody lie-in tomorrow.
[03:21:26] that'll be nice
[03:21:29] heh
[03:27:03] Ironholds: I also realized I should be making all the graphs with ipython :D
[03:31:35] Ironholds: you might enjoy http://blog.yhathq.com/posts/ggplot-for-python.html
[06:34:17] err another api bug
[13:52:24] kree
[13:52:32] * Ironholds wanders in, zombie-like
[14:12:42] :-)
[14:40:07] phuedx: I need to add a user group system
[14:40:15] phuedx: so I can have a 'sudo'
[14:40:26] eh?
[14:40:28] example?
[14:41:22] phuedx: essentially lets me 'become' another user :)
[14:41:28] mostly to debug issues they might have
[14:41:52] why not fork?
[14:42:10] phuedx: see for example http://quarry.wmflabs.org/query/runs/all
[14:42:38] phuedx: two queries 'queued', way to fix is to hit submit again
[14:51:11] sudoers?
[14:51:25] rather than a full blown group system?
[14:51:33] phuedx: just as easy to implement, no?
[14:51:53] you must be faster than i am
[14:52:04] unless there's a flask user group system ;)
[14:52:07] actually, there has to be
[14:52:49] the other way of looking at this is that there's not a system which is checking these "queued" queries
[14:54:10] phuedx: yeaaaaah, I can add that to the queued system
[14:54:23] phuedx: or better, not have the app crash :|
[14:54:33] (the mysql server 'went away' again, even though I'm recycling them)
[15:00:50] :/
[15:13:22] phuedx: yea
[15:13:30] phuedx: I was considering moving to postgres for local server
[15:13:36] but that seemed insane
[15:15:36] anything in the logs?
[15:18:54] phuedx: nothing in mysql logs, no. sqlalchemy has a hack suggested
[15:19:06] oh?
[15:19:33] yeah
[15:19:38] to recycle queries
[15:39:36] hey YuviPanda. re my yesterday's message. :-)
[15:39:44] * YuviPanda waves at leila
[15:40:31] I read the etherpad, the project sounds interesting. I bet in the next 6-9 months, some breaking stories will come out of such a project.
[15:40:51] Do you and Ironholds want to do it? Is this in collaboration with somewhere else or internal?
[15:41:08] what project is this?
[15:41:13] oh, the DMZ thing.
[15:41:23] Actually I'm somewhat checked out of it at the moment. Too much to do :/.
[15:41:34] and on geodata requests I just got an email from Scott Hale asking if I'd like to collaborate on a paper.
[15:41:47] leila: so, I already have per country mobile / desktop data for wikis, and a scaffolding in place to write queries easily
[15:41:50] To go along with the collaboration with Han-Teng and the collaboration with Heather and Brent (people seem to REALLY LIKE GEODATA)
[15:42:24] leila, if you have some free cycles or it's interesting/you think you could do something with it, get involved! I'm not as around as I thought I would be :/
[15:42:34] so YuviPanda, what's the timeline you have in mind? I'm basically booked for the rest of this quarter. :-\ We can chat and see if this is something we want to kick start next quarter?
[15:43:04] cycles are non-existent for this quarter, Ironholds, as you know. ;-)
[15:43:21] that's true ;p
[15:43:26] well, so, my suggestion, as someone meddling, would be.
[15:43:36] how about yuvi builds a framework and investigates what seems interesting to him
[15:43:49] and then when you have some cycles you get involved and point at the things that would be interesting to others.
[15:43:59] oh, totally. we should not block him if he has cycles.
[15:44:02] this is basically the cycle I have with you/Dario/Aaron, as a fellow novice researcher, to work out how to steer my work.
[15:44:20] like, mobile sessions. I did what made sense to me and then Dario indicates what made sense to everyone else.
[15:44:27] YuviPanda, the main question I have for you: when do you want to jump in this?
[15:46:04] leila: last few days! I already have an underlying framework for running queries, and data about desktop and mobile edits per country on any wiki
[15:46:29] leila: my plan was to just keep writing things and blog interesting potential results as I go along (blog is CC BY)
[15:47:32] uhun. okay. I'll touch-base with you after September 25 then, to see where you're with it, and if there is space for me to contribute. Sorry that I can't do much now. The plate is full at the moment.
[15:48:04] leila: 'tis ok! I'll just email you links to blog posts as I make them! I'm quite enjoying things at the moment as well :)
[15:48:43] cool. that sounds good. thanks for sharing the etherpad. It's always nice to know what others are interested about and are working on. :-)
[15:48:56] leila: :D sweeet :)
[15:49:40] Ironholds, you are missed here! that's all I can say.
[15:50:48] awww
[15:50:55] k, I'm signing off to read. YuviPanda, we're reading http://www.tomkins-family.com/static/papers/src/WSF+13.pdf and will talk about it at noon. Since you have a lot of energy, I think you should be aware of it. :-)
[15:51:04] leila: done!
[15:51:09] hahaha
[15:51:14] leila: add me to any invites?
[15:51:34] leila, aww!
[15:52:22] k, I'll add you optional. Let me just tell you the idea: we read the paper and discuss, with the goal to understand if extensions of it can be applied to Wikipedia (or other projects). It's more of a brainstorming session if you will.
[15:52:58] leila: makes sense, yeah.
[15:53:06] leila: this is the first 'researchy' paper I'll be reading
[15:53:09] and obviously, don't feel obligated.
[15:53:11] also there's a cat on my laptop again
[15:53:18] leila: ;)
[15:53:32] hahaha! this is a good one. it's one of my favorites. it's very well written.
[15:53:38] hahaha. ;-)
[15:54:12] and if you get bored, skip sections 5-7.
[15:54:16] k, signing off for now.
[15:55:07] leila: cya
[16:02:13] that's an interesting paper
[18:50:44] ugggh so many requests :/
[18:50:58] okay. Finish external requests and fundraising requests today. Get on referer tracking tomorrow
[18:50:59] *nods*
[19:03:28] lzia: I'm coming to the hangout but mostly to listen
[19:06:26] Ironholds, are you joining?
[19:06:57] leila, too much work :/
[19:07:01] Sorry.
[19:07:16] np. we'll start then.
[19:41:39] phuedx: btw, user groups was deployed a little while ago, with support for 'sudo'
[20:01:22] leila: that was fun! do invite me to more of these :)
[20:59:49] YuviPanda: how'd you implement it?
[21:00:28] hey YuviPanda, it was good that you could join.
[21:00:38] That was a very productive meeting. :-)
[21:19:41] Ironholds, you remember the queries we talked about few nights ago?
[21:19:56] leila, yes!
[21:20:06] They are quite slow, which kind of makes sense since revision table is big, but how slow do you expect them to be?
[21:20:54] are you running it on analytics-store?
[21:21:06] if so, I am currently joining between four tables. Which may be a better explanation for the slowness ;p
[21:21:12] analytics-slave.eqiad?
[21:21:18] ahh, -slave!
[21:21:21] nevermind then.
[21:21:39] In that case, just slow. I don't really have an estimator for how long it would take, I'm afraid - not run it or things like it much before :/
[21:21:48] you might email halfak? He's the local SQL genius.
[21:26:48] k, Ironholds.
[21:44:19] J-Mo: Just wanted to say, I now have a 'sudo' right implemented (and granted to myself :P) so I can hit 'run' on other people's queries :) Let me know if you've ideas for other user rights
[21:44:34] cool!
[21:46:16] J-Mo: one potential one is to grant people more time to run queries
[21:46:29] yeah, I could see that being useful
[21:52:52] J-Mo: yeah, and hand that out not-very-liberally