[13:31:56] dennyvrandecic_, I just reviewed that bit of the paper you linked.
[13:32:19] I think the count is reasonable based on how they defined bursts of editing.
[13:32:34] However, assuming that all people with one "burst" are bots is dumb.
[14:46:27] halfak: thanks. I thought I had misread that, but yeah, if that is the assumption it explains the high number. and it is dumb
[14:46:43] Emufarmers: it seems so
[14:47:53] YuviPanda: can't the datasource from stats.grok.se be made queryable? (probably not or it would have been done already)
[15:04:15] dennyvrandecic_: it's not mysql afaik
[15:04:24] I also don't really know who maintains it
[15:12:41] YuviPanda: the raw data is here: https://dumps.wikimedia.org/other/pagecounts-raw/
[15:12:50] but yeah, it is not mysql
[15:13:12] dennyvrandecic_: yeah, but it can't be mysql. I did some back-of-the-envelope calculations and it won't really fit into a mysql instance
[15:13:24] thanks, that's good to know :)
[15:13:42] of course, if we had a public hadoop instance with a hive interface, I could easily connect quarry to that
[15:14:01] dennyvrandecic_: I was considering adding a postgres/OSM interface, tho. That would be easy to do, unsure if there is any value in it
[15:20:15] I'd have no idea :) Probably there is, but I'd wait for a use case, YuviPanda
[16:33:18] Ironholds, halfak: standup
[16:36:23] Stuck in growth standup. Will come asap
[16:36:41] halfak: ok np
[17:01:41] hey ewulczyn_, I jumped into the next meeting. I should be done in 30 minutes.
[17:02:56] n
[17:03:03] #wrongwindow
[17:06:27] ewulczyn_, poke
[17:40:40] halfak, J-Mo: interesting, it’s really unclear to me how to make sense of the ACM policy; the excerpt halfak quoted comes from the same page where I got mine
[17:41:07] I think you can post the pre-peer-review version of a paper on arXiv.
[17:41:51] But the post-peer-review version can only go to your website or your institution's website.
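[Editor's note: the pagecounts-raw files discussed above are hourly, space-separated dumps whose lines have the shape `project page_title view_count response_bytes`, per the dumps page. A minimal sketch of tallying views for one project code; the function name and file path are illustrative, not part of any real tooling:]

```python
import gzip
from collections import Counter

def top_pages(path, project="en", n=10):
    """Tally view counts per page for one project code from an hourly
    pagecounts-raw file (space-separated: project title count bytes)."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != 4:
                continue  # skip malformed lines; the collector is 'dumb'
            proj, title, views, _bytes = parts
            if proj == project:
                counts[title] += int(views)
    return counts.most_common(n)
```

[Scanning one hour this way is cheap, but a year of hourly files is what makes a single mysql instance impractical, hence the hadoop/hive wish above.]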
[17:41:53] yes, it’s not clear whether that includes revisions after the first round of peer review
[17:42:18] technically, copyright transfer only happens with the camera-ready submission
[17:42:42] Good point.
[17:42:52] Regardless, I don't think anyone will give you a hard time.
[17:43:05] Also, our "institutional repository" is Commons.
[17:43:08] So there's that.
[17:43:12] >:)
[17:43:16] * halfak --> lunch
[17:43:39] yes, but since I’m a publicity chair I should try and double-check before doing something that would be perceived as borderline by the organizers
[17:43:52] alright, thanks for the input folks
[17:59:34] DarTar, can you do me a favour?
[17:59:41] what’s up
[17:59:52] When you double-check, if you find out that Commons is not an appropriate institutional website, can you forget to tell me? :P
[18:00:22] sure, I just dropped a line to the CSCW chair for clarification
[18:01:31] the only issue with Commons is the licensing restriction
[18:02:22] indeed :(
[18:02:57] yay, Brent sent the datas-wtf
[18:03:09] it's got full names for countries and it's a CSV. Why do people do the things that hurt me?
[18:06:50] Ironholds: as in, there are countries with commas in their full name that break parsing?
[18:07:02] oh, no idea. CSVs just make me sad.
[18:07:25] because if there /might/ be commas you have to quote, and quoting makes firing it into a db sad.
[18:07:29] and then everyone is sad :(
[18:07:49] tell Brent and spread some tab love
[18:08:12] naw, I feel bad. He's got an undergrad doing it; I can work with CSV and so I will :)
[18:08:28] guy is probably paying too much for tuition to spend all his time dealing with exceedingly pedantic requests from some dude he doesn't know.
[21:20:43] for the projectcount files in http://dumps.wikimedia.org/other/pagecounts-raw/ , is there an explanation of what the projects are?
[21:21:06] i have 1464 projects in there, and some of them I don't understand
[21:21:18] dennyvrandecic_, example?
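[Editor's note: the CSV worry above (country names like "Korea, Republic of" containing commas) is exactly what CSV quoting handles, and re-emitting as tab-separated is the "tab love" alternative. A small sketch with Python's stdlib csv module; the sample data is invented for illustration:]

```python
import csv
import io

# A quoted field keeps its embedded comma intact when parsed with a
# real CSV parser instead of a naive split(",").
raw = 'country,views\n"Korea, Republic of",1234\n'
rows = list(csv.reader(io.StringIO(raw)))
assert rows[1] == ["Korea, Republic of", "1234"]

# Re-emitting as TSV sidesteps quoting entirely, which makes bulk
# loading into a database less painful (commas no longer need escaping).
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
assert out.getvalue() == "country\tviews\nKorea, Republic of\t1234\n"
```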
[21:21:34] en.mw
[21:21:48] www
[21:22:20] most of them i get. bs is the bswp, bs.b is bs books, etc
[21:22:30] en.mw is en.m.wikipedia
[21:22:35] !
[21:22:43] err
[21:22:47] that page you linked
[21:22:57] scroll down ;p
[21:23:20] oh
[21:23:23] sorry
[21:23:46] np!
[21:24:01] i am not sure i get the difference between commons and commons.w
[21:24:10] but i guess i can throw this together
[21:24:13] I imagine erikZ would know more about it if you have specific questions; he runs webstatscollector
[21:24:24] thx
[21:24:29] I think there are going to be some idiosyncrasies because it's 'dumb' software, in the sense that it just takes what it's given
[21:24:45] which is necessary because wow volume. Although we're hoping to make it smart now that we have hadoop.
[21:25:11] yay smart
[21:25:26] just kidding. dumb is good for such stuff
[21:25:38] yeah, most of the time
[21:25:46] until you're the schmuck who has to parse the results and make them consistent
[21:25:59] :) at least we have something to play with
[21:26:04] and then you're confronted with all of the user idiosyncrasies from upstream and none of the context because of privacy :D
[21:26:06] yup!
[21:26:28] if it is smart and down, then we schmucks have nothing to do but bother the other schmucks running the smart tool
[21:26:54] that's true!
[21:29:59] i imagine the pageview data has been basically untouched since 2007, running since then with hourly outputs and not many errors
[21:30:13] it probably is not like that, but I think it is not too far away
[21:32:14] dennyvrandecic_, eh, you're more right than you are wrong, yeah.
[21:32:38] cool
[21:32:59] i wish i wrote my services like that
[21:33:21] due to me changing institutions all the time, usually the hardware underneath my services disappears at some point :P
[21:41:37] https://docs.google.com/a/google.com/spreadsheets/d/1SR_qjHxy8xZ3mfaeU2ypnQOd7i5Sw37T2lE3ljvngQE/pubchart?oid=1261094636&format=interactive
[22:40:37] aaaaah
[22:40:40] DarTar!
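[Editor's note: the project codes puzzled over above follow the suffix conventions listed on the pagecounts-raw dumps page — no suffix means Wikipedia, `.b` wikibooks, `.mw` mobile Wikipedia, and so on. A sketch of expanding a code; the function name is illustrative, and oddball entries like `www` fall through the table unchanged:]

```python
# Suffix abbreviations used by webstatscollector in projectcount files,
# as listed on the pagecounts-raw dumps page.
SUFFIXES = {
    "b": "wikibooks",
    "d": "wiktionary",
    "m": "wikimedia",
    "mw": "wikipedia (mobile)",
    "n": "wikinews",
    "q": "wikiquote",
    "s": "wikisource",
    "v": "wikiversity",
    "w": "mediawiki",
}

def describe_project(code):
    """Expand a projectcount code like 'en.mw' or 'bs.b' into a readable
    site name; a bare code such as 'bs' means that language's Wikipedia."""
    lang, _, suffix = code.partition(".")
    if not suffix:
        return f"{lang} wikipedia"
    return f"{lang} {SUFFIXES.get(suffix, suffix)}"
```

[So `en.mw` expands to the English mobile Wikipedia, matching the explanation in the chat, and `bs.b` to Bosnian Wikibooks.]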
[22:40:46] The /bloody referer code ran/
[22:42:37] \o/
[22:44:26] okay, I need to rerun 10 days, but still
[22:44:34] we have 492 days of pageviews and referer breakdowns.
[22:44:35] BOOYA.
[22:44:47] and for the love of god, if you look through and tell me there's an element missing, I will end you ;p
[22:45:04] we don’t need those 10 days
[22:45:18] did you look up webalyzer?
[22:46:05] not yet. and too bad, I'm getting them anyway.
[22:46:14] Because it's an excuse to rewrite part of WMUtils before I show it to everyone
[22:46:24] this will also allow me to produce something, you know, robust, for the apps pageviews stuff
[22:47:06] hey – I’m disappearing to talk to Maryana
[22:51:38] have fun!
[22:51:48] DarTar: I have the basics up for https://meta.wikimedia.org/wiki/Research:Daily_unique_editors
[22:51:56] (^ for when you get back)
[22:52:07] <3
[22:52:21] I'll get Daily unique anonymous editors before you get in on Monday
[23:00:47] halfak: you’re the man
[23:01:13] \o/