[01:29:24] you know, one big oversight in my session reconstruction algorithm:
[01:29:29] it can't handle multi-samples
[01:29:33] I should really work on that
[02:14:13] before i go do a bunch of work, can somebody point me to any work that tries to look at relative underproduction/overproduction in wikipedia across all articles?
[02:14:32] define under/overproduction?
[02:14:55] i mean, i'm happy to start with edits:views
[02:15:08] like, a version of this is common in some of the "implications of gender gap" stuff
[02:15:26] that shows that articles about women are, relative to how often they are viewed, edited less
[02:15:39] i can't actually think of a paper that does that, but i vaguely recall reading one
[02:16:21] I can't think of any off the top of my head, but as the Official Guardian of the New Pageviews Definition I am absolutely down to help out with dataset extraction/prep
[02:16:22] HaeB: maybe this is a question for you
[02:16:37] Ironholds: excellent!
[02:16:53] heh. this reminds me of a paper I keep wanting to write
[02:17:08] A few months back I worked out a way of geolocating IPs to extract timezone.
[02:17:16] so, we can localise server-side events
[02:17:26] it'd be interesting to see if there are consistently periods where editors are active but readers aren't.
[02:17:32] * Ironholds adds to already-tremendously-long-to-do-list
[02:19:38] Ironholds: that sounds awesome
[02:39:11] SWEET
[02:39:24] mobile editing sessions, even excluding single-edit sessions, are statistically significantly shorter than desktop ones
[06:01:09] mako: e.g.
i found this one (not peer reviewed) quite interesting: https://en.wikipedia.org/wiki/User:TCO/Improving_Wikipedia%27s_important_articles
[06:03:03] as for academic work, there's gorbatai's research which is mentioned there: http://www.opensym.org/ws2011/proceedings%253Ap205-gorbatai.html "Exploring underproduction in Wikipedia"
[06:08:07] i don't quite recall if anyone ever tried to find evidence for an effect of the editor gender gap on content by looking at views vs. edits ratios
[06:09:15] you may have been recalling the 2011 riedl et al paper which did that by comparing movie articles to their popularity (and gender affinity) on movielens: http://files.grouplens.org/papers/wp-gender-wikisym2011.pdf
[06:11:56] ...that's still (somewhat surprisingly) among the clearest evidence. there's lots of anecdata - friendship bracelets vs. baseball cards, sex and the city vs. "24", etc. - but not a lot of systematic studies
[06:12:22] joseph reagle did one a while ago comparing women's biographies on britannica and wikipedia
[06:39:22] HaeB: and who won?
[06:41:14] PS (mako): i can always recommend using https://meta.wikimedia.org/wiki/Research:Newsletter#Search_the_WRN_archives for that kind of question
[06:42:33] Emufarmers: https://meta.wikimedia.org/wiki/Research:Newsletter/2011/September#In_brief
[06:58:45] HaeB: I see, thanks
[14:37:21] Ironholds: wanna talk after engineering standup in a bit?
[15:59:21] Ironholds: Ironholds hiiii
[15:59:27] i'm only working a half day today
[15:59:33] so if you wanna talk, let's do it sooon!
[16:43:42] morning
[16:43:51] ottomata, darnit! Okay, shall we wait until everyone is back?
[16:44:25] oh?
[16:44:35] you want to talk about how to write mapreduce jobs, ja?
[16:44:44] totally! but I don't know how much time you have :)
[16:44:52] oh i got time, i'm just doing whatever today
[16:44:52] * Ironholds was unconscious until 30 minutes ago. Had insomnia last night.
[16:44:53] let's talk!
[16:45:03] oh, need power, one sec...
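The edits:views idea raised earlier in the log can be sketched roughly as below. This is only a minimal illustration, not anything the participants wrote: the function names `edits_per_view` and `rank_underproduced`, the article titles, and all the counts are made up, and a real analysis would need actual pageview and revision data plus some normalisation.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: article title -> (edit count, view count).
Counts = Dict[str, Tuple[int, int]]


def edits_per_view(counts: Counts) -> Dict[str, float]:
    """Return edits / views for every article with at least one view."""
    return {
        title: edits / views
        for title, (edits, views) in counts.items()
        if views > 0  # skip unviewed articles to avoid dividing by zero
    }


def rank_underproduced(counts: Counts) -> List[str]:
    """Order titles from lowest to highest edits-per-view ratio.

    A low ratio is read here as a crude signal of underproduction:
    the article is viewed a lot relative to how often it is edited.
    """
    ratios = edits_per_view(counts)
    return sorted(ratios, key=ratios.get)


# Toy numbers, invented purely for illustration.
toy_counts: Counts = {
    "Article A": (50, 1000),
    "Article B": (5, 2000),
    "Article C": (30, 300),
}

print(rank_underproduced(toy_counts))
```

With these toy numbers, "Article B" has the lowest ratio and would be flagged first. A raw ratio like this ignores article age, length, and quality, so it is only a starting point.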
[16:45:04] sure! Send me an invite and I'll try to sound awake
[16:47:12] Ironholds: batcave is free
[16:47:14] https://plus.google.com/hangouts/_/wikimedia.org/a-batcave?authuser=1
[16:48:26] "trying to join the call. Please wait."
[16:51:08] hm
[16:52:13] hurm. Not working? :(
[16:55:19] i'm there, you are not?
[16:55:36] Ironholds: maybe the authuser thing is throwing you off
[16:55:37] https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
[16:55:47] oh, wait
[16:55:52] it's trying to make me join..yeah
[17:33:37] Are we doing standup as Oliver requested?
[17:40:27] +Ironholds
[17:40:36] ggellerman_, oop! Sorry!
[17:40:38] will be in in 30s
[18:46:47] HaeB: ok good. i knew about the ones you mentioned
[18:47:29] HaeB: andreea gorbatai's work (and conversations with her) is sort of what has inspired my thinking on this. she's the only person i know who talks about this in terms of "overproduction" or "underproduction"
[18:48:14] HaeB: but i think the concept is much broader, but it's sort of hard to think of all the ways people might talk about it in order to make a comprehensive search :-/
[18:48:33] HaeB: I hadn't seen the User:TCO thing. that's great
[19:09:46] YuviPanda, recommendations for a Java IDE?
[19:13:05] Ironholds: other than eclipse?
[19:13:15] other than eclipse ;p
[19:16:29] grr. fine, I'll use eclipse
[19:16:35] all the other options are more terrible
[19:22:28] Ironholds: don't use eclipse.
IntelliJ
[19:22:34] Entire android team uses it
[19:22:43] yes mum
[19:22:59] I see that adding JAR dependencies is just as much of a PITA as adding C++ libs
[19:23:11] * Ironholds once again wishes everything just had an RStudio version
[19:23:43] and lots of others use other variants of intellij
[19:23:50] Like phpstorm and pycharm
[19:25:18] yeah, I tried pycharm and gave up on python
[19:25:24] RStudio has spoiled me :(
[19:28:09] Ironholds: for data analysis in python, spyder is a much better option
[19:28:46] I don't do data analysis in python ;p
[19:28:48] but thank you
[19:28:52] My stack is R, C++ and Java
[19:28:56] suits me down to the ground
[19:28:56] then what are you using R for? :P
[19:29:18] data analysis! There's just nothing I'd want to analyse where I've gone "R does not have this feature and I cannot build it, let's use python"
[19:29:33] fair enough
[19:29:43] well, actually, a couple of times
[19:29:58] my response was to build it in C++, integrate with R, and end up running faster than Python
[19:30:01] much to halfak's dismay
[19:30:27] hey, if it gets the job done
[19:30:49] anyway, if you ever wander back into python land: spyder if you want something rstudio-like, or ipython notebook if you want something mathematica-like
[19:35:41] noted!
[19:35:50] I'm happy with learning two langs for the moment :)
[19:36:04] but I will race my code against Python's any day of the week ;)
[19:46:33] Ironholds: *shrug* it's more important you can prototype something quickly than to have it run in 20 seconds instead of a minute
[19:46:41] Ironholds: and if R is the language where you can do that, use R! ;-)
[19:46:58] Eh, I opt for the latter. But agreed overall!
[19:48:26] tnegrin, wanna see something cool?
[19:48:27] http://ironholds.org/woah.html
[19:48:38] valhallasw`cloud: in general, i agree with you. but data analysis (esp on wikipedia data) is one place where i actually find myself often carrying about performance.
because the difference is often DAYS :)
[19:48:40] network graph of random links
[19:48:48] the reason this is awesome: that's D3.
[19:48:50] often caring even
[19:48:52] I generated it entirely from within R.
[19:49:21] Ironholds: what are you using to do D3 in R?
[19:49:41] http://christophergandrud.github.io/networkD3/ although Hadley is working on a lot of tools (ggvis, shiny) for similar things
[19:50:09] networkD3 is network graphs specifically, but there are a load of plugins for other elements - and because of how RStudio is constructed (it has a javascript/html frontend) you can manipulate and test them from inside your IDE, as you would a static, ggplot2 graph
[19:51:09] yeah, i don't use rstudio. i came to R before it existed and have never managed to tear myself away from Emacs/ESS even though there are some very cool features
[19:53:03] fair!
[23:28:18] well, this place is dull today