[07:14:02] not sure if there's anyone looking but https://gist.github.com/anonymous/de2a4b11f395e20faec9
[07:16:54] joal: ^ if you're here
[09:56:06] Thanks YuviPanda
[09:56:09] will read
[14:39:08] YuviPanda, \o/
[14:39:18] Cool to see work towards building the index.
[15:10:52] o/
[15:11:01] I need to stop trying to do work in the morning before noon.
[15:11:10] Just reading my email takes hours every day :(
[15:11:13] * halfak ignores
[15:11:19] sorry people who need to talk to me.
[15:11:30] I need to do things!
[17:36:15] o/
[17:36:41] I was almost in time for the research group meeting, despite the late bus and the traffic, and I see it's been canceled due to an empty agenda :)
[17:37:02] Sorry. Should have emailed sooner.
[17:37:11] I couldn't think of anything good to toss on there for today.
[17:45:29] * halfak creates a google-translation of Italian to encourage some Italian collaborators to work with us.
[17:45:34] mwahahaha
[17:46:04] No worries at all! I wasn't complaining. Now I have time to get tea (and breakfast-pizza)
[17:46:13] "But the translation is so awful" "Yes it is! Here's what you need to change to make it better."
[17:46:23] * halfak wants some breakfast pizza.
[17:46:29] [[Cunningham's law]]
[17:46:40] It's left over from yesterday's strategy editathon
[17:47:07] Yeah.. Had to miss that. Jenny just passed her prelims for grad school :)
[17:47:13] Had to go celebrate
[17:48:08] I don't know what it means, but congratulations anyway!
[17:48:28] really just a milestone on the way to her PhD.
[17:48:41] great!
[17:48:44] :)
[17:48:53] helllo! https://gist.github.com/anonymous/de2a4b11f395e20faec9 :D :D :D
[17:49:08] o/ YuviPanda
[17:49:12] I'll do more on that today and also work on our own public interface
[17:49:13] hello halfak
[17:49:51] YuviPanda, I have a couple of scripts that I'd like to run against the XML dumps and produce a dataset that someone could download.
[17:50:02] E.g. https://figshare.com/articles/Wikipedia_Scholarly_Article_Citations/1299540
[17:50:07] \o/ we can make them into notebooks I think
[17:50:13] Should be pretty simple.
[17:50:20] * YuviPanda nods
[17:50:23] how long do they take?
[17:50:28] 24h or so
[17:50:32] I haven't even started working on the cron interface
[17:50:34] right.
[17:50:36] On a 16 core machine
[17:50:40] :S
[17:50:41] ah
[17:50:45] well on a labs one it might take far longer
[17:50:57] but that's ok since we don't get dumps more frequently than once a month
[17:51:05] Even if it took 16 days, that'd be useful.
[17:51:08] indeed
[17:51:16] * halfak runs away to make lunch
[17:51:18] ok
[17:51:20] back in a bit
[17:51:23] I should run away for breakfast
[17:51:27] But: EXCITEMENT!
[17:51:31] wooo!
[17:51:31] :)
[17:51:37] guillom: did you see https://gist.github.com/anonymous/de2a4b11f395e20faec9
[17:51:51] * guillom clicks.
[20:34:19] halfak: https://phabricator.wikimedia.org/T121797 may be of interest to you, i mentioned this idea briefly yesterday
[20:34:40] * halfak clicks
[20:34:50] halfak: also, the cross-wiki upload A/B test is running full steam now :)
[20:34:57] Woot! :)
[20:35:11] (it got translated to 9 languages, too)
[20:35:29] BTW, I'm going to be a little bit skeptical of any results that are not both full steam and consisting of full weeks.
[20:35:47] Since there are periodic patterns in weekdays and weekends.
[20:36:06] it's on all english wikis (en.wp being responsible for ~40% of uploads so far), other wikis include a few big wikipedias (in total, about ~20% of uploads so far)
[20:36:33] Any qualitative results? Is commons revolting?
[20:37:13] well, the upload tool was already enabled for a while, so the revolting already happened :P
[20:37:38] people seemed to like that we're doing the test, and seemed to like the interface options
[20:38:37] yeah, it'll be on for at least a week. if by then we notice something obvious, then i'd want to finish the test and just enable the obviously best option. i'm not sure if we get enough uploads to get significant results in a week, though
[20:38:58] ('significant' in the non-maths meaning)
[20:39:19] MatmaRex, +1 for going with any obvious option.
[20:39:24] since the new interfaces were kind of designed to reduce the number of uploads ;)
[20:39:27] Statistics is really only useful when there is a question.
[20:39:35] But be cautious of outliers
[20:39:46] e.g. individuals who manage to get a mass of good or bad uploads.
[20:40:12] this tool would be pretty inconvenient for that, but yeah, good point.
[20:40:33] recently i've been looking only at uploads which were the very first edit that person made to commons
[20:40:59] which gave a lot saner results than just looking at uploads per tool
[20:41:53] (in particular, apparently "experienced users" - 10+ edits on commons - have a deletion rate of like 50% when using cross-wiki upload, which i didn't manage to explain so far)
[20:42:14] (my working theory is that i messed up the script which generated that data :P)
[20:52:26] MatmaRex, yeah. that sounds surprising.
[20:52:35] Are you querying the DB to work that out?
[20:53:28] halfak: do you have any fun / interesting questions i can ask of my teahouse dataset?
[20:53:49] I'm going to extract the conversations into hierarchical objects as well
[20:54:13] I'm guessing you haven't tried splitting out replies yet.
[20:54:41] I'd like to see a histogram of replies. I think a histogram of the char length of each discussion would be interesting.
[20:54:51] It would be cool to see how that differs by time of day and day of week.
[20:54:53] halfak: I've actually split out the replies
[20:54:59] just haven't hierarchized them yet
[20:55:01] Maybe there were some days that were poorly covered.
[20:55:06] ooo yes
[20:55:14] I can easily associate dates with them
[20:55:14] Do you split on signature-like-things?
[20:55:27] I'm splitting on the UTC timestamp preceded by a user talk wikilink
[20:55:42] or rather, any text object that contains that UTC formatted timestamp
[20:55:54] preceded by a wikilink object that is to the user talk namespace
[20:56:01] halfak: no, i was querying api.php with some scripts
[20:57:01] YuviPanda: depending on how correct you want it to be… people somewhat often link to user page / user talk / contributions (and when they do, it's usually in this order)
[20:57:15] (in their signatures)
[20:57:44] oh, hmm. people can put other wikilinks after that
[20:58:00] I guess maybe I should just split on the UTC formatted string then, maybe at the end of a line?
[20:58:03] or just on the UTC formatted string
[20:58:07] and consider it 'good enough'
[20:59:31] halfak: heh, someone just linked me to https://commons.wikimedia.org/wiki/Special:Contributions/Theodossi , which proves your earlier point about a single user skewing the results
[20:59:50] (these are older uploads, not during the test)
[21:00:16] * halfak whistles
[21:00:19] lots of uploads!
[21:00:45] bah
[21:00:47] can't set topic
[21:01:13] /topic Welcome to the Wikimedia Research channel -- a space for discussion of wiki research | https://meta.wikimedia.org/wiki/Research - https://www.mediawiki.org/wiki/Analytics/Research_and_Data | Channel is publicly logged @ http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-research | Sciencing the shit outta things
[21:01:33] anyway, time to go to office in time for 2:30 meeting with halfak
[21:01:42] Might get in trouble with that one. ;)
[21:02:06] Otherwise I'd set it.
[21:02:11] o/ see you soon YuviPanda
[21:02:11] yeah
[21:02:24] "see" == type at you more
[21:02:28] I feel much more comfortable putting my name on it than asking other people to put their name on it :P
[21:02:30] heh yeah
[21:02:38] halfak: are we just gonna switch machines or are we gonna switch config setup too?
[21:02:54] I'm just about ready for config setup.
[21:03:17] I'm ok if you wanna let it marinate and test on staging slowly, and am ok if you just wanna do it too. either's ok
[21:03:19] Was just about to run tests when I realized I needed to work on that Wiki labels form.
[21:03:21] decide in 1h30min!
[21:03:25] Will do
[21:03:28] * YuviPanda goes nowwwwwww
[21:03:34] it's just so hard to close this laptop
[21:03:41] it's interesting how much immersion this small screen offers
[21:03:53] I keep getting surprised 'shit, stuff exists outside this rectangle! wow' all the time
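
Note on the 20:54-20:58 Teahouse thread: a minimal sketch of the "just split on the UTC formatted string" idea, plus the reply counts by weekday and hour that were asked for. It assumes default English-language signature timestamps such as `20:55, 17 December 2015 (UTC)`. The function names and the sample text are hypothetical, and this is not the script actually run against the Teahouse dataset.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: default English MediaWiki signature timestamp,
# e.g. "20:55, 17 December 2015 (UTC)". Custom signatures can put
# any wikilinks before it, so only the timestamp itself is matched.
SIG_TS = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def split_comments(wikitext):
    """Return (comment_text, timestamp) pairs, one per signature timestamp.

    Each comment runs from the end of the previous timestamp to the end
    of its own. Unsigned text after the last timestamp is dropped.
    """
    comments = []
    start = 0
    for match in SIG_TS.finditer(wikitext):
        try:
            # %B expects English month names (C/English locale).
            ts = datetime.strptime(match.group(0), "%H:%M, %d %B %Y (UTC)")
        except ValueError:
            continue  # looked like a timestamp but is not a valid date
        comments.append((wikitext[start:match.end()].strip(), ts))
        start = match.end()
    return comments


def replies_by_weekday(comments):
    """Count comments per weekday name."""
    return Counter(DAYS[ts.weekday()] for _, ts in comments)


def replies_by_hour(comments):
    """Count comments per UTC hour of day."""
    return Counter(ts.hour for _, ts in comments)


# Hypothetical sample discussion, not taken from the real dataset.
sample = (
    "How do I add a reference? [[User:A|A]] ([[User talk:A|talk]]) "
    "20:55, 17 December 2015 (UTC)\n"
    ":Use the cite button. [[User talk:B|B]] 21:03, 17 December 2015 (UTC)\n"
    "::Thanks! [[User:A|A]] 09:10, 19 December 2015 (UTC)\n"
)

if __name__ == "__main__":
    parsed = split_comments(sample)
    print(replies_by_weekday(parsed))
    print(replies_by_hour(parsed))
    print([len(text) for text, _ in parsed])  # char length per comment
```

Splitting on the timestamp alone sidesteps the problem raised at 20:57 (signatures linking to user page / user talk / contributions in varying order), at the cost of also splitting on any timestamp that someone quotes inside a comment.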