[13:48:38] Hey tos3
[13:48:41] You still around?
[13:48:49] hey halfak
[13:48:53] Yes I'm here
[13:49:08] Sorry I missed you yesterday.
[13:49:19] No worries
[13:50:26] Did you switch over to Linux or is there HexChat for Windows?
[13:51:36] HexChat for Windows, though I finally got Ubuntu too
[13:52:01] Still figuring out what to do about my LAN port though, before I can use it well
[13:54:04] Is your network card not recognized in Ubuntu?
[14:08:56] halfak, my LAN port just isn't functioning, Ubuntu or Windows. While I try to get that fixed, I still have to figure out how to use WiFi on Ubuntu
[14:09:55] You can get an ethernet dongle for pretty cheap (USB to LAN). Every one that I have tried worked right away in Ubuntu.
[14:10:03] Any chance you have an Android phone?
[14:10:42] Right. I'll look into it. Also no, but borrowing one is easy
[14:16:47] I was going to say that, in a pinch, I'll often use my Android as a wireless adapter via tethering.
[14:17:09] The laptop thinks that a USB to LAN dongle is plugged in, but the phone is connected to WiFi.
[14:17:21] * halfak <3 agnostic internet technologies
[14:17:58] I think I don't understand. But more importantly, I think I don't think I'll understand
[14:22:37] You connect the phone to WiFi then plug it into your computer with a USB cable. Tell it to "tether". Receive intertubes.
[14:26:12] Ohk. Sounds complicated but I might give it a shot
[14:26:23] halfak, Btw I'm interested in the hackathon
[14:29:06] Awesome. If you don't come to Wikimania, where will you be in early Aug?
[14:29:29] I hope I don't have to, but it will be the beginning of college
[14:29:46] So I'll be back to my hostel room
[14:32:01] halfak, ^
[14:32:13] Would that make it easy to borrow the college's facilities to host a meetup & connect with us?
[14:33:47] And dc
[14:33:49] I last said " halfak, It will be like the last meetup.
But hopefully with better connectivity"
[14:35:02] +1 I'm going to try to figure out how to get some more presentations on ongoing projects and tools this time.
[14:36:38] halfak, +1. Will you be hosting it during Wikimania?
[14:37:50] tos2, yes. That's the plan. There's a hackathon for Wikimania in the two days before the main conference starts. That'll be Aug. 6-7.
[14:38:03] We'll be the research wing of this hackathon.
[14:38:13] ...if people are down with the idea.
[14:38:33] halfak, Right. Sounds like a plan. I think people would be more than game for it.
[14:39:10] I can already think of some ideas for on-venue research among Wikimania participants, if that's going to be any good
[14:41:20] Sounds like a good idea to me.
[14:41:39] Are you imagining surveying the crowd @ Wikimania?
[14:42:17] halfak, Not exactly, but something close
[14:44:14] OK. I've got some hand-coding problems that it would be nice to bring Wikipedians in on too. E.g. I'd like to extend my analysis of the quality of Wikipedian newcomers to present day and across non-English wikis. That's going to require a bunch of people to sit down with a hand-coding tool I built and help me sift through a random sample of newcomers.
[14:44:42] This one is really about manpower, but it also means that non-technical Wikipedians can help with data gathering/curation if they are so inclined.
[14:45:45] See https://blog.wikimedia.org/2012/03/27/analysis-of-the-quality-of-newcomers-in-wikipedia-over-time/ for results of my old analysis.
[14:45:57] halfak, I don't completely follow but let me check the blog
[14:47:59] halfak, I think I had already read it before. So what are you planning right now?
[14:48:24] I want more recent data for Enwiki and I want to look at other languages.
[14:52:48] halfak, I see. So we need editors to help categorize edits
[14:53:22] Yes. Wikipedians are the experts after all.
[14:53:49] Also, I want to have something available for people who show up and want to help, but don't have any other skills to bring to the table.
[14:54:28] halfak, A very valid point. So you intend to ask Wikimanians for help?
[14:54:40] Or just those online?
[14:54:52] Both.
[14:55:13] The hand-coding tool is a web UI, so it should be trivial to distribute.
[14:56:48] halfak, Right. Do you have a link?
[14:57:31] Oh man. It used to run on the toolserver. I don'
[14:57:39] t think I can show you the UI, but I can get a screenshot.
[14:58:10] https://meta.wikimedia.org/wiki/File:NQ_evaluation_tool.v1.png
[14:58:28] The UI shows you the edits a user made in their first edit session.
[14:58:48] I wanted a link so I could get a rough estimate of how many users anyone will categorize
[14:59:19] Oh! I set a minimum at 50.
[14:59:31] 50 is too high I think
[14:59:32] The last set had coders doing from 150 to 350 users.
[14:59:48] And did they actually categorise that many?
[14:59:52] It will take about an hour to do 50.
[15:00:13] Yup. In the AFT study, we had some users doing too many. We had to ask them to stop or they'd dominate the stats.
[15:00:50] Interesting. We could set the bar at exactly 50 and that would be perfect, though I think 30-40 will be better
[15:01:06] 50 is a minimum due to the stats.
[15:01:11] Oh
[15:01:23] See also a hand-coder I built for AFT https://en.wikipedia.org/wiki/Wikipedia:Article_Feedback_Tool/Version_5/Feedback_evaluation
[15:01:32] And how many users do you think need to be categorized to get good results out of it?
[15:02:47] 400 per year.
[15:03:17] So, if we wanted to look at ptwiki and we could only get a couple of hand coders, I'd recommend only looking at 2013.
[15:03:58] So we're looking at approx 30+ editors per wiki, right?
[15:05:02] Oh! Sorry. 400 per wiki-year. So each wiki would need to have 400 hand-coded just to get one year.
[15:05:20] Frequentist statistics... requires lots of observations.
[15:05:34] * tos2 knows
[15:06:00] 400 handcoded per wiki-year and 50-100 done by each editor
[15:06:29] So take 5-6 editors required to analyse each wiki-year
[15:06:40] Oh I see. Last time we managed with 5 hand coders.
[15:06:50] We coded 2100 newcomers.
[15:07:14] Hmm... Then you coded a lot more than 50-100
[15:08:10] I'm assuming worst case scenario, based on the chances that people would be too bored to hand code more than 100 (and we ordinarily wouldn't allow them to code more than 250/300)
[15:08:26] So 100 per editor gives us 4 editors required per wiki-year
[15:09:13] Given you're likely to get something like 30 editors for enwiki, that gives you 7-8 years
[15:09:43] For the other wikis, it will be more like 4 years
[15:10:31] 4 years does not sound that bad, but it depends on whether we are taking newcomers annually or quarterly
[15:11:42] Semesterly (once per 6 months) is what I've done before.
[15:12:19] 200 per semester
[15:13:37] halfak, Sounds good. I personally prefer 3 months though
[15:13:56] Under what reasoning?
[15:14:07] 6 months is a lot under wiki-time
[15:14:36] If we sample once every 3 months, we'll need 800 observations per year.
[15:14:42] I mean things can change pretty rapidly based on certain events, and so we'd be just taking the aggregate
[15:15:50] Yes. That or reduce the accuracy. How badly is p affected if you reduce from 200 to 100 observations?
[15:16:25] Let's say we'd like to be confident about changes around 5%.
[15:16:45] Using a chi^2 test, we can perform a power analysis to check how many observations we need.
[15:17:34] With 100 obs, we get p=0.39
[15:17:52] With 200 obs, we get p=0.1736
[15:18:15] With 400 obs, we get p=0.042
[15:18:47] So, I'm already stretching to deal with 200.
[15:19:02] So basically we cannot go with anything less than 250/300
[15:19:29] We can do 200. The thing that makes it valuable is the timeseries nature of it.
[15:19:31] What is the p for 300?
[15:19:35] Right.
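[Editor's note] The log does not say which inputs went into the chi^2 test for the p-values quoted in this exchange (p=0.39, 0.1736, 0.084 and 0.042 at 100, 200, 300 and 400 obs). One reading that appears to reproduce them, offered here only as an assumption, is a Yates-corrected chi^2 test on a 2x2 table comparing a 10% rate against a 15% rate (a 5-point change), with the same number of observations in each group. The function name below is hypothetical and this is a minimal sketch, not the analysis that was actually run.

```python
import math

def yates_chi2_p(x1, n1, x2, n2):
    """Two-sided p-value of a Yates-corrected chi^2 test on a 2x2 table.

    x1 of n1 observations are "successes" in group 1, x2 of n2 in group 2.
    """
    a, b = x1, n1 - x1          # group 1: successes, failures
    c, d = x2, n2 - x2          # group 2: successes, failures
    n = n1 + n2
    # Yates continuity correction: shrink |ad - bc| by n/2, floored at 0.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# ASSUMPTION: 10% vs 15% and equal group sizes are guesses, not from the log.
for n in (100, 200, 300, 400):
    p = yates_chi2_p(round(0.10 * n), n, round(0.15 * n), n)
    print(f"{n} obs per group: p = {p:.4f}")
```

Under these assumptions, a 5-point difference only drops below the conventional 0.05 threshold at around 400 observations per sample, which is consistent with the "400 per wiki-year" figure discussed earlier in the log.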
[15:19:57] With 300 obs, we get p=0.084
[15:20:12] 100 is too error-filled.
[15:20:53] Yes.
[15:21:30] If we had 100 obs, the 95% CI around 10% would be 7.1%-13.9%.
[15:21:45] Whoops. That was 300 obs.
[15:22:09] With 100, the 95% CI is 5.6% - 17.5%
[15:22:44] 100 seems too bad
[15:23:01] With 200, the 95% CI is 6.6% - 14.9%
[15:23:15] I think even 200 has quite a significant error, which will be a lot better with something like 260
[15:24:00] Meh. Look at the plots. I think that the trends remain pretty clear despite the width of the errorbars.
[15:24:27] See a better version of the plot here: https://meta.wikimedia.org/wiki/File:Desirable_newcomer_survival_over_time.png
[15:25:53] Interesting
[15:26:26] What's that dip in 2006-07?
[15:27:20] Huggle
[15:27:31] :\
[15:27:39] The reason I built Snuggle
[15:27:45] Right
[15:28:15] So what wikis are you planning to check?
[15:28:43] Good Q. Wherever there is interest (and language fluency), I think
[15:30:46] I see.
[15:35:32] halfak, So when are you planning to start getting editors for this?
[15:36:08] Still haven't scheduled the research hackathon yet. :P
[16:23:42] Ah, ok
[16:34:21] halfak, Did you see aaron's reply to the mail?
[16:36:13] Yup. I've been talking to him. I encouraged him to bring up his concerns to the larger group.
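[Editor's note] The 95% confidence intervals quoted above for an observed rate of 10% are asymmetric, which suggests something other than the plain normal approximation. The log does not name the method. As an assumption, the Wilson score interval is sketched below: it gives the quoted bounds for 200 and 300 obs after rounding, and lands within about 0.1 percentage points of the quoted bounds for 100 obs. The function name is hypothetical.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    p_hat: observed proportion, n: number of observations,
    z: normal quantile (1.96 for a two-sided 95% interval).
    """
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# ASSUMPTION: the Wilson method is a guess at what produced the quoted CIs.
for n in (100, 200, 300):
    lo, hi = wilson_interval(0.10, n)
    print(f"{n} obs: 95% CI = {lo:.1%} - {hi:.1%}")
```

The interval width shrinks roughly with the square root of the sample size, so going from 200 to 260 observations narrows the error bars only modestly, in line with the remark that the trends stay visible at 200.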