[14:30:58] !log shutdown travis-ci on 'revscoring': never worked -- always erroring.
[15:13:09] halfak: hey, I was looking for you
[15:13:12] Around
[15:13:17] Hey Amir1!
[15:13:24] I just got back into town late last night.
[15:13:29] This morning I'm catching up.
[15:13:56] cool, first of all, the meeting
[15:14:01] it's 10:30 PM for me
[15:14:43] It's impossible for me to participate
[15:14:44] :( Kinda late. It used to be earlier, but ToAruShiroiNeko pushed back.
[15:14:44] We could move to tomorrow.
[15:14:45] I think that we're pushing up against ToAruShiroiNeko's work schedule.
[15:14:49] depends on other people too
[15:15:07] Let's plan to do a make-up round tomorrow -- just you and me.
[15:15:14] 1300 UTC OK?
[15:19:20] checking
[15:19:38] that's great
[15:19:41] let's do it
[15:23:07] secondly halfak, I copied Stefan's features to my drive and checked them so I can implement them
[15:23:22] some of their features are amazing
[15:23:40] e.g. one feature is the value of the "instance of" property
[15:23:49] as an integer!
[15:24:10] if it's human the value would be 5 (because we have Q5)
[15:24:19] doesn't matter
[15:38:55] Bah. Missed the pings.
[15:38:59] * halfak reads scrollback
[15:41:41] Amir1, wut
[15:41:48] I wonder if that would have any predictive value
[15:41:55] it doesn't
[15:41:55] I think they're just trying things at that point.
[15:42:18] if you know even a little bit about AI you wouldn't suggest such a thing
[15:42:22] especially as an integer
[15:43:47] +1
[15:51:02] anyway, I asked some of the vandal fighters in Wikidata to tell me what kind of vandalism is common in wikidata
[15:51:04] I got some responses
[15:52:25] Cool!
[16:16:36] halfak: btw. Can we continue working on clustering
[16:16:43] the result was really promising
[16:17:07] Amir1, indeed. I saw that you said the ArbCom elections were going to use clustering now too. Wat!?
[16:23:06] yeah, ArbCom election in fa.wp was always controversial because we had two kinds of candidates: not serious ones (e.g. 5 support, 1 oppose) and big ones (e.g. 50 support, 20 oppose)
[16:23:22] but if you go by percentage, the first one comes out higher
[16:24:47] so 'crats (including me) decided to have a minimum number of votes to consider someone successful, but the number was always a big discussion, I remember huge fights even inside 'crats
[16:27:50] we had a case that someone was elected because he had one more oppose than someone else
[16:28:37] How about confidence intervals
[16:28:39] :)
[16:29:06] We've solved some interesting problems around low observation probabilities in statistics with the beta function.
[16:29:24] that's an idea too
[16:29:30] http://www-users.cs.umn.edu/~halfak/etc/I_heart_Beta/
[16:29:47] interesting
[16:29:47] I'm thinking that you could generate a beta and use the lower 95% to judge
[16:30:04] Or proportion of the distribution above a certain percentage.
[16:30:42] I will read and try to use it
[16:32:16] Can you send me features to work with in clustering halfak =?
[16:32:16] *?
[16:33:08] 1- We need to cluster not-reverted edits to get edit types
[16:33:13] Amir1, I can but I'm unlikely to get to it today.
[16:33:22] one thing
[16:33:35] nvm
[16:33:41] I remembered
[16:33:45] kk
[16:33:47] I have memory of Dori
[16:33:56] * halfak is still taming the email wave.
[16:34:03] Dori the cloned sheep?
[16:34:13] no, Dori in Finding Nemo
[16:34:21] https://en.wikipedia.org/wiki/Dolly_%28sheep%29
[16:34:27] I guess it's Dolly after all.
[16:34:30] Also Lol :)
[16:34:34] halfak: just a status-update-drop - I've been busy with outage fixing and fire-fighting again (another outage on Sunday) and we're even more short handed than usual (Coren has been on medical leave for two weeks, possibly one more week), so haven't had much time to catch up.
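A minimal sketch of the beta idea floated at 16:29, applied to the two example tallies from 16:23. It assumes a uniform prior, so a tally becomes Beta(support + 1, oppose + 1), and it reads "the lower 95%" as the lower end of a central 95% interval; both are assumptions, not something the log pins down. The function name is made up for illustration, and quantiles are estimated by sampling with the standard library rather than computed exactly.

```python
import random


def beta_support_summary(support, oppose, threshold=0.5, n_samples=20000, seed=0):
    """Summarise a vote tally with a Beta(support + 1, oppose + 1) posterior.

    Returns (lower_bound, share_above_threshold):
    - lower_bound: estimated 2.5% quantile of the support rate
    - share_above_threshold: estimated share of the distribution above `threshold`
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    samples = sorted(
        rng.betavariate(support + 1, oppose + 1) for _ in range(n_samples)
    )
    lower_bound = samples[int(0.025 * n_samples)]
    share_above = sum(s > threshold for s in samples) / n_samples
    return lower_bound, share_above


small = beta_support_summary(5, 1)    # "not serious" candidate: 5 support, 1 oppose
big = beta_support_summary(50, 20)    # "big" candidate: 50 support, 20 oppose

print("raw percentage:", 5 / 6, "vs", 50 / 70)
print("lower bound:   ", small[0], "vs", big[0])
```

On raw percentage the small candidate ranks first; on the lower bound the ordering flips, because five votes leave far more uncertainty about the true support rate than seventy do.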
awight seems to have finished up most of our packaging stuff, however :) I think
[16:34:34] next thing is to get Extension:ORES done (/me looks at legoktm)
[16:34:57] YuviPanda, thanks.
[16:35:06] :)
[16:35:10] Awesome YuviPanda
[16:35:19] halfak: so in practice I haven't had to do much. Also happy to report that the Sunday NFS outage didn't touch ORES nor wikilabels at all :)
[16:35:35] I just got a logging patch merged to revscoring that completes what I think is "basic logging".
[16:35:40] awesome
[16:35:47] \o/ for no NFS issues.
[16:35:57] I think that statsd is the last production thing on my plate.
[16:36:03] yeah
[16:36:18] I wants some statsd in the ORES service and some in precached.
[16:36:27] We probably want to turn precached into a daemon first.
[16:38:05] halfak: yeah, I committed a puppet role for it but I don't know if we can actually deploy anything right now because of conflicts in requirements.txt
[16:38:14] (at least that was the status when I last looked)
[16:38:23] YuviPanda, fixed since then.
[16:38:36] Should be anyway.
[16:38:41] halfak: aha! cool
[16:39:14] Anyway, I need to do a full update based on revscoring's recent changes, so it would be fine to wait for that to implement precached.
[16:39:26] halfak: yeah
[16:39:36] These changes will be big and will break old models, so now is a good time to change scikit-learn version.
[16:39:36] halfak: I'm also back in SF end of next week and should have a less packed schedule
[16:40:18] If you ever get a flight with a layover in MSP, I'm only a 10 minute trip by train from the airport.
[16:40:34] MSP is Delta's major hub.
[16:40:56] halfak: :D I'm in some ways forcing myself to stay in SF for at least 6 months so I can re-assess my opinion of it (not too much of a fan ATM), so it'll have to wait I'm afraid :(
[16:41:03] but yes, will keep in mind!
[16:41:07] it also means I still haven't seen snow!
[16:43:31] Ha. Yeah.
I know about your enforced detention in the tech city, but I figured you might still have a flight that stops on the way :)
[16:43:44] halfak: :D
[16:43:56] halfak: I think the only exception is the ops offsite, wherever that is
[16:44:56] Minnesota is pretty cool and you've got Andrew around Minneapolis.
[16:45:02] * halfak crosses fingers
[16:45:13] I'd have to shell out for a lot of beers if ops came to visit.
[16:45:26] haha :D
[16:45:28] indeed
[16:45:33] although I'm reducing beer consumption somewhat
[16:45:46] YuviPanda: I'll try and put some time into that this week. RecentChanges is a giant mess
[16:46:05] legoktm: yeah, I figured
[16:46:11] legoktm: <3 thank you
[16:46:45] <3 legoktm
[16:47:28] of course, we also eventually need to figure out a 'deployment strategy' for production
[16:47:32] * YuviPanda runs away from trebuchet
[18:00:29] halfak?
[18:00:34] Amir1?
[18:00:37] lets dance
[19:05:29] hey
[19:05:49] I just came from university
[19:06:11] Amir1, we just finished up.
[19:06:22] ToAruShiroiNeko has a proposal for changing the Friday meeting time.
[19:07:27] oh
[19:07:30] to when?
[19:07:46] An hour later
[19:07:47] brb
[19:08:59] It's not possible for me, it'll be around 9:30 to 10:30, the time I come back from university
[19:19:16] ToAruShiroiNeko, ^
[19:19:20] o/ aetilley
[19:20:19] yes
[19:20:33] hello
[19:24:42] aetilley, OK. So. On to the topic of tasks for you to pick up.
[19:25:10] I see three interesting fronts that you can push on.
[19:25:18] Do tell.
[19:25:49] 1. Bias detection. Applied math and some literature. I can help gather datasets.
[19:26:38] 2. Wikilabels defining and administering interesting ways of extracting signal from human judgement. Design & some programming likely.
[19:27:51] so current schedule is fine
[19:28:01] but every so often I may not be there due to a train delay or something
[19:28:17] I should be able to get a 3g connection though but it would be audio only
[19:28:27] 3. Edit type classification.
I think we dove too deep too soon. We should really address what a type of edit it as critique the current classification scheme we have. This would involve digging through Wikipedia essays, discussions of work (e.g. Adminship requests) and some coding for sure.
[19:29:00] *type of edit is and critique the current ...
[19:29:04] yikes typing!
[19:29:21] FWIW, I've got blisters on my fingers from my frisbee golf vacation :S
[19:29:37] there is such a thing as a frolf vacation?
[19:29:40] i could use a frolf vacation
[19:29:51] it seems very leisurely
[19:32:36] Interesting. These seem very related.
[19:33:01] harej, http://www.hhnsc.com/
[19:33:04] Is there a substantial difference between defining labels and defining features?
[19:33:44] aetilley, features are predictive of labels when we're doing it right.
[19:34:51] harej, do you frolf?
[19:35:13] i *could* frolf. is it physically demanding?
[19:35:19] And I guess Features are properties of revisions whereas labels typically reference properties of articles.
[19:38:05] sorry, replace "revision" with "edits" I suppose.
[19:38:26] "reverted" or "damaging" is a property of a revision.
[19:38:47] Really, 'labels' are the predicted values and 'features' are the values we use to make predictions.
[19:39:01] 'features' are objective and 'labels' tend to be subjective
[19:39:45] "reverted", of course, is objective. But "should have been reverted" is a subjective quantity we assume based on the objective evidence.
[19:40:16] I see.
[19:40:35] eg. bytes changed vs. good faith
[19:41:33] Yeah
[19:41:34] :)
[19:41:53] What tools do you use to measure how well your models are working?
[19:42:44] I mean we had the AUC discussion
[19:42:59] but specifically?
[19:43:19] Hmm... I'm not sure what you mean.
[19:43:25] AUC *is* a tool
[19:43:33] I mean software.
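For readers who missed "the AUC discussion": AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. The sketch below is a plain-Python illustration of that definition with made-up scores; it is not revscoring's code. Libraries such as scikit-learn offer the same measure (for example `sklearn.metrics.roc_auc_score`).

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of truthy (positive) / falsy (negative) values
    scores: model scores aligned with labels; higher means "more positive"
    """
    positives = [s for l, s in zip(labels, scores) if l]
    negatives = [s for l, s in zip(labels, scores) if not l]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up scores: a perfect ranking, then one with a single misordered pair.
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
mixed = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
print(perfect, mixed)
```

The pairwise form is quadratic in the number of examples, so it is fine for a sanity check but a sorted or library implementation is the better choice on real datasets.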
[19:44:16] I'm trying to picture the standard workflow for task 1 above
[19:44:33] http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
[19:44:39] Or really for any of these. I might have a hypothesis but...
[19:44:42] ah.
[19:44:45] We use that, but the rest of it is our code.
[19:45:04] Starting here: https://github.com/wiki-ai/revscoring/blob/master/revscoring/scorer_models/scorer_model.py#L227
[19:45:24] halfak: Sneak preview of the tool I'm working on: https://tools.wmflabs.org/archaeo/test.html
[19:45:32] Charts #3 and #4 are based on ORES scores.
[19:45:42] Next step is merging #2 and #3.
[19:46:15] (This is for [[w:en:Bee]].)
[19:47:18] Wow, what's that spike in 2015?
[19:47:34] aetilley: Someone proposed it for FA, I think.
[19:47:57] Somewhere on my wishlist is adding markers for such events.
[19:48:34] guillom, you should try a weighted sum for the wp10
[19:48:38] (Keep in mind this is all very much a work in progress. I'm using this project to learn d3.)
[19:48:50] I've been working on that with ragesoss
[19:48:54] * halfak gets the codes
[19:49:20] halfak: I have a basic mapping from 1 to 6; easy to change.
[19:49:30] well the sklearn link you sent is simpler than I expected. I don't know why I assumed you analytics guys spent a lot of time doing complex stuff in R.
[19:49:31] https://github.com/halfak/editclass/blob/master/editclass/utilities/score_article_periods.py#L109
[19:49:54] Class values start at 0 for stub and go to 5 for FA
[19:50:23] Otherwise, the sum is easy. ragesoss's work suggests that the model is pretty stable.
[19:50:38] will do
[19:50:48] aetilley, most complex stuff I do in R is getting the plot together.
[19:50:56] I suppose a hierarchical regression.
[19:51:03] That'd by kinda complicated.
[19:51:05] *be
[19:51:25] Otherwise, the really complex stuff happens in python because it does complex work better.
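A sketch of the weighted sum suggested at 19:48, using the 0-for-stub to 5-for-FA class values given at 19:49. It assumes the wp10 prediction arrives as a mapping from class name to probability; that input shape and the function name are assumptions for illustration, and this is not a copy of the linked editclass code.

```python
# Class order follows "0 for stub ... 5 for FA"; the index is the class value.
WP10_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def weighted_quality(probabilities):
    """Collapse per-class probabilities into one score between 0 and 5.

    probabilities: dict of class name -> probability (assumed input shape).
    Missing classes count as 0; the result is normalised by the total so
    that slightly unnormalised inputs still land in the 0..5 range.
    """
    total = sum(probabilities.get(name, 0.0) for name in WP10_CLASSES)
    if total <= 0:
        raise ValueError("probabilities must contain some positive mass")
    weighted = sum(
        value * probabilities.get(name, 0.0)
        for value, name in enumerate(WP10_CLASSES)
    )
    return weighted / total


# Made-up probabilities for one revision.
example = {"Stub": 0.10, "Start": 0.20, "C": 0.40, "B": 0.20, "GA": 0.07, "FA": 0.03}
print(weighted_quality(example))
```

Compared with plotting only the most likely class, the weighted sum moves smoothly as probability mass shifts between neighbouring classes, which suits a chart of quality over time.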
[19:51:28] I mean I'm pretty sure I could learn R fast, but I'm not sure that's the best use of my time.
[19:51:37] Na.
[19:51:45] Did you say you knew matlab?
[19:51:57] More or less. I prefer octave.
[19:52:13] I don't even know what octave is, but it sounds good to me :)
[19:52:18] Mathworks frustrates me.
[19:52:37] GNU Octave is open source Matlab.
[19:53:10] +1 to that then
[19:53:18] R being open source S and all
[19:53:31] ToAruShiroiNeko, looks like #wikimeda-cep just imploded
[19:55:54] one sec
[19:57:30] For (1) do you have any of said literature in mind?
[19:57:37] halfak
[19:58:02] halfak yeah
[19:58:04] what happened?
[19:59:27] I was going to propose examining the choice of SVM and RF in revscoring, but I'm getting the impression that there are already several ML veterans among us (Amir?).
[19:59:45] ToAruShiroiNeko, read scrollback.
[20:00:18] I wouldn't refer to any of us as veterans.
[20:00:30] * halfak tries to remember what he made #1
[20:00:37] Oh! Bias
[20:00:38] I am a veteran student at best.
[20:00:39] 1 Bias Detection
[20:00:53] :)
[20:01:04] NO I am a retired student I suppose
[20:01:10] halfak scroll back on this channel?
[20:01:25] Oh what happened to -cep?
[20:01:33] yeah
[20:01:37] I am a bit puzzled
[20:01:39] They decided to blow up the channel because it was inactive. Like, right as I was telling you about it.
[20:01:57] aetilley, re. lit, I added notes to the etherpad we worked on with awight
[20:02:15] halfak awesome
[20:02:19] so I can blame you for it then :D
[20:02:31] https://tools.wmflabs.org/archaeo/test.html#wp10OverTime Now with weighted sums.
[20:02:41] Looks much better; thanks halfak :)
[20:03:06] guillom, coool!
[20:03:39] I can't find that etherpad.
[20:03:47] Does someone else have the bias discussion etherpad handy?
[20:04:22] OH. Now that we store article badges in Wikidata, I could even retrieve them and integrate to the prediction.
[20:06:46] This one?
[20:06:48] https://etherpad.wikimedia.org/p/ORES_bias
[20:06:53] Yay!
[20:06:55] Yes
[20:06:59] I was just about to give up.
[20:07:29] Lines 22 and 23
[20:07:44] Let me find an entry point for Zeynep
[20:09:09] {{done}}
[20:09:22] aetilley, I'd start with those two papers and the lit you think is relevant.
[20:09:29] I think we might be due for another discussion then.
[20:09:34] Maybe we can take some time on Saturday.
[20:09:47] Ok.
[20:10:49] Could you also say a few more words about (2) before you go?
[20:11:00] Just as a warning. Tufekci is going to hand-wave. She isn't really a sciency-type researcher, but she'll hit on some key motivations for concern around bias.
[20:11:18] hm
[20:11:21] Chris will take a socio-infrastructure point of view to the problem that will be much more likely to make testable predictions.
[20:12:22] So, for Wikilabels, our current plans are to just put in some administration support, but I've heard requests for more powerful strategies for gathering information around hard-to-label items.
[20:12:50] So, I imagine that we could have a campaign type that would keep re-assigning a task until it is confident that the task has been appropriately labeled.
[20:13:08] This sounds like an optimization problem that balances human effort and information value.
[20:13:19] hm
[20:13:32] Whither the programming?
[20:13:32] I'm sure there's some good lit there. I can't just produce a few links, but I know people who could get us a good angle into the relevant lit.
[20:14:01] Programming would be in figuring out how to implement such an intelligent task allocation system.
[20:14:19] I see. How's the wikigrok project going?
[20:14:35] Dead AFAICT
[20:15:19] Why did they kill WikiGrok? I thought that thing had legs.
[20:15:43] Ok, I think I'm going to focus on Bias detection for now then.
[20:16:48] harej, you and the rest of the research team.
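One very simple reading of the "keep re-assigning a task until it is confident" idea from 20:12, sketched as a stopping rule. Everything here is hypothetical: the function name, the thresholds, and the choice of plain majority agreement as the confidence measure are assumptions for illustration, and none of it is an existing Wikilabels feature. The cap on labels per task stands in for the "human effort" side of the trade-off mentioned at 20:13.

```python
from collections import Counter


def needs_another_label(labels, min_labels=2, max_labels=5, min_agreement=0.75):
    """Decide whether a task should be handed to one more labeler.

    labels: the labels collected so far for a single task
    min_labels: always collect at least this many labels
    max_labels: stop spending human effort after this many, whatever the agreement
    min_agreement: share of labels that must match the most common label
    """
    if len(labels) < min_labels:
        return True
    if len(labels) >= max_labels:
        return False
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels) < min_agreement


print(needs_another_label(["damaging", "damaging"]))        # agreement 1.0
print(needs_another_label(["damaging", "good", "damaging"]))  # agreement 2/3
```

A fuller version would weigh the expected information from one more label against its cost, which is the optimization problem the log points at; the fixed thresholds above only approximate that.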
[20:16:56] * halfak sighs
[20:17:18] aetilley, I think we should start talking about datasets we want to explore this soon too.
[20:17:19] ...you mean "you and the research team"? i don't think i am part of the "research team" set
[20:17:42] harej, this is wiki land. EVERYONE IS ON MY TEAM
[20:17:54] congratulations on your new budget authority!
[20:17:54] But seriously, the rest as in not me, but I am still included.
[20:18:11] YOU'RE ALL FIRED (except my grantees)
[20:18:15] Several times a day, I grab my phone to pass the time for a few minutes, and I think "I'd so much rather be WikiGroking than checking Twitbook".
[20:18:27] Wikidata Game!!
[20:18:30] Right!?
[20:18:35] halfak, do tell. I'm used to getting handed an octave matrix of data. :)
[20:18:58] OK. So the first dataset I'm working on dumping out is one that Amir is planning to use to generate clusters.
[20:19:04] Each row is an edit.
[20:19:19] Columns will contain rev_id, feature values, label.
[20:19:39] I think that we want to do the same for the dataset of manually labeled edits. But it will have two labels.
[20:19:54] So, rev_id, feature values, reverted_status, human_label.
[20:20:02] Oh yeah and ores_score
[20:20:25] So you'll be able to look where ores does a bad job and reason about how it relates to reverted_status and the human_label.
[20:20:42] I'd also like to get a dataset of ClueBot reverts, but I don't want to write an IRC parser so blah.
[20:24:53] ok
[20:25:36] Where are these defined?
[20:25:50] particularly the things with underscores aboe.
[20:25:53] above
[20:26:21] Are they all in revscoring?
[20:27:06] Or are these MW properties? Sorry if this is noob.
[20:39:38] aetilley, sorry, I am not sure what you mean by "defined".
[20:40:05] rev_id -- The identification number of a version of a page. It is often used to represent the edit that produced the version.
[20:40:24] feature values -- the objective measures we use to make predictions
[20:40:33] reverted_status -- Boolean.
Was the edit reverted?
[20:40:54] human_label -- Did a human labeler say this edit was damaging? good-faith?
[20:41:08] ores_score -- What ORES returns right now for a revert prediction.
[20:41:28] halfak, That's fine. I wasn't sure if you were referring to specific variables.
[20:43:25] One more question. How does this mid-term review go. Do we each submit one? Or do we all contribute to one?
[20:45:26] We all contribute to one.
[20:47:20] ok.
[20:48:29] halfak: if i were to organize a three-person offsite, what would you recommend?
[20:51:22] Wear nametags
[20:51:25] lol
[20:51:34] I dunno. Asking for location or type of activity?
[20:55:14] harej, ^?
[20:55:30] Location is going to be contingent on who #3 is going to be. Most likely the WikiProject X offsite will be arranged by having two people travel to where a third person lives.
[20:56:12] I planned a two-day strategy meeting for Wikimedia DC but it wasn't *really* an offsite and there wasn't much planned outside the meeting itself.
[20:57:06] IMO, the best offsites are like hackathons.
[20:57:19] They are brief, the hours are long and everyone goes to dinner together.
[20:57:32] Brief as in 3 days MAX
[20:57:35] 2 days preferred.
[20:57:44] You get sick of people after a while and that's counter-productive.
[20:57:52] Well.. I get sick of people after a while.
[22:24:43] o/ shilad. How's the API work going?
[22:45:48] * halfak prepares for revscoring 0.5.0 release.
[22:46:02] I've got to adapt all of our use of revscoring before it goes live.
[23:30:41] \o/ I have extracted 8.9 million article quality labelings from English Wikipedia.
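A sketch of how the labeled-edit dataset described between 20:19 and 20:41 might be used to "look where ores does a bad job". It assumes each row is a dict with the column names from the log, that human_label is a boolean meaning "a labeler called this edit damaging", and that ores_score is a probability cut at 0.5; those encodings and the function name are assumptions, and the four rows are invented purely to make the example run.

```python
def ores_misses(rows, threshold=0.5):
    """Return the rows where the thresholded ORES score disagrees with the human label.

    rows: iterable of dicts with rev_id, reverted_status, human_label, ores_score
    threshold: score at or above which ORES is treated as predicting "damaging"
    """
    misses = []
    for row in rows:
        predicted_damaging = row["ores_score"] >= threshold
        if predicted_damaging != row["human_label"]:
            misses.append(row)
    return misses


# Invented rows for illustration only; feature value columns are omitted.
rows = [
    {"rev_id": 1, "reverted_status": True, "human_label": True, "ores_score": 0.91},
    {"rev_id": 2, "reverted_status": True, "human_label": False, "ores_score": 0.80},
    {"rev_id": 3, "reverted_status": False, "human_label": True, "ores_score": 0.20},
    {"rev_id": 4, "reverted_status": False, "human_label": False, "ores_score": 0.05},
]

for row in ores_misses(rows):
    print(row["rev_id"], row["reverted_status"], row["human_label"], row["ores_score"])
```

Keeping reverted_status next to each miss shows whether ORES is simply echoing the revert signal (as in the invented rev_id 2) or missing damage that was never reverted (rev_id 3).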