[15:23:51] halfak: Checking in about the monthly item quality "query". Has it wrapped up?
[15:24:14] Oh! It has. I was just moving it to the appropriate place.
[15:24:31] 943071264 rows
[15:26:01] Just under 1 billion
[15:26:04] Thanks
[15:29:28] OK it looks like I need to recompress. It'll be at https://analytics.wikimedia.org/datasets/one-off/entity_usage/20170701/ soon
[15:33:12] Sounds good. Thank you
[16:11:11] https://figshare.com/search?q=:tag:wikimedia-research still doesn't work >:(
[16:13:21] o/ lzia
[16:13:47] I just caught up on your thread about the new editor gaps project on wiki-research-l
[16:14:25] I'm not sure what Kerry is talking about WRT framing: "women are failing at Wikipedia, we have to fix the women"
[16:14:41] It seems that the proposal is explicitly to change the environment, but maybe I'm missing something.
[16:14:48] *fix the environment
[16:31:39] halfak: Related to our conversation on Wednesday about work for Gaby and Julie…I had a couple of questions: 1) How would you define “uninteresting” and “interesting” entity usages? It seems perhaps an uninteresting entity might just be an entity for a template or something like that. 2) What do you think about having them doing the following basic things to start off:
[16:32:08] https://docs.google.com/a/umn.edu/document/d/1AmG_K_NqvY4z2kP-6RbZFeM_IV0zSY4_7kv9C0l_mPw/edit?usp=sharing
[16:32:12] 1. I think that a real human being will know what's "uninteresting" when they see it :)
[16:32:17] I don't know yet what they will see.
[16:33:03] hall1467, maybe more interesting to look at examples of misalignment.
[16:33:16] High quality/activity, low value and vise versa.
[16:33:29] *vice
[16:33:44] or is it visa?
[16:33:52] latin, who knows?
[16:34:19] https://www.merriam-webster.com/dictionary/vice%20versa
[16:35:29] Okay, related to 1) true, it's hard to know exactly what we will encounter
[16:37:59] Sure, I can ask them to look at misalignment data.
It should hopefully be ready by Monday (had to redo the xml processing based on our conversation yesterday and that takes a while)
[16:39:20] While less interesting, are the analyses outlined in the doc something you'd still do?
[16:40:36] hall1467, I guess it's not a bad idea to give your undergrads a few different tasks so they can get experience.
[16:40:53] And you can generate these datasets right now whereas the misalignment ones are in progress.
[16:41:41] Just checked on my recompression job for item quality and it's still running.
[16:47:02] Right, I'd like to get them started on something in the domain. Will be a good experience I think
[16:47:41] Oh really? I thought it had finished since there was a file in the above directory
[16:48:27] I was marveling at how well it had compressed haha
[16:48:39] hall1467, yeah, I'm compressing as I write to that so don't trust that file yet
[16:48:40] Sorry
[16:48:56] Probably better to just recompress and then copy next time
[16:49:09] No worries
[17:27:03] * leila looks for halfak's ping
[17:27:16] o/ leila
[17:28:04] I was just trying to see what you thought about the feedback from Kerry. I'm trying to work my way through it and understand.
[17:28:07] hi halfak. :) thanks for the ping re the thread. I will respond to Kerry when I get a chance (there are so many points in that email:). But I do agree with what you say.
[17:28:43] OK cool. :) I wanted to bring this up because it's a critique I've heard a lot about my own work especially WRT discussing newcomer socialization and the gender gap.
[17:29:12] I'm also personally not interested in "fixing women" and much more interested with "fixing the environment" so that's why I wanted to bring it up with you.
[17:29:26] halfak: re that specific point, I think she's taking it to an extreme. Sure, you can read the proposal as: population x has lack of confidence, we need to fix it.
You can also read it as: system Y should accommodate different levels of confidence at entry, so let's fix system Y.
[17:30:07] what I am always concerned about is that when you do work in diversity related issues, people get tempted to think about one extreme. :)
[17:30:42] yeah. I guess I see that too. It's good to have these conversations, but they are also prone to explosions.
[17:32:21] halfak: another thing you may be interested about that I won't touch in that thread for now. I'm interested for us to focus on humans and try to understand what characterizes the population for which a boost in confidence can help. This binary separation of men/women is too coarse. There are many men I know who can benefit from confidence provided to them early on, there are many women I know who can also live without that extr
[17:32:52] (this research won't be able to address what I just said, but my hope is that we think about humans, as opposed to these coarse filters)
[17:33:01] +1 I totally agree and I'm really happy to hear (read?) that you're looking at gaps in general.
[17:33:32] Surely there are some important obvious categories, but there are likely some important non-obvious categories too :)
[17:33:45] I am sympathetic to the argument, but also wonder if it's worth at least keeping in mind since it might help us understand how external factors (like someone's gender) might play a role.
[17:33:47] yeah. :)
[17:34:02] I may also have no idea what I am talking about.
[17:34:07] harej, keeping what in mind?
[17:34:17] gender
[17:34:29] And other demographic considerations, of course
[17:34:37] harej: keeping in mind is good, designing systems to accommodate it may not be. we should keep an eye on this and take steps very intentionally.
[17:34:42] Hmm. This is a good example. Where was it stated that gender would be discarded as a consideration?
[17:34:53] leila: that is a good way of putting it
[17:36:02] harej: an example unrelated to the diversity on Wikimedia projects. We know now that if we're building systems that are going to predict whether a human will commit a crime, we should keep race out of the feature set, because if you don't, you will end up with harmful generalizations.
[17:36:20] halfak: nowhere in this research for now.
[17:38:36] Right. Somehow suggesting that we not fully focus on gender can imply that we'd disregard it as a consideration to some readers.
[17:39:15] leila, +1 re policing. However we would use these demographic predictors in our evaluation when looking for evidence of disparate impact.
[17:39:38] yes, depends on what you are predicting of course, halfak. :)
[17:39:55] I would love to see how Wikipedia's newcomers experience (esp. quality control and warnings) have a disparate impact across interesting characteristics -- e.g. gender, income, native language, etc.
[17:40:10] uhu
[17:40:22] One thing I could do then is look for ways to mitigate that impact with ORES edit quality/draft quality models.
[17:41:46] Woops! Gotta run.
[17:41:53] Back in ~1 hour
[17:41:58] o/
[19:22:03] o/
[19:22:05] Took longer than expected. Had lunch ^_^
[20:08:18] leila: I added Andrew Hall's showcase presentation info to wiki https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#July_2017 and updated the phab task: https://phabricator.wikimedia.org/T170179 do we have a second presenter for 7/26, or will it just be Andrew?
[20:15:25] also leila: let me know if there's anything else I need to do re: this research showcase
[20:31:51] J-Mo: it will be a short one. Just Andrew.
[20:32:11] let's keep the presentation to 20-25 min, with 10-15 min Q&A?
[20:32:23] (finishing it in 30-min is also fine, np. whatever works for Andrew)
[20:33:26] and thank you, J-Mo. You can ping Sarah on the task and ask her to start the announcements, or let me know and I'll do it.
[20:34:12] it would help me if you can do it, leila. Kind of swamped today :)
[20:34:29] np
[20:56:13] leila: My presentation should be within that time range. Probably will be 15-20 minutes :)
[20:59:31] great, hall1467. :) then we will have a shorter showcase, lovely.
[20:59:39] looking forward to it, hall1467
[21:06:22] Thanks!
[21:10:20] halfak: I think the file writing finished. Is that correct?
[21:39:04] * halfak checks
[21:39:43] Confirmed. Completed compressed file size is 5586776424
[21:40:27] Or 5.203091003 GB
[21:40:31] hall1467, ^
[21:42:20] Great! Thanks for checking!
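
Editor's note on the log above: the two practical points in this exchange (do not trust a file that is still being compressed in place, and the reported size of 5586776424 bytes) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the job that was actually run: the log does not say which compressor or tooling was used, so gzip stands in, and the names `compress_then_publish`, `bytes_to_gib`, and `bytes_to_gb` are hypothetical. The sketch compresses into a temporary file and only moves it to its final name once it is complete, which is one way to get the "recompress and then copy" behaviour mentioned at 16:48:56.

```python
import gzip
import os
import shutil
import tempfile


def compress_then_publish(src_path, dest_path):
    """Gzip src_path into a temp file, then move it to dest_path in one step.

    Readers of dest_path never see a half-written file: the final name only
    appears once compression has finished. (gzip is an assumption; the log
    does not name the compressor.)
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temp file lives in the destination directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, gz)
        # mkstemp creates the file owner-only; make it world-readable
        # before it becomes visible under its final name.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return os.path.getsize(dest_path)


def bytes_to_gib(n_bytes):
    """Binary gigabytes (GiB): 1 GiB = 1024**3 bytes."""
    return n_bytes / 1024 ** 3


def bytes_to_gb(n_bytes):
    """Decimal gigabytes (GB): 1 GB = 10**9 bytes."""
    return n_bytes / 10 ** 9


if __name__ == "__main__":
    size = 5586776424  # compressed size in bytes, as reported at 21:39:43
    print(f"{bytes_to_gib(size):.6f} GiB")  # about 5.203 GiB
    print(f"{bytes_to_gb(size):.6f} GB")    # about 5.587 GB
```

Under these definitions, the "5.203091003 GB" quoted at 21:40:27 matches the binary (GiB) conversion of 5586776424 bytes; in decimal units the same file is roughly 5.59 GB.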