[14:57:20] o/ hey fols.
[14:57:22] *folks
[14:57:43] I'm looking for a big survey paper about wikipedia quality that I saw making the rounds last year or maybe two years ago
[14:59:21] Maybe it was this: http://spectrum.library.concordia.ca/978618/1/WikiLit_Content_-_open_access_version.pdf
[14:59:52] Looks like it is this one :))
[16:03:14] Working on https://meta.wikimedia.org/wiki/Research:Interpolating_quality_dynamics_in_Wikipedia_and_demonstrating_the_Keilana_Effect
[18:39:24] halfak: that bbc report is a bit of a trainwreck, really :( https://www.youtube.com/watch?v=prwSjrDXNB0
[18:39:53] halfak: and sorry that your review is not out yet - we're dealing with some major dysfunction at the signpost currently https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/Newsroom#Note_from_editor.28s.29-in-chief
[18:40:50] ..i will likely publish this issue of the research newsletter independently first this time, later this week
[18:42:01] HaeB, thanks for letting me know. No worries. Though I've been considering turning my review into an essay so I can have a permalink ^_^
[18:42:14] Volunteer run activities are sometimes fragile.
[18:44:26] the arrangement with the signpost has been working fairly well since 2011 (obv deadlines are always missed, but at least it came out with some regularity), but now it seems a real possibility that it will stop appearing - march 2017 was the first full month without a new issue (since the signpost's founding in 2005)
[18:47:08] Wow. Hope he's OK
[18:47:31] Even if he is super busy, it's highly unusual to not even show up to say so.
[18:48:36] halfak: hi there
[18:48:55] Hi matanya
[18:48:57] halfak: https://phabricator.wikimedia.org/project/view/1596/
[18:50:04] halfak: and a 5 minute talk about stewards : https://photos.google.com/share/AF1QipM9EvgR7gL5kUCqILgKOM7i1NpOKAdnD5Vlb4SrWlcDJ9O3AByj7Bqdv0DoaETQzw?key=LTZYOXM2ZWl5elZ0U0EwQUhQNHNvVURQbGp5a0Rn
[18:57:48] matanya, https://phabricator.wikimedia.org/T162297
[18:59:12] I'll keep adding notes there. Please feel free to do the same.
[18:59:28] halfak: how do you want to tackle this ?
[18:59:50] would amount of action done help ?
[19:05:35] matanya, I just asked a Q on the task.
[19:05:40] Maybe we could take it from there.
[19:05:48] will do, thanks
[19:05:50] Essentially I'm asking "Where should I look first?"
[19:25:41] halfak: shed some light there, just tell me if i am in the right direction, please
[19:53:33] matanya, what personally tires you out about steward work?
[19:53:51] "If only I didn't need to , it'd all be easier."
[19:53:53] halfak: checkuser
[19:53:58] Interesting
[19:54:02] Tell me more :)
[19:54:19] it is very very tedious
[19:54:43] also, matching IP's and UA manually is very troublesome
[19:55:41] in addition, the need to review past locks/blocks every time we try to find user correlation is lengthy
[19:56:46] halfak: i need james to approve your access to restricted stuff on stew wiki to show you examples
[19:57:00] If it's in the DB, I can see it
[19:57:14] What's the URL for this wiki?
[19:57:50] steward.wikimedia.org
[19:58:38] Looks like that is dbname = "stewardwiki"
[19:59:24] matanya, stewardwiki does not use centralauth?
[19:59:40] no, it does not afaik
[20:00:28] HaeB: um, among the other puzzling things about this BBC clip, am I missing something, or did they not identify any of the people they interviewed?
[20:56:17] Emufarmers, agreed that they did not identify any of them
[21:37:10] o/ Nettrom
[21:37:18] Hey dude. Just wanted to check in on https://meta.wikimedia.org/wiki/Research:Automated_classification_of_article_importance
[21:37:52] o/ halfak
[21:40:26] I’m currently working on two things: categorizing Low-importance articles in WPMED in order to either filter them out or feed info about them to the classifier, and looking into whether this type of issue is something we’re likely to encounter in other projects as well
[21:42:19] Gotcha. Is it just that there's a lot of inconsistency with the use of "low importance" for WPMED?
[21:42:39] I remember you gave an example that I found surprising, but from the sounds of it, this is a general problem?
[21:43:16] they’re reasonably consistent, but they have a bunch of categories of things that are Low-importance by default because they’re not a core part of WPMED… e.g. all people are Low-importance, regardless of what they’ve discovered
[21:43:52] Weird.
[21:44:42] I’ve been chatting with J-Mo about it, I can bring it up again in tomorrow’s RG meeting so you can get the full picture?
[21:45:46] Sure. I think I've got the gist. It's an interesting problem. Sometimes "importance" != Importance(TM)
[21:46:04] And would WPMED like to follow a general rule or their specific ones?
[21:46:24] that’s exactly what we’re talking about, that “importance” has multiple meanings… I want to look into whether that also occurs in other projects
[21:50:25] I’m on the RG agenda for tomorrow
[21:55:28] Cool. Bummer it's complicated, but this is interesting
[22:00:26] yeah, coming up with a more generalized approach to discovering these patterns is hard… am thinking I should just hardcode them for now
[22:02:01] I wonder if we could learn them.
[22:03:44] yeah, I’ve been building networks based on Wikidata and at least for WPMED there’s some good signal there
[22:04:11] might be another way to do this, though… e.g. NLP
[22:04:12] I wonder if we should look at wikidata for some booleans that would be useful.
[22:04:16] E.g. is human
[22:04:51] “instance of” Q5 (human) works, yes
[22:05:12] scientific journals are also big
[22:05:15] I've got a lot of features like that specified for our vandalism detection models.
[22:06:49] I’ll write a short Python script to give me the parents of majority-Low-importance articles, that might be a good start (and something along the lines of what I’ve been thinking)
[22:07:07] ideally I’d want to move upwards in the graph and merge if possible, but that could be a second step for later if we need to generalize it more
[22:11:46] Nettrom, gotcha. I'll be looking into how we can have a Wikidata item lookup in ORES feature extraction. That sounds really interesting to me.