[00:23:55] awight, sorry I missed your message
[00:24:18] It means that the observation was flagged for review by the autolabel script
[00:25:06] We didn't always have an auto-label script :|
[05:18:54] halfak: I'm still not imagining this quite right. Splitting the needs_review from no_review streams after wikilabels feels uncouth, especially since we're going to splice back in after merge_labels. Why avoid piping the .review. wikilabels output through merge_labels?
[06:49:18] wiki-ai/editquality#188 (simplify_template - 5e4a007 : Adam Wight): The build passed. https://travis-ci.org/wiki-ai/editquality/builds/354638750
[15:31:04] halfak: https://docs.google.com/drawings/d/1KdDdwKCIRLuveb2vIn6dicgl8vEvZId5WbNcufil9g0/edit
[15:32:06] i can't seem to be able to come up with a catchy title for the drafttopic submission for Wikimania
[15:32:43] something around "WikiProjects and topics helping article review" but missing a decent title for that
[15:37:33] Adopting new articles by guessing WikiProjects
[15:37:39] good question
[15:37:45] It's a lot of ideas to put into one sentence
[15:38:24] i've been banging my head since morning not getting past that enter your title page
[15:38:37] lol just put and carry on
[15:39:08] <codezee> awight: we can modify that later?
[15:39:20] <awight> The EasyChair thing is strange, since you can't upload papers, it's just an abstract anyway
[15:39:55] <awight> I think we need to put links to the actual project docs, in a space that small. 300 word max?
[15:40:15] <awight> Yes you can definitely change the title later!
[15:40:15] <codezee> awight: are you doing a paper submission?
[15:40:34] <awight> yeah but more like a 20-min presentation than paper :-)
[15:40:52] <awight> https://www.mediawiki.org/wiki/JADE/Wikimania_2018_presentation
[15:41:10] <awight> https://docs.google.com/presentation/d/1DJdp98jVg7BRKhfmy-Qfv-eyjhSIhc0p4KXbYhK1QSc/edit
[15:41:31] <codezee> well i don't yet have a presentation page, so i might as well create one and fill it later
[15:45:31] <halfak> o/
[15:45:42] <halfak> awight, the "needs_review" that weren't labeled are useless.
[15:46:24] <awight> What are they doing in the human_labeled file? Maybe we should enhance wikilabels to always normalize its output?
[15:46:53] <halfak> they aren't in the human_labeled file.
[15:47:06] <awight> codezee: I'm lost as well, IMO it would have been better if the Wikimania submissions were a normal wiki template, that you link to from EasyChair.
[15:47:14] <halfak> awight, wikilabels is more general than editquality
[15:48:23] <awight> halfak: Are these lines redundant, then? https://github.com/wiki-ai/editquality/blob/master/Makefile#L382-L383]
[15:48:31] <awight> s/]//
[15:48:42] <halfak> No
[15:49:49] <halfak> The merge_labels script assumes (with a warning) that all observations that don't have labels are not damaging and goodfaith.
[15:50:08] <halfak> Because it needs to assume *something*
[15:52:00] <awight> I'm confused about a few basic things. The autolabeler is the only code responsible for adding a "needs_review" value, right? And the field's value means, true => not a trusted user; false => trusted user.
[15:52:29] <awight> When we feed autolabeled observations into wikilabels, we get them back with the needs_review tags intact.
[15:53:43] <awight> In the case of "balanced_5k", we can even say that 50% will be autolabeled needs_review.
[15:57:04] <awight> So I don't understand what's happening at https://github.com/wiki-ai/editquality/blob/master/Makefile#L367 , where we split out the autoconfirmed edits. Wikilabels will try to have its entire input hand keyed, regardless of the needs_review flag, right? So after we pull human_labeled revisions from wikilabels, why are the needs_review observations any different?
[15:59:23] <halfak> "autoconfirmed edits"?
[16:00:11] <halfak> We merge the unlabeled needs_review: false edits with the labeled needs_review: false edits. Does that make sense?
[16:00:30] <halfak> Then we union that with the needs_review: true edits that have been reviewed.
[16:00:50] <halfak> We end up discarding the needs_review: true edits that have not been labeled.
[16:01:06] <codezee> halfak: how does "WikiProjects, ML and article review" sound for now?
[16:01:10] <wikibugs> Scoring-platform-team (Current), JADE, Documentation: Create mediawiki.org article for Extension:JADE - https://phabricator.wikimedia.org/T189938#4058406 (awight)
[16:01:34] <halfak> Hey codezee!
[16:01:44] <halfak> I'm working on one too to talk about backlogs and shifting NPP
[16:01:55] <halfak> Here's my keyword list: Artificial intelligence, Newcomer retention, Knowledge equity, Machine learning, ORES, Topic modeling, New page patrol, ACTRIAL
[16:02:03] <halfak> easychair says it is too long :|
[16:02:20] <codezee> ://
[16:03:02] <codezee> halfak: since you're talking about backlogs and NPP what can we make as the theme of this? - maybe why we need to support WikiProjects and how they can help?
[16:03:53] <halfak> https://etherpad.wikimedia.org/p/npp_ores_2018
[16:03:56] <halfak> codezee, ^
[16:05:47] <codezee> halfak: so i suppose drafttopic will automatically form a significant part of this...^
[16:06:11] <halfak> I'd spend little time talking about the actual prediction model
[16:06:36] <halfak> I'm aiming to present a modified version of the showcase presentation
[16:06:55] <codezee> halfak: so what's your take? can it do with one more presentation talking about the details of how we do it?
[16:07:05] <halfak> Sure!
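Annotation on the merge halfak describes at 16:00: the three steps (merge unlabeled and labeled needs_review: false edits, union in the reviewed needs_review: true edits, discard the unreviewed needs_review: true edits) can be sketched roughly as below. This is not the merge_labels script itself. The `rev_id` key, the function name, and the in-memory list-of-dicts layout are assumptions made for illustration; the default of "not damaging and goodfaith" for unlabeled observations is the one halfak mentions at 15:49.

```python
def merge_observations(autolabeled, human_labeled):
    """Hypothetical sketch of the merge/union/discard flow from the chat.

    Both arguments are lists of dicts. 'rev_id' is an assumed join key;
    'needs_review', 'damaging', 'goodfaith' and 'auto_labeled' are the
    field names used in the discussion.
    """
    # Index only the rows that actually carry both human labels.
    labeled_by_rev = {
        ob["rev_id"]: ob
        for ob in human_labeled
        if "damaging" in ob and "goodfaith" in ob
    }

    merged = []
    for ob in autolabeled:
        human = labeled_by_rev.get(ob["rev_id"])
        if human is not None:
            # A person reviewed this edit: keep the human labels,
            # whatever the needs_review flag says.
            merged.append({
                **ob,
                "damaging": human["damaging"],
                "goodfaith": human["goodfaith"],
                "auto_labeled": False,
            })
        elif not ob["needs_review"]:
            # Unlabeled but trusted: assume not damaging and goodfaith.
            merged.append({
                **ob,
                "damaging": False,
                "goodfaith": True,
                "auto_labeled": True,
            })
        # Otherwise: needs_review is true and nobody labeled it, so the
        # observation is discarded.
    return merged


autolabeled = [
    {"rev_id": 1, "needs_review": False},
    {"rev_id": 2, "needs_review": False},
    {"rev_id": 3, "needs_review": True},
    {"rev_id": 4, "needs_review": True},
]
human_labeled = [
    {"rev_id": 2, "damaging": True, "goodfaith": True},
    {"rev_id": 3, "damaging": True, "goodfaith": False},
]
merged = merge_observations(autolabeled, human_labeled)
```

In this toy input, revision 4 is the only one dropped: it was flagged for review and never labeled.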
[16:07:46] <awight> halfak: That's starting to clarify for me. That whole dance is to deal with gaps in wikilabels output, for which each row is either {all values hand-labeled, some but not all values hand-labeled, no values labeled}.
[16:09:32] <awight> What if we made the merge utility smarter, so it took --autolabeled-no-review FILE1 --human-labeled FILE2, and threw out unlabeled wikilabels outputs?
[16:10:18] <awight> I was also imagining that autolabeling should output two files, since we always want to use the data separately.
[16:10:21] <halfak> It can detect "needs_review" with no labels and discard them
[16:10:27] <awight> +1
[16:10:31] <awight> already?
[16:10:47] <halfak> i don't think it does that. I've been assuming it didn't
[16:10:52] <halfak> Been a long time since I worked on it.
[16:10:55] <awight> cool.
[16:17:33] <awight> halfak: Do we ever see corrupted autolabeled files? I'm surprised that the multiple threads don't step on each other's outputs.
[16:17:56] <wikibugs> Scoring-platform-team (Current), JADE, Documentation: Create mediawiki.org article for Extension:JADE - https://phabricator.wikimedia.org/T189938#4058406 (MarcoAurelio) I took the liberty to start a draft at https://www.mediawiki.org/wiki/Extension:JADE - you can continue there should you wish. Let m...
[16:18:36] <wikibugs> Scoring-platform-team (Current), JADE, Documentation: Create mediawiki.org article for Extension:JADE - https://phabricator.wikimedia.org/T189938#4058451 (awight) @MarcoAurelio Right on, thank you for the help!
[16:22:38] <halfak> awight, we write to the output file using a queue
[16:22:56] <awight> Looks like it's just an open()
[16:23:59] <halfak> queue happens before that
[16:24:12] <halfak> inside of para
[16:24:14] <halfak> or imap
[16:24:16] <halfak> or whatever
[16:24:26] <halfak> One single loop of output
[16:24:32] <halfak> everything is sequenced there
[16:25:09] <halfak> brb interval training
[16:26:21] <awight> AIUI, the for loop is still acting on a concurrent iterable thing, so access to shared resources like file pointer labels_f isn't safe.
[16:29:48] <halfak> why?
[16:38:14] <awight> cos there's no thread synchronization built into file.write, so threads trying to write at the same time will corrupt each other's writes
[16:38:29] <halfak> file.write only happens in one thread
[16:38:44] <halfak> :S
[16:38:55] <halfak> file.write does not happen in the mapped function
[16:39:41] <halfak> the mapped function returns a result. That gets put on a queue, the queue is dumped into the main thread/process and then written out.
[16:42:37] <halfak> brb again -- next interval
[16:48:54] <awight> Whew, okay that makes sense. I found what I wanted in the manual:
[16:48:55] <awight> > It blocks until the result is ready
[16:48:59] <awight> lol good to know
[17:00:39] <awight> halfak: If a human_labeled observation has some but not all of the labels, but has been autolabeled with needs_review=false, should we merge the partial human labels into autolabel defaults?
[17:03:13] <halfak> yeah. I think so
[17:03:54] <awight> kk
[17:03:55] <halfak> yow. That interval was hard.
[17:04:12] <awight> hehe glad I'm just reclining and eating peanuts
[17:06:53] <awight> I'm not sure what value "auto_labeled" should have for merged rows with partial hand-labeling
[17:11:44] <halfak> true
[17:13:12] <awight> Are you sure that wikilabels is capable of returning observations with empty labels?
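Annotation on the 16:22 to 16:48 exchange: the pattern halfak describes (workers only return results, and a single loop in the main thread does every file.write) might look like the sketch below. It uses the standard library's thread-backed `multiprocessing.dummy.Pool` as a stand-in, not the para library mentioned in the chat. The `autolabel` function and its toy needs_review rule are hypothetical; only the name `labels_f` comes from the discussion.

```python
import io
import json
from multiprocessing.dummy import Pool  # thread-backed Pool, same API as multiprocessing.Pool


def autolabel(rev_id):
    # Hypothetical stand-in for the mapped function. It runs in a worker
    # thread, never touches the output file, and only returns a result.
    return {"rev_id": rev_id, "needs_review": rev_id % 2 == 1}


def write_labels(rev_ids, labels_f, threads=4):
    with Pool(threads) as pool:
        # imap hands results back to this loop one at a time, in input
        # order. The loop runs in the calling thread, so labels_f.write
        # is never called concurrently.
        for observation in pool.imap(autolabel, rev_ids):
            labels_f.write(json.dumps(observation) + "\n")


labels_f = io.StringIO()  # stands in for the real open() file handle
write_labels(range(6), labels_f)
output_lines = labels_f.getvalue().splitlines()
```

Because the workers share nothing but the pool's internal result queue, no lock is needed around the write.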
[17:13:34] <awight> Spot-checking in cswiki, there are no rows missing damaging or goodfaith, and no "null" values.
[17:15:03] <awight> same thing with enwiki.20k_2015
[17:19:42] <awight> halfak: This looks like a bug, https://github.com/wiki-ai/editquality/blob/master/editquality/utilities/fetch_labels.py#L95-L100
[17:21:09] <awight> IMO we should omit the label or output None for its value. The former would make our merge slightly easier.
[17:25:42] <awight> gtg soon.
[17:27:20] <halfak> Desired behavior != bug
[17:27:29] <halfak> But we can debate desired behavior
[17:29:47] <halfak> final interval
[17:29:54] <awight> hehehe
[17:30:13] <awight> I'm now listening to a Sesame Street radio play involving cookie monster
[17:31:06] <awight> halfak: re. reviewing the behavior I'm imagining, I'll just push patches to explain
[17:36:07] <paladox> awight lol
[17:52:55] <awight> gtg
[18:06:34] <codezee> halfak: is it okay to provide links in the abstract?
[18:06:46] <codezee> basically the link to the main project page
[18:06:49] <halfak> I'd guess so.
[18:07:19] <codezee> right now I'm going with the title "WikiProjects and Topic Models - The How?"
[22:34:09] <wikibugs> Scoring-platform-team (Current), MediaWiki-extensions-ORES, User-Ladsgroup: Clean up old backward compatibility settings of $wgOresModels - https://phabricator.wikimedia.org/T189948#4058681 (Ladsgroup)
[22:37:27] <wikibugs> (PS1) Ladsgroup: Integrate all parts of support for wp10 model [extensions/ORES] - https://gerrit.wikimedia.org/r/420212 (https://phabricator.wikimedia.org/T175757)
[23:46:11] <wikibugs> Scoring-platform-team (Current), User-Ladsgroup: Build Scoring platform community monitor - https://phabricator.wikimedia.org/T189954#4058771 (Ladsgroup)
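Annotation on the 17:00 to 17:21 thread: one way the partial-label merge could work is sketched below, for an observation autolabeled with needs_review=false. The function name and dict layout are assumptions, and this is not the patch awight mentions. It treats an omitted label and a None label the same way, and follows halfak's answer at 17:11 that a partially hand-labeled row still gets auto_labeled set to true. Setting it to false when every label is hand-keyed is an additional assumption.

```python
# Defaults that merge_labels is said to assume for unlabeled observations.
AUTOLABEL_DEFAULTS = {"damaging": False, "goodfaith": True}


def merge_partial(autolabeled_ob, human_ob):
    """Hypothetical per-field merge of human labels over autolabel defaults."""
    merged = dict(autolabeled_ob)
    merged.update(AUTOLABEL_DEFAULTS)

    # Keep only the labels a person actually supplied. A missing key and
    # an explicit None are both treated as "not labeled".
    human_values = {
        field: value
        for field, value in human_ob.items()
        if field in AUTOLABEL_DEFAULTS and value is not None
    }
    merged.update(human_values)

    # If any label still comes from the defaults, the row counts as
    # auto-labeled.
    merged["auto_labeled"] = len(human_values) < len(AUTOLABEL_DEFAULTS)
    return merged


autolabeled_ob = {"rev_id": 7, "needs_review": False}
with_none = merge_partial(autolabeled_ob, {"rev_id": 7, "damaging": True, "goodfaith": None})
with_omitted = merge_partial(autolabeled_ob, {"rev_id": 7, "damaging": True})
fully_labeled = merge_partial(autolabeled_ob, {"rev_id": 7, "damaging": True, "goodfaith": False})
```

With labels omitted rather than emitted as empty values, the `is not None` check would become unnecessary, which is roughly the simplification awight points to at 17:21.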