[01:07:46] o/ halfak
[02:31:32] Well the guests are checked in. Hello all.
[14:52:46] o/ Amir1
[14:52:49] You around?
[20:08:25] halfak:
[20:08:27] o/
[20:08:33] I just woke up
[20:09:15] Hey. In meeting. Wanted to talk to you about dropping is_anon as a predictor. We can still get .95 AUC.
[20:09:38] okay
[20:09:52] I've got a little surprise for you
[20:10:02] tell me when the meeting is finished
[20:10:29] Instead of user.is_anon and user.age, I have a few sets of user_groups ({'sysop'}, {'rollbacker', 'patroller'}, etc.)
[20:14:37] +1
[20:59:44] Amir1, just got done with meeting.
[20:59:57] awesome
[21:00:11] https://tools.wmflabs.org/dexbot/tools/ORES.php
[21:00:17] check this out
[21:00:27] halfak: ^
[21:00:46] Cool!
[21:01:14] Seems like the table doesn't really lend itself to the hierarchical structure, but it seems easy enough to work out.
[21:01:37] yeah
[21:01:46] I just wanted to finish this part
[21:01:57] then we can go and fix all other things
[21:02:12] I think it's pretty cool :)
[21:02:19] right now I'm working on prelabeling
[21:02:21] e.g.
[21:02:25] https://tools.wmflabs.org/dexbot/tools/ORES.php?wiki=enwiki
[21:02:43] I finished models too
[21:02:48] anyway
[21:03:10] We should have something like this at ores.wmflabs.org/ui/
[21:03:12] we will finish this very soon and I hope we put it in ores.wmflabs.org
[21:03:18] :D
[21:03:27] yeah, that's my goal
[21:03:36] Any fancy PHP in there or just HTML/js?
[21:03:50] it's PHP/JS
[21:03:54] mostly jquery
[21:04:19] :D
[21:04:32] speaking of PHP
[21:04:40] have you seen my patch about watchlist?
[21:04:49] Yeah I did :)
[21:05:05] You said you thought that having a custom echo notification would work too.
[21:05:06] once it's merged we can move on from watchlist and fix other things
[21:05:10] +1
[21:05:25] halfak: we can have both
[21:05:28] Custom echo notification should be deprioritized unless super easy.
[21:05:51] it's super easy, especially since echo itself is an extension
[21:06:07] Great.
[21:06:08] Amir1, will try to get all of the proposals from the dev summit up on phab today.
[21:06:16] awesome
[21:06:29] once I have those done, I'll clean up the pull request for editquality that includes the new wikidata model.
[21:06:30] I've so many things to discuss
[21:06:48] the dumps extractor PR?
[21:06:50] I think we'll want to do a little bit of qualitative analysis before we consider deploying it.
[21:06:52] Oh yeah?
[21:07:12] yeah
[21:07:29] I tell them one by one
[21:07:36] they are too much :D
[21:07:52] halfak: can you check my PR at editquality
[21:07:57] the dumps extractor
[21:07:58] Looking at it now.
[21:08:02] Hmm...
[21:08:11] I might not have pushed my changes to the branch.
[21:08:20] or maybe they got overwritten?
[21:08:28] it's possible
[21:08:54] I didn't know you made changes to the branch
[21:09:52] I intended to, but maybe they got lost.
[21:09:54] * halfak checks.
[21:10:58] Aha! It looks like a recent push was forced and that broke history.
[21:11:01] I can recover.
[21:11:27] So it looks like I did push earlier and you made modifications again without pulling down my changes. I'll try to merge.
[21:11:38] FWIW, I figured out a few nice simplifications in my work on this.
[21:11:59] great
[21:12:02] sorry for that
[21:12:07] No worries. Should be easy.
[21:12:21] I thought it was only me working on it
[21:12:27] I didn't know you were too
[21:12:50] Oh yeah. Sorry. I pushed some changes when I took over for an evening.
[21:13:00] So... it looks like git messed up though.
[21:13:12] Can you tell me what changes you made since your last commit?
[21:13:15] Anything big?
[21:13:43] I completely deleted the API calls part
[21:13:48] and re-wrote it
[21:14:00] and then deleted anything related to edit count
[21:14:13] Oh! You query all of a user-group first!
[21:14:18] Good idea!
[21:14:19] yeah
[21:15:07] you make a few api calls and then you're done with api
[21:15:24] Cool! Let me merge that in.
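A rough sketch of the "query all of a user-group first" idea, for readers following along. It assumes the MediaWiki Action API's list=allusers module with the augroup filter and standard "continue" paging. The function name users_in_groups and the injected fetch callable are hypothetical and not taken from the editquality code; fetch is whatever performs the HTTP request and returns the decoded JSON.

```python
from typing import Callable, Dict, Iterable, Set


def users_in_groups(
    fetch: Callable[[dict], dict], groups: Iterable[str]
) -> Dict[str, Set[str]]:
    """Map each user name to the subset of `groups` it belongs to.

    `fetch` is a hypothetical callable: it takes API query parameters
    and returns the decoded JSON response as a dict.
    """
    membership: Dict[str, Set[str]] = {}
    for group in groups:
        # Fresh parameters per group so paging state never leaks across groups.
        params = {
            "action": "query",
            "list": "allusers",
            "augroup": group,
            "aulimit": "max",
            "format": "json",
        }
        while True:
            doc = fetch(params)
            for user in doc["query"]["allusers"]:
                membership.setdefault(user["name"], set()).add(group)
            if "continue" not in doc:
                break  # last page for this group
            # Merge the continuation tokens into the next request.
            params = {**params, **doc["continue"]}
    return membership
```

With a handful of requests per group up front, every later per-edit check becomes a dictionary lookup such as membership.get(user_name, set()), so no API calls are needed while processing the dump.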
[21:15:49] Looks like editcount was a pain.
[21:16:38] I'm not sure how much that one matters.
[21:17:48] Oh. Also blocks.
[21:18:15] That'll work for revert detection, but I worry about it for wiki labels.
[21:20:24] halfak: https://tools.wmflabs.org/dexbot/tools/ORES.php?&wiki=enwiki&models=wp10|goodfaith
[21:20:38] It's almost ready
[21:21:15] let me finish revids part too
[21:21:46] Looks like it fails when a revid cannot be scored.
[21:21:56] * halfak tried a revid that was too big to exist
[21:22:51] yeah
[21:22:51] Looks like this might take a bit more work to merge together. I'm going to need to delay working on the merge for now.
[21:23:09] that's in my plan
[21:23:11] sure
[21:23:13] I will likely be able to pick it up tonight, but I don't want to put off writing those task proposals too long.
[21:23:18] what can I do to help?
[21:23:51] clean my mess?
[21:23:53] :D
[21:24:07] One minute. I'll just split my work into a new file and upload that into the branch.
[21:26:03] OK. Do a "git pull" on that branch and you can see my script "extract_damaging" next to "extract_balanced_sample"
[21:26:12] Note how I use the dump processor function.
[21:26:28] That's where 99% of the work happens.
[21:26:30] sure
[21:27:03] you'll have it in one hour
[21:27:11] * Amir1 grabs a coffee
[21:27:16] Woot!
[21:27:20] * halfak gets back to writing
[22:10:57] https://tools.wmflabs.org/dexbot/tools/ORES.php?&wiki=enwiki&models=wp10|goodfaith&revids=176574|547869&go=
[22:11:05] the only thing left is error handling
[22:48:39] btw we are making progress in wikidata campaign
[22:48:39] https://labels.wmflabs.org/campaigns/wikidatawiki/?campaigns=stats
[22:53:53] \o/ Almost 50%
[23:13:08] halfak: should I keep user block parts?
[23:13:18] or remove them?
[23:13:36] Not sure. I can see how we get a benefit from not having them since we'd need to read the whole blog log in advance.
[23:13:39] *block log
[23:13:58] We generally only care about blocks when looking for edits that need review in Wiki labels.
[23:22:22] Amir1, I have a proposal.
[23:22:28] We could do this in two stages.
[23:22:41] Oh wait. Never mind. That would take just as long.
[23:22:48] * halfak grumbles and thinks more.
[23:22:57] :)
[23:34:23] halfak: I think we should keep block log and edit count for wikilabels
[23:35:07] this part is enough to give us good results
[23:46:06] Amir1, OK. So we solve the problem for reverted detection, but not Wikilabels with this?
[23:46:33] We could also just not look up block status/edit count based on flags.
[23:46:50] sorry I don't quite understand
[23:46:53] We already have an edit count flag -- we could add a "block" flag too.
[23:47:07] So we add flags to the script that represent how we want to filter.
[23:47:35] In the case where we provide flags for trusted-edits and was-blocked, that information will be retrieved.
[23:48:00] When we exclude those flags, only the user-groups lookup will be performed.
[23:48:27] we can not perform user-group lookup when we don't want this flag
[23:48:34] e.g. we only want reverted edits
[23:48:42] Indeed.
[23:48:46] I think we'll want it still.
[23:48:54] For reverted edits.
[23:49:25] yeah
[23:49:58] I agree we should keep user-groups lookup and reverted lookup (the latter doesn't require API calls at all)
[23:50:18] but my question is whether we should keep user-block and user-editcount in this code
[23:50:54] since we can simply ask humans to label edits via wikilabels
[23:51:08] Yeah. So my proposal is that we keep it in the code with the LRU and use our CLI flags to decide whether or not to perform the lookup.
[23:51:34] okay, I get it now
[23:51:38] sure thing
[23:55:10] Cool :)
[23:55:31] I just got through enough of my email wave where I feel like I can work on revscoring for a little bit.
[23:55:40] I've been at my desk for 10 hours!
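A minimal sketch of the proposal agreed on above: keep the block and edit-count lookups in the code behind an LRU cache, and let CLI flags decide whether they run at all. The flag spellings --trusted-edits and --was-blocked follow the names used in the chat, but their exact form, the cache size, and the helpers build_parser and make_lookups are assumptions for illustration, not the actual script.

```python
import argparse
from functools import lru_cache


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: each flag enables one optional per-user lookup."""
    parser = argparse.ArgumentParser(description="Sketch of extractor flags")
    parser.add_argument(
        "--trusted-edits", type=int, default=None,
        help="Edit-count threshold; enables the edit-count lookup when set",
    )
    parser.add_argument(
        "--was-blocked", action="store_true",
        help="Enables the block-status lookup",
    )
    return parser


def make_lookups(args, get_editcount, get_was_blocked, cache_size=65536):
    """Return only the lookups the flags ask for, each wrapped in an LRU cache.

    `get_editcount` and `get_was_blocked` are hypothetical callables that
    take a user name and hit the API; they are never called if their flag
    is absent.
    """
    lookups = {}
    if args.trusted_edits is not None:
        lookups["editcount"] = lru_cache(maxsize=cache_size)(get_editcount)
    if args.was_blocked:
        lookups["was_blocked"] = lru_cache(maxsize=cache_size)(get_was_blocked)
    return lookups
```

Run without either flag, the script would perform only the user-groups and reverted lookups; with the flags, repeated users are served from the cache instead of triggering another API request.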