[13:45:27] o/
[13:45:53] Was out riding bikes during my last ping here yesterday
[13:59:11] halfak: o/
[14:00:19] I wanted to say that I have finished engineering all of the features except the feature for completeness because my patch hasn't been reviewed yet. That said, I tried to move on to train the model by following this old guide: https://github.com/wiki-ai/editquality/blob/master/ipython/reverted_detection_demo.ipynb
[14:00:51] Do you think it is a good step while waiting for the patch review?
[14:01:01] link me to the patch
[14:01:07] https://gerrit.wikimedia.org/r/#/c/356043/7
[14:01:13] Oh I thought for feature
[14:01:14] s
[14:01:35] nah. It is for modifying Wikidata API param
[14:02:04] Since today is a public holiday in Germany, I will ping the Wikidata folks tomorrow
[14:02:06] Do you have a PR for features?
[14:02:10] not yet
[14:02:19] I thought I want to test with the real model first :P
[14:02:26] super curious how this turns out
[14:03:35] or do you want me to submit a PR first?
[14:05:12] Frankly, I only modified revision_oriented.py in both /revscoring/features/wikibase/features and /revscoring/features/wikibase/datasources
[14:05:39] submit PR. I can help make sure you're doing it right ;)
[14:06:45] kk
[14:26:09] halfak: I am working on the PR now
[14:52:10] halfak: PR created. https://github.com/wiki-ai/revscoring/pull/319
[14:59:47] OK, generally, I think that the Right(TM) place to do this is in wikiclass.
[15:01:21] halfak: hmm are you still reviewing it?
[15:14:11] halfak: and for wikiclass, are you referring to this repo? https://github.com/wiki-ai/wikiclass
[15:35:02] glorian_wd, yeah. Leave your PR up and I'll move some stuff.
[15:35:46] halfak: ok. Let me know once you are done. So I can start fixing the feedback you gave me
[15:36:04] fixing the code based on the feedback you gave me*
[16:12:36] halfak: are you around?
[16:12:47] In meeting. But sort of.
[16:13:03] halfak: let me know when you're done with the meeting i can wait.
[16:13:23] 2 hours from now
[16:13:33] fair enough :)
[16:22:03] anyone know when wikilabels master will be updated in our wikilabels instance?
[17:11:22] Zppix, can you move that task to the "done" column on the main board.
[17:11:47] And if you want a deployment, make a task for it and add the things that will be deployed as subtasks. :)
[17:11:54] I'll try to get to it this week :D
[18:38:34] Ok halfak|Lunch will do asap
[18:58:04] my phab is being stupid so it wont allow to move it to done but its on the main workboard
[20:09:02] halfak: T167061 created and assigned to you
[20:09:03] T167061: Deploy new workset not found for user message on WMF wikilabels instance - https://phabricator.wikimedia.org/T167061
[20:58:38] Thanks Zppix
[20:58:51] halfak: need anything else?
[20:59:01] Hmm... I can find something :)
[20:59:04] * halfak looks
[20:59:23] Check this out: https://phabricator.wikimedia.org/T165872
[20:59:32] k 1 sec
[20:59:39] I think it'll be a good one to show you another part of the system.
[20:59:44] * halfak makes a little demo
[21:00:30] I remember while i was looking at the meta page for labels seeing stuff about bvcs
[21:00:43] * halfak added an event in his calendar titled "OMG actual work" so I can do things in this channel for a little while :)
[21:00:54] hmm... bvcs?
[21:01:01] Not sure what that would be referring to
[21:01:15] bvds sorry
[21:02:07] Oh BWDS!
[21:02:21] BWDS needs some refactor love.
[21:02:27] I have a half-refactor done.
[21:02:29] Needs more.
[21:02:38] bwds should learn to accept wikilove xD
[21:02:45] But I'm not sure that's a great thing for you to pick up.
[21:02:46] lol
[21:02:56] i agree
[21:03:15] The one with ORES precache is nice and clean. AND you get to play with EventStreams :)
[21:04:48] halfak: I already made a pr for moving to event streams
[21:05:18] Amir1, right.
I was mistaken
[21:05:30] I actually linked to "ha" removal
[21:05:31] derp
[21:05:32] is there a recent dump of wp10 scores, or is https://datasets.wikimedia.org/public-datasets/enwiki/article_quality/wp10-scores-enwiki-20160820.tsv.bz2 the most recent? (ok if it is, but curious as i'm about to start evaluating it as a feature of the search ranking model)
[21:05:38] Was just reviewing Amir1's PR
[21:05:41] halfak: revscoring shows nothing about ha
[21:05:48] https://phabricator.wikimedia.org/T165872
[21:06:09] ebernhardson, nothing more recent now, but we can make one.
[21:06:14] file a task for it?
[21:06:16] :D
[21:06:30] ebernhardson: yeah, the code is super basic and easy to run
[21:06:34] \o/
[21:06:38] Yay for past us!
[21:06:57] sounds like something that could be automated to make weekly or monthly dumps ;)
[21:07:01] i'll file a task, thanks!
[21:07:08] halfak: I also have https://github.com/wiki-ai/ores/pull/201 and https://github.com/wiki-ai/wikilabels/pull/181
[21:07:17] ebernhardson, yeah. been talking to analytics about that but they aren't ready for us quite yet.
[21:07:29] joal was working on loading our text prediction models into spark last I heard.
[21:08:53] T165872 unless informals are stored onwiki i see nothing about ha in the informals lists for hungarian
[21:08:54] T165872: Don't use "ha" as an informal in hungarian - https://phabricator.wikimedia.org/T165872
[21:10:36] Zppix, we use English informals in hungarian models.
[21:10:39] * halfak gets link
[21:11:58] https://github.com/wiki-ai/editquality/blob/master/editquality/feature_lists/huwiki.py#L42
[21:12:26] im not even in the right repo
[21:12:28] lol
[21:12:40] i thought it was related to revscoring
[21:12:42] It's confusing, but there's a logic to it :)
[21:12:44] Oh it is
[21:12:58] revscoring is a framework, editquality implements the framework :D
[21:13:09] oh... confusing
[21:13:26] heres the solution let hu have own informals?
[21:13:29] instead of importing?
[21:13:56] So here, we'll want to use english informals for feature extraction, *but* we don't want "ha" to get matched.
[21:14:07] I just implemented a nice way to do that.
[21:14:42] so you already did that?
[21:14:56] I made it so that we *can* do it :D
[21:15:05] Put it in the framework.
[21:15:23] $ pip install revscoring==1.3.12
[21:15:27] Zppix, ^
[21:16:04] ok
[21:17:02] Hmm... something isn't working.
[21:17:05] * halfak looks into it.
[21:17:10] I'm working on an example for you :)
[21:17:14] i just noticed
[21:19:13] halfak: I will try to go to sleep, I doubt I can, I will come back and work on deploying ores review tool in frwiki but if there is anything I should check on sooner (e.g. they are important) just tell me
[21:19:30] That sounds perfect, Amir1
[21:19:38] Are you done with the precached PR?
[21:19:44] yeah
[21:19:45] If so, I'll submit a followup commit.
[21:19:48] cool :)
[21:19:55] Review will be up to you then :D
[21:19:58] Also, CoC PR and UX PR
[21:20:38] (The CoC patch is already merged into mediawiki core, this one is just a copy paste of it)
[21:20:41] halfak: i noted a grammar issue on Amir1 's wikilabels pr, i was awaiting your review til merging
[21:20:50] UX one is rather big
[21:22:07] Sounds good. I'll aim to get to those in the next couple of hours.
[21:22:14] Sorry that I was in meetings all day today :(
[21:22:27] halfak: a lot of staff were in meetings today so no worries
[21:23:37] Lucky number 13
[21:23:50] $ pip install revscoring==1.3.13
[21:23:53] Zppix, ^
[21:26:37] Zppix, https://gist.github.com/halfak/164f382fbaa2a7d75733a458610f22c3
[21:28:50] So for English informals, I think we'll want to define english_informals_no_ha like this: https://github.com/wiki-ai/editquality/blob/master/editquality/feature_lists/enwiki.py#L32
[21:28:58] In huwiki.py
[21:29:29] And then on this line: https://github.com/wiki-ai/editquality/blob/master/editquality/feature_lists/huwiki.py#L42
[21:29:56] We'll have "enwiki.badwords + english_informals_no_ha"
[21:32:48] halfak: I hope you can also review my PR in the "OMG actual work" timeslot ;)
[21:34:49] glorian_wd, will likely get to it but it's going to be more work.
[21:35:32] halfak: more work? do you mean it's going to take a lot of your time?
[21:35:37] yeah.
[21:35:38] s'ok
[21:35:40] how about asking Amir1 to help you?
[21:35:55] * glorian_wd peeking Amir1
[21:36:33] ok halfak
[21:37:43] Amir1 is headed to bed.
[21:37:55] Zppix, let me know if any of that doesn't make sense :)
[21:38:17] how do i like define no ha and make sure it doesnt include ha
[21:38:36] halfak: I will poke Amir1 tomorrow :D. But yeah, in case he can't manage to help reviewing the PR, I hope you can get to it soon :)
[21:39:29] See the gist link I pasted :)
[21:39:48] >>> informals_no_ha = english.informals.excluding(["ha"])
[21:41:29] do i put that in enwiki.py in editquality repo?
[21:42:16] Na. put it in the huwiki.py file
[21:44:21] so i just put informals_no_ha = english.informals.excluding(["ha"]) on what line and then on line 42 i just replace informals with informals_no_ha?
[21:51:58] halfak:
[21:52:19] So, there's a subset of features for informal words. You can see how they are defined in a list in the enwiki.py file.
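The exclusion idea discussed above can be sketched in plain Python. This is only an illustration under stated assumptions: `WordMatcher`, its `matches` method, and the word list are hypothetical stand-ins written for this example, not revscoring's actual classes or data. Only the `excluding([...])` call shape is taken from the chat.

```python
import re


class WordMatcher:
    """Toy stand-in for a word-list matcher (hypothetical, not revscoring's class)."""

    def __init__(self, patterns, exclusions=()):
        self.patterns = list(patterns)
        self.exclusions = {word.lower() for word in exclusions}
        # One alternation anchored on word boundaries, case-insensitive.
        self.regex = re.compile(
            r"\b(?:" + "|".join(self.patterns) + r")\b", re.IGNORECASE
        )

    def excluding(self, exclusions):
        """Return a new matcher with extra exclusions; the original is untouched."""
        extra = {word.lower() for word in exclusions}
        return WordMatcher(self.patterns, self.exclusions | extra)

    def matches(self, text):
        """List matched words, skipping any whose exact text is excluded."""
        return [
            m.group(0)
            for m in self.regex.finditer(text)
            if m.group(0).lower() not in self.exclusions
        ]


# Made-up word list for illustration only.
informals = WordMatcher([r"ha(?:ha)*", r"lol", r"hello+"])
informals_no_ha = informals.excluding(["ha"])

text = "ha lol haha"
print(informals.matches(text))        # all three words match
print(informals_no_ha.matches(text))  # the bare "ha" is skipped, "haha" is kept
```

Because `excluding` returns a new object, the original matcher keeps working for English, and the derived one can be used wherever the original would have been.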
[21:52:34] This is what is used on line 42 "enwiki.informals"
[22:06:40] halfak: https://github.com/wiki-ai/editquality/pull/70
[22:07:48] Zppix, you shouldn't need to make any modifications at all to enwiki.py
[22:08:24] I was linking to that list of informal features to show you how to make a new list with "informals_no_ha" rather than "english.informals"
[22:08:45] ugh
[22:08:46] E.g. "informals_no_ha.revision.diff.match_prop_delta_decrease"
[22:08:51] etc.
[22:09:03] ok
[22:10:22] so do i do hungarian.informals_no_ha.revision.blah
[22:11:05] Not quite. So that "english.informals" thing is a RegexMatcher thing.
[22:11:39] When you run english.informals.excluding(["ha"]), you get back a new RegexMatcher thing that will match everything but the exclusions.
[22:11:52] So then you can use that in the same way that you would have used "english.informals"
[22:12:16] so just remove the current informal setup for huwiki and just use the no ha one?
[22:13:11] Zppix, in this case, hungarian has its own informals, so we'll still want those.
[22:13:25] but replace the english one with the no ha?
[22:13:27] We just also want to include English informals because people vandalize in English everywhere
[22:13:30] right!
[22:13:40] i understand now ok
[22:13:42] :D
[22:15:46] okay check again halfak
[22:17:20] (i saw the typo fixing
[22:17:33] Closer, but still not right.
[22:17:55] those informals are a list of features, but the RegexMatcher thing isn't a feature.
[22:18:26] ok?
[22:18:39] Note how on line 42, we don't have "hungarian.informals"
[22:19:09] Instead, we have a variable "informals" that points to a list of features inside of "hungarian.informals". E.g. "hungarian.informals.revision.diff.match_prop_delta_increase"
[22:19:11] yes
[22:19:36] Same deal with "informals_no_ha". E.g.
"informals_no_ha.revision.diff.match_prop_delta_increase"
[22:20:01] so what do i do with informals_no_ha = english.informals.excluding(["ha"]) im confused
[22:21:09] Run it on the line before you define "english_informals_no_ha = [informals_no_ha.revision.diff.match_delta_sum, ...]"
[22:21:35] Right now, you're doing an assignment inside of a list construction and that won't even compile ;P
[22:22:52] so informals_no_ha = excluding [ revision.diff.match. foo
[22:25:02] halfak: so where informals is i paste the excluding statement?
[22:25:50] so instead of informals [ delta sum blah blah it would be informals_no_ha= excluding statement {
[22:27:24] https://gist.github.com/
[22:27:27] Woops
[22:27:33] https://gist.github.com/halfak/96a503cd8374591ad1d817b9fc62a188
[22:27:36] Zppix, ^
[22:27:40] Something like that
[22:28:23] what is informals_safe_for_hu coming from?
[22:28:39] nevermind
[22:29:22] review again i just did stuff
[22:30:26] Amir1: pr is wip
[22:30:47] Zppix, getting closer.
[22:31:06] If you put the features into "informals" (which is fine), then you don't need to include it on line 46
[22:31:14] Zppix: please add [WIP] in the title. That way I know it's WIP (see halfak's PR in revscoring)
[22:36:33] halfak: Have you seen https://phabricator.wikimedia.org/T113114?
[22:36:45] we might do the same for ores 404 errors
[22:36:50] please :D
[22:37:16] Our 404s should be JSON
[22:37:24] But any UI 404s sounds good to me
[22:37:40] Oh wait. Yeah I see. Any real 404 should be this.
[22:37:56] A 404 where the user was looking for a score for a wiki we don't have is a JSON 404
[22:37:58] Sound right?
[22:38:40] halfak: yeah
[22:43:42] halfak: done
[22:50:17] halfak: i just realised i used the wrong task number ...
[22:55:18] Amir1, https://github.com/wiki-ai/ores/pull/200
[22:56:00] Zppix, looking at your pr
[22:56:19] k
[22:56:52] Zppix, looks like you only copied down 3 of the features. What about the other 3?
[22:57:04] match_prop_delta_increase, but no match_delta_increase
[22:57:20] i didnt notice the difference :p will fix
[22:57:20] One is an absolute measure and the other is proportional.
[22:57:24] Otherwise looks good :)
[22:57:37] Oh! one more thing.
[22:57:47] You should delete "informals" from line 46
[22:57:53] Since it's already on line 45
[22:59:17] halfak: did you test your changes?
[23:00:02] nope :)
[23:00:13] done half
[23:00:14] I made sure there wasn't a syntax error ;)
[23:00:18] halfak: *
[23:00:29] :D "half" used to be my nickname in highschool
[23:02:29] Amir1: pr is not wip
[23:02:38] \o/
[23:03:02] Let's not merge it until we have a new model built for hungarian. Amir1 have time to do that now?
[23:03:08] * halfak works on the ui PR
[23:03:19] Otherwise I can do it when I'm done.
[23:03:43] halfak: phab task?
[23:04:02] Amir1, I guess we only have the "remove 'ha'" task
[23:04:08] Which I think implies building a new model :)
[23:04:14] without "ha"
[23:04:18] halfak: I saw on wikitech-l that frwiki now has damaging/goodfaith models, when will those be deployed?
[23:04:28] Already deployed
[23:04:30] Nice
[23:04:44] https://ores.wikimedia.org/v3/scores/frwiki
[23:04:45] :D
[23:04:46] Did someone already write a config patch to enable the ORES extension for them and configure threshold? If not, I'll do it
[23:04:57] RoanKattouw: I'm on it for today's SWAT
[23:04:58] I think it was on Amir1's todo list
[23:05:01] \o/
[23:05:16] shit, I forgot to add it for today's. I need to do it right now
[23:05:32] Good thing RoanKattouw swung by!
[23:06:05] * Zppix trouts Amir1
[23:06:53] :)))
[23:07:12] I'll look at the model_stats output to see if I want to recommend any non-standard thresholds
[23:07:25] Great, thanks
[23:12:20] So close to having a whole new thresholds strategy for you RoanKattouw
[23:12:26] Yay
[23:15:16] halfak: good to delete ux branch or do you want it to stay?
[23:15:42] Amir1: I think the defaults will be OK for now.
I'll ask Joe if he wants to tweak them, because they're not great, but they're also not terrible
[23:17:12] Zppix, feel free to del
[23:17:14] * Amir1 RoanKattouw: Awesome, made this: https://gerrit.wikimedia.org/r/357328
[23:17:24] Can you check it?
[23:18:09] yeah. Definitely not our best model.
[23:18:18] Yeah it's not great
[23:19:23] The last few ones that I reviewed thresholds for were all the rock stars, as well
[23:19:34] Well, fiwiki is somewhat middling but the defaults worked out just fine there so I just didn't touch it
[23:20:04] Amir1: Actually I recommend not putting in a verylikelybad for goodfaith for frwiki
[23:20:14] Not much of a point
[23:21:00] likelybad is 71% precision at 9% recall, not much of a point providing a verylikelybad with precision in the 90s and recall at 3%
[23:24:09] Starting work on rebuilding the huwiki models.
[23:24:26] Do we have those in prod already?
[23:24:35] (RCFilters doesn't think so at least)
[23:24:46] huwiki? I don't think we have damaging/goodfaith yet
[23:25:09] Nope
[23:25:09] https://ores.wikimedia.org/v3/scores/huwiki
[23:36:12] OK huwiki models are building.
[23:36:25] I had to clean up a few more little issues in huwiki.py, but nothing major.
[23:41:59] I'm going to let this go and submit it tomorrow
[23:42:05] So I'm out of here.
[23:42:09] Have a good one folks!
[23:42:10] o/