[04:11:18] okay, wikilabels in staging seems happy
[04:18:08] deploying to the main instance
[05:14:12] 06Revision-Scoring-As-A-Service, 10wikilabels: Deploy updates for Wikilabels - https://phabricator.wikimedia.org/T134032#2253106 (10Ladsgroup)
[05:14:21] 06Revision-Scoring-As-A-Service, 10wikilabels: Review staging protocol for Wikilabels - https://phabricator.wikimedia.org/T133557#2253120 (10Ladsgroup)
[05:14:29] 06Revision-Scoring-As-A-Service, 10wikilabels: WikiLabels doesn't handle well revdeleted edits - https://phabricator.wikimedia.org/T130234#2253121 (10Ladsgroup)
[07:24:41] 06Revision-Scoring-As-A-Service, 10rsaas-editquality, 10wikilabels: Complete wikidatawiki edit quality campaign - https://phabricator.wikimedia.org/T130274#2253180 (10Ladsgroup) a:03Ladsgroup
[07:25:40] 06Revision-Scoring-As-A-Service, 10rsaas-editquality, 10wikilabels: Complete wikidatawiki edit quality campaign - https://phabricator.wikimedia.org/T130274#2131460 (10Ladsgroup) Working on it. I just labeled 450 edits, I'm going to do 263 more and then we are done!
[12:17:05] 06Revision-Scoring-As-A-Service, 10rsaas-editquality, 10wikilabels: Complete wikidatawiki edit quality campaign - https://phabricator.wikimedia.org/T130274#2253432 (10Ladsgroup)
[12:18:22] 06Revision-Scoring-As-A-Service, 10Wikidata, 10rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2253433 (10Ladsgroup)
[12:18:38] 06Revision-Scoring-As-A-Service, 10Wikidata, 10rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2253446 (10Ladsgroup)
[13:35:50] 06Revision-Scoring-As-A-Service, 10Wikidata, 10rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2253593 (10Ladsgroup) Damaging: ``` (p3)ladsgroup@ores-compute-01:~/editquality$ make models/wikidatawiki.damaging.gradient_boosting.model cat datasets/wikidataw...
[14:11:41] 06Revision-Scoring-As-A-Service, 10Wikidata, 10rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2253642 (10Ladsgroup) Good faith: ``` (p3)ladsgroup@ores-compute-01:~/editquality$ make models/wikidatawiki.goodfaith.gradient_boosting.model cat datasets/wikida...
[14:15:33] 06Revision-Scoring-As-A-Service, 10Wikidata, 10rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2253644 (10Ladsgroup) https://github.com/wiki-ai/editquality/pull/30
[14:26:35] o/
[14:29:54] Hey Amir1
[14:30:05] Just sat down and go through my messages from yesterday.
[14:30:19] halfak: hey
[14:30:29] Digging into the paper review I need to do for the next couple of hours and then I'll be working on the ORES paper. Hopefully, I'll have an agenda for our first meeting.
[14:30:42] https://github.com/wiki-ai/editquality/pull/30
[14:30:54] awesome
[14:31:04] I want to answer the email very soon
[14:31:15] I do it once I'm done with the precaching
[14:31:24] ROC-AUC of 99 O.O
[14:31:28] Woah
[14:31:38] this is so cool
[14:31:40] :D
[14:31:49] WTF. Better than awesome. Amazing!
[14:31:58] Will be very interesting to see this in practice.
[14:32:06] BTW, any word on the KDD reviews yet?
[14:32:19] not yet
[14:32:38] I guess it will start from May 10th or so
[14:33:25] kk
[14:33:30] Not unreasonable.
[14:33:46] * halfak hopes we can cite that in future work :)
[14:35:44] Filter rates in the 90s! \o/ I bet the effective filter rate will match our 99% figure.
[14:35:52] Some damage just isn't all that damaging.
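(Editor's note: the "filter rate" discussed above can be read as the share of edits a patroller could skip at a chosen threshold while still catching most damage. The sketch below is illustrative only — the labels and scores are made up and this is not the editquality tuning code; the real figures come from the model's own test statistics.)

```python
# Illustrative only: ROC-AUC and a "filter rate" for a damaging model.
# Hypothetical labels/scores; not the actual wikidatawiki test set.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])   # 1 = damaging (made up)
scores = np.array([0.02, 0.10, 0.05, 0.20, 0.01,
                   0.03, 0.15, 0.90, 0.75, 0.08])    # model P(damaging) (made up)

print("ROC-AUC:", roc_auc_score(y_true, scores))

# Pick the lowest threshold that still catches the desired share of damaging
# edits, then see what fraction of all edits fall below it (the filter rate).
recall_target = 0.75
damaging = np.sort(scores[y_true == 1])
threshold = damaging[int(np.floor((1 - recall_target) * len(damaging)))]
filter_rate = np.mean(scores < threshold)
print("threshold:", threshold, "filter rate:", filter_rate)
```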
[14:36:09] We should be able to flip the class of interest for goodfaith.
[14:36:19] So that we can also report our filter rate of vandalism.
[14:36:42] We can revert 70% of damage and expect < 0.1 false-positive rate!
[14:36:48] Holy crap is this model effective.
[14:36:56] Oh wait... these stats are weird.
[14:37:03] Because the sample isn't balanced.
[14:37:04] Hmm.
[14:37:25] it's 2.4 K against 20K
[14:37:25] Or rather because it *is* balanced. We need a natural test set in order to know for sure.
[14:37:40] Regardless, these all suggest really strong signal.
[14:37:48] I checked the numbers very carefully
[14:38:02] the prelabel we loaded to wikilabels was like this
[14:38:18] 20K prelabeled and 4K needs review
[14:38:50] But we prelabeled the balanced reverted/not-reverted set
[14:39:01] So it's biased towards damage at that point.
[14:39:08] Good for signal. Bad for interpretation
[14:45:16] (p3)ladsgroup@ores-compute-01:~/editquality/datasets$ grep "True" wikidatawiki.rev_damaging.5k_2016.tsv | wc -l
[14:45:17] 2697
[14:45:46] halfak: this is the number of damaging cases in the 4283 that were loaded into wikilabels
[14:46:05] 24471 wikidatawiki.prelabeled_revisions.20k_balanced_2015.tsv
[14:46:12] We'd need ~10 million edits to have 2700 instances of damage show up!
[14:46:38] I think we did it with 500K of human edits
[14:46:39] Anyway, this is all good. Just hard to know exactly what the tradeoffs will be in practice with regards to false positive *rates*
[14:46:40] AFAIK
[14:46:42] IIRC
[14:46:51] We did that for the paper.
[14:47:02] yeah you're right
[14:47:05] But for the modeling work, I think we worked with your balanced set from an XML processing job.
[14:47:22] yeah
[14:47:25] I did that
[14:47:33] It was I think about 10M
[15:23:44] * halfak works a little bit on text vectorization for sabya.
[15:23:48] Check this out Amir1 https://gist.github.com/halfak/f1c334690e846309fd4d8c272aca12a8
[15:24:09] Not related to what we're working on, but it was nice to have an idea turn into a piece of code in minutes.
[15:24:14] It works, it's fast and it's simple
[15:24:34] This demonstrates a simple and flexible way to create #grams.
[15:24:52] The last line generates unigrams, bigrams, trigrams and single-skipgrams for a sequence of numbers.
[15:25:02] Will work great for sequences of tokens in a revision :)
[15:30:04] \o/
[15:30:48] I'm trying to understand what's going on
[15:32:34] halfak: https://grafana.wikimedia.org/dashboard/db/ores
[15:32:43] so our precaching is up now
[15:33:11] I think the reason for the overload was that I was running precache from two instances
[15:33:23] one was the daemon and the other one was --verbose
[15:33:27] running directly
[15:41:16] Looks good now.
[15:41:35] So, one more thought: we should be able to run two precachers in parallel.
[15:41:41] We should be able to run 10!
[15:41:56] I think that the work you did to discover the worker queues is key here.
[15:42:35] Right now, the workers have a queue that gets populated by jobs. The jobs aren't "known" to the system until processing starts.
[15:43:25] I think that delay is messing with our ability to associate requests to score an *in-process* revision with the *in-process* celery job.
[15:43:40] I should do a writeup about this.
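(Editor's note: the gist linked above is not reproduced here, but a generator along the lines halfak describes — unigrams, bigrams, trigrams and single-skipgrams from one pass over a sequence — could look roughly like this. This is a sketch of the idea, not the gist's actual code; the function name and offset-pattern convention are assumptions.)

```python
# Sketch of a flexible n-gram / skip-gram generator over any sequence.
def grams(sequence, patterns):
    """Yield tuples of items selected by relative-offset patterns.

    `patterns` is a list of offset tuples, e.g. (0,) for unigrams,
    (0, 1) for bigrams, (0, 1, 2) for trigrams, (0, 2) for single-skipgrams.
    """
    items = list(sequence)
    for i in range(len(items)):
        for pattern in patterns:
            if i + max(pattern) < len(items):
                yield tuple(items[i + offset] for offset in pattern)

# Unigrams, bigrams, trigrams and single-skipgrams for a sequence of numbers:
print(list(grams(range(5), [(0,), (0, 1), (0, 1, 2), (0, 2)])))
```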
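(Editor's note: one pattern for making queued or in-process scoring work "known" outside the worker is to give each job a deterministic Celery task id, so a second request for the same (wiki, model, rev_id) can attach to the existing result instead of re-scoring. The sketch below is an assumption about how this could be approached — the task name, broker URL, and id scheme are hypothetical, and this is not ORES's actual implementation.)

```python
# Hypothetical sketch: deduplicate scoring requests via deterministic task ids.
from celery import Celery

app = Celery("scoring",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")
# Assumes `task_track_started = True` so in-process jobs report STARTED.

@app.task
def score_revision(wiki, model, rev_id):
    ...  # feature extraction + model scoring would happen here

def request_score(wiki, model, rev_id):
    task_id = f"{wiki}:{model}:{rev_id}"
    result = app.AsyncResult(task_id)
    if result.state in ("STARTED", "SUCCESS"):
        # Attach to the in-process (or finished) job instead of re-queueing.
        return result
    # A job that is queued but not yet picked up still reports PENDING --
    # the "jobs aren't known until processing starts" problem described above.
    return score_revision.apply_async(args=(wiki, model, rev_id), task_id=task_id)
```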
[15:45:47] 10Revision-Scoring-As-A-Service-Backlog: [Spike] Explore ORES hanging of *in-process* scorings - https://phabricator.wikimedia.org/T134064#2253759 (10Halfak)
[15:45:55] 10Revision-Scoring-As-A-Service-Backlog, 10ores: [Spike] Explore ORES hanging of *in-process* scorings - https://phabricator.wikimedia.org/T134064#2253772 (10Halfak)
[15:45:57] https://phabricator.wikimedia.org/T134064
[15:46:17] 10Revision-Scoring-As-A-Service-Backlog, 10ores: [Spike] Explore ORES handling of *in-process* scorings - https://phabricator.wikimedia.org/T134064#2253759 (10Halfak)
[15:46:52] halfak: we can have a simple workaround for that by using more selective precaching and saying this daemon precaches enwiki, this one precaches wikidata, and so on
[15:47:01] it should be easy to implement that
[15:47:22] (we give the wiki and models via argument)
[15:47:33] Not sure if that will help the problem.
[15:47:51] From ORES point of view, the request pattern will look identical.
[15:48:21] hmm
[15:48:23] yeah
[15:48:27] you're right
[15:49:35] Hmm... Looks like I have wasted my entire paper review time writing emails >:(
[15:51:01] :(((
[15:51:12] What can I do to help?
[15:52:43] Heh. Invent cloning and/or time travel.
[15:52:51] Actually, if I'm going to have time travel, I need cloning too.
[15:52:58] Otherwise, I'll just end up getting old really fast.
[15:53:13] Since I'll stop time/go back in time and work on things.
[15:53:40] halfak: Also is it okay to deploy damaging and goodfaith for wikidata?
[15:53:44] * halfak suddenly releases the most complete software system ever seen and promptly dies at the ripe age of 95.
[15:54:04] Yeah! Do you want to do the staging shuffle?
[15:54:11] I wish at least some parts of Harry Potter was real, specially the magic watch
[15:54:26] halfak: yeah
[15:54:39] I just want to get permission first
[15:54:49] What's your pypi username?
[15:55:04] (No pypi stuff should be necessary, but I'll get it set up anyway.)
[15:55:27] Amir_Sarabadani
[15:55:33] everything else was taken
[15:55:42] Couldn't get Ladsgroup!?
[15:55:44] https://pypi.python.org/pypi/pywikibase
[15:55:50] yeah, I tried
[15:55:58] WTF
[15:56:31] in instagram, I tried Ladsgroup, was taken, then I tried Amir Sarabadani and it was taken so I tried "amirsarabadanitafreshi" and it worked!
[15:56:44] Package Index Owner: Amir_Sarabadani
[15:56:49] Yeah. I had a similar experience.
[15:57:05] Tried halfak, EpochFail, and a bunch of other handles I have used in the past.
[15:57:21] halfak: I add you to this repo too
[15:57:25] I hope that's okay
[15:57:53] I added you as a maintainer. Let's see how that works. It seems like that is what the system expects of us.
[15:58:15] cool
[16:02:51] OK. I think I have you added to all the things. revscoring, wikiclass, editquality and ores.
[16:04:54] I really wish that we could get the other WMF staffers working on AI to join this channel
[16:05:04] Lzia is just picking up work on image classification for commons.
[16:05:11] *sigh*
[16:05:59] thanks
[16:06:01] :)
[16:06:09] I hope we can get more people soon
[16:06:20] once we have some publicity
[16:06:29] specially the extension deployed in some big wikis
[16:11:03] https://pypi.python.org/pypi/pywikibase
[16:11:10] halfak: you are an owner now
[16:11:43] Amir1, do you want to push a new version of revscoring and deploy that? It would be nice to have the dict lookup speedup deployed :)
[16:11:52] It would be a good test of pypi and the deploy process.
[16:16:49] sure halfak
[16:17:03] I'm online off and on because of dinner
[16:38:07] halfak: I just pushed new version of revscoring to 1- github 2- pypi
[16:39:17] now it's time to deply
[16:39:22] *deploy
[17:07:32] All looks good on pypi and versioning.
[17:18:18] is the python-mwapi the upgraded version of mwapi ?
[17:18:44] python-mwapi == mwapi
[17:18:51] It's just called python-mwapi in the repo
[17:18:57] pypi knows it as simply "mwapi"
[17:19:58] alright
[17:30:36] working with gerrit is getting harder everyday https://gerrit.wikimedia.org/r/#/c/286283/
[17:30:37] :D
[17:57:56] 06Revision-Scoring-As-A-Service, 10Wikidata, 10rsaas-editquality: Train / Test wikidata damaging model - https://phabricator.wikimedia.org/T134047#2253886 (10Ladsgroup)
[18:48:32] 06Revision-Scoring-As-A-Service, 10rsaas-editquality, 10wikilabels: Complete wikidatawiki edit quality campaign - https://phabricator.wikimedia.org/T130274#2253942 (10Ladsgroup) I encountered {T130872} again today but it was bearable.
[19:07:32] going to sleep
[19:07:33] o/
[22:42:07] 06Revision-Scoring-As-A-Service, 10wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2254132 (10Halfak) 05Resolved>03Open
[22:43:19] 06Revision-Scoring-As-A-Service, 10wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2149118 (10Halfak) In T130274, @Ladsgroup said: > I encountered T130872 [performance issues] again today but it was bearable. So I'm re-opening this. I thin...
[22:43:35] 06Revision-Scoring-As-A-Service, 10wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2254136 (10Halfak)
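(Editor's note: as the 17:18 exchange above says, the package published from the python-mwapi repo is simply "mwapi" on PyPI. A minimal usage sketch follows; the target wiki, User-Agent string, and query are arbitrary examples, not anything specific to this deployment.)

```python
# Minimal example of the `mwapi` package (the PyPI name of python-mwapi).
import mwapi

session = mwapi.Session("https://en.wikipedia.org",
                        user_agent="example-script (contact: someone@example.com)")

# Ask the MediaWiki API for basic site information.
response = session.get(action="query", meta="siteinfo")
print(response["query"]["general"]["sitename"])
```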