[02:48:42] halfak yes a quarry like thing for revscoring is what I had in mind, not for tomorrow or even this year - just something to keep in mind :)
[14:49:47] hello everyone
[14:50:29] I have a question related to ORES API
[14:50:55] ORES API provided some services: good-faith, damaging, reverted
[14:51:10] where should I look to the source code of these services
[14:51:19] and the features used to predict
[14:59:26] Hey Vinh. Everything is based on this library: https://github.com/wiki-ai/revscoring
[14:59:42] It provides a framework.
[14:59:58] This library implements the 'reverted', 'damaging' and 'goodfaith' models: https://github.com/wiki-ai/editquality
[15:00:07] (using revscoring)
[15:00:42] If you look at the modules in the 'feature_lists' directory, you'll find python files that describe the features used for each model. See https://github.com/wiki-ai/editquality/blob/master/editquality/feature_lists/itwiki.py for example.
[15:02:29] Hi Aaron
[15:02:30] great
[15:02:37] and where the dataset you used to test
[15:02:38] in
[15:03:00] https://meta.wikimedia.org/wiki/ORES/reverted#English_Wikipedia_.28enwiki.29
[15:03:12] it seems to me that you used the test set of ~ 4000 revisions
[15:03:16] If you look at the Makefile inside 'editquality', you can see the process from running the query to gather the random sample, to building the model.
[15:03:44] E.g. https://github.com/wiki-ai/editquality/blob/master/Makefile#L515
[15:05:56] hi Aaron
[15:05:59] it's great again
[15:05:59] :)
[15:06:08] but again the question is where can I find the TSV files you used
[15:06:10] for instance
[15:06:11] enwiki.rev_damaging.20k_2015.tsv
[15:06:20] or
[15:06:23] enwiki.rev_reverted.20k_2015.tsv
[15:06:32] or
[15:06:33] eswiki.sampled_revisions.20k_2015.tsv
[15:07:05] eswiki.sampled_revisions.20k_2015.tsv is downloaded from quarry.
[15:07:11] See the apt. line in the Makefile.
[15:07:12] is there a way to public (if possible) the folder /datasets
[15:07:27] Yes.
It's on the to-do list and has been for a while :/
[15:07:46] got it here
[15:07:55] but I want to make sure that we are using same datasets you know
[15:07:58] otherwise
[15:08:08] I cannot replicate exactly your code
[15:08:41] If you start with the same sample and run Makefile, we should end up with the same files.
[15:08:52] BUT, I hear you and they should all be downloadable.
[15:09:03] In fact, I should be able to check some of them into github, I suppose.
[15:11:36] yeah
[15:11:39] with TSV file
[15:11:47] I suppose that the size should be less < 100MB
[15:11:55] * halfak checks.
[15:15:50] Yeah. It looks like it would be easy enough to upload these.
[15:16:28] great
[15:16:30] :P
[15:17:23] Vinh, regretfully, I'm in the middle of re-engineering our feature sets for 'editquality', so I can't just do that right now.
[15:17:33] But I could pass you a feature set to play with.
[15:17:37] don't worry
[15:17:44] but actually I have a question Aaron
[15:17:49] sorry if it is a personal one
[15:17:57] I have an idea to improve the quality of prediction
[15:18:00] \o/
[15:18:02] of damaging
[15:18:06] and reverted
[15:18:07] What is it?
[15:18:10] however
[15:18:22] I observe that currently the accuracy is 84% for English
[15:18:31] if you can improve it to 99%
[15:18:32] I mean
[15:18:37] there is nothing for me :(
[15:18:41] so I will stop my job
[15:18:42] Oh. Yeah. By predicting false for me.
[15:18:51] :D
[15:18:54] and put my effort to another task
[15:18:55] :P
[15:19:20] so what is the current accuracy (after re-engineering) of prediction of reverted?
[15:19:27] We're looking into a few different strategies for pushing fitness of the editquality models, but I'm very interested in your ideas.
[15:19:42] Vinh, I usually don't look at accuracy because it is a metric that is easy to cheat.
[15:19:48] haha
[15:19:53] Instead, I look at roc-auc and (soon) pr-auc.
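Note: the remark that accuracy is easy to cheat can be illustrated with a small sketch. Nothing here is taken from revscoring or editquality; the 1% damaging rate, the two toy scorers, and the helper names `accuracy` and `roc_auc` are assumptions made up for illustration. Both scorers label every edit "not damaging" at a 0.5 threshold, so accuracy cannot tell them apart, while ROC-AUC can.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative, counting ties as half a win (pairwise form)."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical sample: 1000 edits, of which 1% are damaging (True).
labels = [True] * 10 + [False] * 990

# Scorer A: the "cheat". Same score for every edit, so it always says False.
constant_scores = [0.0] * len(labels)

# Scorer B: also below the 0.5 threshold everywhere, but it does rank
# damaging edits above good ones.
ranking_scores = [0.4 if y else 0.1 for y in labels]

THRESHOLD = 0.5  # assumed decision threshold
constant_accuracy = accuracy(labels, [s >= THRESHOLD for s in constant_scores])
ranking_accuracy = accuracy(labels, [s >= THRESHOLD for s in ranking_scores])
constant_auc = roc_auc(labels, constant_scores)
ranking_auc = roc_auc(labels, ranking_scores)

print("accuracy:", constant_accuracy, ranking_accuracy)
print("roc-auc: ", constant_auc, ranking_auc)
```

With these made-up numbers both scorers reach 0.99 accuracy, while ROC-AUC is 0.5 for the constant scorer and 1.0 for the one that ranks damaging edits higher.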
[15:20:25] it's okay
[15:20:31] but could we talk personally
[15:20:38] something I don't want to discuss in public
[15:20:39] Sure. PM OK?
[15:20:41] sorry about that
[15:20:43] yeah
[15:20:46] how to PM in webchat
[15:20:49] ah I see
[15:57:36] wiki-ai/wb-vandalism#112 (wikidata_tuning - 9d98b17 : halfak): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/95374576
[16:40:46] o/ halfak
[16:41:11] o/ Amir1
[16:41:36] I made the announcement
[16:41:37] I just finished a model tuning on wikidata. It looks like we can get upwards of .95 AUC with a gradient boost or random forest.
[16:41:41] Woot!
[16:42:20] I thought it's better not to take your time, If I need your time to do something as simple as this. Why am I here?
[16:42:45] (the cast is off, but I shouldn't put too much pressure on it)
[16:42:51] halfak: amazing
[16:43:18] https://labels.wmflabs.org/campaigns/wikidatawiki/?campaigns=stats
[16:43:20] ^
[16:51:17] Amir1, woot that the cast is off.
[16:51:24] Wow! 381 labels already!
[16:51:33] I did about 20
[16:51:38] Amir1, did you already post the announcement?
[16:51:41] all of them are done by others
[16:51:44] yes
[16:53:28] link?
[16:56:04] https://www.wikidata.org/wiki/Wikidata:Project_chat#Help_needed_to_improve_anti-vandalism_tools
[16:56:25] Awesome!
[16:56:47] also in the mailing list
[16:59:09] (CR) Siebrand: [C: 1] "i18n/L10n reviewed." [extensions/ORES] - https://gerrit.wikimedia.org/r/247185 (https://phabricator.wikimedia.org/T112856) (owner: Awight)
[17:03:29] (CR) Siebrand: [C: -1] "i18n/L10n reviewed." (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/256641 (https://phabricator.wikimedia.org/T112856) (owner: Awight)
[17:06:36] Amir1, if you have the time, one other thing I could use a hand with is testing awight's work.
[17:07:00] I'm way behind in getting to that and these issues with anons/newcomers has caused further disruption.
[17:08:12] hmm, I'm not really good at mediawiki
[17:08:17] but why not
[17:08:27] Might be OK. Have you set up a vagrant for MediaWiki before?
[17:08:35] let me try, it's a good chance to learn more about mediawiki
[17:08:59] not vagrant but I've installed it in my local host (and labs) lots of times
[17:11:23] Amir1, OK. awight has included instructions for testing in the phab tickets.
[17:11:37] If you look at that, I'll look into building a trivial ORES server that you can use in testing.
[17:11:54] sure
[17:12:05] do you have a link to the tasks halfak
[17:12:18] I assume there are several tasks related to the extension
[17:12:51] Hmm... I thought it was this one: https://phabricator.wikimedia.org/T112856
[17:12:58] But I am not seeing the instructions there.
[17:36:47] halfak: update on production - the redis machines for ores have shipped today
[17:36:59] Woot!
[17:37:26] halfak: I'll let you know when they arrive etc :)
[17:55:20] halfak: googling "vagrant ores extension site:phabricator.wikimedia.org" returned this
[17:55:32] https://phabricator.wikimedia.org/T118039
[17:59:14] Ahh yes. that.
[17:59:22] Thanks for finding it.
[17:59:29] * halfak needs some project management support :/
[18:07:29] * halfak calls in sick
[18:07:56] My neck is killing me, so I can't sit at my desk any more. I've got an appointment with the doc and will let you folks know how that goes.
[18:14:10] get well soon halfak
[18:16:03] Will do.
[18:16:08] * halfak tried to take his own advice.
[18:16:13] *tries
[18:16:23] OK. Meetings rescheduled.
[18:16:26] I'm out.
[18:16:28] o/
[19:36:53] Sup peeps.
[20:07:06] hey aetilley
[20:07:09] :)
[20:07:17] YuviPanda: hey, around?
[20:07:25] vaguely, Amir1
[20:07:27] 'sup
[20:08:40] YuviPanda: I want to have an instance to make something for wikidata
[20:08:53] Amir1: related to ores or related to wikidata?
[20:08:58] wikidata
[20:09:04] I want to know if it's okay I make a table with 10M rows
[20:09:05] Amir1: are you part of the wikidata project?
[20:09:29] yes, I'm software engineer at WMDE for wikidata
[20:09:36] contractor
[20:09:49] and I was to build this by Lydia
[20:09:56] *I was asked
[20:10:40] the main reason that I want an instance instead of a service group is that I can't use Semantic UI at tools
[20:10:53] YuviPanda: it requires gulp which is not installed
[20:11:09] Amir1: no I mean the 'wikidata labs project'
[20:11:09] http://semantic-ui.com/introduction/getting-started.html
[20:11:23] no I'm not there
[20:12:09] Amir1: ok, so you should get lydia or someone else in that project to add you.
[20:12:13] Amir1: and then you can create instances there :)
[20:12:23] hmm, sure :)
[20:12:41] if you can help me to build a test suite at my tools env. that would be great
[20:13:03] do you know how I can use Semantic UI in tools YuviPanda ?
[20:13:15] Amir1: we do have npm installed, so 'npm install' after cloning should just work?
[20:13:45] "npm install -g gulp"
[20:13:49] it doesn't work
[20:14:14] no -g
[20:14:16] -g is global
[20:14:30] just npm install gulp might work
[20:14:42] hmm
[20:14:44] let me check
[20:14:46] thanks
[23:45:48] wiki-ai/revscoring#370 (pr_auc - 5327157 : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/95470389