[09:48:39] (PS2) Ladsgroup: Use minified responses [extensions/ORES] - https://gerrit.wikimedia.org/r/334695
[11:22:24] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Deploy edit quality campaign for Romanian Wikipedia - https://phabricator.wikimedia.org/T156357#2979187 (Andrei_Stroe) Apparently, the labels I translated in https://meta.wikimedia.org/wiki/Wiki_labels/Interface_translation/Edit_qu...
[13:28:34] Revision-Scoring-As-A-Service-Backlog, AbuseFilter, ORES, Community-Wishlist-Survey-2015: Suggesting AbuseFilter by machine learning - https://phabricator.wikimedia.org/T120741#2979326 (BethNaught)
[16:50:54] (CR) Legoktm: "Has the ORES change been deployed yet? (Does it matter?)" [extensions/ORES] - https://gerrit.wikimedia.org/r/334695 (owner: Ladsgroup)
[16:55:03] (CR) Ladsgroup: "It's not deployed there but it'll be early next week (if nothing happens) and it doesn't matter too. AFAIK it just ignores extra params." [extensions/ORES] - https://gerrit.wikimedia.org/r/334695 (owner: Ladsgroup)
[16:55:06] o/
[16:56:24] halfak: hey there, I'm working on getting the cswiki damaging model up and running, I'm updating my venv atm
[16:56:33] Great!
[16:56:38] Let me know how the stats work out.
[16:57:02] I'm worried that the 2.5/2.5k balancing strategy was ... problematic
[16:57:33] sure
[16:57:54] Revision-Scoring-As-A-Service-Backlog, AbuseFilter, ORES, Community-Wishlist-Survey-2015: Suggesting AbuseFilter by machine learning - https://phabricator.wikimedia.org/T120741#2979547 (Huji)
[17:00:44] btw. halfak: I made a commit in revscoring and pushed it directly to the master. It was fixing a broken link in the docs
[17:00:49] hope you don't mind
[17:01:20] * halfak checks
[17:01:47] Amir1, hmm... I don't think that's right.
[17:02:09] Eek! Looks like enchant.org is down
[17:02:15] But pyenchant != enchant
[17:02:35] hmmm
[17:03:02] What do you think we should do?
[17:03:15] https://en.wikipedia.org/wiki/Enchant_(software)
[17:03:17] maybe?
[17:03:27] http://abisource.com/projects/enchant/
[17:03:31] That could work too
[17:03:45] Or https://abiword.github.io/enchant/
[17:04:53] Amir1, do you want to do that? I could do this right now if you're busy.
[17:04:57] the github link is creepy
[17:05:04] lol
[17:05:05] let's go with Wikipedia :D
[17:05:10] Yeah, Super basic.
[17:05:12] +1
[17:05:15] Wikipedia
[17:05:32] I'll do it
[17:05:35] halfak: ^
[17:07:15] cool.
[17:07:16] halfak: and don
[17:07:20] *done
[17:07:21] \o/
[17:09:29] Revision-Scoring-As-A-Service, Wikidata, User-Ladsgroup, WMDE-Tech-Communication-Mentoring-And-Events: Build item_quality form - https://phabricator.wikimedia.org/T155828#2979558 (Halfak) Looks good. Maybe it's time to move it to the Wiki and to host a discussion about it. Once there's some buy...
[17:09:43] halfak, Amir1 nice to meet you! this is matthew from the GSoC emails :)
[17:10:25] o/ mattsun
[17:10:51] mattsun: hey there
[17:10:55] nice to meet you too
[17:11:41] so I was looking at this task https://phabricator.wikimedia.org/T156494
[17:12:02] and was wondering if you had any advice on how to get started (Amir1, halfak)
[17:12:25] halfak: the task description says you have some helpful datasets?
[17:12:35] Oh yes. Let me dig around a bit quick :)
[17:13:12] Great, thanks!
[17:13:46] https://github.com/wiki-ai/draftquality/blob/master/datasets/enwiki.draft_quality.201508-201608.tsv.bz2
[17:13:49] mattsun, ^
[17:14:18] That dataset contains a record for every article creation in the last year (ending on Aug 2016)
[17:14:18] cool, downloading right now
[17:14:31] It also contains a label for spam/vandalism/attack/OK
[17:15:19] got it
[17:16:07] so my goal is probably to run the draftquality model on those articles and see how well it matches the actual labels, is that correct?
[17:16:20] Right.
[17:16:26] Oh wait... I have a better dataset for this.
[17:16:33] Regretfully, we trained the model on that data.
[17:16:38] I'll get some fresh data.
[17:17:08] Got it, thanks!
[17:18:18] Hmm... OK this is going to be a bit of a pain. One minute.
[17:18:46] Regretfully, my process isn't finished. But I can get some intermediary data so you can start experimenting.
[17:18:47] No worries :)
[17:18:57] That sounds good
[17:20:23] OK I just sent an email that should arrive shortly.
[17:20:38] Great, I'll be on the lookout
[17:20:42] It contains a sample from the month of Aug 2016 -- the month immediately after our training sample.
[17:20:52] Which should be good for testing.
[17:21:52] 61k OK, 1.5k spam, 426 vandalism, 175 attack.
[17:22:26] Great, just received it!
[17:23:32] I usually use R for this kind of data analysis stuff - is that something you recommend?
[17:24:48] And should I run the draftquality model by making api requests?
[17:25:59] R is fine. I'd recommend python for data gathering from an API, but R will work.
[17:26:11] I do recommend you use the API.
[17:26:30] Oh... wait. Damn it.
[17:26:50] You can't gather predictions for this because you can't access the text of deleted pages through ORES.
[17:27:00] ORES has no privileged access to data.
[17:27:03] hmm.
[17:27:21] I might need to generate the scores for all of the deleted pages in that set.
[17:28:06] * halfak thinks.
[17:28:38] Oh, I see. That makes sense
[17:28:40] OK. I think I need to get you scores.
[17:28:51] I'm going to look into doing that.
[17:29:12] OK, thanks so much!
[17:30:34] By "scores" do you mean the scores that the non-draftquality ORES system would give the deleted pages? (not sure if that question even makes sense)
[17:31:06] hmm... indeed. it seems I'm confused.
[17:31:22] ORES can't score deleted things.
[17:32:03] Many pages in that dataset are deleted.
[17:32:07] Oh
[17:32:12] Basically, all of the not "OK" examples
[17:32:17] halfak: quick question. for 5k samples, do we need to add other revs from the 20k or just train based on the 5k?
[17:32:57] Amir1, for the balanced dataset, the theory was to train on the 5k after sampling with replacement.
[17:33:12] But I'm pretty skeptical of that, honestly.
[17:33:17] :(
[17:33:27] Past halfak may have been a dummy.
[17:33:33] let's go with the 5k and see how it turns out
[17:33:44] OK
[17:33:55] I don't think it will be horrible. We gather all signals from the 5k too
[17:34:00] (IMO)
[17:34:16] Yeah. Our test stats are going to be weird though :/
[17:34:51] mattsun, OK so my plan is to get you predictions for those observations.
[17:34:58] I'm going to have to hack together a script to do that.
[17:35:03] I think it'll be pretty easy.
[17:35:21] OK, gotcha
[17:35:23] but it's going to take a bit.
[17:35:43] OK, cool
[17:36:17] So even though some pages are deleted, you're going to make predictions for them
[17:36:22] In the meantime, how about you get that dataset loaded and get us a nice plot that shows us trends over time in that dataset.
[17:36:27] Roght
[17:36:29] *Right
[17:36:36] Ok, got it
[17:36:38] Will do!
[17:52:05] Revision-Scoring-As-A-Service, revscoring, rsaas-editquality, User-Ladsgroup, User-Urbanecm: Train and test editquality models for Czech Wikipedia - https://phabricator.wikimedia.org/T156492#2979568 (Ladsgroup) ``` # Model tuning report - Revscoring version: 1.3.5 - Features: editquality.feat...
[17:52:59] Amir1, that's a not-so-great tuning.
[17:53:03] How many true obs?
[17:54:03] halfak: It seems it doesn't have it in the tuning reports, I need to build the model
[17:54:29] cat dataset | grep '"damaging": true' | wc
[17:55:35] 475 true cases
[17:55:53] That's pretty good.
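The grep pipeline quoted at 17:54:29 counts lines that contain the text `"damaging": true`. A rough Python equivalent is sketched below. It assumes the dataset holds one JSON object per line with top-level boolean fields `damaging` and `needs_review`; that layout is inferred from the grep patterns in this log, not confirmed, and `count_labels` and the sample records are hypothetical names and data used only for illustration.

```python
import json


def count_labels(lines):
    """Return (damaging count, damaging AND needs_review count).

    Assumes one JSON object per line with boolean "damaging" and
    "needs_review" fields (an assumption about the file layout).
    """
    damaging = 0
    damaging_and_review = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obs = json.loads(line)
        if obs.get("damaging") is True:
            damaging += 1
            if obs.get("needs_review") is True:
                damaging_and_review += 1
    return damaging, damaging_and_review


# Made-up sample records, only to show the call.
sample = [
    '{"rev_id": 1, "damaging": true, "needs_review": true}',
    '{"rev_id": 2, "damaging": true, "needs_review": false}',
    '{"rev_id": 3, "damaging": false, "needs_review": true}',
    '{"rev_id": 4, "damaging": false, "needs_review": false}',
]
print(count_labels(sample))  # (2, 1)
```

Parsing the JSON rather than matching text avoids depending on the exact spacing after the colon, which the grep pattern does depend on.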
[17:56:23] cat dataset | grep '"damaging": true' | grep '"needs_review": true' | wc
[17:57:28] (p3)ladsgroup@ores-compute-01:~/editquality$ cat datasets/cswiki.human_labeled_revisions.5k_2016.json | grep '"damaging": true' | grep '"needs_review": true' | wc -l
[17:57:28] 434
[17:59:12] halfak, quick question about the dataset you gave me - how is creation_timestamp formatted exactly? what date is "20160801000408" for example?
[17:59:29] %y%m%d%H%i%S
[17:59:39] YYYYMMDDHHMMSS
[17:59:43] got it, thank you!
[17:59:50] :)
[18:09:01] (CR) Legoktm: [C: 2] Use minified responses [extensions/ORES] - https://gerrit.wikimedia.org/r/334695 (owner: Ladsgroup)
[18:10:38] (Merged) jenkins-bot: Use minified responses [extensions/ORES] - https://gerrit.wikimedia.org/r/334695 (owner: Ladsgroup)
[18:11:33] Revision-Scoring-As-A-Service, revscoring, rsaas-editquality, User-Ladsgroup, User-Urbanecm: Train and test editquality models for Czech Wikipedia - https://phabricator.wikimedia.org/T156492#2979570 (Ladsgroup) Model for damaging: ``` - type: GradientBoosting - params: balanced_sample_weigh...
[18:13:31] script is ready. Working on running it now.
[18:20:11] Revision-Scoring-As-A-Service, revscoring, rsaas-editquality, User-Ladsgroup, User-Urbanecm: Train and test editquality models for Czech Wikipedia - https://phabricator.wikimedia.org/T156492#2979579 (Ladsgroup) https://github.com/wiki-ai/editquality/pull/57
[18:22:35] script running. Making coffee.
[18:26:06] I got to go, be back soonish
[18:26:24] o/
[18:27:39] halfak, looks like my relatives came early to pick me up for chinese new year celebrations (happy lunar new year to anyone who celebrates it!)
[18:27:56] i gotta go but i'm making good progress on the R script! i'll report back when it's done :)
[18:51:48] Sounds good mattsun. See you soon :)
[18:52:52] Amir1, the threshold statistics won't work with the sample as-is.
[18:53:00] We need to re-scale the observations.
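On the creation_timestamp question at 17:59:12: the YYYYMMDDHHMMSS layout can be parsed in Python as sketched below. Note that the directive string quoted in the chat is not valid for Python's `strptime`; in Python the four-digit year is `%Y` and minutes are `%M`. The function name `parse_creation_timestamp` is hypothetical.

```python
from datetime import datetime


def parse_creation_timestamp(ts: str) -> datetime:
    """Parse a 14-digit YYYYMMDDHHMMSS timestamp string."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


parsed = parse_creation_timestamp("20160801000408")
print(parsed.isoformat())  # 2016-08-01T00:04:08
```

So "20160801000408" reads as 1 August 2016, 00:04:08. Grouping on `parsed.date()` would be one way to get the per-day counts needed for a trends-over-time plot.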
[18:54:05] * halfak looks into the obs for cswiki
[19:00:55] OK. So this might be a bit hare-brained.
[19:01:01] But here's what I propose we do.
[19:01:45] sample 4558 observations from the "needs_review": true labeled subsample (of 2.5k, so it'll be sampling with replacement)
[19:02:31] and then merge the labels we have for damage/goodfaith into the remaining "needs_review": false subset
[19:02:39] We'll get ~20k observations.
[19:03:03] A very small number of "damaging" edits that were marked as not needing review will be mislabeled.
[19:08:13] Damn. This is messy. I wish I could go back in time and have the cswiki folk just label all of the "needs_review": true observations.
[19:08:24] I'm going to iterate on this PR.
[19:24:49] OK. Just completed the work. Am rebuilding the model now.
[19:24:58] Will have a followup commit soon
[22:20:12] Revision-Scoring-As-A-Service, revscoring, rsaas-editquality, User-Ladsgroup, User-Urbanecm: Train and test editquality models for Czech Wikipedia - https://phabricator.wikimedia.org/T156492#2979829 (Halfak) ``` ScikitLearnClassifier - type: GradientBoosting - params: warm_start=false, min_...
[22:24:55] Arg. Forgot to build the goodfaith model. Fixing that now.
[22:29:50] Revision-Scoring-As-A-Service, revscoring, rsaas-editquality, User-Ladsgroup, User-Urbanecm: Train and test editquality models for Czech Wikipedia - https://phabricator.wikimedia.org/T156492#2979844 (Halfak) ``` ScikitLearnClassifier - type: GradientBoosting - params: center=true, scale=tru...
[22:33:30] halfak: should I merge it or you do it?
[22:33:41] Merge if you like the changes :)
[22:33:57] Just about to run away.
[22:34:00] Have a good one!
[22:34:01] o/
[22:34:01] with pleasure
[22:34:10] you too
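The re-scaling proposal from 19:01:45 to 19:03:03 could look roughly like the sketch below. This is one reading of the chat, not the code that went into the PR. Assumptions: each observation is a dict with a `rev_id` key, human labels are kept in a dict keyed by `rev_id`, and "needs_review": false observations without a human label default to not damaging and good faith (the source of the small mislabeling mentioned at 19:03:03). The names `rebuild_sample` and `known_labels` are hypothetical.

```python
import random


def rebuild_sample(reviewed_labeled, not_reviewed, known_labels,
                   n_reviewed=4558, seed=0):
    """Resample labeled needs_review=true observations with replacement,
    then append every needs_review=false observation with labels merged in.
    """
    rng = random.Random(seed)
    # Sampling with replacement: the labeled pool is smaller than n_reviewed.
    resampled = rng.choices(reviewed_labeled, k=n_reviewed)

    # ASSUMPTION: no human label means not damaging and good faith.
    default = {"damaging": False, "goodfaith": True}
    merged = [{**obs, **known_labels.get(obs["rev_id"], default)}
              for obs in not_reviewed]
    return resampled + merged


# Tiny made-up example, only to show the shapes involved.
reviewed = [
    {"rev_id": 1, "needs_review": True, "damaging": True, "goodfaith": False},
    {"rev_id": 2, "needs_review": True, "damaging": False, "goodfaith": True},
]
unreviewed = [
    {"rev_id": 3, "needs_review": False},
    {"rev_id": 4, "needs_review": False},
]
labels = {3: {"damaging": True, "goodfaith": True}}

rebuilt = rebuild_sample(reviewed, unreviewed, labels, n_reviewed=5)
print(len(rebuilt))  # 7
```

With the real data, 4558 resampled observations plus the full "needs_review": false subset would give the roughly 20k observations mentioned in the chat.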