[14:32:54] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality, Easy, Google-Code-In-2016: Scale up the number of observations for idwiki to 100k - https://phabricator.wikimedia.org/T147107#2827182 (Ladsgroup) >>! In T147107#2826061, @Aklapper wrote: > @Ladsgroup: Could you provide more information for...
[14:59:50] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Algorithmic dangers and transparency -- Best practices - https://phabricator.wikimedia.org/T147929#2827259 (ssastry) https://medium.com/@robot_MD/when-bias-in-product-design-means-life-or-death-ea3d16e3ddb2#.5a1ker5hd s...
[16:11:57] (PS3) Sbisson: [WIP] goodfaith filter [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853)
[16:12:00] (CR) Sbisson: [WIP] goodfaith filter (3 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853) (owner: Sbisson)
[16:57:40] Amir1, on the phone with insurance company. Looks like I'll be a little late
[16:57:45] Did not expect this call to take so long
[16:58:27] halfak: okay :)
[16:58:42] DarTar said he'd be 5 minutes late too
[17:22:10] (CR) Catrope: [C: 2] Update for API error i18n [extensions/ORES] - https://gerrit.wikimedia.org/r/321441 (owner: Anomie)
[17:24:32] (Merged) jenkins-bot: Update for API error i18n [extensions/ORES] - https://gerrit.wikimedia.org/r/321441 (owner: Anomie)
[17:49:22] (PS4) Sbisson: goodfaith filter [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853)
[18:05:50] halfak: there?
[18:11:42] Hi, yes!
[18:11:57] But I'm on the phone. Hopefully I'll be *back* in 5-10 minutes.
[18:12:00] codezee, ^
[18:12:11] halfak: ok, I'll wait :)
[18:16:17] OK! Back
[18:16:19] What's up?
[18:17:15] halfak: I came across https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_quality and am interested in working on it, I just need a heads-up on the dataset and the work done so far
[18:17:38] I have a background in AI/ML as a student
[18:17:45] codezee, so far, the dataset is public, but I've done nothing to try to model it yet.
[18:17:52] So, no feature selection.
[18:18:12] Oh! However, I am working on a PCFG-based feature signal that I think will have a lot of value.
[18:18:35] halfak: that's exactly what I'm planning to work on... btw is this trying to achieve the same thing as PCFG?
[18:18:49] I don't understand
[18:19:29] is this article trying to do the same thing that you're going to do with PCFG?
[18:20:13] I suppose in the above article, we need to build a model for tagging drafts as "ok", "otherwise", "vandalism", "attack", "spam" right?
[18:20:23] I'm going by the observations mentioned
[18:22:16] Oh! Not quite.
[18:22:32] So the PCFG stuff is intended to be able to label individual sentences.
[18:22:51] While the draft quality is intended to be able to label whole pages.
[18:24:31] halfak: I see. I read the paper on article quality that ORES's wp10 model is built on, and I think we'd need something along those lines in terms of features
[18:25:11] codezee, I agree. We'll probably want to use a lot of the edit quality features too -- but we'll skip the diff
[18:25:25] Just measure the badwords/informals/etc. in the current version of the page.
[18:26:45] yes, and probably some extra ones which might be specific to this use case, I'll have a look...
[18:27:16] the article says "The ORES service would be a great place to build and host such a model" but I'm thinking as a first iteration I should host a stand-alone model from scratch?
[18:27:26] or can I use ORES somehow and build on top of it?
[18:28:08] I wouldn't run a stand-alone model. ORES is really good for extracting features and delivering predictions.
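As a rough illustration of the page-level features discussed above (measuring badwords/informals in the current version of the page and skipping the diff), here is a minimal plain-Python sketch. It does not use revscoring; the word lists, the tokenizer, and the `page_features` helper are hypothetical and exist only for this example.

```python
import re

# Hypothetical word lists, for illustration only. A real model would use
# curated, language-specific lists rather than these.
BADWORDS = {"stupid", "idiot", "sucks"}
INFORMALS = {"lol", "haha", "hey", "gonna"}

# Deliberately simple tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def page_features(text):
    """Count signals over the whole current page text (no diff involved)."""
    tokens = TOKEN_RE.findall(text.lower())
    n = len(tokens)
    bad = sum(1 for t in tokens if t in BADWORDS)
    informal = sum(1 for t in tokens if t in INFORMALS)
    return {
        "tokens": n,
        "badwords": bad,
        "informals": informal,
        # Ratios keep long and short drafts comparable; guard against
        # division by zero on empty pages.
        "badword_ratio": bad / n if n else 0.0,
        "informal_ratio": informal / n if n else 0.0,
    }


draft = "lol this band sucks, hey buy tickets now"
print(page_features(draft))
```

A feature dictionary like this could be turned into a vector and paired with the draft labels ("ok", "spam", "vandalism", "attack") to train a whole-page classifier.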
[18:28:14] It's intended to be a platform in this way.
[18:28:21] We can deploy in labs without much red tape.
[18:28:30] You can always run a stand-alone ORES too ;)
[18:28:46] It's pretty easy to set up and run if you just work from our deploy repo as a template.
[18:29:28] so my starting point should be https://github.com/wiki-ai/ores right?
[18:29:45] https://github.com/wiki-ai/ores-wmflabs-deploy
[18:30:00] ORES is a shrink-wrapped system
[18:30:14] ores-wmflabs-deploy is our configuration for running in labs.
[18:30:39] anyway, you can build, evaluate, and make predictions from the `revscoring` library.
[18:30:52] Once you have a model you like, it's relatively easy to deploy in an ORES instance.
[18:31:04] See https://github.com/wiki-ai/revscoring
[18:31:09] and http://pythonhosted.org/revscoring/
[18:34:01] Amir1, looking at done column and we have a lot!
[18:34:08] But a lot are from last week :S
[18:34:33] I really need to break up https://phabricator.wikimedia.org/T148867 and mark something done.
[18:34:43] We're going to have a cascade of pull requests from me.
[18:34:54] I've already released new version of mwapi and deltas to do this work.
[18:36:43] I'll do that cleanup, and then start on the weekly update.
[18:42:55] halfak: thanks for the info! I'll come up with something and test it with revscoring
[18:43:23] Cool. codezee, if you want to keep that meta page updated with your progress, I'll do the same
[18:43:28] Check out the work log on the talk page.
[18:43:56] codezee, what's your github username?
[18:45:49] halfak: it's codez266. Over the next few days I'll look closely at the dataset and come up with some features, and I'll get back to you as I might need some help when I use revscoring
[18:47:55] codezee, I just invited you to our github org.
[18:48:22] See https://github.com/wiki-ai/revscoring/blob/master/ipython/feature_engineering.ipynb for the basics of feature engineering with revscoring
[18:48:47] Here's a demo of really basic vandalism detection in Wikipedia: https://github.com/wiki-ai/editquality/blob/master/ipython/reverted_detection_demo.ipynb
[18:53:33] ohh, that'd be quite useful, I'll look through it, btw, acknowledged your invite
[19:16:29] halfak: I was afk for dinner
[19:16:30] \o/
[19:16:36] No worries.
[19:16:47] Still doing cleanup. Got distracted by other meetings.
[20:51:07] Revision-Scoring-As-A-Service-Backlog, ORES, Easy, Google-Code-In-2016: Quiet TimeoutError in celery logging - https://phabricator.wikimedia.org/T146681#2828703 (Dargasea) a:Dargasea I will be working on this as part of GCI 2016. Joining -dev shortly!
[21:28:12] Revision-Scoring-As-A-Service, rsaas-draftquality: Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences. - https://phabricator.wikimedia.org/T151819#2828949 (Halfak)
[21:36:05] Revision-Scoring-As-A-Service, revscoring, rsaas-draftquality: Implement sentences datascources & experiment with normalization. - https://phabricator.wikimedia.org/T148867#2828989 (Halfak)
[21:36:15] Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-draftquality, Research-and-Data-2016-17-Q2: Build draft quality model (spam, vandalism, attack, or OK) - https://phabricator.wikimedia.org/T148038#2828990 (Halfak)
[21:36:33] Revision-Scoring-As-A-Service, revscoring, rsaas-draftquality, rsaas-editquality, and 2 others: [Epic] Implement PCFG features for editquality and draftquality - https://phabricator.wikimedia.org/T144636#2828991 (Halfak)
[21:40:18] woo. I've almost got all 4 grammars trained.
[21:40:39] I accidentally restarted training the FA model, so I'm going to need to wait a while to use that.
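For readers unfamiliar with the idea of labeling individual sentences with per-class grammars (FA, spam, vandalism, attack), the following is a toy sketch only. It assumes grammars in Chomsky normal form, scores a sentence with the standard inside algorithm, and picks the grammar that gives it the highest log-probability. The two hand-written grammars, the `inside_logprob` and `classify` helpers, and the decision rule are all assumptions for illustration; the log does not show how the trained grammars are actually used.

```python
import math
from collections import defaultdict


def inside_logprob(words, lexical, binary, start="S"):
    """Log-probability of `words` under a PCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns -inf when the grammar cannot derive the sentence.
    """
    n = len(words)
    if n == 0:
        return float("-inf")
    # chart[i][j][A] = probability that nonterminal A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in binary.items():
                    left = chart[i][k].get(b, 0.0)
                    right = chart[k][j].get(c, 0.0)
                    if left and right:
                        chart[i][j][a] += p * left * right
    total = chart[0][n].get(start, 0.0)
    return math.log(total) if total > 0 else float("-inf")


# Two hypothetical toy grammars with the same structure (S -> N V) but
# different word probabilities. These are NOT the trained grammars.
GRAMMARS = {
    "ok": (
        {("N", "he"): 1.0, ("V", "writes"): 0.9, ("V", "sucks"): 0.1},
        {("S", "N", "V"): 1.0},
    ),
    "attack": (
        {("N", "he"): 1.0, ("V", "writes"): 0.2, ("V", "sucks"): 0.8},
        {("S", "N", "V"): 1.0},
    ),
}


def classify(words, grammars=GRAMMARS):
    """Return the name of the grammar under which the sentence is most likely."""
    return max(grammars, key=lambda name: inside_logprob(words, *grammars[name]))


print(classify(["he", "sucks"]))
print(classify(["he", "writes"]))
```

In practice the per-grammar log-probabilities would more likely be fed to a classifier as features than compared directly, but the direct comparison keeps the sketch short.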
[21:45:17] Revision-Scoring-As-A-Service, revscoring: Implement sentences datascources - https://phabricator.wikimedia.org/T148867#2735271 (Halfak)
[21:45:32] Revision-Scoring-As-A-Service, revscoring: Implement sentences datascources - https://phabricator.wikimedia.org/T148867#2735271 (Halfak) Sentence datasources in PR here: https://github.com/wiki-ai/revscoring/pull/291
[21:45:57] Revision-Scoring-As-A-Service, rsaas-draftquality: Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences. - https://phabricator.wikimedia.org/T151819#2829023 (Halfak)
[21:45:59] Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-draftquality, Research-and-Data-2016-17-Q2: Build draft quality model (spam, vandalism, attack, or OK) - https://phabricator.wikimedia.org/T148038#2829022 (Halfak)
[21:46:31] Revision-Scoring-As-A-Service, revscoring, rsaas-draftquality, rsaas-editquality, and 2 others: [Epic] Implement PCFG features for editquality and draftquality - https://phabricator.wikimedia.org/T144636#2829041 (Halfak)
[21:46:33] Revision-Scoring-As-A-Service, rsaas-draftquality: Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences. - https://phabricator.wikimedia.org/T151819#2828949 (Halfak)
[22:22:27] (CR) Catrope: [C: -1] goodfaith filter (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853) (owner: Sbisson)
[23:17:29] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Where to surface AI in Wikimedia Projects - https://phabricator.wikimedia.org/T148690#2829324 (Halfak)
[23:21:33] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2829330 (Halfak)
[23:22:36] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2701365 (Halfak) I've been talking to @Nettrom about an Article Importance prediction model. This model would...
[23:22:55] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2829342 (Halfak)
[23:23:45] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2701365 (Halfak)
[23:24:05] Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2701365 (Halfak)