[14:32:54] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, rsaas-editquality, Easy, Google-Code-In-2016: Scale up the number of observations for idwiki to 100k - https://phabricator.wikimedia.org/T147107#2827182 (Ladsgroup) >>! In T147107#2826061, @Aklapper wrote: > @Ladsgroup: Could you provide more information for...
[14:59:50] <wikibugs>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Algorithmic dangers and transparency -- Best practices - https://phabricator.wikimedia.org/T147929#2827259 (ssastry) https://medium.com/@robot_MD/when-bias-in-product-design-means-life-or-death-ea3d16e3ddb2#.5a1ker5hd s...
[16:11:57] <grrrit-wm>	 (PS3) Sbisson: [WIP] goodfaith filter [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853)
[16:12:00] <grrrit-wm>	 (CR) Sbisson: [WIP] goodfaith filter (3 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853) (owner: Sbisson)
[16:57:40] <halfak>	 Amir1, on the phone with insurance company.  Looks like I'll be a little late
[16:57:45] <halfak>	 Did not expect this call to take so long
[16:58:27] <Amir1>	 halfak: okay :)
[16:58:42] <halfak>	 DarTar said he'd be 5 minutes late too
[17:22:10] <grrrit-wm>	 (CR) Catrope: [C: 2] Update for API error i18n [extensions/ORES] - https://gerrit.wikimedia.org/r/321441 (owner: Anomie)
[17:24:32] <grrrit-wm>	 (Merged) jenkins-bot: Update for API error i18n [extensions/ORES] - https://gerrit.wikimedia.org/r/321441 (owner: Anomie)
[17:49:22] <grrrit-wm>	 (PS4) Sbisson: goodfaith filter [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853)
[18:05:50] <codezee>	 halfak: there?
[18:11:42] <halfak>	 Hi, yes!
[18:11:57] <halfak>	 But I'm on the phone.  Hopefully I'll be *back* in 5-10 minutes. 
[18:12:00] <halfak>	 codezee, ^ 
[18:12:11] <codezee>	 halfak: ok, I'll wait :)
[18:16:17] <halfak>	 OK!  Back
[18:16:19] <halfak>	 What's up?
[18:17:15] <codezee>	 halfak: I came across https://meta.wikimedia.org/wiki/Research:Automated_classification_of_draft_quality and am interested in working on it. I just need a heads-up on the dataset and the work done so far
[18:17:38] <codezee>	 I have a background in AI/ML as a student
[18:17:45] <halfak>	 codezee, so far, the dataset is public, but I've done nothing to try to model it yet. 
[18:17:52] <halfak>	 So, no feature selection. 
[18:18:12] <halfak>	 Oh!  However, I am working on a PCFG-based feature signal that I think will have a lot of value. 
[18:18:35] <codezee>	 halfak: that's exactly what I'm planning to work on... btw, is this trying to achieve the same thing as PCFG?
[18:18:49] <halfak>	 I don't understand
[18:19:29] <codezee>	 is this article trying to do the same thing that you're going to do with PCFG?
[18:20:13] <codezee>	 I suppose in the above article, we need to build a model for tagging drafts as "ok", "otherwise", "vandalism", "attack", "spam" right?
[18:20:23] <codezee>	 I'm going by the observations mentioned there
[18:22:16] <halfak>	 Oh!  Not quite. 
[18:22:32] <halfak>	 So the PCFG stuff is intended to be able to label individual sentences. 
[18:22:51] <halfak>	 While the draft quality is intended to be able to label whole pages. 
[18:24:31] <codezee>	 halfak: I see. I had read the paper on article quality that ORES's wp10 model is built on; I'm thinking the features here would need to be along the same lines
[18:25:11] <halfak>	 codezee, I agree.  We'll probably want to use a lot of the edit quality features too -- but we'll skip the diff
[18:25:25] <halfak>	 Just measure the badwords/informals/etc. in the current version of the page. 
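A minimal sketch of the idea above: compute badword and informal counts over the full text of the current page version, with no diff involved. This is plain Python, not the `revscoring` API; the word lists and the `page_features` helper are hypothetical and exist only for illustration.

```python
import re

# Hypothetical, tiny word lists for illustration only.
# Real lists are language-specific and much larger.
BADWORDS = ["idiot", "stupid", "sucks"]
INFORMALS = ["lol", "haha", "btw", "gonna"]


def _compile(words):
    # Match whole words only, case-insensitively.
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


BADWORD_RE = _compile(BADWORDS)
INFORMAL_RE = _compile(INFORMALS)
WORD_RE = re.compile(r"\w+")


def page_features(text):
    """Return simple counts and ratios computed from the whole page text."""
    words = len(WORD_RE.findall(text))
    badwords = len(BADWORD_RE.findall(text))
    informals = len(INFORMAL_RE.findall(text))
    denom = max(words, 1)  # avoid division by zero on empty pages
    return {
        "words": words,
        "badwords": badwords,
        "informals": informals,
        "badword_ratio": badwords / denom,
        "informal_ratio": informals / denom,
    }
```

A dictionary like this could feed any page-level classifier; the ratios keep long and short drafts comparable.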
[18:26:45] <codezee>	 yes, and probably some extra ones that might be specific to this use case. I'll have a look...
[18:27:16] <codezee>	 the article says "The ORES service would be a great place to build and host such a model" but I'm thinking that, as a first iteration, I should host a stand-alone model built from scratch?
[18:27:26] <codezee>	 or can I use ORES somehow and build on top of it?
[18:28:08] <halfak>	 I wouldn't run a stand-alone model.  ORES is really good for extracting features and delivering predictions.
[18:28:14] <halfak>	 It's intended to be a platform in this way. 
[18:28:21] <halfak>	 We can deploy in labs without much red tape. 
[18:28:30] <halfak>	 You can always run a stand-alone ORES too ;) 
[18:28:46] <halfak>	 It's pretty easy to set up and run if you just work from our deploy repo as a template. 
[18:29:28] <codezee>	 so my starting point should be https://github.com/wiki-ai/ores right?
[18:29:45] <halfak>	 https://github.com/wiki-ai/ores-wmflabs-deploy
[18:30:00] <halfak>	 ORES is a shrink-wrapped system
[18:30:14] <halfak>	 ores-wmflabs-deploy is our configuration for running in labs. 
[18:30:39] <halfak>	 anyway, you can build, evaluate, and make predictions from the `revscoring` library. 
[18:30:52] <halfak>	 Once you have a model you like, it's relatively easy to deploy in an ORES instance. 
[18:31:04] <halfak>	 See https://github.com/wiki-ai/revscoring
[18:31:09] <halfak>	 and http://pythonhosted.org/revscoring/
[18:34:01] <halfak>	 Amir1, looking at done column and we have a lot! 
[18:34:08] <halfak>	 But a lot are from last week :S
[18:34:33] <halfak>	 I really need to break up https://phabricator.wikimedia.org/T148867 and mark something done. 
[18:34:43] <halfak>	 We're going to have a cascade of pull requests from me. 
[18:34:54] <halfak>	 I've already released new versions of mwapi and deltas to do this work. 
[18:36:57] <halfak>	 I'll do that cleanup, and then start on the weekly update. 
[18:42:55] <codezee>	 halfak: thanks for the info! I'll come up with something and test it with revscoring
[18:43:23] <halfak>	 Cool.  codezee, if you want to keep that meta page updated with your progress, I'll do the same
[18:43:28] <halfak>	 Check out the work log on the talk page. 
[18:43:56] <halfak>	 codezee, what's your github username?
[18:45:49] <codezee>	 halfak: it's codez266. Over the next few days I'll look closely at the dataset and come up with some features, then get back to you, as I might need some help when I start using revscoring
[18:47:55] <halfak>	 codezee, I just invited you to our github org. 
[18:48:22] <halfak>	 See https://github.com/wiki-ai/revscoring/blob/master/ipython/feature_engineering.ipynb for the basics of feature engineering with revscoring
[18:48:47] <halfak>	 Here's a demo of really basic vandalism detection in Wikipedia: https://github.com/wiki-ai/editquality/blob/master/ipython/reverted_detection_demo.ipynb
[18:53:33] <codezee>	 ohh, that'd be quite useful, I'll look through it, btw, acknowledged your invite
[19:16:29] <Amir1>	 halfak: I was afk for dinner 
[19:16:30] <Amir1>	 \o/
[19:16:36] <halfak>	 No worries. 
[19:16:47] <halfak>	 Still doing cleanup.  Got distracted by other meetings. 
[20:51:07] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, ORES, Easy, Google-Code-In-2016: Quiet TimeoutError in celery logging - https://phabricator.wikimedia.org/T146681#2828703 (Dargasea) a:Dargasea I will be working on this as part of GCI 2016.   Joining -dev shortly!
[21:28:12] <wikibugs>	 Revision-Scoring-As-A-Service, rsaas-draftquality: Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences. - https://phabricator.wikimedia.org/T151819#2828949 (Halfak)
[21:36:05] <wikibugs>	 Revision-Scoring-As-A-Service, revscoring, rsaas-draftquality: Implement sentences datascources & experiment with normalization. - https://phabricator.wikimedia.org/T148867#2828989 (Halfak)
[21:36:15] <wikibugs_>	 Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-draftquality, Research-and-Data-2016-17-Q2: Build draft quality model (spam, vandalism, attack, or OK) - https://phabricator.wikimedia.org/T148038#2828990 (Halfak)
[21:36:33] <wikibugs>	 Revision-Scoring-As-A-Service, revscoring, rsaas-draftquality, rsaas-editquality, and 2 others: [Epic] Implement PCFG features for editquality and draftquality - https://phabricator.wikimedia.org/T144636#2828991 (Halfak)
[21:40:18] <halfak>	 woo.  I've almost got all 4 grammars trained. 
[21:40:39] <halfak>	 I accidentally restarted training the FA model, so I'm going to need to wait a while to use that. 
[21:45:17] <wikibugs>	 Revision-Scoring-As-A-Service, revscoring: Implement sentences datascources - https://phabricator.wikimedia.org/T148867#2735271 (Halfak)
[21:45:32] <wikibugs>	 Revision-Scoring-As-A-Service, revscoring: Implement sentences datascources - https://phabricator.wikimedia.org/T148867#2735271 (Halfak) Sentence datasources in PR here: https://github.com/wiki-ai/revscoring/pull/291
[21:45:57] <wikibugs>	 Revision-Scoring-As-A-Service, rsaas-draftquality: Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences. - https://phabricator.wikimedia.org/T151819#2829023 (Halfak)
[21:45:59] <wikibugs_>	 Revision-Scoring-As-A-Service, rsaas-articlequality, rsaas-draftquality, Research-and-Data-2016-17-Q2: Build draft quality model (spam, vandalism, attack, or OK) - https://phabricator.wikimedia.org/T148038#2829022 (Halfak)
[21:46:31] <wikibugs_>	 Revision-Scoring-As-A-Service, revscoring, rsaas-draftquality, rsaas-editquality, and 2 others: [Epic] Implement PCFG features for editquality and draftquality - https://phabricator.wikimedia.org/T144636#2829041 (Halfak)
[21:46:33] <wikibugs>	 Revision-Scoring-As-A-Service, rsaas-draftquality: Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences. - https://phabricator.wikimedia.org/T151819#2828949 (Halfak)
[22:22:27] <grrrit-wm>	 (CR) Catrope: [C: -1] goodfaith filter (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/323328 (https://phabricator.wikimedia.org/T149853) (owner: Sbisson)
[23:17:29] <wikibugs>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Where to surface AI in Wikimedia Projects - https://phabricator.wikimedia.org/T148690#2829324 (Halfak)
[23:21:33] <wikibugs>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2829330 (Halfak)
[23:22:36] <wikibugs_>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2701365 (Halfak) I've been talking to @Nettrom about an Article Importance prediction model.  This model would...
[23:22:55] <wikibugs>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2829342 (Halfak)
[23:23:45] <wikibugs_>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2701365 (Halfak)
[23:24:05] <wikibugs>	 Revision-Scoring-As-A-Service, Research Ideas, Wikimedia-Developer-Summit (2017): Building an AI wishlist & working groups for Wikimedia Projects - https://phabricator.wikimedia.org/T147710#2701365 (Halfak)