[01:25:00] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (Arthur2e5)
[01:25:56] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152849 (Arthur2e5)
[01:27:32] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (Arthur2e5) Subscribing shizhao as he participated i...
[01:31:54] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Wikilabels: Accelerator keys in scoring webpage - https://phabricator.wikimedia.org/T162109#3152856 (Arthur2e5)
[01:33:55] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152875 (Liuxinyu970226) @Ladsgroup It seems that your previ...
[01:35:23] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Wikilabels: Accelerator keys in scoring gadget - https://phabricator.wikimedia.org/T162109#3152886 (Arthur2e5)
[01:35:59] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152887 (Arthur2e5)
[01:37:39] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (Arthur2e5) I believe I am seeing simplified characters right now. I am currently u...
[01:45:12] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152900 (Arthur2e5) Confirming that both https://github.com/wiki-ai/wikilabels-wmflabs-depl...
[01:51:12] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152906 (Arthur2e5) Submitted https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/34.
[01:52:23] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152908 (Arthur2e5)
[02:01:32] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Wikilabels: Accelerator keys in scoring gadget - https://phabricator.wikimedia.org/T162109#3152913 (Arthur2e5)
[02:17:20] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (revi) It takes time for translations from translatewiki.net to the site, so you'll...
[03:56:51] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152940 (Arthur2e5) Open>Resolved a:Arthur2e5 Looks like fixed with that PR. Cl...
[04:00:21] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152950 (Arthur2e5) Resolved>Open Patience patience... Deletion not yet in effect f...
[13:25:17] <halfak>	 o/
[13:30:43] <glorian_wd>	 halfak: o/
[13:30:48] <glorian_wd>	 good to see you back
[13:32:45] <halfak>	 Yup.  had a weird, trippy evening last night fighting my jet lag, but I think I'm only slightly off kilter today. :|  :) 
[13:33:50] <glorian_wd>	 halfak: oh! turns out your thesis advisor's tips to fight jet lag work, huh?
[13:34:15] <halfak>	 heh.  yeah.  worked great for the trip out.  Still have to see how I'll fare for this trip back home. :) 
[13:34:30] <halfak>	 FYI: https://en.wikipedia.org/wiki/John_T._Riedl
[13:35:22] <glorian_wd>	 halfak: yeah I know you were advised by him. I think he was the author of actionable model of Wikipedia. The paper that you gave me a long time ago :)
[13:35:39] <glorian_wd>	 oh do you mean you haven't reached home yet?
[13:35:45] <glorian_wd>	 home = Minnesota
[13:37:30] <halfak>	 Oh!  I'm home, but it's just the first morning.  I might yet have some terrible jet lag today. :) 
[13:40:42] <glorian_wd>	 haha okay
[14:13:39] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Damaging levels on Polish Wikipedia overlap too much - https://phabricator.wikimedia.org/T161655#3153878 (Halfak) > I think ["Likely bad"] should be more strongl...
[14:14:31] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3153879 (Halfak) OK.  This can be done.  Adding a precision_at_recall metric would be a...
[14:24:47] <wikibugs>	 Revision-Scoring-As-A-Service, revscoring: Add common set of statistics to all threshold-based test-statistics - https://phabricator.wikimedia.org/T162150#3153900 (Halfak)
[14:26:36] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete editquality campaign for Korean Wikipedia - https://phabricator.wikimedia.org/T161627#3153907 (Halfak) @revi, a status update would be a post somewhere on the wiki that pings all of the participants to announce progress....
[14:27:15] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete editquality campaign for Korean Wikipedia - https://phabricator.wikimedia.org/T161627#3153908 (Halfak) See https://en.wikipedia.org/wiki/Wikipedia_talk:Labels/Edit_quality/Archive_2015#Status_update:_May_11th for an exampl...
[14:29:00] <revi>	 halfak: can you guide me how to read the stats?
[14:29:20] <revi>	 I think task is the max number of revisions to tag, right?
[14:29:24] <halfak>	 Sure.  Let me update the task with some links. 
[14:29:36] <revi>	 thanks
[14:29:36] <halfak>	 Oh!  nevermind.  It's there. http://labels.wmflabs.org/campaigns/kowiki/50/?campaign=stats
[14:29:52] <revi>	 yeah, it's just raw data so I need to know to how to interpret that
[14:30:00] <revi>	 meh
[14:30:02] <revi>	 ignore grammart
[14:30:04] <revi>	 grammar*
[14:30:12] <halfak>	 So the proportion of completed work is = labels / (tasks * labels_per_task)
[14:30:25] <halfak>	 I see 610 / (7186 * 1)
[14:30:41] <halfak>	 So, 8.5% done :) 
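
The arithmetic halfak walks through above can be sketched in a few lines of Python. This is only an illustration: the function name `completion_proportion` is made up for this sketch and is not part of Wikilabels; the numbers are the ones quoted from the kowiki campaign stats.

```python
def completion_proportion(labels: int, tasks: int, labels_per_task: int = 1) -> float:
    """Fraction of a labeling campaign that is finished.

    Hypothetical helper: proportion = labels / (tasks * labels_per_task).
    """
    total_needed = tasks * labels_per_task
    if total_needed <= 0:
        raise ValueError("campaign must require at least one label")
    return labels / total_needed


# Figures quoted in the chat for the kowiki campaign.
proportion = completion_proportion(labels=610, tasks=7186, labels_per_task=1)
print(f"{proportion:.1%} done")
```

With `labels_per_task` above 1 (several reviewers per revision), the denominator grows accordingly, so the same label count means a smaller share of the work is done.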
[14:31:02] <halfak>	 Check out https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
[14:31:02] <revi>	 about 4 days of work, not that low compared to low participation of kowiki users :P
[14:31:16] <halfak>	 That's a lot of work :) 
[14:31:27] <revi>	 heh
[14:31:33] <halfak>	 7186 is a lot of revisions to review.  But it'll be worth it once the model is trained :) 
[14:36:45] <halfak>	 revi, in my estimation, it should take about 5 minutes to review 50 revisions.  Is it taking you a lot longer than that
[14:36:46] <halfak>	 ?
[14:36:57] <revi>	 Well, about 5 to 8
[14:37:08] <revi>	 when I'm unsure I check the revision history
[14:37:14] <revi>	 so it's taking about 3 minutes
[14:37:22] <revi>	 (just for my curiosity, too)
[14:45:00] <halfak>	 Gotcha.  Agreed that I do that too.  5 minutes is based on an as-fast-as-I-can strategy. 
[14:45:34] <revi>	 especially if that behavior warrants a block :D
[14:45:50] <revi>	 btw, I once saw a revdeleted revision in a campaign
[14:46:10] <revi>	 Of course I can see the revision with my sysop hat on, but others can't
[14:46:42] <halfak>	 revi, indeed.  For those revisions, I recommend that people mark whatever they like because ORES can't see it either. 
[14:46:47] <halfak>	 ORES is unpriv'd 
[14:47:05] <halfak>	 So when it comes to training, we regretfully skip rev-deleted revisions. 
[14:47:06] <revi>	 maybe we should have SKIP button?
[14:47:09] <revi>	 oh.
[14:47:17] <revi>	 great then
[14:47:21] <revi>	 I'll tell them to do that
[14:48:06] <halfak>	 There's an "abandon" button but then that means the revision will be sent back into the set and be assigned to someone else. 
[14:48:48] <halfak>	 We have a task somewhere for rendering a big "skip" button in the diff space when this happens, but that'll be very complicated to implement and configure -- so it hasn't been picked up yet. 
[14:49:12] <revi>	 https://phabricator.wikimedia.org/T161102 This one? :D
[14:58:00] <halfak>	 right :) 
[16:28:56] <wikibugs_>	 Revision-Scoring-As-A-Service, ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3141821 (Pchelolo) Any progress on this? We need to make some changes to #changeprop ORES config to prepare for DC switchover and wanted to do it simultaneously with using the new endpoint.
[16:29:44] <wikibugs>	 Revision-Scoring-As-A-Service, ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3154222 (Halfak) We haven't set a date yet, but I expect that we could have this resolved by the end of the week.
[16:30:36] <wikibugs>	 Revision-Scoring-As-A-Service, ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3154224 (Pchelolo) Awesome, that gives us enough time before the DC switchover. Perfect timing :)
[18:56:48] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, ORES: ORES API sandbox doesn't work - https://phabricator.wikimedia.org/T162184#3154700 (Pchelolo)
[21:03:40] <RoanKattouw>	 halfak: When is the next time you're planning to do an ORES deployment and model rebuild etc? IOW, if I submitted a patch for additional test stats today, when could I see them in prod?
[21:03:56] <RoanKattouw>	 (The Phab chatter above about "end of this week" made me hopeful)
[21:04:26] <halfak>	 RoanKattouw, it would be a big deploy if we were pushing out new models like that, but I believe we could aim for a Thursday deployment. 
[21:04:36] <RoanKattouw>	 Awesome
[21:04:51] <halfak>	 I can't guarantee it'll happen though.  I'm cramming on annual planning and a paper this week :S
[21:05:27] <RoanKattouw>	 Yeah I feel ya, I'm putting off a bunch of things this week to work on this ORES stuff
[21:06:22] <RoanKattouw>	 It's a chain of... focusing? :)
[21:15:01] <halfak>	 yeah... something like that :) 
[21:38:22] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3155348 (Arthur2e5) Apparently OAuth is broken with `TypeError: mw.Uri is not a constructor...
[21:52:40] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3155375 (Catrope)
[21:53:47] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3142283 (Catrope) I've removed the recall asks for now because it doesn't look like we'...
[22:20:37] <wikibugs_>	 Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3155441 (Catrope)
[22:29:13] <halfak>	 OK so I have an idea that I'm going to brain about real quick. 
[22:29:31] <halfak>	 We should split test_stats from thresholds in revscoring. 
[22:30:40] <halfak>	 thresholds are proving very useful to the ERI development team (e.g. RoanKattouw) as we had expected.  Threshold statistics are currently included with basic stats (like ROC-AUC) in test_stats.
[22:31:30] <halfak>	 But thresholds look different.  For example, we'll configure many different "filter_rate_at_recall" stats because we'd like to set different thresholds at different recall levels. 
[22:32:04] <halfak>	 So I think we should create a new thing called a "threshold". 
[22:32:19] <RoanKattouw>	 Don't mind me bloating your Makefile with all sorts of different recall_at_precision() stats... ;)
[22:32:48] <halfak>	 :D  that's a thing too.  If we want that many stats, we should have them.  However, they shouldn't be considered bloating. 
[22:33:12] <RoanKattouw>	 Are you suggesting that the APIs should be separate so I can get thresholds without ROC-AUC noise and vice versa?
[22:33:13] <halfak>	 I don't mind them in the makefile, but I think the JSON response in ORES is overwhelming to read through and I'd rather it wasn't. 
[22:33:20] <RoanKattouw>	 Right
[22:33:28] <halfak>	 Vice versa. 
[22:33:42] <halfak>	 I want ROC-AUC to be easy to get separate from the thresholds. 
[22:34:20] <halfak>	 The cool thing about "thresholds" is that it makes sense to set them manually, or tie them to a statistic.  Or ask them to optimize some statistic. 
[22:35:05] <halfak>	 OK, idea 2.  There's a thing called "thresholds" in the fitness metric methods as well. 
[22:35:22] <halfak>	 This stores a set of thresholds at which the model will make predictions with a few nice guarantees. 
[22:35:53] <RoanKattouw>	 What kind of guarantees?
[22:36:01] <RoanKattouw>	 (BTW, https://github.com/wiki-ai/editquality/pull/63 )
[22:36:19] <halfak>	 Not too long of a vector, but adequately covers the real values a model produces. 
[22:36:41] <halfak>	 This is complex because a model's score is not guaranteed to be normally or uniformly distributed. 
[22:36:48] <RoanKattouw>	 Right
[22:37:17] <halfak>	 So, we store all the reasonable thresholds in some sort of array along with all of the 4 interesting test statistics. 
[22:38:07] <halfak>	 So a user could optimize after-the-fact.
[22:39:06] <halfak>	 So at every threshold, we store precision, recall, and filter_rate
[22:39:17] <halfak>	 You optimize how you like with the client. 
[22:39:21] <RoanKattouw>	 Ooh, that sounds great
[22:39:23] <halfak>	 :) 
[22:39:38] <halfak>	 I think it'll make it all easier. 
[22:39:45] <RoanKattouw>	 Are you saying we'd get that info for every threshold at an interval like 0.1 or 0.05, subject to excluding ranges that the model doesn't really ever reach?
[22:40:08] <halfak>	 Basically, yeah. 
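
A minimal sketch of the idea discussed above, assuming a binary classifier whose scores lie in [0, 1]: build one row per threshold with precision, recall and filter_rate, and drop thresholds the model never reaches. Everything here is illustrative, not revscoring's implementation: `threshold_table`, the fixed `step`, and the sample data are hypothetical, and filter_rate is assumed to mean the share of observations that are not flagged at that threshold.

```python
from typing import Dict, List, Sequence


def threshold_table(scores: Sequence[float], labels: Sequence[bool],
                    step: float = 0.05) -> List[Dict[str, float]]:
    """Per-threshold precision, recall and filter_rate (hypothetical helper)."""
    n_total = len(scores)
    n_positive = sum(labels)
    if n_total == 0 or n_positive == 0:
        raise ValueError("need at least one observation and one positive label")

    rows: List[Dict[str, float]] = []
    previous_flagged = None
    for i in range(round(1 / step) + 1):
        threshold = round(i * step, 10)
        flagged = [score >= threshold for score in scores]
        n_flagged = sum(flagged)
        # Skip thresholds the model never reaches (nothing flagged) and
        # thresholds that flag exactly the same set as the previous row.
        if n_flagged == 0 or n_flagged == previous_flagged:
            continue
        previous_flagged = n_flagged
        true_positives = sum(f and l for f, l in zip(flagged, labels))
        rows.append({
            "threshold": threshold,
            "precision": true_positives / n_flagged,
            "recall": true_positives / n_positive,
            # Assumption: share of observations NOT flagged at this threshold.
            "filter_rate": 1 - n_flagged / n_total,
        })
    return rows


# Made-up scores and labels, only to show the shape of the output.
scores = [0.05, 0.1, 0.2, 0.35, 0.6, 0.8, 0.9, 0.95]
labels = [False, False, False, True, False, True, True, True]
table = threshold_table(scores, labels, step=0.25)
for row in table:
    print(row)
```

A client that receives a table like this can pick whatever trade-off it needs without asking for a new named statistic.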
[22:40:16] <RoanKattouw>	 Oh that would be excellent
[22:40:38] <halfak>	 So, in the short term, RoanKattouw, I suggest you continue to do what you are doing. 
[22:40:43] <RoanKattouw>	 I would basically not have to bother you ever again, and would have enough granularity to do all sorts of things
[22:41:36] <halfak>	 \o/ better for everyone.  Except it's fun to have you join us in this channel. :D
[22:42:01] <halfak>	 Anyway, I think you should continue to propose a mess of test stats and in the meantime, I'm going to try to put this idea together in some tasks. 
[22:42:07] <RoanKattouw>	 Excellent
[22:42:26] <RoanKattouw>	 halfak: My pull request for my mess of test stats: https://github.com/wiki-ai/editquality/pull/63
[22:42:42] <RoanKattouw>	 Also I suppose the whole precision_at_recall() thing would be unnecessary too
[22:43:01] <RoanKattouw>	 If we have all 4 interesting metrics at a sufficiently granular interval, we can do that kind of thing on the client
[22:47:43] <halfak>	 Right.  And it should be pretty easy. 
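
To illustrate RoanKattouw's point that precision_at_recall() becomes a client-side lookup, here is a rough sketch. The table format (a list of dicts with `threshold`, `precision`, `recall`, `filter_rate`) and the numbers in it are assumptions made for this example, not an actual ORES response.

```python
from typing import Dict, List, Optional

Row = Dict[str, float]

# Hypothetical per-threshold table, as a client might receive it.
TABLE: List[Row] = [
    {"threshold": 0.0, "precision": 0.5, "recall": 1.0, "filter_rate": 0.0},
    {"threshold": 0.25, "precision": 0.8, "recall": 1.0, "filter_rate": 0.375},
    {"threshold": 0.5, "precision": 0.75, "recall": 0.75, "filter_rate": 0.5},
    {"threshold": 0.75, "precision": 1.0, "recall": 0.75, "filter_rate": 0.625},
]


def precision_at_recall(table: List[Row], min_recall: float) -> Optional[Row]:
    """Row with the highest precision among rows meeting the recall target."""
    eligible = [row for row in table if row["recall"] >= min_recall]
    if not eligible:
        return None  # no threshold reaches the requested recall
    return max(eligible, key=lambda row: row["precision"])


best = precision_at_recall(TABLE, min_recall=0.9)
print(best)
```

The same pattern works for recall_at_precision or any other "X at Y" statistic by swapping the two keys.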
[22:48:45] <halfak>	 I'm thinking it will even be easy to have conditionals for your test statistics.  E.g. Give me the best tradeoff between precision and recall between 85% and 95% recall. 
[22:49:09] <halfak>	 That would let you take advantage of minor quirks in the model's scoring range. 
[22:49:49] <halfak>	 Usually, when recall jumps with a minor loss in precision, that's worth it.
[22:50:01] <halfak>	 Or vice versa
[22:50:58] <RoanKattouw>	 That's actually exactly the kind of stuff that Joe has been writing
[22:51:47] <RoanKattouw>	 Like, target a recall of 90%, but with a minimum precision of N%, but don't let recall get below 80%
[22:52:07] <RoanKattouw>	 Written in the form of rules like "if recall > 80%, go in the direction of more recall"
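
One possible reading of the rules RoanKattouw describes, sketched against a per-threshold table. This is an interpretation, not Joe's actual rules: `pick_threshold`, its parameter names, the fallback order, and the sample table are all hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, float]


def pick_threshold(table: List[Row], target_recall: float = 0.90,
                   recall_floor: float = 0.80,
                   min_precision: float = 0.5) -> Optional[Row]:
    """Aim for target_recall with at least min_precision; never accept a
    recall below recall_floor. Returns None when no row qualifies."""
    precise_enough = [r for r in table if r["precision"] >= min_precision]

    # First choice: rows that hit the recall target; keep the most precise.
    on_target = [r for r in precise_enough if r["recall"] >= target_recall]
    if on_target:
        return max(on_target, key=lambda r: r["precision"])

    # Fallback: stay above the recall floor and lean toward more recall.
    fallback = [r for r in precise_enough if r["recall"] >= recall_floor]
    if fallback:
        return max(fallback, key=lambda r: r["recall"])
    return None


# Made-up table for illustration only.
SAMPLE: List[Row] = [
    {"threshold": 0.2, "precision": 0.40, "recall": 0.97},
    {"threshold": 0.4, "precision": 0.55, "recall": 0.92},
    {"threshold": 0.6, "precision": 0.70, "recall": 0.84},
    {"threshold": 0.8, "precision": 0.90, "recall": 0.60},
]

print(pick_threshold(SAMPLE, min_precision=0.5))
print(pick_threshold(SAMPLE, min_precision=0.6))
```

With a precision minimum of 0.5 the target recall is reachable; raising the minimum to 0.6 forces the fallback, which settles on the row that still clears the 80% recall floor.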
[22:52:33] <halfak>	 :) 
[22:52:43] <halfak>	 I'm just copy-pasting our chat into phab cards. 
[22:52:48] <halfak>	 This is great. 
[22:55:03] <RoanKattouw>	 Awesome
[22:55:17] <RoanKattouw>	 So... the dreaded question: how long would this take? :P
[23:00:44] <halfak>	 RoanKattouw, depends on if I get excited about it this weekend or a weekend 3 months from now :) 
[23:00:53] <RoanKattouw>	 Ha good point
[23:08:06] <wikibugs>	 Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3155623 (Catrope) https://github.com/wiki-ai/editquality/pull/63
[23:31:02] <RoanKattouw>	 halfak: Shorter term, you said you could rebuild the models with my new stats tonight, when would they finish building?
[23:32:14] <halfak>	 Maybe tomorrow. :)  Assuming we didn't mess anything up when updating the file in the meantime. :) 
[23:32:45] <halfak>	 Oh man.  That's another benefit.  We won't need to rebuild models to incorporate new threshold-level test statistics. 
[23:37:31] <RoanKattouw>	 Right :)