[01:25:00] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (Arthur2e5)
[01:25:56] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152849 (Arthur2e5)
[01:27:32] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (Arthur2e5) Subscribing shizhao as he participated i...
[01:31:54] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Accelerator keys in scoring webpage - https://phabricator.wikimedia.org/T162109#3152856 (Arthur2e5)
[01:33:55] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Please pull and deploy the latest version of zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152875 (Liuxinyu970226) @Ladsgroup It seems that your previ...
[01:35:23] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Accelerator keys in scoring gadget - https://phabricator.wikimedia.org/T162109#3152886 (Arthur2e5)
[01:35:59] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152887 (Arthur2e5)
[01:37:39] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (Arthur2e5) I believe I am seeing simplified characters right now. I am currently u...
[01:45:12] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152900 (Arthur2e5) Confirming that both https://github.com/wiki-ai/wikilabels-wmflabs-depl...
[01:51:12] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152906 (Arthur2e5) Submitted https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/34.
[01:52:23] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152908 (Arthur2e5)
[02:01:32] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Accelerator keys in scoring gadget - https://phabricator.wikimedia.org/T162109#3152913 (Arthur2e5)
[02:17:20] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152836 (revi) It takes time for translations from translatewiki.net to the site, so you'll...
[03:56:51] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152940 (Arthur2e5) Open>Resolved a:Arthur2e5 Looks like fixed with that PR. Cl...
[04:00:21] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3152950 (Arthur2e5) Resolved>Open Patience patience... Deletion not yet in effect f...
[13:25:17] o/
[13:30:43] halfak: o/
[13:30:48] good to see you back
[13:32:45] Yup. had a weird, trippy evening last night fighting my jet lag, but I think I'm only slightly off kilter today. :| :)
[13:33:50] halfak: oh! turns out your thesis advisor's tips to fight jet lag work, huh?
[13:34:15] heh. yeah. worked great for the trip out. Still have to see how I'll fare for this trip back home. :)
[13:34:30] FYI: https://en.wikipedia.org/wiki/John_T._Riedl
[13:35:22] halfak: yeah I know you were advised by him. I think he was the author of actionable model of Wikipedia.
The paper that you gave me a long time ago :)
[13:35:39] oh do you mean you haven't reached home yet?
[13:35:45] home = Minnesota
[13:37:30] Oh! I'm home, but it's just the first morning. I might yet have some terrible jet lag today. :)
[13:40:42] haha okay
[14:13:39] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Damaging levels on Polish Wikipedia overlap too much - https://phabricator.wikimedia.org/T161655#3153878 (Halfak) > I think ["Likely bad"] should be more strongl...
[14:14:31] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3153879 (Halfak) OK. This can be done. Adding a precision_at_recall metric would be a...
[14:24:47] Revision-Scoring-As-A-Service, revscoring: Add common set of statistics to all threshold-based test-statistics - https://phabricator.wikimedia.org/T162150#3153900 (Halfak)
[14:26:36] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete editquality campaign for Korean Wikipedia - https://phabricator.wikimedia.org/T161627#3153907 (Halfak) @revi, a status update would be a post somewhere on the wiki that pings all of the participants to announce progress....
[14:27:15] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Complete editquality campaign for Korean Wikipedia - https://phabricator.wikimedia.org/T161627#3153908 (Halfak) See https://en.wikipedia.org/wiki/Wikipedia_talk:Labels/Edit_quality/Archive_2015#Status_update:_May_11th for an exampl...
[14:29:00] halfak: can you guide me how to read the stats?
[14:29:20] I think task is the max number of revisions to tag, right?
[14:29:24] Sure. Let me update the task with some links.
[14:29:36] thanks
[14:29:36] Oh! nevermind. It's there.
http://labels.wmflabs.org/campaigns/kowiki/50/?campaign=stats
[14:29:52] yeah, it's just raw data so I need to know to how to interpret that
[14:30:00] meh
[14:30:02] ignore grammart
[14:30:04] grammar*
[14:30:12] So the proportion of completed work = labels / (tasks * labels_per_task)
[14:30:25] I see 610 / (7186 * 1)
[14:30:41] So, 8.5% done :)
[14:31:02] Check out https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
[14:31:02] about 4 days of work, not that low compared to low participation of kowiki users :P
[14:31:16] That's a lot of work :)
[14:31:27] heh
[14:31:33] 7186 is a lot of revisions to review. But it'll be worth it once the model is trained :)
[14:36:45] revi, in my estimation, it should take about 5 minutes to review 50 revisions. Is it taking you a lot longer than that
[14:36:46] ?
[14:36:57] Well, about 5 to 8
[14:37:08] when I'm unsure I check the revision history
[14:37:14] so it's taking about 3 minutes
[14:37:22] (just for my curiosity, too)
[14:45:00] Gotcha. Agreed that I do that too. 5 minutes is based on an as-fast-as-I-can strategy.
[14:45:34] especially if that behavior warrants a block :D
[14:45:50] btw, I once saw a revdeleted revision in a campaign
[14:46:10] Of course I can see the revision with my sysop hat on, but others can't
[14:46:42] revi, indeed. For those revisions, I recommend that people mark whatever they like because ORES can't see it either.
[14:46:47] ORES is unpriv'd
[14:47:05] So when it comes to training, we regretfully skip rev-deleted revisions.
[14:47:06] maybe we should have SKIP button?
[14:47:09] oh.
[14:47:17] great then
[14:47:21] I'll tell them to do that
[14:48:06] There's an "abandon" button but then that means the revision will be sent back into the set and be assigned to someone else.
[14:48:48] We have a task somewhere for rendering a big "skip" button in the diff space when this happens, but that'll be very complicated to implement and configure -- so it hasn't been picked up yet.
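For reference, the completion arithmetic from [14:30:12] can be written out as a small helper. This is only a sketch: the function name `completion_rate` is made up for illustration and is not part of Wikilabels; the three inputs are the `labels`, `tasks`, and `labels_per_task` values quoted from the kowiki campaign stats above.

```python
def completion_rate(labels: int, tasks: int, labels_per_task: int) -> float:
    """Proportion of labeling work done: labels / (tasks * labels_per_task).

    Hypothetical helper for illustration; not a Wikilabels API.
    """
    total_needed = tasks * labels_per_task
    if total_needed <= 0:
        raise ValueError("tasks and labels_per_task must be positive")
    return labels / total_needed


# Numbers quoted in the chat for the kowiki campaign.
rate = completion_rate(labels=610, tasks=7186, labels_per_task=1)
print(f"{rate:.1%} done")  # prints "8.5% done"
```

With `labels_per_task` greater than 1 the denominator grows accordingly, so the same 610 labels would count for a smaller share of the campaign.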
[14:49:12] https://phabricator.wikimedia.org/T161102 This one? :D
[14:58:00] right :)
[16:28:56] Revision-Scoring-As-A-Service, ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3141821 (Pchelolo) Any progress on this? We need to make some changes to #changeprop ORES config to prepare for DC switchover and wanted to do it simultaneously with using the new endpoint.
[16:29:44] Revision-Scoring-As-A-Service, ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3154222 (Halfak) We haven't set a date yet, but I expect that we could have this resolved by the end of the week.
[16:30:36] Revision-Scoring-As-A-Service, ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3154224 (Pchelolo) Awesome, that gives us enough time before the DC switchover. Perfect timing :)
[18:56:48] Revision-Scoring-As-A-Service-Backlog, ORES: ORES API sandbox doesn't work - https://phabricator.wikimedia.org/T162184#3154700 (Pchelolo)
[21:03:40] halfak: When is the next time you're planning to do an ORES deployment and model rebuild etc? IOW, if I submitted a patch for additional test stats today, when could I see them in prod?
[21:03:56] (The Phab chatter above about "end of this week" made me hopeful)
[21:04:26] RoanKattouw, it would be a big deploy if we were pushing out new models like that, but I believe we could aim for a Thursday deployment.
[21:04:36] Awesome
[21:04:51] I can't guarantee it'll happen though. I'm cramming on annual planning and a paper this week :S
[21:05:27] Yeah I feel ya, I'm putting off a bunch of things this week to work on this ORES stuff
[21:06:22] It's a chain of... focusing? :)
[21:15:01] yeah...
something like that :)
[21:38:22] Revision-Scoring-As-A-Service-Backlog, Deployment-Systems, Wikilabels, Chinese-Sites, I18n: Deploy latest zh-hans/hant translations for Wikilabels on wmflabs - https://phabricator.wikimedia.org/T162108#3155348 (Arthur2e5) Apparently OAuth is broken with `TypeError: mw.Uri is not a constructor...
[21:52:40] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3155375 (Catrope)
[21:53:47] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3142283 (Catrope) I've removed the recall asks for now because it doesn't look like we'...
[22:20:37] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3155441 (Catrope)
[22:29:13] OK so I have an idea that I'm going to brain about real quick.
[22:29:31] We should split test_stats from thresholds in revscoring.
[22:30:40] thresholds are proving very useful to the ERI development team (e.g. RoanKattouw) as we had expected. Threshold statistics are currently included with basic stats (like ROC-AUC) in test_stats.
[22:31:30] But thresholds look different. For example, we'll configure many different "filter_rate_at_recall" stats because we'd like to set different thresholds at different recall levels.
[22:32:04] So I think we should create a new thing called a "threshold".
[22:32:19] Don't mind me bloating your Makefile with all sorts of different recall_at_precision() stats... ;)
[22:32:48] :D that's a thing too. If we want that many stats, we should have them.
However, they shouldn't be considered bloating.
[22:33:12] Are you suggesting that the APIs should be separate so I can get thresholds without ROC-AUC noise and vice versa?
[22:33:13] I don't mind them in the makefile, but I think the JSON response in ORES is overwhelming to read through and I'd rather it wasn't.
[22:33:20] Right
[22:33:28] Vice versa.
[22:33:42] I want ROC-AUC to be easy to get separate from the thresholds.
[22:34:20] The cool thing about "thresholds" is that it makes sense to set them manually, or tie them to a statistic. Or ask them to optimize some statistic.
[22:35:05] OK, idea 2. There's a thing called "thresholds" in the fitness metric methods as well.
[22:35:22] This stores a set of thresholds at which the model will make predictions with a few nice guarantees.
[22:35:53] What kind of guarantees?
[22:36:01] (BTW, https://github.com/wiki-ai/editquality/pull/63 )
[22:36:19] Not too long of a vector, but adequately covers the real values a model produces.
[22:36:41] This is complex because a model's score is not guaranteed to be normally or uniformly distributed.
[22:36:48] Right
[22:37:17] So, we store all the reasonable thresholds in some sort of array along with all of the 4 interesting test statistics.
[22:38:07] So a user could optimize after-the-fact.
[22:39:06] So at every threshold, we store precision, recall, and filter_rate
[22:39:17] You optimize how you like with the client.
[22:39:21] Ooh, that sounds great
[22:39:23] :)
[22:39:38] I think it'll make it all easier.
[22:39:45] Are you saying we'd get that info for every threshold at an interval like 0.1 or 0.05, subject to excluding ranges that the model doesn't really ever reach?
[22:40:08] Basically, yeah.
[22:40:16] Oh that would be excellent
[22:40:38] So, in the short term, RoanKattouw, I suggest you continue to do what you are doing.
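As a rough illustration of the idea floated between [22:37:17] and [22:40:08], the per-threshold table could be built along these lines. Everything here is an assumption made for the sketch, not revscoring's actual implementation: the function names `candidate_thresholds` and `threshold_table` are hypothetical, the candidate thresholds are a fixed grid trimmed to the range of scores the model actually produced, an item counts as flagged when its score is at or above the threshold, and `filter_rate` is taken to mean the share of items that are not flagged.

```python
from typing import Dict, List, Optional, Sequence


def candidate_thresholds(scores: Sequence[float], step: float = 0.1) -> List[float]:
    """Grid of thresholds at `step` intervals, trimmed to the observed score range."""
    lo, hi = min(scores), max(scores)
    n_steps = round(1 / step)
    grid = [round(i * step, 10) for i in range(n_steps + 1)]
    return [t for t in grid if lo <= t <= hi]


def threshold_table(
    scores: Sequence[float],
    labels: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Dict[str, Optional[float]]]:
    """Precision, recall and filter_rate at each threshold (hypothetical sketch)."""
    total = len(scores)
    total_positive = sum(1 for y in labels if y)
    rows = []
    for t in thresholds:
        flagged = [s >= t for s in scores]  # assumption: flag when score >= threshold
        n_flagged = sum(1 for f in flagged if f)
        true_pos = sum(1 for f, y in zip(flagged, labels) if f and y)
        rows.append({
            "threshold": t,
            "precision": true_pos / n_flagged if n_flagged else None,
            "recall": true_pos / total_positive if total_positive else None,
            "filter_rate": 1 - n_flagged / total,  # assumption: share NOT flagged
        })
    return rows


# Toy data, invented for the example.
scores = [0.05, 0.2, 0.35, 0.6, 0.8, 0.95]
labels = [False, False, False, True, False, True]
table = threshold_table(scores, labels, candidate_thresholds(scores, step=0.25))
for row in table:
    print(row)
```

Because the grid is trimmed to the observed range, thresholds the model never reaches (here 0.0 and 1.0) are left out, which keeps the vector short as described at [22:36:19].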
[22:40:43] I would basically not have to bother you ever again, and would have enough granularity to do all sorts of things
[22:41:36] \o/ better for everyone. Except it's fun to have you join us in this channel. :D
[22:42:01] Anyway, I think you should continue to propose a mess of test stats and in the meantime, I'm going to try to put this idea together in some tasks.
[22:42:07] Excellent
[22:42:26] halfak: My pull request for my mess of test stats: https://github.com/wiki-ai/editquality/pull/63
[22:42:42] Also I suppose the whole precision_at_recall() thing would be unnecessary too
[22:43:01] If we have all 4 interesting metrics at a sufficiently granular interval, we can do that kind of thing on the client
[22:47:43] Right. And it should be pretty easy.
[22:48:45] I'm thinking it will even be easy to have conditionals for your test statistics. E.g. Give me the best tradeoff between precision and recall between 85% and 95% recall.
[22:49:09] That would let you take advantage of minor quirks in the model's scoring range.
[22:49:49] Usually, recall jumps with a minor loss in precision, that's worth it.
[22:50:01] Or vice versa
[22:50:58] That's actually exactly the kind of stuff that Joe has been writing
[22:51:47] Like, target a recall of 90%, but with a minimum precision of N%, but don't let recall get below 80%
[22:52:07] Written in the form of rules like "if recall > 80%, go in the direction of more recall"
[22:52:33] :)
[22:52:43] I'm just copy-pasting our chat into phab cards.
[22:52:48] This is great.
[22:55:03] Awesome
[22:55:17] So... the dreaded question: how long would this take?
:P
[23:00:44] RoanKattouw, depends on if I get excited about it this weekend or a weekend 3 months from now :)
[23:00:53] Ha good point
[23:08:06] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Add more values to test_stats - https://phabricator.wikimedia.org/T161767#3155623 (Catrope) https://github.com/wiki-ai/editquality/pull/63
[23:31:02] halfak: Shorter term, you said you could rebuild the models with my new stats tonight, when would they finish building?
[23:32:14] Maybe tomorrow. :) Assuming we didn't mess anything up when updating the file in the meantime. :)
[23:32:45] Oh man. That's another benefit. We won't need to rebuild models to incorporate new test threshold-level statistics.
[23:37:31] Right :)
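To make the client-side optimization mentioned at [22:48:45] concrete, here is one possible reading of "the best tradeoff between precision and recall between 85% and 95% recall". It is a sketch under stated assumptions: `best_threshold` is a hypothetical name, "best" is taken to mean highest precision within the recall window (ties broken by higher recall), and the rows are invented numbers standing in for whatever the service would publish per threshold.

```python
from typing import Dict, List, Optional

Row = Dict[str, float]


def best_threshold(rows: List[Row], min_recall: float, max_recall: float = 1.0) -> Optional[Row]:
    """Pick the row with the highest precision whose recall lies in the window.

    Hypothetical client-side rule; returns None when no threshold qualifies.
    """
    eligible = [r for r in rows if min_recall <= r["recall"] <= max_recall]
    if not eligible:
        return None
    # Assumption: prefer precision first, then recall as a tie-breaker.
    return max(eligible, key=lambda r: (r["precision"], r["recall"]))


# Invented per-threshold statistics, for illustration only.
rows = [
    {"threshold": 0.1, "precision": 0.20, "recall": 0.99, "filter_rate": 0.30},
    {"threshold": 0.3, "precision": 0.35, "recall": 0.93, "filter_rate": 0.60},
    {"threshold": 0.5, "precision": 0.50, "recall": 0.88, "filter_rate": 0.75},
    {"threshold": 0.7, "precision": 0.70, "recall": 0.70, "filter_rate": 0.90},
]

choice = best_threshold(rows, min_recall=0.85, max_recall=0.95)
print(choice)
```

A rule like the one at [22:51:47] (target 90% recall, require a minimum precision, never drop below 80% recall) could be layered on the same table by filtering on both columns before taking the maximum, with no model rebuild needed.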