[00:05:52] halfak: Ooh nice. And yes, !precision and !recall will be quite helpful
[01:06:32] Revision-Scoring-As-A-Service-Backlog, Collaboration-Team-Triage, Edit-Review-Improvements-RC-Page, ORES: Make presence and targets of ORES filters configurable - https://phabricator.wikimedia.org/T162760#3173928 (Catrope)
[01:06:42] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017): Make presence and targets of ORES filters configurable - https://phabricator.wikimedia.org/T162760#3173941 (Catrope)
[15:26:54] o/
[15:31:02] RoanKattouw, I'm working on the thresholds. What do you think of having 200 unique thresholds to choose from?
[15:31:34] halfak: That would be amazing
[15:31:45] There'd be no guarantee that any score or stat range is covered, but there would be a guarantee that the available score range would be well represented.
[15:31:57] Originally I wondered if I dared ask for 40
[15:32:08] But 200 would give good coverage
[15:32:18] I figure that 200 should cover everything from setting thresholds to plotting the precision-recall curve.
[15:32:47] You don't really need to do any fancy math tricks to ensure that you have good coverage at that point, you're just brute forcing it
[15:32:58] Yes I also want to plot that curve
[15:33:07] Na. I still need to. Could have 2k thresholds no problem
[15:33:12] Or maybe even 10k
[15:33:20] Ha
[15:33:29] Well you won't hear me complain
[15:33:34] And they could all be distributed between 0.5 and 0.55
[15:33:42] Or 1 and 1000
[15:33:50] Since they aren't necessarily a probability estimate.
[15:33:59] Right
[15:34:46] But if they're all between 0.5 and 0.55, we have a good reason for that based on some mathematical guarantees, right?
[15:36:08] Oh, I'm just thinking of the score for that. Hopefully, the precision, recall, etc. stats will cover a much wider range :)
[15:36:11] No guarantees.
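The brute-force idea in this exchange (take up to 200 unique thresholds from the scores that actually occur, then compute precision and recall at each one) might look roughly like the sketch below. It is only an illustration, not the revscoring implementation: `pick_thresholds` and `precision_recall_at` are hypothetical names, and the toy scores and labels are made up.

```python
def pick_thresholds(scores, max_thresholds=200):
    """Return up to max_thresholds unique observed scores, evenly spaced by rank."""
    if max_thresholds < 2:
        raise ValueError("max_thresholds must be at least 2")
    unique = sorted(set(scores))
    n = len(unique)
    if n <= max_thresholds:
        return unique
    # Integer index arithmetic keeps the first and last observed score
    # and never picks the same index twice when n > max_thresholds.
    return [unique[i * (n - 1) // (max_thresholds - 1)]
            for i in range(max_thresholds)]


def precision_recall_at(scores, labels, threshold):
    """Precision and recall when 'score >= threshold' counts as a positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else None  # undefined with no positives predicted
    recall = tp / (tp + fn) if tp + fn else None     # undefined with no true positives at all
    return precision, recall


# Made-up scores and labels, only to exercise the functions.
scores = [0.1, 0.4, 0.35, 0.8, 0.8, 0.65]
labels = [False, False, True, True, True, False]

thresholds = pick_thresholds(scores)
curve = [(t, *precision_recall_at(scores, labels, t)) for t in thresholds]
```

Because the thresholds come from the observed scores, they cover whatever range the model emits (0.5 to 0.55, or 1 to 1000), which matches the point that the score need not be a probability estimate.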
[15:36:15] But a lot of likelihood
[18:19:13] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, User-Ladsgroup, User-notice-collaboration: Deploy ORES Review Tool for hewiki - https://phabricator.wikimedia.org/T161621#3176003 (Catrope) Open>Resolved
[18:30:42] Right
[18:34:26] Also, out of curiosity, what's the status of the frwiki labels campaign?
[18:34:39] The language in it looks a bit .... Google Translate-y
[18:34:59] Also, I swear I once saw a graph of a campaign indicating progress to completion but now I can't find it
[18:35:59] Oh looks like the frwiki labeling campaign was started 11 months(!) ago?
[18:38:26] RoanKattouw, yeah. It's one of our oldest campaigns.
[18:38:39] Still no one to shepherd it along
[18:38:41] RCFilters launched on frwiki yesterday
[18:38:49] And now suddenly people are wondering why they don't have ORES filters :)
[18:38:56] lol
[18:38:59] Label stuff!
[18:39:00] :)
[18:39:02] It doesn't help that we forgot to update the introductory message to remove mentions of ORES for non-ORES wikis
[18:39:07] Yeah!
[18:39:23] Can you direct people to the labeling campaign when responding?
[18:39:33] I will remind Benoît to
[18:39:42] There is a documentation page that I think does that
[18:41:08] Looks like https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Label/Qualit%C3%A9_des_%C3%A9ditions was google-translated by one of our Turkish/English speakers.
[18:41:33] en-n, tr-4, az-2, ja-1, fr-0 :)
[18:41:37] https://fr.wikipedia.org/wiki/Utilisateur:%E3%81%A8%E3%81%82%E3%82%8B%E7%99%BD%E3%81%84%E7%8C%AB
[18:43:23] Well that name is actually a bit better
[18:43:44] "Qualité des éditions" at least tries to be "quality of the edits", though I'm not sure if it means that or "quality of the editions", I'm only like fr-1.5
[18:43:53] But in WikiLabels itself, the title is something else
[18:44:25] -._o_.-
[18:44:28] "Modifier la qualité", which means "edit the quality", with "edit" being a verb/imperative
[18:44:45] Anyway, I guess this proves you don't have a local champion on frwiki
[18:45:01] Because if you did, this kind of stuff would have been fixed already
[18:45:23] which is why I wondered how far along that campaign even is, because I figured it couldn't be that great
[19:01:03] We can always change that title if someone wants to give me a nice string.
[19:01:15] Regretfully, that's not one of the things that's on translatewiki.net.
[19:07:21] Hmm
[19:07:34] Yeah I'm writing an email where I tell Benoît that there's a campaign that needs love
[19:08:26] Revision-Scoring-As-A-Service-Backlog, ORES, Operations, Services (done), User-mobrovac: [spec] Active-active setup for ORES across datacenters (eqiad, codfw) - https://phabricator.wikimedia.org/T159615#3176179 (Pchelolo) Open>Resolved a:Pchelolo The prefacing rule is now updating...
[19:08:26] Revision-Scoring-As-A-Service-Backlog, Operations, Patch-For-Review: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#3176183 (Pchelolo)
[19:09:36] halfak: for when you have some free time
[19:09:36] https://github.com/wiki-ai/editquality/pull/65
[19:09:53] I'll get to the POST thingy right after I'm done with some stuff here
[20:34:52] * halfak clicks
[20:35:01] {{merged}}
[20:35:02] [1] https://meta.wikimedia.org/wiki/Template:merged
[20:35:05] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017): Tweak ORES Preferences for Watchlist and RC Page ahead of next release - https://phabricator.wikimedia.org/T162831#3176394 (jmatazzoni)
[20:38:40] halfak: Thanks
[20:38:58] I'll make a new campaign soon
[20:44:36] Great :)
[20:48:07] legoktm: congrats on getting Linter deployed everywhere
[20:48:08] <3
[20:48:13] thanks :D
[21:01:16] RoanKattouw, https://github.com/wiki-ai/revscoring/pull/307
[21:01:20] progress on thresholds.
[21:01:30] Scaling has been a pain, but I think I have the math worked out.
[21:01:45] Currently, we're not doing scaling so this'll be a useful feature.
[21:02:08] Scaling affects precision but not recall.
[21:03:18] Amir1, given our sampling strategy, we'll need to reverse engineer how the rate of damaging edits in our train/test set differs from the general population.
[21:04:01] e.g. in Wikidata where we went with a balanced reverted/not-reverted dataset
[21:05:06] Wikidata has been a pain in the sampling
[21:05:28] right.
[21:08:57] Yay
[21:09:15] Scaling meaning that the score value will (roughly) correspond to the precision?
[21:09:49] RoanKattouw, not exactly. More that we can train the model on a biased dataset but know that we get unbiased threshold information.
[21:10:06] Aah OK
[21:10:08] E.g. train the model on damage/non-damage 50/50 even though damage isn't that common.
[21:11:14] Aha. nice
[21:33:46] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017): Tweak ORES Preferences for Watchlist and RC Page ahead of next release - https://phabricator.wikimedia.org/T162831#3176625 (Catrope) Why have two separate preferences fo...
[21:46:26] RoanKattouw, sorry to keep bugging you, but this is exciting. Check this out https://gist.github.com/halfak/a86f690b3a4d870e5e50b67f2a706cec
[21:46:47] It's all based on sample data that I'm using in my tests, but you can get a sense for what I expect thresholds to look like.
[21:50:03] Nice
[21:50:08] Ahm, that data must be fake, though, right?
[21:50:13] precision equals recall at all thresholds
[21:50:18] Yeah. fake data :)
[21:50:29] OK
[21:50:30] Well, fake test data, real test stats based on the fake test data.
[21:50:34] Just keeping me on my toes, I see ;)
[21:50:49] Oh so it's not a real model?
[21:50:56] Nope.
[21:51:05] Looks like I have an issue in !precision too
[21:51:07] OK
[21:53:11] This is really cool because it demonstrates the difference between accuracy and precision nicely :)
[22:02:40] halfak: any thoughts on how many articles I should sample for WP:MED? If I do <= 12 from each class in my confusion matrix, I get about 200
[22:03:06] Not sure what you mean by <= 12
[22:03:13] “up to 12”
[22:03:25] what confusion matrix?
[22:03:58] (sorry, context switching might have left me temporarily stupid)
[22:04:04] :S
[22:04:17] I built a model and had it predict all the articles in WP:MED, the resulting confusion matrix is 4x4, for each of the four importance ratings
[22:04:31] no worries, I see you’re slowly turning into a professor ;)
[22:05:35] Oh I see! So the plan was to show the articles where the model's prediction was very different from the article's assessment to WPMED-editors and have them comment, right?
[22:06:49] halfak: yep! What I’m wondering is whether I should grab a specific type of article (e.g. “Low-importance predicted as Top-importance”), or just sample across the board… also, I’m not planning on telling them what the prediction is, just ask them to reassess the importance
[22:07:55] Hmm... I'm kind of in favor of being straightforward with the prediction. After all, that's what we did with the re-assessment work and the article quality model.
[22:09:45] halfak: trwiki v2 is done now: http://labels.wmflabs.org/ui/trwiki/
[22:10:12] I think we need to contact the person and tell them to advertise the campaign in Turkish Wikipedia
[22:10:13] halfak: ok, maybe best to create four samples, one for each predicted importance class, then
[22:10:47] Amir1, great! I can do that. Thanks for getting that set up :)
[22:10:59] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality, User-Ladsgroup: Start v2 editquality campaign for trwiki - https://phabricator.wikimedia.org/T161977#3176730 (Ladsgroup) It's up now: http://labels.wmflabs.org/ui/trwiki/
[22:11:05] yw
[22:11:19] To the next action item, POST thingy
[22:17:37] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality, User-Ladsgroup: Start v2 editquality campaign for trwiki - https://phabricator.wikimedia.org/T161977#3176757 (Halfak) I've contacted Sakhalinio to let him know about the new labeling campaign.
[22:30:21] POSTed precaching is done too
[22:32:01] halfak: I don't have anything to do for now except tensorflow stuff. I'm calling it a day anyway but for tomorrow, can you find something for me to do?
[22:33:18] Sure!
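The WP:MED sampling plan from earlier in the log ("up to 12" articles per class of the 4x4 confusion matrix) could be sketched as below, reading "class" as one (assessed, predicted) cell, which is an assumption. With 16 cells and 12 articles each, that gives at most 192 articles, in line with the "about 200" mentioned. The function name `sample_per_cell`, the tuple layout, and the toy article list are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_cell(articles, max_per_cell=12, seed=0):
    """Sample up to max_per_cell titles from each (assessed, predicted) cell.

    `articles` is an iterable of (title, assessed_class, predicted_class).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    cells = defaultdict(list)
    for title, assessed, predicted in articles:
        cells[(assessed, predicted)].append(title)
    return {cell: rng.sample(titles, min(max_per_cell, len(titles)))
            for cell, titles in cells.items()}


# Made-up data: 400 articles spread evenly over the 16 cells (25 per cell).
classes = ["Top", "High", "Mid", "Low"]
articles = [(f"Article {i}", classes[i % 4], classes[(i // 4) % 4])
            for i in range(400)]

sample = sample_per_cell(articles)
total = sum(len(titles) for titles in sample.values())
```

Grouping the resulting cells by predicted class would give the "four samples, one for each predicted importance class" suggested at 22:10:13.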
[22:33:23] * halfak thinks
[22:40:26] Revision-Scoring-As-A-Service, revscoring: Add common set of statistics to all threshold-based test-statistics - https://phabricator.wikimedia.org/T162150#3176842 (Halfak) Open>declined Declining in favor of T162217
[22:41:12] Revision-Scoring-As-A-Service-Backlog, Bad-Words-Detection-System, revscoring, Bengali-Sites, User-Ladsgroup: Add language support for Bengali - https://phabricator.wikimedia.org/T162620#3176860 (Halfak) a:Ladsgroup
[22:41:42] Revision-Scoring-As-A-Service-Backlog, Bad-Words-Detection-System, revscoring, User-Ladsgroup: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3176862 (Halfak) a:Ladsgroup
[22:41:49] Revision-Scoring-As-A-Service-Backlog, ORES, User-Ladsgroup: ORES API sandbox doesn't work - https://phabricator.wikimedia.org/T162184#3176864 (Halfak) a:Ladsgroup
[22:42:43] Revision-Scoring-As-A-Service-Backlog, ORES, User-Ladsgroup: ORES API sandbox doesn't work - https://phabricator.wikimedia.org/T162184#3154700 (Halfak) I think the problem here is that we detect the protocol used in the URL and set that to the protocol available in swagger. For some reason, we get h...
[22:44:05] Amir1, I assigned a few things on the backlog. :)
[22:44:05] I'm heading out for the evening.
[22:44:05] Have a good one folks!
[22:44:05] o/
[22:44:14] me too
[22:44:16] bye
[22:44:18] o/
[22:48:29] Revision-Scoring-As-A-Service, revscoring: Implement "thresholds", deprecate "pile of tests_stats" - https://phabricator.wikimedia.org/T162217#3176888 (Halfak) a:Halfak
[22:48:37] Revision-Scoring-As-A-Service, revscoring: Implement "thresholds", deprecate "pile of tests_stats" - https://phabricator.wikimedia.org/T162217#3155892 (Halfak) https://github.com/wiki-ai/revscoring/pull/307
[23:14:06] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017): Tweak ORES-Related Preferences for Watchlist and RC Page ahead of next release - https://phabricator.wikimedia.org/T162831#3176969 (jmatazzoni)
[23:32:04] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017): Tweak ORES-Related Preferences for Watchlist and RC Page ahead of next release - https://phabricator.wikimedia.org/T162831#3177059 (jmatazzoni)
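On the "scaling" discussed between 21:01 and 21:10 (train and test on a balanced damaging/non-damaging set, then report threshold statistics for the real population): the log does not show the math used in the revscoring pull request, so the sketch below uses the standard prior-shift correction instead. It assumes the test set is representative within each class, so the true-positive and false-positive rates carry over and only the class balance changes. That assumption is consistent with "scaling affects precision but not recall". The function name `scaled_precision`, the counts, and the 5% population rate are made up.

```python
def scaled_precision(tp, fp, fn, tn, population_rate):
    """Estimate precision in a population whose positive rate differs from the test set.

    tp, fp, fn, tn are confusion counts at one threshold on the (possibly
    balanced) test set; population_rate is the assumed share of positives
    in the real population.
    """
    tpr = tp / (tp + fn)  # recall: depends only on the positives, so unchanged
    fpr = fp / (fp + tn)  # false-positive rate: depends only on the negatives
    expected_tp = tpr * population_rate
    expected_fp = fpr * (1 - population_rate)
    if expected_tp + expected_fp == 0:
        return None  # nothing would be flagged, so precision is undefined
    return expected_tp / (expected_tp + expected_fp)


# Made-up counts from a balanced test set: 100 damaging, 100 non-damaging.
tp, fn, fp, tn = 80, 20, 10, 90
sample_precision = tp / (tp + fp)                    # about 0.89 on the balanced set
real_precision = scaled_precision(tp, fp, fn, tn, 0.05)  # about 0.30 if 5% are damaging
recall = tp / (tp + fn)                              # 0.8 either way
```

The drop from about 0.89 to about 0.30 at the same threshold shows why thresholds chosen on a 50/50 test set would look far more precise than they are in practice.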