[06:42:15] Scoring-platform-team, Research, Wikilabels, Research-2017-18-Q3, Research-2017-18-Q4: Design a data collection pilot using WikiLabels platform (mining reasons) - https://phabricator.wikimedia.org/T186351#4107512 (bmansurov) @Miriam, I've set up a testing environment at http://research-wikila...
[09:58:34] (PS1) Umherirrender: Fix parameter docs [extensions/ORES] - https://gerrit.wikimedia.org/r/424254
[10:15:17] (CR) jerkins-bot: [V: -1] Fix parameter docs [extensions/ORES] - https://gerrit.wikimedia.org/r/424254 (owner: Umherirrender)
[10:19:15] (CR) jerkins-bot: [V: -1] Fix parameter docs [extensions/ORES] - https://gerrit.wikimedia.org/r/424254 (owner: Umherirrender)
[12:20:16] o/
[14:07:32] * halfak works on writing
[14:42:15] (CR) Alexandros Kosiaris: Try to switch to the dsh host manifest (1 comment) [services/ores/deploy] - https://gerrit.wikimedia.org/r/423752 (https://phabricator.wikimedia.org/T191321) (owner: Awight)
[15:12:00] o/ Amir1
[15:12:12] Is there still a bot running on fawiki using ORES scores to revert edits?
[15:23:53] https://www.mediawiki.org/wiki/Topic:Uapg9b5zviuss44o
[15:23:58] Observation: Adoption patterns
[15:29:59] (CR) Thcipriani: Try to switch to the dsh host manifest (1 comment) [services/ores/deploy] - https://gerrit.wikimedia.org/r/423752 (https://phabricator.wikimedia.org/T191321) (owner: Awight)
[15:31:30] (CR) Thcipriani: Try to switch to the dsh host manifest (1 comment) [services/ores/deploy] - https://gerrit.wikimedia.org/r/423752 (https://phabricator.wikimedia.org/T191321) (owner: Awight)
[15:34:55] halfak: Yeah, it used to be stopped, but it got enabled again several days ago
[16:06:26] Amir1, what's the bot called?
[16:06:41] We should add it to the ORES/Applications page :D
[16:06:47] It's User:Dexbot
[16:07:06] the script is called Qaher69 for historical reasons
[16:07:57] Having some fun with https://en.wikipedia.org/wiki/IAIO_Qaher-313
[16:08:46] lol.
[16:08:50] Like ClueBot
[16:08:54] And the bomber
[16:10:11] I got the idea from ClueBot :D
[16:26:15] halfak: the bot running on eswiki using ORES got stopped last week after some complaints
[16:29:02] relevant links:
[16:29:03] https://es.wikipedia.org/wiki/Wikipedia:Caf%C3%A9/Archivo/Miscel%C3%A1nea/Actual#Parada_de_PatruBOT
[16:29:07] https://es.wikipedia.org/wiki/Wikipedia:Mantenimiento/Revisi%C3%B3n_de_errores_de_PatruBOT/An%C3%A1lisis
[16:30:02] Understood, Platonides. Thanks for the info.
[16:30:21] It's regrettable that the dev wouldn't work to adjust the thresholds appropriately.
[16:31:24] rather than the thresholds not being adjusted
[16:31:32] ORES is plenty accurate for eswiki, but you need to set the thresholds to a stricter level.
[16:31:48] it's probably that people felt the bot was too aggressive / had too many FPs
[16:32:02] Right. So you adjust the threshold so that there are fewer FPs
[16:32:03] even if that was just an inaccurate perception
[16:32:04] :|
[16:33:00] the goal of the second page is to manually review a sample of the bot's reversions to figure out how accurate it was
[16:35:43] Yikes. We already know this.
[16:35:49] using a test dataset.
[16:36:03] We have a whole system for figuring out the FP rate for different thresholds.
[16:36:11] * halfak puts face in palm
[16:36:25] All you need to know is what rate of mistakes is acceptable.
[16:36:38] Let's say 10%
[16:37:33] https://ores.wmflabs.org/v3/scores/eswiki/?models=damaging&model_info=statistics.thresholds.true.%27maximum%20recall%20@%20precision%20%3E=%200.9%27
[16:37:53] Set the threshold at 0.959
[16:38:03] Expect to catch 16.5% of all damage
[16:38:08] With 90.2% precision
[16:38:31] These estimates are conservative, so you're likely to get slightly better precision and slightly better recall.
[16:38:37] Platonides, ^
[16:39:26] do you have those parameters defined?
[16:39:42] I'm not sure what you're asking.
[16:39:55] sorry
[16:39:58] they obviously are
[16:40:18] I mean a page documenting what the match_rate, the recall, the fpr... are
[16:40:58] Ahh, not exactly. Most of these terms are industry standards. Right now, I'd like to consult with developers. I've reached out to jem multiple times to try to help :|
[16:41:31] But we should eventually have good docs on this. We're working on spec'ing a full ORES manual now.
[17:28:04] (PS4) Awight: Try to switch to the dsh host manifest [services/ores/deploy] - https://gerrit.wikimedia.org/r/423752 (https://phabricator.wikimedia.org/T191321)
[18:31:02] * halfak looks around for awight
[18:31:19] Oh n/m
[18:58:56] o/
[19:00:03] o/ awight
[19:40:57] (PS1) Awight: Revert "Build venv into deployed source dir (take 2)" [services/ores/deploy] - https://gerrit.wikimedia.org/r/424382
[19:43:54] (CR) Awight: [V: 2 C: 2] Revert "Build venv into deployed source dir (take 2)" [services/ores/deploy] - https://gerrit.wikimedia.org/r/424382 (owner: Awight)
[20:37:27] what scores is ORES currently providing?
[20:39:24] Platonides: Do you have an example of what you mean?
[20:41:55] Is this edit damaging? Is this edit saved in goodfaith? What quality level is this version of an article?
[20:41:58] ^ Like that?
[20:42:11] yes
[20:42:20] I would like to see the scores that ORES provides alongside the edits
[20:42:57] not sure what the result would be
[20:43:15] perhaps detecting vandalism that ORES detected but reviewers didn't
[20:43:36] or that ORES matches very well what people did manually...
[20:43:57] Here’s something I wrote along those lines, https://www.mediawiki.org/wiki/Extension:JADE#Jade_namespace:_Judgments
[20:44:27] That’s where editors and patrollers will be able to record quantitative judgments in line with ORES scores.
[20:44:55] These may confirm or contradict ORES, or they might be the basis of a discussion among patrollers.
[20:46:48] where are the current models described?
[20:47:30] [[:mw:ORES]]
[20:47:31] [1] https://meta.wikimedia.org/wiki/:mw:ORES
[20:49:28] sigh
[20:49:37] I started at https://wikitech.wikimedia.org/wiki/ORES
[20:49:50] then ended at https://www.mediawiki.org/wiki/ORES/New_model_checklist
[20:50:24] when it was easier
[20:50:30] * awight levels a wand at AsimovBot
[20:50:43] https://www.mediawiki.org/wiki/ORES
[20:51:04] has the descriptions of what the scores contain
[20:52:57] In progress = not available yet?
[20:53:24] exactly
[20:55:48] Here’s a real-time display of the same data, which Amir1 wrote recently: https://tools.wmflabs.org/ores-support-checklist/
[20:57:32] is the wiki table updated?
[20:57:41] " "message": "Models ('reverted',) not available for eswiki""
[20:58:10] Once we deploy the “advanced edit quality” models, damaging and good-faith, we remove the reverted model.
[20:58:43] Those are “n/a” in the toollabs grid.
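A minimal sketch, assuming the v3 response layout seen in the URLs quoted in this conversation, of the workflow halfak describes at 16:36-16:38: fetch the "maximum recall @ precision >= 0.9" threshold for eswiki's damaging model, then compare a revision's damaging probability against it yourself. The requests usage is illustrative only, and the revision ID is a made-up placeholder, not a real edit.

import requests

ORES_ESWIKI = "https://ores.wmflabs.org/v3/scores/eswiki/"

# 1) Ask for the threshold that maximizes recall while keeping precision >= 0.9
#    (the same model_info query pasted at 16:37).
info = requests.get(ORES_ESWIKI, params={
    "models": "damaging",
    "model_info": "statistics.thresholds.true.'maximum recall @ precision >= 0.9'",
}).json()
optimum = info["eswiki"]["models"]["damaging"]["statistics"]["thresholds"]["true"][0]
threshold = optimum["threshold"]  # roughly 0.959 at the time of this discussion
print("threshold:", threshold,
      "precision:", optimum["precision"],
      "recall:", optimum["recall"])

# 2) Score a revision and apply that threshold ourselves.  The rev ID below is
#    a placeholder; the "false" probability is just 1 minus the "true" one.
rev_id = 123456789
response = requests.get(ORES_ESWIKI + "{}/damaging".format(rev_id)).json()
score = response["eswiki"]["scores"][str(rev_id)]["damaging"]["score"]
p_damaging = score["probability"]["true"]

if p_damaging >= threshold:
    print("rev {}: likely damaging (p={:.3f}), queue for review".format(rev_id, p_damaging))
else:
    print("rev {}: below threshold (p={:.3f}), leave to normal patrolling".format(rev_id, p_damaging))

Fetching the threshold at query time rather than hard-coding 0.959 matters because the statistics change whenever the models are rebuilt, as noted later in the log.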
[20:58:55] then I would expect reverted not to be listed in the table
[20:59:57] Scoring-platform-team (Current), User-Ladsgroup: Build ORES support checklist - https://phabricator.wikimedia.org/T189954#4110288 (awight) >>! In T189954#4085016, @Halfak wrote: > What do you think about including some notion of language support? E.g. we could query github to see if there's a matching l...
[21:00:00] inside the score probability, I guess the false and true values are actually one number printed as x and 1 - x
[21:00:37] I think that’s right. It gets more interesting with wp10 predictions…
[21:00:38] and the prediction is just x > some constant
[21:01:23] yes, the constant should be calculated like halfak was explaining earlier, using a thresholds request or by directly reading the P-R curve.
[21:01:58] ah, that's a richer score, indeed
[21:02:18] excuse my ignorance, what's the P-R curve?
[21:03:09] here’s a glossary for precision and recall, lemme find something about the graph
[21:04:44] I guess the threshold is the point where the prediction changes from true to false?
[21:04:50] is that dynamic?
[21:07:46] It is, but it must be taken into consideration by the consumer.
[21:08:38] you want different thresholds depending on what you’re doing with the scores.
[21:11:07] the earlier url given by halfak https://ores.wmflabs.org/v3/scores/eswiki/?models=damaging&model_info=statistics.thresholds.true.%27maximum%20recall%20@%20precision%20%3E=%200.9%27 provides a threshold for a 0.9 certainty
[21:11:16] would it be different e.g. next week?
[21:11:19] Platonides: https://github.com/adamwight/thresholds_diagrams/blob/master/damaging.svg
[21:11:31] yes, it changes when we rebuild the models
[21:12:07] The graph I just sent illustrates how the threshold is related to precision and recall, for the enwiki damaging model.
[21:12:25] in that case, there would be a version bump in the models section, no?
[21:13:03] yes
[21:14:17] what's the recall?
[21:14:33] here’s a description of the type of graph I sent, http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
[21:16:01] Recall measures the proportion of actual positives that the classifier finds
[21:18:45] that scikit page is quite useful :)
[21:22:50] gtg for an hour or so, thanks for the questions!
[21:25:49] are the scores floats or doubles?
[21:49:02] o/ Platonides
[21:49:07] Sorry to be AFK for a meeting.
[21:49:35] Re. floats/doubles, the values are JSON "numbers"
[21:49:46] In Python, they are floats first, though.
[21:49:58] So if you're reading them in, read them as floats.
[21:50:57] Scoring-platform-team (Current), Language-Team: [Spike] Investigate how Chinese writing variants are stored - https://phabricator.wikimedia.org/T119687#1833105 (cscott) >>! In T119687#1912243, @liangent wrote: >>>! In T119687#1911453, @zhaofengli wrote: >> FYI, the hardcoded conversion table is [[ https:...
[22:20:39] ok, thanks
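To make the P-R curve discussion above concrete, here is a small self-contained sketch using scikit-learn (per the page awight linked), with fabricated labels and scores: it sweeps candidate thresholds and keeps the one giving maximum recall while precision stays at or above 0.9, the same rule expressed by the thresholds query quoted earlier. It is an illustration of the idea, not ORES code.

import numpy as np
from sklearn.metrics import precision_recall_curve

# Fabricated demo data: 1 = damaging edit, 0 = good edit, plus fake model
# probabilities that are loosely correlated with the labels.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=2000)
p_true = np.clip(0.55 * y_true + rng.normal(0.25, 0.2, size=2000), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, p_true)

# precision/recall each have one more entry than thresholds; drop the last
# point so the three arrays line up, then apply the "precision >= 0.9" filter.
precision, recall = precision[:-1], recall[:-1]
qualifies = precision >= 0.9

if qualifies.any():
    best = np.argmax(np.where(qualifies, recall, -1.0))
    print("threshold={:.3f} precision={:.3f} recall={:.3f}".format(
        thresholds[best], precision[best], recall[best]))
else:
    print("no threshold reaches 0.9 precision on this toy data")

Holding precision at or above a floor and taking whatever recall remains is what keeps a bot's false-positive rate bounded, which is the threshold adjustment suggested for PatruBOT earlier in the log.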