[10:09:13] 06Revision-Scoring-As-A-Service, 10Wikilabels: Manage wikilabels for labsdb1004 maintenance - https://phabricator.wikimedia.org/T162265#3159967 (10MoritzMuehlenhoff) @Halfak Sounds good to me
[14:23:08] o/ RoanKattouw
[14:23:30] Sorry to drop off so quickly yesterday. Needed to get home for the doggo. I'm racing to get the deployment ready for today :)
[14:52:59] halfak: No worries! As I said yesterday, I'm more interested in the numbers than the deployment anyway; though I suppose the new numbers will only become usable once the new models are deployed
[14:53:29] Most likely, though the models should have similar performance.
[15:37:31] halfak: o/
[15:37:32] Sorry
[15:37:37] I'm in the hotel now
[15:37:50] it's okay for me whenever you want to talk
[15:38:00] No worries. Thanks for connecting with me for the backlog session.
[15:38:08] Am rushing now to get the deploy together.
[15:38:26] https://github.com/wiki-ai/editquality/pull/64
[15:38:57] Let me see; if you need to catch a meeting, it's okay
[15:39:01] I can take it from here
[15:39:24] This one should be easy. Just updates to editquality and ORES. No need to update wheels or anything like that.
[15:40:18] That PR is merged now
[15:41:50] Amir1, https://github.com/wiki-ai/ores-wmflabs-deploy/pull/80
[15:42:56] halfak: looking at the PR: https://github.com/wiki-ai/ores-wmflabs-deploy/pull/80/files it doesn't seem to include the precache endpoint (in the ORES submodule)
[15:43:02] do we have it already on labs?
[15:44:39] weird.
[15:44:40] * halfak looks
[15:44:55] https://ores.wmflabs.org/v2/precache
[15:44:58] 404's
[15:45:19] no, you're right
[15:45:23] I see precache in the PR
[15:45:26] I just missed it
[15:45:27] :D
[15:45:29] https://github.com/wiki-ai/ores/compare/209522504b9030aa164058309494cccdd1f05c55...93828f4303a8e051652396456ebab7e56eac996f#diff-7b5761d2323aca2df67312122046ce14
[15:45:31] kk :)
[15:50:21] Amir1, https://gerrit.wikimedia.org/r/346778
[15:52:32] merged
[15:55:53] Thanks!
[15:55:58] Staging now
[15:59:35] Everything looks OK in staging. Starting deploy to ores.wmflabs.org
[16:18:18] All looks good. Starting with the beta deployment
[16:35:26] 06Revision-Scoring-As-A-Service, 10ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3160834 (10Halfak)
[16:36:20] 06Revision-Scoring-As-A-Service, 10rsaas-editquality: Implement additional test_stats in editquality - https://phabricator.wikimedia.org/T162377#3160837 (10Halfak)
[16:37:39] 06Revision-Scoring-As-A-Service, 10rsaas-editquality: Implement additional test_stats in editquality - https://phabricator.wikimedia.org/T162377#3160851 (10Halfak) https://github.com/wiki-ai/editquality/pull/63
[16:38:09] 06Revision-Scoring-As-A-Service, 10rsaas-editquality: Implement additional test_stats in editquality - https://phabricator.wikimedia.org/T162377#3160853 (10Halfak) 05Open>03Resolved
[16:38:35] 06Revision-Scoring-As-A-Service, 10rsaas-editquality: Implement additional test_stats in editquality - https://phabricator.wikimedia.org/T162377#3160837 (10Halfak)
[16:38:37] 06Revision-Scoring-As-A-Service, 10ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3160854 (10Halfak)
[16:48:53] Beta deploy is done.
[16:48:56] All looks OK
[16:49:04] I think we're ready for the window today.
[16:50:22] 06Revision-Scoring-As-A-Service, 10ORES: Deploy ORES early April - https://phabricator.wikimedia.org/T161748#3160882 (10Halfak) This is deployed to ores-staging.wmflabs.org, ores.wmflabs.org, ores-beta.wmflabs.org. I'm just waiting on the window for production deployment
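[Editor's note: for readers following the checks above (the 404 on /v2/precache and the staging → labs → beta rollout), here is a rough post-deploy smoke-check sketch in Python. It is not the project's actual deploy tooling; the probed paths and the revision ID are illustrative assumptions, and the script only reports HTTP status codes, mirroring the manual browser check in the log.]

```python
#!/usr/bin/env python3
"""Rough post-deploy smoke check for ORES instances (illustrative sketch only)."""
import requests

# The three hosts mentioned in the deploy comment above.
HOSTS = [
    "https://ores-staging.wmflabs.org",
    "https://ores.wmflabs.org",
    "https://ores-beta.wmflabs.org",
]

# Paths to probe. The scores path and revision ID 12345678 are placeholders
# chosen for illustration; they are not taken from the log.
PATHS = [
    "/v2/precache",
    "/v2/scores/plwiki/damaging/12345678",
]

def smoke_check(host):
    for path in PATHS:
        try:
            resp = requests.get(host + path, timeout=30)
        except requests.RequestException as exc:
            print(f"{host}{path}: request failed ({exc})")
            continue
        # A 404 on /v2/precache suggests the new code isn't deployed yet,
        # which is exactly what the manual check at [15:44:58] surfaced.
        # Any non-404 response (even a 405 for a GET on a POST-only route)
        # suggests the route exists.
        print(f"{host}{path}: HTTP {resp.status_code}")

if __name__ == "__main__":
    for host in HOSTS:
        smoke_check(host)
```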
[17:11:38] halfak: Let me know when you've deployed the new models and I'll write a config patch to adjust the thresholds accordingly
[17:11:50] in progress :)
[17:14:40] Hmm, the stats for the plwiki model look dramatically different
[17:14:50] Is that expected?
[17:21:41] Shouldn't be. Can review soon
[17:37:58] ptwiki is very close though
[17:38:34] RoanKattouw, do you have the old stats for plwiki handy?
[17:38:58] I do, in a spreadsheet, sharing now
[17:39:51] plwiki looks really crazy
[17:41:11] It claims to be ridiculously fit
[17:41:27] Not to question you and your team's awesomeness, but I'm highly skeptical that any model can ever be that good
[17:42:14] RoanKattouw, worse, it seems to have the counts of damaging/not flipped
[17:42:41] RoanKattouw, deployment completed.
[17:42:46] OK
[17:42:47] Looking into BS for plwiki now
[17:43:16] Meanwhile, I've submitted https://gerrit.wikimedia.org/r/346796 for the 11am SWAT
[17:43:32] That adjusts the thresholds for plwiki and ptwiki (the only wikis where our new beta feature is live) for these new models
[17:44:04] ptwiki damaging in particular gives me high confidence because most of my tweaks were to the third digit after the decimal point (i.e. I was adjusting each threshold by a few 1/1000s)
[18:01:10] RoanKattouw, the deployed plwiki looks way more sane.
[18:01:32] I'm still trying to figure out what was going on in that gist from yesterday
[18:01:39] halfak: When you say deployed, do you mean what's deployed now?
[18:01:55] Oh, so the stats you gave me yesterday are potentially garbage?
[18:04:33] RoanKattouw, still confirming, but yeah, something weird is up.
[18:06:43] I'm getting the same from the prod API as from your gist
[18:07:26] See columns L, M, and N in the spreadsheet
[18:53:05] RoanKattouw, https://gist.github.com/halfak/634e84869ebc64fa522c5ea1bc957051
[18:53:25] 06Revision-Scoring-As-A-Service, 10Edit-Review-Improvements-RC-Page, 10ORES, 06Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Manage ORES preferences on Watchlist (and Contributions) - https://phabricator.wikimedia.org/T160475#3161423 (10Etonkovidova) @jmatazzoni I am moving the ti...
[18:53:46] halfak: What do those numbers mean?
[18:54:13] It's a prediction table. It says that there's a huge number of "true" observations in the plwiki.damaging training set
[18:54:18] That's not true.
[18:54:31] It's backwards, and it explains why the fitness statistics are crazy
[18:54:40] Aha, OK
[18:54:46] So the training set got messed up somehow?
[18:54:58] I can't find any evidence of that.
[18:55:08] I also can't reproduce the problem.
[18:55:32] As in, you rebuilt the model and it didn't do that again?
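[Editor's note: a toy sketch of why a flipped prediction table produces "ridiculously fit" statistics. The counts below are invented for illustration and are not the real plwiki.damaging numbers; the point is only that swapping a mostly-false observation set (damaging) for a mostly-true one (goodfaith) pushes precision and recall toward 1.]

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the 'true' class of a 2x2 prediction table."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Plausible-looking damaging table: "true" (damaging) observations are rare
# and the model is imperfect. These counts are made up.
normal = {"tp": 300, "fp": 200, "fn": 150, "tn": 19350}

# The same table with the true/false observation counts swapped, the way the
# gist appeared to mix up damaging and goodfaith: almost every observation is
# "true", so the fitness statistics look implausibly good.
flipped = {"tp": 19350, "fp": 150, "fn": 200, "tn": 300}

for name, table in [("normal", normal), ("flipped", flipped)]:
    p, r = precision_recall(table["tp"], table["fp"], table["fn"])
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
    # normal:  precision=0.600 recall=0.667
    # flipped: precision=0.992 recall=0.990  <- "ridiculously fit"
```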
[18:58:29] I haven't even got evidence that the model ever did that
[18:58:43] Except for when I loaded it into a local ORES and got you a big blob
[18:59:26] But the test_stats that I get from prod are the same as the ones you gave me in the big blob
[19:01:03] Not what I'm seeing
[19:03:46] It looks like the stats are switched for "damaging" and "goodfaith"
[19:03:59] In the gist I made yesterday
[19:06:47] Oh, hah, indeed
[19:07:22] I'll have to re-adjust the plwiki thresholds then, I adjusted them based on the flipped values
[19:09:04] This is the first I've seen this kind of thing happen
[19:19:16] OK, the plwiki data looks a lot better now
[19:19:23] The tweaks are much smaller
[19:22:59] Third decimal places, just like ptwiki
[19:25:50] halfak: I won't ask for a rebuild over this any time soon, but just so we have it for next time, I'll submit a PR that adds 0.9975 and 0.999. plwiki goodfaith is so high fitness that it reaches 99.5% precision with 97.7% recall at threshold 0.587, so I have no idea what the upper half of the threshold space looks like
[19:26:13] Sorry, make that 99.6% precision even
[20:38:32] RoanKattouw, I see 0.826 recall at 0.54
[20:38:40] Oh, goodfaith
[20:38:42] looking at that
[20:38:53] Recall is really easy for goodfaith
[20:39:48] Roan, are you looking at goodfaith-true or goodfaith-false?
[20:39:52] true, sorry
[20:40:02] false is a bit less extreme
[20:41:23] Yeah, the baseline precision (at 100% recall) is 97.6%, so I suppose only 2.4% of edits are bad faith to begin with
[20:55:22] Sounds about right.
[21:52:19] 06Revision-Scoring-As-A-Service, 10MediaWiki-extensions-ORES, 15User-Ladsgroup: Deploy ORES Review Tool for hewiki - https://phabricator.wikimedia.org/T161621#3137415 (10Halfak) @Ladsgroup, this is done, right?
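[Editor's note: a quick sanity check of the observation at [20:41:23], which holds because at threshold 0 every edit is predicted goodfaith, so recall is 100% and precision collapses to the goodfaith prevalence. The 97.6% figure is taken from the log; the sample size is an arbitrary assumption.]

```python
# Why baseline precision at 100% recall equals the goodfaith rate.
n_edits = 10_000          # arbitrary sample size, for illustration only
goodfaith_rate = 0.976    # baseline precision reported in the log

# At threshold 0, every edit is classified as goodfaith:
true_positives = n_edits * goodfaith_rate          # every goodfaith edit is caught
false_positives = n_edits * (1 - goodfaith_rate)   # every bad-faith edit is too

precision = true_positives / (true_positives + false_positives)
recall = 1.0  # no goodfaith edit is ever missed at threshold 0

print(f"precision at threshold 0: {precision:.3f}")   # 0.976
print(f"implied bad-faith rate:   {1 - precision:.3f}")  # 0.024
```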