[11:08:24] Revision-Scoring-As-A-Service-Backlog, Collaboration-Team-Triage, Easy: Re-broadcast RCStream with ORES scores - https://phabricator.wikimedia.org/T106279#2762857 (Mattflaschen-WMF) I think this should be merged into {T143743}, which includes the ORES scores.
[16:52:11] I just discovered that I did something dumb.
[16:52:12] https://github.com/wiki-ai/revscoring/issues/292
[17:38:16] woah! It's Wikidata's birthday!
[17:38:41] Amir1, we've been supporting Wikidata with ORES for a whole year!
[18:19:21] wiki-ai/revscoring#834 (recall_at_precision - 9df5998 : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/172707434
[18:19:33] Revision-Scoring-As-A-Service, revscoring: Implement recall at precision (and fix FPR metrics) - https://phabricator.wikimedia.org/T149825#2764335 (Halfak)
[18:19:40] Revision-Scoring-As-A-Service, revscoring: Implement recall at precision (and fix FPR metrics) - https://phabricator.wikimedia.org/T149825#2764360 (Halfak) https://github.com/wiki-ai/revscoring/pull/293
[18:19:54] Revision-Scoring-As-A-Service, revscoring: Implement recall at precision (and fix FPR metrics) - https://phabricator.wikimedia.org/T149825#2764335 (Halfak) a: Halfak
[18:25:44] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 3492 bytes in 4.344 second response time
[18:25:45] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:25:54] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 3492 bytes in 3.393 second response time
[18:39:34] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.829 second response time
[18:39:34] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.789 second response time
[18:39:54] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.768 second response time
[20:11:49] halfak: \o/
[20:20:18] Amir1, what's up?
[20:20:27] Oh! My message re. Wikidata support :)
[20:20:38] yeah, I was afk; I just saw it
[20:45:16] Amir1, easy review: https://github.com/wiki-ai/revscoring/pull/293
[20:46:36] halfak: The new module is not being used
[20:46:49] is that intentional?
[20:47:41] Whoops. Looks like I didn't put it in the __init__.py. Is that what you meant?
[20:48:46] I meant using "recall_at_precision" somewhere in the codebase
[20:49:52] Amir1, oh. So... the old "recall_at_fpr(min_fpr=0.1)" means the exact same thing as "recall_at_precision(min_precision=0.9)"
[20:50:07] And in this PR, I fix the meaning of "FPR", so that is no longer true.
[20:50:27] So... our test stats are good for now, but their names are misleading.
[20:50:44] We'll want to update a line in the editquality Makefile to switch to this new statistic.
[20:51:06] okay,
[20:51:10] {{merged}}
[20:51:30] Nice.
[20:51:41] I'll be finishing up the (WIP) state of the sentences then.
[20:51:54] I'm going to go extract a few million sentences first.
[21:06:06] wikigrammar, fancy :D
[21:06:42] :D
[21:06:57] I'm going to be working on the sentence grammars in that repo. :)
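As a side note on the "recall at precision" statistic discussed above: the actual implementation is in https://github.com/wiki-ai/revscoring/pull/293, but the idea can be sketched in a few lines. The sketch below is only an illustration, assuming scikit-learn's precision_recall_curve and made-up toy data; the function name simply mirrors the metric's name.

```python
# Sketch of the "recall at precision" metric: the best recall achievable
# while keeping precision at or above a minimum. Not the revscoring
# implementation -- see wiki-ai/revscoring PR #293 for that.
from sklearn.metrics import precision_recall_curve


def recall_at_precision(y_true, y_score, min_precision=0.9):
    """Return the highest recall among thresholds where precision >= min_precision."""
    precisions, recalls, _ = precision_recall_curve(y_true, y_score)
    qualifying = [r for p, r in zip(precisions, recalls) if p >= min_precision]
    return max(qualifying) if qualifying else 0.0


# Toy labels and model scores (hypothetical data)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.75, 0.5]
print(recall_at_precision(y_true, y_score, min_precision=0.75))  # -> 0.75
```

This also makes the equivalence halfak mentions plausible: if the old "FPR" was effectively computed as 1 - precision, then recall_at_fpr(min_fpr=0.1) and recall_at_precision(min_precision=0.9) would select the same thresholds; with FPR corrected to its standard meaning, they no longer coincide.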
[21:08:09] Also, as a (belated) token for Halloween: https://www.youtube.com/watch?v=-kWHMH2kxXs
[21:11:26] hello \o
[21:26:51] halfak, did you forget to define f here: https://github.com/wiki-ai/revscoring/blob/3cdd263ebf7c15de7eba423b6d66f871f854355e/demo_load_model.py
[21:27:11] Hey GhassanMas!
[21:27:17] That shouldn't have been checked in :/
[21:27:23] Let me squash that.
[21:27:43] gotcha
[21:28:54] GhassanMas, I'm looking for a better-supported demo
[21:29:43] but what do you want the demo to do, exactly?
[21:30:51] Those demo scripts are for my own testing. I have like 40 and I don't maintain them :)
[21:31:12] alright, so these are for QA
[21:31:24] https://gist.github.com/halfak/45fc3029dfd6fbc3f9623fbedc7ed1b3
[21:31:25] you test your methods or functions with them
[21:31:34] GhassanMas, yeah, but more casual than the actual tests
[21:31:47] These are for testing while I develop something.
[21:31:53] Or look for an issue.
[21:32:14] Once I'm done with them, I keep 'em around and sometimes they come in handy, but mostly they are intended to be ephemeral.
[21:35:34] do you use any framework or design pattern when developing in Python?
[21:36:21] revscoring is its own framework :)
[21:42:25] GhassanMas, https://github.com/wiki-ai/editquality/blob/4baeca7bc039d582e8e2d6a2f6a51e6a6e656efe/ipython/score_edit.ipynb
[21:43:06] Whoops, missing the last cell!
[21:44:56] GhassanMas, this link is better: https://github.com/wiki-ai/editquality/blob/23d67eeff63d80a425cfc99099a8f6db1893a5e5/ipython/score_edit.ipynb
[21:47:57] why did you open the model from a file rather than as a function or method?
[21:48:05] I know it acts as a function
[21:48:21] but does that mean it was not written in Python, for performance?
[21:48:24] When we build models, we serialize them out to a model file on disk.
[21:48:35] It's all in Python. :)
[21:48:44] Are you familiar with the "pickle" library?
[21:48:58] so you can call them many times at the same moment?
[21:49:17] no, I'll look it up
[21:49:42] :) Pickle lets you take a big chunk of data from memory and write it to disk.
[21:50:06] A model file is essentially a big chunk of memory written to disk
[21:50:55] like streaming an HD video continuously from disk to memory as it keeps playing?
[21:56:12] GhassanMas, not quite. We do stream a lot of things, but the model file we load into memory all at once.
[21:56:27] Writing the model file to disk is sort of like taking a snapshot at a given point in time
[21:56:36] And then we load that snapshot back in to make predictions.
[21:57:25] did you use that in order to address the memory issues you used to have?
[21:57:42] Nope. We've always taken snapshots of the models this way. :)
[21:57:54] It was part of the initial idea that convinced me that this could all work out.
[21:58:01] this == revscoring/ores/etc.
[21:59:00] So you worked to decrease the time the model takes to make a prediction, so it won't stay in memory too long?
[22:00:06] :P Memory usage is not really the reason we pickle. It's much more that we want a copy of the model that /persists/
[22:00:16] You get data to persist by writing it to the disk.
[22:00:25] This is generally the use case for pickle too.
[22:00:45] But pickle is also used to transport data over a text stream.
[22:04:24] I mean memory usage issues typically happen when you have so much data at one moment that you need to persist it to disk
[22:05:03] e.g. many editors submitting their edits at the same moment
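The persist-and-reload workflow halfak describes — serialize a trained model to a file, then load that snapshot back to score new inputs — looks roughly like the sketch below. It uses a scikit-learn classifier plus a made-up file name and toy data as stand-ins; revscoring's model files are built on the same pickle-style serialize/deserialize idea.

```python
# Minimal persist-and-reload sketch: the model file is a "snapshot" of the
# in-memory model at a point in time, loaded back all at once to predict.
import pickle

from sklearn.ensemble import GradientBoostingClassifier

# Train a toy model (a stand-in for building a revscoring model)
model = GradientBoostingClassifier()
model.fit([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 1, 1])

# Serialize the whole in-memory object out to disk
with open("model.pickle", "wb") as f:
    pickle.dump(model, f)

# Later -- possibly in a different process -- load the snapshot back
with open("model.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[1, 0]]))  # the restored model makes predictions
```

This is why the score_edit.ipynb demo opens the model from a file rather than calling a constructor: the expensive training step happened earlier, and scoring only needs the deserialized snapshot.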