[00:18:34] RoanKattouw: https://phabricator.wikimedia.org/T161280 Feel free to ask any questions if you can't reproduce it
[00:19:11] Amir1: Right, so the intent is to put highlighter.js behind a preference once RCFilters really rolls out
[00:19:18] And it will also no longer work on the RC page, only on Watchlist
[00:19:26] Because it's redundant with the highlight feature in RCFilters
[00:21:26] RoanKattouw: Thanks for the note but one question. I don't see any highlighting from ERI. Only filters
[00:21:41] Amir1: There's a button that says "Highlight" in the top right of the popup
[00:21:48] Click that and highlight controls will appear
[00:22:05] (I don't like this, I think the controls should always be there, but other people thought it would look too cluttered)
[00:22:18] Now I see the power of ERI
[00:22:20] wow
[00:22:30] This is fascinating
[00:23:12] One of my concerns is that users may miss the highlight button, the same way you did :/
[00:23:27] We'll see how that goes in practice, we can always change it while it's in beta
[00:23:47] RoanKattouw: I can play with this for hours
[00:24:06] one thing that might help (but I'm not a UX designer) is some coloring to the button
[00:24:21] probably progressive flag
[00:24:48] or "primary progressive"
[00:25:05] Yeah
[00:25:17] Ironically it becomes colored exactly the way you describe once you click it
[00:25:22] But of course by then you've already discovered it
[00:25:43] :P
[00:30:26] RoanKattouw: I have a note regarding eliminating the "hard" threshold (high recall, low precision) if the precision is not good enough for you, it's probably because the model is not good enough.
We are trying more sophisticated models (such as RNNs) that might get you high recall but with good enough precision
[00:30:32] just wait for a while
[00:31:57] Yeah my thinking on that has changed too
[00:32:15] The hard threshold doesn't align with any of our thresholds, but I also think our thresholds need tweaking a bit
[00:32:35] They're pegged to precision percentages right now but that's limited to the ones exposed in model_info
[08:24:26] Revision-Scoring-As-A-Service-Backlog, Bad-Words-Detection-System, revscoring: Add language support for Korean - https://phabricator.wikimedia.org/T160757#3127589 (revi) I know the list is broad, but paragraph ending with the following words are almost likely to be informal and not encyclopedic, so P...
[14:44:01] o/
[14:44:11] \ㅇ/
[14:44:11] Forgot to wave earlier :S
[14:44:16] Hey revi :)
[14:44:33] so basically any paragraphs ending with the regex is informal
[14:45:19] people are supposed to use honorifics in discussion and non-honorifics for encyclopedia
[14:45:21] revi, must it be at the end of a paragraph or is just a sentence OK?
[14:45:29] so if you're using honorifics, it's informal.
[14:45:31] hmm
[14:46:26] I think sentence is ok
[14:46:28] but not 100% sure
[14:47:04] OK. It'll be hard to match paragraph breaks but we can do it.
[14:47:23] Anyway, just give me the raw words with ending punct that you think is right.
[14:48:08] E.g. "니다.", "니다!", "니다," etc.
[14:48:36] kk
[14:48:37] I see you have '.*' in there too
[14:48:46] that's like...
[14:48:50] Can you give some examples of words that would end in these informal honorifics?
[14:48:57] 알겠습니다.
[14:49:02] hmm
[14:49:19] Yeah. Just like that. A few of those to make sure I set up the character matching right.
[14:49:29] It doesn't have to be exhaustive -- but representative.
[14:49:39] ok
[14:50:44] E.g.
https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/tests/test_english.py#L34
[14:51:53] so our definition of formal and informal is "is it appropriate for encyclopedia" or not, right?
[14:52:49] Right. Exactly. Informals might show up in quotes and stuff on an article, but will much more likely show up on talk pages.
[15:06:32] not sure if I'm doing right
[15:06:37] but I'll save anyway....
[15:11:49] This looks great :)
[15:12:18] revi, if we encounter an "honorific" in the middle of a sentence, it's not a problem, right?
[15:13:22] it might be quote or such
[15:13:29] so it's strictly not problem
[15:13:40] "그만 하죠 좀!" doesn't look like it ends in the honorific?
[15:13:50] that's some form of variations tho
[15:14:07] Gotcha. So we shouldn't always match the punctuation.
[15:14:11] yeah
[15:14:31] Similarly with "모르겠습니까?"
[15:14:35] giving more complexity
[15:14:39] no that ends with honorifics
[15:15:02] Not "니다" like the rest of the words in the set
[15:15:36] -니다 with ? does not make sense in Korean so I modified it
[15:15:40] to make sense
[15:16:08] Gotcha. Should we match "니까" generally or just before a "?"?
[15:16:15] just before ?
[15:16:49] The National Institute of Korean Language has extensive rules on what is correct in Korean grammar or not
[15:16:54] but it's even complex for natives lol
[15:17:54] ha. Just so long as we can catch people adding "your mom is fat" in articles :)
[15:18:49] lol
[15:19:06] they don't say such that politely
[15:20:24] revi, in English there's a huge difference in usage between "mom" and "mother". Is that a thing in Korean?
[15:21:26] yeah
[15:21:36] mom would be informal
[15:21:55] well, generally, mom is inappropriate for encyclopedia. :P
[15:23:06] Is that kind of informal captured in the "informals" list you have in the paste?
[15:23:23] I'm thinking words like "shitty" and "dummy" or "stupid" too.
[15:24:32] that's 'bad', ye
[15:24:38] (second one)
[15:26:15] OK.
If those kinds of words are covered, then I think I'm ready to work on building regexes and incorporating them into the modeling library :)
[15:28:49] * revi is hoping some papers have other 'bad' words I forgot
[15:28:57] ie... http://m.dbpia.co.kr/Journal/ArticleDetail/NODE01626180
[15:35:07] yeah... some more words
[15:37:46] revi, want to make some updates to the paste? I haven't started copying from it yet.
[15:38:49] hmm
[15:39:02] I'm making another edit, so can you wait about 5 minutes?
[15:40:52] sure
[15:42:00] good to go now
[15:42:15] I think it's still incomplete but I think we'll adjust it later
[15:49:50] Sounds good. Thanks for all of this revi. I hope to have a basic damage detection model tested today. I'll start work on the labeling campaign (for training the advanced damage detection models) early next week.
[15:50:02] I'll ping you about that because we'll need a small army from kowiki to help label edits.
[15:50:03] :)
[15:50:46] sure :D
[15:50:59] I'm still keeping an eye on this channel, so :P
[15:57:32] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), User-notice: Enable the ORES good faith and damaging UI by default, on wikis that have these ORES models available (i... - https://phabricator.wikimedia.org/T158225#3128429
[15:57:34] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), Community-Liaisons (Jan-Mar 2017): Inform communities about the release of the ORES good faith and damaging UI by def...
- https://phabricator.wikimedia.org/T159223#3128427
[18:04:35] (PS1) Mattflaschen: Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667
[18:04:42] (CR) jerkins-bot: [V: -1] Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667 (owner: Mattflaschen)
[18:17:45] (PS2) Mattflaschen: Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667
[18:30:20] (PS3) Mattflaschen: Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667 (https://phabricator.wikimedia.org/T160803)
[20:19:08] (PS4) Mattflaschen: Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667 (https://phabricator.wikimedia.org/T161325)
[21:00:23] o/ Amir1
[21:00:44] I'm looking at our versioned API structure and I'm thinking that we should move 'precache' out of it.
[21:01:01] So changeprop would send a request to ores.wikimedia.org/precache
[21:01:12] What do you think of that?
[21:01:34] I guess a bad outcome of that could be that we couldn't change the output structure.
[21:01:40] halfak: hey
[21:02:07] I'm almost done with my big refactor. I think it's nice :)
[21:02:08] I thought about it but I was kind of against it because the output depends on the API version
[21:02:21] Yeah. I'm OK with that then, I guess.
[21:02:46] I guess I wasn't thinking of output format until sending the message :S
[21:02:51] * Amir1 thinks in future we might have another precache node for v3 API
[21:03:01] :D
[21:03:15] btw. I have some good news for you
[21:03:20] Right. I'm going to add that. This refactor is making it super easy to do :)
[21:03:42] I'm removing a *ton* of duplicated code.
:)
[21:03:50] \o/
[21:04:10] I finally was able to play with TensorFlow and ran the enwiki damaging data on it: https://github.com/Ladsgroup/TF-on-ORES
[21:04:36] I could do it on only two types of ANN: biRNNs and multi-layer perceptron
[21:04:54] 1- the training takes less than a minute so we don't need GPU power at all
[21:05:30] 2- the accuracy is 96%, the current model gives 90.3%
[21:05:50] (I wanted to get ROC-AUC but I failed miserably, I don't know TF well enough)
[21:07:03] we can give a try to CNNs but they are usually being used for image processing (given that their structure is an imitation of human vision neurons)
[21:07:47] Amir1, do you get an output probability from the biRNNs?
[21:07:55] Also are you using the same feature set as prod?
[21:08:12] halfak: Yes, I downloaded the dataset and ran on it
[21:08:23] Gotcha.
[21:08:33] (I downloaded it from ores-compute, it might be outdated)
[21:08:40] meh. probably close
[21:08:49] Doesn't matter as much when we're not doing online scoring.
[21:09:01] regarding the probability, not yet. I can do it soon
[21:09:14] Yeah, I thought so
[21:11:14] Cool. You should make a task for this and start dumping updates and thoughts into it :)
[21:11:22] yeah
[21:11:42] It's fun to hear about progress.
[21:11:56] Especially when you say things like " the training takes less than a minute" :)))
[21:13:08] :D
[21:13:24] Our data is super small comparing to what corpations like google do
[21:14:11] *corporations
[21:14:31] if you compare to google, almost everyone is small :P
[21:15:02] (CR) Mooeypoo: [C: 2] Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667 (https://phabricator.wikimedia.org/T161325) (owner: Mattflaschen)
[21:15:13] I think we'll need the GPU for image processing. It would be nice to get some basic models up for commons.
[21:15:18] Could even help Wikidata
[21:15:37] Or I suppose the intersection of Structured Data On Commons(TM)
[21:16:34] fun fact: the dataset for image processing wasn't good enough, so they built a robot that put things on a table (something like a glass) and took pictures from different angles, and then moves on to the next object
[21:16:57] (Merged) jenkins-bot: Revert "Add conflicts for category changes" [extensions/ORES] - https://gerrit.wikimedia.org/r/344667 (https://phabricator.wikimedia.org/T161325) (owner: Mattflaschen)
[21:17:40] yeah, I talked about that with Lydia, I would be interested in doing cool stuff like this
[21:19:19] One great thing would be that it might reduce our memory footprint substantially because it has its own shipping mechanism
[21:19:20] Sounds like we're working on goals for FY18Q3
[21:19:50] that's a little bit too precise ;)
[21:20:16] (wrt memory footprint, I'm talking about tf not image processing)
[21:21:12] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3129284 (Halfak) This issue is blocking @Mattflaschen-WMF, @Mooeypoo, @Etonkovidova, and the rest of #edit-revie...
[21:21:18] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3129286 (Halfak) p:Triage>High
[21:22:10] Amir1, tf?
[21:22:41] tensorflow
[21:23:59] Oh gotcha.
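Note on the ROC-AUC mentioned at 21:05:50: once the network emits a per-edit probability (the point raised at 21:07:47), the metric can be computed outside TensorFlow from the true labels and those probabilities alone. The following is a minimal sketch under that assumption, using the rank-sum (Mann-Whitney) formulation in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative and are not taken from the TF-on-ORES repository or from ORES.

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum statistic.

    labels: iterable of 0/1 (1 = damaging, by assumption)
    scores: predicted probability of the positive class, same order
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy data only: four edits, two of them labelled damaging.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because this needs only labels and scores, the same function could be applied to the scores of the current model and of the new networks, which would allow a comparison on something other than raw accuracy.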
[21:30:17] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3129337 (Ladsgroup) a:Ladsgroup
[21:30:44] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3110103 (Ladsgroup) I get this fixed by migrating to a new instance ASAP.
[21:31:03] [22:15] <+halfak> I think we'll need the GPU for image processing. I believe that gives quite the speedup there yes halfak
[21:31:54] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3110103 (greg) (Let us know if you need any assistance.)
[21:31:57] basvb, I just finished up a research project looking into this. We're going to go with an AMD GPU because open source drivers are essential for our infra.
[21:32:17] So we're going to need to try to use opencl support -- which is still experimental in tensorflow. :(
[21:32:51] I haven't done any GPU image processing myself
[21:32:54] https://phabricator.wikimedia.org/T159838#3122125
[21:32:56] Oh gotcha.
[21:33:06] but I believe most people swear at nvidia GPU's for image processing
[21:38:22] aah I see you took those arguments into account at T159838
[21:38:23] T159838: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838
[21:40:34] Amir1, thanks for picking up that task for the beta instance :)
[21:40:44] I was looking at it and cursing
[21:40:57] I should've done it waaay sooner
[22:46:34] halfak: there is a super annoying bug in beta cluster: https://phabricator.wikimedia.org/T148929
[22:46:45] This is my third time creating that instance
[22:50:29] I will work on it tomorrow
[22:50:34] I'm calling it a day
[22:50:35] o/
[22:53:28] Amir1: have a good one!
[23:07:13] o/
[23:07:30] it's Saturday here
[23:07:37] https://github.com/wiki-ai/ores/pull/191/files
[23:07:42] Holy moley
[23:07:59] Refactor all the things!
[23:08:05] * halfak runs away too.
[23:08:08] o/
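
Note on the Korean honorific endings discussed between 14:44 and 15:16: the rules agreed there ("니다" at the end of a sentence or paragraph, "니까" only directly before a "?") can be sketched as a single regular expression. This is a hypothetical illustration using Python's standard `re` module, not the pattern that was eventually built into revscoring. The names `HONORIFIC_ENDING` and `find_honorific_endings`, and the choice of `.`, `!` and `,` as closing punctuation, are assumptions based only on the examples given in the conversation.

```python
import re

# Assumed rule set, taken from the examples in the chat:
#   * a word ending in "니다" counts when followed by . ! , or the end of a line
#   * a word ending in "니까" counts only when directly followed by "?"
# [가-힣] covers the precomposed Hangul syllable block.
HONORIFIC_ENDING = re.compile(
    r"[가-힣]*(?:니다(?=[.!,]|$)|니까(?=\?))",
    re.MULTILINE,  # let $ match at the end of every line, not only the text
)


def find_honorific_endings(text):
    """Return the words in `text` that end in one of the honorific forms."""
    return HONORIFIC_ENDING.findall(text)


print(find_honorific_endings("알겠습니다."))
print(find_honorific_endings("모르겠습니까?"))
print(find_honorific_endings("그만 하죠 좀!"))
```

The lookaheads keep the punctuation out of the match, so the same word is reported whichever punctuation mark follows it, and an honorific in the middle of a sentence (for example inside a quotation) is ignored, in line with what was said at 15:12:18.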