[00:45:02] (PS1) Catrope: Remove maybebadfaith naming hack [extensions/ORES] - https://gerrit.wikimedia.org/r/343225
[05:39:27] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Manage ORES preferences on Watchlist (and Contributions) - https://phabricator.wikimedia.org/T160475#3109050 (Catrope) I think the description also misunde...
[13:45:20] o/
[14:15:32] Revision-Scoring-As-A-Service, Wikidata, Wikilabels, User-Ladsgroup: Wikidata items render badly in Wikilabels - https://phabricator.wikimedia.org/T160256#3109752 (thiemowmde)
[14:22:33] hi, https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/ko seems to be all chinese
[14:22:46] are you sure you got the data from kowiki?
[14:22:52] (or appropriate Korean wikis)
[14:22:53] woops! Maybe not!
[14:22:57] Amir1, ^
[14:23:06] It'
[14:23:21] It's not Korean chars for sure
[14:23:52] Thanks for pointing this out. I'll make a task to get it fixed up.
[14:23:57] Thanks for pointing this out.
[14:24:06] * halfak repeats himself :)
[14:24:09] more coffee
[14:24:10] heh lol
[14:24:21] I was asking community if we could use ORES
[14:24:26] and I found that was problematic :P
[14:24:29] revi, were you hoping to work through the kowiki list for us?
[14:24:45] uhm yeah (I think?)
[14:25:00] halfak: hmm
[14:25:26] all cjk tokens are the same, that's why it picks zh instead of ko
[14:25:28] Revision-Scoring-As-A-Service, Bad-Words-Detection-System: Korean generated word lists are in chinese - https://phabricator.wikimedia.org/T160752#3109790 (Halfak)
[14:25:38] Amir1, should still be korean chars
[14:26:04] heh, I don't need to subscribe
[14:26:06] revi, (in our code, we have special handling for Korean, Chinese and Japanese characters)
[14:26:08] thanks herald
[14:26:39] :)
[14:26:51] Amir1, OK if I assign to you?
[14:27:13] oh boy: https://en.wikipedia.org/wiki/Hangul_consonant_and_vowel_tables
[14:27:16] One other thought -- I should probably get good at running BWDS :S
[14:27:21] I think something related to https://phabricator.wikimedia.org/T111179 ?
[14:27:45] halfak: I can investigate what's wrong and fix obvious errors
[14:27:55] kk
[14:28:09] Revision-Scoring-As-A-Service, Bad-Words-Detection-System, User-Ladsgroup: Korean generated word lists are in chinese - https://phabricator.wikimedia.org/T160752#3109805 (Halfak) a:Ladsgroup
[14:28:12] but if it was something strange in tokening, I guess we should go with manual list :/
[14:28:27] (Like Chinese and Japanese)
[14:28:38] Yeah, I'll be down with that. I might make some suggestions about cleaning up tokenization for BWDS too
[14:28:43] Do we do ngrams in bwds?
[14:28:49] nope
[14:28:53] and the code is horrible
[14:29:00] kk that will certainly help in this case
[14:29:01] lol
[14:29:03] I was dumb when I wrote it
[14:29:33] I have a gram-er and tfidf weighter in revscoring now. Maybe we can use those and cut down the code for bwds.
[14:29:42] I'll have a look at that today.
[14:29:48] It sounds like fun-friday hacking :)
[14:29:55] heh
[14:30:05] I've been doing way too much coordination, communication, and finance templates recently ;)
[14:30:14] gift for you then
[14:30:20] from saturday in 30 minutes
[14:30:21] :D
[14:30:46] ha. I've still got most of Friday left in UTC-5. :)
[14:31:18] Well, actually I'm in UTC-6 but daylight savings time is dumb.
[14:31:22] lol
[14:31:28] no such DST in Korea
[14:35:36] revi, can you help me understand how many characters in korean usually form a "word"?
[14:35:43] hmm
[14:35:50] at least two charactors
[14:35:57] but even one can form a word
[14:35:57] Wait a tick... Korean is space delimited?
[14:36:00] yes
[14:36:09] OMG Amir1
[14:36:14] This will work so much better
[14:36:22] Looks like we're going to move the K out of CJK
[14:36:25] yeah
[14:36:30] Spaces are *awesome*
[14:36:53] 위키백과의 운영은 비영리 단체인 위키미디어 재단이 하고 있다.
[14:36:57] Random string from Korea
[14:37:02] Spaces everywhere!
[14:37:06] But number of characters still really high
[14:37:13] That's OK.
[14:37:13] Korea -> Korean Wikipedia
[14:37:21] :D
[14:37:27] so tokening will be strange
[14:37:38] we need to strip wikitext stuff
[14:37:45] I think you should know https://en.wikipedia.org/wiki/Korean_postpositions this too
[14:38:44] I wonder if stemming is a thing in korean
[14:38:57] Are you familiar with stemming, revi?
[14:39:05] erm I don't think so
[14:39:16] https://en.wikipedia.org/wiki/Stemming
[14:39:22] I studied Korean during my school
[14:39:33] (running, ran, runs) --> "run"
[14:39:36] oh
[14:39:40] it is
[14:39:48] yeah it is
[14:39:54] * revi was stupid
[14:39:58] Cool. I'll see if I can find a korean stemmer :)
[14:40:00] * revi time for another coffee
[14:40:08] -> doesn't drink coffee tho
[14:40:16] <-*
[14:40:18] lol
[14:40:29] It's a useful excuse :)
[14:40:40] yeah
[14:40:49] somebody once said
[14:40:50] who cares?
[14:41:29] I remember Twitter releasing some Korean tokenization tool
[14:41:32] I'll search it
[14:41:44] https://github.com/twitter/twitter-korean-text Apache 2.0
[14:42:02] seems to be forked into https://github.com/open-korean-text/open-korean-text
[14:42:11] curses... java
[14:42:18] indeed
[14:42:25] Who loves java? nobody!
[14:42:30] right
[14:42:31] :)
[14:43:48] http://konlpy.org/en/v0.4.4/
[14:44:40] \o/
[14:46:25] Looks like nltk doesn't have a stemmer
[14:46:44] and neither does konlpy, but there's some useful stuff in konlpy :)
[14:49:54] OK first things first. I need to fix tokenization for korean.
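(Editorial sketch, not part of the log.) The point made above, that Korean is space delimited while Chinese and Japanese are not, can be illustrated with a small regex tokenizer. This is only a sketch under stated assumptions: `TOKEN_RE` and `tokenize` are hypothetical names, the Unicode ranges cover just precomposed Hangul syllables and the basic CJK ideograph block, and none of it is taken from revscoring or BWDS.

```python
import re

# Hypothetical sketch; NOT the tokenizer used by revscoring or BWDS.
# Assumes NFC text: precomposed Hangul syllables live in U+AC00..U+D7A3,
# and the basic CJK Unified Ideographs block is U+4E00..U+9FFF.
TOKEN_RE = re.compile(
    r"[\uac00-\ud7a3]+"   # a run of Hangul syllables is kept as one word
    r"|[\u4e00-\u9fff]"   # each CJK ideograph becomes its own token
    r"|[A-Za-z0-9]+"      # Latin words and numbers
)


def tokenize(text):
    """Return word-like tokens, dropping whitespace and punctuation."""
    return TOKEN_RE.findall(text)


# The sample sentence pasted in the channel at 14:36:53.
sample = "위키백과의 운영은 비영리 단체인 위키미디어 재단이 하고 있다."
korean_tokens = tokenize(sample)
print(korean_tokens)

# Unspaced ideographs fall back to one token per character.
print(tokenize("中文"))
```

Because the Hangul runs are bounded by spaces, the Korean sentence comes out as whole words, while a string of ideographs is split per character. Postpositions stay attached to their nouns here, which is why the stemming question raised in the channel still matters.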
[14:50:48] Revision-Scoring-As-A-Service, revscoring: Fix tokenization for Korean - https://phabricator.wikimedia.org/T160755#3109850 (Halfak)
[14:51:42] * halfak reads https://en.wikipedia.org/wiki/Korean_language_and_computers
[14:53:56] interesting for me too
[14:54:06] and omg EUC-KR
[14:56:03] hmm... looks like we had a bug where korean was already correctly handled O.O
[14:56:59] Yup. Works perfectly.
[14:57:03] lolwat
[14:57:09] lolwat +1
[14:58:18] * halfak adds a test and updates documentation
[14:59:14] I just initiated the community consensus procedures for ORES so we should have more than enough time :P
[14:59:42] Cool. We don't need consensus for basic support in ORES, but we will for the ORES Review Tool deployment.
[14:59:48] Did you mention that in your post?
[14:59:53] hmm
[14:59:55] didn't know that
[14:59:56] will update
[15:00:02] https://www.mediawiki.org/wiki/ORES_review_tool
[15:00:27] Since we're total external, we can have ORES ready to make predictions but it's up to people if they want to use the predictions.
[15:00:44] On the other hand, deploying the ORES Review Tool will actually make changes to the wiki (enabling an extension)
[15:00:52] yeah, that's config change
[15:00:59] and SWAT
[15:01:08] revi knows what's up :)
[15:01:23] I did few deploys :D
[15:02:20] so I think I still have to follow https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service/Get_support 's new wiki guidelines
[15:02:36] the first two Language support doesn't need consensus, right?
[15:02:50] right.
[15:02:57] got it
[15:02:58] But we'll need a lot of help to get the labeling stuff done.
[15:03:02] will create ticket
[15:03:05] That's a lot like consensus.
[15:03:07] Great!
[15:03:17] sure, I've got time during weekends and some weekdays
[15:06:14] Revision-Scoring-As-A-Service-Backlog, Bad-Words-Detection-System, revscoring: Add language support for ... - https://phabricator.wikimedia.org/T160757#3109896 (revi)
[15:06:15] err
[15:06:18] forgot the subject
[15:06:27] Revision-Scoring-As-A-Service-Backlog, Bad-Words-Detection-System, revscoring: Add language support for Korean - https://phabricator.wikimedia.org/T160757#3109910 (revi)
[15:09:01] Amir1, should I start working from https://github.com/wiki-ai/Bad-Words-Detection-System ?
[15:09:16] halfak: yup
[15:09:19] kk
[15:09:40] I think I'm going to turn this into a utility inside of editquality called "bwds". Does that sound crazy?
[15:10:57] We'll need dexbot to ultimately write the output. But maybe I can have the utility produce a datafile containing interesting word lists.
[15:11:21] hmm, that sounds good
[15:11:28] OK
[15:11:35] I was worried it would worth the effort
[15:11:38] but I think so
[15:13:45] Amir1, I'm starting to think that maybe we can do this from the API. It might be a nice way to take advantage of our current extraction patterns.
[15:13:46] hmmm
[15:14:05] E.g. we query for a sample of 100k edits to articles and work from there.
[15:14:40] I highly doubt that
[15:14:50] Oooh I'm going to pull in a little laplacian smoothing too
[15:14:56] Oh? Too few observations.
[15:15:07] I tried with even big samples but still unrelated stuff
[15:15:38] Maybe the laplacian smoothing will help with that. I'll run some tests.
[15:16:36] At least this should be a cheap test.
[15:17:35] Hmm... maybe we can even use our autolabeler for this.
[15:18:53] So the workflow I'm thinking is (1) query for a large random sample (2) autolabel to find likely "revert_for_damage" and (3) run bwds on the large set of autolabeled revisions
[15:29:00] Making API query to get revision text and parent revision text of 100K edits is not that cheap ;)
[15:29:06] halfak: ^
[15:29:25] Amir1, cheap enough for how often we do it. :)
[15:29:56] When I rebuild all the models, we re-extract features for like 700k revisions :@
[15:30:15] You're researcher and I'm engineer that's the difference :P
[15:30:22] lol
[15:30:40] In prod, efficiency is king. In analysis, efficiency is measured in human-hours.
[15:31:41] afk for dinner
[15:31:46] be back in one hour or so
[15:31:53] o/
[15:32:44] o/
[15:41:03] about 1AM, heading to bed
[15:41:06] o/
[15:42:42] o/ revi
[15:42:46] thanks for your help today
[15:42:55] Please swing by some time soon to check in on our progress :)
[15:52:44] feel free to ping me or mail me (addr in phab profile) if you need me
[15:52:54] will do
[15:53:12] my irccloud should keep me here unless they have problems with their server or freenode crashes
[16:02:43] Revision-Scoring-As-A-Service, ORES, Wikimedia-Logstash, Patch-For-Review, User-Ladsgroup: Send ORES logs to logstash - https://phabricator.wikimedia.org/T149010#3110075 (akosiaris) While the change has been merged, and logs still make it to the local disk I don't see anything in logstash yet
[16:11:20] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3110103 (hashar)
[17:10:02] wiki-ai/revscoring#900 (derepeat - 37024a8 : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/212198436
[17:10:16] :P
[19:52:24] (PS1) Catrope: Update "What's This" messages for RCFilters [extensions/ORES] - https://gerrit.wikimedia.org/r/343329 (https://phabricator.wikimedia.org/T149385)
[19:53:22] (PS2) Catrope: Update "What's This" messages for RCFilters [extensions/ORES] - https://gerrit.wikimedia.org/r/343329 (https://phabricator.wikimedia.org/T160779)
[20:20:43] I like actionable features, so why did I decide to add a non-actionable one to my model?
[20:20:50] * Nettrom cries at his 5% improvement in accuracy
[20:23:13] (5 percentage points, or 8 percent in this case)
[20:27:41] o/ glorian_wd
[20:27:47] o/ halfak
[20:27:54] Nettrom, which features gave you that?
[20:28:55] halfak: number of views, number of inlinks to an article, and a modified measure of WPMED-specific inlinks (1 + num_WPMED_links/(1+ num_global_links))
[20:29:41] on my test set it scores 67.5% accuracy, whereas if I use just number of WPMED inlinks I get 62.5%
[20:30:55] halfak: so I tried to comment everything in views.js because I want to ensure that it is really the code which shows the form (i.e. the space which shows item information).
[20:30:56] for some reason, after I did that, I can still see the item information and did not see any errors.
[20:30:56] I have cleared my browser caches several times, and also restarted the dev_server. But the problem still persists.
[20:30:56] I am wondering if you could give me a hint for this problem
[21:11:05] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Add feature flag to enable parts of ORES extension by default - https://phabricator.wikimedia.org/T159763#3111006 (jmatazzoni)
[21:22:12] (PS1) Catrope: Hide oresRCHideNonDamaging pref if rcenhancedfilters is enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/343343 (https://phabricator.wikimedia.org/T160475)
[21:45:34] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Enable parts of ORES extension by default and manage impacts on the RC Page and the Recent Changes Preferen... - https://phabricator.wikimedia.org/T159763#3111150
[21:45:53] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Enable parts of ORES extension by default and manage impacts on the RC Page and the RC page Preferences tab - https://phabricator.wikimedia.org/T159763#3077869...
[21:50:46] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Enable parts of ORES extension by default and manage impacts on the RC Page and the RC page Preferences tab - https://phabricator.wikimedia.org/T159763#3111156...
[21:52:29] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Enable parts of ORES extension by default and manage impacts on the RC Page and the RC page Preferences tab - https://phabricator.wikimedia.org/T159763#3111157...
[21:56:26] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Manage ORES preferences on Watchlist (and Contributions) - https://phabricator.wikimedia.org/T160475#3111161 (jmatazzoni)
[21:57:15] glorian_wd, what is the problem?
[21:57:34] "I can still see the item information and did not see any errors" sounds good, right?
[21:57:51] the item information is still displayed although I clear (comment) my views.js.
[21:58:21] I want to make sure views.js is the one which displays the item information. So, I thought if I comment it, I should get some error
[21:58:28] but it did not happen
[21:58:35] Why is the item information displaying a bad thing?
[21:58:50] Battery dying Switching to phone
[21:58:51] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), and 2 others: Manage ORES preferences on Watchlist (and Contributions) - https://phabricator.wikimedia.org/T160475#3111170 (jmatazzoni) @SBisson and @Etonkovidova, pleas...
[21:58:52] this is just testing. I want to understand how the code works
[21:58:58] :P
[21:59:59] Back
[22:00:29] halfak|Mobile: yeah. It is just for me understanding how the code works
[22:00:37] glorian_wd: imagine my point of view and help me know how to help you.
[22:00:57] hmm let me rephrase what I want to know
[22:01:24] so, I am trying to find which bits of code that displays the item information. Let me grab you some screenshot
[22:02:57] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Move 'ORES Sensitivity' controls and conform levels to the new ERI standards - https://phabricator.wikimedia.org/T160575#3111180 (jmatazzoni)
[22:03:28] test
[22:04:38] halfak|Mobile: for some reason I cant send the screenshot. But what I am referring to is, the space underneath the form
[22:04:50] where you see the item information such as item ID, value, sitelinks, and so on
[22:05:10] I believe this information is displayed by views.js
[22:06:47] However, when I tried to comment the content views.js, I can still see all of this information. I supposed this was because my browser cache. But even after I cleared my browser cache, nothing changes, I did not receive any error (i.e. the item information is still shown).
[22:06:47] So, I want to confirm to you if my understanding is correct, whether the item information is displayed because views.js
[22:07:11] halfak|Mobile: did you get what I am trying to achieve?
[22:07:24] (CR) Mooeypoo: [C: 2] Update "What's This" messages for RCFilters [extensions/ORES] - https://gerrit.wikimedia.org/r/343329 (https://phabricator.wikimedia.org/T160779) (owner: Catrope)
[22:09:35] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Move 'ORES Sensitivity' controls and conform levels to the new ERI standards - https://phabricator.wikimedia.org/T160575#3111196 (jmatazzoni)
[22:10:38] (Merged) jenkins-bot: Update "What's This" messages for RCFilters [extensions/ORES] - https://gerrit.wikimedia.org/r/343329 (https://phabricator.wikimedia.org/T160779) (owner: Catrope)
[22:10:42] (CR) Mooeypoo: [C: 2] Remove maybebadfaith naming hack [extensions/ORES] - https://gerrit.wikimedia.org/r/343225 (owner: Catrope)
[22:11:15] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Move 'ORES Sensitivity' controls and conform levels to the new ERI standards - https://phabricator.wikimedia.org/T160575#3104020 (jmatazzoni)
[22:12:11] (CR) Mooeypoo: [C: 2] Hide oresRCHideNonDamaging pref if rcenhancedfilters is enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/343343 (https://phabricator.wikimedia.org/T160475) (owner: Catrope)
[22:12:13] Hard refresh
[22:12:31] Do you know how browser caches work?
[22:12:41] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Move 'ORES Sensitivity' controls and conform levels to the new ERI standards - https://phabricator.wikimedia.org/T160575#3111203 (jmatazzoni) Thanks for the suggestions...
[22:13:06] glorian_wd: ^
[22:13:49] Revision-Scoring-As-A-Service, Edit-Review-Improvements-RC-Page, ORES, Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017): Move 'ORES Sensitivity' controls and conform levels to the new ERI standards - https://phabricator.wikimedia.org/T160575#3111204 (jmatazzoni)
[22:18:34] halfak|Mobile: yeah. Hmm let me try that
[22:18:43] but you also think this is a cache problem right?
[22:19:17] not because I missed some other bits of code which maybe display the item information?
[22:24:58] No idea. Guessing
[22:25:13] I can't see your code
[22:25:46] halfak|Mobile: hmm I am just commenting ("//") everything in views.js
[22:25:56] commenting the content*
[22:26:01] the content within the class
[22:31:48] (Merged) jenkins-bot: Remove maybebadfaith naming hack [extensions/ORES] - https://gerrit.wikimedia.org/r/343225 (owner: Catrope)
[22:31:50] (Merged) jenkins-bot: Hide oresRCHideNonDamaging pref if rcenhancedfilters is enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/343343 (https://phabricator.wikimedia.org/T160475) (owner: Catrope)
[23:02:37] halfak|Mobile: Apparently I know the problem. So the code determines the "view" from views.js right when a new campaign is created.
[23:02:37] I guess the selected view is saved for each campaign. This means if I comment the content of views.js (or modify a class of views.js ), it won't affect the previously created campaigns.
[23:03:58] because the view for those previously created campaigns has been saved, before the views.js is modified.
[23:06:40] in other words, I need to create a new campaign to test my new view (i.e. view with iframe)
[23:12:01] but that's just my assumption by far.