[00:43:29] Scoring-platform-team, Wikidata, editquality-modeling, User-Ladsgroup, artificial-intelligence: Use 'informals', 'badwords', etc. in Wikidata feature set - https://phabricator.wikimedia.org/T162617#3458934 (Halfak) We're not getting badwords signal from labels or descriptions are we?
[00:44:06] ragesoss, sorry to miss your question. Yes it will accept free text.
[00:44:14] So we're focused on curation :(
[00:44:27] Because no doubt someone's going to start doxing someone with it.
[00:59:18] why are you still working...
[01:00:12] halfak, fyi I'm getting off parent duty in a minute and will write the two processing scripts for the fiwiki experiment.
[01:00:21] Cool.
[01:00:25] I went and made dinner.
[01:00:34] I need to finish a paper review before I'm allowed to leave today
[01:00:39] * halfak cracks the whip on his back.
[01:00:53] I want to have a few hours to myself this weekend hopefully :\
[01:00:54] sounds serious. hairshirts and floggery
[01:01:49] I know you said these scripts are probably ephemeral, but my urge is to create a new package revscoring/data where we have discrete routines for munging these files.
[01:02:19] awight2, +1 for thinking about a package, but make them ephemeral at first so that the package comes out of v2 of the scripts.
[01:02:30] "out of"?
[01:02:37] i.e. is not included in?
[01:04:00] all good. We can iron out details later.
[01:05:16] I concur :D
[01:06:33] awight2 what is parent duty?
[01:07:23] paladox 12am-11pm work hours
[01:07:36] Zppix what?
[01:07:38] lol
[01:08:02] what do you do between the hours of 12am and 11pm
[01:08:17] Zppix ^^
[01:08:19] idk not a parent
[01:08:26] oh i see
[01:11:14] halfak (and paladox) fyi im going on vaction tomorrow ill be back sunday ill try to pop in as much as i can though
[01:11:26] ok :)
[01:11:30] No worries dude! Enjoy your time :)
[01:11:30] Zppix where you going?
[01:11:32] span?
[01:11:42] branson Missouri
[01:13:40] ty halfak
[01:16:29] ok
[01:28:51] Well, I'm an asshole. I wouldn't want to receive the review that I just wrote, but then again, I'd never submit a paper in that state.
[01:28:57] Yikes.
[01:29:00] * halfak runs away
[01:29:05] actually I'm biking away. :D
[01:29:06] o/
[01:31:21] lol
[06:51:25] Scoring-platform-team, Wikidata, editquality-modeling, User-Ladsgroup, artificial-intelligence: Use 'informals', 'badwords', etc. in Wikidata feature set - https://phabricator.wikimedia.org/T162617#3459315 (Ladsgroup) Yes we are. When people edit Wikidata using GUI, it adds what they changed...
[09:48:24] Scoring-platform-team-Backlog, ORES, Operations, Graphite, User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3459711 (fgiunchedi) >>! In T169969#3456586, @Halfak wrote: > I think we'd like to keep some high level metrics forever, others for...
[11:07:23] Scoring-platform-team, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Gather language assets for Swedish - https://phabricator.wikimedia.org/T131450#3460026 (Liuxinyu970226)
[11:08:25] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3460053 (Liuxinyu970226)
[11:09:19] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, artificial-intelligence: Language assets for Azerbaijani - https://phabricator.wikimedia.org/T162014#3460054 (Liuxinyu970226)
[11:09:31] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Tagalog - https://phabricator.wikimedia.org/T149475#3460055 (Liuxinyu970226)
[11:09:50] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, artificial-intelligence: Generate stopwords for CJK languages - https://phabricator.wikimedia.org/T111178#3460059 (Liuxinyu970226)
[11:10:10] Scoring-platform-team-Backlog, Bad-Words-Detection-System, editquality-modeling, User-Ladsgroup, artificial-intelligence: migrate bad words detection to editquality repo - https://phabricator.wikimedia.org/T131861#3460060 (Liuxinyu970226)
[11:12:40] Scoring-platform-team, articlequality-modeling, User-Ladsgroup, artificial-intelligence: Train a `reverted` model for svwiki - https://phabricator.wikimedia.org/T135604#3460062 (Liuxinyu970226)
[11:12:57] Scoring-platform-team, Wikilabels, editquality-modeling, artificial-intelligence: Edit quality campaign for Vietnamese Wikipedia - https://phabricator.wikimedia.org/T114509#3460063 (Liuxinyu970226)
[11:13:31] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Article quality models for Russian Wikipedia - https://phabricator.wikimedia.org/T131635#3460064 (Liuxinyu970226)
[13:59:23] Scoring-platform-team, Wikidata, editquality-modeling, User-Ladsgroup, artificial-intelligence: Use 'informals', 'badwords', etc. in Wikidata feature set - https://phabricator.wikimedia.org/T162617#3460387 (Halfak) Oh! I see! Your approach is a very interesting solution. I'm OK with callin...
[13:59:31] o/
[15:33:11] Scoring-platform-team-Backlog, ORES, Operations, Graphite, User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3460777 (Halfak) great! We'll look into it.
[16:57:36] FYI, I copied the announcement of our team to our own blog. https://phabricator.wikimedia.org/phame/post/view/62/announcing_the_scoring_platform_team/
[17:46:53] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3461368 (Baba_Tabita) Update (based on email exchange with Halfak in May 2017): I had a look at the...
[19:22:01] o/
[19:33:02] Scoring-platform-team, Edit-Review-Improvements-RC-Page, MediaWiki-extensions-ORES, Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017): Reduce very long search times on RC Page when using ORES for rare combos - https://phabricator.wikimedia.org/T164796#3461633 (jmatazzoni)
[19:37:45] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, User-Ladsgroup, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271#3461654 (Halfak) Thanks for posting here. It looks like swahili uses latin chars so we won't be ab...
[20:01:36] halfak hi! I'm still squandering a sunny day, getting the scraps of my work environment back together again. With enough duck tape over the castle windows, I've nearly gotten the drafts down.
[20:01:51] o/
[20:01:59] ooh nasty--this IRC client eats an entire CPU
[20:02:04] wow
[20:02:06] I suppose it does have lots of colors...
[20:02:19] gotta keep your house warm in those cold sf-bay summers
[20:02:50] Good thing it's not multithreaded ;-)
[20:04:00] haha!
[20:04:09] hehe that reminds me of a fantastic placard I just saw on the wall of the Exploratium: "The coldest summer I ever spent was actually in New England." Samuel Clemens
[20:04:23] Of course, that could be apocryphal as well...
[20:04:27] lol
[20:04:45] loll https://twitter.com/textual/status/834978209531846656
[20:05:17] pardon me while I delete this app
[20:15:35] aiwhgt lol
[20:15:37] awight lol
[20:15:44] i use textual too
[20:15:48] i built it my self
[20:15:54] seriously. My fingers are scorched from the experience
[20:16:02] i always find if you try to copy a piece of text it would freeze the thing
[20:16:07] as it will copy everything
[20:29:25] * awight rubs hands and prepares to write some data-munging scripts
[20:36:21] lol
[20:52:10] Amir1: I just noticed that .debs might be a pretty good container for our deployment binary burden. What do you think?
[20:52:41] We can reuse existing versioning and mirroring infrastructure
[20:54:47] Another nice property is that we can choose our granularity, so for example the editquality-ro package would keep you up-to-date for all current built models.
[21:05:11] awight, why would .deb be a good option here? Why not something like git-lfs?
[21:09:29] What is the git for if we’re not keeping history?
[21:09:41] that’s not quite what I meant...
[21:09:53] why such an exotic way of keeping history?
[21:14:56] deb is an increasingly good option IMO—
[21:15:02] * standard way of signing contents
[21:15:10] * existing infrastructure: apt, puppet
[21:15:18] * rich support for versioning, e.g. rolling back on all machines but one.
[21:15:25] * only the latest files are downloaded
[21:15:30] * granularity can be set however we like, e.g. `editquality-ro`,
[21:15:33] * End-users can download single language packs and benefit from our updates
[21:16:10] On the negative side, debs are medium-annoying to build. I’d want to automate that maintenance...
[21:16:45] err, annoying to specify in the metadata files. Building is medium-easy :)
[21:47:24] halfak: I've trained the models based on the new data (well, combination of old and new data) It has 0.5% increase in ROC-AUC given that damaging is already 97%, I consider it very good
[21:47:40] let me get the detailed stats out of the door
[21:54:16] What about pr-auc?
[21:54:18] kk
[23:10:25] OK all done with reviews. I'm out of here. Have a good one!
[23:10:26] o/
[23:41:39] oh. deb package sources would have the same problem we were trying to solve with git-lfs. Giant binary blobs that need to be kept in git :)
[23:43:57] fun fact polygerrit will not call you an annonymouse coward now like gerrit's gwtui does
[23:43:58] heh
[23:52:58] * awight holds head a little higher now
[23:55:37] lol
[23:55:38] awight wow time goes fast
[23:55:39] already 00:55am
[23:55:40] heh
[23:56:07] ono. have a good weekend!
[23:56:37] lol thanks and you too
[23:56:38] :)
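Aside on the 21:47/21:54 exchange (a ROC-AUC gain on a damaging model already at 97%, followed by "What about pr-auc?"): when damaging edits are rare, ROC-AUC can stay high even while the top of the ranking is polluted with false positives, and PR-AUC is more sensitive to exactly that. The sketch below is a generic, dependency-free illustration and is not how revscoring or ORES computes its reported statistics. PR-AUC is approximated here by average precision (one common estimator), and the toy labels and scores are invented for illustration only.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (the Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR-AUC estimated as average precision: the mean of precision@k taken at
    the rank k of each positive, after sorting by descending score. Ties keep
    input order (sorted() is stable), which is a simplification."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits


# Hypothetical toy data: 2 "damaging" edits among 10, so the classes are imbalanced.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.95, 0.8, 0.3, 0.2, 0.1, 0.1, 0.05, 0.0]
print(roc_auc(labels, scores))            # 0.8125
print(average_precision(labels, scores))  # 0.5
```

On this toy data the same ranking scores 0.8125 on ROC-AUC but only 0.5 on average precision, because one negative outranks both positives. That gap is why a small ROC-AUC change near the ceiling is worth checking against PR-AUC before calling it a clear win.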