[00:11:43] halfak: never mind I just need to change how i make my github personal access token
[00:24:33] Jade, ORES, Scoring-platform-team, ApiFeatureUsage, and 22 others: All API help links should use `Special:MyLanguage` - https://phabricator.wikimedia.org/T231269 (Niharika)
[09:47:10] Scoring-platform-team, Bad-Words-Detection-System, revscoring, artificial-intelligence: Add language support for Swahili (sw) - https://phabricator.wikimedia.org/T162271 (kevinbazira) a:kevinbazira Thanks @Halfak! I've looked at the BWDS list and as @Baba_Tabita said, the Swahili words on the...
[14:34:06] o/ kevinbazira
[14:46:54] kevinbazira, how is hacking on Swahili?
[14:54:08] * halfak digs into https://fasttext.cc/docs/en/unsupervised-tutorial.html
[14:58:19] brb
[15:10:22] o/ Halfak hacking on Swahili is going well .. still compiling the badwords/informals
[15:11:03] Cool :)
[15:11:34] https://fasttext.cc/docs/en/unsupervised-tutorial.html will be related to what we work on next, so I'm reading ahead of you a little bit.
[15:12:11] Just trying to make sure I have a sense of what I'll be asking you to do.
[15:28:52] Looking at https://github.com/facebookresearch/fastText/blob/master/wikifil.pl, it seems like we could build a nice utility for this that would support distributed processing.
[15:28:58] I wonder if it would be as fast as perl.
[15:29:40] Not that perl is particularly fast but it does seem like this script has been tuned and really only applies a bunch of regex.
[15:30:52] Looks like script doesn't support [[File:Image.jpg]] -- it only supports the old [[Image:Image.jpg]]
[15:30:53] [1] https://meta.wikimedia.org/wiki/File:Image.jpg =>
[15:30:56] [2] https://meta.wikimedia.org/wiki/Image:Image.jpg
[15:31:06] lol AsimovBot
[15:31:31] We'll need to handle localization. "File" is the English internationalization of that namespace.
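Editorial note: the localization point raised between 15:30:52 and 15:31:31 can be sketched in Python. This is only a minimal sketch under stated assumptions. It is not what wikifil.pl does and not an existing mwtext API: `build_file_link_pattern` and `strip_file_links` are hypothetical names, and the namespace names are passed in by hand. In practice they would presumably come from the wiki's siteinfo, for example the MediaWiki API's `meta=siteinfo` query with `siprop=namespaces|namespacealiases`.

```python
import re


def build_file_link_pattern(namespace_names):
    """Compile a regex matching the opening of a file link, e.g. "[[File:".

    namespace_names is assumed to hold every local name and alias of the
    File namespace for one wiki (hypothetical input, supplied by the caller).
    """
    alternatives = "|".join(re.escape(name) for name in namespace_names)
    return re.compile(r"\[\[\s*(?:" + alternatives + r")\s*:", re.IGNORECASE)


def strip_file_links(wikitext, namespace_names):
    """Remove [[File:...]] style links, including captions with nested links."""
    pattern = build_file_link_pattern(namespace_names)
    kept = []
    pos = 0
    while True:
        match = pattern.search(wikitext, pos)
        if match is None:
            kept.append(wikitext[pos:])
            break
        kept.append(wikitext[pos:match.start()])
        # Walk forward to the matching "]]", counting nested "[[ ... ]]".
        depth = 0
        i = match.start()
        while i < len(wikitext):
            if wikitext.startswith("[[", i):
                depth += 1
                i += 2
            elif wikitext.startswith("]]", i):
                depth -= 1
                i += 2
                if depth == 0:
                    break
            else:
                i += 1
        # An unclosed link consumes the rest of the text (a simplification).
        pos = i
    return "".join(kept)


# English names, covering both the current and the old namespace name.
print(strip_file_links("[[Image:Old.png]]Some text", ["File", "Image"]))
# Japanese names, given here as an illustrative assumption.
print(strip_file_links("[[ファイル:例.jpg|thumb]]テキスト", ["ファイル", "画像"]))
```

Passing the names in as data keeps the stripping logic identical across wikis, so only the siteinfo lookup would differ per language.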
[15:32:33] I'm imagining that we create a library called "mwtext" or something like that which would contain utilities to support text processing.
[15:41:19] e.g. mwtext xml2plaintext --siteinfo=$(mwtext get_siteinfo https://ja.wikipedia.org) jawiki-latest-pages-articles.xml.bz2 > jawiki.articles.plaintext
[15:53:04] Looks like we can experiment with some of the pre-generated vectors. https://fasttext.cc/docs/en/crawl-vectors.html
[15:53:10] I'll load one of those up quick.
[16:30:57] Jade, ORES, Scoring-platform-team, ApiFeatureUsage, and 22 others: All API help links should use `Special:MyLanguage` - https://phabricator.wikimedia.org/T231269 (DannyS712)
[17:02:51] Hey folks! Async standup time!
[17:09:34] Y: Added values from the euwiki tuning report to the article quality Makefile and rebuilt the euwiki model with random_forest. PR: https://github.com/wikimedia/articlequality/pull/98
[17:09:44] Y: Made cawiki tuning reports and added them to this PR: https://github.com/wikimedia/editquality/pull/216
[17:09:55] T: Compiling Swahili badwords and informals: https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sw
[17:10:06] T: Setting up dev env for NG's ML tutorials - Octave and Matlab. The introductory theory is done. The coming sessions require us to write code.
[17:12:21] Y: Continued on Jade Entity UI. Mostly working on backend OOUI scaffolding to attach JS widgets on the frontend. Didn't really make much progress here, panel layouts are not attaching to the mw-text-content div class. Also spent a little bit of time getting a demo ready for the Jade API.
[17:12:46] T: Demo the Jade API to Aaron & Kevin. Will clean up and address some of the notes taken during the demo and will ping team for additional review. Also will continue working on Jade Entity UI scaffolding.
[17:16:31] Y: Spent a bunch of time fighting with git LFS to fix up some of Kevin's PRs (I forgot to tell him to run "git lfs install" -- oof).
I've got the articlequality PR ready. I'll be working on getting the editquality PR's LFS stuff fixed up. I'm hoping to have Kevin follow up with a final review of what I did before we merge. I met with the Growth team to continue our discussion of topic models.
[17:16:31] T: Continuing work on Kevin's PRs. Should be minimal. I spent some time this morning exploring FastText (a strategy for making embeddings) and thinking through how we'd engineer pipelines that can build our embeddings. This will be relevant to topic modeling -- which is what I'm hoping Kevin will work on next. I need to bring Kevin's laptop to DHL. Hopefully, I'll get to Jade design work bits that Andy needs. And if I get that far, I'll work on a report to support Andy's review of what I've been doing for session orientation.
[17:17:22] kevinbazira, gist is that I'll make some changes to your two PRs. I'll describe those changes in the relevant phab tasks. I'd like you to perform the final review of the PRs and merge if you see fit tomorrow.
[17:18:39] That's alright.
[17:21:15] halfak: no rush on the additional design assets, I most likely won't get to that stuff until after the offsite.
[17:21:39] Cool. I'll keep that in mind :)
[17:44:37] I just loaded 300 cell vector word embeddings and used it to make some analogies. It took 10GB of memory. So that's unmanageable. But it works. And the interface is pretty straightforward.
[17:45:37] regretfully, the fasttext maintainers are bad at pypi, so I had to install directly from their repo.
[17:46:47] That was a fun little spike. I'll be coming back to this later.
[18:30:57] Going offline. Good day halfak, acraze.
[19:46:25] running out to DHL
[19:46:31] back in a bit
[19:46:33] wish me luck
[19:52:53] luck!
[20:25:48] That went well, I think.
[20:26:04] They confirmed that it should arrive by Nov 12th which would be pretty awesome.
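Editorial note: the analogy experiment mentioned at 17:44:37 can be illustrated roughly as follows. This is a sketch of the usual vector-arithmetic approach (b - a + c, then nearest neighbour by cosine similarity), not a record of the code that was actually run and not the fastText interface. The three-dimensional `TOY_VECTORS` are invented for illustration; the real embeddings discussed in the log have 300 dimensions. `cosine` and `analogy` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Answer "a is to b as c is to ?" by nearest neighbour to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded so they cannot be returned.
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Invented toy vectors (assumption, not real embedding values).
TOY_VECTORS = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [-1.0, 0.3, -0.5],
}

print(analogy(TOY_VECTORS, "man", "woman", "king"))  # prints "queen"
```

The linear scan over every word is fine for a toy dictionary; with a full vocabulary the same lookup is normally done as one matrix product over the whole embedding table.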
[21:43:07] Jade, Scoring-platform-team (Current): Design Jade entity UI - https://phabricator.wikimedia.org/T212370 (Halfak) {F31057508} This screenshot contains the labeldata field for editquality. I included a lot of content because I think this label benefits from solid coaching. We'll need to set up strings...
[21:51:38] Weird. The revscoring PR wasn't picking up my changes. But the remote branch that the PR referenced had them.
[21:51:58] I just rebased on master (which I might as well do anyway) and force pushed -- BAM there are the changes in the PR.
[21:55:35] wikimedia/revscoring#1766 (session_orientation - 04bb9df : halfak): The build was broken. https://travis-ci.org/wikimedia/revscoring/builds/608962141
[21:55:41] Oh poo
[21:56:38] Oh! It's just docs being broken. That's OK.
[21:56:49] For now anyway.
[22:07:29] Scoring-platform-team (Current), revscoring, artificial-intelligence: Refactor revscoring to handle session-orientation - https://phabricator.wikimedia.org/T231214 (Halfak) New changes! See https://github.com/wikimedia/revscoring/compare/9ff5ac176d6fb71f4ccc88bd43a1e36439cb4968...04bb9dfd6acc7da1c90...
[22:07:29] OK! Notes to review.
[22:07:32] Wooo
[22:07:38] I hope that makes it easier.
[22:07:43] I need to run for a bit.