[03:09:17] ORES, Scoring-platform-team: [Discuss] Future ORES architecture - https://phabricator.wikimedia.org/T226193 (Ottomata) More link dumpy (via Nuria): https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-machine-learning
[07:31:14] ORES, Scoring-platform-team, Operations: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (elukey) p:Triage→High
[14:49:42] o/
[15:02:14] brb plumber just got here :)
[15:09:10] back
[15:09:15] o/ kevinbazira
[15:09:17] how's hacking?
[15:09:44] Hi halfak o/
[15:11:24] Hacking is going well. I rebuilt the articletopic model. PR: https://github.com/wikimedia/drafttopic/pull/43
[15:12:02] Awesome! I just finished reviewing and merged it!
[15:12:25] Maybe we can get that deployed soon. It would be nice to get this improvement out the door :)
[15:16:45] kevinbazira, I made some progress on generating preprocessed text for generating embeddings.
[15:17:10] I'd like to have you try generating some 50 cell embeddings using this new preprocessed text.
[15:22:05] Sure thing! Thanks for merging the PR. Should the 50 cell embeddings be generated using fasttext?
[15:22:38] Yes.
[15:22:57] Hmm. I am stuck in a bit of a pickle though. I would like to have you generate some supervised embeddings.
[15:23:18] Well, this will be useful. Let's give the intermediate stuff a try. I'll link you to the dir you can find them in.
[15:23:49] See stat1007:/home/halfak/projects/mwtext/datasets
[15:24:12] Alright, thanks for the link.
[15:25:07] The only real difference with this input data is that it has line breaks between paragraphs and between page boundaries. I think fasttext will be able to do a better job learning from this.
[15:26:49] I need to figure out how to add our labels to this.
[15:27:11] Great. Should I generate the embeddings from enwiki-20190112-preprocessed_article_text.txt.bz2 only?
[15:27:38] Yes, for right now.
[15:28:02] Well, actually it would be very interesting to run some tests with the other languages. So it would be cool to see those built too.
[15:28:12] Alright, thanks for clarifying.
[15:28:14] So yeah, enwiki first, but then the others second.
[15:28:39] Cool!
[15:42:44] I never really considered the whole labeling problem for supervised learning. I'll need to adapt my approach a bit.
[15:43:04] I wonder if we could hold all page labels in memory... Hmm.
[15:45:37] I could probably map the page labels to ints and that would make them a lot smaller.
[15:49:24] ORES, Scoring-platform-team: [Discuss] Future ORES architecture - https://phabricator.wikimedia.org/T226193 (Ottomata) https://github.com/tensorflow/tfx
[17:03:34] Hey folks! Async time.
[17:03:36] Y: Mostly worked through the details of getting mwtext and a few related packages set up for CI and deployments. I also generated some new datasets for Kevin to work with, including data for Korean, Arabic, Vietnamese, and Czech. I also spent a solid hour on Nate's paper re. ORES and a solid hour on the ORES systems paper rewrite.
[17:03:36] T: Lots of meetings today. I'm just about to go into a 3-hour "tuning session" meeting. I'm hoping to start setting up vagrant and do a little exploration of what it would take to add labels to the preprocessed text output.
[17:07:16] Y: Rebuilt the drafttopic model and succeeded with no errors. Thanks to Aaron's guidance.
[17:07:27] T: Rebuilt the articletopic model. PR: https://github.com/wikimedia/drafttopic/pull/43
[17:07:40] T: Registered for the Wikimedia Hackathon 2020 in Tirana, Albania: https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2020/Register
[17:08:31] Ooh. Thanks for catching that kevinbazira!
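The idea floated at 15:43 and 15:45, holding all page labels in memory by mapping them to ints, can be sketched as a simple interning table: each distinct label string is stored once, and each page keeps only a compact array of small integer IDs. This is a minimal Python sketch under stated assumptions: the input shape (page ID to list of label strings), the function name `intern_labels`, and the example page IDs and label names are all hypothetical and are not taken from mwtext or drafttopic.

```python
from array import array


def intern_labels(page_labels):
    """Map each page's label strings to compact integer IDs (hypothetical helper).

    page_labels: dict of page_id -> list of label strings (assumed input shape).
    Returns (label_to_id, id_to_label, page_to_ids), where page_to_ids stores
    each page's labels as an array of unsigned shorts instead of strings.
    """
    label_to_id = {}   # label string -> int ID
    id_to_label = []   # int ID -> label string (the reverse lookup)
    page_to_ids = {}   # page ID -> array of label IDs
    for page_id, labels in page_labels.items():
        ids = array("H")  # 2 bytes per entry; assumes at most 65,536 distinct labels
        for label in labels:
            if label not in label_to_id:
                # First time this label is seen: assign the next free ID.
                label_to_id[label] = len(id_to_label)
                id_to_label.append(label)
            ids.append(label_to_id[label])
        page_to_ids[page_id] = ids
    return label_to_id, id_to_label, page_to_ids


# Hypothetical example data, for illustration only.
pages = {
    12: ["Culture.Literature", "Geography.Europe"],
    25: ["STEM.Physics"],
    39: ["Geography.Europe", "STEM.Physics"],
}
label_to_id, id_to_label, page_to_ids = intern_labels(pages)
print(list(page_to_ids[39]))                      # [1, 2]
print([id_to_label[i] for i in page_to_ids[39]])  # ['Geography.Europe', 'STEM.Physics']
```

The saving comes from storing each page's labels as 2-byte integers rather than repeated string objects; with more than 65,536 distinct labels, `array("H")` would raise `OverflowError` and a wider type code such as `"I"` would be needed.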
[17:08:54] T: Generate supervised 50 cell embeddings using fasttext
[17:08:55] - input: /home/halfak/projects/mwtext/datasets/
[17:08:55] - halfak, default fasttext label format is denoted by '__label__', what's the label format of this input? (i.e. the datasets you generated today)
[17:10:41] kevinbazira, no labels yet. So let's do unsupervised. I'll be looking into that briefly today. If I make any useful progress I'll let you know where you can find the labeled/preprocessed text.
[17:11:18] Alright, thanks for the clarification halfak!
[17:11:54] Y: more jade ui work -- fixed eslint/styleguide issues and now jenkins is finally passing, also fixed up class names for all UI widgets (facets/proposals/endorsements) so css can be streamlined.
[17:12:10] T: need to create an "author" widget that handles both anonymous and logged in users, should be simple, also need to debug warnings/errors widgets and then if I have time, will be sorting out the i18n messages for all UI components.
[17:21:31] Good day halfak and accraze 👋
[17:21:44] take care, kevinbazira!
[17:26:10] later kevinbazira
[18:11:08] ORES, Scoring-platform-team, Operations: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (Halfak) I had a look at the request log on ores2001 and I can't find any requests that look concerning. Hypotheses: 1. celery got into a weird state and went crazy. It may not happ...
[18:19:41] mediawiki-utilities/python-mwtext#7 (master - c2288e4 : Aaron Halfaker): The build was broken. https://travis-ci.org/mediawiki-utilities/python-mwtext/builds/637033911
[19:13:19] mediawiki-utilities/python-mwtext#9 (master - a34bbed : halfak): The build was fixed. https://travis-ci.org/mediawiki-utilities/python-mwtext/builds/637055587
[19:33:10] (CR) Thiemo Kreuz (WMDE): [C: +2] build: Updating mediawiki/mediawiki-codesniffer to 29.0.0 [extensions/ORES] - https://gerrit.wikimedia.org/r/564529 (owner: Libraryupgrader)
[19:58:18] Just got out of a LOOONG meeting.
[19:58:21] taking lunch
[19:58:24] possibly nap
[19:58:25] oof
[21:12:54] lunchin brb
[23:35:35] I'm outta here for the day. Take care, folks.
[23:48:10] Scoring-platform-team (Current), drafttopic-modeling, revscoring, artificial-intelligence: Implement native NN topic model in revscoring - https://phabricator.wikimedia.org/T242013 (Isaac) @Halfak : I moved the code to stat1005 so I can hopefully get access to the GPUs there for any further testi...
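The 17:08:55 question refers to fastText's supervised input format, where each training line begins with one or more labels carrying the `__label__` prefix, followed by the text. Since the mwtext output had no labels yet, what follows is only a minimal Python sketch of how labeled lines could be written once page labels are available; the helper name `to_fasttext_lines`, the choice of one training line per paragraph, and the example labels and text are hypothetical.

```python
def to_fasttext_lines(labels, paragraphs, prefix="__label__"):
    """Yield one fastText supervised-format line per paragraph (hypothetical helper).

    fastText splits input on whitespace, so spaces inside a label are replaced
    with underscores and each paragraph is collapsed onto a single line.
    """
    label_tokens = " ".join(prefix + label.replace(" ", "_") for label in labels)
    for paragraph in paragraphs:
        text = " ".join(paragraph.split())  # collapse internal newlines and tabs
        if text:  # skip empty or whitespace-only paragraphs
            yield f"{label_tokens} {text}"


# Hypothetical page: two labels, two paragraphs.
lines = list(to_fasttext_lines(
    ["STEM.Physics", "Geography.Europe"],
    ["Marie Curie was a physicist.", "She was born\nin Warsaw."],
))
for line in lines:
    print(line)
# __label__STEM.Physics __label__Geography.Europe Marie Curie was a physicist.
# __label__STEM.Physics __label__Geography.Europe She was born in Warsaw.
```

Emitting one line per paragraph mirrors the paragraph-per-line layout described at 15:25:07, at the cost of repeating each page's labels on every paragraph; one line per page is the obvious alternative. The prefix is kept as a parameter because fastText lets a different label prefix be set with its `-label` option.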