[13:24:55] o/
[14:32:54] Jade, Scoring-platform-team, Patch-For-Review: Render edit comments in Jade - https://phabricator.wikimedia.org/T247457 (kevinbazira) a:kevinbazira
[16:16:23] Hi halfak 👋
[16:16:24] Hi accraze 👋
[16:16:32] hey kevinbazira!
[16:16:42] How's hacking on the edit comment rendering?
[16:20:05] It's going well. Managed to push a patchset today.
[16:20:08] Though something in the Jenkins pipeline is failing. Want to engage Andy for help :)
[16:21:33] taking a quick look kevinbazira
[16:21:52] cool ..
[16:24:57] Cool! Thanks guys :)
[17:23:43] Scoring-platform-team, artificial-intelligence: Add `words_to_watch` to articlequality and draftquality models in ptwiki - https://phabricator.wikimedia.org/T251171 (Chtnnh) https://github.com/wikimedia/articlequality/pull/121 Here is the articlequality code for review.
[17:41:02] ORES, Scoring-platform-team, artificial-intelligence: Review model performance for ptwiki 'articlequality' and 'draftquality' - https://phabricator.wikimedia.org/T250809 (He7d3r) @Halfak do you have a quick way to get how many assessments by each user in the dataset `ptwiki.balanced_labelings.*_2020....
[17:55:26] Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (TJones) @HAKSOAT, I have a couple of suggestions, which may or may not be help...
[17:57:30] posting our async update notes --
[17:57:41] halfak -
[17:57:43] Y: Lots of meetings. Did an interview with an SWE candidate and reached out to get the interviewing process cleaned up. Your calendars should now be cleared up. I talked to the research team about the roadblock we're experiencing with Jade 2ndary tables and how that is likely to play out for a lot of the work that the Research team is doing. I didn't make it to the Jade integrations regretfully.
[17:57:45] Instead, I answered a bunch of questions about the ptwiki models and tried to give some insights for editors there.
[17:57:47] T: I'm going to review Andy's work on 2ndary integrations and help chtnnh iterate on the ptwiki models. I'll work on the ORES paper if I make it there.
[17:57:49] kevinbazira -
[17:58:00] Y:
[17:58:01] Looked into what facilities are available to us when rendering edit comments: https://phabricator.wikimedia.org/T250723
[17:58:03] - Based on this and this, Wikidata is using the FormatAutocomments MediaWiki hook to render localized strings for edit comments.
[17:58:05] T:
[17:58:07] Localized Jade history page comment prefixes
[17:58:09] For example:
[17:58:11] If an edit comment on the history page was;
[17:58:13] (→jade-createandendorseproposal: {"damaging":true,"goodfaith":true} "SW" :)
[17:58:15] It is now;
[17:58:17] (Proposal created: {"damaging":true,"goodfaith":true} "SW" :)
[17:58:19] The "Proposal created:" part of the comment is now localized.
[17:58:21] This has been done for all the 8 Jade actions that create, update or delete.
[17:58:23] **The Jenkins test is failing because of a database issue. None of my code was altering the DB. I've engaged Andy for help.
[17:58:25] haksoat -
[17:58:27] Y:
[17:58:29] Started reading up extensively on the logic behind the NFA regex engines
[17:58:31] T:
[17:58:33] Continued reading up on the same topic from yesterday. I've finished the chapter from the book I am doing my study with and can easily see reasons why our regex is slower. I don't know yet how to make it better, which I hope to learn from the next two chapters. Book title: Mastering Regular Expressions by Jeffrey Friedl.
[17:58:35] and me -
[17:58:37] Y: Did some code review for Aaron, also worked on cleaning up the Dockerfile for ORES that is used on travis and started working on a blubberfile
[17:58:39] T: More code review, also need to investigate the DB issue on Jenkins that Kevin ran into with his rendered comment patchset, I think this might be due to some of the WIP 2ndary schema stuff I did last week, but still need to spend some time digging into why it's affecting Kevin
[18:05:33] Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (HAKSOAT) Thanks for this @TJones . Yes, a lot of the time spent was due to the...
[18:07:56] chtnnh, /home/halfak/projects/articlequality/datasets/ptwiki.labelings.20200301.json
[18:08:04] Grab that file and put it in your datasets dir.
[18:08:17] aha
[18:10:40] new error :(
[18:11:01] ORES, Scoring-platform-team (Current): Remove pylru requirement from ORES - https://phabricator.wikimedia.org/T251003 (ACraze) Reviewed & merged that PR
[18:11:12] Thanks accraze!
[18:11:31] no prob
[18:11:38] early lunch run, back in a bit
[18:11:47] ORES, Scoring-platform-team, artificial-intelligence: Review model performance for ptwiki 'articlequality' and 'draftquality' - https://phabricator.wikimedia.org/T250809 (Halfak) Hmm. No quick way to do that... We could modify the label extractor to grab it though. It would require some refactorin...
[18:11:55] Arg. I need lunch too. chtnnh paste that error before I go :P
[18:15:06] https://gist.github.com/chtnnh/d6e0808360dc3ff2b116a4abd53eb03d
[18:15:24] sorry senpai :((
[18:16:59] chtnnh, which version of python are you using?
[18:17:05] ohhhh
[18:17:09] chtnnh, I think you forgot your venv
[18:17:11] I think I saw this error when I used python 2 by accident
[18:17:13] forgot to source the venv
[18:17:16] ha.
[18:17:19] LUNCH!
[18:17:26] xD
[18:27:18] Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (TJones) For the custom token filter, we have written a few simple ones that jus...
[18:36:45] Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (HAKSOAT) Thanks for this. I currently don't know a lot of Java, just Python and...
[20:36:39] ORES, Scoring-platform-team, Patch-For-Review: Review prometheus ORES rules for completeness - https://phabricator.wikimedia.org/T233448 (Halfak) Hi @colewhite! I'm sorry for the big delay on this. Was looking this over today and I think I might need a tutorial for how to convert grafana graphs to...
[20:42:30] halfak, isaacj have a look on this https://pypi.org/project/contextualized-topic-models/ I'll add it for our next 'topic meeting'
[20:47:16] Looks interesting though honestly the API looks pretty messy.
[20:48:24] I wonder what performance is like.
[20:48:28] I haven't played with BERT yet
[20:53:52] hmmm...interesting. it looks like essentially they used an autoencoder framework where they start with the text, compress it down into a document embedding (via BERT or bag of words), and then try to "recreate" the bag-of-words from that embedding for unsupervised learning
[20:55:19] even if it blows up for article text (because BERT is pretty intensive i think and this architecture seems pretty big too), might work well if you just treat article links as the bag of words
[21:01:42] halfak: What does etc mean in the tokens?
[21:01:44] ("etc", r".")
[21:01:58] Asides from matching any character
[21:33:21] haksoat, right. That's for characters that don't match any other token.
[21:33:24] It's rare.
[21:33:52] But we need to cover it in order to do diffing and stuff.
[23:41:10] Scoring-platform-team, artificial-intelligence: Add `words_to_watch` to articlequality and draftquality models in ptwiki - https://phabricator.wikimedia.org/T251171 (He7d3r) See https://github.com/wikimedia/articlequality/pull/122 for another possible explanation for the problem: >>! In T251171#6087118,...
[23:41:19] halfak, ^
[23:49:17] ORES, Scoring-platform-team, artificial-intelligence: Review model performance for ptwiki 'articlequality' and 'draftquality' - https://phabricator.wikimedia.org/T250809 (He7d3r) The following pull request is related to improving the `articlequality` model: https://github.com/wikimedia/articlequality...
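
Editor's note on the `("etc", r".")` exchange at 21:01-21:33: the sketch below is one way to picture why a catch-all token matters for diffing. It is not revscoring's actual tokenizer. Every lexicon entry except `("etc", r".")` is a hypothetical stand-in, and `tokenize` is an illustrative helper. The idea is that if the last alternative matches any single character, the tokens always cover the whole input, so joining them reproduces the text exactly.

```python
import re

# Hypothetical lexicon for illustration only; just the ("etc", r".") entry
# is taken from the chat. Order matters: earlier entries win.
LEXICON = [
    ("word", r"[^\W\d_]+"),
    ("number", r"\d+"),
    ("whitespace", r"\s+"),
    ("etc", r"."),  # catch-all: any single character nothing above matched
]

# One alternation of named groups; DOTALL lets "." match a newline as well,
# so no character can ever be skipped.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in LEXICON),
    re.DOTALL,
)


def tokenize(text):
    """Return (type, text) pairs that together cover every character of `text`."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


tokens = tokenize("abc 12 §")
# The section sign matches no earlier entry, so it falls through to "etc".
roundtrip = "".join(tok for _, tok in tokens)
```

Without the final entry, `finditer` would silently skip the unmatched character and `roundtrip` would no longer equal the input, which is presumably the kind of gap a token-level diff cannot tolerate.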