[13:24:55] <halfak>	 o/
[14:32:54] <wikibugs>	 Jade, Scoring-platform-team, Patch-For-Review: Render edit comments in Jade - https://phabricator.wikimedia.org/T247457 (kevinbazira) a:kevinbazira
[16:16:23] <kevinbazira>	 Hi halfak 👋 
[16:16:24] <kevinbazira>	 Hi accraze 👋
[16:16:32] <halfak>	 hey kevinbazira!
[16:16:42] <halfak>	 How's hacking on the edit comment rendering? 
[16:20:05] <kevinbazira>	 It's going well. Managed to push a patchset today. 
[16:20:08] <kevinbazira>	 Though something in the Jenkins pipeline is failing. Want to engage Andy for help :)
[16:21:33] <accraze>	 taking a quick look kevinbazira
[16:21:52] <kevinbazira>	 cool ..
[16:24:57] <halfak>	 Cool!  Thanks guys :) 
[17:23:43] <wikibugs>	 Scoring-platform-team, artificial-intelligence: Add `words_to_watch` to articlequality and draftquality models in ptwiki - https://phabricator.wikimedia.org/T251171 (Chtnnh) https://github.com/wikimedia/articlequality/pull/121  Here is the articlequality code for review.
[17:41:02] <wikibugs>	 ORES, Scoring-platform-team, artificial-intelligence: Review model performance for ptwiki 'articlequality' and 'draftquality' - https://phabricator.wikimedia.org/T250809 (He7d3r) @Halfak do you have a quick way to get how many assessments by each user in the dataset `ptwiki.balanced_labelings.*_2020....
[17:55:26] <wikibugs>	 Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (TJones) @HAKSOAT,  I have a couple of suggestions, which may or may not be help...
[17:57:30] <accraze>	 posting our async update notes --
[17:57:41] <accraze>	 halfak - 
[17:57:43] <accraze>	 Y: Lots of meetings.  Did an interview with an SWE candidate and reached out to get the interviewing process cleaned up.  Your calendars should now be cleared up.  I talked to the research team about the roadblock we're experiencing with Jade 2ndary tables and how that is likely to play out for a lot of the work that the Research team is doing.   Regrettably, I didn't make it to the Jade integrations. 
[17:57:45] <accraze>	 Instead, I answered a bunch of questions about the ptwiki models and tried to give some insights for editors there.  
[17:57:47] <accraze>	 T: I'm going to review Andy's work on 2ndary integrations and help chtnnh iterate on the ptwiki models.   I'll work on the ORES paper if I make it there. 
[17:57:49] <accraze>	 kevinbazira -
[17:58:00] <accraze>	 Y:
[17:58:01] <accraze>	 Looked into what facilities are available to us when rendering edit comments: https://phabricator.wikimedia.org/T250723
[17:58:03] <accraze>	 - Based on this and this, Wikidata is using the FormatAutocomments MediaWiki hook to render localized strings for edit comments.
[17:58:05] <accraze>	 T:
[17:58:07] <accraze>	 Localized Jade history page comment prefixes
[17:58:09] <accraze>	     For example:
[17:58:11] <accraze>	     If an edit comment on the history page was:
[17:58:13] <accraze>	         (→‎jade-createandendorseproposal: {"damaging":true,"goodfaith":true} "SW" :)
[17:58:15] <accraze>	     It is now:
[17:58:17] <accraze>	         (Proposal created: {"damaging":true,"goodfaith":true} "SW" :)
[17:58:19] <accraze>	     The "Proposal created:" part of the comment is now localized.
[17:58:21] <accraze>	     This has been done for all the 8 Jade actions that create, update or delete.
[17:58:23] <accraze>	 **The Jenkins test is failing because of a database issue. None of my code was altering the DB. I've engaged Andy for help.
[17:58:25] <accraze>	 haksoat -
[17:58:27] <accraze>	 Y:
[17:58:29] <accraze>	 Started reading up extensively on the logic behind the NFA regex engines
[17:58:31] <accraze>	 T:
[17:58:33] <accraze>	 Continued reading up on the same topic from yesterday. I've finished the chapter from the book I am doing my study with and can easily see reasons why our regex is slower. I don't know yet how to make it better, which I hope to learn from the next two chapters. Book title: Mastering Regular Expressions by Jeffrey Friedl.
[17:58:35] <accraze>	 and me -
[17:58:37] <accraze>	 Y: Did some code review for Aaron, also worked on cleaning up the Dockerfile for ORES that is used on Travis, and started working on a blubberfile.
[17:58:39] <accraze>	 T: More code review. I also need to investigate the DB issue on Jenkins that Kevin ran into with his rendered comment patchset. I think this might be due to some of the WIP 2ndary schema stuff I did last week, but I still need to spend some time digging into why it's affecting Kevin.
[18:05:33] <wikibugs>	 Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (HAKSOAT) Thanks for this @TJones . Yes, a lot of the time spent was due to the...
[18:07:56] <halfak>	 chtnnh,  /home/halfak/projects/articlequality/datasets/ptwiki.labelings.20200301.json
[18:08:04] <halfak>	 Grab that file and put it in your datasets dir. 
[18:08:17] <chtnnh>	 aha
[18:10:40] <chtnnh>	 new error :(
[18:11:01] <wikibugs>	 ORES, Scoring-platform-team (Current): Remove pylru requirement from ORES - https://phabricator.wikimedia.org/T251003 (ACraze) Reviewed & merged that PR
[18:11:12] <halfak>	 Thanks accraze!
[18:11:31] <accraze>	 no prob
[18:11:38] <accraze>	 early lunch run, back in a bit
[18:11:47] <wikibugs>	 ORES, Scoring-platform-team, artificial-intelligence: Review model performance for ptwiki 'articlequality' and 'draftquality' - https://phabricator.wikimedia.org/T250809 (Halfak) Hmm.  No quick way to do that...  We could modify the label extractor to grab it though.  It would require some refactorin...
[18:11:55] <halfak>	 Arg.  I need lunch too. chtnnh paste that error before I go :P 
[18:15:06] <chtnnh>	 https://gist.github.com/chtnnh/d6e0808360dc3ff2b116a4abd53eb03d
[18:15:24] <chtnnh>	 sorry senpai :((
[18:16:59] <Helder>	 chtnnh, which version of python are you using?
[18:17:05] <chtnnh>	 ohhhh
[18:17:09] <halfak>	 chtnnh, I think you forgot your venv
[18:17:11] <Helder>	 I think I saw this error when I used python 2 by accident
[18:17:13] <chtnnh>	 forgot to source the venv
[18:17:16] <halfak>	 ha. 
[18:17:19] <halfak>	 LUNCH!
[18:17:26] <chtnnh>	 xD
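The two causes raised in the exchange above (running under Python 2 by accident, or forgetting to source the venv) can be caught up front with a small guard. This is a minimal sketch, not part of articlequality or revscoring: the function names and the `(3, 5)` minimum version are assumptions chosen for illustration.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Python 3 venvs set sys.base_prefix to the parent interpreter's prefix;
    # the older virtualenv tool sets sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix or hasattr(sys, "real_prefix")


def check_environment(min_version=(3, 5)):
    """Return a list of human-readable problems; an empty list means all clear.

    `min_version` is a hypothetical default, not a documented requirement.
    """
    problems = []
    if sys.version_info < min_version:
        problems.append(
            "Python %d.%d or newer expected, found %d.%d"
            % (min_version + tuple(sys.version_info[:2]))
        )
    if not in_virtualenv():
        problems.append("no virtual environment is active")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning: " + problem)
```

Calling `check_environment()` at the top of a script turns an obscure import or syntax error into a direct hint about the interpreter in use.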
[18:27:18] <wikibugs>	 Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (TJones) For the custom token filter, we have written a few simple ones that jus...
[18:36:45] <wikibugs>	 Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (HAKSOAT) Thanks for this. I currently don't know a lot of Java, just Python and...
[20:36:39] <wikibugs>	 ORES, Scoring-platform-team, Patch-For-Review: Review prometheus ORES rules for completeness - https://phabricator.wikimedia.org/T233448 (Halfak) Hi @colewhite!  I'm sorry for the big delay on this.  Was looking this over today and I think I might need a tutorial for how to convert grafana graphs to...
[20:42:30] <dsaez>	 halfak, isaacj have a look at this https://pypi.org/project/contextualized-topic-models/ I'll add it for our next 'topic meeting' 
[20:47:16] <halfak>	 Looks interesting though honestly the API looks pretty messy. 
[20:48:24] <halfak>	 I wonder what performance is like. 
[20:48:28] <halfak>	 I haven't played with BERT yet
[20:53:52] <isaacj>	 hmmm...interesting. it looks like essentially they used an autoencoder framework where they start with the text, compress it down into a document embedding (via BERT or bag of words), and then try to "recreate" the bag-of-words from that embedding for unsupervised learning
[20:55:19] <isaacj>	 even if it blows up for article text (because BERT is pretty intensive i think and this architecture seems pretty big too), might work well if you just treat article links as the bag of words
[21:01:42] <haksoat>	 halfak: What does etc mean in the tokens?
[21:01:44] <haksoat>	 ("etc",           r".") 
[21:01:58] <haksoat>	 Aside from matching any character
[21:33:21] <halfak>	 haksoat, right.  That's for characters that don't match any other token.  
[21:33:24] <halfak>	 It's rare. 
[21:33:52] <halfak>	 But we need to cover it in order to do diffing and stuff. 
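The role of the `("etc", r".")` entry can be sketched with a toy lexicon. Everything here except that final entry is an assumption made for illustration and is not the real revscoring token list: the other token names, their patterns, and the `tokenize` helper are hypothetical. The point is that a catch-all placed last guarantees every character lands in some token, so the tokens concatenate back to the original text, which is what diffing relies on.

```python
import re

# Hypothetical, simplified lexicon. Only the final "etc" entry comes from the
# discussion above; the other entries are illustrative stand-ins.
LEXICON = [
    ("word", r"[^\W\d_]+"),
    ("number", r"\d+"),
    ("whitespace", r"\s+"),
    ("etc", r"."),  # catch-all: any single character no earlier pattern matched
]

# Alternation order matters: earlier entries win, so "etc" only fires when
# nothing else matches at the current position.
TOKEN_RE = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in LEXICON),
    re.DOTALL,
)


def tokenize(text):
    """Yield (token_type, token_text) pairs covering every character of `text`."""
    for match in TOKEN_RE.finditer(text):
        yield match.lastgroup, match.group()


text = "abc 12 ☃!"
tokens = list(tokenize(text))
# Because of the catch-all, no character is dropped between tokens.
assert "".join(token for _, token in tokens) == text
```

Without the last entry, `finditer` would silently skip the snowman and the exclamation mark, and the token sequence would no longer line up with the source text.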
[23:41:10] <wikibugs>	 Scoring-platform-team, artificial-intelligence: Add `words_to_watch` to articlequality and draftquality models in ptwiki - https://phabricator.wikimedia.org/T251171 (He7d3r) See https://github.com/wikimedia/articlequality/pull/122 for another possible explanation for the problem: >>! In T251171#6087118,...
[23:41:19] <Helder>	 halfak, ^
[23:49:17] <wikibugs>	 ORES, Scoring-platform-team, artificial-intelligence: Review model performance for ptwiki 'articlequality' and 'draftquality' - https://phabricator.wikimedia.org/T250809 (He7d3r) The following pull request is related to improving the `articlequality` model: https://github.com/wikimedia/articlequality...