[01:00:28] Scoring-platform-team (Current), drafttopic-modeling: Remove old drafttopic utilities and update utility docs. - https://phabricator.wikimedia.org/T249385 (ACraze) Merged it! Nice one @Halfak
[04:47:16] Jade, Scoring-platform-team (Current), Design, MW-1.35-notes (1.35.0-wmf.26; 2020-03-31), Patch-For-Review: Implement CSS styles for Jade Entity UI - https://phabricator.wikimedia.org/T242648 (kevinbazira) @Halfak, yes this is done for now. It will be re-activated whenever new elements or fa...
[13:38:39] Good Morning halfak! :-)
[13:38:56] Hi Helder! I saw you left us a bunch of useful notes :)
[13:39:21] I'm almost finished :-)
[13:39:36] hello helder
[13:39:49] hey chtnnh !
[13:40:02] thank you for the notes :D silly mistakes of mine mostly
[13:41:25] you're welcome
[13:42:08] So helpful!
[13:43:16] BTW: the comment about the ref tags is more directed to halfak than to chtnnh, since it seems most languages are using the regex he introduced for enwiki
[13:43:29] This thing: ^(?!\s*$)((?!<ref>)(.|\n))*$
[13:44:22] I was wondering if named tags and tags with groups would be a feature with more importance for predicting the quality of articles
[13:45:36] Oh interesting. I guess we could match those separately.
[13:48:55] Do you guys have any notes/gists on how to set up my machine for experimenting with the code?
[13:49:56] Last time I played with ores/revscoring was in 2015, so I'm a little lost in the current infrastructure needed
[13:50:00] halfak correct me if I'm wrong, but all you need to do is clone the repo from github and install the requirements with pip install -r requirements.txt
[13:50:12] in the home directory of the cloned repo
[13:50:29] https://github.com/wikimedia/articlequality/
[13:50:35] +1 chtnnh. You'll need to install some relevant enchant dictionaries too.
[13:50:49] Otherwise, we now run tests with "pytest"
[13:50:57] I think those are the major changes, Helder
[13:52:47] Helder, this regex looks like nonsense to me.
lol
[13:53:05] haha, I was confused by it too
[13:53:22] I wonder if I should re-implement it by using our tokenization to look for <ref>
[13:53:45] We already are applying a regex to get those tags. We can just scan by looking at token.type.
[13:57:22] I was playing with that regex at https://regex101.com/r/JelMvE/1
[13:57:29] and https://regexper.com/#%5E%28%3F!%5Cs*%24%29%28%28%3F!%3Cref%3E%29%28.%7C%5Cn%29%29*%24
[13:57:42] so I could understand what it was doing
[13:59:02] If mwparserfromhell provides a way to get/count the <ref> tags with/without attributes, I think it would be better to use that
[14:00:59] about the requirements for testing things locally, do I need some dumps to train the models?
[14:02:55] halfak ^
[14:03:18] Is it a bad idea to test things on Toolforge, considering that there are dumps available at /public/dumps/public/ptwiki ?
[14:03:48] You will need dumps to test out the beginning of the pipeline.
[14:04:15] I don't think it's a bad idea to do on toolforge. I can pass you the extracted labels though. That can help.
[14:04:20] Otherwise, you can do a lot with tests.
[14:04:39] It's a big processing job to scan the dumps.
[14:07:23] I would like those labels :-)
[14:09:52] OK let me make the change you proposed to the extractor and then I'll kick it off.
[14:09:56] I expect it'll take a few hours.
[14:12:01] Hmm. Where did I see your suggestion about that.
[14:12:07] Helder, ^?
[14:12:24] I want to revert to using mwp like you suggested.
[14:13:59] halfak, this? https://github.com/wikimedia/articlequality/pull/115/files#diff-ea9783f51a6dec90fc13e22a47a194f3R136
[14:14:53] Aha. I need to handle "qualidade"
[14:14:53] hanks
[14:14:55] *thanks
[14:17:22] * halfak facepalms about mwp's error states for calling "get()" on templates.
[14:17:28] It's fine, just surprising.
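For anyone following the regex discussion above, here is a minimal Python sketch of what the pattern seems to do. The regexper URL quoted in the log decodes to `^(?!\s*$)((?!<ref>)(.|\n))*$`, and the sketch assumes it is applied to a whole string with no flags. Under that assumption it accepts any non-blank text that contains no literal `<ref>`, which would explain why named or grouped ref tags slip past it. The helper names and the `REF_OPEN` pattern are hypothetical, written only for illustration; they are not revscoring's tokenizer and not the mwparserfromhell API.

```python
import re

# Pattern as decoded from the regexper URL quoted in the log.
ENWIKI_PATTERN = re.compile(r'^(?!\s*$)((?!<ref>)(.|\n))*$')

# Hypothetical stand-in for counting opening ref tags with a plain regex.
# \b keeps it from matching <references />; </ref> never starts with "<ref".
REF_OPEN = re.compile(r'<ref\b([^>]*?)/?>', re.IGNORECASE)


def has_no_bare_ref(text):
    """True if text is non-blank and contains no literal '<ref>'."""
    return ENWIKI_PATTERN.match(text) is not None


def count_ref_tags(wikitext):
    """Return (refs without attributes, refs with attributes)."""
    plain = with_attrs = 0
    for m in REF_OPEN.finditer(wikitext):
        if m.group(1).strip():
            with_attrs += 1  # e.g. <ref name="n"> or <ref group="note">
        else:
            plain += 1       # e.g. <ref>
    return plain, with_attrs


sample = (
    'A<ref>x</ref> B<ref name="n">y</ref> '
    'C<ref name="n" /> D<ref group="note">z</ref><references />'
)
print(has_no_bare_ref('Text <ref name="a">cite</ref>'))
print(count_ref_tags(sample))
```

Counting the two kinds of tags separately, rather than testing for the absence of one literal string, is closer to what the named-tags question is asking for; a real implementation would more likely rely on a parser or on token types, as suggested in the conversation.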
[14:37:38] Helder, https://github.com/wikimedia/articlequality/pull/115/files?file-filters%5B%5D=.md&file-filters%5B%5D=.model&file-filters%5B%5D=.py#diff-d09399e2f3dab010fbef74332d3484c8
[14:39:39] wikimedia/articlequality#328 (chtnnh-ptwiki-features - 0e17696 : Aaron Halfaker): The build was broken. https://travis-ci.org/wikimedia/articlequality/builds/672116890
[15:03:07] Helder & chtnnh: I just finished making updates to https://github.com/wikimedia/articlequality/pull/115. Please re-check.
[15:13:06] looks good to me halfak
[15:40:39] Scoring-platform-team (Research), Structured-Data-Backlog, artificial-intelligence: Implement NSFW image classifier using Open NSFW - https://phabricator.wikimedia.org/T214201 (Chtnnh) After speaking with @Daimona I have been informed that we require a MediaWiki extension if we wish to communicate wi...
[17:35:16] hello halfak
[17:35:31] was working on the makefile for draftquality
[17:36:02] there seems to be an issue which raises a "multiple target patterns" error
[17:36:07] can you help me out
[17:36:17] i tried hard tabs
[17:36:21] doesn't seem to work
[17:37:32] Scoring-platform-team (Current), Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for ptwikipedia - https://phabricator.wikimedia.org/T246663 (Halfak) Here are some new counts that I get after applying @He7d3r's notes: ` 143901 1 31866 2 5006 3 1...
[17:44:09] As I was saying in PMs. Take off the ".bz2" for all of the ptwiki stuff. We don't need it.
[17:44:26] I need to go get lunch. Back in a bit.
[17:45:51] okay halfak
[17:45:55] its running now
[17:54:35] posting our async update notes --
[17:55:01] kevinbazira:
[17:55:03] Y:
[17:55:05] Added MW message key for jade-deleteendorsement
[17:55:07] Added MW message key for jade-deleteproposal
[17:55:09] T:
[17:55:11] Reviewed Andy's patchsets
[17:55:13] - 586429 (Re-enable MoveHooks)
[17:55:15] Added MW message key for jade-endorse
[17:55:17] Added MW message key for jade-createandendorse
[17:55:24] halfak:
[17:55:26] Y: Gave a presentation on ORES and other applied research @ Wikimedia for MSR. Went well. I worked on the tuning session deck to tell stories about Jade and article topic. I worked a lot on some revscoring/drafttopic cleanup. We have a new issue with sphinx and m2r that I'm looking into. I chatted with clemons about next steps for KubeFlow. That should be coming to a sync meeting soon. I also
[17:55:28] finished building an articlequality model for ptwiki using chtnnh's feature code.
[17:55:30] T: I've been improving label extraction and feature engineering for ptwiki articlequality based on Helder's feedback. I should have a new version of the model later today. I'll be finishing up tuning deck stuff today and if I make it, I'll dig into the session-orientation refactor.
[17:55:50] also paper review due today about mass collaboration dynamics in Wikipedia.
[17:55:53] and me:
[17:56:03] Y: Did some code review for Aaron and Kevin, worked on fixing test fixtures and re-enabled the move hooks for Jade
[17:56:05] T: Talked through secondary db schemas with Aaron, will do more code review for Kevin and also will continue on Jade 2ndary integration work
[18:01:06] chtnnh, looks like we don't have words_to_watch features for ptwiki. I'd recommend dropping those features for now, but adding something to revscoring based on https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Palavras_a_evitar later.
[18:01:15] Helder might be able to help us get that right.
[18:02:07] I'm actually getting lunch now :)
[18:24:53] MediaWiki-extensions-ORES, Scoring-platform-team, Growth-Team, MediaWiki-Recent-changes, Regression: Indicators for problematic changes (r) are missing from RC - https://phabricator.wikimedia.org/T248557 (Catrope) This happens because the ORES extension looks for `