[00:57:05] Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (HAKSOAT) In the last week, I have worked extensively on improving my understand...
[13:11:30] Scoring-platform-team, drafttopic-modeling: Compress Gensim models with term hashing - https://phabricator.wikimedia.org/T247523 (Pavol86) I went through the documentation for gensim and fasttext (very limited info), and a lot of other pages.. :) . To proceed further: what is the goal of python-mwtext? i...
[13:16:29] Jade, Scoring-platform-team (Current), Patch-For-Review: Render edit comments in Jade - https://phabricator.wikimedia.org/T247457 (Halfak)
[13:36:10] halfak_: do you know of any ways to filter out all extra stuff other than plain statements from wikipages? i'm using mwparserfromhell.parse.strip_code but looks like it still leaves table data in as plaintext
[14:14:36] Scoring-platform-team (Current), Discovery-Search, Elasticsearch, revscoring, artificial-intelligence: Improve the performance and quality of tokenization in revscoring - https://phabricator.wikimedia.org/T248480 (Halfak) @Haksoat and I chatted about using unicode ranges in the "word" token....
[14:33:36] nevermind, strip_code is supposed to remove tables, let me see why i'm getting those in some examples
[15:18:53] ORES, Scoring-platform-team (Current): Estimate ORES CapEx for FY21 - https://phabricator.wikimedia.org/T249917 (akosiaris) > We'd need to experiment with selectively shutting down celery/uwsgi and seeing what happens to memory usage under normal load. @akosiaris, what do you think of trying this out on...
[15:33:56] codezee, I have something like that. Check out mwtext
[15:34:17] https://pypi.org/project/mwtext/
[15:34:23] looking...
[15:38:36] wow!
think of a problem with wikitext --> halfak has a library :D
[15:38:52] thanks!
[15:42:32] It's a little hacky.
[15:42:47] Pull requests welcome. We're less good at pulling out templates than mwparserfromhell.
[15:43:09] I have some ideas for how we could improve that.
[15:43:43] so is that supposed to be a lightweight substitute for mwparserfromhell when we need cleanups?
[15:58:09] Right. It's aimed at high performance situations.
[15:58:29] I wonder if we might extend it for better quality when performance is less of an issue.
[16:07:38] hey kevinbazira o/
[16:07:50] you still got time for a quick call about the db hooks today?
[16:08:22] accraze o/
[16:08:43] yes please we can jump on the call whenever you're ready :)
[17:00:37] thanks for the review kevinbazira, things seem good on beta so far
[17:44:10] Scoring-platform-team, revscoring, artificial-intelligence: Store/read informals, badwords, stopwords and other language assets on a wiki page - https://phabricator.wikimedia.org/T158916 (He7d3r) Is this still wanted nowadays?
[18:04:56] \o/ nice work accraze and kevin!
[18:06:23] Scoring-platform-team, revscoring, artificial-intelligence: Store/read informals, badwords, stopwords and other language assets on a wiki page - https://phabricator.wikimedia.org/T158916 (Halfak) Yeah. I think this is really interesting. We'd need to do some thinking about how it could work with ou...
[18:10:00] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Text fetched by articlequality's `fetch_text` might not match the talk page label (for moved pages) - https://phabricator.wikimedia.org/T251608 (He7d3r) I've updated the patch.
[18:24:13] Scoring-platform-team, revscoring, artificial-intelligence: Store/read informals, badwords, stopwords and other language assets on a wiki page - https://phabricator.wikimedia.org/T158916 (He7d3r) Here are some of the existing lists, of varying q: * https://pt.wikipedia.org/wiki/WP:Software/Anti-vanda...
[18:25:49] Scoring-platform-team, revscoring, artificial-intelligence: Feature request: add weighted sum to revscoring score utility output - https://phabricator.wikimedia.org/T252053 (Chtnnh)
[18:30:38] Scoring-platform-team, revscoring, artificial-intelligence: Store/read informals, badwords, stopwords and other language assets on a wiki page - https://phabricator.wikimedia.org/T158916 (He7d3r) It is not uncommon for some good faith edit to add a new expression (or badly written regex) to such list...
[19:42:54] Scoring-platform-team, revscoring, artificial-intelligence: Store/read informals, badwords, stopwords and other language assets on a wiki page - https://phabricator.wikimedia.org/T158916 (Halfak) We can probably handle the breaking changes by having a manual step where we pull a new version of a badw...
[19:54:48] Scoring-platform-team, drafttopic-modeling: Compress Gensim models with term hashing - https://phabricator.wikimedia.org/T247523 (Halfak) > what is the purpose of getting word embedding from supervised model? these embeddings are specifically tailored for fasttext classification. They can be used to trai...
[19:57:57] chtnnh, looking at https://phabricator.wikimedia.org/T252053 ...
[19:58:11] right
[19:58:32] I think this is too specific for 'revscoring score' since we use that for all of the models.
[19:58:45] I had a similar concern
[19:58:52] You could instead build a utility for articlequality that generates the score along with the weighted_sum
[19:59:00] And it could call out to revscoring score.
[19:59:12] that makes a lot of sense
[19:59:24] can you add that to the task for documenting the reason
[19:59:32] Alternatively, you could just take the output of revscoring score and give that to "articlequality weighted_sum"
[20:00:38] so something like revscoring score model --host --rev-ids | articlequality weighted_sum
[20:01:06] Right exactly.
[20:01:21] I like that last option better.
[20:01:29] Scoring-platform-team, revscoring, artificial-intelligence: Feature request: add weighted sum to revscoring score utility output - https://phabricator.wikimedia.org/T252053 (Halfak) I think this is too specific for 'revscoring score' since we use that for all of the models. You could instead build...
[20:01:30] I just posted in the task.
[20:04:00] Scoring-platform-team, revscoring, artificial-intelligence: Feature request: add weighted sum to revscoring score utility output - https://phabricator.wikimedia.org/T252053 (Halfak) ` Usage: articlequality [-h | --help] articlequality weighted_sum Options: -h --help Prints this d...
[20:04:13] I just made another note for the call signature and some example configuration files.
[20:09:29] Scoring-platform-team (Current), articlequality-modeling, artificial-intelligence: Text fetched by articlequality's `fetch_text` might not match the talk page label (for moved pages) - https://phabricator.wikimedia.org/T251608 (Halfak)
[20:09:46] Scoring-platform-team (Current), articlequality-modeling, artificial-intelligence: Text fetched by articlequality's `fetch_text` might not match the talk page label (for moved pages) - https://phabricator.wikimedia.org/T251608 (Halfak) a:He7d3r
[20:12:01] sorry halfak had to reboot
[20:12:05] can you tell me what I missed
[20:12:18] I put a bunch of notes in the task.
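(Editorial sketch, not part of the log.) The piped design agreed on above, where a separate "articlequality weighted_sum" step consumes the output of revscoring score, could look roughly like the following. Everything specific here is an assumption rather than something taken from the task: the input is taken to be one `<rev_id><TAB><json score>` record per line with a `probability` mapping, and the class names and weights in `WEIGHTS` are hypothetical placeholders for what the discussion says would come from a configuration file.

```python
import json

# Hypothetical class weights (an assumption, not from T252053); the chat
# suggests a real utility would read these from a configuration file.
WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}


def weighted_sum(probabilities, weights=WEIGHTS):
    """Return the sum of weight * probability over the predicted classes."""
    return sum(weights[label] * p for label, p in probabilities.items())


def process(lines, weights=WEIGHTS):
    """Yield (rev_id, score_doc) pairs with a 'weighted_sum' field added.

    Assumes each line is '<rev_id>\t<json score>' and that the JSON document
    carries a 'probability' mapping; the real score output may differ.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rev_id, raw = line.split("\t", 1)
        doc = json.loads(raw)
        doc["weighted_sum"] = weighted_sum(doc["probability"], weights)
        yield rev_id, doc


if __name__ == "__main__":
    # Demo on a made-up record; for piped use, pass sys.stdin to process().
    sample = [
        '123\t{"prediction": "C", "probability": {"Stub": 0.1, "Start": 0.2, '
        '"C": 0.4, "B": 0.2, "GA": 0.05, "FA": 0.05}}'
    ]
    for rev_id, doc in process(sample):
        print(rev_id + "\t" + json.dumps(doc))
```

Keeping the weighting in its own filter leaves revscoring score generic across all models, which is the concern raised in the chat; only the articlequality-specific step knows about class weights.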
[20:12:21] :D
[20:12:38] thank you, on it
[20:28:08] hey halfak, I have a quick question: https://github.com/wikimedia/articlequality/pull/126#discussion_r421064468
[20:43:11] posting our async update notes --
[20:43:15] kevinbazira-
[20:43:18] Y:
[20:43:20] Replaced user ID with user name in edit comments
[20:43:22] Used wikitext username e.g. [[User:Kevin Bazira|Kevin Bazira]] to replace user ID in comments on the Jade history page.
[20:43:23] [1] https://meta.wikimedia.org/wiki/User:Kevin_Bazira
[20:43:24] Worked on interview questions for the Software Engineer role.
[20:43:26] T:
[20:43:28] Localized number of endorsements in edit comments
[20:43:30] Used the wfMessage plural syntax to localize when it's one endorsement or many endorsements.
[20:43:32] The requirement was to render this comment:
[20:43:34] (→‎jade-deleteproposal|1: {"damaging":true,"goodfaith":true})
[20:43:36] Like this:
[20:43:38] (Proposal deleted: Damaging / Good-faith (1 endorsement))
[20:43:40] But I have ended up rendering it like this:
[20:43:42] (Proposal deleted (1 endorsement): Damaging / Good-faith)
[20:43:44] halfak-
[20:43:46] Y: I wrote a little essay about "latent user needs" for the ORES paper. I helped chtnnh set up a workflow for demonstrating improvements to the ptwiki models using wiki templates and tables. One beautiful day, Jade will let us automate a lot of this.
[20:43:48] T: I just finished a huge block of meetings. I realized when meeting with Kevin that I hadn't updated the diff/undo/rollback integration designs for Jade. So I'll be getting that together today. Otherwise, I'm running support for chtnnh and looking at improvements to the ptwiki models.
[20:43:50] and me-
[20:43:58] Y: Got the Jade db hooks patchset ready for review. Had some local errors popping up that I squashed, and also fought with Jenkins a bit.
[20:44:01] T: Met w/ Kevin about the db hooks before he merged the patchset, been testing things out on beta, so far so good.
We'll need to eventually clean up the old tables before we launch the pilot. Also watching some of the Github Satellite conf today and will pick up on some of the automation tasks this afternoon.
[21:06:12] Scoring-platform-team, Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for bswiki - https://phabricator.wikimedia.org/T194509 (Halfak) Sorry for the delay on this task! We're finally looking to pick it up. @Srdjan_m, are you still available to consult w...
[21:09:22] halfak i see this task is already assigned to someone else
[21:35:43] Jade, Scoring-platform-team (Current), Epic, MW-1.35-notes (1.35.0-wmf.32; 2020-05-12): Implement secondary Jade Integrations - https://phabricator.wikimedia.org/T229974 (Halfak)
[21:35:45] Jade: Implement Special:Diff integration for Jade - https://phabricator.wikimedia.org/T212387 (Halfak)
[21:36:13] Jade, Scoring-platform-team: Implement Special:Diff integration for Jade - https://phabricator.wikimedia.org/T212387 (Halfak)
[21:40:20] chtnnh, we'll get that re-assigned. I don't think srdjan meant to assign it to himself.
[21:43:02] great okay halfak
[21:44:23] I'm out of here. Have a good one, folks.