[01:10:39] ahh yes that's correct halfak, just about to push up the new patchset in a few minutes
[01:53:49] ok patchset is up and passing on jenkins ready for review tomorrow
[01:54:03] i'm calling it, later!
[08:52:17] Scoring-platform-team, articlequality-modeling, editquality-modeling, revscoring, and 2 others: Add English Language idioms to revscoring - https://phabricator.wikimedia.org/T205545 (HAKSOAT) Thanks for the pointers @Halfak I have joined the channel on IRC. I'll look at the pointers and get back...
[10:18:22] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Improve ORES articlequality feature extraction for images - https://phabricator.wikimedia.org/T180822 (HAKSOAT) I saw this https://en.wikipedia.org/wiki/Wikipedia:Extended_image_syntax I think it has all of the extended Wiki mar...
[10:30:44] Scoring-platform-team, articlequality-modeling, editquality-modeling, revscoring, and 2 others: Add English Language idioms to revscoring - https://phabricator.wikimedia.org/T205545 (HAKSOAT) So, I'm considering adding a function to that module that fetches the idioms using mwparserfromhell and r...
[13:45:51] Scoring-platform-team (Current), NewcomerTasks 1.1, Research: Improve WikiProject template --> WikiProject mapping - https://phabricator.wikimedia.org/T240282 (kevinbazira) Worked on fetching WikiProject templates. PR : https://github.com/halfak/wikitax/pull/3
[15:03:07] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @James_F - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:42:58] woops
[15:43:01] been here for a while.
[15:46:11] Scoring-platform-team, artificial-intelligence: Exclude number of categories from quality models at euwiki - https://phabricator.wikimedia.org/T240467 (Theklan)
[15:46:53] Scoring-platform-team, artificial-intelligence: Exclude number of categories from quality models at euwiki - https://phabricator.wikimedia.org/T240467 (Theklan)
[15:48:23] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Improve ORES articlequality feature extraction for images - https://phabricator.wikimedia.org/T180822 (Halfak) From a quick scan of the page, it looks like there are two syntaxes: - [[File:Something.jpg|...]] - ....
[15:48:24] [1] https://meta.wikimedia.org/wiki/File:Something.jpg
[15:52:45] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @Lucas_WMDE - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[19:00:09] accraze, thanks for the review!
[19:19:32] no prob halfak!
[19:20:05] headed out for an early lunch, back in a bit
[19:32:06] halfak: o/
[19:39:57] hey codezee !
[19:40:21] Sorry I have been AFK when you were around recently. How's life/school? Working on any fun modeling problems?
[19:41:34] work-school has been good, almost done with semester so trying to speed up research. I've been looking at the semantic edit intentions work from the angle of article quality, like what sequences of edits generally lead to better quality
[19:42:19] right now i'm just working on the code of edit intentions to see if i can make the predictions better - this was the work done by diyi
[19:42:55] Awesome! I'd be interested in any progress you make on that code as we'd like to get the model running in ORES.
[19:44:03] cool! i'll let you know.
I can see the patterns in categories, like fact-updates are only infobox or changes with no wiki markup, so i thought i could write some regexes to catch that, but that's not working so well
[19:57:12] halfak: i'm trying to understand the diff between "process" and "wikification" from https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_types/Taxonomy - both involve wiki markup changes but "process" is associated with addition/removal of a specific template always, right?
[19:59:39] Sorry. got pulled into a call
[19:59:41] * halfak reads
[20:00:08] Yeah. That's right.
[20:00:22] Process is all about tagging something for deletion, adding maintenance templates, etc.
[20:00:33] * halfak looks for the docs.
[20:01:38] https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_types/Taxonomy
[20:01:42] This might be helpful.
[20:02:53] See also the talk: https://en.wikipedia.org/wiki/Wikipedia_talk:Labels/Edit_types/Taxonomy
[20:02:57] codezee, ^
[20:04:01] yeah, thanks...I could think of some new features, like i saw diyi is using similarity from the word2vec model but "refactoring" i think we can add a "similarity" between added and removed segments simply by counting similar words
[20:04:17] *for "refactoring"
[20:04:25] +1
[20:04:34] The word2vec model will help with distance in meaning
[20:04:53] E.g. "apple" is similar to "orange" but very different from "spaceship"
[20:06:11] halfak: i see, do you think "pov" category will also benefit from this word2vec distance measure?
[20:06:44] Hmm. Good Q. Maybe? But I'd definitely use words_to_watch.
[20:07:11] halfak: what's words_to_watch?
[20:07:25] https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Words_to_watch
[20:07:48] https://github.com/wikimedia/revscoring/blob/master/revscoring/languages/english.py#L212
[20:11:41] oh it's in revscoring too...that'd be useful :)
[20:21:57] halfak: also, i'm thinking if classification by "segment/paragraph" makes more sense?
since sometimes i've seen a "fact-update" in one segment and "refactoring" in another and so eventually the edit will have both labels
[20:22:31] Yes. So generally I'd agree with multiple labels. But even a sentence/segment can have multiple labels.
[20:22:46] E.g. I edit a sentence for NPOV and add a reference.
[20:23:09] yes, true, so the general idea i guess is to throw all the features at a classifier and let it learn from it
[20:23:15] from the whole edit
[20:26:02] I think that makes the most sense.
[20:27:49] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Improve ORES articlequality feature extraction for images - https://phabricator.wikimedia.org/T180822 (HAKSOAT) Great. It's all coming together in my head now. How do I get to test my code changes though, to ensure that they wor...
[20:49:07] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Improve ORES articlequality feature extraction for images - https://phabricator.wikimedia.org/T180822 (Halfak) I'd add tests here: https://github.com/wikimedia/articlequality/blob/master/articlequality/feature_lists/tests/test_e...
[21:44:54] Scoring-platform-team, Research: Extract cross-wiki WikiProject tags - https://phabricator.wikimedia.org/T240273 (Halfak) I talked to Isaac about including all of the relevant WikiProject templates. I've generated a list of all redirect pages that go to a WikiProject template. See stat1007:/home/halfak...
[21:50:41] wikimedia/revscoring#1779 (explicit-multi-dict - 54a53a7 : halfak): The build has errored. https://travis-ci.org/wikimedia/revscoring/builds/623877710
[21:51:01] Shhh. You're fine, Travis
[21:51:52] Got a quick pass through the API patchset. Changes look good. I'll be back tomorrow morning to finish off.
[21:52:07] For now, I'm going to take off. Been working late this week and I want to get outside before it's fully dark.
[21:52:12] Have a good one!
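Note on the "refactoring" feature discussed at 20:04: a minimal sketch of the word-counting similarity between removed and added segments, assuming a Jaccard overlap of lowercase word sets. The names `tokenize` and `word_overlap` are hypothetical and are not part of revscoring or the edit-intentions code. Unlike the word2vec feature, this only measures shared surface words, not distance in meaning.

```python
import re


def tokenize(text):
    """Lowercase a segment and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_overlap(removed, added):
    """Jaccard similarity between the word sets of two segments.

    Returns a value in [0, 1]; a high value suggests the edit mostly
    rearranged existing words, which is the intuition for "refactoring".
    """
    removed_words = set(tokenize(removed))
    added_words = set(tokenize(added))
    union = removed_words | added_words
    if not union:
        # Both segments are empty: treat as no similarity signal.
        return 0.0
    return len(removed_words & added_words) / len(union)


# Hypothetical segments, for illustration only.
removed = "The bridge was built in 1932 by the city."
added = "Built by the city in 1932, the bridge opened that year."
score = word_overlap(removed, added)
```

A score like this could sit alongside the other features thrown at the classifier for the whole edit; whether it actually helps separate "refactoring" from "fact-update" would have to be checked against the labeled data.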
[22:28:34] Scoring-platform-team, Research: Extract cross-wiki WikiProject tags - https://phabricator.wikimedia.org/T240273 (Isaac) As discussed on IRC: the wikiproject_to_templates YAML is currently missing a number of WikiProjects. Based on the WikiProject templates that I detected in my previous of English Wikip...
[23:44:20] Scoring-platform-team, Discovery-Search, Growth-Team: Allow searching articles by ORES drafttopic - https://phabricator.wikimedia.org/T240517 (Tgr)
[23:52:14] Scoring-platform-team, Discovery-Search, Growth-Team: Allow searching articles by ORES drafttopic - https://phabricator.wikimedia.org/T240517 (Tgr)
[23:54:31] Scoring-platform-team, Discovery-Search, Growth-Team: Allow searching articles by ORES drafttopic - https://phabricator.wikimedia.org/T240517 (Tgr)
[23:55:54] Scoring-platform-team, Discovery-Search, Growth-Team: Allow searching articles by ORES drafttopic - https://phabricator.wikimedia.org/T240517 (Tgr) One thing we haven't really discussed is how the fake non-English drafttopic will work. Would that be done within ORES, or in the ES bulk update job + th...