[00:41:53] Scoring-platform-team (Current): Secondary 'http' issue in Jade PromoteDialog - https://phabricator.wikimedia.org/T255084 (ACraze)
[00:42:18] Scoring-platform-team (Current): Secondary 'http' issue in Jade PromoteDialog - https://phabricator.wikimedia.org/T255084 (ACraze) a:ACraze
[00:42:42] Jade, Scoring-platform-team (Current): Secondary 'http' issue in Jade PromoteDialog - https://phabricator.wikimedia.org/T255084 (ACraze)
[09:17:23] ORES, Scoring-platform-team, Operations: Move ORES to redis misc cluster - https://phabricator.wikimedia.org/T254226 (akosiaris) Stalled→Resolved a:akosiaris Everything is fine after a week, resolving this.
[13:12:32] o/
[13:22:20] ORES, Scoring-platform-team, Operations: Move ORES to redis misc cluster - https://phabricator.wikimedia.org/T254226 (Halfak) Thank you, @akosiaris!
[14:40:35] o/ haksoat
[14:40:56] Hello
[15:09:41] Woops. I had a question but I looked away and forgot.
[15:10:27] haksoat, how's progress on reviewing that massive editquality PR?
[15:10:51] https://github.com/wikimedia/editquality/pull/223
[15:12:49] Going fine. Yet to do stuff today though, should start in about an hour.
[15:12:58] Gotcha. Sounds good.
[18:27:53] Jade, Scoring-platform-team (Current), Patch-For-Review: Secondary 'http' issue in Jade PromoteDialog - https://phabricator.wikimedia.org/T255084 (ACraze) @kevinbazira it looks like the EntitySummarizer was not reindexing the array being created during the LinkSummary hook, so there was a weird 'off...
[19:05:47] stepping out for a bit, I need a break after watching the training videos about budgets
[19:16:02] I can imagine. They are very exciting. need to deflate, I'm sure ;)
[19:20:16] halfak: need any additional review from me on https://github.com/mediawiki-utilities/python-mwtext/pull/5 ?
[19:20:48] isaacj, was just about to pull it down and try to run it to ensure that it works as designed.
[19:20:55] That's my last step in the review.
[19:21:56] :thumbs up: thanks -- I took a quick glance just now and all seemed fine to me but i didn't do any explicit testing
[19:23:50] isaacj, what's the thinking on randomizing the properties? Should we sort them instead?
[19:24:41] yeah, i wrote some stuff in slack about that but i'll copy it here:
[19:25:16] https://www.irccloud.com/pastebin/eMCKAtrl/
[19:25:49] Isn't the model learning from order a good thing?
[19:26:00] https://www.irccloud.com/pastebin/WNEYFexw/
[19:26:05] E.g. in some cases, P21 appears near P32 and that has meaning.
[19:26:26] https://www.irccloud.com/pastebin/p0svhIiQ/
[19:26:55] Hmm. mwbase preserves order on purpose. I guess that doesn't matter.
[19:27:27] let me check if the XML dumps preserve the actual Wikidata order (as opposed to JSON/API which have a kinda unpredictable order to them)
[19:31:50] yeah, xml dumps don't seem to obey Wikidata's order (https://www.wikidata.org/wiki/MediaWiki:Wikibase-SortedProperties) which is the order that i'd want the model to learn as it's the only one that I understand to be stable
[19:32:08] Well dang it.
[19:32:39] What is this page communicating?
[19:32:50] Is this the desired order? Everything else comes at the end?
[19:33:18] yep -- it's what is used for sorting the pages when you open them in a browser and there actually is some grouping / meaning to it as well in my opinion
[19:33:33] So we could process this and use it for ordering.
[19:33:45] E.g. we just see what index a Pid has on this page and sort by that?
[19:34:03] more details: https://www.wikidata.org/wiki/Help:Statements#Order_of_statements
[19:34:03] Eh. I'll merge what we have. We may care to try this later.
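Editor's sketch of the idea raised at 19:33:45 (look up each Pid's index on the sorted-properties page and sort by it, with everything unlisted at the end). This is not code from mwtext or mwbase. It assumes the page text has already been fetched and that property IDs can be pulled out of it with a simple regex; the sample page text and the names parse_property_order and sort_properties are hypothetical.

```python
import re


def parse_property_order(page_text):
    """Map each property ID to the index of its first appearance in the text.

    Assumption: the page lists property IDs such as "P31" in the desired
    order; any surrounding markup is ignored by the regex.
    """
    order = {}
    for pid in re.findall(r"\bP\d+\b", page_text):
        order.setdefault(pid, len(order))
    return order


def sort_properties(pids, order):
    """Sort PIDs by their index on the page.

    PIDs that are not listed all get the same key, so they land at the end
    and keep their original relative order (Python's sort is stable).
    """
    unlisted = len(order)
    return sorted(pids, key=lambda pid: order.get(pid, unlisted))


# Made-up page text for illustration only, not the real page content.
sample_page = """
* P31
* P279
* P21
"""

order = parse_property_order(sample_page)
sorted_pids = sort_properties(["P999", "P21", "P31", "P888"], order)
print(sorted_pids)
```

Relying on a stable sort means no separate "unlisted" pass is needed; whether the real page can be parsed this simply would need checking.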
[19:34:32] but yeah, in a perfect world, i think we import the wikidata order and apply it to items
[19:54:56] Scoring-platform-team, Research: Write Python util for converting Wikidata claims to features for ML models - https://phabricator.wikimedia.org/T252775 (Isaac) I wanted to preserve this info somewhere. We have discussed whether or not the Wikidata statements should be ordered by mwtext (see [[https://paw...
[20:04:33] isaacj, I just realized we need to extend this script to handle adding labels with the "words"
[20:04:44] so that we can train our embedding in a supervised way.
[20:05:20] I have a refactor in progress that will allow us to do this as part of a second step.
[20:05:42] Or we could just ignore that for now and just train the embeddings without labels.
[20:06:22] halfak: that happens in mwtext? i assumed it was part of the drafttopic repo
[20:07:34] oh wait, i see it now in preprocess_text.py
[20:08:20] In the refactor, I move it to a separate step. The refactor is close to ready. I've been blowing smoke through it over the last couple of days.
[20:09:00] ahh -- up to you. hopefully this is a trivial addition because the figshare item already has QIDs? https://figshare.com/articles/Wikipedia_Articles_and_Associated_WikiProject_Templates/10248344/4
[20:10:02] Huh?
[20:10:10] Didn't it always have Qids?
[20:10:42] Oh I see. Yes :) Should be.
[20:10:57] Regretfully, I think we'll need to get clever to figure out how to generalize.
[20:11:16] Maybe we can give "wikidata" as a special 'title_lang'
[20:11:29] As opposed to en, cs, vi, etc.
[20:13:14] yeah, i see what you mean. can the logic just be set that with no language specified, it doesn't do any filtering? i'd rather not add wikidata to the sitelinks structure in case people use that for counting how many languages an article appears in
[20:14:07] We'd need to somehow specify where in the JSON blob to look for the title to match against
[20:14:19] This isn't really filtering. It's matching titles with labels.
[20:17:06] Holey moley. Wikidata has 74 splits on the most recent pages-articles XML dumps.
[20:17:09] That's a ton.
[20:17:44] yeah, it's a big wiki :/ i really wish there was a dump that was just items with wikipedia sitelinks
[20:20:09] the title dictionary does seem to be used to filter later on too. so then i could see adding a line that adds 'wikidata': to each item's sitelinks dictionary as mwtext loads the labels JSON. then if you pass 'wikidata' as the language, it should pull the QID as its title. alternatively, i could add that into the logic that generates the figshare items, though again i'm not in love with that logic as it's only really for mwtext
[20:20:55] Yeah. I think we can just have a bit of logic to grab the Qid to match against title.
[20:23:47] :thumbs up:
[20:54:57] mediawiki-utilities/python-mwtext#83 (dibyaaaaax-preprocess-wikidata - 69402ac : Aaron Halfaker): The build failed. https://travis-ci.org/mediawiki-utilities/python-mwtext/builds/697387195
[21:09:56] mediawiki-utilities/python-mwtext#89 (galtay-galtay_mwpfh_pt1 - fac3207 : Gabriel): The build was fixed. https://travis-ci.org/mediawiki-utilities/python-mwtext/builds/697393741
[21:12:03] mediawiki-utilities/python-mwtext#91 (galtay-galtay_mwpfh_pt1 - 0633dc2 : Aaron Halfaker): The build was broken. https://travis-ci.org/mediawiki-utilities/python-mwtext/builds/697394274
[21:23:41] I'm off folks. See y'all on Wednesday!
[21:25:26] Have a great wedding!
[21:46:36] mediawiki-utilities/python-mwtext#95 (galtay-galtay_mwpfh_pt1 - b603184 : Gabriel): The build was fixed. https://travis-ci.org/mediawiki-utilities/python-mwtext/builds/697403669
[23:57:23] Jade, Scoring-platform-team (Current), Documentation: Docker install docs for Jade - https://phabricator.wikimedia.org/T255219 (ACraze)
[23:58:06] Jade, Scoring-platform-team (Current), Documentation: Docker install docs for Jade - https://phabricator.wikimedia.org/T255219 (ACraze)
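
Editor's sketch of the "bit of logic to grab the Qid to match against title" mentioned at 20:20:55, treating "wikidata" as a special title_lang (20:11:16) without adding it to the sitelinks structure (the concern at 20:13:14). This is not the mwtext implementation. The record layout, the field names "qid" and "sitelinks", and the function title_for are hypothetical; the actual figshare JSON schema may differ.

```python
def title_for(item, title_lang):
    """Return the title to match labels against for the given language.

    Assumption: each item is a dict with a "qid" string and a "sitelinks"
    dict mapping language codes to page titles. "wikidata" is handled as a
    special case so the sitelinks dict itself is never modified.
    """
    if title_lang == "wikidata":
        return item["qid"]
    # Returns None when the item has no page in the requested language.
    return item["sitelinks"].get(title_lang)


# Made-up record for illustration only.
item = {
    "qid": "Q42",
    "sitelinks": {"en": "Douglas Adams", "cs": "Douglas Adams"},
}

print(title_for(item, "wikidata"))
print(title_for(item, "en"))
print(len(item["sitelinks"]))
```

Because the special case lives in the lookup and not in the data, counting an item's sitelinks still gives the number of language editions.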