[17:42:17] Hi research-team - Quick ping to let you know page_id <-> wikidata entities parquet file have been updated - New folder is /user/joal/wmf/data/wmf/wikidata/item_page_link/20190603
[17:46:06] joal while I see you, sorry about this run's stubs still having the empty userid in them; the next run will be back to the old behavior, since the fix has been deployed now to all wikis
[17:46:13] *this dump run's
[17:51:03] apergos: Thanks for the fix !! I'm used to slow moving heavy stuff, so yeah, it takes some time :) But it'll get fix :)
[17:51:32] :-)
[17:57:24] joal: that's awesome, thanks!!
[17:57:43] any sense of how much changed over the last few months? (not important, just curious)
[17:58:10] isaacj: Can't say about change, but bz2 json-dump grew 5G from last-month
[17:58:23] This how much I know about it :)
[17:58:42] haha, well that's certainly a change!
[17:58:51] as usual isaacj, please report if you think it's wrong in any mean :)
[17:59:19] thanks -- i'll test it out in the next week to make sure my code still functions as expected
[17:59:58] if we're looking for numbers of new entitites, I could do a couple slow uncompress | grep and get those for you
[18:01:44] apergos: thanks for the offer and up to you. coverage was already quite good (almost every article had a matching wikidata ID) but i was mainly curious if any of the already existing wikidata <--> pageID mappings had changed
[18:02:28] articles in which wiki?
[18:02:42] (that's probably not something I can find out by a grep, rats :-) )
[18:04:10] apergos: if you're interested I can show you Spark one of those days, and how easy it helps on doing this type of things :)
[18:04:49] sure; I've had a little exposure but at a pretty basic level only
[18:05:47] all the wikis :) that's why this particular mapping is so great because otherwise using APIs or DB joins is nearly impossible
[18:09:32] heh, guess I'm not going to be able to pull those numbers out of a hat!
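(Editor's aside: the question raised above, whether existing wikidata <--> pageID mappings changed between two snapshots, amounts to classifying each link as identical, new, or changed. The following is a minimal sketch of that comparison in plain Python, not the code anyone in the channel actually ran. The key layout `(wiki_db, page_id)`, the function name `compare_snapshots`, and the sample data are all hypothetical; at the real scale this would more plausibly be a Spark join over the parquet files.)

```python
from collections import Counter


def compare_snapshots(old, new):
    """Classify every link in `new` against `old`.

    Both arguments are dicts mapping a (wiki_db, page_id) pair to a
    Wikidata item ID. This layout is an assumption for illustration.
    """
    counts = Counter()
    for key, item_id in new.items():
        if key not in old:
            counts["new"] += 1          # page had no link in the old snapshot
        elif old[key] == item_id:
            counts["identical"] += 1    # same page, same item
        else:
            counts["changed"] += 1      # same page, different item
    # Links that existed before but are gone from the new snapshot.
    counts["removed"] = sum(1 for key in old if key not in new)
    return counts


# Made-up sample data, only to exercise the function.
old = {("enwiki", 1): "Q1", ("enwiki", 2): "Q2",
       ("frwiki", 1): "Q1", ("dewiki", 5): "Q9"}
new = {("enwiki", 1): "Q1", ("enwiki", 2): "Q7",
       ("frwiki", 1): "Q1", ("eswiki", 3): "Q3"}

counts = compare_snapshots(old, new)
for label in ("identical", "new", "changed"):
    # Shares are taken over the links present in the new snapshot.
    print(f"{label}: {counts[label] / len(new):.1%}")
```

(Keying on the wiki as well as the page ID matters because page IDs are only unique within a single wiki.)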
[18:28:33] isaacj: you got me nerd-sniped -- https://gist.github.com/jobar/828f56069d8811f0f50014b07810b2d2
[18:31:23] TL;DR: 97.9% identical, 1.9% new links, less than 0.2% change
[18:39:33] ahhh oh wow that's awesome joal !
[18:45:11] isaacj: I saw your email about mtizzoni's directory. Per T215775#4943045 I would expect the user to have a directory and a set of files under stat1007.
[18:45:11] T215775: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775
[18:45:51] isaacj: can you double check please? If not, the question of which data to release may have a very easy answer. ;)
[18:47:32] leila: yeah, i don't see mtizzoni under /home/ on either stat1007 or notebook1004 (the two i checked)
[18:47:53] isaacj: can you leave a comment on the phab task and ask elukey what's going on?
[18:48:00] leila: sure thing
[18:48:04] isaacj: thanks!
[19:36:25] joal: took a quick look using your code to see what explained the shifts. nothing too surprising -- mainly minor mistakes (titles that have multiple meanings and someone associated a page w/ the wrong wikidata item of the same name) and duplicate wikidata items getting merged. thanks again!~
[20:12:02] Any update on Qualtrics? https://www.mediawiki.org/wiki/Talk:Talk_pages_consultation_2019/Individual_feedback
[20:27:51] RF1dle: we're not involved in this at the moment so can't be of any help but hopefully your comment is addressed on the talk page
[20:29:33] Okay isaacj, anyone you know active that might have an idea
[20:45:41] RF1dle: unfortunately not -- looks like you got the right people in the talk page comment