[01:47:31] ugh... exactly how big is the extracted JSON dump? at 45 GB now and still growing... running out of disk space >_>
[01:55:30] 55G apparently
[03:00:01] JeroenDeDauw: thanks for unintentionally/pre-emptively answering a question I also had :D
[05:33:43] SMalyshev: why the no vote on constant visibility and void return types?
[13:00:55] https://www.wikidata.org/wiki/Wikidata:Wikimania_2016 (~14 days left for comments)
[14:53:37] hello - I'm looking to download data off Wikidata for my database. can someone explain how I do this?
[14:53:48] lseactuary: Hi
[14:53:54] Hi hoo
[14:54:20] We have JSON dumps containing all entities as a list, (beta) TTL dumps, and XML dumps which can be imported into a MediaWiki
[14:54:41] The first two can be found under https://dumps.wikimedia.org/wikidatawiki/entities/
[14:54:42] is it possible to see those entities
[14:54:56] the third one under https://dumps.wikimedia.org/wikidatawiki/
[14:55:11] lseactuary: How do you mean, see?
[14:56:28] hoo - basically I have a bunch of celebrities in a database. I want to assign 'tags' to them, e.g. David Beckham = footballer, UK, etc.; Justin Bieber = music, etc., so then I can group them. If their label changes it will change in the wiki and therefore update in the database. I'm therefore trying to see what labels I can get from Wikidata :)
[14:56:30] does it make sense?
[14:57:21] So, you want access to the statements, right?
[14:57:56] https://www.wikidata.org/wiki/Special:EntityData/Q42.json
[14:58:06] that's how one of the entities in the json dump looks
[14:58:33] As you see, they can also be accessed live
[14:59:04] hoo - kind of new to this - can you explain this
[15:00:38] You mean the data structure?
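The Special:EntityData response mentioned above wraps each entity under an `entities` key, keyed by item id, with per-language `labels` and `descriptions` maps. A minimal sketch of reading that shape — the `SAMPLE` document below is a hand-made abbreviation for illustration, not the full live Q42 response:

```python
import json

# Hypothetical, heavily trimmed sample mirroring the shape of
# https://www.wikidata.org/wiki/Special:EntityData/Q42.json
SAMPLE = json.loads("""
{
  "entities": {
    "Q42": {
      "id": "Q42",
      "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
      "descriptions": {"en": {"language": "en", "value": "English writer"}},
      "claims": {}
    }
  }
}
""")

def entity_url(qid):
    """Build the Special:EntityData URL for a given item id."""
    return "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % qid

def label(entity_doc, qid, lang="en"):
    """Pull one language's label out of an EntityData-style response."""
    entity = entity_doc["entities"][qid]
    return entity["labels"].get(lang, {}).get("value")

print(entity_url("Q42"))
print(label(SAMPLE, "Q42"))
```

The same `labels`/`claims` layout applies to every entity in the JSON dump, so a reader for one works for the other.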
[15:01:10] That is documented at https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/docs/json.wiki ;)
[15:03:22] https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON << that is nicer to read
[15:09:20] thanks hoo, reading now
[15:10:19] hoo - I see - but how do I get a data pull
[15:10:25] I don't know about JSON files :S
[15:11:25] We also have output in other formats, like TTL or N-Triples, if you know how to handle these
[15:11:52] not heard of them :(
[15:12:13] I have an engineer to help, but I would need to pull some data and see the labels we can get before asking him to build the database
[15:12:21] don't want to waste anyone's time
[15:12:35] hoo, out of interest, is there a way to link someone's Twitter handle to the wiki page?
[15:13:55] lseactuary: Yeah, that can be done via our SPARQL query endpoint at https://query.wikidata.org
[15:15:08] hoo - I can use SQL for this?
[15:15:31] Not quite... it's SPARQL, which is more advanced
[15:15:39] but it's a well documented standard
[15:15:45] acting on top of RDF
[15:16:31] I see
[15:17:07] The query for getting an item by its Twitter handle isn't hard to write
[15:17:25] hoo - but what I want to know is the fields I can actually pull, for example
[15:17:48] Everything you can see via the UI is part of the dumps
[15:17:50] what can I learn from Wikidata about @justinbieber
[15:18:21] https://www.wikidata.org/wiki/Q34086 << that's his item
[15:18:26] found via a SPARQL query
[15:19:03] holy moly
[15:19:12] this is amazing
[15:20:09] what query did you do
[15:20:24] is there a way to input loads of handles and pull all the data in like an Excel format or something
[15:20:44] this: http://pastebin.com/cXgpdkw7
[15:21:12] no, we don't have a CSV output or something like that
[15:21:25] oki
[15:21:26] wouldn't be possible to represent all our data using such a format
[15:21:31] makes sense
[15:22:24] so suppose I have a bunch of handles
[15:22:28] how do I export data
[15:22:32] so I can see the tags and stuff
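The pastebin query from the chat is not preserved here, but a lookup of that shape can be sketched. The sketch below assumes P2002 is the "Twitter username" property; the helper only builds a GET URL for the https://query.wikidata.org endpoint, it does not execute the query:

```python
from urllib.parse import urlencode

# Reconstruction of the idea, not the original pastebin query.
# Assumes P2002 ("Twitter username"); adjust the handle as needed.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P2002 "justinbieber" .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

def query_url(sparql):
    """Build a GET URL for the Wikidata Query Service SPARQL endpoint."""
    return "https://query.wikidata.org/sparql?" + urlencode(
        {"query": sparql, "format": "json"}
    )

print(query_url(QUERY))
```

Swapping the quoted handle (or feeding a list of handles through `VALUES`) is how "loads of handles" would be batched, though the chat is right that the full data model does not flatten cleanly into CSV.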
[15:22:41] or I would change the handle manually
[15:22:45] and then copy-paste the data?
[15:25:41] You can access the data by handle using either https://www.wikidata.org/wiki/Special:EntityData/Q42.json or an API call (like https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&ids=Q42&format=json)
[15:25:56] also all items are part of the dump
[15:26:27] nice!
[15:26:27] And how that format works is documented on the page I gave you earlier
[15:26:32] thanks! :)
[15:43:11] addshore: https://www.wikidata.org/w/index.php?title=Q404&action=history Lame ...
[16:45:39] sjoerddebruin: please see PM
[16:47:12] yup
[16:47:38] Hi, does it make sense to start a discussion page for individual Wikidata items, or will that probably not be seen by anyone (i.e. https://www.wikidata.org/wiki/Talk:Q190524)
[16:47:55] physikerwelt: Likely to not get noticed, I'm afraid
[16:48:08] They will be more visible in the new UI, I think.
[16:48:23] You have three things there: eigenvectors, eigenvalues, and then both combined
[16:50:34] hoo: Yes, I agree. My use case is that I extract quantity symbol (P416) and so would need the individual concepts ;-)
[16:51:21] yeah, it totally makes sense to split that one
[16:51:27] for exactly these reasons
[16:51:33] but someone would need to do it :S
[16:54:55] how would one do that?
[16:55:27] Create one item for each separate concept (if there aren't any yet)
[16:55:42] then try to figure out where the individual data belongs
[16:55:49] that's messy, I know
[16:56:21] so would I edit the wikitext or use an API?
[16:56:35] There's no wikitext ;)
[16:57:16] So, easiest is probably to just use the UI
[16:57:24] which uses the API internally
[16:57:29] Would recommend moving the links with the move gadget btw.
[16:57:37] ... but probably not the most scalable one ;-)
[16:58:08] True that, but we don't really have tooling for these cases
[16:58:15] Only the sitelink move gadget
[16:58:22] but that's only for sitelinks, obviously
[16:58:45] for French and German, the "also known as" entries already do the differentiation
[16:59:16] Feel free to remove these if you create more fine-grained items
[17:01:54] if I were to add those concepts automatically, how would I make sure not to insert duplicates?
[17:02:21] You mean create duplicate items in the process?
[17:02:34] yes
[17:02:48] That's not trivial, I guess
[17:03:10] well, you could do a full-text search and then let a human decide
[17:03:34] on top of that, it's very hard... imagine an item with no statements on it and only a Cyrillic label
[17:04:19] I would probably find it on ruwiki
[17:04:38] In that case you could use the site:title pair to find the item
[17:06:54] is there a Wikidata SPARQL tutorial? I think I'll start with read-only for now and thereafter start with the quantity symbols for items that exist
[17:13:42] found it: https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual
[17:32:15] the main page says we have 15 million items, but https://tools.wmflabs.org/wikidata-todo/stats.php and the JSON dump have 18.5... what's the 3.5 million difference? redirects?
[17:32:43] nikki: Well, there are different ways to count items
[17:32:54] the number on the main page includes only non-stub items
[17:33:05] thus items that have either a statement or at least one sitelink
[17:33:14] ah
[17:33:24] also there's some bug with increasing the count AFAIR, so that's probably also slightly off
[18:13:12] aude: i can haz updated entity usage table? :)
[22:18:06] is there a "Staff and contractors"-like page for WMDE? I have little idea who does what
[22:19:41] spagewmf: What information are you looking for?
[22:20:46] tobias47n9e__: e.g. what do Lucie and Jeroen work on, or hoo?
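The "site:title pair" trick above — resolving, say, a ruwiki article title to its item when only a Cyrillic label exists — goes through the same `wbgetentities` API module shown earlier, using `sites` and `titles` parameters instead of `ids`. A minimal URL-building sketch (the example title is illustrative, and the helper does not perform the request):

```python
from urllib.parse import urlencode

def item_by_sitelink(site, title):
    """Build a wbgetentities API URL that resolves a site:title pair
    (e.g. a ruwiki article title) to its Wikidata item."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "info",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

print(item_by_sitelink("enwiki", "Douglas Adams"))
```

For the deduplication problem in the chat, this gives a cheap existence check before creating a new item: if the lookup returns an entity rather than a missing marker, the concept already has an item.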
[22:23:56] spagewmf: I don't know of a page that lists that. Might be shown on Phabricator?
[22:24:35] like https://wikimediafoundation.org/wiki/Staff_and_contractors , essential to figuring out WMF
[22:26:09] Lydia_WMDE, ^
[22:26:43] spagewmf: Probably best to wait around; maybe Lydia can answer the question.
[22:26:50] https://wikimedia.de/wiki/Mitarbeitende
[22:27:15] Lydia_WMDE: < 1 min response time. You are the best!
[22:27:22] ;-)
[22:27:28] on a Sunday night!
[22:28:15] Lydia_WMDE: I will try that on a Friday morning at 04:00 and time it again ;)
[22:28:28] heh, nah, rather not
[22:29:57] tobias47n9e__, Lydia_WMDE: Google coughed up https://github.com/orgs/wmde/people and https://meta.wikimedia.org/wiki/Wikimedia_Deutschland#People , the latter leads to https://wikimedia.de/wiki/Staff <- gold!
[22:30:25] I saw a very good poster at Berlin...
[22:31:31] spagewmf: Do you think we should make a redirect from https://www.wikidata.org/wiki/Staff ? Not sure about the namespace though.
[22:33:53] imho no
[22:35:36] tobias47n9e__: https://www.wikidata.org/wiki/Wikidata:Staff redirects to Administrators. I guess a hatnote/see-also there saying "If you are looking for project staff, https://wikimedia.de/wiki/Staff lists the Wikimedia Deutschland staff, many of whom work on Wikidata". But that's long-winded and promotes staff over volunteers. Hmm
[22:41:53] spagewmf: Or put it somewhere in the help section.