[12:50:41] how to find datasets on Wikidata? I looked up gambling and got links to journal articles, not datasets. How do I find datasets in CSV format?
[13:16:46] there is a JSON dump
[13:28:52] vyadhaka: Wikidata isn't organised in datasets but in individual items
[13:29:44] so if I want a dataset for a class project, Wikidata is not the right place?
[13:29:48] You can export any of these items, write a query to export the ones that meet certain criteria, or download an entire dump
[13:30:15] is there an example of how to do that?
[13:30:50] Sure, you can export item Q5 in, let's say, JSON, with https://www.wikidata.org/wiki/Special:EntityData/Q5.json
[13:31:27] You can use .rdf, .php, etc., but .csv isn't available
[13:32:09] I don't know those formats, will probably look elsewhere
[13:32:13] thanks
[13:33:30] You can also download the result of a SPARQL query in CSV format
[13:33:41] Could that be useful?
[13:35:04] For example... http://tinyurl.com/mkas55l
[13:35:15] These are some recent events
[13:36:06] You can download this result and get something like...
[13:36:06] event,eventLabel,date
[13:36:06] http://www.wikidata.org/entity/Q1233983,Documenta 14,2017-04-08T00:00:00Z
[13:36:06] http://www.wikidata.org/entity/Q15206389,"Somaliland parliamentary election, 2017",2017-03-27T00:00:00Z
[13:36:09] http://www.wikidata.org/entity/Q16061881,"Dutch general election, 2017",2017-03-15T00:00:00Z
[13:36:12] ...
[13:43:41] is there a property "xsd" or "schema" that would be used to point to a specific .xsd that defines an object (Item)?
[15:14:27] abian: had the window minimised. I still don't understand how to use Wikidata; I'm used to data in a file that can be read into R, Python, etc. to do some analysis. I guess I need to read up on how to convert Wikidata into the format I'm used to
[15:37:03] sjoerddebruin: What are your thoughts about creating a Dutch politics wikiproject?
[15:37:52] vyadhaka: What is your goal?
[15:42:12] multichill: I need a decent-sized dataset for a class project; we are supposed to analyse data using Python and SQL
[15:45:05] vyadhaka: use the JSON dump
[15:45:29] vyadhaka: what you need to do depends on the dataset, no?
[15:46:57] not really, we can pick any, but I would like to have a look at it beforehand and understand the scope. I have to keep using it until the end of the course as it evolves; in the end we have to compare it with another similar dataset (probably from a different time period) and offer some conclusions
[15:47:55] multichill: sounds lovely
[15:49:45] vyadhaka: yeah, use the JSON dump https://www.wikidata.org/wiki/Wikidata:Database_download#JSON_dumps_.28recommended.29
[15:50:11] vyadhaka: what I do is load the JSON dump into WiredTiger using Python 2, then clean it up and generate some stats from it
[15:51:03] ok, is there a how-to somewhere on the web to understand the steps?
[15:51:33] first question: what dataset am I looking for? how do I choose?
[15:51:57] then I can work on getting the JSON data dump
[15:55:17] vyadhaka: For your first data class, the JSON dump might be a bit too large?
[15:55:54] sjoerddebruin: Maybe move the stuff in your userspace to some place more central to kickstart it?
[15:55:59] multichill: need around 1000 entries with 20-40 variables
[15:56:21] The JSON dump is much larger than that!
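A minimal Python sketch of the per-item export described at 13:30:50 above. The Special:EntityData URL is the one given in the chat; the label lookup assumes the item has an English label (Q5 is "human").

```python
import requests

# Fetch a single item (Q5) as JSON via the Special:EntityData endpoint.
url = "https://www.wikidata.org/wiki/Special:EntityData/Q5.json"
data = requests.get(url, timeout=30).json()

# The payload nests the item under "entities" -> its Q-id.
item = data["entities"]["Q5"]
print(item["labels"]["en"]["value"])               # "human"
print(len(item["claims"]), "properties with statements")
```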
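And a sketch of the SPARQL-to-CSV route from 13:33:30. The https://query.wikidata.org/sparql endpoint and its support for Accept: text/csv are real Wikidata Query Service features, but the query below is only a stand-in guess at a "recent events" query like the tinyurl example above (which isn't expanded here), so treat the P585 pattern and the date cutoff as assumptions.

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical stand-in for the "recent events" query linked in the chat:
# items with a "point in time" (P585) on or after 2017-03-01.
QUERY = """
SELECT ?event ?eventLabel ?date WHERE {
  ?event wdt:P585 ?date .
  FILTER(?date >= "2017-03-01T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "text/csv"},  # ask the service for CSV output
    timeout=60,
)
resp.raise_for_status()

with open("events.csv", "w", encoding="utf-8") as f:
    f.write(resp.text)
```

The resulting file carries the event,eventLabel,date header shown at 13:36:06 and loads directly into the R/Python workflow vyadhaka describes, e.g. with pandas.read_csv("events.csv").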
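For the JSON dump route recommended at 15:45:05 and 15:49:45, a minimal Python 3 streaming sketch (not multichill's Python 2 + WiredTiger pipeline). It assumes the latest-all.json.gz dump from the download page linked above; that file is a single JSON array with one entity per line, so it can be scanned without holding everything in memory.

```python
import gzip
import json

# Stream the full Wikidata JSON dump entity by entity.
with gzip.open("latest-all.json.gz", "rt", encoding="utf-8") as f:
    for line in f:
        line = line.strip().rstrip(",")   # entities are comma-separated
        if line in ("[", "]", ""):        # skip the array brackets
            continue
        entity = json.loads(line)
        # Example: print each entity's id and how many properties it uses.
        print(entity["id"], len(entity.get("claims", {})))
```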
[15:57:03] ah ok, I will probably stick to CSV for now, I don't have the time to learn something new and experiment
[15:57:35] vyadhaka: https://github.com/metmuseum/openaccess is a fun one
[15:58:25] I'm parsing it and making XML sets out of it for https://commons.wikimedia.org/wiki/Special:ListFiles/Pharos
[15:58:43] Looks like the sculpture set is going online now :-0
[16:19:51] multichill: it is a bit big for my purpose, it has close to half a million entries
[16:20:53] need something with a few thousand entries, and it should also be comparable to data from another region or time frame
[16:21:11] that's why I was looking at crime data
[18:47:58] not if they do not meet the admissibility criteria for something else
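Since the Met's open-access CSV is "a bit big" at close to half a million entries, a hedged sketch of cutting it down to the few-thousand-row scale the project needs. MetObjects.csv is the data file in the metmuseum/openaccess repo linked above; the sample size and random seed are arbitrary choices, not anything from the chat.

```python
import pandas as pd

# MetObjects.csv is the CSV shipped in github.com/metmuseum/openaccess.
df = pd.read_csv("MetObjects.csv", low_memory=False)
print(df.shape)  # close to half a million rows, per the chat above

# Take a reproducible random subset at class-project scale.
sample = df.sample(n=3000, random_state=0)
sample.to_csv("met_sample.csv", index=False)
```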