[10:27:30] Hi there!
[10:28:13] I'm looking for a tutorial/workflow on API access to Wikipedia.
[10:28:52] Specifically I'm looking for a way to access the metadata of Spoken Wikipedia ogg files in bulk.
[10:29:09] anyone able to help me with this?
[10:30:40] Unfortunately I don't have much experience scraping data from the web, but I am an avid user of R and have a fair understanding of a few programming languages (e.g. Python).
[10:59:55] puslet99: see https://www.mediawiki.org/wiki/API:Main_page
[11:44:28] thanks, Nemo_bis, do the dumps such as http://dumps.wikimedia.org/enwiki/20150304/ also contain the metadata of files uploaded to Wikipedia (such as .ogg Spoken Wikipedia files), maybe you know?
[11:46:00] I'll look into it then sometime
[11:57:44] puslet99: there is no reason to believe the files would be in Wikipedia, and using the dumps is probably overkill
[11:57:55] I suggest using the web API, for which I linked the starting instructions
[12:06:44] yes, not the files themselves, but the metadata. I'm talking about files like https://en.wikipedia.org/wiki/File:Caesium.ogg, which seem to be linked in Wikipedia, although I now see that they are all actually stored on Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Caesium.ogg
[12:07:56] the API is probably the way to go (but then on Wikipedia or Wikimedia Commons?), and the instructions will probably help. Since I really just want to search for something, I am also looking for the quickest way from A to B.
[12:09:18] that is, the API will probably provide sufficient help, but if there is a working example of a script that searches through this kind of metadata, or converts it into tables, I will be most happy to use that. The API is great to have, but it also takes some skill.
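(A minimal sketch of the kind of script asked for above, using the MediaWiki web API on Commons, where the files turn out to live. The category name "Category:Spoken Wikipedia" and the chosen iiprop fields are assumptions; check the actual category on commons.wikimedia.org before relying on this.)

```python
# Sketch: list the files in a Commons category and pull per-file metadata
# via api.php. Assumes the category "Category:Spoken Wikipedia" exists on
# Commons -- verify that name first; everything else is standard API usage.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def build_query(category, limit=50):
    """Build api.php parameters: generate pages from the category's file
    members, then attach imageinfo (size, mime, url, uploader) to each."""
    return {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmtype": "file",
        "gcmlimit": limit,
        "prop": "imageinfo",
        "iiprop": "size|mime|url|timestamp|user",
    }

def parse_file_rows(response):
    """Flatten an api.php JSON response into one dict per file,
    suitable for loading into a data frame later."""
    rows = []
    for page in response.get("query", {}).get("pages", {}).values():
        info = page.get("imageinfo", [{}])[0]
        rows.append({
            "title": page.get("title"),
            "mime": info.get("mime"),
            "size": info.get("size"),
            "user": info.get("user"),
            "url": info.get("url"),
        })
    return rows

def fetch_category_files(category):
    """Perform one API request and return the parsed rows."""
    url = COMMONS_API + "?" + urlencode(build_query(category))
    with urlopen(url) as resp:
        return parse_file_rows(json.load(resp))
```

The query building and response parsing are kept separate from the network call so each piece can be inspected (or swapped for an R equivalent) independently.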
[12:10:59] Not to say the help file isn't exhaustive, I expect it is, but I imagine there have been others like me, who don't know APIs well but would quickly be able to change a few parameters in a working script.
[12:11:27] agreed that there's no need to add the files themselves to the dump
[12:18:06] so I can see that requests like http://en.wikipedia.org/w/api.php?action=query&titles=San_Francisco&prop=images&imlimit=20&format=jsonfm give the information asked for, but I can't really figure out how you would change the request to get something specific like "metadata of files linked from Spoken Wikipedia articles"
[12:18:18] and whether I should do this via Wikimedia or Wikipedia
[12:18:57] a collection of queries or scripts for looking up certain types of info might help (and I'm sure they exist too, but I haven't found them yet)
[20:39:56] * halfak waits for his R packages to compile
[20:40:22] ggplot2 is pretty big
[23:05:33] hey-mo :)
[23:11:41] fucking Sunday 4AM security issues
[23:11:45] * yuvipanda curses world, goes to sleep
[23:12:55] what's up?
[23:13:22] Ironholds: nm, just terriblenessssss. it's under control now tho
[23:13:29] *hugs*
[23:14:54] ty much appreciated :)
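(For the bulk-access question raised in the 12:18 messages: api.php caps each response, so getting "all the files" means following the API's "continue" mechanism, where each response hands back parameters to feed into the next request. A sketch under that assumption, plus a helper that writes the collected rows out as a CSV table, which is easy to read into R. The endpoint and parameters are illustrative; adapt them to the actual query.)

```python
# Sketch: page through a large api.php result set using the API's
# "continue" continuation mechanism, then export rows as a CSV table.
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://commons.wikimedia.org/w/api.php"

def paged_query(base_params, fetch=None):
    """Yield successive JSON responses, feeding the 'continue' parameters
    each response returns back into the next request, until the API stops
    sending them. `fetch` is injectable so the loop can be tested offline."""
    if fetch is None:
        def fetch(params):
            with urlopen(API + "?" + urlencode(params)) as r:
                return json.load(r)
    cont = {}
    while True:
        resp = fetch({**base_params, **cont})
        yield resp
        cont = resp.get("continue")
        if not cont:
            break

def rows_to_csv(rows, path):
    """Write a list of per-file dicts as a CSV table (columns sorted by
    key), ready for read.csv() in R."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Passing the whole `continue` dict back verbatim (rather than picking individual keys) is what the API documentation recommends, since the key names vary by query type.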