[00:16:38] Is there a way for researchers to query the Wikibase Wikidata tables?
[00:36:56] ewulczyn: rephrase that with an example
[00:39:07] Ironholds: Say I want to query the set of items that have a statement involving a particular property. There is a tool on labs: https://wdq.wmflabs.org that exposes an API to do this but it is not reliable. Is there a db I can query directly
[00:39:23] ?
[00:41:08] ewulczyn: no, because Wikidata doesn't have "tables"
[00:41:23] all the properties, entries and everything else are structured JSON blobs embedded in the revision table
[00:41:43] this is precisely why the third-party dataservice is necessary - and it's not permanently up, sure, but it beats the status quo of "you want data? AHHAHAHAHA".
[00:42:02] your options are (1) use that or (2) download the entire database as a JSON object and handle it locally
[00:42:45] Ironholds: I was afraid that was going to be the answer.
[00:45:08] * Ironholds shrugs
[00:45:15] if mobile wants to deal with it this'll get built out
[00:45:22] if it doesn't get built they won't have to deal with it
[00:58:20] ewulczyn: Wikidata has dumps. I haven't dug into the dumps myself but that's another option
[00:59:30] and ewulczyn, there are some tables in the wikidatawiki database. I ended up spending some time trying to figure them out and eventually learned that they are not properly generated.
[00:59:39] API query or dumps
[01:00:11] I'm signing off for a few hours, folks. See you later.
[01:12:47] ewulczyn: maybe eventually, when they get a graph db up
[17:16:49] hi researchers
[17:23:11] hey harej :)
[17:24:16] Ironholds, my script produces datasets that are essentially lists of usernames. This is all stuff that comes from the API and is available to the public, even though I had to write 200 lines of Python to get the actual information I want. From an ethical perspective, what is the best practice regarding publishing these datasets?
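Editor's note on option (2) from [00:42:02], handling the JSON dump locally: the sketch below shows one way the "items that have a statement involving a particular property" question could be answered by streaming a dump. It assumes the dump is a JSON array with one entity per line and that each entity carries a "claims" object keyed by property ID; the function name and the sample entities are made up for illustration and are not from the conversation.

```python
import json
from typing import Iterable, Iterator


def items_with_property(lines: Iterable[str], property_id: str) -> Iterator[str]:
    """Yield IDs of entities that have at least one statement using property_id.

    Assumes one JSON entity per line, wrapped in "[" and "]" lines, with a
    trailing comma after every entity except the last.
    """
    for raw in lines:
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)
        if property_id in entity.get("claims", {}):
            yield entity["id"]


# Tiny made-up sample standing in for a real dump file.
sample = [
    "[",
    '{"id": "Q1", "type": "item", "claims": {"P31": [{}]}},',
    '{"id": "Q2", "type": "item", "claims": {"P279": [{}]}},',
    '{"id": "Q3", "type": "item", "claims": {"P31": [{}], "P18": [{}]}}',
    "]",
]

print(list(items_with_property(sample, "P31")))
```

Because the function only needs an iterable of lines, a compressed dump could be streamed with something like gzip.open(path, "rt", encoding="utf-8") instead of the sample list, without loading the whole file into memory.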
[17:24:37] throw it up on Figshare, give it a DOI and call it done?
[17:24:41] oh, and CC-0 it
[17:24:48] What about GitHub?
[17:24:58] I was more asking if there's anything problematic about publishing lists of usernames.
[17:25:06] GitHub works too! If you want people to cite it, Figshare. If you want it to be versioned, GitHub
[17:25:17] and either way, unless it's "list of users who are horrible nazis" you're probably fine
[17:26:39] And this is fine specifically because I did not collect this information from people, but because I got it all from a public API.
[17:27:22] Will each individual dataset need a DOI, or will the whole collection get the same DOI?
[17:30:25] harej, if you put it in a collection, the latter!
[20:23:07] damn
[20:23:09] no halfak!!!1
[20:24:43] I assume he's watching metrics or hacking
[20:24:47] hey DarTar
[20:24:52] (and hey YuviPanda ;p)
[20:25:04] :D
[20:25:06] I just found out about https://github.com/jupyter/jupyterhub
[20:25:10] AND AM SUPER EXCITED
[20:26:45] yay!
[20:26:46] I uh.
[20:26:53] I made a really depressing data visualisation
[20:26:58] oh uh
[20:27:11] so, I geolocated all the mobile and desktop editors and ranked them, right?
[20:27:20] that is, ranked the count of editors by country?
[20:28:03] and then you graph those with each value as the X or Y value respectively, draw a 45 degree line across it, and...well, if a country sits right on the 45 degree line, its prominence in the league table of "countries that send us mobile editors" == prominence in the league table of "countries that send us desktop editors", right?
[20:28:36] so ideally we want it to be all over the place. Or even more nicely, we want two distributions, one very desktop-heavy (traditional western nations) and one very mobile-heavy (the developing world) so we can see that mobile is reducing systemic bias.
[20:28:38] With me?
[20:29:07] http://ironholds.org/Rplot.png the HCI term for this is "bad news bears"
[20:29:45] (the big outlier is Cuba. Much better mobile than desktop representation, which makes sense)
[20:33:23] wow, that's a really tight fit
[20:33:40] so basically, mobile editing is benefitting the rich countries
[20:34:41] I'd phrase it more as: we're not seeing substantially different mobile/desktop populations geographically, even if they are different access methods.
[20:44:19] hey Ironholds
[20:44:45] hey fhocutt! :)
[20:45:09] thanks for your edits on Inspire pages
[20:45:53] np! I am trying to be minimal because nobody needs me drowning voices out. But there were a couple of conversations where I couldn't help but go "oh god you are so terribly terribly wrong and proving it every time you open your moth"
[20:46:21] ...mouth. Although I imagine if they start disemboweling moths everywhere they'll also provide evidence for the idea of this being an aggressive environment.
[20:59:03] Ironholds: I’m looking at the graph now
[20:59:12] YuviPanda, cool!
[21:01:28] Ironholds: right, so it basically means ‘they aren’t that different’?
[21:02:58] YuviPanda, yup
[21:03:06] * YuviPanda sighs
[21:21:56] Ironholds: do you have a similar graph for views rather than edits?
[21:22:10] no, but I could make one!
[21:22:35] later, though. Many things to work on :(
[21:22:54] sure. It's more curiosity than anything.
[22:42:14] hey halfak! :)
[23:03:26] Ironholds: re. earlier, you probably realise already but we could be having more mobile edits from non-western countries even if the relative rankings don't change. Before it might have been a 1000 time difference between the US and India, now it might be a 200 time one, etc. (I don't know, but that graph wouldn't capture this.)
[23:05:05] wctaiwan, indeed!
[23:06:26] halfak: joining us?
[23:29:50] http://datavis.wmflabs.org/agents/ BOOM
[23:34:59] Ironholds: nice, you should deposit a copy on figshare and get a DOI
[23:35:08] DarTar, am gonna
[23:35:53] and you should update the desc of the shiny app, re: raw data
[23:36:16] where?
[23:36:30] “We are looking into potentially releasing the raw agents as well, in order to enable upstream developers to refine user agent parsers.”
[23:36:41] oh, gotcha
[23:36:43] * Ironholds headdesks
[23:37:36] I would also add a short “data preparation” section explaining when this data was collected and how it was aggregated/prepared
[23:42:44] DarTar, try now- aw NOW I see the data prep bit ;p
[23:42:46] * Ironholds starts writing
[23:45:26] also maybe quickly explain “site used” to clear any confusion on device vs access method, it’s not documented on figshare
[23:46:05] and last thing–just to be a total pain–please add a copy of this data to the datahub for portability
[23:46:50] yup
[23:46:56] I don't know if I have datahub access
[23:47:39] you’re not on the Wikimedia member list, create an account and give me the username so I can grant you admin privs
[23:47:55] sure; it'll be tomorrow.
[23:48:01] okay
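Editor's note on the rank plot described at [20:27:11]-[20:28:03] and the caveat raised at [23:03:26]: the sketch below ranks countries by editor count for each access method and shows that identical rankings can hide a shrinking gap in the underlying counts. All country codes and counts are invented for illustration (they only mirror the hypothetical 1000x vs 200x figures from the chat), and rank_by_count is a made-up helper name, not the method actually used for the plot.

```python
def rank_by_count(counts: dict[str, int]) -> dict[str, int]:
    """Map each country to its rank by editor count (1 = most editors)."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {country: rank for rank, country in enumerate(ordered, start=1)}


# Invented numbers, NOT real editor counts.
desktop = {"US": 1_000_000, "DE": 50_000, "IN": 1_000}
mobile = {"US": 200_000, "DE": 10_000, "IN": 1_000}

desktop_ranks = rank_by_count(desktop)
mobile_ranks = rank_by_count(mobile)

# Each country becomes one (desktop rank, mobile rank) point; a point on the
# 45 degree line has the same rank in both league tables.
for country in desktop:
    x, y = desktop_ranks[country], mobile_ranks[country]
    print(country, (x, y), "on the line" if x == y else "off the line")

# The ranks agree exactly, yet the US/IN ratio differs between the two
# access methods, which a rank-vs-rank plot cannot show.
print(desktop["US"] / desktop["IN"], mobile["US"] / mobile["IN"])
```

Plotting raw or log-scaled counts, or each country's share of all editors per access method, would be one way to surface the magnitude change that the rank comparison discards.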