[16:00:24] #startmeeting Wikidata office hour
[16:00:30] Hello all :)
[16:00:45] Hey Auregann_WMDE
[16:00:57] hey everyone
[16:00:57] Hi
[16:01:13] Auregann_WMDE: The bot didn't respond... So it might not have picked up on it
[16:01:34] #startmeeting
[16:02:04] anyway, let's do it ;)
[16:02:14] Looks like the bot is gone... Possibly related to labs stuff
[16:02:23] Not sure who runs it
[16:02:27] Welcome to the Wikidata office hour! we will start with a quick overview of what we've been working on over the last months
[16:02:38] and then you will be able to ask us any question you want
[16:02:52] * Lydia_WMDE waves
[16:03:09] who's here for the office hour?
[16:03:10] Lydia, what's up in our projects? ;)
[16:03:15] :D
[16:03:18] lots!
[16:03:46] *waves*
[16:03:59] alright - let's take a look at what we've been doing and what is coming on the dev side first.
[16:04:09] then Léa will talk about the non-dev things
[16:04:30] first big piece of work is around Wiktionary and supporting lexicographical data
[16:04:37] yay!
[16:05:02] we've developed an extension to do automatic sitelinks for Wiktionary. that is ready for deployment now and will go live on april 24th
[16:05:11] you can help translate the announcement here: https://www.mediawiki.org/wiki/User:Lea_Lacroix_(WMDE)/Cognate_announcement
[16:05:45] in parallel we have worked on a new entity type Lexeme (next to items and properties) that will hold the lexicographical data
[16:06:11] that is taking shape but not ready to show yet. I hope that will change in the next 2 months
[16:06:45] we've also written down the technical specification for it. you can find it here: https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model That also has links to some potential examples
[16:07:12] and we've done research to better understand how Wiktionary editors work right now
[16:07:25] and how we can best support that work with the help of Wikidata
[16:07:46] the next big piece of work was around Wikidata itself
[16:08:04] we've added previews to the suggester when you add a link to an image on Commons
[16:08:25] we also adapted some colors to be more in line with the rest of the wikimedia projects
[16:09:20] we've been working on an API for the constraints checks. that still needs polishing but should go live in a basic form in the next two months as well. once we have that we will be able to show constraint violations right next to the statements that cause them
[16:09:34] and thereby hopefully get more people to fix issues in the data
[16:10:12] we've also worked on a new datatype to link to geoshape files on Commons. that will go live on april 24th as well.
[16:10:58] keyboard navigation should now also work better on item pages
[16:11:18] and! we have better previews on social networks like here: https://twitter.com/BenediktRitter/status/844953528032485376
[16:11:23] for links to item pages
[16:12:07] the next thing we'll be working on in this area is polishing the constraint checks api and improving the property suggester
[16:12:43] Léa also planned to get people together to improve our documentation at the next Wikimedia hackathon in Vienna
[16:13:15] Big area number 3 is support for structured data on Commons.
[16:14:01] here we worked on federation (being able to use Wikidata's items and properties on Commons). i hope we have a demo system for that up in the next 2 weeks
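For a rough sense of what "using Wikidata's items and properties on Commons" means, here is a minimal sketch. It is not the federation mechanism being built, only an illustration that the shared vocabulary is already resolvable by any client; the entity IDs in the comments are examples picked for this sketch.

    # Minimal sketch only: federation means Commons will reuse Wikidata's items
    # and properties instead of defining its own vocabulary. This just shows that
    # any client can already resolve that vocabulary from Wikidata.
    import requests

    def fetch_entity(entity_id):
        """Fetch the JSON for one Wikidata entity (works for items and properties)."""
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"
        resp = requests.get(url, headers={"User-Agent": "office-hour-example/0.1"}, timeout=30)
        resp.raise_for_status()
        return resp.json()["entities"][entity_id]

    prop = fetch_entity("P180")    # "depicts" - a property a file page could reuse
    item = fetch_entity("Q12418")  # "Mona Lisa" - an item such a statement could point to
    print(prop["labels"]["en"]["value"], "->", item["labels"]["en"]["value"])
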
[16:14:22] we've also done a lot of user research. you can find the results here: https://commons.wikimedia.org/wiki/File:HeavyCommonsUserQualitativeResearch.pdf
[16:14:54] and the Wikimedia Foundation has gotten a grant from the Sloan Foundation to work on this as well: https://blog.wikimedia.org/2017/01/09/sloan-foundation-structured-data/
[16:15:30] this means the wikidata team will shift focus towards the backend stuff and groundwork. WMF will do the more user-facing parts and tool integration
[16:16:10] our next piece of work there is making it possible to store structured data and wiki text in the same page. this will allow us to have statements and wiki text on the file pages on commons
[16:16:42] the next big work area is support for Wikipedia and similar sister projects.
[16:17:33] here we worked on making the ArticlePlaceholders indexable by search engines. we've done a trial run with a small number of placeholders on welsh wikipedia. that went pretty well and we will now expand that
[16:18:32] we've also put finishing touches on the click dummy for editing Wikidata's data directly from Wikipedia. I'm pushing to get that published for feedback in the next 2 weeks.
[16:19:23] what we'll be working on next is more fine-grained usage tracking. so far we only know that a certain article uses statements from wikidata. we'll expand it to be able to tell better which statements are used.
[16:19:40] and last but not least: the query service
[16:19:50] we added more autocompletion
[16:20:16] and increased the timeout from 30 seconds to 60 seconds. this should allow more queries to run now even though they'll take a while
[16:20:45] and you can now query wikidata's sparql endpoint and a small number of other endpoints together: https://lists.wikimedia.org/pipermail/wikidata/2017-March/010476.html
[16:20:56] the list will be expanded
[16:21:15] we've also added more dimensions for unit conversion
[16:21:28] and made it possible to download query result visualizations as SVG
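To illustrate the endpoint federation mentioned above, a minimal sketch of a federated query follows. The remote endpoint URL is a placeholder, not one of the actually whitelisted endpoints; substitute one from the linked announcement before the query will execute.

    # Minimal sketch of a federated query against the Wikidata query service.
    # NOTE: <https://example.org/sparql> is a placeholder and must be replaced
    # with an endpoint from the announcement above for the query to succeed.
    import requests

    FEDERATED_QUERY = """
    SELECT ?item ?itemLabel ?remoteValue WHERE {
      ?item wdt:P31 wd:Q3624078 .                 # sovereign states on Wikidata
      SERVICE <https://example.org/sparql> {      # placeholder remote endpoint
        ?remoteThing ?remoteProperty ?remoteValue .
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": FEDERATED_QUERY, "format": "json"},
        headers={"User-Agent": "office-hour-example/0.1"},
        timeout=60,  # the service-side timeout was raised from 30 to 60 seconds
    )
    response.raise_for_status()
    for row in response.json()["results"]["bindings"]:
        print(row["itemLabel"]["value"], row.get("remoteValue", {}).get("value"))
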
[16:22:08] which brings me to my last point: Since Monday we have WikidataFacts on the team as an intern \o/
[16:22:23] \o/
[16:22:43] any questions on any of this so far? or should we hand over to Auregann_WMDE?
[16:24:27] sooo let's talk a bit about events. We've been at several events such as the Wikimedia Developer Summit, FOSDEM (where our UX team gave a talk) and, more recently, the Wikimedia Conference
[16:25:02] We started a regular Wikidata meetup in Berlin: the previous one took place last week and the next one is on April 26th :)
[16:25:29] Of course, there are lots of other Wikidata-related events around the world: have a look or add your own on https://www.wikidata.org/wiki/Wikidata:Events
[16:26:07] For the German editors, we are for example looking for volunteers to attend the Datensummit and a Wikidata workshop in Ulm (more info on the page)
[16:26:45] As usual, a lot of new tools have been developed by volunteers based on Wikidata
[16:27:27] Let's mention the nice Gender Gap tool that allows us to see the gender distribution of our knowledge on Wikidata and Wikipedia https://www.lehir.net/a-tool-to-estimate-gender-gap-on-wikidata-and-wikipedia/
[16:27:59] You can also play "Stadt, Land, Fluss" ("Petit bac" for the French) based on Wikidata \o/ https://stadt-land-wikidata.netlify.com/#/
[16:28:18] Monumental shows you heritage buildings https://tools.wmflabs.org/monumental
[16:28:39] This one checks German sister cities on Wikidata and Wikipedia https://tools.wmflabs.org/sistercities/
[16:28:56] And many others :)
[16:29:38] We also have a lot of people writing papers or blog posts about Wikidata, I can't quote them all, so here's a shortlist of nice articles that have been published recently:
[16:29:59] http://www.snee.com/bobdc.blog/2017/02/getting-to-know-wikidata.html
[16:29:59] http://www.snee.com/bobdc.blog/2017/03/wikidatas-excellent-sample-spa.html
[16:30:00] https://blog.wikimedia.org/2017/03/08/wizards-muggles-wikidata/
[16:30:00] https://blog.wikimedia.de/2017/01/27/software-product-management-as-an-internship-learning-about-the-real-world-at-wikimedia-deutschland/
[16:30:00] https://blog.wikimedia.de/2017/02/03/being-a-volunteer-developer-for-wikimedia-projects-an-interview-with-greta-doci/
[16:30:00] http://www.oxfordaspiremuseums.org/blog/wikidata
[16:30:38] For beginners or people who would like to have an overview of Wikidata, I recommend this nice video made by Asaf Bartov https://commons.wikimedia.org/wiki/File:A_Gentle_Introduction_to_Wikidata_for_Absolute_Beginners_(including_non-techies!).webm
[16:31:09] if you're interested in patrolling (= watching the recent changes and fighting vandalism) you'll find a very useful collection of links here https://www.wikidata.org/wiki/User:YMS/RC
[16:31:32] oh and one of our developers, Amir (Ladsgroup), published a paper at WWW 2017 https://arxiv.org/abs/1703.03861
[16:32:24] Several new WikiProjects have been created, I'd like to mention the project dedicated to welcoming newcomers and improving documentation: add yourself to the list if you want to help! https://www.wikidata.org/wiki/Wikidata:WikiProject_Welcome
[16:32:50] Currently, WMF is asking editors about the general strategy of the movement, you can participate here https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017
[16:33:58] You probably heard about Wikimania: the international Wikimedia conference will take place in Montreal this summer, and you can still submit talks. Careful, the deadline for talks is April 10! For workshops, discussions, etc. it's May 15. https://wikimania2017.wikimedia.org
[16:33:58] Submissions and discussions related to Wikidata: https://www.wikidata.org/wiki/Wikidata:Wikimania_2017
[16:34:53] We had several data donations during the last months. As usual, the data need to be reviewed and improved, thanks to all who helped!
[16:35:18] We had Quora https://blog.quora.com/Announcing-Wikidata-References-on-Topics , the BBC, Songkick with artist IDs https://www.wikidata.org/wiki/Wikidata:Project_chat#Data_donation:_Songkick_IDs and Social Network Archival Context (University of Virginia) https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/01#Social_Networks_Archival_Context_data_donation
[16:35:52] That's a lot of content, I know... any questions? :)
[16:38:13] awesome! I have a question about federation (which was in Lydia's part): is federation with non-CC0 sources inconceivable?
[16:38:23] MechQuester waves hi.
[16:38:34] pintoch: good question!
[16:38:40] no it is not inconceivable
[16:38:41] * hoo waves
[16:38:55] we'll enable it but we did want to start with cc0 for two reasons:
[16:39:21] 1) it is much easier because we don't have to worry about attribution and such at all and neither do the users of the query service
[16:39:52] 2) the organisations running a cc0 endpoint deserve some praise and recognition and being first in line
[16:40:13] is this re federation of the sparql endpoints?
[16:40:21] yes
[16:40:27] ah, ok, thanks
[16:40:27] Where do you see wikidata going within 5 years?
[16:40:52] to Mars (because Wikipedia already went to the Moon)
[16:40:52] Lydia_WMDE: ok thanks! yeah I agree it's good to put these sources forward. :)
[16:41:33] hahahah.
[16:41:41] MechQuester: heh well in Wikimedia it will become more and more the backbone of all our projects and a true source of support, especially for the small ones
[16:41:52] I have a question regarding the constraints
[16:42:04] do you plan migrating from templates to statements soon?
[16:42:24] outside wikimedia it will become more and more of a source for good, reliable open data about the world as well as tagging vocabulary
[16:43:12] matej_suchanek: i'll need to do some more research on whether we can actually already do that or if there is more work needed, for example for the bot that currently uses them.
[16:43:48] also the current implementation of the special page and api does not cover all constraints yet. we'll expand that though now that we have sparql
[16:45:32] and regarding the Structured data for Commons, could you describe what the change will look like?
[16:45:45] which change do you mean?
[16:45:55] how things will work on commons in the future?
[16:45:56] or?
[16:45:57] well, no there's nothing but then...
[16:46:03] yes, like that
[16:46:05] ok
[16:46:32] disclosure: I'm not active on Commons
[16:46:43] So right now each image, video etc has a piece of wiki text attached to it that says which license it is under, who took the photo, etc
[16:47:16] I'm not satisfied with a user Brya.
[16:47:27] the issue with that is that we can't properly build tools on top of that, make commons usable in languages other than english or have good search
[16:48:03] we believe we can help with structured data support by Wikibase and Wikidata
[16:48:21] so in the future you will be able to make statements on a file page like "license: CC-BY"
[16:48:26] if I understand correctly, you will create a separate repository for Commons?
[16:48:28] where license is a property on wikidata
[16:48:34] and CC-BY is an item on Wikidata
[16:49:00] Mind if i ask, how does the edit byte count work? sometimes it would be +100 or -25 or other numbers if I add or remove links, and it is kind of hard for me to see how those edits are that big
[16:49:06] and then we can build tools on top of that structured data to make commons usable for non-english speakers, have good search and so on
[16:49:20] MechQuester: Because things are stored in json
[16:49:31] it's not just plus Q12345
[16:49:41] it can be adding properties, etc all the way down
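To make the byte-count point concrete, here is a minimal sketch, assuming the public entity dumps reflect roughly how statements are serialized: a single added statement is stored as a nested JSON structure, so the reported diff is much larger than the few characters you typed.

    # Minimal sketch: why one added statement changes the size by far more than
    # the length of "Q12345". Statements are serialized as nested JSON structures.
    import json
    import requests

    resp = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/Q42.json",  # Douglas Adams
        headers={"User-Agent": "office-hour-example/0.1"},
        timeout=30,
    )
    entity = resp.json()["entities"]["Q42"]

    statement = entity["claims"]["P31"][0]     # one "instance of" statement
    print(json.dumps(statement, indent=2))     # nested mainsnak, rank, references, ...
    print(len(json.dumps(statement)), "bytes for a single statement")
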
[16:50:02] so there is no second repository for the vocabulary (items and properties) but there will be another place that has statements, yes
[16:50:11] does that answer your question matej_suchanek?
[16:50:40] so files will be associated with an entity on Commons?
[16:50:45] yes
[16:51:12] Quick question on data volumes. Looking to expand coverage of Companies by importing structured data from public sources (LEI, SEC EDGAR XBRL, etc.), what are the reasonable limits, now and going out say 5 years? (financial statements on all public companies in the US might be around 160m fields, mostly numbers. This of course could be limited way back, or done in stages)
[16:51:45] Lydia_WMDE: and about Wiktionary, where is their data going to live?
[16:51:49] MechQuester: we the dev team unfortunately can't really help with individual users. i fear you'll have to take that to the editors
[16:51:55] matej_suchanek: on wikidata
[16:52:07] ok, thank you very much
[16:52:38] rjlabs: hard question to answer. it depends on the growth of the community, what they want to cover and what the software can handle
[16:53:49] MechQuester: the edit sizes are a bug (Reedy explained where it comes from). It would be great if someone fixed that and made it more sensible
[16:54:14] (e.g. "+2 statements", "+3 references", "-2 sitelinks" instead of byte diffs)
[16:54:51] dennyvrandecic_: yay, if we had a place to put that :) and if that wouldn't break the comparison to older revisions...
[16:55:14] edit sizes are measured in bogo-bytes. they give a rough estimate of "added a lot" or "removed a little"
[16:55:26] the absolute numbers are meaningless
[16:55:29] Lydia_WMDE, the other day there were some insightful comments on the talk page of Wikidata_talk:Item_quality, saying that tracking sets of items would help to improve the quality and the workflows. Did you take a look?
[16:55:43] well, that's why I call it a bug that needs to be fixed :) I am not saying it is trivial
[16:55:51] Micru2: i had not seen that but that is kind of the point, yes
[16:55:55] and btw, hi everybody!
[16:56:03] Lydia_WMDE: I made my work on Wikidata more effective when I switched to enhanced recent changes but I cannot do so on other wikis because you cannot see Wikidata changes (on watchlist)... are you going to do something about this soon?
[16:56:55] matej_suchanek: i wish i could say yes but not sure i can promise it at this point. getting wiktionary basics done is the main focus at the moment
[16:57:10] ah yes, of course
[16:57:18] Just in general, can Wikidata hold a moderately large volume of table data / time series data? Or is that very cumbersome to cast and store as triples?
[16:57:39] rjlabs: it can do it to some extent but it is not really what wikidata is good at and made for
[16:57:45] rjlabs: wikidata is not really good for time series
[16:57:48] dennyvrandecic_: it's not a bug, it's a misfeature http://catb.org/jargon/html/M/misfeature.html
[16:57:50] Lydia_WMDE, right now it is difficult to work with individual items. Should we start a brainstorming process to see what kind of tools people would like to see in the future? I have the feeling that with the discussion about the item quality the "solution" was presented without proper consultation
[16:57:57] DanielK_WMDE: ok
[16:58:55] rjlabs: for time series there is a better way to store them in wikimedia products, I think Dan Andreescu deployed that feature
[16:59:09] Micru2: if we have a clear scope of it, sure. but really the current work on that is just one piece of the puzzle and really basic groundwork that will enable other tools later
[16:59:24] rjlabs: Wikimedia Commons can now store tabular data : https://www.mediawiki.org/wiki/Help:Tabular_Data
[16:59:26] Would Wikidata prefer to be just a top level "registry" of companies and have URLs point to rich company data stored elsewhere?
[16:59:39] rjlabs: so you could upload the metadata and the entity descriptions to wikidata, but then link to the tabular data on commons for the timeseries
[16:59:43] rjlabs: that is more in-line with what wikidata is good at yes
[16:59:59] rjlabs: yes
[17:00:31] Wikidata is really a collaborative platform for modelling the world. it's not a plain data repository.
[17:00:31] have some high visibility data in wikidata, but then plenty of references to rich data sources outside
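For reference, a rough sketch of what such a tabular data page on Commons contains, simplified from the linked Help:Tabular_Data page; the page name, fields and numbers are made up for illustration, and the Help page is authoritative for the exact format. A page like this could hold the time series, while the Wikidata item only links to it.

    # Rough sketch of the JSON content of a Commons tabular data page, e.g. a
    # hypothetical "Data:Example/Company revenue.tab" (illustrative values only;
    # see Help:Tabular_Data for the authoritative format).
    import json

    tabular_page = {
        "license": "CC0-1.0",
        "description": {"en": "Yearly revenue of Example Corp (illustrative only)"},
        "schema": {
            "fields": [
                {"name": "year",    "type": "number", "title": {"en": "Year"}},
                {"name": "revenue", "type": "number", "title": {"en": "Revenue (USD)"}},
            ]
        },
        "data": [
            [2014, 1000000],
            [2015, 1200000],
            [2016, 1500000],
        ],
    }
    print(json.dumps(tabular_page, indent=2))
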
[17:02:17] In company accounting we have well-established XML schemas and structured data filings. Is there any need to align Wikidata's object structure to match the detailed XML accounting schemas well (even for just the top level data Wikidata wants to store directly)?
[17:03:16] we have mappings to a lot of other ontologies for example via external identifiers
[17:03:36] if you look at any large item there should be a long list of identifiers at the bottom
[17:04:01] and of course in your schema/ontology you can store the corresponding wikidata IDs
[17:04:18] same for the properties
[17:04:32] rjlabs: that might be overkill in most cases. The first question is which of the fields do we want to have in Wikidata, and then, how do they map to the existing ones, if at all?
[17:04:46] Sounds like something you'd like to discuss in a WikiProject on Wikidata
[17:04:56] and find kindred souls interested in discussing this in detail
[17:05:11] i think it would be a great and valuable effort!
[17:05:27] rjlabs: like https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies :)
[17:05:36] exactly that, thanks Auregann_WMDE
[17:06:14] yes, rjlabs and I are already involved in this project :)
[17:06:23] Project Companies is a good start, but some of this even goes beyond that.
[17:06:50] rjlabs, if you have anything in mind you can post it on the project chat
[17:07:36] you will find each country has a Statistical Business Registry - and those combined are our best universe to get company data ...however there are many issues
[17:07:43] VIGNERON o/ if you have a question, it's now or never :p
[17:07:56] I'm good ;)
[17:08:15] rjlabs: have you had a look at opencorporates?
[17:08:55] Yes I have seen opencorporates. What is the relationship of Wikidata to opencorporates?
[17:09:06] we are friendly connected projects
[17:09:57] put the rich financial data in OpenCorporates, link to them and call it a day?
[17:10:26] we have an external ID https://www.wikidata.org/wiki/Property:P1320
[17:10:29] the last news from them was that alignment to Wikidata was held back by granularity issues ( https://github.com/sparcopen/open-research-doathon/issues/56 )
[17:10:32] i believe that is definitely a good step. call it a day, i don't know. that in the end is a community decision. and will probably change over time
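To illustrate the external-identifier mappings mentioned above (the "long list of identifiers at the bottom" of an item), here is a minimal query sketch listing all external IDs stored on one company item. Q312 (Apple Inc.) is only an example item chosen for this sketch.

    # Minimal sketch: list the external identifiers (OpenCorporates ID, LEI, etc.)
    # stored on one company item. Q312 (Apple Inc.) is just an example.
    import requests

    QUERY = """
    SELECT ?propertyLabel ?value WHERE {
      wd:Q312 ?directClaim ?value .
      ?property wikibase:directClaim ?directClaim ;
                wikibase:propertyType wikibase:ExternalId .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    ORDER BY ?propertyLabel
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "office-hour-example/0.1"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["propertyLabel"]["value"], "=", row["value"]["value"])
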
[17:11:27] hmm actually a better link is https://www.wikidata.org/wiki/Property_talk:P1320 , of course
[17:11:46] If we could talk the SEC into providing a sparql endpoint for all its rich financial data, would that help?
[17:11:48] Indeed, the issue with very massive data is not only storage/software, but also community: are there enough people to properly watch and update this amount of data?
[17:12:19] rjlabs: i think that would totally help - independent of wikidata
[17:12:42] I'm really only interested in structured data that can be had from public sources, and in where ultimately (and how) to store it
[17:13:25] If the US opened a SPARQL endpoint on its EDGAR data, do you think other countries would follow suit?
[17:14:07] i'd certainly hope so (but can't tell for sure of course)
[17:14:58] SPARQL endpoints would avoid Wikidata having to import all of that data, or even slices of it
[17:15:03] any more questions? :)
[17:16:45] yes one
[17:16:59] is there a dev server with the current status of the wiktionary development?
[17:17:26] looking at phabricator, it looks like senses are progressing well, and forms are also close
[17:17:32] yes a really meh one
[17:17:36] let me search for it
[17:18:11] dennyvrandecic_: it exists, but there is not much to see yet (except bugs). http://wikidata-lexeme.wmflabs.org/index.php/Special:NewLexeme
[17:18:20] it is not currently being kept up-to-date but i'll ask for an update
[17:19:01] this is really an internal test system. it's not remotely close to being a beta test
[17:20:21] Thank you very much for attending and for your questions :) Of course you can reach us at any time
[17:20:39] thanks everyone for joining :)
[17:20:47] Lydia_WMDE, thanks, but don't put too much effort into updating this
[17:20:52] Have a nice evening/morning/night ;)
[17:20:57] I'd rather see the effort spent on development :)
[17:20:58] dennyvrandecic_: nah we need to do it anyway
[17:20:59] thank you!
[17:21:06] ;-)
[17:21:31] #endmeeting
[18:05:26] getting close to starting...
[18:08:03] dr0ptp4kt: waiting for livestream to start…
[18:08:15] coming in 3, 2....
[18:08:21] 3, 2....
[18:13:12] dr0ptp4kt: This looks interesting, but I have no idea what I'm watching :)
[18:13:59] random commons audio and wikipedia lead sections
[18:14:24] neslihan built this as the project she selected for outreachy
[18:14:49] oh cool
[18:18:01] that's sweet.
[18:30:33] Thanks all, cool demos (I'm already using the Wiktionary script :)
[21:22:09] dr0ptp4kt: the lead sections were used as pseudo-captions, is this correct?
[21:23:16] Volker_E: what's this now?
[21:23:54] Volker_E: you asking about Wiki_Radio or something else? i may need a little more elaboration in the question
[21:24:03] I'm talking about Wiki_Radio, yes
[21:24:03] Volker_E: if needed i can talk on hangout
[21:24:30] no, just an interested side-note
[21:24:44] if real captions were also considered?
[21:25:51] Volker_E: got a couple minutes to hangout?
[21:28:03] dr0ptp4kt: need to join another meeting, but came across a citation about the importance of captions and was curious
[21:30:06] Volker_E: ok, i think you're asking about whether the CREDIT videos would get captions. it's easy enough to export the automatically machine-generated ones, although they sometimes are comically wrong. neslihan captioned her youtube wikiradio video, so that will have captions in it itself. i like captions for a lot of reasons, including accessibility and findability
i like captions for a lot of reasons, including accessibility and [21:30:06] findability [21:30:16] but it's definitely time intensive [21:30:43] dr0ptp4kt: no, I was asking about Wiki_Radio. As Neslihan mentioned captions in her talk [21:31:10] dr0ptp4kt: second part of your reply it is [21:31:36] dr0ptp4kt: captioning CREDIT videos in general seems out of reach currently, clearly [21:32:21] Volker_E: ok: she was saying that her video, which was being replayed, needed to have captions turned on so people could see the video being described. i guess in terms of doing speech-to-text, that might be an interesting twist (the app does text to speech on the lead section of random wikipedia articles, which is of course for the audio use case) [21:32:57] the captions were turned on for her video, they were running along the bottom of the screen (hard to see if you were in the hangout, but clearly visible if watching the youtube stream)