[16:00:26] #startmeeting Wikidata office hour [16:00:26] Meeting started Tue May 29 16:00:26 2018 UTC and is due to finish in 60 minutes. The chair is Lydia_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot. [16:00:27] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [16:00:27] The meeting name has been set to 'wikidata_office_hour' [16:00:52] Hello world! [16:00:56] Hello word :D [16:00:58] o/ [16:01:03] hello, double meetbot :D [16:01:19] Auregann_WMDE: ah, nice one :) [16:01:38] Hello [16:02:05] So, we're going to start, as usual, with an overview of what happened in the dev team since the last office hour (end of January, time flies) [16:02:12] then we will have a time for questions [16:02:35] The second part of the meeting is dedicated to a special topic, and today the special topic is of course the release of lexicographical data on Wikidata :) [16:02:42] Who is here for the office hour? [16:03:02] o/ [16:03:04] o/ [16:03:20] o/ [16:03:25] Yay my favourite people :) [16:03:36] Alright let's get this started then [16:04:10] I'll do an overview of what happened around the development. A lot has happened and I'm only going to concentrate on the most important things. [16:04:22] First of all: We have lexicographical data on Wikidata now! \o/ [16:04:48] It took us a lot of time to get to this point but now the first version is finally out and we can talk about it more in the second part of the meeting. [16:05:25] We also did a lot of work on improving usage tracking and with that what kind of Wikidata changes are shown on Wikipedia watchlists and recent changes.
[16:05:32] o/ [16:06:13] Some of the biggest criticism from Wikipedians was that this has been pretty bad in the past and I hope that this is much better now. If you still see things that are not good please let us know and we can look into it more. [16:06:52] We also asked for input on how to improve our Lua functions to make it easier to create infoboxes with Wikidata. [16:07:03] You can still give input here: https://www.wikidata.org/wiki/Wikidata:New_convenience_functions_for_Lua [16:07:37] Based on the feedback we got we already made a few changes like a function that allows checking if an item is a subclass or instance of another item [16:07:48] and a function to test if an item ID is valid [16:08:34] Then we improved the constraints checks. Specifically we added a bunch of new constraints to be able to find even more errors: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/05#New_constraint_types [16:08:54] And the constraints are now enabled for all logged-in users to help make errors more visible for more people [16:09:47] The search is also much improved and is now running on Elasticsearch. [16:10:32] We also continued our efforts to make it easier to install and use Wikibase outside Wikimedia by offering for example Docker images that people can use to easily set up their own knowledge base [16:11:00] And last but not least a nice little tweak: images are now shown with a thumbnail instead of just a link to the Commons page [16:11:18] Any questions so far about any of this? [16:11:52] Sweet then I'll jump to the next part: what's next [16:12:20] We'll continue to polish/build out/improve the support for lexicographical data [16:12:21] Excellent. [16:12:59] And we'll spend a bit more time on the constraints and then investigate how to best integrate shape expressions into Wikidata as another more powerful tool to help with data maintenance [16:13:35] nice!
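The two new Lua convenience functions mentioned above boil down to plain graph logic: the subclass/instance check is a walk up the P279 (subclass of) chain starting from an item's P31/P279 claims, and the ID check is a format test. A rough Python sketch against a toy claim table (the real functions run inside MediaWiki's Lua environment against live data; the function names and claim table here are illustrative, not the actual API):

```python
import re
from collections import deque

# Toy claim table: item -> {"P31": [instance-of targets], "P279": [subclass-of targets]}.
# The Q-IDs are real Wikidata items, but the table itself is illustrative.
CLAIMS = {
    "Q42": {"P31": ["Q5"]},            # Douglas Adams: instance of human
    "Q5": {"P279": ["Q154954"]},       # human: subclass of natural person
    "Q154954": {"P279": ["Q215627"]},  # natural person: subclass of person
}

def is_instance_or_subclass_of(item, target):
    """Return True if `item` is an instance or (transitive) subclass of `target`."""
    start = CLAIMS.get(item, {})
    frontier = deque(start.get("P31", []) + start.get("P279", []))
    seen = set()
    while frontier:
        current = frontier.popleft()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(CLAIMS.get(current, {}).get("P279", []))
    return False

def is_valid_item_id(value):
    """Rough format check for an item ID string (syntax only, not existence)."""
    return bool(re.fullmatch(r"Q[1-9]\d*", value))
```

With this toy data, `is_instance_or_subclass_of("Q42", "Q215627")` walks Q5 → Q154954 → Q215627 and returns True, while `is_valid_item_id("Q0")` and `is_valid_item_id("42")` are rejected by the format test.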
[16:13:43] And we'll spend time on showing labels, descriptions and aliases in all the languages on mobile (right now you only see your own language) [16:13:47] great! [16:14:12] I'll go more into the lexicographical data part later. [16:14:34] Any questions about those? Or should we hand it over to Auregann_WMDE? [16:15:53] Alright, so apart from development, a lot of cool stuff happened during the past 5 months [16:15:57] Is there any plan on having a proper way of doing lists from Wikidata in Wikipedia, Wikisource... ? [16:16:07] with something like "simple queries" [16:16:07] Since February, we got 5 new admins: Kostas20142, Putnik, Okkn, Pintoch, Addshore. Welcome or welcome back! [16:16:32] Tpt[m]: yes but not before the things I listed unless someone pushes for it [16:16:55] Let me see if there is a ticket to collect the ideas/plan [16:17:02] Items now contain an average of 9 statements https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel-statements?refresh=30m&panelId=4&fullscreen&orgId=1&from=now-2y&to=now [16:17:23] Plenty of conferences, Wikidata workshops and events happened! Thank you all for making the Wikidataverse so active :) Top day was May 5th with 3 Wikidata workshops organized in different countries :p [16:17:34] Wikidata:Tools has been reorganized and updated, thanks to Pasleim! Feel free to help keep this page up to date https://www.wikidata.org/wiki/Wikidata:Tools [16:17:44] The RFC about Privacy and Living People policy has been successfully closed https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Privacy_and_Living_People [16:17:47] Tpt[m]: https://phabricator.wikimedia.org/T67626 (though nothing too useful -.- I should spend some time expanding this) [16:18:04] As usual, a lot of new tools were created, updated or discovered: [16:18:14] EditGroups https://tools.wmflabs.org/editgroups/ is a new tool that lets you review, discuss and revert entire edit groups made by various tools.
Try it and give feedback to Pintoch [16:18:22] The property explorer sorts and displays properties per category https://tools.wmflabs.org/prop-explorer/ [16:18:28] ok! thanks! [16:18:31] A new version of Denelezh, a tool to monitor the gender gap in Wikidata, has been released, including a new methodology to produce the data, and an overview of the gender gap by Wikimedia project https://denelezh.dicare.org/gender-gap.php [16:18:37] You can try the new Drag&Drop gadget developed by Yarl and give feedback https://www.wikidata.org/wiki/Wikidata:Project_chat#Drag'n'drop_gadget_rewrite_%E2%80%93_feedback_welcomed [16:18:47] OpenRefine 3.0 beta was released. You can get an overview of the new Wikidata-related features with tutorials and videos https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing [16:19:11] Relator provides the family tree of a person https://tools.wmflabs.org/wikidata-todo/relator [16:19:36] I want to give OpenRefine 3.0 a try soon. I saw a demo that looked impressive - it now allows you to add data to Wikidata! [16:19:54] spinster: you should, it's awesome :) [16:19:59] We also selected a few articles that are worth a look [16:20:02] Yeah and even cooler: it gives you reports of potential issues before import!
\o/ [16:20:31] Making women more visible online https://blog.wikimedia.org/2018/03/29/increasing-visibility-women-with-wikidata/ [16:20:38] The work of Goran Milovanovic on the usage of Wikidata across the Wikimedia projects https://blog.wikimedia.org/2018/01/29/from-the-life-of-wikidata/ + https://www.wikidata.org/wiki/Wikidata:Wikidata_Concepts_Monitor/WDCM_Journal [16:20:46] Discovering Types for Entity Disambiguation on OpenAI https://blog.openai.com/discovering-types-for-entity-disambiguation/ [16:20:51] Some ways Wikidata can improve search and discovery http://blogs.bodleian.ox.ac.uk/digital/2018/02/14/some-ways-wikidata-can-improve-search-and-discovery/ [16:20:57] Using Wikidata to build an authority list of Holocaust-era ghettos https://blog.ehri-project.eu/2018/02/12/using-wikidata/ [16:21:04] Martin Poulter gave a TEDxBathUniversity talk about Wikidata https://www.youtube.com/watch?v=Wj8na1GFXMs [16:21:18] There have also been some scientific papers related to Wikidata [16:21:26] Practical Linked Data Access via SPARQL: The Case of Wikidata https://iccl.inf.tu-dresden.de/w/images/8/85/Wikidata-SPARQL-queries-Bielefeldt-Gonsior-Kroetzsch-LDOW-2018.pdf [16:21:32] Towards a Question Answering System over the Semantic Web https://arxiv.org/abs/1803.00832 [16:21:38] Automatically Generating Wikipedia Info-boxes from Wikidata http://aidanhogan.com/docs/infobox-wikidata.pdf [16:21:44] Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf [16:21:54] Yes I know, that's a lot to read ^^ [16:22:18] Any further question before we focus on Lexemes? [16:22:33] there was one article I really liked about using Wikidata for authority control or something similar, a few weeks ago I think… [16:22:38] was that one of the ones you mentioned? [16:22:45] I can’t find the link right now unfortunately [16:23:06] No, thank you for the tools.
I will test them. [16:24:08] Lucas_WMDE: I'll have a look [16:24:38] Alright, then... [16:24:48] ...we have lexicographical data on Wikidata \o/ [16:25:00] Finally! :D [16:25:17] \o/ [16:25:17] Oh yeah! [16:25:18] to read about the details, and the current status of the first release, I encourage you to have a look at the announcement https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data#First_experiment_of_lexicographical_data_is_out [16:25:19] congrats [16:25:32] and all the discussions are happening on https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data [16:25:48] So many of you have been discussing on this page, and everyone is very constructive, I love that :) [16:26:44] Lucas_WMDE http://swib.org/swib13/slides/steinmetz_swib13_106.pdf [16:26:54] Just to give you an idea: during the first 3 days after the release, 1111 Lexemes were created and improved in 49 languages by 119 people! [16:27:34] A lot of people are playing with the data, discussing the best way to organize it :) [16:27:58] And of course, people have started building tools on top of it, mostly to help with the features that are not there yet (search, queries) [16:28:42] let's mention Ordia by Finn Nielsen, providing some search on the first lexemes https://tools.wmflabs.org/ordia/search?q=hus [16:29:07] Lucas also wrote a hack to make nice graphs appear :) https://lucaswerkmeister.github.io/wikidata-lexeme-graph-builder/?subjects=L88%2CL129&predicate=P5191 [16:29:24] and I see some python scripts running here and there ;) [16:29:31] I really should have used some less stupid example lexemes for the demo link :D [16:29:59] :D [16:30:02] reminder: don't be too hard on the APIs right now, we're going to improve them in the future so they support heavy queries :) [16:30:27] alright people, I need to leave now, I have to go to the dentist :o [16:30:39] cu Auregann_WMDE :) [16:30:40] have a nice evening and see you soon onwiki :) [16:30:57] That brings us to the what's next for
lexicographical data on Wikidata [16:31:03] Good Bye [16:31:18] Obviously there are a lot of things missing still or not polished. [16:31:43] This includes things like showing the Lemma in recent changes/watchlist/AllPages etc [16:31:57] Some messages that are not really understandable for people [16:32:20] Fixing all these smaller and bigger things is one thing I want to concentrate on [16:32:49] Then we have Search, which is sorely missed. Stas is working on that at the moment. [16:33:22] Then we have querying. Tpt[m] was amazing and wrote a draft for the RDF mapping we need to support that. https://www.wikidata.org/wiki/Wikidata:Project_chat#Draft_for_the_RDF_mapping_of_Wikibase_Lexeme [16:33:47] If you have input on that please give it really soon so it can still be taken into account. [16:34:12] And then there is of course support for Senses which is needed to complete the base. [16:34:42] I'd love to hear from you what would be most important to you so we can make sure we prioritize right. [16:35:20] As a change of an Arabic diacritic can change the meaning of the word, we have to add diacritics to lexemes. The problem is that there is no database of diacritized lexemes. [16:35:21] Also especially all the little annoying things that are not right yet: it would be super helpful to know about them. [16:36:06] Csisc: can you clarify? Is there no other existing dictionary that does that? Or Wikidata doesn't do it? Or? [16:36:26] (Sorry for my ignorance as a non-speaker of Arabic) [16:38:25] Oh and if you're so inclined: check out all the ideas people already wrote down for querying: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Ideas_of_queries [16:38:41] and for tools to build on top of that data: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Ideas_of_tools [16:38:52] Please add yours if you have additions [16:40:49] Thanks: Auregann_WMDE !
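Csisc's point above about diacritics is concrete: a change of a single diacritic turns one lexeme into another, yet Arabic is usually written without them, so undiacritized word lists collapse distinct lexemes into one surface string. A small Python illustration (the example word pair is my own, not from the discussion):

```python
import unicodedata

def strip_diacritics(word):
    """Drop Unicode combining marks (category Mn), e.g. Arabic short vowels."""
    return "".join(c for c in word if unicodedata.category(c) != "Mn")

# Two distinct lexemes that differ only in their diacritics:
kataba = "كَتَبَ"  # "he wrote"
kutiba = "كُتِبَ"  # "it was written"

# Once the diacritics are stripped, both collapse to the same string "كتب".
```

This is why the discussion turns to diacritizing lexemes before import: without the marks, several lexemes share one written form and cannot be told apart.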
[16:40:58] oooh, office hour [16:41:13] Arabs tend not to write the diacritics of words when writing in Arabic. Arabic diacritics are the equivalent of vowels. https://en.m.wikipedia.org/wiki/Arabic_diacritics. A change of the quality of an Arabic diacritic can change a lexeme into another one. [16:41:54] Ok [16:42:12] So we'd cover them in different Lexemes in Wikidata I guess? [16:43:09] Or is that not a good idea for some reason? [16:43:14] Yes, of course. That is why we have to diacritize the lexemes we have before adding them to Wikidata [16:43:26] Ok. Makes sense. [16:43:43] Is there anything we should change/add in the software? [16:44:45] Add a Lua function to get a lexeme's lemma to be able to easily link them from wikitext [16:45:30] Tpt[m]: hah! Yes. I'll check later if we already have a ticket for that but I think not. [16:45:35] Will make one then. [16:46:01] Tpt[m]: Would you prioritize that over any of the things I mentioned above? [16:46:24] For example, I have added some lexemes as labels to Wikidata entities before the creation of Wikidata's Lexicographical Data. I ask if we can extract these labels and integrate them into the Lexicographical Data. [16:46:47] "what would be most important to you" → I love the order you've already followed when explaining the Lexicographical stuff :) [16:46:56] \o/ [16:47:31] Csisc: hmmm good question. Is there an easy way we can find the ones that should be Lexemes? We don't want to create them en masse for people for example right? [16:48:31] Lydia: No, I believe that UI, search and SPARQL queries should go first [16:48:34] but we should not wait months [16:48:42] Yes, we can use statements like P31/P279 to check which labels are to be added to the Lexicographical Data [16:48:49] Tpt[m]: heh alright. good to know [16:50:10] Csisc: ok.
I guess it's a good idea to wait with that until we have search or queries to avoid creating a ton of duplicates [16:50:20] but in the end it's not up to me of course [16:50:38] Lydia: This kind of feature could be easily added by volunteers if there is a consensus on a good function name [16:50:44] My suggestion though is to wait with mass creation until we at least have recent changes integration and search improved [16:51:15] Tpt[m]: sounds good! I don't have a good name idea right now but happy to brainstorm [16:51:32] great! [16:51:53] Tpt[m]: I'll create the ticket and we can collect suggestions there [16:52:14] thanks [16:52:41] About the mass creation... can we estimate how many lexemes Wikidata will have in the future? [16:52:58] abian: uhhhh good question! [16:53:01] any guesses? [16:53:37] If all proper names are accepted, as they are now, this could explode :) [16:53:51] depending on how enthusiastic the community is, I think they could easily overtake items in the future [16:53:55] Yeah not so sure if that's really useful but maybe [16:53:57] even without names [16:54:05] Lucas_WMDE: agreed [16:54:18] Hi Lydia_WMDE - in Wikidata's unfolding relationship with Google (e.g. Wikipedia/Wikidata is used by Google a lot) - and now potentially re lexicographical data, could you say a little about how you think something like GNMT / Google Translate will use lexicographical data in Wikidata / Wikipedia's 301 languages please - and re CC licensing too? [16:54:37] to throw some random numbers into the room – 100M lexemes before the end of 2020? [16:55:06] It would be amazing! [16:55:07] I agree. We should wait for the tools so that we will not have duplicate entries [16:55:27] Scott_WUaS: I don't know :D Anyone can use it for anything. That's why we do this, right? I hope that we will see a lot of new tools being developed by organisations that support small languages.
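For the lemma-lookup function Tpt[m] requested above, the underlying data is just the Lexeme's lemma map. A hedged Python sketch of the extraction logic (the entity shape mirrors how item labels are serialized as {language code: {language, value}}; the L-ID and the Danish example are made up for illustration):

```python
def get_lemma(entity, lang):
    """Return the lemma for `lang` from a Lexeme entity dict, or None if absent."""
    lemma = entity.get("lemmas", {}).get(lang)
    return lemma["value"] if lemma else None

# Hypothetical entity payload; the ID and exact JSON shape are assumptions,
# modeled on the item-label serialization format.
lexeme = {
    "id": "L99",
    "lemmas": {"da": {"language": "da", "value": "hus"}},
}
```

With this payload, `get_lemma(lexeme, "da")` yields "hus" and `get_lemma(lexeme, "en")` yields None, which is roughly the behavior a wikitext-facing Lua function would need before linking the lemma.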
[16:56:04] Thanks :) [16:56:11] Lucas_WMDE: :panic emoji: [16:56:12] :D [16:56:26] my impression is that Lexemes are much better than items for names (they are definitely closer to lexical elements than to usual concepts like people or places...) [16:56:27] Bots will start to create lexemes soon with no statements, I guess :) [16:56:34] Or with a few [16:56:52] Yeah [16:57:16] algorithmic heaven :) [16:57:17] So the number will skyrocket [16:58:21] We can start by making guesses for how things will look after 1 month and then see how far off we are :D [17:00:33] Just a question: when can we add senses to Wikidata's Lexicographical Data? [17:00:58] Thank you! [17:01:09] Csisc: We'll start the development next week. My best guess is 3 months at this point but that is a rough guess. [17:01:12] Mainly Q-embedded senses. [17:03:06] Alright. Any remaining questions? Wishes? Thoughts? [17:04:35] If not then I think we can wrap it up and I'll go file a ticket for the Lua function ;-) [17:04:39] CC-0 licensing question: Is Wikicitation - and possibly re lexicographical data for translation - [17:04:53] Is WikiCite CC-0 licensed? [17:05:15] WikiCite is a project. The data they add to Wikidata is CC-0. [17:05:15] Thank you for the office hour! [17:05:26] Thank you, Lydia! [17:05:27] Thank you so much everyone for coming! [17:05:31] Thank you. [17:05:55] I'm still taking your best guesses for how many lexemes we will have after 1 month by email ;-) [17:06:05] <3 [17:06:37] #endmeeting [17:06:37] Meeting ended Tue May 29 17:06:37 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot .
(v 0.1.4) [17:06:37] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-05-29-16.00.html [17:06:37] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-05-29-16.00.txt [17:06:37] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-05-29-16.00.wiki [17:06:38] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-05-29-16.00.log.html [17:07:19] * Lucas_WMDE waves [17:07:57] We have to give a high number and then manipulate the project so that our guess is fulfilled :D [17:08:27] "manipulate" = "create lots of lexemes" O:) [17:08:41] Tpt[m]: https://phabricator.wikimedia.org/T195895 - let me know if that's totally not what you had in mind [17:08:53] abian: tststs :P [17:09:16] Lydia_WMDE: It's perfect! Thanks! [17:09:23] abian: we'd of course never do that, right? ;-) [17:09:51] Sure :D [19:22:22] anyone who finds out how to easily transform meetbot's logs into wikitext has my entire gratitude :D [19:22:48] I tried the visual editor, html to wiki, and ended up adding the blank lines manually :p [19:23:17] Wrap it in pre tags or similar? [19:24:23] Reedy: yay thanks! I tried code but pre actually works!
[19:24:32] will remember for next time ;) [19:24:33] :) [19:24:37] It's one of those things [19:24:41] You have an idea what should work [19:24:47] But what you need is something very similar, but not exactly the same :D [19:26:30] but now the links are not clickable anymore :D [19:27:13] anyway, thanks a lot :) [19:28:37] Auregann_WMDE: the other option maybe... might work better [19:29:53] it makes nice colors appear, but still no clickable links unfortunately [23:42:10] Auregann_WMDE, for future reference: the only options that will keep links clickable are either a "leading space" or tag. https://www.mediawiki.org/w/index.php?title=User:Quiddity_(WMF)/sandbox&oldid=2794460 [23:59:53] I've added a note about it in the docs https://meta.wikimedia.org/wiki/Meetbot#Formatting_archived_copies
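The formatting fix Quiddity points to above can be sketched as a tiny transform: prefixing each log line with one space is MediaWiki's preformatted-text convention, which (unlike wrapping the whole log in pre tags) keeps the links clickable. A minimal sketch (function name and sample log lines are illustrative):

```python
def log_to_wikitext(log):
    """Prefix every line with one space: MediaWiki renders leading-space lines
    as preformatted text while still making [links] clickable, unlike <pre>."""
    return "\n".join(" " + line for line in log.splitlines())

sample = "[16:00:26] #startmeeting Wikidata office hour\n[16:00:52] Hello world!"
```

Pasting `log_to_wikitext(sample)` into a wiki page gives a monospaced block with each line indented by a single leading space.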