[16:54:36] Wikidata office hour starting in 5min :)
[17:00:04] #startmeeting Wikidata office hour
[17:00:04] Meeting started Tue Jan 30 17:00:04 2018 UTC and is due to finish in 60 minutes. The chair is Auregann_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot.
[17:00:05] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[17:00:05] The meeting name has been set to 'wikidata_office_hour'
[17:00:17] Meeting started Tue Jan 30 17:00:04 2018 UTC and is due to finish in 60 minutes. The chair is Auregann_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot.
[17:00:17] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[17:00:17] The meeting name has been set to 'wikidata_office_hour'
[17:00:18] Hello everyone! Who's there for the office hour? :)
[17:00:25] *waves*
[17:00:27] o/
[17:00:35] hey :)
[17:00:37] Hi!
[17:00:48] hi
[17:00:54] i am!
[17:00:55] Hi :)
[17:01:21] \o
[17:01:23] As usual, we'll be together for one hour. This time, we're trying a different format: we'll present the updates from the development team, and we'll keep enough time to have a discussion on a specific topic
[17:01:27] hello! o/
[17:01:51] Today, we decided to discuss the growth of Wikidata, and how to address it. Thanks YULdigitalpreser for the idea!
[17:02:12] :)
[17:02:34] But first, let's start with cool stuff that happened in the development team and among the volunteers
[17:02:55] alright - let's get started with that
[17:03:09] A lot of things happened so i'll try to keep it short and go over the highlights
[17:03:16] belated hi!
[17:03:42] Hi!
[17:04:05] Our design/UX team did user interviews and more to get a better understanding of how and why people are editing Wikidata. They focused on manual edits. You can find the result here: https://docs.google.com/presentation/d/1ljkpy9yJUWOTcGVXgX5pweG9VdCCQYSak5AyPL0HB-Q/edit#slide=id.p
[17:04:25] Lydia: "You need permission"
[17:04:42] They also started reworking the term box (the thing with labels, aliases and descriptions). There was a feedback round for it here: https://www.wikidata.org/wiki/Wikidata_talk:Usability_and_usefulness/Feedback_round_17-12_Term_box_Behavior
[17:04:47] Give it a try.
[17:04:54] pigsonthewing: eww. Lea will fix that
[17:05:03] Sorry
[17:05:15] TY
[17:05:56] Sorry, will be done in a few minutes
[17:06:00] We also spent a lot of time on polishing the constraint checks gadget. We're getting closer to putting it into the main codebase and enabling it for all logged-in users.
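For anyone curious what the gadget does under the hood, the constraint reports can also be fetched programmatically. Below is a minimal Python sketch; the `wbcheckconstraints` API action is what the extension exposes, but the exact nested response layout is an assumption, so treat the field names as illustrative rather than guaranteed.

```python
import requests

# Minimal sketch: fetch constraint-check results for one item.
# Assumption: the response nests property -> statements -> results,
# as the WikibaseQualityConstraints extension returns them.
API = "https://www.wikidata.org/w/api.php"

data = requests.get(API, params={
    "action": "wbcheckconstraints",
    "id": "Q42",        # Douglas Adams, a handy test item
    "format": "json",
}).json()

claims = data.get("wbcheckconstraints", {}).get("Q42", {}).get("claims", {})
for prop, statements in claims.items():
    for statement in statements:
        for result in statement.get("results", []):
            if result.get("status") == "violation":
                print(prop, "violates", result.get("constraint", {}).get("type"))
```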
[17:06:52] The major part of our work went into continuing support for lexicographical data. This was mainly polishing things like diffs, as well as persistently storing edits.
[17:07:01] You can find the current state on this demo system: http://wikidata-lexeme.wmflabs.org/index.php/Main_Page
[17:07:14] I'd love your feedback if you can give it a try at some point
[17:07:38] Alright, the new URL for the results of the editor interviews is here, sorry: https://docs.google.com/presentation/d/19dYziiUI_yEPya4e-lWBhxsejoc3NsX6EE7Z3GFARAA/edit#slide=id.p
[17:07:50] On the structured data for Commons side we spent our time on multi-content revisions, which is one of the building blocks needed to bring structured data to Commons
[17:08:17] :-)
[17:08:35] Another big part of our time went into improving which changes coming from Wikidata are shown on the watchlist and in recent changes on Wikipedia and co
[17:09:17] We started rolling it out on wikis, and on those wikis that have it you should see a lot fewer changes. We removed a lot of the ones that are not relevant for the articles on that wiki
[17:10:11] Another (hopefully! ;-)) cool thing that was done as part of a master's thesis is the new prototype for building queries without needing to know SPARQL: https://tools.wmflabs.org/wd-query-builder/
[17:10:19] again: would love your feedback if you give it a try
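As a rough illustration of what such a builder produces: every filled-in form ends up as a SPARQL query against the query service. The query below (humans with an ORCID iD; the properties are just an example) can be sent straight to the public endpoint from Python:

```python
import requests

# Send a SPARQL query to the Wikidata Query Service and print the rows.
ENDPOINT = "https://query.wikidata.org/sparql"

query = """
SELECT ?item ?itemLabel ?orcid WHERE {
  ?item wdt:P31 wd:Q5 ;        # instance of: human
        wdt:P496 ?orcid .      # ORCID iD
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

resp = requests.get(ENDPOINT, params={"query": query, "format": "json"})
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"], row["orcid"]["value"])
```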
[17:10:56] (Thank you, Lydia: how do "lexemes" relate to "graphemes" re 113,021 code points in Unicode 7 and "phonemes" - as mp3 files or similar even - in this model?)
[17:11:18] With the help of tpt and eran we've also added more Lua functions to make it easier to access Wikidata's data on Wikipedia and co
[17:11:51] https://phabricator.wikimedia.org/T182147 is a ticket to collect ideas for more, if you have any additional Lua functions you'd like to see
[17:12:37] And last but not least, the entity selector that lets you select items and properties (in searches, or when adding a new statement, for example) has been migrated to use elastic. The ranking should be considerably better now.
[17:12:53] If you have cases where it is still not great please let me and SMalyshev know.
[17:13:05] As for what's coming up next:
[17:13:46] We'll continue the work on constraint checks, lexicographical data support and multi-content revisions. And Special:Search is in the process of being migrated to elastic as well, by SMalyshev
[17:14:16] (awesome!)
[17:14:16] Scott_WUaS: That'll be something interesting to figure out over the next months :)
[17:14:26] pintoch: which of the things? :P
[17:14:34] everything ❤
[17:14:34] Entity selector is much better (I say this as someone who regularly bugged Lydia about how bad it was, before!) Thank you.
[17:14:40] any more questions about this part? or should we hand over to Auregann_WMDE?
[17:14:48] <3
[17:15:07] Lydia_WMDE: Thanks :)
[17:16:02] Let's go through the other important things that happened since November :)
[17:16:05] First, good news: the deployments for wikidata.org are now happening every week, usually on Wednesdays. That means bug fixes and new features reach you faster :)
[17:16:24] These last months, you may have heard about the WikidataCon: check out the documentation, notes and slides here https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Documentation and the report, summarizing how we organized the conference and what the outcomes were, here: https://meta.wikimedia.org/wiki/Grants:Conference/WMDE/WikidataCon/Report
[17:16:37] The next WikidataCon will take place in 2019. For 2018, there's the 6th birthday of Wikidata around the world, and you can already plan your event! https://www.wikidata.org/wiki/Wikidata:Sixth_Birthday
[17:17:09] Speaking of reports, also check the WikiCite 2017 report https://figshare.com/articles/_/5648233
[17:17:18] If you're working on Wikidata outreach inside the Wikimedia projects, you may be interested in this documentation page https://www.wikidata.org/wiki/Wikidata:Wikidata_in_Wikimedia_projects
[17:17:31] We're also interested in projects reusing Wikibase outside the Wikimedia projects. If you know some, let us know! For example, FactGrid is a project that aims to build a website for researchers to collect facts as part of their research https://www.wikidata.org/wiki/Wikidata:FactGrid
[17:17:52] As usual, a lot of new tools were created by the community these last months:
[17:18:01] Getting the recent changes of items based on a SPARQL query (the first steps of a tool for monitoring data imports?) https://tools.wmflabs.org/wikidata-todo/sparql_rc.php
[17:18:08] Check edit stats based on a SPARQL query https://tools.wmflabs.org/wikidata-todo/wd_edit_stats.php
[17:18:14] A great tool checking for vandalism in labels, descriptions and aliases, by language https://tools.wmflabs.org/wdvd/index.php
[17:18:25] Mix'n'match got overhauled
[17:18:32] QuickStatements now has a CSV-like import function (under "import commands") and new documentation https://www.wikidata.org/wiki/Help:QuickStatements
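To make that import concrete, here is a hedged Python sketch of turning a spreadsheet into QuickStatements V1 commands: tab-separated item/property/value triples. The file name and column names are made up for the example; see the Help page above for the exact conventions the importer expects.

```python
import csv

# Hypothetical input file with one row per item and a P31 value to add.
# QuickStatements V1 commands are tab-separated triples.
with open("items.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):          # expects columns: qid, p31
        print("\t".join([row["qid"], "P31", row["p31"]]))

# A generated batch then looks like:
#   Q4115189    P31    Q5        (Q4115189 is the Wikidata sandbox item)
# and can be pasted straight into the QuickStatements input box.
```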
[17:18:39] New tool by YMS to help with vandalism fighting https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Counter-Vandalism#Yet_another_RC_tool
[17:19:22] Magnus' new "mixnmatch gadget" script is good, too: https://www.wikidata.org/wiki/User:Magnus_Manske/mixnmatch_gadget.js
[17:20:05] And there's now a web hub based on Wikidata's data https://tools.wmflabs.org/hub/
[17:20:13] 'Good' is an understatement ;-) - it's awesome.
[17:20:14] Thanks for mentioning it pigsonthewing!
[17:21:14] Wikidata now has two new admins, jarekt and Mahir256, welcome to them :)
[17:21:48] Yay, Magnus!
[17:22:01] In January, we also moved Wikidata to a new server, only for Wikidata! Yay, moar space!
[17:22:23] spinster: English understatement
[17:22:45] A lot of cool events are happening in the coming months, and calls for papers have started: Celtic Knot, EuropeanaTech, and the Wikimedia Hackathon https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2018
[17:23:24] this year's hackathon has a focus on multilingualism, I guess we can show that Wikidata has an important role to play :)
[17:23:58] As usual, a lot of discussions are happening on Wikidata and about Wikidata. I'd like to point out two of them that are quite important for the community:
[17:24:06] Discussion about privacy and living people https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Privacy_and_Living_People
[17:24:13] Discussion about mapping and improving the data import process https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Mapping_and_improving_the_data_import_process
[17:24:53] And now, I'll share with you some cool stuff to read
[17:25:12] but first, let's start with maps: the traditional map of Wikidata items from November 2017 https://commons.wikimedia.org/wiki/File:Wikidata_Map_November_2017_Big.png
[17:25:31] and the differences between July and November https://commons.wikimedia.org/wiki/File:Wikidata_items_map_with_difference,_July_2017_to_November_2017.png
[17:26:18] Interesting blog posts (of course, just a selection; the rest is in your Weekly Summary every Monday ;) )
[17:26:21] Well structured political data for the whole world: impossible utopia, or Wikidata at its best? https://medium.com/from-mysociety/well-structured-political-data-for-the-whole-world-impossible-utopia-or-wikidata-at-its-best-f627448fb906
[17:26:31] Importing data into Wikidata – current challenges and ideas for future development http://histropedia.com/blog/importing-data-wikidata-current-challenges-ideas-future-development/
[17:26:36] Unlocking the human potential of Wikidata http://blog.hatnote.com/post/166535657877/unlocking-the-human-potential-of-wikidata
[17:27:03] Interesting papers related to Wikidata (same note ;) )
[17:27:10] One knowledge graph to rule them all? https://github.com/dringler/KnowledgeGraphAnalysis/blob/master/paper/knowledge_graphs.pdf
[17:27:16] Cool-WD: A Completeness Tool for Wikidata https://iswc2017.semanticweb.org/wp-content/uploads/papers/PostersDemos/paper466.pdf
[17:27:23] Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References https://iswc2017.ai.wu.ac.at/wp-content/uploads/papers/MainProceedings/71.pdf
[17:27:28] Question Answering Benchmarks for Wikidata https://iswc2017.ai.wu.ac.at/wp-content/uploads/papers/PostersDemos/paper555.pdf
[17:28:23] Last but not least: you may have seen some nice maps or graphs on the social networks, with stats extracted from our tool that measures usage of Wikidata on the Wikimedia projects. You can find the details, pictures (and more soon!) here https://www.wikidata.org/wiki/Wikidata:Wikidata_Concepts_Monitor/WDCM_Journal
[17:28:50] Is there something super important that I forgot? :)
[17:29:58] Or questions?
[17:30:41] Ok I guess then we can move on to the discussion topic
[17:31:01] I just want to plug that EuropeanaTech (conf in Rotterdam, 15-16 May) has an open call for proposals at the moment, and it's quite related to Wikidata: https://pro.europeana.eu/event/europeanatech-conference-2018
[17:31:17] It was suggested by YULdigitalpreser to talk about addressing Wikidata's growth
[17:31:30] Wikidata is growing in many ways
[17:31:42] Thanks spinster :)
[17:31:58] I think the most important and impactful ones are the number of editors, the amount of data and the usage of our data inside and outside Wikimedia
[17:32:14] Overall this is really really awesome.
[17:32:24] But it does bring with it a few challenges I see.
[17:32:51] There are technical challenges - meaning Wikibase and the rest of the infrastructure not scaling particularly well without some handholding.
[17:33:37] But there are also social challenges. Not everyone knows everyone anymore. Not everyone likes everyone anymore. Vandalism is becoming more attractive. And people have higher expectations of our data quality.
[17:33:52] Do you agree? Anything I am missing?
[17:34:21] Also, the length of Wikidata IDs (Q-numbers) will grow longer and longer too :)
[17:34:26] haha
[17:34:26] true
[17:34:29] regarding growing the amount of data – we expect to grow more in the number of items than in the number of statements per item, right?
[17:34:39] +1 for your list of challenges
[17:34:40] We'll need more admins: https://www.wikidata.org/wiki/Wikidata:Project_chat#Lack_of_admin_attention
[17:35:07] IIRC there is a technical limit to the size of an individual item but that doesn't seem to be a big problem so far
[17:35:10] Lucas_WMDE: well not sure really
[17:35:20] it sometimes seems to be
[17:35:39] I see a major problem on the vandalism front: basically, projects such as WikiCite aren't happening because Wikipedians don't trust that citations stored on Wikidata won't be vandalized.
[17:35:47] *nod*
[17:35:51] There may be a limit to the size of what can be edited on an average (as opposed to high-spec) machine
[17:36:03] yeah
[17:36:07] good point
[17:37:07] So there are a number of things already happening/being done to address some of these challenges.
[17:37:16] That (WikiCite issue) is because those Wikipedians see Wikidata as "other", not part of the same system. This is a social problem that needs careful thought, and resources, to resolve.
[17:37:28] We're working on scaling the infrastructure.
[17:37:48] There are real-life meetings like WikidataCon to get people together and get to know each other.
[17:38:12] We're improving Wikibase for 3rd parties and promoting it in order to take some pressure off of Wikidata itself
[17:38:15] pigsonthewing: It's true. And we will address this by organizing more "Wikidata for Wikipedians" trainings
[17:38:36] well as i see it, either we allow them to call a specific revision id, or we have a crosswiki watchlist, or we have some kind of protection system
[17:38:39] We're improving the constraint checks and other vandalism fighting tools (see https://www.wikidata.org/wiki/Wikidata:WikiProject_Counter-Vandalism)
[17:38:41] and in general, have Wikidata volunteers present in all the big and local conferences of the movement
[17:38:50] There is an RfC about data about living people
[17:38:55] Hmmm, another challenge I see is that the growth in Wikidata is very unequal. Huge amounts of paintings and scientific articles, quite a few underdeveloped areas.
[17:39:03] there's good work on the crosswiki watchlist, so that's the likely option imo
[17:39:19] and in the future we want to add the option to cryptographically sign statements to give them some more credibility/trustworthiness
[17:39:35] i like that
[17:39:39] and focusing on increasing the use of the data before focusing more on importing more data
[17:39:54] For GLAM projects, signed statements would be a very valuable addition
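Signed statements don't exist in Wikibase yet, so anything concrete here is speculation; the sketch below only illustrates the general mechanism being discussed: a detached signature over a canonical serialization of a statement, which a GLAM institution could publish alongside its data. All field names are invented, and nothing here is a real Wikibase interface.

```python
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Invented statement structure, purely for illustration.
statement = {
    "subject": "Q42",
    "property": "P496",
    "value": "0000-0000-0000-0000",          # made-up ORCID for the example
    "reference": "https://example.org/source",
}

# Canonicalize first: a signature is only meaningful over a byte-stable form.
canonical = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode()

key = Ed25519PrivateKey.generate()            # the institution's signing key
signature = key.sign(canonical)               # detached 64-byte signature

# Anyone holding the public key can verify the statement is untouched;
# verify() raises InvalidSignature if the statement was tampered with.
key.public_key().verify(signature, canonical)
print("statement verified")
```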
[17:40:10] Are there other ideas/wishes/... you have for the dev team? Something we should stop doing? Something we all can do?
[17:40:19] +1 for enthusiasm for signed statements
[17:40:44] spinster: good point
[17:40:47] Could you please help me to understand what it is about Wikibase that is not yet scaling well?
[17:40:56] +1
[17:41:03] Dysklyver: do you know more? link to ticket?
[17:41:09] More insight into external re-use of Wikidata would be awesome. But I totally understand that that is very difficult to track and measure.
[17:41:36] YULdigitalpreser: change propagation for example. that is the mechanism that tells the Wikipedias and other projects about changes happening on Wikidata
[17:41:58] for boosting third-party users of Wikibase: make the federation system built for Commons also work in third-party installations (just like InstantCommons) in order to allow them to use Wikidata items and properties
[17:42:05] re 'external re-use of Wikidata': I always ask partner orgs (Quora, Songkick) to blog about what they're doing, and then I include details on a Wikidata page
[17:42:17] Tpt[m]: oh yes that would be sooo good
[17:42:20] spinster: yeah very hard. I am trying to talk to organisations and companies using our data to give us feedback on the data. let's see if that helps any
[17:42:34] Tpt[m]: *nod*
[17:42:37] Oh right. :O
[17:43:03] for improving Wikidata use in Wikimedia wikis: allow at least simple queries (i.e. conjunctions of property-values) from Lua code
[17:43:10] pigsonthewing: \o/ that definitely helps raise the visibility of the use of our data
[17:43:33] Tpt[m]: do you think that would already be enough?
[17:43:37] Are we on question time already or is there still stuff to come?
[17:43:42] it seems too limited to me but maybe not
[17:43:54] sjoerddebruin: discussion but ask away
[17:44:34] Is the current work on the property suggester a start for actually improving it, aka providing better suggestions? :)
[17:45:30] Lydia_WMDE: yes, because then on the Lua side you could do unions, aggregations and specific filters. And with conjunctive queries, the problems of knowing which page to refresh and which result sets to update, and how, are tractable.
[17:45:45] sjoerddebruin: potentially but not sure yet
[17:46:05] Tpt[m]: ok - sounds expensive :D
[17:46:07] is it possible to call a specific revision via Lua?
[17:46:37] Dysklyver: what would you like it for?
[17:46:43] Dysklyver: not yet to my knowledge
[17:46:59] it would allow a citation to be called and remain unchanged
[17:47:03] Lydia_WMDE: you could know which queries to update and how with only the content of the current item.
[17:47:22] Dysklyver: *nod* it at least to some degree defeats the purpose of using Wikidata though :/
[17:47:29] true that
[17:47:40] Tpt[m]: right. i was more thinking about rendering on the page
[17:47:48] It'd be useful for old revisions.
[17:48:19] Being able to have previous revisions of a page also store references to the of-the-time Wikidata version.
[17:48:31] kinda why i was thinking it could be good
[17:48:37] *nod*
[17:48:46] i'll think more about it
[17:48:59] It's been an issue for images too, though, so it might be worth looking into how folks weighed it there as well.
[17:49:05] Since it was never implemented there.
[17:49:12] makes sense yeah
[17:49:23] otherwise it is very hard to know what a reference that comes from Wikidata was on a specific revision on Wikipedia
[17:49:46] Lydia_WMDE: Yes, if you know which queries to update and have a table giving you which page depends on which query, it works. The hard part here is to know which query to update
[17:50:01] you would have to go to Wikidata, scroll back and match the date, then see what has changed since
[17:50:03] This probably has more basis than the images for that alone.
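Outside Lua there is already a close equivalent: Special:EntityData can serve a specific revision of an entity, which is roughly what "pinning" a citation would look like from a client. A small Python sketch (the revision id is made up for the example):

```python
import requests

# Special:EntityData serves entity JSON and accepts a revision parameter,
# so a client can keep pointing at the exact version it cited.
url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
data = requests.get(url, params={"revision": 123456789}).json()  # illustrative id

entity = data["entities"]["Q42"]
print(entity["labels"]["en"]["value"], "- pinned at revision", entity.get("lastrevid"))
```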
[17:50:36] Ok other ideas/wishes? Or done with the growth topic?
[17:51:01] Tpt[m]: *nod* will add it to the list generation discussion
[17:51:21] Lydia_WMDE: ok!
[17:52:05] Any other questions you'd like to ask?
[17:52:14] At the current rate of growth, are there enough resources for the Wikibase handholding for the rest of 2018? 2019?
[17:52:37] YULdigitalpreser: with the work we are doing i hope yes
[17:52:41] Wikidata isn't growing as fast as last summer tbh
[17:52:43] without it no
[17:52:45] I'd like to ask for input at https://www.wikidata.org/wiki/Wikidata:Project_chat#Documenting_and_describing_PIDs please
[17:52:50] We're down to 10 million edits per month instead of 20 million.
[17:53:09] sjoerddebruin: that makes me sleep better at night :D
[17:53:11] thank you, this information is very helpful
[17:53:20] ...which sadly got side-tracked.
[17:53:26] so the number of edits is the critical metric? it is something we could easily bring down if automated tools compacted their edits
[17:53:42] pigsonthewing: wanna give a short overview of what you want there?
[17:53:48] If that's off-putting, feel free to use my talk page, or email me.
[17:53:55] pintoch: biggest problem last year was editing speed
[17:54:12] pintoch: that is one of the big pain points yes - particularly the edits that affect another wiki
[17:54:12] Sure: "how to document PIDs. As an example, I've created KoreaMed Unique Identifier (Q47489994). How could we improve that? For instance, we have no property to hold an example value."
[17:54:17] and speed
[17:54:25] People were editing at 400 edits per minute.
[17:54:51] It's still happening though: http://wikidata.wikiscan.org/hours/24/users
[17:55:01] ^ very useful page!
[17:55:17] Users who suddenly find semi-automatic tools, and we don't really have rules for those.
[17:55:26] This need arose from discussions with PID-authoring and -using orgs at #PIDapalooza
[17:55:29] (we have the bot policy, but you're not really a bot)
[17:55:41] sjoerddebruin: should the bot policy be expanded then?
[17:56:30] Some semi-automatic policy would be nice indeed, but not sure how to notify people about it.
[17:56:46] could be linked from the tools?
[17:56:51] we know the biggest ones
[17:56:58] Yeah, some clear approval window.
[17:57:17] yes, and the tools should just behave in a better way in the first place (e.g. not using two edits to insert a referenced statement, like QS and pywikibot do)
[17:57:24] Thank you, All!
[17:57:32] And maybe some echo notification if you edit for quite some time above x edits per minute.
[17:57:50] worth looking into yeah
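On pintoch's point about two edits per referenced statement: `wbeditentity` accepts a whole JSON blob, so a claim and its reference can land in a single edit. A hedged Python sketch (login and token fetching are omitted; the exact claim JSON layout follows the Wikibase data model as I understand it, and the edit targets the sandbox item):

```python
import json
import requests

API = "https://www.wikidata.org/w/api.php"
session = requests.Session()   # assumed to be logged in already

# One claim ("instance of: human") carrying its reference (a reference URL),
# so statement and reference arrive in a single edit instead of two.
claim = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": 5},
        },
    },
    "type": "statement",
    "rank": "normal",
    "references": [{
        "snaks": {
            "P854": [{   # P854 = reference URL
                "snaktype": "value",
                "property": "P854",
                "datavalue": {"type": "string", "value": "https://example.org/source"},
            }],
        },
    }],
}

csrf_token = "..."   # fetch via action=query&meta=tokens in a real script

resp = session.post(API, data={
    "action": "wbeditentity",
    "id": "Q4115189",   # the Wikidata sandbox item
    "data": json.dumps({"claims": [claim]}),
    "token": csrf_token,
    "summary": "add a referenced statement in a single edit",
    "format": "json",
})
print(resp.json())
```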
[17:58:25] Tools behave in a good way actually, but people run four instances at once.
[17:58:46] -.-
[17:59:14] Only four?
[17:59:22] Or more, just saying something :)
[18:00:48] Alright people. We're at the end of our hour. Is there anything else you want to talk about or should we wrap it up?
[18:01:11] Can tool authors limit the number of instances to one per account?
[18:01:16] Thank you, all.
[18:01:29] Bye.
[18:01:40] pigsonthewing: not sure but worth investigating
[18:02:45] Thank you all for attending the office hour :)
[18:02:57] See you soon online!
[18:03:08] Thanks for the discussion and input folks :)
[18:03:21] Thank you for the input and moderation! :-)
[18:03:30] #endmeeting
[18:03:30] Meeting ended Tue Jan 30 18:03:30 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[18:03:30] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.html
[18:03:30] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.txt
[18:03:30] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.wiki
[18:03:31] Meeting ended Tue Jan 30 18:03:30 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[18:03:31] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.html
[18:03:31] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.txt
[18:03:31] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.wiki
[18:03:31] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.log.html
[18:03:31] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-01-30-17.00.log.html
[18:04:08] Double bots?
[18:04:25] meetbot's evil twin :O
[18:07:58] lol
[22:32:26] hmm, does anyone know what timezone is used for the grants deadline tomorrow?
[22:32:33] (or today, if UTC :))
[22:32:50] Maybe I'm looking in the wrong spot, but I've been unable to find it