[17:56:46] Hello everyone! Who is here for the Structured Data discussion?
[17:57:38] hi fabriceflorin :)
[17:57:38] Hey fabriceflorin
[17:57:44] hi
[17:57:44] I'm here for structured data
[17:57:47] w00t w00t
[17:57:48] Hello, fabriceflorin
[17:57:52] * aude waves
[17:57:53] I'm not
[17:58:05] ^fact
[17:58:10] :)
[17:58:10] Hi
[17:58:11] I guess I'm here for that as well.
[17:58:16] * lv|mtg waves but will be distracted
[17:58:17] _o/
[17:58:19] +1
[17:58:26] +1
[17:58:31] Salut Pyb :)
[17:58:41] * Lydia_WMDE waves
[17:58:42] I'm only here to troll Keegan
[17:58:46] Like many of you
[17:58:50] a noble endeavour
[17:58:51] ^fact
[17:58:55] +1
[17:59:01] :D
[17:59:19] it embiggens the spirit and the soul
[17:59:39] guillom: salut
[18:00:00] Okay, let's talk about structured data
[18:00:05] #startmeeting
[18:00:05] Keegan: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
[18:00:22] Hello everyone, welcome to our discussion about Structured Data on Commons!
[18:00:23] #startmeeting Structured Data
[18:00:23] Meeting started Wed Sep 3 18:00:23 2014 UTC and is due to finish in 60 minutes. The chair is Keegan. Information about MeetBot at http://wiki.debian.org/MeetBot.
[18:00:23] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[18:00:23] The meeting name has been set to 'structured_data'
[18:00:40] Welcome!
[18:00:47] Welcome to our discussion about Structured Data everyone!
[18:00:53] hi all!
[18:00:56] Fabrice, what do you have for us?
[18:00:57] hi :)
[18:01:03] We'll be talking about https://commons.wikimedia.org/wiki/Commons:Structured_data so open that!
[18:01:33] The Structured Data initiative proposes to store and retrieve information for media files in machine-readable data on Wikimedia Commons, using Wikidata tools and practices, as described on this project page:
[18:01:40] https://commons.wikimedia.org/wiki/Commons:Structured_data
[18:01:54] The Multimedia team and the Wikidata team are starting to plan this project together, in collaboration with many community volunteers active on Wikimedia Commons and other wikis.
[18:01:56] hi !
[18:02:05] hi thedjNotWMF
[18:02:13] These include the illustrious multichill and thedjNotWMF ...
[18:02:29] He's NOT WMF. We swear.
[18:02:35] As well as Lydia_WMDE, tgr, mark and many more …
[18:02:41] and Jheald!
[18:02:53] marktraceur: Well, he does know the sikrit handshake.
[18:03:04] The purpose of this project is to make it easier for users to read and write file information, and to enable developers to build better tools to view, search, edit, curate and use media files. To that end, we propose to investigate this opportunity together through community discussions and small experiments. If these initial tests are successful, we would develop new tools and practices for structured data, then work with our communities
[18:03:05] to gradually migrate unstructured data into a machine-readable format over time.
[18:04:04] We are really happy to have such a great group of folks, and look forward to developing this initiative as a collaboration between community, WMF and WMDE.
[18:04:25] So let’s open the floor for comments, questions, suggestions from everyone. Who wants to be first?
[18:04:26] So now...
[18:04:39] Are commons users able to edit the filedescription on commons directly?
[18:04:57] o/
[18:05:12] Steinsplitter: Yes, commons users will be able to edit filedescription on Commons.
[18:05:32] * DanielK_WMDE wibbles
[18:05:47] will it still be possible to have templates like https://commons.wikimedia.org/wiki/User:Darkweasel94/copyright2 ?
[18:06:06] and are we able to protect filedescriptions on commons?
[18:06:10] although i would say that that might not necessarily be straight from the File description page, but possibly from an alternate namespace most likely.
[18:06:22] Here is a nice overview of some first ideas we discussed at Wikimania - our Structured Data Slides:
[18:06:23] https://commons.wikimedia.org/wiki/File:Structured_Data_-_Slides.pdf
[18:06:26] am I correct in that assessment
[18:06:53] new namespace on commons? This sounds interesting :)
[18:06:55] darkweasel: Yes, that is still possible
[18:06:57] Steinsplitter: yes, the pages containing the filedescriptions on Commons can be protected on commons, just as today
[18:07:26] hey dennyvrandecic!
[18:07:28] great Fabrice
[18:07:29] can it be edited directly on commons?
[18:07:31] Steinsplitter: darkweasel: think of this as the Commons version of Wikidata-powered infobox templates
[18:07:42] Hey dennyvrandecic, so nice to see you here :)
[18:07:53] you can just put {{infobox}} in the wikitext and all the information appears magically
[18:07:58] matanya: yes. actually, only on commons.
[18:07:58] fabriceflorin: thanks :) so nice to see this happen :)
[18:08:08] but there is no fundamental change in how the article or wikitext behaves
[18:08:20] But denny if the descriptions are being partly pulled from Wikidata items, surely those WD items can be modified, even if the Commons File page is protected ?
[18:08:20] Will protecting the page protect the underlying data from being changed?
[18:08:32] hi dschwen :)
[18:08:32] See some of the magic at https://www.wikidata.org/wiki/Talk:Q17616737
[18:08:38] Hi Steinsplitter !
[18:08:44] darkweasel: less abstract and more concrete: the metadata information is a set of properties and values. If we choose to represent licensing information values by custom templates, you will have to add a Darkweasel94/copyright2 to the relevant property value, and it will be printed.
[18:09:03] commons admins should be able to fully administrate all content without asking wd folks
[18:09:04] ah, ok, i understand
[18:09:11] this is also important for our NPOV policy.
[18:09:12] Jheald: the point is not to use Wikidata to store metadata items, but to install the Wikibase extension on Commons
[18:09:26] exactly
[18:09:49] Do you plan to start a RFC on commons about the new extension?
[18:09:49] As I understand it this whole endeavor is a multi-year thing. Maybe a slightly provocative statement: I'm not even sure the concepts of a "file description page" or even a "commons user" would have to stay the same, once we have structured data.
[18:10:00] will this bring us localized category names? in a way that's transparent to readers and editors?
[18:10:02] Here is a handy FAQ for Structured Data, which can help clarify some of your questions: https://commons.wikimedia.org/wiki/Commons:Structured_data#FAQ
[18:10:16] Dereckson: the translation of the name of a painter in each language will not be stored on CommonsData, it will be pulled from a Q-item on Wikidata
[18:10:33] Do you start a RFC about wikibase. Asking the community?
[18:10:36] Steinsplitter: would you say that is also true for wikipedia admins, wrt wikipedia content?
[18:10:47] But this is something that does need more clarity: what will be stored where
[18:10:51] One sec Steinsplitter
[18:10:54] Steinsplitter: Of course, we will get community consensus before deploying something that huge. For now, we recommend that you all participate in this discussion page: https://commons.wikimedia.org/wiki/Commons_talk:Structured_data
[18:10:56] fabriceflorin is typing
[18:11:20] Jheald: I don't see the issue: if we consider Wikidata not pertinent for one of these painters, we can remove the item reference and use our own data instead
[18:11:21] i am referring to [[COM:RFC]] with a closure by a local crat.
[18:11:32] Let’s first discuss this, then we can make decisions.
This is going to be a long project, because it has many complexities.
[18:12:05] a project without asking the community via RFC (regular process) ?
[18:12:28] Steinsplitter: why is it important for part of the community to be able to not ask other parts of the community what affects the entire community?
[18:12:29] dschwen: You are absolutely right that this needs to be viewed as a multiyear project. Right now, we are proposing that we start with an experimentation period.
[18:12:30] Steinsplitter: First discussion, then community consensus to do it, then deploy
[18:12:49] RFC , not discussion.
[18:12:59] darkweasel: yes it will give us that! \o/
[18:13:00] Yeah, not much point in deciding whether to do something when we don't know what it will look like
[18:13:01] The RFC would be the consensus part
[18:13:07] yes
[18:13:24] no rfc, no deployment ;)
[18:13:25] darkweasel: well or at least something like it. we're unsure if categories will be done in that way in the future
[18:13:44] but that's a big topic that needs a lot of thinking
[18:13:44] but right now I think we are in a phase where we should discuss _what_ is going to be happening
[18:13:49] Let’s also keep in mind that this project involves many different communities besides Commons, so we hope to engage them in this conversation as well in coming weeks.
[18:13:52] and input from the community
[18:14:10] Steinsplitter: It wouldn't make sense to do an RFC at the beginning (when no code has been written and the behavior of the software still needs to evolve based on user feedback) and it wouldn't make sense to have an RFC at the end (when it's too late to change the code). Instead, I understand that the plan is to communicate widely, through talk pages, watchlist notices, maybe even site notices, so that the community constantly evaluates the
[18:14:10] requirements and the prototypes, and builds the tool *with* the developers.
[18:14:12] * Keegan agrees with dschwen
[18:14:40] this looks like a done deal to be honest. And it is difficult for non-English speakers and non-wikidatians to understand what's happening exactly.
[18:14:49] Which edits will be on wikidata exactly?
[18:14:49] Basically, it would be a 2-year-long constant RFC.
[18:14:56] Steinsplitter, nothing would be on wikidata
[18:14:59] as i understand it
[18:15:09] things would be in a separate namespace on commons
[18:15:12] that looks like wikidata
[18:15:13] Yeah, all image metadata would be on Commons, in the Wikibase extension
[18:15:21] Seems to me it is actually two projects: one is using data from Wikidata in templates on Commons. The other is what to store on CommonsData. The two are fairly independent.
[18:15:24] oky :)
[18:15:28] You could *link* to Wikidata.
[18:15:39] ah okay, this sounds ok for me.
[18:15:41] darkweasel: at most, info that is currently in a Creator or Artwork or Institution template, by way of a link.
[18:15:44] Wikibase is the software, Wikidata is the site. Wikibase will also be on Commons
[18:15:44] doublespeak for no RFC, feedback gathered among people giving feedback, unconditional deployment at the end
[18:15:48] Does it matter on which project the edits are?
[18:16:00] yes
[18:16:02] BTW, here is the preliminary roadmap we are considering for our next incremental steps in this initiative - to be adjusted based on community responses: https://commons.wikimedia.org/wiki/Commons:Structured_data#Roadmap
[18:16:09] Why?
[18:16:22] because i can't investigate if there is vandalism or trolling.
[18:16:27] i can only do so on commons.
[18:16:43] Steinsplitter: why?
[18:16:45] But it's a non-issue, because Commons is where the data will be.
[18:16:51] it is annoying to ask other people
[18:16:57] But there are admins on Wikidata to take care of vandalism on Wikidata
[18:17:04] Yes, as marktraceur says it's a non-issue
[18:17:09] marktraceur, thanks.
this should be added to the faq :)
[18:17:15] Gosh; working collaboratively with other people. How unwikimedian :P
[18:17:17] commons file pages are already linking/referring to wikidata items through templates at the moment
[18:17:20] Steinsplitter: It sounds like we're already planning on that
[18:17:26] Jan_Ainali: i don't like to ask admins there.
[18:17:33] Well...
[18:17:40] Are there other questions?
[18:17:45] we are working on better tools to track wikidata changes on other projects btw
[18:17:49] Can I have a pony?
[18:17:55] guillom: no.
[18:17:55] Back to the topic: Discuss structured data!
[18:17:58] yes, how much does this project cost the WMF/WMDE :)
[18:18:00] Steinsplitter: look, if you see you spend a lot of time to fix stuff on Wikidata, you'll become a Wikidata contributor, and be able to get sysop rights there too.
[18:18:01] But a structured pony!
[18:18:04] Yeah: can we get decent documentation for WikiBase/WikiData? :-)
[18:18:05] marktraceur: No. The label for a Q-item will be on Wikidata, not Commons. Yes, CommonsData will point to a Q-number; but what that Q-number says will be on Wikidata
[18:18:16] guillom: maybe.
[18:18:35] Steinsplitter: the topic is structured data, please stick to the topic :)
[18:18:37] dschwen: we had an intern work on that this summer
[18:18:43] nice
[18:18:46] Dereckson: no.
[18:18:50] dschwen: so i hope the user documentation improved considerably
[18:18:59] it is probably not perfect yet but much better
[18:19:04] Keegan: i stick on the topic. please read my questions again. thanks.
[18:19:29] more help on improving documentation is always welcome
[18:19:30] i asked on AN and there was ZERO work with the community. only fyi. :)
[18:19:30] Fabrice: Do you have an agenda for this mtg that you want to move through ?
[18:19:44] call me naive, but I would assume each project fights equally against vandalism, trolling, etc.
wikidata admins would take it as seriously as commons'
[18:19:56] Jheald: Here is one way to visualize where data might be stored across our sites: https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Slides.pdf&page=17
[18:19:59] How much of a dream is it to hope that this project will solve category intersection and make Commons finally searchable? :)
[18:19:59] This is community work right here, so let's work :)
[18:20:05] and as usual, we'll have a set of people with sysop rights on both projects
[18:20:40] commons is commons and not wd.
[18:20:48] if you're going to change the whole category system – i hope you're considering that not everything with a commons category has a wikipedia article (and thus a wikidata item)
[18:20:58] commons will stay commons, Steinsplitter.
[18:21:03] Steinsplitter: Please move on to the next question. We want to leave some room for other people to ask questions
[18:21:18] guillom: it's not much of a dream, i think. with "topics" from wikidata describing media on commons, you can search in any language, and combine topics.
[18:21:26] multichill: they can ask. i can't say that this channel is flooded by questions.
[18:21:48] darkweasel: we will not get rid of categories. we will add better tools next to them. also on wikidata not everything needs a wikipedia article. much more is allowed there
[18:21:51] guillom: including "sub-topics" in the search is still not that easy, but it's a problem we are going to solve (for categories too, btw)
[18:21:55] DanielK_WMDE: So in the end I'll be able to search for Pictures of the Mona Lisa taken in October 2009 with a Nikon D90?
[18:22:19] guillom: if the picture is annotated correctly, then yes.
[18:22:27] is it possible to edit cats in the filedescription?
[18:22:29] I have read descriptions of how properties will be divided between Wikidata and Commons for items such as Mona Lisa. How will they?
[18:22:31] * guillom hugs DanielK_WMDE.
[18:22:40] *directly
[18:22:40] \o/
[18:22:47] will things like "october 2009", "nikon d90" be pulled automatically from the EXIF in general?
[18:22:55] Steinsplitter: exactly like now. no change.
[18:22:56] (even without uploadwizard)
[18:23:07] darkweasel: yes but at the moment you can't search using those; or the resolution
[18:23:12] * DanielK_WMDE hugs guillom back
[18:23:13] DanielK_WMDE: great! so we don't need to rewrite bots.
[18:23:20] i'm interested to know how the pages actually work. for example, if i edit https://www.wikidata.org/wiki/Talk:Q17616737 the wikitext is no longer human readable. will a file page have both a wikidata subpage and a main wikitext page?
[18:23:37] darkweasel: the current state of the project is to have our own set of wikidata, how to handle EXIF tags is a separate issue
[18:23:38] * JeanFred just caught back
[18:23:47] Welcome to the jungle, JeanFred
[18:23:49] Hi JeanFred
[18:23:53] * guillom hugs JeanFred too, for good measure.
[18:23:56] and can i store metadata that may not be in wikidata yet and i would not want displayed on the file page?
[18:23:57] darkweasel: possibly. not sure yet. it will definitely be possible to override what was taken from the exif
[18:23:59] darkweasel: you're suggesting a feature request: import from EXIF tags metadata not yet defined
[18:24:00] Jheald: We don’t have a specific agenda for this Q&A, as we would rather hear what questions the community has (and IRC chats are notoriously hard to keep focused on an agenda).
That said, we would invite folks to consider the questions posed on this discussion page (and perhaps add new ones, as well as chime in after this chat): https://commons.wikimedia.org/wiki/Commons_talk:Structured_data
[18:24:04] hi JeanFred :)
[18:24:24] DanielK_WMDE, that sounds good :)
[18:24:33] susannaanas: at the moment if you look at a file as an example https://commons.wikimedia.org/wiki/File:Mona_Lisa.jpg you can see that already some information is linked to metadata (artist, current location)
[18:24:36] It would be nice to be able to edit topics and other CommonsData items through some of the wikitext of the File page
[18:24:50] if there are 2 pages, a wikidata page and wikitext page, how is the file page display determined?
[18:24:51] edit topics?
[18:24:55] darkweasel: if you're interested in this idea, please note it on a roadmap and at the relevant time open a bug on Bugzilla to describe the feature request
[18:25:03] dan-nl: This is an example of us going overboard with Lua. Our actual templates on Commons should be more readable
[18:25:26] dan-nl: by templates on the wikitext page. just like infoboxes on wikipedia pull data in from wikidata, the file description page will pull in data from the media info page
[18:25:30] susannaanas: the general idea, and this open discussion is a way to check that the separation makes sense to everyone, is that wikidata will still take care of storing general information about known artworks, artists, institutions, etc. and the commons wikibase information will be mainly about the photograph itself, not the artwork
[18:25:36] Hey JeanFred, glad you could join us :)
[18:25:43] susannaanas: does my quick description make sense?
[18:25:46] So maybe we'll modify {{Artwork}} to pull the creator (if it's not set in the template) from the wikibase info
[18:25:47] Jheald: there are widgets being developed by the community for doing this on wikipedia atm. those are awesome experiments.
i'd like to see them develop further and then see what we can integrate directly into the software. possibly also for commons
[18:26:03] Question: I wrote (probably rather too much) on the talk page at Commons:Structured Data. Is this the kind of thing you were looking for? And will you have any comments coming back?
[18:26:20] It does, I just want to confirm, as I hear "All data will be stored in Commons" comments
[18:26:59] fabriceflorin: Is structured data for media not on Commons being considered too?
[18:27:14] thanks …. so the template might change so that it could accept a wikidata value or regular text value?
[18:27:37] and then if there are both values it would prefer the wikidata value?
[18:27:38] gi11es: Where will these divisions be discussed for items like maps?
[18:27:39] anomie: yes we are considering this but it'll not happen at the beginning
[18:27:42] we will start with commons
[18:27:43] Lydia: Widgets being able to edit through templates on the file page would be very nice. So also would be presenting the metadata as 'fake' wikitext and picking up attempts to change that wikitext, eg by bots
[18:27:45] but keep others in mind
[18:28:10] anomie: We have to start somewhere, and beginning with Commons seems like the natural first step.
[18:28:14] dan-nl: Whatever we (the community) decide to give more priority.
[18:28:14] dan-nl: actually, if the value is on commons, it would prefer the information from commons most likely.
[18:28:39] susannaanas: do you have expertise regarding maps? it would be good to have someone on board with a focus on that type of media, if you feel like getting more involved
[18:29:06] gi11es: you just asked that to the driving force behind wikimaps :)
[18:29:11] fabriceflorin, Lydia_WMDE: As long as it's on the roadmap, ideally in the "just add additional wikis to the config to enable it" sense
[18:29:12] hah
[18:29:23] so, {{Information | author = Q17616737 | Frans Hals }} would prefer Frans Hals?
[18:29:26] (DanielK_WMDE, briefly off-topic since you're here: do you have a script that can merge interwikis from the com: namespace into WD and remove them locally?)
[18:29:35] "All data stored on commons" simply isn't true, as https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Slides.pdf&page=17 shows -- instead what is stored on Commons will often be links to items stored on Wikidata
[18:29:40] for example
[18:29:42] gi11es: We do the Wikimaps project, it's not only me, we are a community
[18:29:48] i'm definitely looking forward to structured data, i hope it will be as nice in practice as it sounds now :)
[18:29:50] Jheald: noted. it does seem like a pretty niche feature to me atm that'd be a huge effort to implement. but we can see if we can do some things that make this easier
[18:30:01] * Jan_Ainali ponders OT when local uploads will be stopped and Commons be the only media database. Well, one can dream...
[18:30:25] Jan_Ainali: When we abolish copyright, I guess. So maybe 3 months?
[18:30:26] :)
[18:30:35] Steinsplitter: generally we don't do import scripts, we leave that to the community. perhaps ask multichill or amir or other folks who deal with bots a lot.
[18:30:49] susannaanas: then I would advise keeping track of this project closely, so that you can comment on the data storage division as it evolves from the drafts we have now to something more solidified
[18:30:50] darkweasel: at the beginning it will definitely not be as nice. but if we all pull together it'll be awesome in the future :)
[18:31:10] dan-nl: assuming bogus syntax, then likely...
[18:31:22] susannaanas: we do our best to keep in mind the very different use cases, but it's much better to have an expert look as a sanity check
[18:31:23] gi11es: More than keen to, dependent on it
[18:31:43] Steinsplitter: More scripts would be written like https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts#Wikidata to assist in converting things on Commons
[18:31:58] actually...
on that topic.
[18:32:24] i know that dbpedia has some bots for that as well.
[18:32:31] ok
[18:32:38] One of the issues we have been discussing is what would be a good data structure for this project. For example, one suggestion is that every file could contain one or more works, with one or more contributors, and one or more licenses. What do you think of this approach, as roughly diagrammed here? https://commons.wikimedia.org/w/index.php?title=File:Structured_Data_-_Slides.pdf&page=15
[18:32:54] Jheald: things like "topics" will refer to wikidata items. information about the topics (e.g. the population and mayor of New York) will be managed on Wikidata. But even if several topics from Wikidata are linked, the important information is which topics are linked to which file, and that is stored on commons.
[18:33:35] Are there any thoughts on doing annotations in a structured way?
[18:33:44] hmmm that sounds interesting, so you're intending to make derivative works less awkward?
[18:34:11] darkweasel: yes!
[18:34:22] fabriceflorin: i have been concerned about commons' current inability to distinguish in metadata between a file and the work depicted in that file, so it's encouraging to see it considered already
[18:34:24] darkweasel: Yes, the idea is to have a more general purpose way to recognize that multiple works and multiple contributors can be involved in the same file.
[18:34:45] files often contain more than one instance of IP and more than one author, and it's currently a bit of a mess
[18:34:46] DanielK_WMDE: Up to a point. But it also matters how that information is presented to ppl reading the file descriptions. Which is something that I think worries Steinsplitter & vandal fighters.
[18:34:47] https://commons.wikimedia.org/wiki/File:The_Nightwatch_by_Rembrandt_-_Rijksmuseum.jpg <- example image with annotations
[18:35:19] Jheald: yes and it worries us too.
and it is definitely on my agenda for 2015 to build better tools to fight vandalism
[18:35:22] that sounds good, but i hope it won't be too complicated for people who want to upload just one own photo or edit its properties :)
[18:35:34] Jheald: which is actually easier in structured data than in unstructured text for example
[18:35:41] Jan_Ainali: Not a focus at the moment, but the community might build something themselves
[18:35:49] multichill: Thanks for the example! :)
[18:35:49] Jheald: you would see the edit on your watchlist on commons. so at least as many eyes will be there to watch as there are now.
[18:36:22] aside from that, i think "license" should be a property of "work" not "file" – if a file is multi-licensed, later contributors might in principle choose to release their changes under only one license
[18:36:24] plus the people watching it on other projects like wikipedia!
[18:36:28] darkweasel: Good point. One idea that has been proposed is to not burden the uploader with identifying every work and contributor during the upload process, but to invite them after upload to improve the metadata on their page.
[18:36:41] that's perhaps an important point to mention again: if data from wikidata is used on a page on commons, and you watch that page, you will see any edits to the data item, as if they had happened locally to the page
[18:36:43] multichill: I imagine it could be a property with qualifiers to do it.
[18:37:45] Lydia & Daniel: good points. What matters I guess is making sure the tools used by Commons vandal fighters can keep up
[18:37:47] it's a feature that needs improvement (still broken in the advanced watchlist), but it's there.
[18:37:56] It seems the important questions are answered, so we can sleep all well and chill. :) oh joy.
[18:38:22] darkweasel: that's the way it's planned, yes
[18:38:27] Jan_Ainali: it's probably not too hard, but it's not very high on the priority list probably, since if the core is there, then this is 'implementable'. and though it might not be easy then, it's also not out of the realm of possibilities.
[18:38:55] Jan_Ainali: and until that time it can be on wikitext still
[18:38:55] Steinsplitter: Yes, I think we have a lot of great contributors focusing on this project, and we plan to move very carefully at each step of the way, in consultation with community members like you :)
[18:39:09] Oh DanielK_WMDE that brings up another interesting question: If I watch a page on Wikipedia where an image is used, will I see in my watchlist edits on Wikidata that affect the image?
[18:39:26] fabriceflorin: :)
[18:39:39] missed a comma between watchlist and edits
[18:40:04] Jan_Ainali: no, because the integration of commons with wikipedias isn't that sophisticated :)
[18:40:26] btw, an idea – you could also try to change the concept of filenames – right now they can be only in one language, and the unique identifier needs to be the same thing as the
[18:40:29] <Jan_Ainali> DanielK_WMDE: I might submit a feature request ;)
[18:40:33] <DanielK_WMDE> but such edits shouldn't have an impact on what you see on wikipedia anyway
[18:40:48] <marktraceur> darkweasel: There's a patch for the backend in core that will use a SHA1 for the unique ID on the backend
[18:40:52] <darkweasel> it might be a good idea to have a separate "title" attribute that's presented to the user in most places (except when they want to insert the file somewhere)
[18:40:52] <marktraceur> But that won't change the display
[18:41:30] <DanielK_WMDE> darkweasel: moving away from filenames for *referring* to an image is going to be hard.
it's something that would be nice, but it's not something i want to tie to the structured data project
[18:41:49] <DanielK_WMDE> darkweasel: however, images will have localizable labels/titles, that can be used in listings, for captions, etc
[18:41:54] <Jheald> Q: There are already quite sophisticated models on Wikidata for eg scans from books, distinguishing a "manifestation" (the scan) from an "edition" (a particular variant) from a "work" (the underlying book). Are these models on your radar?
[18:42:14] <fabriceflorin> darkweasel: Glad you are thinking along the lines of a translatable “title”, which is high on the priority list. The actual file name will always be preserved, but we could surface the title more often.
[18:43:00] <marktraceur> fabriceflorin: I don't know about "always", it could be changed maybe in the future, but it's not currently on our roadmap I think
[18:43:01] <darkweasel> ah, that's good, so basically the answer is "yes" – i hope that will reduce the need to rename files that some people feel far too often now on commons :)
[18:43:58] <Lydia_WMDE> Jheald: it will be largely up to the community but there are some things we need to settle on to build the tools we'll be working on. we'll get into those detailed discussions over the next weeks and months
[18:44:06] <Dereckson> darkweasel: the rename on Commons is limited to precise cases, like misspelling or non informative name like IMG45504.jpg / I'm not sure how a good title could eliminate these rename needs.
[18:44:18] <Lydia_WMDE> with the community
[18:44:56] <fabriceflorin> marktraceur: Good point. From a product standpoint, it would be good if we started using clear titles more, and have the file names become less important. It would seem useful if over time the file names could be simply alphanumeric, without burdening them with having to include content titles as well.
[18:44:57] <darkweasel> Dereckson, well if no one ever sees the filename anymore except when inserting it into an article, then i can't see much of a reason why it shouldn't be IMG45504.jpg
[18:45:32] <DanielK_WMDE> darkweasel: though it's still nice to have "sensible" file names when reading wikitext
[18:45:36] <fabriceflorin> Are there some important questions we haven’t addressed yet?
[18:45:38] <Dereckson> darkweasel: to be able to read the text of the article using the media and get meaningful information, for example
[18:45:39] <dennyvrandecic> darkweasel: and the insertion into an article should be invisible thanks to VE
[18:45:44] <marktraceur> darkweasel: It will help when we can figure out a way to add images on-demand in an edit window instead of having to use the file name - see e.g. VE's image insert dialog
[18:45:44] <DanielK_WMDE> otoh, once we no longer need to read wikitext...
[18:46:01] <dennyvrandecic> exactly
[18:46:08] <darkweasel> yeah, that's more a matter of commons policy than of structured data
[18:46:17] <darkweasel> so not really your concern :)
[18:46:26] <DanielK_WMDE> it's our concern to make it possible :)
[18:46:44] <Jheald> Lydia_WMDE: Probably important to think what different dates you want to hold on CommonsData -- and any other things you may want to sort a selection by
[18:46:48] <moogsi> fabriceflorin: we haven't really talked about topics vs categories, but i think they are fuzzy enough that there may be no intelligible questions :)
[18:46:48] <Jan_Ainali> Perhaps I missed, will "topics" be Wikidata items or Wikibase Items on Commons?
[18:46:54] <Lydia_WMDE> Jheald: yeah totally
[18:46:58] <moogsi> what is a 'topic'?
[18:47:12] <DanielK_WMDE> moogsi: whatever has a Q-number on wikidata.
[18:47:25] <multichill> For example Berlin is a topic.
[18:47:28] <Lydia_WMDE> Jan_Ainali: the former
[18:47:33] <Jan_Ainali> Awesome!
[18:47:45] <Lydia_WMDE> because then we can reuse label translations and so on and so forth [18:47:54] <moogsi> DanielK_WMDE: that makes i18n much easier [18:47:58] <Lydia_WMDE> get additional data, links to wikipedia and so on [18:48:01] <DanielK_WMDE> moogsi: indeed [18:48:04] <fabriceflorin> moogsi: I think it is best to think of categories and topics as co-existing, rather than an either/or approach: both have their value, and can play a useful role in our multimedia ecosystem. [18:48:09] <dennyvrandecic> in my understanding, categories basically won't change. everything the community is doing now with categories, they will continue to be able to do. [18:48:24] <Lydia_WMDE> yeah [18:48:25] <Jheald> Will topics also have an (optional) property, to indicate how they relate the the file (eg painter, movement, depicted subject, etc) ? [18:48:30] <moogsi> i have no love for categories and have always found them disastrously unfit for purpose [18:48:42] <darkweasel> but i hope it will be possible to filter even categories by date, filesize, license etc. [18:48:52] <dennyvrandecic> darkweasel: just as now [18:49:13] <darkweasel> so no improvements planned there? because that's one thing i was looking forward to [18:49:13] <dennyvrandecic> darkweasel: this proposal is not about improving this part, in my understanding [18:49:19] <dschwen> if topics are generic wikidata items then the info Jheald wants is already there [18:49:29] <Lydia_WMDE> Jheald: possibly. or they should be using different properties maybe? [18:49:37] <multichill> And don't forget that categories and topics can be connected! [18:49:39] <dschwen> what we cannot do (i believe) is adding prepositions to topics [18:49:45] <fabriceflorin> I would encourage everyone to take a long-term perspective with this initiative: it will take some time for our ecosystem to evolve, and we will not be able to do everything we want all at once. 
But by starting with small experiments, we can measure and validate our assumptions, learn from any small mistakes, and gradually build a world-class multimedia system :) [18:50:08] <DanielK_WMDE> darkweasel: we have plans for querying/searching by those things. how these searches would integrated categories is an open question. but with the new cirrus search, that should not be a problem [18:50:14] * multichill looks forward to the day he can shutdown https://commons.wikimedia.org/wiki/User:CategorizationBot [18:50:15] <DanielK_WMDE> perhaps even including subcategories [18:50:25] <darkweasel> ok that sounds better [18:50:39] <Keegan> multichill: I'm looking forward to that day too. I hate that bot :) [18:50:40] <Jan_Ainali> This will enable a sort of a backwards lookup of {{objectlocation}} through coordinates on wikidata for images on commons without coordinates attached. That will make all map lovers happy :D [18:50:44] <Jheald> dshwen: it would need to be on CommonsData, because digging things out of Wikidata is hard, especially if you don't know a-priori whether or not they should be there [18:51:06] <thedjNotWMF> as I mentioned at Wikimania, let us not forget how long it took us to add an {{information}} template to every file. Now we have 100x more files, but it basically a similar problem. we can't do it in one go. too much work. [18:51:20] <Steinsplitter> *away* now, thanks again or replying to my questions. It is great to work with people liek you on wikimedi. [18:51:24] <thedjNotWMF> so a gradual approach is a given. there is no other way [18:51:24] <manybubbles> cirrus will grow deep category knowledge at some point. I'm not sure when. [18:51:26] <multichill> Jan_Ainali: First step would be to move the coordinates from the template to CommonsData and just have {{objectlocation}} show those [18:51:33] <fabriceflorin> What experiments and prototypes do you think we should work on first? 
Would it be useful to start with a simple field, like location or creation date, and try to get it to work with structured data? [18:51:34] <darkweasel> there are still files without information templates – just saying [18:51:45] <Jheald> If you can refine selections, it should be equally possible to refine the contents of categories, just treating the category as an initial selection [18:51:55] <Keegan> thedjNotWMF: information templates are still missing! I added some the other day to really old files [18:52:00] <Jan_Ainali> multichill: yes, that is a nice task for a bot! [18:52:29] <DanielK_WMDE> manybubbles: johannes is eager to make it happen :) [18:52:33] <multichill> Jan_Ainali: Take a look at https://www.wikidata.org/wiki/Wikidata:Coordinates_tracking , we're already doing it for Wikipedia [18:52:42] <darkweasel> fabriceflorin, yeah, that sounds good – location is probably a good start, date= is a too versatile parameter [18:52:53] <Jan_Ainali> multichill: Or I think you missunderstood me. I was talkning about files without any location at all [18:53:00] <multichill> Right [18:53:10] <fabriceflorin> We have 8 minutes left: any final topics you would like to discuss? [18:53:13] <Lydia_WMDE> alright. we have about 8 mins left [18:53:18] <Lydia_WMDE> any big remaining questions? [18:53:24] <Scott_WUaS> fabriceflorin: Where does Structured Data / Wikidata and possible future use cases stand in terms of planning - see https://meta.wikimedia.org/wiki/Wikidata/Notes/Future - vis-a-vis WUaS. (We're already seeing many interesting cases using Wikidata/Wikibase emerge such as http://wikiba.se/). In talking with you about this further, could I develop a weekly or biweekly office hour for projects interested in possible [18:53:25] <Scott_WUaS> future use cases? Does such an office hour already exist? May I possibly please email you about this? 
[18:53:28] <multichill> That should be possible Jan_Ainali, I'm already doing that for monuments [18:53:36] * JeanFred shouts out to dschwen FastCCI. [18:53:49] <Jan_Ainali> multichill: You are awesome! [18:54:15] <Scott_WUaS> and Lydia_WMDE [18:54:16] <thedjNotWMF> i'm also wondering what people would find a good starting point in terms of data... [18:54:16] <Jheald> Q that I asked above, re the Commons:Structured Data talk page. Is what I wrote there the kind of thing you are looking for ? Will you be commenting back ? [18:55:10] <susannaanas> I would like to invite interested to advise and discuss map metadata https://docs.google.com/spreadsheets/d/1Hn8VQ1rBgXj3avkUktjychEhluLQQJl5v6WRlI0LJho/edit#gid=0 and move the discussion where it belongs [18:55:14] <Lydia_WMDE> Scott_WUaS: totally. you're always welcome to email me :) [18:55:27] <Keegan> Jheald: I found it useful, I know everyone read it. I'm sorry there aren't any comments back yet - things are still spinning in the air, as you know :) [18:55:29] <thedjNotWMF> Jheald: yes, it is very valuable [18:55:49] <Jan_Ainali> susannaanas: Make it wiki page, that is where it belongs ;) [18:56:01] <Jheald> Oh and can we confirm that this will be on a Wikibase? I was confused by how that relates to DanielK's earlier proposal for a special File Info page, with its own ad-hoc syntax ? [18:56:05] <Keegan> Five minute warning! [18:56:06] <Lydia_WMDE> Jheald: yes that is very helpful [18:56:21] <Keegan> Of course the conversation can continue after that, but it'll be off hours [18:56:28] <susannaanas> Jan_Ainali: :) [18:56:33] <fabriceflorin> Jheald: Thanks for contributing on our discussion. We are reviewing your many comments and will respond in coming weeks. 
We also invite more people to add comments on that page: https://commons.wikimedia.org/wiki/Commons_talk:Structured_data [18:56:33] <thedjNotWMF> Jheald: if i have one thing to comment on that, that i would like to see a bit more of what you value in terms of priority and for which reasons perhaps. [18:56:48] <Jheald> Keegan: Besides, I imagine there are little whirlpools like the MediaViewer to attend to some of the time... [18:56:55] <marktraceur> "little" [18:56:58] <DanielK_WMDE> Jheald: ad hoc syntax? no. The wikibase software will be installed on commons. structured data about files will be in a special namespace. normal file description pages can access that data (probably mostly via Lua in templates) [18:57:08] * dschwen raises his arms 5 mins late. Thx JeanFred [18:57:13] <Keegan> ...>_> [18:57:16] <Keegan> Never heard of it. [18:57:53] <DanielK_WMDE> Jheald: i hope that clears up any confusion i might have created [18:57:55] <fabriceflorin> Hehe. Yes, MV has kept us busy. But we can’t wait to start on Structured Data, which many contributors have told us is the most important thing our team could be working on in coming years. :) [18:58:40] <Keegan> Well, thanks for coming all. Very useful, I believe [18:58:42] <Jheald> DanielK: sure, & the special namespace isn't editable wikitext, so for an end user like me, it shouldn't really matter how the data is physically arranged. [18:58:53] <DanielK_WMDE> indeed. [18:58:58] <DanielK_WMDE> same as on wikidata [18:58:58] <fabriceflorin> Thank you all so much for joining this chat. It’s such a pleasure to be working with smart, constructive collaborators like you. I look forward to more collaborations with you all in the future! [18:59:04] <Jan_Ainali> fabriceflorin: If I as a contribuotor can chip in, please don't make us wait years! [18:59:09] <Lydia_WMDE> yes indeed! 
thank you so much for coming [18:59:11] <Keegan> We'll have another one of these in the near future [18:59:28] <Keegan> #endmeeting Structured Data [18:59:29] <wm-labs-meetbot`> Meeting ended Wed Sep 3 18:59:28 2014 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [18:59:29] <wm-labs-meetbot`> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-18.00.html [18:59:29] <wm-labs-meetbot`> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-18.00.txt [18:59:29] <wm-labs-meetbot`> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-18.00.wiki [18:59:29] <wm-labs-meetbot`> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-18.00.log.html [18:59:33] <Scott_WUaS> Thanks, Lydia and Fabrice ... [18:59:44] <Jheald> Thanks everybody [18:59:46] <dschwen> yeah, thanks and till next time! [18:59:53] <moogsi> fabriceflorin: Lydia_WMDE: thank you [18:59:58] <Lydia_WMDE> :) [19:00:26] <fabriceflorin> What I love most about this project is that it is a true collaboration between the community, foundation and WMDE. It feels great to me. I am particularly grateful to multichill, thedjNotWMF and others for their wonderful contributions to this process. [19:01:27] <JeanFred> Group hug ! [19:01:38] <Lydia_WMDE> yay group hugs [19:01:39] * guillom obediently group-hugs. [19:01:56] * ralgis group-hugs as well :) [19:04:28] * dschwen pats everybody forcefully on the back to make this a totally manly hug [21:01:21] <TimStarling> #startmeeting [21:01:21] <wm-labs-meetbot`> TimStarling: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee' [21:01:37] <TimStarling> #startmeeting RFC meeting September 3 [21:01:37] <wm-labs-meetbot`> Meeting started Wed Sep 3 21:01:37 2014 UTC and is due to finish in 60 minutes. The chair is TimStarling. 
Information about MeetBot at http://wiki.debian.org/MeetBot. [21:01:37] <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [21:01:37] <wm-labs-meetbot`> The meeting name has been set to 'rfc_meeting_september_3' [21:02:20] <TimStarling> #topic RFC meeting September 3 | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE).| https://meta.wikimedia.org/wiki/IRC_office_hours | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:02:26] <marktraceur> Be afraid. [21:03:33] <TimStarling> csteipp and gwicke, are you at your respective keyboards? [21:04:28] <csteipp> TimStarling: gwicke is talking to me in person... he's headed back to his desk so we can record [21:04:30] <gwicke> yup, just returned from a chat with cscott [21:04:35] <gwicke> eh, csteipp [21:04:39] <gwicke> stupid completion [21:04:45] <csteipp> I get that a lot [21:05:01] <gwicke> yeah, sorry.. [21:05:02] <TimStarling> tab completion should really know who you are talking about so that it can prioritise correctly [21:05:12] <gwicke> definitely [21:05:12] <csteipp> Anyone know the incantation to get the part to officially start the rfc talk? [21:05:25] <gwicke> can easily spare a core for that task [21:05:34] <TimStarling> #topic SOA Authentication | RFC meeting September 3 | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE).| https://meta.wikimedia.org/wiki/IRC_office_hours | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:05:40] <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/SOA_Authentication [21:06:15] <TimStarling> so, it looks useful to me [21:06:29] <csteipp> So as I mentioned to gwicke irl, I think he and I are a little further apart in what we envisioned this would look like, but I think we have enough to look into [21:06:38] <TimStarling> the idea of this is that it will replace CentralAuth following the SUL finalisation project? 
[21:06:45] <gwicke> I think it might make sense to walk through the goals first [21:07:10] <gwicke> TimStarling: it should be able to do that, yes [21:07:37] <csteipp> gwicke: You want to lead through the goals? [21:07:50] <gwicke> sure [21:07:53] <gwicke> https://www.mediawiki.org/wiki/Requests_for_comment/SOA_Authentication#Goals [21:08:11] <gwicke> we already talked about single sign-on [21:08:37] <csteipp> ^ on that.. So user signs into <something> and is their user across all wikis and services? [21:08:50] <gwicke> a big goal is to be able to authenticate the bulk of the requests by checking a signature only [21:09:12] <TimStarling> by the way, Brion and Mark give their apologies today, they are in a conflicting meeting until 45 minutes past the hour [21:09:47] <gwicke> csteipp: functionally it'd basically be the same as it's intended with SUL [21:09:58] <csteipp> gwicke: To be pedantic, do mean "authorize" requests based on a signature? [21:10:05] <TimStarling> gwicke: then how do you revoke permissions [21:10:16] <gwicke> TimStarling: lets first walk through the goals [21:10:34] <gwicke> the other part is minimizing the impact of exploits [21:10:55] <gwicke> by limiting the code that has access to sensitive user data [21:11:30] <gwicke> csteipp: authenticate to authorize [21:12:12] <csteipp> gwicke: So... Identify the user? [21:12:22] <gwicke> another part is limiting the trust we need to place into random services & entry points by pushing the checking into the lowest possible layer [21:12:51] <gwicke> csteipp: yes [21:13:40] <gwicke> the other goals are minor I think [21:14:32] <TimStarling> so how do you revoke permissions? [21:14:34] <legoktm> does this RfC also include re-writing the AuthPlugin things (basically https://www.mediawiki.org/wiki/Requests_for_comment/AuthStack) ? [21:14:48] <gwicke> so any comments on the goals? 
[21:15:24] <gwicke> TimStarling: tokens are time-limited, and certain permissions need to be checked with an authentication service every time they are required [21:15:48] <TimStarling> ah right, hence "revocation within minutes" on the RFC [21:16:01] <csteipp> So there are some distinct functions you've listed. I want to delve into authentication. [21:16:09] <gwicke> TimStarling: that's a possible solution [21:16:22] <csteipp> i.e., only having this service have access to credentials [21:17:53] <gwicke> yes? [21:18:12] <TimStarling> he's probably got some sort of hand injury ;) [21:18:16] <csteipp> It seems limiting to make this new service handle authentication, when i.e., authstack is aiming to privide multiple authn mechanisms. [21:18:25] * csteipp tries to phrase this all. [21:18:38] <csteipp> If we're just trying to protect hashes... let's use ldap auth [21:19:06] <csteipp> So it seems to tackle the authentication side, we need to build a plugable system, i.e., for 2fa [21:19:09] <gwicke> I haven't looked at authstack much [21:19:23] <gwicke> if it isolates the authentication service as well, then great [21:19:30] <TimStarling> you are thinking "log in with your facebook account" etc.? [21:20:13] <gwicke> there are many ways users could authenticate [21:20:14] <csteipp> Mostly 2factor is my concern, but yeah, facebook / openid / etc. [21:20:33] <csteipp> So in this RFC, are we handling those? [21:20:41] <TimStarling> 2FA could presumably be integrated with the proposed service [21:20:45] <gwicke> that's a matter for the authentication service [21:20:54] <gwicke> if that's running some authstack magic internally then great [21:21:27] <csteipp> I assumed not initially, which is why I put that part on the talk page about session management and identification... but if we're doing it, that increases the scope a lot-- I want to make sure we're clear on if we are or not. [21:21:30] <TimStarling> maybe 2FA/OpenID can be a second project? 
[21:21:33] <gwicke> there are a few solutions for stand-alone authentication services out there [21:21:40] <TimStarling> it seems to me that we need to have a narrow scope [21:21:54] <gwicke> yes, it's out of scope for this RFC [21:22:16] <TimStarling> presumably we can do simple feature parity first and add OpenID/2FA later? [21:22:25] <gwicke> as far as this RFC is concerned, magic happens inside the authentication service and a signed token plops out [21:23:08] <TimStarling> the authentication service in that case should be pretty simple [21:23:20] <TimStarling> the MW integration is probably a larger part of the project [21:23:36] <TimStarling> especially if we want to fix a few architectural issues while we are at it [21:23:49] <csteipp> "magic happens inside the authentication service and a signed token plops out" <- so the service does the authentication? [21:23:55] <gwicke> yes, especially if we want MW to use this service as well [21:24:09] <TimStarling> presumably MW would send the username and password to the auth service [21:24:20] <gwicke> yup [21:24:40] <gwicke> or the user's browser would even talk to the auth service directly [21:24:59] <gwicke> I'd think mw forwards though [21:25:10] <gwicke> csteipp: yes [21:25:19] <TimStarling> omitting 2FA/OpenID means we don't need frontend work, since the UI will be the same [21:25:50] <TimStarling> anyway, I am unconvinced on goal 1, authenticating requests by signature only [21:26:22] <gwicke> it's pretty common these days [21:26:47] <gwicke> TimStarling: which concerns do you have about it? 
[21:27:35] <TimStarling> seems like premature optimisation [21:27:51] <gwicke> heh [21:28:07] <TimStarling> remote checks can presumably be done in a few ms [21:28:24] <TimStarling> in exchange for a saving of a few ms, you are making auth cookies be short-lived [21:28:36] <TimStarling> which is going to be a hassle for all client implementors [21:28:42] <gwicke> not necessarily [21:29:00] <gwicke> in oauth that's all handled by the library anyway [21:29:01] <TimStarling> plus you introduce a delay before revocation takes effect [21:29:02] <gwicke> and standard practice [21:29:23] <TimStarling> which is sometimes relevant in our world -- sometimes you want to lock people out within seconds [21:29:23] <gwicke> with cookies we can handle it on the server by asking for a new token from the auth service, and returning a set-cookie header [21:30:08] <gwicke> so the validity of those tokens is typically on the order of a minute, at most a few minutes [21:30:13] <TimStarling> consider it this way: if you want people to be able to do zero requests after revocation, then the expiry time has to be shorter than the mean inter-request time [21:30:30] <TimStarling> which means that you need a new set-cookie header on average for every request [21:30:44] <TimStarling> seems like pointless additional complexity to me [21:30:55] <gwicke> it matters for apis [21:31:29] <gwicke> if you aim for a response time < 50ms total, you don't want to spend 10ms or so calling an auth service [21:31:52] <TimStarling> so make it faster than 10ms [21:32:00] <gwicke> for the common case, which would be read requests [21:32:10] <csteipp> So I want to call out a distinction. This ^ is session management. MediaWiki uses memcache for it, and it's a lot faster than 10ms. [21:32:51] <TimStarling> how would session management work? [21:33:11] <gwicke> just as it does right now? 
[21:33:35] <csteipp> Right now mediawiki exchanges the session cookie for a cached user object (in most cases) [21:33:43] <TimStarling> right now, CA does session management, but CA is going away [21:33:45] <gwicke> so direct access to memcached wouldn't work for many reasons [21:33:53] <gwicke> security being a big one [21:34:09] <gwicke> you don't want random services read & write to memcached [21:34:41] <gwicke> csteipp: that's fine, mw can continue doing that [21:34:52] <TimStarling> the revocation list will presumably be very short -- that is a rare but important event [21:35:16] <TimStarling> suppose you have a service which simply verifies the signature and checks the payload against a revocation list which it has in memory [21:35:30] <TimStarling> surely that could be done a lot quicker than 10ms [21:35:32] <gwicke> normally there is no revocation [21:35:47] <csteipp> Each service keep it's own revocation list? yikes. [21:35:54] <gwicke> if it's a sensitive & typically rare operation, just check with the auth service [21:36:00] <gwicke> which gives you instant revocation [21:36:08] <TimStarling> no, "a service" = the auth service [21:36:25] <csteipp> Oh, that's what gwicke is saying, right? [21:36:29] <gwicke> the auth service would just use the db, as it does right now [21:37:12] <gwicke> so, to clarify: we are mainly talking about reads being authenticated by signed token [21:37:14] <TimStarling> what I am asking about sessions is first -- how do you validate a session? what is in a session cookie? 
[21:37:18] <gwicke> for public wikis, even that can be skipped [21:37:35] <TimStarling> currently it is a token which is checked against the MW DB, presumably it will not be that anymore [21:37:36] <gwicke> writes would always be validated with the auth service [21:38:01] <csteipp> TimStarling: For that, some sort of signed token [21:38:16] <TimStarling> but I thought signed tokens have a lifetime of 2 seconds or something [21:38:17] <gwicke> TimStarling: it can be just that [21:39:01] <gwicke> we can just keep the session cookie if that works better than a signed user id inside a token [21:39:17] <gwicke> token lifetimes are longer than a typical request sequence [21:39:25] <gwicke> so on the order of single-digit minutes [21:40:06] <TimStarling> by signed token do you mean JWS? [21:40:14] <gwicke> yes [21:40:18] <gwicke> JWT [21:40:42] <gwicke> https://www.mediawiki.org/wiki/Requests_for_comment/SOA_Authentication#JWT_Bearer_tokens_.2F_OpenID_connect [21:40:55] <TimStarling> a JWS is a signed JWT [21:41:10] <gwicke> ah, didn't know that acronym yet [21:41:41] <TimStarling> so you have a JWS in a long-lived session, then the client does a request, gets back a second short-lived JWS? [21:41:47] <csteipp> TimStarling: Cite? [21:41:59] <gwicke> csteipp: http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html [21:42:04] <TimStarling> https://tools.ietf.org/html/draft-ietf-jose-json-web-signature-31 [21:42:38] <TimStarling> yeah, in the JWT spec it says "This example shows how a JWT can be used as the payload of a JWE or JWS to create a Nested JWT." [21:42:54] <TimStarling> then it has a link to the JWS spec where it discusses MAC serialization etc. [21:43:11] <gwicke> TimStarling: for cookies we'd refresh as necessary [21:43:17] <csteipp> Ah, I haven't kept up with jose. 
Yeah, looks like they're moving to that terminology now [21:43:20] <TimStarling> a JWE is an encrypted JWT [21:43:25] <gwicke> for oauth / openid connect it would follow the normal refresh flow [21:44:23] <TimStarling> it's getting late, we have to talk about tgr's RFC [21:45:02] <TimStarling> this session stuff is not in the RFC at the moment is it? [21:45:05] <gwicke> well, thanks for checking it out! [21:45:18] <gwicke> TimStarling: sessions are pretty orthogonal [21:45:19] <csteipp> It's a large part of the talk page [21:46:15] <gwicke> most services don't need sessions [21:46:24] <gwicke> and the ones that do can store them any way they like [21:46:49] <TimStarling> sounds like you're expanding scope [21:46:51] <gwicke> using the user id, name, or some session id communicated in another cookie if desired [21:47:06] <TimStarling> what I want is documentation of a CA replacement project [21:47:16] <TimStarling> and obviously sessions are fairly relevant to that [21:47:48] <TimStarling> if there is any really urgent service use case that needs to be part of the initial project, maybe that can be added, but it seems like a bit of a drift to me [21:48:28] <gwicke> well, that sounds like a different RFC to me then [21:48:37] <gwicke> sessions are a per-service concern to me [21:48:38] <TimStarling> it's too late for another RFC really, I have another meeting immediately after this one [21:48:58] <TimStarling> tgr: I like your proposal, do you need anything to make it happen? 
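Editor's sketch of the short-lived signed token idea discussed above (a JWS carrying JWT claims with an expiry). It assumes Node.js and a shared HMAC secret (HS256) purely for brevity. The RFC discussion does not fix an algorithm, and a real deployment would more likely use an asymmetric signature so that consuming services cannot mint tokens. The names signToken and verifyToken are hypothetical.

```javascript
const crypto = require('crypto');

// Encode a string or Buffer as unpadded base64url, the encoding JWS uses.
function b64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/=+$/, '').replace(/\+/g, '-').replace(/\//g, '_');
}

// Decode base64url back into a Buffer.
function fromB64url(text) {
  return Buffer.from(text.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
}

function hmac(secret, data) {
  return crypto.createHmac('sha256', secret).update(data).digest();
}

// Issue a token that expires ttlSeconds after `now` (both in seconds).
// Assumption: HS256 with a shared secret, chosen only to keep the sketch short.
function signToken(claims, secret, ttlSeconds, now) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = b64url(JSON.stringify({ ...claims, exp: now + ttlSeconds }));
  const signature = b64url(hmac(secret, `${header}.${payload}`));
  return `${header}.${payload}.${signature}`;
}

// Return the claims if the signature matches and the token is unexpired, else null.
// No call to an authentication service is needed for this check.
function verifyToken(token, secret, now) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = hmac(secret, `${header}.${payload}`);
  const given = fromB64url(signature);
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  let claims;
  try {
    claims = JSON.parse(fromB64url(payload).toString('utf8'));
  } catch (e) {
    return null;
  }
  if (!claims || typeof claims.exp !== 'number' || now >= claims.exp) return null;
  return claims;
}
```

The trade-off raised in the meeting shows up directly in ttlSeconds: a verifier that only checks the signature keeps accepting a token until exp passes, so revocation is delayed by up to the token lifetime unless sensitive operations also consult the authentication service.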
[21:49:28] <tgr> first of all feedback that it's worth investing into [21:49:41] <TimStarling> it sounds pretty awesome to me [21:49:49] <tgr> it turned out significantly more complex than I initially thought [21:50:13] <TimStarling> #topic Server-side Javascript error logging | RFC meeting September 3 | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE).| https://meta.wikimedia.org/wiki/IRC_office_hours | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:50:20] <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Server-side_Javascript_error_logging [21:50:55] <tgr> I have three points specifically where I would like to know if the tradeoff is acceptable [21:51:07] <ori> what are they? [21:51:37] <tgr> 1. cors, 2. source maps, 3. data transport between client and server [21:52:32] <tgr> about cors: modern browsers sanitize errors if the script is on a different domain, need to add crossorigin="anonymous" to the script tag to avoid that [21:52:59] <tgr> if you do that and there is no Accept-Origin header, the browser will handle it as a 404 [21:53:22] <csteipp> Eek. I'm not a fan of Access-Control-Allow-Origin: * [21:53:24] <ori> so, imo you shouldn't worry too much about those. 
having the user-agent and the name of the module where the error occurred (easy to do if you wrap RL's execute() with a try/catch) is already very useful and heaps better than flying blind, which is what we do now [21:53:24] <tgr> which means ResourceLoader would have to have some sort of fallback behavior [21:53:52] <ori> the error details aren't terribly important; you can just load the module in the same browser environment and find out that way [21:54:29] <ori> the bigger issue is how to avoid DDOSing ourselves if we happen to deploy code that breaks on a popular browser [21:54:33] <tgr> if the script is not same-domain, there is no error message, no file, no line number, no exception [21:54:47] <tgr> an error report without those is pretty pointless [21:55:02] <robla> "sanitize" == "swallow" then [21:55:09] <tgr> pretty much [21:55:31] <ori> i don't think that's true for RL modules, because they just register themselves, and it's RL itself that actually invokes them, IIRC [21:55:58] <csteipp> But they all come from bits right? [21:56:25] <csteipp> But... bits already allows cors, iirc. [21:56:30] <tgr> I think window.onerror gets the domain of the HTML file, regardless of what file sets it [21:57:02] <tgr> the problem is that there are weird things on the internet which remove CORS headers [21:57:13] <tgr> office firewalls and such [21:57:13] <ori> iirc every RL module is executed by <https://github.com/wikimedia/mediawiki-core/blob/master/resources/src/mediawiki/mediawiki.js#L1092> [21:57:21] <ori> which knows the name of the module it is about to invoke [21:57:33] <TimStarling> #chair ori csteipp [21:57:34] <wm-labs-meetbot`> Current chairs: TimStarling csteipp ori [21:58:02] <tgr> if you use the crossorigin attribute, but there is no header, the script just does not load, period [21:58:21] <TimStarling> can one of you do #endmeeting when you are done talking? 
I have to go [21:58:41] <csteipp> will do [21:59:27] <tgr> so unless there is some workaround that I don't see, ResourceLoader needs to catch CORS failures and load scripts without a crossorigin attribute in that case, which is a significant increase in complexity [22:00:16] <tgr> ori: here is a simple test case: https://www.mediawiki.org/wiki/User:Tgr_(WMF)/common.js [22:00:29] <tgr> the error source and the onerror handler are in the same file, even [22:00:39] <tgr> you still don't get error details [22:00:43] <csteipp> tgr: That does make it hard. Wouldn't knowing that there is an error be a good, and easy first step. And then adding a test for cors would be even easier, since we would be logging if the test started failing? [22:01:40] <tgr> csteipp: agreed [22:02:05] <ori> take a look at https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.echo.base&skin=vector&version=20140903T215031Z&* [22:02:10] <tgr> also, older browsers don't care about domains, and getting error details for old IE would already be useful [22:02:19] <ori> you can see that the script source is wrapped with: [22:02:28] <ori> if(window.mw){mw.loader.implement("ext.echo.base",function($,jQuery){ [22:02:51] <ori> this registers the function with resourceloader, which then executes it [22:03:18] <tgr> ori: I don't see why that's relevant [22:04:46] <ori> it's relevant because the code of every RL module is enclosed in a closure rather than immediately executed in top scope [22:05:05] <ori> and the closure gets invoked by a script that is always on the same domain, which we control, and which knows the name of the module it is about to call [22:05:56] <ori> this is done in https://github.com/wikimedia/mediawiki-core/blob/master/includes/resourceloader/ResourceLoader.php#L1053 in PHP and in execute() in JS [22:06:24] <tgr> if you are thinking about adding a try-catch to the RL wrapper, that would still not catch exceptions in event handlers, timeout callbacks and 
such [22:06:55] <ori> yes, but [22:07:13] <ori> take a look at the wrapper code again [22:07:14] <ori> if(window.mw){mw.loader.implement("ext.echo.base",function($,jQuery){ [22:07:19] <ori> we pass each module $ and jQuery [22:07:36] <ori> i added that explicitly so we can pass a patched $ / jQuery to each module [22:08:00] <ori> such that $('.foo').on('something', callback ) can be associated with the module that set it [22:08:02] <tgr> ah, I see [22:08:16] <ori> that doesn't cover window.setTimeout, but it's pretty comprehensive [22:08:17] <tgr> that would work most of the time [22:08:33] <ori> (we could even patch setTimeout, but that's getting ahead of ourselves) [22:08:46] <ori> IIRC Qunit does that [22:08:57] <tgr> sinon.js does, yeah [22:09:15] <ori> Krinkle is absurdly knowledgeable about these things, btw, and is a good resource for that reason [22:09:44] <Krinkle> Patch setTimeout for what? [22:09:56] <Krinkle> scope doesn't change [22:10:09] <ori> so that errors can always be associated with a module [22:10:17] <ori> even if it's asynchronously-executed module code [22:10:26] <ori> that a try/catch in resourceloader's execute() wouldn't catch [22:10:46] <Krinkle> ah, errors, not $/jQuery perf [22:11:02] * ori nods [22:12:36] <tgr> OK, that does sound like a valid workaround [22:13:20] <tgr> the second problem is with source maps, error reports in non-debug mode are pretty useless without that [22:13:54] <ori> you have the module name, that's pretty good already [22:14:02] <ori> module name + user agent is good enough in most cases [22:15:37] <tgr> the main use case of error logging would be 1) error stats, 2) good reports about errors which are not easy to reproduce [22:15:57] <tgr> non-deterministic or race condition or such [22:16:28] <Krinkle> QUnit doesn't, SinonJS does. And it does it synchronously, and in a terribly unstable just-about-works way. 
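Editor's sketch of the module-attribution idea ori describes above: the loader runs each registered module body inside a try/catch and hands the module a helper for wrapping its own callbacks, so later errors can still be tied to a module name. This is a toy loader written for illustration, not ResourceLoader's actual code. createLoader, guard and the report callback are hypothetical names.

```javascript
// Toy stand-in for a module loader. NOT ResourceLoader's real implementation.
function createLoader(report) {
  const registry = new Map();

  // Wrap a function so that any exception is reported with the module name.
  function guard(moduleName, fn) {
    return function (...args) {
      try {
        return fn.apply(this, args);
      } catch (e) {
        report({ module: moduleName, message: String((e && e.message) || e) });
      }
    };
  }

  return {
    // Register a module body without running it, like a closure-wrapped module.
    implement(name, body) {
      registry.set(name, body);
    },
    // Run a module body under try/catch, passing it a per-module guard helper.
    execute(name) {
      const body = registry.get(name);
      if (!body) throw new Error('Unknown module: ' + name);
      const api = { guard: (fn) => guard(name, fn) };
      guard(name, body)(api);
    }
  };
}
```

In the approach ori outlines, the per-module helper would be hidden inside a patched $ / jQuery passed to the module, so event handlers get wrapped without the module author doing anything. Here the module calls api.guard explicitly to keep the sketch short.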
[22:16:30] <tgr> if it's easy to reproduce, the developer can usually figure out how to do that from the error report anyway [22:17:04] <Krinkle> what's the topic btw? Is there an office hour? [22:17:18] <Krinkle> debug mode is going away in its current form as soon as source maps are implemented. [22:17:19] <tgr> Krinkle: server-side Javascript error logging [22:17:34] <tgr> https://www.mediawiki.org/wiki/Requests_for_comment/Server-side_Javascript_error_logging [22:17:39] <ori> we should probably close the meeting (we can continue chatting informally) [22:17:57] <tgr> you are the chair :) [22:18:07] <ori> #endmeeting [22:18:07] <wm-labs-meetbot`> Meeting ended Wed Sep 3 22:18:07 2014 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [22:18:07] <wm-labs-meetbot`> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-21.01.html [22:18:07] <wm-labs-meetbot`> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-21.01.txt [22:18:07] <wm-labs-meetbot`> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-21.01.wiki [22:18:08] <wm-labs-meetbot`> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2014/wikimedia-office.2014-09-03-21.01.log.html [22:19:43] <tgr> Krinkle: re setTimeout, we would need something like window.setTimeout = function ( f, t ) { oldSetTimeout( function() { try { f(); } catch(e) { mw.reportError(e); } }, t ); }; [22:19:58] <tgr> if there is any chance to do that in a cross-browser-compatible way [22:20:58] <Krinkle> Is this the only thing left? It seems rather unimportant to know which module it's from. Trivial to determine once it reaches a human. Or is there extra automation you can do with the module name (besides mentioning it in the report)?
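tgr's one-line `setTimeout` wrapper above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: `installTimeoutReporter` and `reportError` are hypothetical names (the `mw.reportError` in the chat is itself a proposal, not an existing function), and cross-browser behaviour, which tgr flags as an open question, is not addressed. Unlike the one-liner, it keeps the timer id and any extra arguments, and it rethrows after reporting.

```javascript
// Replace globalObj.setTimeout with a version that reports exceptions thrown
// by timer callbacks. Returns a function that restores the original.
function installTimeoutReporter(globalObj, reportError) {
  var originalSetTimeout = globalObj.setTimeout;

  globalObj.setTimeout = function (callback, delay) {
    // Extra arguments after the delay are forwarded to the callback.
    var extraArgs = Array.prototype.slice.call(arguments, 2);

    if (typeof callback !== 'function') {
      // String callbacks are passed through unwrapped.
      return originalSetTimeout.call(globalObj, callback, delay);
    }

    // Preserve the timer id so clearTimeout keeps working for callers.
    return originalSetTimeout.call(globalObj, function () {
      try {
        return callback.apply(this, extraArgs);
      } catch (e) {
        reportError(e);
        throw e; // still surfaces in the console / window.onerror
      }
    }, delay);
  };

  return function restore() {
    globalObj.setTimeout = originalSetTimeout;
  };
}
```

In a browser this would be called once as `installTimeoutReporter( window, someReporter )`. Because the error is rethrown, a `window.onerror` handler may see it a second time, so a real reporter would need to de-duplicate.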
[22:21:45] <tgr> the module name is not important, catching the error is [22:22:13] <tgr> so that it can be logged automatically instead of having to explain to users how they should use the debug console [22:23:40] <tgr> Krinkle: is there a roadmap for source maps? [22:24:26] <Krinkle> tgr: roadmap is currently blocked on there being frontend platform (human) resources available, or Roan/myself not being allocated elsewhere. [22:26:09] <tgr> ori: the third problem is not being DoSed, as you said [22:27:55] <tgr> my current plan is <img> with error data in URL query string -> varnish -> varnishkafka -> kafka -> logstash-kafka -> logstash -> elasticsearch [22:28:04] <tgr> I think that scales well in every step [22:28:40] <tgr> it would require running a second instance of varnishkafka though, the first is used to log pageviews [22:29:21] <tgr> and it assumes that the VSL query option of varnishkafka works, as I understand it, no one has used it so far [22:36:07] <gwicke> csteipp, TimStarling: I copied the soa auth log over [22:45:14] <ori> tgr: i can help with that
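The client end of the pipeline tgr outlines (an `<img>` request carrying error data in the query string) might look something like the sketch below. The endpoint URL, the field names and both function names are assumptions made for illustration; the varnish/kafka/logstash side is not shown.

```javascript
// Build the beacon URL. Each value is truncated before encoding so that a
// percent-escape is never cut in half and the URL stays bounded in length.
function buildErrorBeaconUrl(endpoint, fields, maxValueLength) {
  var parts = Object.keys(fields).map(function (key) {
    var value = String(fields[key]).slice(0, maxValueLength);
    return encodeURIComponent(key) + '=' + encodeURIComponent(value);
  });
  return endpoint + '?' + parts.join('&');
}

// Fire the request by assigning the URL to an image's src.
// `createImage` is injected so the function can run outside a browser;
// in a page it would be: function () { return new Image(); }
function sendErrorBeacon(endpoint, fields, createImage) {
  var img = createImage();
  img.src = buildErrorBeaconUrl(endpoint, fields, 200);
  return img;
}

// Example with a hypothetical endpoint and fields:
var exampleUrl = buildErrorBeaconUrl(
  'https://example.org/beacon/error',
  { module: 'ext.echo.base', message: 'a b&c' },
  200
);
```

Capping each value is one simple way to keep request sizes predictable; it does nothing about request volume, which is the DoS concern raised above and would need sampling or rate limiting elsewhere.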