[21:02:58] #startmeeting RFC meeting
[21:03:19] looks like somebody needs to restart the bot?
[21:03:21] last week we had two meetbots, I think I said that it's better to have two than zero
[21:03:25] hehe...
[21:03:49] we can just pretend it's here, and later grep for #info etc
[21:04:09] :)
[21:04:16] anyone around who knows how to restart meetbot?...
[21:04:22] https://wikitech.wikimedia.org/wiki/Tool:Meetbot
[21:04:27] but I'm not allowed
[21:05:32] bd808: can you check that?
[21:05:42] > You are not a member of the group tools.meetbot
[21:05:43] meh
[21:05:45] anyway.
[21:05:58] * bd808 tries to poke meetbot
[21:06:04] Today, I'd like to again talk about canonical data URLs, https://phabricator.wikimedia.org/T161527
[21:06:36] you *could have a labs admin restart it no?
[21:06:42] Last time, we said that this was ready to go on Last Call, but a few issues came up quickly that I would like to resolve first
[21:06:50] bd.808 is one and trying
[21:06:53] Zppix: weekly?
[21:06:54] there are three open questions listed on the pah page
[21:07:11] TimStarling: i meant right now lol
[21:07:25] bd808: thanks :)
[21:07:44] #startmeeting RFC meeting
[21:07:44] Meeting started Wed Apr 12 21:07:44 2017 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:07:44] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:07:44] The meeting name has been set to 'rfc_meeting'
[21:08:23] do we need to discuss this new endpoint versus RESTBase, which gwicke raised last hour?
[21:08:55] I think Daniel indicated that the API portion is less important for now
[21:08:55] we did talk about this last time... anyone want to revisit that point?
[21:09:08] it's more about the stable identifier
[21:09:24] yes - I am interested mostly in having stable identifiers. they should be resolvable, but how exactly they resolve isn't my main concern right now
[21:09:53] e.g. we can have the data URL resolve to action=raw for now, and later change that to RESTbase. or something.
[21:10:09] ok, so the open questions, as listed on the page, are: namely: 1) why not just use the normal /wiki/ urls as identifiers? 2) how to address slots besides the main slot, once we have MCR? and 3) should we use page IDs instead of titles, so the URL is stable against renames?
[21:10:17] DanielK_WMDE: I added you as a meetbot maintainer too. you use it more than most these days
[21:10:36] bd808: thanks! perhaps also add tim
[21:11:10] Perhaps let's go through these questions in inverse order. So the last one first.
[21:11:25] Shall we use the page ID in the data URL, instead of the page title?
[21:11:35] pro: stable against rename.
[21:11:49] con: not human readable, needs API or DB lookup to construct from a title
[21:11:50] con: stable against rename
[21:12:02] heh :)
[21:12:14] I mean, I mentioned before that it's not clear what the user's intention is
[21:12:26] sometimes pages start with the wrong names, and it's desired to fix them
[21:13:08] i think archiving is the primary use case for move-and-replace, where you would want to reference the title, not the page id.
[21:14:01] DanielK_WMDE: Tim would have to join Tool Labs for me to be able to do that :)
[21:14:05] for actual articles, i think this is rare. you more often have pages moved to a "better name", but would still want to find that same page when dereferencing. in many cases, that works via a redirect. but often enough, the old name becomes a disambiguation page
[21:14:30] on wikipedia
[21:14:36] yes
[21:14:39] but this might not be relevant for the current use case
[21:14:48] [[George Bush]] is a good example
[21:15:24] TimStarling: you mean, geo-shapes on commons?
[21:15:26] if you archive a dataset by moving it away and creating a new dataset with the same name
[21:15:38] e.g. if a geo-shape is replaced when a new survey is done
[21:15:44] what are clients expecting?
[21:15:56] there is no way of knowing, that's true
[21:16:30] so, tim is in favor of using titles. so am i, mostly for practical reasons.
[21:16:35] any other thoughts?
[21:16:50] you can always add an indirection when you need a stable identifier..
[21:16:55] ideally we would have user-configurable redirects which go from a "current" title to a stable title
[21:17:15] would /data follow #redirect?
[21:17:23] that can be done with #REDIRECT, no?
[21:17:29] action=raw doesn't, right?
[21:17:36] yea... maybe... no... ugh :)
[21:17:36] /by-id/12345 redirecting to /by-title/CurrentTitle
[21:18:00] so maybe that has to be a feature of the endpoint
[21:18:01] TimStarling: good question. i'd want it to. unless i actually want to address/retrieve the redirect itself.
[21:18:24] the REST API follows redirects by default
[21:18:29] with HTTP redirects
[21:18:41] can that be suppressed?
[21:18:52] yes, optionally
[21:18:56] disables caching for the response
[21:18:57] i'm thinking of redirect=no in the data url. not pretty, but would be a solution
[21:19:02] non-browser clients can just opt not to follow a redirect
[21:19:28] yeah, but browsers sadly don't offer control over redirects
[21:19:29] true, but that makes the semantics of the identifier ambiguous
[21:19:51] what does /data/Foo *identify* if Foo is a redirect?
[21:20:09] what is the data thing anyway?
[21:20:12] is this about articles?
[21:20:21] or the underlying concepts the articles are about?
[21:20:25] or a title?
[21:20:33] it's not about concepts
[21:20:40] it's about the data that makes up the article
[21:20:54] independent of the name?
[21:21:09] that's the question
[21:21:11] revision?
[21:21:23] independent of revision, yes.
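The options raised here (following #REDIRECT by default, suppressing that with something like redirect=no, and a stable /by-id/ path that redirects to the current title) can be illustrated with a toy resolver. This is a sketch under assumptions: the in-memory page store, the Page type, the resolve() function and the status codes chosen are all hypothetical, and nothing here reflects how MediaWiki or RESTBase actually implement redirects.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Page:
    page_id: int
    title: str
    content: str
    redirect_target: Optional[str] = None  # set when the page is a #REDIRECT


# Hypothetical in-memory page store, keyed by canonical title.
PAGES: Dict[str, Page] = {
    "Old_Name": Page(1, "Old_Name", "#REDIRECT [[New_Name]]", "New_Name"),
    "New_Name": Page(2, "New_Name", '{"type": "FeatureCollection"}'),
}
TITLE_BY_ID: Dict[int, str] = {p.page_id: p.title for p in PAGES.values()}


def resolve(path: str, follow_redirects: bool = True) -> Tuple[int, str]:
    """Return (HTTP status, Location header or body) for a hypothetical path."""
    if path.startswith("/by-id/"):
        raw_id = path[len("/by-id/"):]
        title = TITLE_BY_ID.get(int(raw_id)) if raw_id.isdigit() else None
        if title is None:
            return 404, "Not Found"
        # The title behind an ID can change on rename, so use a temporary
        # redirect rather than a permanent (301) one.
        return 302, "/by-title/" + title
    if path.startswith("/by-title/"):
        page = PAGES.get(path[len("/by-title/"):])
        if page is None:
            return 404, "Not Found"
        if page.redirect_target is not None and follow_redirects:
            # Follow #REDIRECT by default; a redirect=no style flag would
            # correspond to follow_redirects=False.
            return 302, "/by-title/" + page.redirect_target
        return 200, page.content
    return 404, "Not Found"
```

Note that with follow_redirects=False the same path yields the redirect page itself, which is exactly the ambiguity raised at 21:19:29: one identifier, two possible referents depending on a flag.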
[21:21:28] the content can change a lot
[21:21:35] so the answer to your question 3 is to use titles by default, but perhaps provide a by-id endpoint optionally or later
[21:21:37] so semantically it can turn upside down
[21:21:57] but the page id or title might remain the same
[21:22:26] gwicke: a description can change radically, it would still be the same for the purpose of this interface.
[21:22:47] are there any objections to my previous statement?
[21:22:48] the continuity of identity is one of the trickiest questions of ontology :)
[21:23:05] +1 to by-title with by-id optionally later
[21:23:24] TimStarling: sounds good to me. we may want to clearly specify behaviour wrt redirects.
[21:23:32] #agreed 3. use titles by default, but perhaps provide a by-id endpoint optionally or later
[21:23:34] or by-magic-semantic-hash, once invented
[21:24:07] #info need to clearly specify behaviour wrt redirects. Following redirects is desirable for most use cases.
[21:24:20] ok. second question. slots.
[21:24:47] can content negotiation be used to select the slot?
[21:24:54] so, for example: we'll have a second slot on file description pages, containing a JSON blob that represents meta-data.
[21:25:18] which slots would RDF clients actually care about?
[21:25:44] TimStarling: i don't think so. content negotiation is for selecting different representations of the same data. The information in different slots should describe disjunct aspects of the page
[21:25:44] I'm assuming this is not supposed to be a full slot API driven entirely by content negotiation
[21:26:37] gwicke: honestly: i don't have a clear use case for slots in this context right now. I just don't want the entire schema to fall over if we run into the need of addressing a slot that isn't the main slot
[21:27:06] page title needs to be the final component since it can have slashes
[21:27:08] so, my proposal is: /data/ for the main slot, /foo-data/ for slot foo, etc. We can have /main-data/ be an alias for plain /data/.
[21:27:19] unless you require them to be encoded as %2F, which is messy in a path part
[21:27:23] then you are building an ad-hoc REST API
[21:27:29] I thought that was out of scope for now
[21:27:41] yea.
[21:27:43] #help It wasn't clear from the task whether each /data end point would map to /wiki end point or multiple /data points can be used in multiple /wiki end points. Can someone clarify?
[21:28:31] I honestly wouldn't worry about slots too much at this point
[21:28:39] if the relationship is multiple-to-multiple, then is there need for talking about slots?
[21:28:42] DanielK_WMDE: that requires varnish/apache conf changes every time a slot is introduced
[21:28:44] bmansurov: ignoring slots, /data and /wiki have a one-to-one relationship. with slots, it's potentially one /data url per slot.
[21:28:46] the primary need seems to be covered by the main slot
[21:29:02] /data/main/ could be resolved in MW
[21:29:18] gwicke: i just want to have a plan for how to handle them. no need to actually implement it for now.
[21:29:34] you can just expose them via an API
[21:29:40] TimStarling: but data/main/ is ambiguous. "main/" could be a page title
[21:29:40] it doesn't need to be this API
[21:30:02] no, don't have titles under /data, slot names only
[21:30:05] gwicke: i don't need a way to fetch the data. i need a way to identify the blob.
[21:30:23] or have 0 as an alias for main if main is too long
[21:30:25] TimStarling: so we start out with /data/main/Foo, and never use /data/Foo?
[21:30:34] yes
[21:30:36] DanielK_WMDE: wait, is that a new requirement, beyond identifying the article?
[21:30:54] /data/0/Foo is a possible shorter alias if that is necessary
[21:31:03] gwicke: the requirement is stable identifiers for the data blobs that define a wiki page. with MCR, there is more than one.
[21:31:42] TimStarling: i'd rather have "main". 0 seems cryptic.
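Two constraints from this exchange shape path parsing: titles may contain slashes, so the title must be the last component, and a slot segment must always be present so that a bare /data/Foo never has to be reinterpreted later. A minimal sketch, assuming slot names never contain slashes; parse_data_path() and the alias table are hypothetical, and the "0" alias was only floated in the discussion, not agreed.

```python
from typing import Optional, Tuple

# Hypothetical alias table; "0" -> "main" was suggested, not decided.
SLOT_ALIASES = {"0": "main"}


def parse_data_path(path: str) -> Optional[Tuple[str, str]]:
    """Split '/data/<slot>/<title>' into (slot, title), or return None.

    Only the first slash after the slot name is a separator, so titles
    containing slashes (e.g. 'Foo/Bar') survive intact without %2F encoding.
    """
    prefix = "/data/"
    if not path.startswith(prefix):
        return None
    slot, sep, title = path[len(prefix):].partition("/")
    if not sep or not slot or not title:
        return None  # e.g. '/data/Foo' has no slot segment
    return SLOT_ALIASES.get(slot, slot), title
```

Rejecting /data/Foo outright, rather than reading it as a title in the main slot, is what keeps the scheme forwards-compatible: once bare titles have been accepted there, "main/" can no longer be told apart from a page title.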
[21:31:45] this is for machine consumption, we don't have to mix namespaces
[21:31:49] that smells like you are mixing implementation details like how a page is represented in blobs with long term article ids
[21:32:18] to me, such concerns are solidly API land
[21:32:24] not concepts and identifiers
[21:32:35] gwicke: i guess we have a different perspective on whether slots are implementation details.
[21:32:40] to me, they are not.
[21:32:50] they are components of the page.
[21:33:06] they surely are not at the same semantic level as the article itself
[21:33:08] and i need the ability to identify the component i want
[21:33:26] DanielK_WMDE, wouldn't that assume that each /data end point is only used in one page?
[21:33:43] DanielK_WMDE: identify to retrieve, or identify to reference in RDF?
[21:33:48] or both?
[21:33:50] you can't fix it later if you introduce /data/Foo
[21:33:59] bmansurov: not sure what you mean by that. we are talking about stable identifiers for the data that defines the content of a page.
[21:34:20] if you don't want to identify the slot then have a generic endpoint /data/*/Foo or something
[21:34:22] DanielK_WMDE, say some data is needed on two pages, how would that work?
[21:34:25] do you ever foresee a need to version slots?
[21:34:27] gwicke: both. identify is my main concern, but rdf uris should be resolvable, by convention.
[21:34:34] for example, when we split one slot into two?
[21:34:51] should old URLs still work?
[21:34:56] bmansurov: transclusion. that's not in scope for this discussion, though. this is not about how data gets into pages.
[21:35:08] ok thanks
[21:35:56] if there is a need for versioning etc, then you are basically describing an API
[21:35:58] gwicke: probably. for one of them. or none. depends on the specific case. i don't foresee such splitting to happen frequently, if at all. I can't guarantee it won't.
[21:36:15] I currently don't see a need for versioning slot names, no
[21:36:26] content models, yes
[21:36:31] that would be useful
[21:36:35] slot names... no
[21:36:43] you don't want to rule out replacing slots, though
[21:36:53] or changing their semantics significantly
[21:36:54] no. but then the old slot is gone.
[21:37:05] the identifier will no longer be resolvable.
[21:37:16] you could still provide it for backwards compatibility
[21:37:20] changing the semantics of a slot would be bad.
[21:37:28] yes, could.
[21:37:39] my point is that all these concerns are typical API concerns
[21:37:57] the ones you brought up, yes.
[21:38:31] so imho it makes sense to treat this as an API design question
[21:38:34] all i want is a stable identifier for the data that defines (a component of) page content.
[21:38:42] my vote is to have *something* after /data other than the title, so that it is at least forwards-compatible
[21:38:46] APIs need to be stable
[21:38:51] a single letter, a UTF-8 smiley, whatever
[21:39:03] so we use techniques like versioning
[21:39:03] ☺
[21:39:07] https://www.mediawiki.org/wiki/API_versioning
[21:39:28] 💩
[21:40:00] it doesn't make sense to version a stable URI, described as such
[21:40:12] gwicke: URIs need to be stable, that's why they typically do not use versioning ;)
[21:40:17] versioning makes it possible to keep the API stable
[21:40:32] while also allowing for inevitable changes
[21:40:45] gwicke: the need is exactly the *opposite* of a stable API's.
[21:40:45] if you don't version, you'll break the api at some point
[21:41:19] how so?
[21:41:23] for an API, I don't care much about stable URLs. I can change my client to use a different URL. I care a lot that the same URL will behave the same.
[21:41:33] just keeping the URL the same doesn't magically keep the semantics the same
[21:41:36] is it meant to be that comparing the URLs is equivalent to comparing the objects?
[21:41:36] But for URIs, I care that the same URI will always *mean* the same
[21:41:52] https://en.wikipedia.org/wikiv1/Main_Page
[21:41:53] is that implicit in URL stability?
[21:42:13] while with a version, you can keep that guarantee while also introducing /v2
[21:42:43] TimStarling: not sure about "the same", but yes: having several URLs for the same object should be avoided.
[21:42:52] which might remove slots, or change semantics of existing slots
[21:43:01] that's why there is "canonical" in the name of this proposal
[21:43:41] in any case, we are very much in API design land
[21:43:43] gwicke: keeping the URL the same means I'm still talking about the same thing. it does not guarantee that I still get the thing in the same format.
[21:43:47] and I thought that this was out of scope for now
[21:44:10] Was this the page you meant, legoktm: https://en.wikipedia.org/wiki/Wikiv1/Main_Page ?
[21:44:46] #info my vote is to have *something* after /data other than the title, so that it is at least forwards-compatible
[21:44:54] #info it doesn't make sense to version a stable URI, described as such
[21:45:01] Scott_WUaS: uh, no, I was making a joke/comment about putting version numbers in "stable URI"s
[21:45:12] ok, 15 minutes left. let's move on to the next (first) question:
[21:45:26] Thanks :)
[21:45:28] why not just use the /wiki path, with content negotiation.
[21:45:33] I take gwicke's point about versioning enabling stability, the question then becomes what do you mean by "stability"
[21:45:56] which is why I asked about comparing URLs
[21:46:05] indeed, stability of what. of an API? yes, I agree! for identifiers? no, quite the contrary.
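For URL comparison to stand in for object comparison, each page needs exactly one canonical URL, so different spellings of a title have to normalize to the same string. The sketch below assumes simplified MediaWiki-like rules (underscores and spaces are equivalent, runs of whitespace collapse, the first character is uppercased); real title normalization also covers namespaces, Unicode normalization and per-wiki capitalization settings, none of which is modelled. The host, the canonical_data_url() helper and the /data/main/ layout are illustrative only.

```python
from urllib.parse import quote

BASE = "https://commons.wikimedia.org"  # illustrative host only


def canonical_data_url(title: str, slot: str = "main") -> str:
    """Build one canonical URL for a title (simplified, hypothetical rules)."""
    # Treat underscores as spaces, collapse runs of whitespace, trim the ends.
    t = " ".join(title.replace("_", " ").split())
    t = t.replace(" ", "_")
    # First-letter capitalization, as on most wikis (namespaces ignored here).
    t = t[:1].upper() + t[1:]
    # Keep '/' and ':' readable; percent-encode other unsafe characters.
    return f"{BASE}/data/{slot}/{quote(t, safe='/:')}"
```

With this, comparing two canonical URLs as strings is equivalent to comparing normalized titles; it says nothing about whether the content behind the title stayed the same, which is the separate question of semantic stability.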
[21:46:20] let's move along though, yes
[21:46:34] if your entity semantics change completely, I would argue that you broke your stability promise
[21:46:49] i don't see that happening
[21:46:51] anyway
[21:46:52] #info if your entity semantics change completely, I would argue that you broke your stability promise
[21:47:13] ok, the /wiki path:
[21:47:31] using /wiki risks confusing non-browser clients that screw up their accept headers
[21:47:45] pro: it's already used to identify wikipedia pages, articles (descriptions of things), or concepts in several data sets (dbpedia, wikidata)
[21:48:05] using /wiki makes request logs more difficult to analyse
[21:48:13] #info having a version in your URL and content types lets you keep your stability promises, while also allowing for eventual changes in semantics
[21:48:14] con: /wiki refers to a "wiki page", not really "the data defining the content of the wiki page".
[21:48:27] the /wiki path is also tightly bound to the idea of a UI for interacting with the content
[21:48:32] #link https://www.mediawiki.org/wiki/API_versioning
[21:48:36] using /wiki introduces a requirement of frontend cache hacking that makes it difficult for others to follow our example
[21:49:05] including us following our own example if we ever change our frontend caching system
[21:49:08] well, /data will also need some hacking... is that really a lot better?
[21:49:16] yes, because it's low-traffic
[21:49:30] just route it to a single raspberry pi
[21:49:33] ok, so the argument is practical: don't mess with the fire hose
[21:49:55] yeah, per my second point, note that there is a logging fire hose
[21:50:06] we mess a lot with the fire hose anyway
[21:50:12] lots of normalization, rewriting etc
[21:50:24] #info /wiki is high volume and high profile. messing with it is scary.
[21:51:07] are you supporting this idea gwicke?
[21:51:45] semantically, it seems to me that /wiki and /data simply refer to different things.
[21:51:47] basically if you plan to support CDN caching, looking at the Accept header in the CDN is inevitable
[21:51:52] it doesn't seem to agree with the things you were saying about API versioning etc.
[21:52:20] compare https://commons.wikimedia.org/w/index.php?title=Data:Avignon_City_Wall.map&action=raw and https://commons.wikimedia.org/wiki/Data:Avignon_City_Wall.map
[21:52:34] gwicke: if you're just playing devil's advocate then we can move along
[21:52:38] are you asking for an explanation of how content negotiation works at the CDN layer?
[21:52:40] are these two representations of the same thing? or are they different things - one is a wiki page with a map on it, the other is geoshape data?
[21:53:02] I'm assuming that you are familiar with that
[21:53:41] I'm saying that if nobody here actually wants /wiki as the stable URI then we can dispense with it without arguing every fine point of what it would imply
[21:54:07] this use case seems to be similar to the mobile redirect
[21:54:14] which is also set up at /wiki/
[21:54:27] i actually see good arguments for using /wiki. But i'm still convinced that it's simply wrong semantically. and more complicated and risky practically.
[21:54:35] so imho the CDN argument is not very strong
[21:55:20] gwicke: any thoughts on the semantics of the two example urls above?
[21:55:36] what does each refer to, logically, in your mind?
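Serving data at /wiki would mean choosing the representation from the Accept header, and any cache in front of it must then vary on that header, which is the CDN cost discussed above. A minimal sketch of that negotiation, assuming only two representations are offered (the OFFERED list is hypothetical) and a simplified Accept parser (q-values and wildcards honoured, the most specific matching range wins, other parameters ignored). It is not how Varnish or MediaWiki actually do this.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representations of /wiki/<title>, in server preference order.
OFFERED = ["text/html", "application/json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs (simplified)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def quality(offered: str, prefs: List[Tuple[str, float]]) -> float:
    """q-value of the most specific media range that matches `offered`."""
    major = offered.split("/")[0]
    best_specificity, q = -1, 0.0
    for media_range, range_q in prefs:
        if media_range == offered:
            specificity = 2
        elif media_range == major + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q = specificity, range_q
    return q


def negotiate(accept: Optional[str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Pick a representation (None means 406) and the response headers."""
    prefs = parse_accept(accept) if accept and accept.strip() else [("*/*", 1.0)]
    chosen, chosen_q = None, 0.0
    for offered in OFFERED:
        q = quality(offered, prefs)
        if q > chosen_q:  # ties keep the earlier, server-preferred type
            chosen, chosen_q = offered, q
    # Caches must key on Accept, or they will serve one representation for all.
    headers = {"Vary": "Accept"}
    if chosen is not None:
        headers["Content-Type"] = chosen
    return chosen, headers
```

A client that sends no Accept header, or only */*, gets HTML here, which is the case where a sloppy non-browser client would silently receive the wrong representation.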
[21:55:49] I'm not asking about the strength of any particular argument, I'm asking whether, given there is 5 minutes remaining, whether you want to register an objection to DanielK_WMDE's plan based on it not using /wiki
[21:55:58] I'm not sure how those would be related to the actual article
[21:56:14] gwicke: one of them is the actual article
[21:56:34] most users would link to /wiki/Articlename
[21:56:43] this goes to whether we need to schedule another meeting for this or if we can just let DanielK_WMDE go ahead and implement it
[21:56:46] and consider that "the article"
[21:56:59] gwicke: yes. that's my point.
[21:57:24] gwicke: to me, "the article" is distinct from "the data that defines the article's content".
[21:57:47] ah, okay - I took the Data: prefix to have a meaning in this context
[21:57:57] so this is just an article in the Data namespace?
[21:58:00] yep
[21:58:10] kk
[21:58:17] 2 minutes now, any decision?
[21:58:30] Thank you, Tim, Daniel, Gabriel, Legoktm and All!
[21:59:07] can this go to last call?
[21:59:30] I'm personally not convinced that we should introduce too many identifiers for an article
[21:59:31] that would still give gwicke a week to object :)
[21:59:55] #info I'm personally not convinced that we should introduce too many identifiers for an article
[22:00:06] #info gwicke: to me, "the article" is distinct from "the data that defines the article's content".
[22:00:40] earlier you said that you looked for an id for "the article" though
[22:00:43] gwicke: for the rdf mapping of wikidata, we have (iirc) 7 semantically distinct identifiers for each property ;)
[22:00:46] I don't think that's really strong enough to block it
[22:01:14] gwicke: i did? where? i have tried hard to always say "article data", not "article".
[22:01:45] when we discussed page ids, titles etc
[22:02:07] maybe i got sloppy, then. sorry about that.
[22:02:12] Krinkle: are you still here?
[22:02:27] Yep
[22:02:34] Was AFK for a few minutes though
[22:02:36] catching up
[22:02:53] I'm wondering if we should have a majority decision on this
[22:02:57] so are you saying that you don't need to identify the article any more?
[22:02:58] only slots?
[22:04:08] the proposal is for /data//Page_title to be put to last call
[22:04:09] no, i have a need to identify the data of (slots of) an article, and distinguish that from the article as such.
[22:04:13] Regarding earlier argument about page renames, that goes both ways and applies to page links as well. When a concept is split or re-purposed we can only hope that the title remains stable, but which page was organically which is not as predictable from a semantic view point. That's why titles make more sense to me.
[22:04:32] where is some string to be defined, possible generic, let the server decide which slot
[22:05:05] I imagine from a coding perspective /data/ would have to be configured as such rewrite won't exist by default and should presumably default similar to how wgArticlePath has a default and the site admin commits to making that work and maintain back-compat in the rare event it does change.
[22:05:49] ok, so you are happy with this proposal as it stands?
[22:05:52] Imho the main discussion here is about the behaviour of the entry point, implemented as a special page. If we wanna rewrite that to a shorter form or not is imho less importand and I don't mind it being inside /wiki/ either. We do that for OAuth as well.
[22:06:04] important*
[22:06:19] this seems to have morphed from a content negotiation issue into a proposal for a REST API
[22:06:33] as far as I'm concerned, if you're happy with multiple options, it should be the implementor's choice as to which one is selected
[22:07:24] that is all I'm asking here, whether there are any objections to this which will actually block implementation
[22:07:25] ftr, I think the introduction of a new REST API needs a bit broader discussion
[22:07:56] gwicke: i have no idea where you are seeing a new api. or any api, really...
[22:08:30] see it as a standard naming scheme for some kinds of RESTbase resources.
[22:09:15] agreed versioning seems undesirable in this proposal.
[22:09:19] if DanielK_WMDE and Krinkle will agree to the question I'm putting, then we'll mark it as done and close the meeting
[22:09:23] it is a REST interface that applications are expected to hit
[22:09:29] sounds like an API to me
[22:10:13] TimStarling: i'm happy with the proposal as it stands ;)
[22:10:34] We shouldn't hardcode the semantic as a revision slot, but yes it should accommodate future expansion beyond data/. We can call it slot for now I suppose.
[22:10:35] <Krinkle> LGTM
[22:10:49] <DanielK_WMDE> gwicke: that's true for any URL then.
[22:10:54] <TimStarling> ok, so the amended proposal is
[22:11:07] <TimStarling> an endpoint /data/<something>/Page_title
[22:11:15] <DanielK_WMDE> \o/
[22:11:25] <TimStarling> where <something> is to be defined and not really needed initially
[22:11:40] <TimStarling> but provided for future expansion of the endpoint
[22:11:54] <Krinkle> I'm actually thinking about title/<something> as well. As much as I hate hte fact that we still haven't adopted normalised slugs, it seems reversed that it comes before the title. REST got this rgiht as the expensse of awkward and fragile slash encoding.
[22:12:07] <Krinkle> at the expense*
[22:12:25] <DanielK_WMDE> #info ftr, i think it's going to be important to clarify the relationship between these URLs and the REST API, to avoid confusion.
[22:12:29] <TimStarling> that's what you get for having slashes in titles
[22:13:10] <DanielK_WMDE> Krinkle: yes, much prettier, too brittle imho...
[22:13:19] <gwicke> it seems that you all seem to associate versioning with instability
[22:13:32] <Krinkle> We'd have a lot fewer renames if slugs were introduced and titles more liberally editable. But alas, separate discussion.
[22:13:36] <gwicke> and lack of versioning with stability
[22:14:00] <DanielK_WMDE> gwicke: of identifiers, yes. it's the reverse for apis.
[22:14:18] <gwicke> identifiers of what, though
[22:14:26] <TimStarling> so Krinkle, considering the time (1:14 into a 1:00 meeting, after 2:14 on this topic altogether)
[22:14:29] <gwicke> is the thing you identified still there?
[22:14:44] <TimStarling> are you objecting on the basis of the order of the title and the <something>
[22:14:49] <gwicke> you are basically saying that the semantics of slots will never change
[22:15:10] <TimStarling> oh, speaking of running over
[22:15:18] <gwicke> and also that a version would somehow make them change more quickly
[22:15:20] <Krinkle> pages do get deleted, Not Found or Gone is the appropriate response for that.
[22:15:33] <TimStarling> I see that I have another meeting scheduled starting 15 minutes ago with vcoleman
[22:15:58] <Krinkle> TimStarling: I'm agreed just wanna make sure the others still agree as well after I brought up the issue of slot being before title.
[22:15:58] <DanielK_WMDE> TimStarling: tell her it was my fault ;)
[22:16:14] <TimStarling> ok
[22:16:35] <TimStarling> #agreed move to last call with proposal /data/<something>/Page_title where <something> is to be defined and not really needed initially
[22:16:44] <TimStarling> #endmeeting
[22:16:44] <wm-labs-meetbot> Meeting ended Wed Apr 12 22:16:44 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:16:44] <wm-labs-meetbot> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-04-12-21.07.html
[22:16:44] <wm-labs-meetbot> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-04-12-21.07.txt
[22:16:44] <wm-labs-meetbot> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-04-12-21.07.wiki
[22:16:45] <wm-labs-meetbot> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-04-12-21.07.log.html
[22:16:51] <DanielK_WMDE> \o/
[22:17:00] <DanielK_WMDE> thank you TimStarling, gwicke and Krinkle!
[22:19:04] <Scott_WUaS> +1