[22:01:10] #startmeeting REST API response format version negotiation: https://phabricator.wikimedia.org/T124365 [22:01:31] hello [22:01:50] meetbot broken? [22:01:54] * legoktm waves [22:01:56] yeah, not sure it's around [22:01:58] greetings All [22:02:33] anybody knowledgeable about meetbot around? [22:02:48] * gwicke looks at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Tools/meetbot [22:03:20] I'm doing it [22:03:24] I don't have an account in tool labs [22:03:28] hmm, maybe not [22:04:16] maybe I don't have an account either, or the restricted bastion is not documented [22:04:17] tech problems - so typical for engineers :D [22:04:23] well, worst case we can just pretend it's working and then grep everything with #info etc from our own logs [22:04:50] let me give a short intro on the RFC [22:05:00] #link https://phabricator.wikimedia.org/T124365 [22:05:24] the scope of this RFC is negotiating the response format of REST API end points [22:05:49] the REST API has a global version (currently v1) that is used to version the API as a whole [22:06:24] it will be incremented whenever the API layout or the request interface of stable end points is changed in incompatible ways [22:06:50] I have just been reading http://www.troyhunt.com/2014/02/your-api-versioning-is-wrong-which-is.html [22:06:52] the 'stable' in there refers to the per-entrypoint stability markers [22:07:04] as described in https://www.mediawiki.org/wiki/API_versioning [22:07:04] and http://www.lexicalscope.com/blog/2012/03/12/how-are-rest-apis-versioned/ [22:07:52] now, getting back to the RFC, we still have a need to sometimes change the response returned from a *stable* entry point in incompatible ways [22:08:49] concretely, the Parsoid team is about to move data-mw (template metadata etc) from inline attributes to a separate JSON blob; this change would break existing clients if we don't provide a means to continue to receive the old response [22:09:19] if they don't want to switch to the new layout, that is [22:09:48] the main questions in this context are: 1) How to request a specific response format, and 2) What to do if no format was specified [22:10:20] or if they cannot be switched to accept the new layout at the exact same moment the new format is dpeloyed. [22:10:33] for 1), there are two "wrong" proposals, meaning that both have pros & cons [22:10:52] https://www.mediawiki.org/wiki/Talk:API_versioning has a list of pros / cons for each [22:11:21] ah yes, I see it already links to that Troy Hunt article [22:11:39] *nod* [22:11:49] his approach was to just do it all the ways [22:12:07] that way you have a URL if you need one for some reason, but normally you can use an Accept header [22:12:28] for 1) i'd be for the accept header [22:12:36] the downside is that you get the complexity of both [22:12:57] looks much cleaner and is more API-like [22:12:58] wrt VCL complexity and documentation [22:13:30] but, the pro is that you can easily link to a specific version [22:13:45] *and* link to the latest [22:13:46] right, but with query params you might get into the mw api query params caching trap we have right now [22:14:40] yeah [22:14:49] wrt linking, the content render doesn't actually change [22:14:56] it's doable though, with a lot of logic in varnish [22:15:18] it's apparently more common to have the version in the path though [22:15:27] TimStarling: right, but it's less of an effort to do so with clean headers [22:16:04] * mobrovac "it's doable" should be every engineer's moto :P [22:16:06] global api versions are very common in the path, per-entrypoint versions a bit less so [22:16:14] what do the URIs identify? An API entry point, or the data returned by that entry point (or eve nthe thing described by the data returned by the entry point)? [22:16:34] in wikidata, we use separate URIs fpr the Thing and the data describing the Thing. [22:16:46] the first can't be versioned, the second one can [22:17:08] it's typically slightly different representations of the same logical resource [22:17:16] DanielK_urlaub, example urls for 'the thing' and 'the data describing the thing'? [22:17:52] so very much in the spirit of HTTP content negotiation [22:18:09] parsoid already has /v2 [22:18:15] subbu: https://wikidata.org/entity/Q42 and https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf [22:18:29] TimStarling, and /v3 [22:18:31] TimStarling: that's the global API version, which is a different issue [22:18:52] because the api uris completely changed. [22:19:17] subbu: ...and https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf?revision=293317199 [22:19:34] DanielK_urlaub: how do you represent changes in your rdf format? [22:19:51] gwicke: my point is: if your api returns data that is itself versioned, you have multiple dimensions of versioning to deal with [22:19:57] and how do you request a specific RDF format version? [22:20:10] you may have to encode both in the uri without creating confusion [22:20:19] ah right, /:domain/v3 [22:20:26] it's not at the top level [22:20:33] DanielK_urlaub, so, clients making requests to Q42 have to specify the rdf version in the request? [22:20:34] gwicke: there's https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf?flavor=dump for example [22:20:36] gwicke: specific rdf version https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf?revision=293317199 [22:21:03] DanielK_urlaub: revision as in MW revision is a different resource already [22:21:17] of course [22:21:37] (Article titles should be capitalized, which will prevent an article named 'v3' from being an additional source of ambiguity.) [22:21:42] i.e. a request to https://wikidata.org/entity/Q42 might return a new data object after the rdf is updated. [22:21:45] but if i use ?version?1234 in the uri, it's not obvious if that refers to the version of the resource, or the version of the api [22:22:31] the core question is whether we should use Accept headers, a query parameter, or both [22:22:37] subbu: indeed it will. and if you resolve it, it actually redirects to the document URI (also without a fixed version, though) [22:22:51] *nod* [22:22:51] so, to clarify, we are discussing here returning *the same revision* of the content, only slightly differently represented [22:23:26] yes, the revision question is orthogonal / OT [22:23:36] gwicke: i guess my point is: if it's a query parameter, make the name very clear. apiversion=123, not just version=123 [22:23:54] DanielK_urlaub: makes sense [22:24:15] the proposal presents a strawman of ?accept=text/html;profile=mediawiki.org/specs/html/1.0.0 [22:24:18] yea, the revision thing is only relevant in as much we should avoid clashes and confusion when naming parameters [22:24:46] #link https://www.mediawiki.org/wiki/Talk:API_versioning [22:24:58] gwicke: does the profile only govern the output representation, or also the interpretation of the input? [22:25:13] it's the mime type of the response [22:25:21] so only affects the response [22:25:41] so how would you version breaking changes to the parameters an api accepts? [22:25:48] backwards-incompatible changes to the request part would force an increment of the major API version [22:26:10] (or the creation of a new entry point, to avoid breakage) [22:26:21] so gwicke, your preferred solution is the Accept header? [22:26:23] and how does the client specify the major api version in a request? [22:26:56] DanielK_urlaub: it's part of the URI - /api/rest_v1/ ... [22:27:05] TimStarling: if I have to pick one, I think I am leaning slightly towards the Accept header [22:27:17] +1 [22:27:41] mobrovac: i thought that was only one of the alternatives, and not the one that gwicke is prefering [22:27:42] if I had to pick as a user of an API, uri query params for versioning would be simpler from a usability PoV, but understand it complicates aspects of the implementation .. and accept header feels cleaner. [22:27:44] fwiw, i think allowing both could create too much confusion [22:28:14] uri query params for per-endpoint format versioning, to be clear. [22:28:21] the Accept header would be forwarded to parsoid? [22:28:21] one thing that's strong about Accept is the symmetric use of the content type [22:28:31] gwicke: i'm getting the idea that the accept header is good for specifying the expected output format, but versioning of the api input should go into the url. [22:28:35] DanielK_urlaub: the overall api version is contained in the url itself, but here the focus is on small mime-type-like changes in the returned content [22:29:04] subbu: why would it be simpler for you? [22:29:18] TimStarling: potentially; there are optimizations we can apply if an older version of the same content is in storage [22:29:19] it's an API - so you have to build an application that talks to it [22:29:35] yeah but didn't you previously say that version conversion is not RESTBase's responsibility? [22:29:51] TimStarling: we have been discussing a html2html end point, which would convert one version of html/data-parsoid/etc to another [22:30:13] yes, the conversion itself is not, but making the request to parsoid to ask for the conversion might be [22:30:13] mobrovac, testing via browser params vs headers. [22:30:13] grr. [22:30:14] in the browser via params [22:30:15] urls are easy to construct [22:30:20] that's all behind the scenes, though [22:30:33] parsoid could provide a format upgrade/downgrade API [22:30:45] oh right, sorry - i was under the impression that gwicke was proposing to also move the major version marker into the accept header. [22:30:50] #link https://phabricator.wikimedia.org/T114413 [22:31:09] TimStarling, yes, we need to provide that. [22:31:10] subbu: that's what curl is here for [22:31:12] :P [22:31:19] mobrovac, i was afraid you were going to say that. :) [22:31:19] TimStarling: ^^ is a task discussing a html2html end point [22:31:43] subbu: I agree, that's a strong point for URLs [22:32:00] we could have a testing gateway which does nothing but maps query strings to request headers [22:32:00] so why not allow both? [22:32:06] if that is really needed [22:32:14] a mitigating factor is that the swagger sandbox shows accept headers by default: https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_html_title [22:32:19] TimStarling, fair enough [22:32:44] subbu: with our current content types which contain slashes and whatnot, i'd like to see you uri-enconding that by hand :P [22:33:03] DanielK_urlaub, you mean by having the query string one to do an internal redirect to the accept header version? [22:33:12] it's pretty normal for REST APIs to be broken when you can't set custom request headers [22:33:17] TimStarling: yeah, that could be an option if we can manage to do so without overly complicating the VCL config [22:33:24] exactly TimStarling [22:33:26] and I think gateways are a normal solution to that [22:34:11] my proposal would be to start with one option, and consider adding a second alternative in a second step [22:34:28] pragmatically, we probably don't have the time to do both at once [22:34:39] nor does it make sense really [22:35:46] picking one version doesn't stop us from adding an alternative later [22:36:11] i think we should move to part (2) of the rfc if part (1) feels resolved .. which it seems it is .. in terms of accept-header being the preferred solution. [22:36:16] s/version/mechanism/ (to avoid confusion) [22:36:34] * mobrovac looks forward to (2) [22:36:54] TimStarling, DanielK_urlaub: does subbu's summary reflect your view? [22:38:00] gwicke: yep [22:38:02] yes [22:38:41] #summary overall slight preference for going with Accept headers, but option for adding query strings later remains [22:38:45] okay, great [22:38:48] so, lets move to 2) [22:39:04] subbu: redirect, rewrite, whatever. can be done by the api itself, or the webserver, or a proxy [22:39:07] What to do if no format was specified [22:39:28] DanielK_urlaub, makes sense. [22:40:03] I think general consensus is to strongly encourage clients to specify a version in their request, so that they can benefit from stable / predictable responses [22:40:14] +1 [22:40:23] which implies serving the latest version [22:40:33] which i'm +1 for [22:40:43] i think option 3. from the RFC is really a strawman. [22:40:46] one option is to take this even further, to the point of *requiring* clients to specify a version, and reject requests without [22:40:53] i don't think it needs to be considered. [22:41:02] gwicke: one way to do that is to make a version-les request redirect to a versioned request. but that only works if the version is in the url [22:41:31] DanielK_urlaub: that's effectively the same as returning a specific version [22:41:43] gwicke: requiring them to specify it seems a bit like a high-entry bar [22:41:44] which brings us to 1) or 2) [22:41:56] except that the cleint actually gets told "no, look there!" in http speak. [22:42:35] option (1) means making the new versio nthe default without giving people a chance to test agains the new version, if i understand correctly [22:43:15] DanielK_urlaub: yeah, that's true -- in a URL-based scheme this would also slightly reduce cache fragmentation [22:43:50] yes, option (1) very strongly encourages people to set a version explicitly [22:44:02] if they don't, then the format can change at relatively short notice [22:44:07] not as strongly as option 3 [22:44:15] ...by breaking their stuff a couple of times... [22:44:28] yeah, lets call it "intermediate encouragement" [22:44:28] for the particular change under discussion, I guess option 1 is OK [22:44:39] since a lot of clients are not going to care about data-mw [22:44:49] right .. option (1) only makes sense when enforced via (3). [22:45:07] in theory, (2) the the best option. but is it worth the wait, given that people genenrally don't do anything before their stuff breaks, no matter how much you announce? [22:45:26] * DanielK_urlaub is one of the people who only notices this tsuff when something breaks [22:45:26] TimStarling, right .. so the qn. is whether this rfc is deciding a general policy for all future API changes or for this current upcoming one. [22:45:29] legoktm brought up the issue of no delay being long enough for everybody [22:45:39] (appreciating the focus of this and recent #wikimedia-office meetings, and the developing phabricator focus too) [22:45:40] in other cases I have argued for a fatal error because terrible things happen when you serve the new API to an old client [22:46:06] terrible as in subtly terrible, which is worse than obviously broken [22:46:52] i agree .. based on this discussion it seems it is either (2) or (1)+(3). [22:46:54] option 2) might encourage clients to not set the explicit version, and rely on announcements instead [22:48:02] 1-2 months is an extremely short deprecation period [22:48:19] I'm personally a bit more optimistic than subbu about clients having enough pressure to be explicit with 1) [22:48:48] gwicke: you could put an annoying warning in every response to an unversioned request. "use the versioned url! we are otherwise not responsible fo rbroken software and curly toenails!" [22:49:01] DanielK_urlaub: hehe, yes [22:49:08] :) [22:49:08] and.. delay the response by 10s [22:49:15] gwicke, i guess the question is what I asked TimStarling earlier .. is this about this particular format change only? or is this a policy discussion for all future api changes? [22:49:42] for mobile apps the developer does not fully control the update schedule [22:49:44] gwicke: limit batch size to 5 ;) [22:49:49] subbu: we would like to establish a consistent policy for this [22:49:58] i'd like this to be a general discussion, and not focused on data-mw so much [22:50:04] I don't know what the adoption curve looks like when a new version is pushed out [22:50:14] so that we can document expectations, and also be a bit more aggressive about declaring specific entry points stable [22:50:45] alternatively, we say that decision made here applies to future changes and if necessary, a new rfc gets filed when the policy needs to be amended based on experience .. or maybe that is stating the obvious. [22:50:46] but I know android offers lots of options to limit data usage due to updates, like disabling updates for specific apps or just not scheduling updates at all [22:51:22] TimStarling: long-term stability is only possible with an explicit version in the request [22:51:51] with that, we also have data about usage of specific versions, which we can use to make decisions about dropping support [22:52:03] that seems reasonable. [22:52:47] so overall I am hearing a concern about clients not being explicit unless forced [22:53:40] I guess one option would be to go with strong encouragement first, and then see what the uptake is like [22:54:15] ok [22:54:24] gwicke: but, as DanielK_urlaub pointed out, most people seem to feel concerned only once their stuff break [22:54:41] so in that regard opts (1) and (2) might not be that different after all [22:54:53] mobrovac, what? [22:55:09] yeah, my feeling is also that 2) doesn't add much over 1) [22:55:22] the main question seems to be 1) or 3) [22:55:49] i think i had a preference for option (2) but as gwicke said i am being conservative / not as optimistic as him about client adoption rate in general .. but for the immediate change at hand, it is a non-issue. [22:56:03] subbu: (14:45:26) ***DanielK_urlaub is one of the people who only notices this tsuff when something breaks [22:56:23] i didn't understand why (1) and (2) are not different at all. [22:56:38] such breakage can produce learnable moments™ [22:57:06] mobrovac: what else do you have in mind in terms of a general discussion? [22:57:09] subbu: because if you announce it and wait for 2 months, and only once you change it people start reacting then you've effectively wasted 2 months and could have bumped the version in the first place [22:57:12] there is a delta between when the default format changes ... ah, but i think you guys are saying no one pays attention till stuff breaks. [22:57:15] subbu: in terms of how strongly clients are encouraged to be explicit, 2) seems to be slightly weaker than 1) [22:57:39] subbu: that was the point, yea :) [22:57:44] we're just about out of time [22:57:45] while the concern with 2) is that clients not paying attention won't be helped much by a two-month delay [22:57:54] is there an interim solution that everyone is happy with? [22:58:06] Scott_WUaS: nah, i just didn't want the discussion to be confined to the data-mw change at hand [22:58:08] got it .. i can go with (1). [22:58:22] :) [22:58:23] sounds good [22:58:33] subbu: see, we made you optimistic suddenly :D [22:58:45] DanielK_urlaub's argument is a good one. [22:58:59] yeah [22:58:59] no, you clarified that my pessimism was warranted. :) [22:59:32] how about: we start with option 1), and monitor the percentage of clients who are explicit over the first few changes [22:59:56] when take-up is too low and people keep complaining, we revisit and consider switching to 3) [23:00:16] does this sound like a sensible approach? [23:00:26] option 1 sounds fine to me. though it would be nice to give people who *do* pay attention a chance to test. but i guess that would be the people who'd be using a verisoned request in the first place. [23:00:28] wfm [23:00:32] so yea. [23:00:34] yes [23:01:02] mobrovac: you on board as well? [23:01:13] yup [23:01:33] awesome, just in time ;) [23:01:42] but DanielK_urlaub's idea is actually good [23:01:42] documentation is key [23:01:56] that could be achieved by putting it up in labs a week earlier or so [23:01:56] make sure the documentation implies that Accept is required [23:02:03] TimStarling: yes, agreed [23:03:13] #info broad support for starting with option 1), and monitoring the percentage of clients who are explicit over the first few changes; if take-up is too low and people keep complaining, revisit and consider switching to 3) (enforce version in the request) [23:03:43] make sure you copy the IRC log to the phab event, since there is no meetbot [23:03:52] thank you, this was a very productive and enjoyable discussion! [23:04:17] TimStarling: yes, will copy a summary & link to a full log [23:04:22] #endmeeting