[21:02:55] #startmeeting
[21:02:56] TimStarling: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
[21:03:13] #topic API roadmap | https://meta.wikimedia.org/wiki/IRC_office_hours | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE). | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:03:21] #link https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap
[21:04:32] do we have anomie and yurikR?
[21:04:38] TimStarling: I'm here
[21:04:49] yep
[21:06:57] can you tell us what has been done on this API work since the architecture summit?
[21:07:56] I've started working on the stuff in the document. I added Gerrit links to each item as patches got submitted, and moved a few things to a "completed" section.
[21:08:44] Since we're taking things slow as far as deprecation goes, some items have their patches merged but still need an analysis of whether people have actually changed their code.
[21:10:11] you mean like token handling?
[21:10:35] Yes
[21:12:54] it looks like you need code review on some changes
[21:12:59] Yes, I do
[21:15:02] is there anything else you need?
[21:16:39] Not really.
[21:16:46] I'm still not too fond of the decision to go with format=json2 for https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap#Changes_to_JSON_output_format, but I still agree with the points you made at Wikimania that clean breaking is better than random mystery breaking.
[21:16:47] I have a set of API feature requests from Tomasz that he sent in June
[21:17:09] I would like to see those
[21:17:17] I'll forward them
[21:17:51] the main one that is relevant here is a request for "chain queries"
[21:18:13] "The fewer queries we have to send, the better it gets for our users' batteries."
[21:19:01] so I suppose we are talking about doing multiple actions in a single POST request
[21:19:38] We already have generators for the common case in action=query. Details on what other "chains" he's thinking of would be useful.
[21:19:59] yeah, he didn't give details, but I assume he knows about generators already
[21:22:24] don't forget that SPDY / HTTP/2 is around the corner
[21:23:34] well, if we are just talking about doing several unconnected API queries in a row, that could be done with pipelining, if the client supported that
[21:23:42] which eliminates some of the issues that generators are designed to address
[21:24:00] gwicke: No it doesn't.
[21:24:00] but what if you are taking some data from one query and using it in the next query?
[21:24:18] then it could be arbitrarily complicated
[21:24:22] TimStarling: right, that is the bit that isn't addressed
[21:25:07] security is another relevant aspect to consider
[21:25:15] DoS in particular
[21:26:25] you mean DoS by means of an expensive query batch?
[21:26:26] we shouldn't provide entry points that allow somebody to take down the API cluster by visiting some static web page with their cell phone
[21:27:03] there is a security bug with an example page
[21:28:18] #62615
[21:28:49] I would actually prefer to keep queries separate too
[21:29:08] if you want to chain requests, let's rely on the HTTP-level protocol
[21:29:36] but splitting it up implies duplicated overhead
[21:29:49] if some data is needed for a subsequent request, we either create a specific API that understands that (e.g. generators for query and others)
[21:29:57] or rely on gzip compression to take care of it
[21:30:30] well, the overhead will be negligible if they reuse the same connection, plus caching might make it much more efficient
[21:30:47] with combining done at the API level, caching is totally busted
[21:31:00] I mean in Varnish, Apache and HHVM
[21:31:03] I don't think anybody is proposing to get rid of generators or chaining altogether -- it's just that we should be careful about what we use them for, and keep in mind how HTTP/2 affects the trade-offs
[21:31:15] there is per-request overhead at each level
[21:31:22] especially in HHVM/MW
[21:31:42] re: Tomasz's requests, I think the problem there is action=mobileformat, which apps use (this was the reason for asking about pipelining, IIRC)
[21:31:45] also in MySQL
[21:31:48] and that doesn't support generators or anything
[21:31:58] so over time things have slowly been tacked on to it
[21:32:16] * anomie sees no action=mobileformat on enwiki
[21:32:19] there's a big difference in MySQL CPU usage between doing a single query that gets information about 100 pages, and doing 100 queries, one for each page
[21:32:28] anomie: gah, action=mobileview
[21:32:48] TimStarling: the same is not necessarily true if each of those pages is stored on a different node
[21:33:05] JetLaggedPanda, there was a big change a while ago that allowed any module to use generators
[21:33:15] JetLaggedPanda: I'd have to look at what exactly action=mobileview is doing, but offhand it sounds like it needs any unique bits rolled into core. Much like a lot of MobileFrontend.
[21:33:22] so now mobileview simply needs to be updated to use generators
[21:33:25] anomie: I agree, yeah
[21:33:30] we're not going to split storage across hundreds of nodes
[21:33:54] yurikR: yeah, that would be good too, although perhaps it needs a general query prop= as well
[21:34:01] perhaps not hundreds, but we already use dozens
[21:34:25] yurikR: *also*, perhaps this could be solved by simply making mobileview HTML a prop= for action=query, but I guess that'll have caching implications
[21:34:28] I don't think so
[21:34:46] JetLaggedPanda, yes, I think it should have been done that way :)
[21:35:14] TimStarling: I agree with your general point, it's just that it might not be an eternal truth to the same degree it is right now
[21:37:58] #info implementation by anomie is proceeding, some changes just need code review and merge
[21:37:59] there are, for example, wins in making more API requests static by storing or caching them; combined with the different cost structure of HTTP/2, some applications might actually perform better doing a few parallel requests than hitting a custom, uncached entry point
[21:38:41] #info Tomasz requested a "chain query" feature, but we need specific requirements
[21:39:06] I see it more as a gradual shift
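For context on the generator chaining discussed above (21:17:51 through 21:33:22), here is a minimal sketch, not part of the log, assuming a standard MediaWiki action API endpoint and the Python "requests" library. A generator feeds one module's page list straight into a prop module, so a single HTTP request does the work of two round trips:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # any action API endpoint

params = {
    "action": "query",
    "format": "json",
    "generator": "allpages",  # step 1 of the chain: enumerate pages
    "gaplimit": 5,            # generator parameters take a "g" prefix
    "prop": "info",           # step 2: fetch page info for each result
}
pages = requests.get(API, params=params).json()["query"]["pages"]
for page in pages.values():
    print(page["title"], page["length"])
```

This data dependency between the two steps is exactly the part that HTTP-level pipelining cannot express (see 21:24:00 and 21:24:22).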
[21:39:49] you can't cache API responses
[21:40:33] it's not technically impossible
[21:40:40] maybe you could if it were REST, but it is too difficult to invalidate the multiple URL variants enabled by the action API
[21:40:44] Some API responses can be cached, mostly action=query. We already emit cache-control headers indicating what MediaWiki thinks about cacheability.
[21:41:04] True, people might have stale caches then.
[21:41:13] the client requests cache-control headers
[21:41:38] the client is explicitly requesting a stale cache, since there is no way to update those caches once they are generated
[21:42:25] * gwicke nods
[21:42:45] maybe we could normalize requests in Varnish...
[21:42:46] there's a lot of stuff on that RFC page. perhaps it would be good to split it up to ease discussion.
[21:42:48] The major opportunity for caching is revision content, for which gwicke is already working on a REST API specifically intended for heavy caching.
[21:43:12] the way things are structured now, I'm afraid some high-profile discussions may drown out talk about some finer points
[21:43:28] I think it might be worth looking for other resources that could potentially be cacheable with the right URL structure
[21:43:50] and have the right granularity / access pattern for this to make sense
[21:43:59] even with normalization, you still have things like rvprop
[21:44:28] with REST, you just send all the data, but with api.php, each application will request a different rvprop
[21:45:29] so even in that simple case, you multiply the cache space requirement several times over
[21:46:20] yeah, it only makes sense if the number of variants is more limited
[21:46:38] which is something we could try to move towards for newer modules
[21:46:55] where the trade-offs make sense
[21:47:02] for purging, imagine if you had to send an HTCP purge request for each rvprop combination
[21:48:00] DanielK_WMDE: There's basically no discussion happening there at the moment, so I doubt anything is being drowned out. Although at some point (not now) I'd still like to hear your thoughts on what makes things like ApiResult::setIndexedTagName hard for you to use (without getting into redesigning the whole thing around a forest of objects; that was discussed enough at Wikimania IMO).
[21:48:06] returning more props by default would probably not make a big difference in response size, and could still result in a faster response if the response is cached in exchange
[21:49:39] there are some entry points where the choices could perhaps be reduced a bit without major ill effects
[21:50:22] #info gwicke suggests we consider a gradual shift towards greater edge caching coupled with the use of SPDY, as a replacement for batches embedded in single queries (incl. generators)
[21:50:53] that's overstating it quite a bit
[21:51:24] the meetbot command is unprivileged, you can do your own #info if you like
[21:52:59] #info s/as a replacement for batches/as a replacement for *some* batches and expensive generators/
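A rough illustration, not from the log, of the variant problem TimStarling describes at 21:44:28 and 21:47:02: every distinct rvprop value is a distinct URL, hence a distinct cache object and a distinct purge target. A sketch in Python:

```python
from itertools import combinations

# Six of the real rvprop values accepted by prop=revisions
rvprops = ["ids", "flags", "timestamp", "user", "comment", "content"]

# Unordered, non-empty subsets alone give 2^6 - 1 = 63 URL variants...
variants = [
    "|".join(c)
    for n in range(1, len(rvprops) + 1)
    for c in combinations(rvprops, n)
]
print(len(variants))  # 63

# ...and without request normalization in Varnish, parameter order matters
# too, so rvprop=ids|user and rvprop=user|ids would be cached separately.
for v in variants[:3]:
    print(f"api.php?action=query&prop=revisions&titles=X&rvprop={v}")
```

Each of those 63 objects would need its own purge whenever the underlying revision data changed, which is why a fixed-response REST URL is so much easier to cache and invalidate.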
[21:54:06] should we mark this RFC as approved?
[21:54:14] I also think that we could do some of the assembly and orchestration in an intermediate layer
[21:54:51] Netflix, for example, has been doing something like that: http://techblog.netflix.com/2012/07/embracing-differences-inside-netflix.html
[21:55:11] * Krenair thinks we should
[21:55:16] I think the reason RFCs don't get approved is that we worry that by marking an RFC approved, we are approving every little aspect
[21:55:59] It's fine with me to mark it as approved; I've been treating it that way for a while now.
[21:56:22] The only drawback is that it might discourage further discussion and further things for my "TODO" list.
[21:57:15] yeah, maybe it makes sense for something this complex to be a living document
[21:57:44] We could move the living document portion of it out of the RFC, although I'm not sure what would be left in the RFC then.
[21:57:45] ...or factor out some parts that can be considered agreed on and treated as a "plan".
[21:57:50] gwicke and I just spoke about caching a bit, and it seems ideally we should somehow cache certain requests, and devise a well-established way to flush them when they become obsolete
[21:57:59] it suggests a status flow "in draft" -> "archived complete" for big RFCs
[21:58:25] this caching won't apply to every API request, but we really ought to move in that direction
[21:58:33] what does "archived complete" mean?
[21:58:43] "we are done talking"?
[21:59:00] it means it will be listed at https://www.mediawiki.org/wiki/Requests_for_comment/Archive#Implemented
[21:59:16] yes, which means we are done talking
[22:00:01] we presumably won't discuss archived RFCs in public IRC meetings or architecture committee meetings
[22:00:43] for the API roadmap, the work could theoretically be eternal
[22:00:54] yeah, makes sense
[22:01:00] but I prefer to see RFCs as change requests that can be approved and completed
[22:01:14] in such a case, the goal of the RFC is not to implement a feature, but to agree on a general plan
[22:01:45] maybe the RFC should be called "API roadmap 1"
[22:01:51] DanielK_WMDE: I think that's a good summary of RFCs in general.
[22:01:54] which can be marked approved
[22:02:20] then while that is being implemented, an "API roadmap 2" RFC can be the parking lot for the design of the next batch of features
[22:03:09] then we can schedule a meeting to discuss "API roadmap 2" and we will know that that means we are looking forward, not back
[22:03:22] heh
[22:03:41] you know, it is nice when people don't have to read so much
[22:04:14] Daniel complained about the RFC being big already, but it has a lot of completed stuff mixed with plans for the near future, plus a few plans for the somewhat more distant future
[22:04:34] anomie: that's my understanding too, but the final status is currently called "implemented". That's a lot more than "agreed on a plan".
[22:05:09] we have "accepted" also
[22:05:18] So, to summarize: the RFC is approved, the "living document" aspect should be abstracted out into a project page of some sort (I'll do that), and when we have enough of a backlog of non-trivial changes we'll make a new RFC (I'll probably do that too when the time comes).
[22:05:53] yeah, makes sense I think
[22:07:23] #action anomie to abstract the "living document" aspect of the RFC out to a project page
[22:07:36] ok, anything else before I end the meeting?
[22:07:54] Not from me, I was about to leave the meeting anyway
[22:08:27] #endmeeting
[22:08:51] oh, there was no meeting started, oops
[22:09:16] #action TimStarling to start meeting properly in future so that we have logs
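On the "cache certain requests, then flush them when they become obsolete" direction raised at 21:57:50: a hedged sketch, assuming a Varnish-style front end that honors the non-standard PURGE method. The hostname and URL below are hypothetical, and Wikimedia's production setup (HTCP multicast purges) works differently.

```python
import requests

FRONTEND = "https://cache.example.org"  # hypothetical cache front end

def purge(paths):
    """Evict each cached API URL after the underlying data changes."""
    for path in paths:
        # PURGE is a Varnish convention, not part of HTTP itself
        requests.request("PURGE", FRONTEND + path).raise_for_status()

purge(["/w/api.php?action=query&prop=info&titles=Main_Page&format=json"])
```

The hard part, as the rvprop discussion above shows, is knowing the full set of URL variants to purge; limiting the variants is what makes a scheme like this workable.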