[21:02:57] #startmeeting ArchCom RFC Meeting: Require 'curl' PHP extension for MediaWiki?
[21:02:57] Meeting started Wed Jun 29 21:02:57 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:02:57] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:02:57] The meeting name has been set to 'archcom_rfc_meeting__require__curl__php_extension_for_mediawiki_'
[21:03:10] yummy curl!
[21:03:12] #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:03:36] Heya.
[21:03:37] #link https://phabricator.wikimedia.org/E224
[21:03:42] James_F: hi!
[21:04:00] #link https://phabricator.wikimedia.org/T137926
[21:04:02] ok, so we have a rough proposal to require the PHP curl extension, at least for HTTP-using stuff
[21:04:05] couple of possible options
[21:04:45] no change (meaning some of our internal interfaces like MultiHttpClient, which is curl-only, will be unavailable when not using curl, but things using MWHttpRequest will still work)
[21:05:07] #info list of 3 options listed at https://phabricator.wikimedia.org/T137926#2412996
[21:05:25] or we could require it for MWHttpRequest / Http::get() etc as well, removing the old non-curl paths which are harder to maintain
[21:05:45] another possibility is to provide a non-curl implementation of MultiHttpClient as well, even if it's a dead-simple one
[21:06:13] in the case where we start requiring curl for more bits, we'd probably want better error reporting / failure cases
[21:06:29] eg, either "don't break the wiki when using InstantCommons but curl is not present" or "break the wiki in a very clear, easy-to-fix way"
[21:07:06] we don't have a *really* clear picture of how commonly available the php curl extension is, but anecdotally it's on a number of major shared hosting providers
[21:07:28] and if you run your own server it's trivial to make sure it's installed (which also means we can easily make sure our debian packages etc depend on it)
[21:07:29] brion: do we have features in MultiHttpClient that are impossible to do without curl?
[21:07:34] or very hard
[21:08:02] SMalyshev: IIRC it's a fairly simple API, you pass a number of URLs in and it runs them simultaneously, then returns an array mapping of responses
[21:08:16] which means you can make a crappy back-compat very easily by running them serially instead of in parallel
[21:08:19] brion: so what happens if it runs them sequentially instead?
[21:08:22] there is maxConnsPerHost
[21:08:24] right
[21:08:25] or a fancy back-compat using stream_select, but only on newer PHP
[21:08:39] SMalyshev: slower, but satisfies the API contract
[21:08:39] brion: Newer than 5.5 which we already require?
[21:08:40] Hello All :)
[21:08:45] so a b/c version that runs them serially would be equivalent to configuring maxConnsPerHost=1
[21:08:46] James_F: right iirc
[21:08:56] *nod*
[21:09:07] not even stream_select, beggars can't be choosers, just bare-bones functionality - I think we can do it rather easily? if so, I'd lean towards doing it
[21:09:16] James_F: secondarily, creating a 'proper' back-compat path just adds to our complexity and maintenance/security surface.
[21:09:22] Yup.
[21:09:37] Yeah, i suspect the bare-bones version would be like 5 lines
[21:09:52] Well. We already have some of this complexity. The RfC was about admitting that it doesn't work well and isn't tested or used in production, so we were going to remove it.
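The "dead-simple" serial fallback discussed above might look roughly like the sketch below: a hypothetical stand-in (the class name and the exact shape of the 'response' entry are assumptions, not MediaWiki's actual MultiHttpClient contract) that walks the request array one URL at a time through MWHttpRequest instead of running the requests in parallel with curl_multi, which is effectively the maxConnsPerHost=1 behaviour mentioned above.

```php
<?php
// Hypothetical sketch only -- not MediaWiki's actual MultiHttpClient code.
// Runs the requests one after another via MWHttpRequest instead of in
// parallel via curl_multi; slower, but keeps the "array of request maps in,
// array with responses attached out" shape described in the discussion.
class SerialMultiHttpClient {
	public function runMulti( array $reqs ) {
		foreach ( $reqs as $index => $req ) {
			$method = isset( $req['method'] ) ? $req['method'] : 'GET';
			$httpRequest = MWHttpRequest::factory(
				$req['url'],
				[ 'method' => $method ],
				__METHOD__
			);
			$status = $httpRequest->execute();
			// Field names below are assumed for illustration; the real
			// MultiHttpClient response layout may differ.
			$reqs[$index]['response'] = [
				'code' => $httpRequest->getStatus(),
				'headers' => $httpRequest->getResponseHeaders(),
				'body' => $httpRequest->getContent(),
				'error' => $status->isOK() ? '' : 'request failed',
			];
		}
		return $reqs;
	}
}
```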
[21:10:09] are there bug reports about it not working well?
[21:10:10] If we're keeping it and making it "cool", fine, but… someone has to keep looking after it.
[21:10:23] TimStarling: iirc we saw problems with SSL certs and InstantCommons
[21:10:23] right... so if we can make a super-simple, super-dumb implementation that still works, I think we should. If it's complex or missing features - then we should fail early and let people install curl
[21:10:32] TimStarling: More "not well" as in "90% of uses don't have the back-compat".
[21:10:34] work was done to make sure openssl got initialized properly on the back-compat path
[21:10:50] but that's only one thing, might be more work next time
[21:10:53] But since that's been fixed, it works well now
[21:10:55] I can't speak to the security side of this stuff, but bawolff I guess is OK with it?
[21:11:08] * bawolff has not looked at the code recently
[21:11:16] But I fixed a bunch of stupid things a while back
[21:11:16] with curl, I had issues with servers using SSL and certificate authorities that are new or exotic... figuring out how to add a cert authority is non-trivial IIRC
[21:11:30] but that is probably true for any other way of doing it too
[21:11:52] in general i'm leery of maintaining an HTTP client in MWHttpRequest's PhpHttpRequest subclass forever, libcurl does that job for us
[21:12:02] there is T102566
[21:12:02] T102566: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566
[21:12:11] eventually we'll have to update it for http/2 etc, or else throw it out then :)
[21:12:25] But T102566 was broken both with curl and non-curl
[21:12:25] T102566: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566
[21:12:41] bawolff: In different ways, so it needed two fixes, or the same issue?
[21:12:42] can't http/2 fall back to regular http?
[21:13:00] SMalyshev: Some of our client UAs already report HTTP2-only accept headers.
[21:13:06] The main issue was it was configured to not follow redirects
[21:13:19] so redirects didn't work on either implementation (as they should have)
[21:13:22] #info question discussed for the first 10-15 minutes: is having a fallback worthwhile at all?
[21:13:24] SMalyshev: we don't know for sure what the future will bring; at the least i expect someone will eventually want persistent connections for something
[21:13:38] James_F: that's... aggressive. But yeah, I don't want to maintain a php http/2 library :)
[21:13:41] bawolff: oh fun :D
[21:14:17] SMalyshev: Agreed.
[21:14:21] There were some additional problems with https in the fallback config. I think in some configs it thought all certs were invalid or something. It was a while ago. Those issues are fixed
[21:14:42] so another possible outcome of this discussion is "let's actually survey availability of php-curl before changing"
[21:14:52] brion: right... maybe have some way of asking the http client if it supports certain features? OTOH, if we do have such features I'd just take the plunge and drop the fallback
[21:14:55] which i didn't think to capture on the task comments earlier
[21:15:37] but surveying is hard when we don't know we can reliably reach affected users who don't upgrade frequently, are on obscure hosts, etc
[21:16:28] If they don't upgrade they won't notice the change…
[21:16:43] :D
[21:16:54] This is a common issue in support for third parties; very few voices, very little data.
[21:17:03] the danger is, if they do, they'll tableflip in anger at how nothing works anymore
[21:17:11] but we probably broke their site in 10 other ways if that's the case :(
[21:17:19] so that isn't the worst additional breakage :D
[21:17:27] They'll also do that when the tarball-bundled editor requires curl, which is ~ a year away.
[21:17:29] So…
[21:17:30] but if we just tell them "install curl", it isn't that bad?
[21:17:42] curl is pretty widely supported I think
[21:17:49] it is pretty widely supported yes
[21:18:04] but not everyone has root on their system, and not everybody is a professional or hobbyist sysadmin
[21:18:12] what's trivial to us is not trivial to everyone
[21:18:25] right, but most shared hosts I'd expect to have curl
[21:18:32] yep
[21:18:44] same for university etc. environments - it's part of almost every prepackaged setup
[21:18:48] so it's a balance between a large pain to a small group, small pains to a large group, etc
[21:18:53] unless it's insanely stripped down
[21:19:10] i suspect that 'tearing the band-aid off' is going to be a net win
[21:19:25] since down the road we'll want to just have curl handy
[21:19:36] i just wish i had more data!
[21:19:59] we could keep the fallback but still require curl e.g. in composer. This way if you don't have curl you could remove the require and hope for the best
[21:20:16] at least for now until we add http/2 and http/3 support :)
[21:20:20] hehe
[21:20:28] I'm not sure I see what the down-the-road usecases are. What are we actually planning to add that requires this?
[21:21:17] bawolff: Parsoid HTML reading. VE as the tarball editor. Non-wikitext wikis.
[21:21:22] bawolff: 'requires' is a tricky thing. technically, nothing. but it's nice to not maintain an http client, and it's nice not to have to create a second implementation of the multiplexing parallel client wrapper.
[21:21:56] bawolff: Also, 'requires' MultiHttpClient rather than curl specifically.
[21:22:09] it looks to me like it's not that hard to support non-curl installations, it's just that nobody could be bothered to do that small amount of work
[21:22:40] #info James_F clarifies this RFC is about requiring MultiHttpClient, rather than curl specifically
[21:22:45] T102566 and [[InstantCommons#HTTPS]] say "just try installing curl and see if it works"
[21:22:46] T102566: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566
[21:22:49] I think it's not hard to support now, but we don't know long-term - maybe it stays not hard, and maybe it becomes hard (e.g. if everybody moves to http/2)
[21:23:01] the bug with non-curl is not isolated, nobody could be bothered
[21:23:17] oh yes, it's trivial to add a non-curl api-compatible single-threaded implementation
[21:23:18] robla: Well, no. The impending requirements are for MultiHttpClient which we're standardising on. The curl requirement for that requirement is what we're discussing, I think.
[21:23:24] and moderately more work to make a proper implementation.
[21:23:30] so the question is, do we do that *and* continue to maintain them?
[21:23:31] and like bawolff said previously, you could do MultiHttpClient with non-curl with 5 lines of code
[21:23:32] TimStarling: That bug referenced on InstantCommons is fixed afaik, the help page was just never updated
[21:23:43] #info robla: Well, no. The impending requirements are for MultiHttpClient which we're standardising on. The curl requirement for that requirement is what we're discussing, I think.
[21:23:52] TimStarling: I think that's what we should be doing first - i.e. if somebody has http problems, we should tell them to install curl. If they can't, then it becomes interesting, but we should start with it by default
[21:25:01] SMalyshev: that doesn't sound right to me, to have a solution which is not tested or maintained is just a trap for new users
[21:25:11] TimStarling: the non-curl bug is about installations which don't have the right certs, there is no bug in the MediaWiki stream wrapper implementation per se
[21:25:31] I'd note, if you look at the git log, overall there's been very little maintenance needed for the php stream fallback implementation all in all
[21:25:34] presumably PHP doesn't bundle its own authority list
[21:25:37] TimStarling: well, it is tested now I think. But eventually if the main MultiHttpClient changes it may become stale
[21:25:42] we've talked about "strongly recommending" curl; do we have a clear idea of what "strong recommendation" means?
[21:25:54] so it would have just been a matter of configuration
[21:25:54] so if we don't object to continuing to maintain the non-curl MWHttpRequest path, it's fairly trivial to make a single-threaded wrapper for api compat and not lock the non-curl folks out of future features
[21:26:08] or maybe installing the right cert list package
[21:26:19] TimStarling: the question is - what's better, having some code that may work, or having no code at all for non-curl? I'm honestly not sure here
[21:26:24] Currently we tell php to use the system cert directory
[21:26:28] and indeed it may not be a huge amount of work. maybe we'll never care about an http/2 client on it, or by the time we do it'll definitely be no big deal to go curl-only
[21:26:33] which will work for non-windows hosts
[21:26:38] mostly anyways
[21:26:45] robla: if the argument for "strong recommendation" is less tested code then I guess it would be similar to mysql vs any other db backend
[21:26:52] TimStarling: no, php relies on the OS AFAIR. Distros may do it though
[21:27:14] I think it's rather unlikely we'll need to support http/2 any time soon
[21:27:18] bd808: similar, but with a very different bug surface i think
[21:27:29] I have a very hard time imagining any sites will be http/2 only any time soon
[21:27:33] DBs are much more fragile with weird APIs that are tweaked a bajillion ways
[21:27:40] in an ideal world, we would have automated testing that runs with both curl and non-curl
[21:27:43] HTTP 1.1 is complex but much less so
[21:27:49] that is the usual solution to developer laziness
[21:27:52] and our usage of it is mostly very simple
[21:27:59] :D
[21:29:07] any strong feelings on the maintenance question?
[21:29:18] TimStarling: Example of expected heavy MultiHttpClient use in core is https://phabricator.wikimedia.org/T111588 (finally found the damn link)
[21:29:25] if we're not in fact that concerned about it then we can concentrate on the narrow question of MultiHttpClient
[21:29:30] I'm kinda indifferent on the whole thing now.
[21:30:12] James_F: that's a good one! heavy api-fetch usage is very possible there and parallel's a huge win
[21:30:20] #info James_F finds "Example of expected heavy MultiHttpClient use in core": T111588
[21:30:20] T111588: [RFC] API-driven web front-end - https://phabricator.wikimedia.org/T111588
[21:30:48] James_F: That seems like a rather far-in-the-future proposal
[21:31:05] this is a slightly tangential troll-y question .. but what is our threshold for how far we are willing to go to support shared host wikis? i.e. is there a development / maintenance threshold at which point we say features X, Y, Z may not be available on those platforms?
[21:31:30] What subbu said ^
[21:31:37] subbu: excellent question which has never been answered adequately :D
[21:31:41] It keeps coming up, though.
[21:31:43] * subbu hasn't read the scrollback yet .. so jumping in the middle.
[21:32:01] And Parsing and Services both have to consider it a lot in some of their work.
[21:32:19] at this point though it sounds like we're not so concerned with dropping our non-curl path that it's going to be an issue on the narrow question
[21:32:55] eg, it sounds like nobody wants to kill the non-curl path and we'll be deciding between 'make a single-threaded MultiHttpClient fallback path' and 'make a fancier non-curl parallel MultiHttpClient fallback path for versions of PHP that support it suitably'
[21:33:19] so, it does go into 'are we willing to do that extra work?'
[21:33:33] subbu: I'm fine with answering that on a case-by-case basis
[21:33:37] it might actually be that we don't require newer php, now that i look at it. stream_select on the raw tcp streams should already be available
[21:33:49] TimStarling, that is reasonable.
[21:33:56] I've found in the past that a lot of imagined conflicts between WMF and shared hosting don't really eventuate in practice
[21:34:07] perhaps we can officially deprecate non-curl intentionally breaking it now?
[21:34:31] er..."without intentionally breaking it now"
[21:35:12] brion: the question is do we really want to deal with http over stream_select?
[21:35:21] sure, there are stream functions, and there are HTTP libraries that work on top of them that we could import
[21:35:23] we may spend a lot of time for very small reward
[21:35:46] unless we have a ready-made library of comparable power
[21:35:53] exactly, it's more work to do the proper implementation (or to find and import another library)
[21:36:01] but it's very easy to do the simple back-compat
[21:36:17] I remember there was one in PEAR, years ago, no doubt it still exists
[21:36:26] and as robla suggests we _could_ deprecate it without breaking it, as an implicit warning that folks should use curl when available
[21:36:32] PEAR would mean it's probably old...
[21:36:57] yeah, i can't recall if we thought about migrating to it but hated the code, or migrated from it to our own ;)
[21:37:31] i think there's no need to _actively_ break non-curl unless the UI becomes dependent on parallel fetches for reasonable performance
[21:38:06] and it's easy even then to say 'well it might be slower on your config, but enable curl if you can'
[21:38:23] if we deprecate, perhaps we could also come up with a way for a novice user to know when they are running in a deprecated environment
[21:38:33] that's a good point
[21:38:37] there's the environment check page in the installer
[21:38:42] *nod*
[21:38:52] the installer's one place, but i'd love to have something visible on maybe special:version
[21:39:14] someplace you can check on your live installation, that you'd already be directed to check for other environmental things in a help-debug session
[21:39:31] special:version sounds good
[21:39:35] Special:Version has the added perk that users of the site can know, and use the info to prod the admin
[21:39:41] yeah :D
[21:39:57] ok i think we've got something coming together
[21:40:43] so recommend: keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version ?
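As a rough illustration of the environmental-warning half of that recommendation, the check itself could be as small as the sketch below. The function name is hypothetical, and where the strings would actually surface (installer page, Special:Version, or both) is exactly what is left open in the discussion.

```php
<?php
// Hypothetical helper; the discussion above does not settle where these
// warnings would be displayed, only that missing curl should be surfaced.
function getHttpEnvironmentWarnings() {
	$warnings = [];
	if ( !function_exists( 'curl_init' ) ) {
		$warnings[] = 'The PHP curl extension is not installed. HTTP requests will use ' .
			'a slower, serial, less-tested fallback; installing php-curl is recommended.';
	}
	if ( !extension_loaded( 'openssl' ) ) {
		$warnings[] = 'The openssl extension is missing, so HTTPS requests ' .
			'(for example InstantCommons over https) are likely to fail.';
	}
	return $warnings;
}
```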
[21:41:04] WFM.
[21:41:05] any objections or alternate proposals?
[21:41:10] woohoo
[21:41:11] #info so recommend: keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version ?
[21:41:41] this leaves open the possibility of adding a parallel non-curl path but does not commit us to doing so
[21:42:10] and also leaves open the possibility of fully deprecating non-curl but does not force it
[21:42:17] Can we put the env warnings on both Special:Version and installer
[21:42:20] (eg later if we find we really need to)
[21:42:22] oh yeah def
[21:42:38] #info Can we put the env warnings on both Special:Version and installer
[21:43:01] Yeah, that makes sense ^
[21:43:09] maybe it is time to revisit the idea of a pingback from the installer for statistical purposes
[21:43:12] We don't actually do environmental warnings on Special:Version yet, but I think we should
[21:43:22] TimStarling: +1, brion and I talked about that yesterday
[21:43:45] every time this sort of thing comes up, we have no idea of what our users' installations look like
[21:43:46] #info maybe it is time to revisit the idea of a pingback from the installer for statistical purposes
[21:44:00] Time for new-new-installer :D
[21:44:05] lol
[21:44:25] yes def
[21:44:26] does that count as volunteering?
[21:44:26] There's already an environmental warning about curl in the installer, BTW.
[21:44:34] a pingback that probably will use MWHttpRequest... ;)
[21:44:39] isn't that what wikiapiary is doing?
[21:44:40] brion: Tsk. :-)
[21:44:50] I proposed it once on wikitech-l but I was shouted down by the libertarians
[21:44:52] And might not have curl to tell us :p
[21:44:55] as if they're not all using facebook by now
[21:45:02] lol
[21:45:03] lmao
[21:45:17] TimStarling: For some things, just implement and ignore the wails. :-)
[21:45:23] TimStarling: I think last time we seriously talked about it like 2-3 years ago the consensus had been "as long as it's opt-in"
[21:45:23] privacy issues are... touchy. Maybe do the opt-in?
[21:45:34] SMalyshev: do they have an auto-report widget you can add to your site or do they just crawl & make stats from that?
[21:45:36] TimStarling: For people with real security needs, they'd be installing it inside an air-gapped machine anyway.
[21:45:38] (wikiapiary)
[21:45:48] With, of course, a glaring option to turn it on during install time.
[21:45:58] brion: I have no idea, I haven't looked into how they work.
[21:45:59] (we want to encourage it to be on :))
[21:46:05] is "keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version " what we should put in last call?
[21:46:10] I think a prominent checkbox enabled by default would be good enough
[21:46:18] robla: I think we're good to go on that yeah
[21:46:20] robla: Yes.
[21:46:27] put it in last call and let any final objections come in on the list
[21:46:43] #info "keep existing non-curl path; add non-parallel MultiHttpClient fallback, and provide environmental warnings about missing curl on Special:Version " goes to last call
[21:46:47] feels good, we made a decision \o/
[21:46:51] I think if we just have a checkbox or button saying "send anonymous statistics about your install to MediaWiki devs" it's fine
[21:47:35] TimStarling: did you want to revive the rfc for the installer beacon? or shall i put it on my todo list?
[21:47:37] Would it be useful to do something like that from update.php?
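A rough sketch of the opt-in pingback idea raised above (and picked up again below). Everything named here is hypothetical: the config flag, the endpoint URL, and the payload fields are invented for illustration; only Http::post() and the PHP built-ins are existing APIs, and the point is simply "send a small anonymous stats blob if and only if the admin opted in".

```php
<?php
// Hypothetical sketch of an opt-in install/upgrade pingback.
function sendInstallPingback() {
	global $wgEnableInstallPingback, $wgVersion; // $wgEnableInstallPingback is invented here
	if ( empty( $wgEnableInstallPingback ) ) {
		return; // admins must explicitly opt in at install/upgrade time
	}
	$payload = [
		'mediawiki' => $wgVersion,
		'php' => PHP_VERSION,
		'os' => PHP_OS,
		'hasCurl' => function_exists( 'curl_init' ),
	];
	// Endpoint is a placeholder; a real one would need privacy review first.
	Http::post(
		'https://pingback.example.org/mediawiki',
		[ 'postData' => [ 'data' => json_encode( $payload ) ] ],
		__METHOD__
	);
}
```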
[21:47:54] James_F: Thanks for filing the RfC. Wasn't the outcome we thought of when we discussed it weeks back, but I think the outcome is good :)
[21:48:03] or can we foist it on someone else :D
[21:48:29] i don't want it to get lost though if we get caught up in our other big things
[21:48:33] Reedy: I think we should trigger a pingback on any install/upgrade, yeah
[21:48:53] Ideally more often than that, but we don't really support a "has config changed" hook (yet)
[21:48:55] I guess we set a global based on install-time preference....
[21:49:22] SMalyshev: let's continue this discussion on facebook
[21:49:26] But the question of what we do about "current" installs... Ask them nicely to add the global to their LocalSettings?
[21:49:28] we also had T136866 penciled in here if we had time, but I think we can wrap up on brion's question
[21:49:28] T136866: Improve the per-programming-language listings for our tools - https://phabricator.wikimedia.org/T136866
[21:49:57] Reedy: We could prompt for the new setting in update.php
[21:50:10] Ideally, we should actually do that anyway (somehow) to call out important config changes.
[21:50:16] "This setting is new and you might wanna change it"
[21:50:20] mmm
[21:50:24] Or "This setting doesn't do what you think anymore..."
[21:50:31] heh
[21:50:41] I wonder if we could reuse the environment type notifications on Special:Version too
[21:50:42] #info brion wants to make sure we have a plan for an installer beacon
[21:50:50] "reuse" in a loose sense
[21:51:08] Reedy: We could abstract them a bit further to make them portable I'm sure.
[21:51:20] Reedy: yeah don't dupe the code & messages if they're common checks ideally
[21:51:22] They're already pretty well isolated, just in a "weird" place for Special:Version to want them
[21:51:29] ostriches: Sure!
[21:52:00] ok if nobody grabs the beacon rfc today i'll just add it to our agenda to discuss at the archcom prep meeting next week :D
[21:52:04] well, if it's installer messages... they'll be duped :)
[21:52:25] thanks brion! that sounds like a good plan
[21:52:35] excellent :D
[21:53:12] ok that pretty much wraps us up for this week i think
[21:53:15] thanks everybody!
[21:53:40] thanks all!
[21:53:46] ending in a few seconds
[21:54:02] #action brion, robla discuss beacon at E225
[21:54:15] #endmeeting
[21:54:16] Meeting ended Wed Jun 29 21:54:15 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[21:54:16] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-29-21.02.html
[21:54:16] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-29-21.02.txt
[21:54:16] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-29-21.02.wiki
[21:54:17] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-29-21.02.log.html
[22:05:15] brion: Of course, this means that all my work on the commit message of https://gerrit.wikimedia.org/r/#/c/294259/ to make it all align is for nought. :-)
[22:05:36] Ah well.
[22:05:58] :D
[22:06:48] * brion refrains from making poor-taste referendum jokes
[22:06:58] Indeed. *glares*
[22:07:22] don't worry, my country's next :D
[22:07:39] one way or another you'll be able to laugh at me in november
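For reference, the "fancier" parallel non-curl path that the decision leaves open (multiplexing several fetches over raw TCP streams with stream_select(), as floated around 21:33) might look roughly like the sketch below. It is plain-HTTP only and deliberately skips everything a real implementation would need, such as TLS contexts and certificate handling, redirects, chunked transfer decoding, and per-request timeouts; all names are illustrative.

```php
<?php
// Hypothetical sketch: parallel plain-HTTP GETs without curl, using
// stream_select() to multiplex reads across several sockets.
function fetchAllParallel( array $urls ) {
	$streams = [];
	$raw = [];
	foreach ( $urls as $key => $url ) {
		$parts = parse_url( $url );
		$port = isset( $parts['port'] ) ? $parts['port'] : 80;
		$fp = stream_socket_client( "tcp://{$parts['host']}:$port", $errno, $errstr, 5 );
		if ( !$fp ) {
			$raw[$key] = false; // connection failed
			continue;
		}
		stream_set_blocking( $fp, false );
		$path = isset( $parts['path'] ) ? $parts['path'] : '/';
		fwrite( $fp, "GET $path HTTP/1.1\r\nHost: {$parts['host']}\r\nConnection: close\r\n\r\n" );
		$streams[$key] = $fp;
		$raw[$key] = '';
	}
	while ( $streams ) {
		$read = array_values( $streams );
		$write = null;
		$except = null;
		if ( stream_select( $read, $write, $except, 5 ) === false ) {
			break; // select error; give up on the remaining streams
		}
		foreach ( $streams as $key => $fp ) {
			if ( !in_array( $fp, $read, true ) ) {
				continue; // nothing to read on this socket yet
			}
			$chunk = fread( $fp, 8192 );
			if ( $chunk !== false && $chunk !== '' ) {
				$raw[$key] .= $chunk;
			} elseif ( feof( $fp ) ) {
				fclose( $fp );
				unset( $streams[$key] );
			}
		}
	}
	return $raw; // raw HTTP responses, status line and headers included
}
```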