[21:03:41] #startmeeting RFC meeting
[21:03:41] Meeting started Wed Sep 27 21:03:41 2017 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:03:41] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:03:41] The meeting name has been set to 'rfc_meeting'
[21:04:14] T176370
[21:04:14] T176370: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370
[21:04:53] that is the RFC we are supposedly discussing, although there has not been any discussion about it in phabricator
[21:05:00] <_joe_> Do we want to talk about how we're going to migrate in technical terms?
[21:05:36] DanielK_WMDE_ said in the TC radar post to wikitech-l that we would talk about dropping support for HHVM from MW
[21:05:44] <_joe_> or about how do we want to manage the transition from the perspective of the end users?
[21:05:55] I'm personally interested in how we are going to migrate
[21:06:19] <_joe_> well, given that if we want to drop support the migration is a blocker :)
[21:06:56] 1.5 nerds are running MW on HHVM besides WMF
[21:07:10] nobody will need it after we migrate off
[21:07:30] miraheze.org is one. i just told them about this
[21:07:56] it's a farm
[21:08:29] <_joe_> so, I gave some thought about this, and I think we can basically do it in the following steps: 1) migrate all production to debian stretch, that comes with php 7.0. This means we'll be able to run php-fpm on the same appservers where we run hhvm
[21:09:14] you wouldn't want to go back to apache mod_php?
[21:09:29] <_joe_> then, 2) we can divert traffic to one or the other based on any parameter we like by just having the apache config switch ports based on some details
[21:09:42] <_joe_> TimStarling: mod_php is still available with php7?
[21:09:54] yes, mod_php works fine with php7
[21:09:54] <_joe_> I thought it was mostly deprecated
[21:10:16] installing php7 on stretch by default gives you mod_php
[21:10:22] https://packages.debian.org/stretch/libapache2-mod-php7.0
[21:10:27] <_joe_> and in general, mod_php performance (in terms of throughput) are worse than with fcgi.
[21:10:32] (oddly in ubuntu the default is php-fpm)
[21:10:58] <_joe_> but the biggest advantage of keeping fcgi at the moment is reducing the amount of things we change at once
[21:11:08] the PHP manual on this looks the same as ever
[21:12:00] with fcgi you can theoretically have wall clock execution time limits
[21:12:17] that was my biggest problem with mod_php, the fact that requests could run for hours
[21:12:42] * paravoid waves
[21:13:01] <_joe_> hi :)
[21:13:25] back then, fpm was not properly packaged, I don't think it had init scripts
[21:13:40] <_joe_> TimStarling: so with php 5.x php-fpm or really any fcgi server was a neat advantage performance wise (both in throughput and latency)
[21:13:55] <_joe_> I can say that from direct experience and measurement
[21:14:09] <_joe_> not sure at all about PHP 7, though
[21:14:14] I'm personally interested in a) *if* we should do it b) what are the downsides (e.g. performance), c) how much effort is it and d) at what priority, or equivalently, at what cost (other priorities)
[21:14:23] fcgi was faster than mod_php? I don't know why that would be
[21:14:32] the fpm vs. mod_php debate is valuable but IMHO feels a little premature
[21:14:50] <_joe_> yeah we got a bit off track
[21:15:32] paravoid: you saw my parse benchmark?
[21:15:39] paravoid: for a) you mean whether we should even migrate away from HHVM? I thought that was discussed and mostly resolved on wikitech-l ?
[21:16:18] TimStarling: I did, but it seemed to me like it was just a single benchmark, and arguably a microbenchmark
[21:16:36] you're saying I should do more benchmarks?
[21:16:42] <_joe_> so, my opinions in order are: a) yes, there is ample consensus that keeping support for hhvm will be hard, and keep getting harder
[21:17:04] I think that we don't have the full picture yet, yes
[21:17:11] <_joe_> b) We can run serious performance testing on our infrastructure at little cost once we're on stretch
[21:17:12] parser performance is the litmus test of MW performance
[21:17:16] I'm not saying that you should do more, but someone should, yes :)
[21:17:18] the best way to do a benchmark would be to have a production server with PHP 7
[21:17:31] which PHP 7 though?
[21:17:44] <_joe_> TimStarling: that's where my plan was coming from
[21:17:57] 7.0? 7.1? 7.2?
[21:18:02] 7.0
[21:18:09] since that's what stretch has
[21:18:12] <_joe_> TimStarling: if we migrate to stretch, it doesn't require a huge effort to test php 7.0
[21:18:36] I'm not sure if I agree with that
[21:19:07] <_joe_> I said not huge for a performance test, not for a full migration
[21:19:11] an ops engineer not trying to minimise work required?
[21:19:16] * TimStarling cleans glasses
[21:20:21] well, we can do a 7.1 benchmark the same way, as long as MW will run on it without errors
[21:20:30] paravoid: do you envision us re-packaging php and maintaining it?
[21:20:43] so ok, I think the effort to test php 7 in prod vs. the effort to migrate php 7 in prod is comparable
[21:20:45] MW should already support 7.1 without errors/warnings
[21:20:59] aren't there 7.1 backports around?
[21:21:14] <_joe_> paravoid: that's not what my experience with the HHVM migration was.
[21:21:14] 7.1 packages are in unstable and via https://deb.sury.org/
[21:21:45] second, to be clear, it /feels/ to me like a PHP migration is right, but at the same time I would like us to avoid doing a full fledged migration only to realise that page load time changed in ways we didn't anticipate
[21:22:34] we will have to migrate anyway, if somehow it is much slower, you'll have to order more servers
[21:22:58] <_joe_> TimStarling: that works for throughput, not for latency for the single use
[21:23:01] <_joe_> *user
[21:23:20] that, and also, ordering servers out of budget may be fine, as long as it's expected
[21:23:20] we can test throughput if we have a fully production-ready PHP 7 instance
[21:23:27] <_joe_> paravoid: how do you propose we evaluate the potential performance impact?
[21:24:16] <_joe_> (I'm leaving the urgency/planning completely out of the discussion for now, but that's something that needs to be discussed too)
[21:24:28] nod
[21:24:30] we can just send production traffic to it and measure CPU usage
[21:24:38] shall we alote time for each part of this discussion or something?
[21:25:02] allot?
[21:25:03] that :)
[21:25:05] <_joe_> let's reserve the last 15 minutes to discussing timelines?
[21:25:55] +1 for having a canary server (mwdebugXX) with stretch and PHP 7.0 for a few comparisons via X-Wikimedia-Debug, as well as XHGui profiling comparisons.
[21:26:16] <_joe_> TimStarling: yeah the problem is that requires quite some work, do we all agree that work is needed? And that a final decision should keep those results into account?
[21:26:28] Which might also help inform preferences between fpm/fcgi/mod_php. Although I see there are other reasons for these as well (e.g. time limits) which may be more important.
[21:26:44] <_joe_> Krinkle: mod_php would require about 5x the work
[21:26:49] Although it sounds like better limits and better perf come with the same method (methods that are not mod_php?)
[21:27:04] <_joe_> so for an evaluation I think we should stick to fcgi
[21:27:33] _joe_: I'm not saying the canary server should have mod_php, whatever our first pick is, it will inform the comparison to hhvm, and if we feel the need the first pick (e.g. fcgi) has a problem, we can possibly evaluate a comparison to another method.
[21:27:47] But it sounds like fcgi will be better and easiest, so kudos on that :)
[21:27:48] <_joe_> yes, I agree fully
[21:29:10] throughput testing is not too hard if we assume PHP 7 migration is essential anyway
[21:29:36] i.e. after we have done puppetization etc. for PHP 7, throughput testing is not too hard on top of that
[21:29:41] <_joe_> TimStarling: I agree that throughput testing is not a factor in choosing to migrate
[21:29:55] <_joe_> TimStarling: latency in page load is though
[21:29:59] TimStarling: that's what I meant above with "the effort to test php 7 in prod vs. the effort to migrate php 7 in prod is comparable", fwiw
[21:30:43] <_joe_> paravoid: migration requires safeguards we don't need to just run a test.
[21:30:56] so, we're talking about an out of plan project with an effort ranging from "not insignificant" to "major"
[21:31:14] <_joe_> yes
[21:31:53] and yet you want 7.1 and 7.2 to be included in consideration :)
[21:32:31] from a purely programmer's POV, heall yeah, gimme 7.2 :P
[21:32:43] MaxSem: +1
[21:32:50] fair point, although I framed it as a question -- and mostly because I had previously heard that 7.1 had significant performance improvements over 7.0
[21:32:52] <_joe_> TimStarling: well I would say that in order of effort before we can do a test, we have : php 7.0 w fcgi, php 7.1/7.2 with fcgi, anything with mod_php
[21:33:40] I'm fine with not doing mod_php
[21:33:41] <_joe_> so I'd say we start any test with 7.0, *if* we see significant performance regressions we move to other options.
[21:33:42] paravoid, how much harder would it be to have 7.1 as opposed to stock 7.0?
[21:33:47] It sounds like the last bench showed 7.0 as already better or equal to HHVM, so I'm not sure why we'd need to go to 7.x immediately. Presumably that'd be a trivial upgrade at a later point.
[21:34:00] <_joe_> Krinkle: Tim used php 7.1
[21:34:06] I think 7.0?
[21:34:08] <_joe_> IIRC
[21:34:14] <_joe_> uhm let me look
[21:34:20] my benchmark was 7.0
[21:34:34] Not that I'm saying we shouldn't take 7.x if it's the same trouble to ops, but it sounds like if we want easier/simpler at first, 7.0 should be fine.
[21:34:34] do people here more familiar with MW than I am feel that the parse test is representative or enough in any case?
[21:34:35] <_joe_> ok :)
[21:34:42] I used the package from xenial
[21:34:56] that's a first thing to establish I guess? :)
[21:35:07] I think parsing is an important workload, but you also want to measure latency for a normal page render
[21:35:24] But parsing is definitely one of the main things that would have to be not (much) worse
[21:35:40] If the parsing benchmark had been much slower in 7.0 vs HHVM, that would have been a showstopper I think
[21:35:40] <_joe_> there are a lot of other things to consider. Like how does curl+https work with php 7? I think the search team did some optimizations there
[21:36:06] parsing is representative of 99th percentile latency
[21:36:17] not necessarily representative of throughput
[21:36:22] RoanKattouw, showstopper? you would convert to Hack over that?
[21:36:41] Or, well, maybe not a showstopper, but we'd know we have a significant problem
[21:36:41] ^ that.
[21:37:06] <_joe_> MaxSem: we would have to consider that, yes. A longer load time for our users *is* a very serious problem for the WMF
[21:37:23] I've held back on other benches because they only make sense right now unless it would be an argument in php/hhvm or an argument in how-to-php7-at-wmf (e.g. mod_php/fcgi).
[21:37:51] And it seems like hhvm/php7 was already decided and this RFC is about how to migrate to php7 or php7.x at WMF
[21:38:14] well, we would have to come up with some solution because doing Hack would devastate us as an open source project
[21:38:31] <_joe_> MaxSem: indeed, but we're pretty far from that.
[21:38:48] Once a canary is available, I'd certainly be interested in generating a couple xhgui profiles and see if there are new hot spots worth optimising. As well as in a later stage, xenon/flamegraphs from just a php7 server pooled in prod for 1 day compared to from a hhvm server.
[21:39:14] <_joe_> Can we all agree that a full migration of WMF production should be held back until any significant performance degradation due to the migration is resolved?
[21:39:24] <_joe_> this would be an important statement to make.
[21:39:43] let's cross that bridge when we come to it, would be my opinion
[21:39:47] <_joe_> Krinkle: we can surely do that when we get to that point
[21:40:17] I don't want to spend too much time on hypotheticals
[21:40:20] Yes, I think the roll out should be gradual and with good metrics and measured from production traffic at all points. And if we can, we should slow down the roll out until any low hanging fruit is avoided/fixed beforehand to avoid impact traffic and then letting the tech debt rot.
[21:41:37] php7 presumably does not have xenon
[21:42:01] we'll have to look into that sort of thing, I guess
[21:43:04] <_joe_> just to be clear, I think it will take quite some time to move from phase 0: have one of the mwdebug servers run php7 to phase 1: have some production traffic go to php7
[21:43:16] <_joe_> that is the huge step I was referring to before
[21:43:51] <_joe_> do we want to play it safe as we did with hhvm, introduce php7 as a beta feature, keep its cache space separated?
[21:43:54] it has xhprof_sample_enable()
[21:43:59] at the pace we're going, we're very far from what you call phase 0, _joe_
[21:44:03] TimStarling: Aye, good point. xhprof is a php extension that hhvm ported into their core, but xenon is custom for hhvm only.
[21:44:10] 6 months out, maybe
[21:44:22] xhprof is for single request profiles right now, whereas xenon is for aggregated flamegraphs.
[21:44:23] <_joe_> paravoid: I would say 5-6, yes
[21:44:23] unless this becomes a priority and we go full steam/drop other stuff
[21:44:33] xhprof isn't supported on php7 either, there's a fork of it called tideways that is packaged in debian and supports php7+
[21:44:40] <_joe_> it's also time to talk about effort/planning
[21:44:48] xhprof is fairly rarely used by us right now. It's xenon that we use day to day.
[21:45:08] tideways is a drop in replacement though, and MW core already fully supports it
[21:45:13] question 1: what's the timeline on stretch?
[21:45:39] Perhaps we can work with upstream to port it from hhvm into a php7 extension, although I suspect there may be some logical limitations there, given xenon (last I checked) sort-of conceptually relies on having a long-running manager process (like HHVM has)
[21:46:09] <_joe_> can we talk about effort and planning, please?
[21:46:14] <_joe_> I think it's relevant
[21:46:16] question 2: how much work will it be to get specifically HHVM running on stretch?
[21:46:25] _joe_, I'm already talking :P
[21:46:27] <_joe_> MaxSem: it's done
[21:46:30] MaxSem: q1) Q2 includes the fundraising/christmas freeze, so realistically... February
[21:46:43] <_joe_> MaxSem: or were you asking about PHP7?
[21:47:05] <_joe_> HHVM is built and runs on stretch, as of today
[21:47:06] no, HHVM
[21:47:25] <_joe_> there is the little detail of libicu
[21:47:30] partially, different ICU
[21:47:31] yes, that
[21:47:36] yes, we need HHVM on stretch
[21:47:36] <_joe_> but that's for stretch in general
[21:47:54] <_joe_> so there are a few things to figure out there
[21:48:01] if we want to have a configuration switch to choose between PHP 7 and HHVM
[21:48:33] <_joe_> yeah we're not doing a distro upgrade + php engine swap at the same time again
[21:48:38] <_joe_> :)
[21:48:46] https://phabricator.wikimedia.org/T174431 is the task list for migrating to HHVM on stretch (hhvm, hhvm extensions, and libicu)
[21:49:19] videoscalers are also usually a challenge
[21:49:22] ffmpeg oddities etc.
[21:49:24] stretch migration was already planned, wasn't it?
[21:49:37] "planned"
[21:49:40] <_joe_> so, HHVM on stretch in production on a few hosts (NOT including videoscalers) is planned for next quarter
[21:49:50] <_joe_> yes, "planned"
[21:49:51] <_joe_> :P
[21:50:03] hhvm and appserver maintenance is on a best effort basis, moritz is doing it basically as part of the broader "keeping the fleet up-to-date/secure"
[21:50:04] <_joe_> a full migration will require at least until february/march
[21:50:14] but there is no one specifically working on the appserver stack
[21:50:36] (was not funded during the annual plan)
[21:50:38] do we want to wait until all mw* hosts are running stretch before setting up one of them with php7?
[21:50:43] <_joe_> at the current effort rate, which has already been scaled up
[21:51:15] <_joe_> real_legoktm: no, of course for the "phase0 test" we can do it before we finish the migration, but it requires additional effort from ops
[21:51:36] it's too late for Q2 anyway
[21:51:59] we can parallelize these tasks and/or accelerate a migration, but only if we drop some other annual plan objective
[21:52:02] <_joe_> that was specifically not planned, if we want to set a more aggressive schedule than, say "March/April 2018" for having mwdebug with php, we need to take this into account
[21:52:54] I think March/April is fine
[21:53:04] for what?
[21:53:19] <_joe_> TimStarling: for having one mwdebug server running php 7.0?
[21:53:21] HHVM on stretch will probably happen in Q3
[21:53:53] I think there's other stuff we can do in the meantime, like getting CI to run tests on PHP7 on a per-commit basis, and moving mediawiki-vagrant over to stretch/php7
[21:53:57] prod server on php 7.0... depends :)
[21:54:01] it's really up to ops, the payoff for PHP 7 migration is not having to do HHVM upgrades anymore
[21:54:24] I don't think there's a lot of urgency from our side
[21:54:36] that's definitely a benefit, but different people involved though :)
[21:55:12] what kind of urgency and what kind of priority are we talking about here?
[21:55:28] is it e.g. more urgent or important than the multiDC program?
[21:56:39] HHVM's plan is to release 3.24 in January, then 3.24 will be EOL a year after that, January 2019
[21:56:52] and the idea is that we not upgrade beyond 3.24
[21:57:11] so that is the deadline on this
[21:57:16] TimStarling, you said something about them disappearing from public space already?
[21:57:57] I don't remember saying that, but you can expect them to get more and more lazy about backporting fixes to 3.24 as the year goes on
[21:59:18] TimStarling, I meant your "According to people on the ops team who have worked with them
[21:59:18] recently, they stopped working on the open source product altogether.
[21:59:18] They stopped responding to bug reports."
[21:59:49] well, they are promising to fix that, they have a team now, but we will see
[22:00:31] I guess there is a risk that we will need to rush if they screw us over
[22:00:39] either that or do a lot more work on our side
[22:00:41] <_joe_> just to be clear, we can backport security patches after january 2019, I'd prefer not to do that
[22:00:58] work on HHVM, I mean
[22:01:26] well, that's if these patches be even remotely backportable - which is not a given considering they want to drop some basic PHP features
[22:01:39] Jul 2018 onwards is a new annual plan
[22:01:44] <_joe_> MaxSem: I said we can, not we wants to
[22:01:47] is Jan 2019 unreasonable as a goal for getting rid of jessie/trusty from the appservers?
[22:01:48] a lot more flexibility on resourcing past that date
[22:02:27] <_joe_> TimStarling: to get rid completely, that's a lot of effort, I'm not sure just working on it from Jul 2018 would be enough
[22:02:57] I know that things happen slowly in opsland, that's why we're talking about this now
[22:02:58] TimStarling: trusty is already gone; jessie we'll probably get rid of by Feb/March 2018
[22:03:01] <_joe_> completely includes all the edge cases, and we still didn't get rid of all the edge cases where we use php5 instead of hhvm
[22:03:21] php7, we can dip our feet in the water in Q4 2018 and then do a migration in FY18-19?
[22:03:39] planned/accounted for
[22:04:02] yes, like I say, from my point of view, it is fine
[22:04:07] <_joe_> paravoid: that's about what I was proposing too
[22:04:20] as long as moritz doesn't try to kill you for having to deal with HHVM upstream for that long
[22:04:24] <_joe_> well, I was a bit more optimist :)
[22:04:26] heh :)
[22:05:23] out of time now, any further philosophizing?
[22:05:44] next steps?
[22:06:18] CI is the next step, which we haven't talked about at all
[22:06:40] and the right people aren't here :)
[22:06:49] TimStarling, minimum PHP version for MW RFC cough cough ;)
[22:06:53] legoktm will work on CI and beta cluster next quarter
[22:07:09] ah, cool
[22:08:18] so maybe he will have all the conf and puppetization done for you before you start to think about migration
[22:10:01] Bumping minimum version shouldn't be a blocker per se, given MW does advertise itself as supporting PHP 7 already. And while not on each commit, I think we do require PHP 7 passing before releases. Not sure how strict we are on that, but I guess we usually find out during release candidate/beta phase. There's also Travis CI running PHP7 on every commit already (and presently passing)
[22:10:16] https://travis-ci.org/wikimedia/mediawiki
[22:11:07] (and consistently beating hhvm by 2-3 minutes , 5min vs 8min, but not that that is representative)
[22:12:40] #endmeeting
[22:12:40] Meeting ended Wed Sep 27 22:12:40 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:12:41] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-09-27-21.03.html
[22:12:41] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-09-27-21.03.txt
[22:12:41] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-09-27-21.03.wiki
[22:12:41] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-09-27-21.03.log.html
[22:13:55] legoktm is awesome, you know he did about 250 commits in the last month, while working 20 hours per week
[22:33:02] good work legoktm :)
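
Illustration (not part of the meeting): a rough sketch of the traffic split _joe_ outlines at 21:08:29 and 21:09:29, where HHVM and php-fpm run side by side on one appserver and the web server picks a FastCGI port per request, combined with the X-Wikimedia-Debug canary idea raised at 21:25:55. The log only says the Apache config would "switch ports based on some details", so everything concrete here is an assumption: the port numbers, the "php7" opt-in token, the percentage bucket, and the function name are all hypothetical, and the real logic would live in web server configuration rather than Python.

```python
import hashlib

# Hypothetical FastCGI ports; the log does not say which ports would be used.
HHVM_PORT = 9000
PHP7_PORT = 9001


def choose_backend(headers, client_key, php7_percent=0):
    """Return the FastCGI port a request should be sent to.

    headers      -- dict of request headers (exact-case keys assumed)
    client_key   -- any stable per-client string, e.g. a session or IP hash
    php7_percent -- share of ordinary traffic (0-100) to send to PHP 7
    """
    # Explicit opt-in for testers, in the spirit of the canary proposal.
    # The "php7" token is invented for this sketch, not a real header option.
    debug = headers.get("X-Wikimedia-Debug", "")
    if "php7" in debug.lower():
        return PHP7_PORT

    # Otherwise place each client in a stable bucket from 0 to 99 so the
    # same client keeps hitting the same engine during a gradual rollout.
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return PHP7_PORT if bucket < php7_percent else HHVM_PORT
```

With php7_percent=0 only requests carrying the opt-in header reach PHP 7, which roughly matches the "phase 0" described at 21:43:04; raising the percentage would correspond to "phase 1", sending some production traffic to PHP 7.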