[01:30:40] last I heard about the MediaWiki a/a plans, it was still going to be more-complicated than just a get/head -vs- post split, there were some edge case URLs and/or cookies to look at too I think? But it's been a while since I refreshed on that plan.
[01:31:22] either way, under Varnish, the basic plan was to have two backend services defined. one that's master-only and one that's actually a/a, and split on deciding which of those backend services to use.
[01:32:13] the rest is pretty much automagic, re: specific DCs and inter-DC routing, etc.
[01:32:52] but, also, we're getting close enough to switching all the backend-facing stuff to ATS, that it's probably better to think in those terms if we're looking a quarter or two out.
[15:21:15] Traffic, ExternalGuidance, Operations, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (dr0ptp4kt) Thanks, @santhosh. When you say "context detection code", I take that to mean inc...
[18:06:08] I'm going to depool lvs1007 for the asw-a5 work (it needs one of its uplinks moved), disable puppet, stop pybal, log, etc.
[18:06:35] bblack: aha ok, defining two services seems flexible enough
[18:07:04] bblack: would you guys have some time in q4 to work on this by any chance?
[18:15:12] mobrovac: it's hard to say right now, I might be in a better position to make firmer plans in a week or so though. it also depends on how complex the traffic-splitting conditions end up being (if it's just safe-v-unsafe http methods that's easy, but like I said, last I heard there might be other complex conditions for edge cases?).
[18:16:08] also depends on when the end-goal of having this working live and public is I guess.
[18:16:49] at one end of the spectrum: if split-conditions are easy, and everything but VCL is ready for it by end of Q4, we can probably make our part work by then too.
[18:17:14] ok that's good to know
[18:17:47] at the other end: if it's complex enough that we might have to invest some serious time on the VCL and/or the rest of the stack won't be ready until Q1+ anyways, we should just punt on tackling it in VCL-land at all, and do it in our new ATS backends, which will probably hit cache_text sometime around Q1-Q2 timeframe.
[18:18:01] my current view of the situation (and i might be wrong as i haven't fully dug into this yet), is that there are a couple of more things to shuffle around on the MW side, and then it should be as trivial as splitting on get/post requests
[18:18:17] right right
[18:18:34] but splitting the traffic along get/post is in this year's annual plan
[18:19:32] https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/TEC1:_Reliability,_Performance,_and_Maintenance#Outcome_6:_Improved_MediaWiki_availability_and_reduced_read-only_impact_from_data_center_fail-overs
[18:22:19] yeah, way back in barcelona, we talked about this a bunch and thought we were very close on the MW side, and then I think as people dug deeper the problems looked more serious and the timeline kinda backed off
[18:22:40] I haven't looked at the status of all related things on the MW side in a while, so maybe that's moved along a bunch since then.
[18:23:08] right, but that was because the original plan was to go full a/a and then we realised that was too much, so decided on the lighter version of "split the traffic for get/post"
[18:23:18] well
[18:23:36] even for the get/post split, there were a couple of exceptional corner cases, which I don't remember clearly
[18:23:55] bblack: that's what i'm trying to do currently and to figure out what remains to be done and if it's feasible to achieve by the end of the FY
[18:24:04] there are a few gets that can't handle being async from write traffic, and a couple other things
[18:24:11] bblack: yes, sessions and some other things that use the same mechanism, afaik
[18:24:17] my memory of it's hazy, but I bet it's in the ticket somewhere :)
[18:24:22] :)
[18:25:44] nevermind, lvs1007 is still with role spare::system
[18:26:50] lvs1016 is the one that needs to be depooled
[18:29:43] hmmm ok
[18:31:29] I think for 1016 you do have to do the depooling stuff (disable puppet, stop pybal, etc)
[18:31:35] and it will fail traffic over to 1006
[18:31:44] XioNoX: ^
[18:32:20] bblack: yeah, it's done and draining, 1006 is ramping up as expected
[18:32:27] https://grafana.wikimedia.org/d/000000343/load-balancers?orgId=1&panelId=5&fullscreen&from=now-15m&to=now&refresh=10s
[21:53:12] netops, Operations, ops-eqiad, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (ayounsi)
[21:53:20] netops, Operations, ops-eqiad, Patch-For-Review: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (ayounsi) Open→Resolved Everything has been moved smoothly, thanks!
[21:57:22] netops, Operations, ops-eqiad, Patch-For-Review: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (ayounsi)
[21:57:39] netops, Operations, ops-eqiad, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (ayounsi)
[21:58:19] netops, Operations, ops-eqiad, Patch-For-Review: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (ayounsi)
[22:40:51] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (RobH)
[22:49:23] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (RobH)
[23:13:54] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1045.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:14:07] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1046.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:14:23] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1047.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:14:35] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1048.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:14:46] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1049.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:15:03] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1050.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:15:16] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1051.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:15:29] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1052.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:15:40] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1053.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:15:53] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1054.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:16:06] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (ops-monitoring-bot) wmf-decommission-host was executed by robh for cp1055.eqiad.wmnet and performed the following actions: - Revoked Puppet certificate - Removed from PuppetDB...
[23:19:07] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (RobH) Please note cp1045-cp1055 are all on asw-c-eqiad as their active switch, but ports were also reserved on asw2-c-eqiad for migration (if they were not decommissioned befor...
[23:42:15] Traffic, Operations, decommission, ops-eqiad: Decommission old eqiad caches - https://phabricator.wikimedia.org/T208584 (RobH)
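
Note on the a/a routing split discussed at 01:31 and 18:15-18:24: the idea of two backend services (one master-only, one active/active) with a per-request choice between them can be sketched roughly as below. This is only an illustration in Python, not VCL or ATS config, and not the actual plan. The backend labels, the cookie marker "session", and the path-prefix exception list are all hypothetical placeholders for the edge cases (sessions, a few GETs that can't be async from writes) that the log mentions but does not spell out.

```python
# Rough sketch of the get/head -vs- post split described in the log.
# All names below are illustrative assumptions, not real config values.

SAFE_METHODS = frozenset({"GET", "HEAD"})  # the "get/head" side of the split

MASTER_ONLY = "master_only"      # hypothetical label for the master-only service
ACTIVE_ACTIVE = "active_active"  # hypothetical label for the a/a service


def choose_backend(method, cookie_names=(), path="",
                   sticky_cookie_markers=("session",),
                   master_only_path_prefixes=()):
    """Pick which backend service a request should be routed to."""
    # Unsafe methods (POST etc.) always go to the master-only service.
    if method.upper() not in SAFE_METHODS:
        return MASTER_ONLY
    # Assumed edge case: requests carrying a session-like cookie stay on master.
    for name in cookie_names:
        if any(marker in name.lower() for marker in sticky_cookie_markers):
            return MASTER_ONLY
    # Assumed edge case: specific URLs that cannot lag behind write traffic.
    if any(path.startswith(prefix) for prefix in master_only_path_prefixes):
        return MASTER_ONLY
    # Everything else is a plain read and can be served active/active.
    return ACTIVE_ACTIVE
```

If the split really is just safe-v-unsafe methods, only the first check matters; the two exception checks are where the complexity mentioned at 18:15:12 would land.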