[00:29:01] 07HTTPS, 10Traffic, 06Operations, 13Patch-For-Review: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination - https://phabricator.wikimedia.org/T132521#2478208 (10BBlack)
[00:29:03] 07HTTPS, 10Traffic, 06Operations, 07Tracking: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#2478209 (10BBlack)
[00:29:05] 07HTTPS, 10Traffic, 06Operations, 06Performance-Team, and 2 others: HTTPS-only for stream.wikimedia.org - https://phabricator.wikimedia.org/T140128#2478207 (10BBlack)
[01:01:05] 07HTTPS, 10Traffic, 06Operations, 07Tracking: Requests for resources through a non-canonical address over HTTPS redirect to the canonical address on HTTP (tracking) - https://phabricator.wikimedia.org/T38952#2478299 (10Danny_B)
[01:01:09] 10Wikimedia-Apache-configuration, 06Operations, 07Verified: Non-canonical HTTPS URLs quietly redirect to HTTP - https://phabricator.wikimedia.org/T33369#2478303 (10Danny_B)
[01:01:26] 07HTTPS, 10Traffic, 06Operations: Requests for resources through a non-canonical address over HTTPS redirect to the canonical address on HTTP - https://phabricator.wikimedia.org/T38952#2478305 (10Danny_B)
[01:38:55] 10Wikimedia-Apache-configuration, 06Operations, 06Release-Engineering-Team, 07HHVM: Make it possible to quickly and programmatically pool and depool application servers - https://phabricator.wikimedia.org/T73212#2478415 (10Danny_B)
[01:39:18] 10Wikimedia-Apache-configuration, 06Operations, 06Release-Engineering-Team, 07HHVM: Make it possible to quickly and programmatically pool and depool application servers - https://phabricator.wikimedia.org/T73212#760100 (10Danny_B)
[06:24:28] 10Wikimedia-Apache-configuration, 06Operations, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2478798 (10elukey) @hashar: thanks for the info! I was wondering why apache2log...
[06:26:17] 10Wikimedia-Apache-configuration, 06Operations, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2478806 (10Joe) @elukey I didn't add it on purpose, as I was pretty sure it was...
[08:27:43] 10Wikimedia-Apache-configuration, 06Operations, 06Release-Engineering-Team, 07HHVM: Make it possible to quickly and programmatically pool and depool application servers - https://phabricator.wikimedia.org/T73212#2479025 (10hashar)
[12:36:11] ha! varnish4 responds with 416 to unsatisfiable range requests
[12:36:18] v3 with 200
[12:36:29] (and the full body)
[12:39:27] probably better :)
[12:40:10] yeah the rfc seems to admit both
[12:40:44] but it sounds more reasonable to tell the client "you're drunk"
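(For reference: the difference is easy to reproduce by sending a Range header that starts past the end of the object. Per RFC 7233 the server may answer 416 Range Not Satisfiable, ideally with a "Content-Range: bytes */<length>" header, or it may ignore the range and return 200 with the full body. A minimal probe sketch in Python; the host and path are placeholders, not any of the sites discussed here:)

    # Probe how a server answers an unsatisfiable byte range (RFC 7233).
    import http.client

    conn = http.client.HTTPSConnection("example.org")
    conn.request("GET", "/some-object",
                 headers={"Range": "bytes=999999999999-"})
    resp = conn.getresponse()
    body = resp.read()
    if resp.status == 416:
        # varnish4-style answer: Content-Range should say "bytes */<length>"
        print("416:", resp.getheader("Content-Range"))
    elif resp.status == 200:
        # varnish3-style answer: range ignored, full body returned
        print("200, full body,", len(body), "bytes")
    else:
        print(resp.status, resp.reason)
    conn.close()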
[12:41:08] also, vtc is now documented \o/
[12:41:11] https://www.varnish-cache.org/docs/trunk/reference/vtc.html
[12:50:56] oh awesome
[12:51:13] no more cargo-culting and guessing! :)
[12:52:21] right! and less grepping around the source code! :)
[12:58:04] Like for VSL! No more code greps.. wait no :(
[12:59:30] jokes aside, the docs have been improving a lot recently, hope that they'll keep going.. maybe some code snippets for the API etc. would be awesome
[13:00:03] I guess this is the upside of open-core commercialism - developers have more incentive to care about quality
[13:01:39] it is a pity because the VSL APIs are really awesome, v4 brought a big shiny change but without the docs it is really difficult to get it without banging your head against the wall multiple times
[13:02:36] in the end it is a "wow awesome" but in between it is not rare to have mixed feelings and hate against everything
[13:02:39] :)
[13:12:12] heads-up, I've scheduled the cr2-eqiad upgrade for tomorrow
[13:12:17] https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160721T1100
[13:16:10] ok
[13:17:21] paravoid: if you have time, how will you do the upgrade? I mean, what procedure will you follow? (curiosity while checking https://wikitech.wikimedia.org/wiki/Network_design#/media/File:Eqiad_logical.png)
[13:33:02] elukey: it'll just be cr2-eqiad
[13:33:31] wow that diagram is very old
[13:33:37] and very inaccurate :)
[13:33:54] according to that diagram, maybe we could use tampa to offload some of the traffic :)
[13:34:07] haha
[13:34:15] yeah :D
[13:34:45] I've documented some of the steps in the tasks
[13:34:50] https://phabricator.wikimedia.org/T140770
[13:34:53] and https://phabricator.wikimedia.org/T140764
[13:35:28] it's a little terse as it's mainly notes for me, but I can expand each of these if you need
[13:35:40] we have two routers in eqiad, cr1-eqiad and cr2-eqiad
[13:36:05] they act as a redundant pair for the servers (using VRRP)
[13:36:19] and then they each have a set of transits and cross-datacenter links
[13:37:04] spread out between the two as much as possible
[13:37:29] so cr1 has NTT, Zayo and a direct link to codfw
[13:37:55] cr2 has Telia, the Equinix IXP, a link to eqord and another link to codfw
[13:38:18] cr1 also has our only link to knams, which is why we can't take it offline yet
[13:39:29] I have a question about the first step, namely draining eqiad traffic from gdnsd
[13:39:45] each of cr1/cr2 has redundant "routing engines", i.e. server-like systems running JunOS
[13:40:25] in theory it's possible to do the upgrade without interruption using techniques called GRES (graceful routing engine switchover), NSR (non-stop routing) and ISSU (in-service software upgrade)
[13:40:41] elukey: that's done with ops/dns commits like this one: https://gerrit.wikimedia.org/r/#/c/292127/1/admin_state
[13:40:56] but these are complicated (because they sync state between two independent systems) and I've never seen them work properly
[13:41:11] yeah, I'm not 100% sure I want to drain yet
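(For context on the drain step: gdnsd's admin_state mechanism lets an operator administratively force a datacenter's resources DOWN, so the geoip plugin stops returning that site in answers and resolvers fail users over to the remaining DCs. A hedged sketch of what such an entry might look like; the resource/map names here are illustrative guesses, the linked gerrit change shows the real file:)

    geoip/generic-map/eqiad => DOWN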
[13:41:32] bblack: so frontend traffic will be turned off, but not the traffic coming to the "backend" from other DCs?
[13:41:38] elukey: right
[13:41:39] with cr2 down, we'll be relying on a single eqiad-codfw wave
[13:42:01] elukey: (which is the pass-traffic and cache misses, it's a relatively small chunk of total traffic)
[13:42:11] so if we have eqiad frontends depooled, it means that if we lose that wave we'll lose basically all of the US and Asia
[13:42:16] paravoid: yeah, the follow-up question to understand it better would have been "what happens if cr2 goes completely down outside maintenance"
[13:42:29] same thing as we're doing tomorrow
[13:43:04] the fallout's not great if we leave eqiad pooled, either
[13:43:17] still seems better to have less direct traffic in case of blips
[13:43:42] the fallout if we leave eqiad pooled is that we'll lose ulsfo
[13:43:50] and that's about it
[13:44:01] well that's Asia above
[13:44:07] and part of the US
[13:44:07] and ok, most likely congestion on the network links
[13:44:52] basically unless we go through a full codfw-switchover first, there's going to be serious fallout if we lose eqiad, or lose significant inter-DC links or traffic to eqiad.
[13:45:20] oh the scenario wasn't losing eqiad
[13:45:26] it was losing the eqiad-codfw link
[13:45:37] if we lose eqiad, we're pretty much fucked, sure :)
[13:45:42] ahaahah
[13:45:56] if we just lose ulsfo/codfw->eqiad, that's still 2/3 of US geography in our default mapping, and all asia
[13:46:09] yeah
[13:46:37] vs. 3/3 of US + Asia + India + whatnot if we set eqiad frontends to down beforehand
[13:46:47] you could argue the other way and depool ulsfo+codfw over the link risk, but then we might saturate remaining inbound internet to eqiad, right?
[13:46:54] yes
[13:48:02] so I want to look at our graphs a little more closely
[13:48:18] overall I'm not _too_ worried (famous last words?)
[13:48:23] if we don't lose the unrelated SPOF link to codfw during the cr2 downtime, and we've left normal traffic on eqiad, we face the possibility of some kind of link saturation anyways, right? and expose more users to any small blips that might occur in eqiad's outbound networking in general
[13:48:56] correct :)
[13:49:13] you put it very eloquently
[13:49:17] that's the tradeoff basically yeah
[13:49:23] still thinking about it and looking at network graphs
[13:49:31] what's your opinion?
[13:50:02] my opinion is look at the graphs more, and maybe consider what our reliability history has been with that remaining link to codfw
[13:50:37] cool, that's my strategy
[13:50:42] if there's no compelling argument either direction though, I'd still lean toward depooling eqiad
[13:50:47] I already looked for planned maintenance, fwiw :)
[13:50:51] I actually got the overall picture, thanks! It might be a good discussion for Friday even if it could take a long time :)
[13:53:59] elukey: it's a lot to take in, feel free to ask any questions
[13:57:03] sure! I am really interested in this part; probably the best way is to check more netops tasks and follow up if needed.. Not aiming to become a neteng, but I'd prefer to get something out of what you, Brandon and Alex do on the network side once in a while :)
[13:57:32] I wasn't aiming to be a neteng either :P
[13:57:44] oh yes you were
[13:57:45] ahahhaah
[13:57:49] you were almost standing in line
[13:58:44] i remember it well ;p
[13:59:24] < interview starting
[14:05:19] bblack: I think I saw you working on lowering TTLs to 300, right?
[14:08:21] I'm wondering if we should also eventually allow gdnsd to monitor and automatically mark DCs as down
[14:08:46] it might be tricky to implement (split brains and such) but it may be time to start thinking about doing this :)
[14:10:01] an average 2.5min when we lose an entire caching pop will be an interesting thing to have!
[14:10:12] an average 2.5min downtime*
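(The 2.5-minute figure follows from the TTL arithmetic: with a 300 s TTL and a failure whose timing is uncorrelated with cache expiry, each resolver's residual TTL is roughly uniform between 0 and 300 s, so the expected wait before it re-queries and learns the new answer is 300/2 = 150 s, i.e. 2.5 minutes; the worst case is the full 300 s, plus any resolvers that ignore TTLs entirely.)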
[14:13:34] yeah I made a ticket for it. I'd like to LVS-ify AuthDNS first
[14:13:54] yeah, that makes sense
[14:14:03] I've had that on my TODO for months too :((
[14:14:29] re: auto-down, that'd be nice, but for our scenario I don't know if I trust gdnsd's built-in monitoring for it, due to, yes, the split brains
[14:14:54] well, not the built-in one probably
[14:14:56] I can rationalize that and say that in a split-brain scenario, it's better that caches that can reach the split DC don't get routed by it to other DCs, but still
[14:14:57] but the separate-process one
[14:15:05] with a little more logic perhaps
[14:15:20] but yeah, we need DC-isolation-detection in general, and we could use that to feed into gdnsd
[14:15:36] via the extmon or extfile plugins, separately from admin_state
[14:16:40] we really want two separate things to happen there, I think:
[14:17:13] 1) If a DC detects that itself and most of our network is normal, but 1x of our DCs appears dead/isolated from the working network, it should gdnsd-depool the isolated DC
[14:18:07] 2) If a DC detects that itself is completely isolated, it should do... something? The obvious candidate actions are: depool all other DCs, depool itself, or just stop its own AuthDNS service entirely.
[14:18:41] depool itself is probably the right answer
[14:18:44] if traffic were assumed to be reasonably functional from the caches alone when isolated, the first option would make sense, to help those clients that are in its isolation bubble
[14:19:03] but I don't think we're in that state today, which leaves "depool self" sounding better
[14:19:37] stopping gdnsd is also an option, though. It's basically telling clients to go talk to a gdnsd elsewhere that has better info. If they can't reach those and the local isolated site is assumed unhelpful, they're screwed anyways.
[14:21:29] (and for bonus points, those other gdnsd's are more likely to actually be in our administrative control still. the isolated site might be unreachable by us and/or our cfg mgmt)
[14:23:19] yeah that's not a bad option either
[14:23:22] although a little scary :)
[14:24:19] on that note, maybe we should set up ipsec tunnels between the sites' bastions that are allowed to flow over the public internet, so that we can hop manually from e.g. bast1001 to bast3001 when all our transport is down.
[14:24:45] well not really "manually", but still
[14:25:02] it's annoying to have to edit my ssh config to go reach esams, because I normally route ssh through a US bastion
[14:25:43] my ssh config uses the closest bastion to the server I want to reach
[14:25:44] or ideally, have such a tunnel that can be used for cfg mgmt and such as well, but doesn't get used for user traffic?
[14:25:47] for .wmnet at least
[14:26:08] *.codfw.wmnet uses bast2001 and so forth and so on
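(paravoid's per-site scheme maps onto OpenSSH client config along these lines. A sketch only: the *.codfw.wmnet/bast2001 pairing is from the conversation; the other host patterns and bastion pairings are assumed for illustration:)

    # per-site bastions in ~/.ssh/config; codfw pairing from the chat,
    # the other pairings are assumed
    Host *.codfw.wmnet
        ProxyCommand ssh -W %h:%p bast2001.wikimedia.org
    Host *.eqiad.wmnet
        ProxyCommand ssh -W %h:%p bast1001.wikimedia.org
    Host *.esams.wmnet
        ProxyCommand ssh -W %h:%p bast3001.wikimedia.org

(As the conversation notes next, the pattern approach breaks down for hosts that aren't subdomained per site.)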
[14:26:19] harder to do with misc-named servers
[14:26:24] yeah I used to have something like that, but then it doesn't work for the wikimedia.org hosts we don't subdomain
[14:26:37] yup
[14:26:41] last time I redid mine I just stuck with bast2001 for all, I figured our networks are better anyways
[14:26:50] (usually)
[14:26:59] well it's not /better/ per se
[14:27:11] the shortest path from greece to ashburn may or may not be through amsterdam
[14:27:36] more reliable?
[14:27:45] (maybe not with our 1x link, but soon!)
[14:27:48] I wouldn't call our GTT link reliable :P
[14:27:52] yeah :)
[14:28:03] definitely different for you!
[14:28:37] you're too close to Dallas, that makes a difference too
[14:28:58] if you were in, say, Atlanta, it wouldn't (necessarily) make sense to go via one of the two sites always
[14:29:05] there's no real point trying to tunnel traffic like we've done in the past as a backup, right? most likely we'd just face all kinds of neteng problems with saturated peers or transits, etc?
[14:29:28] from routers you mean?
[14:29:29] I guess not peers, but transits?
[14:29:35] it's... annoying, for various reasons
[14:29:45] yeah, we had it at one point, I think
[14:29:50] there's the MTU issues as well
[14:29:52] yeah, and MTU fail
[14:29:59] so I got in contact with one of our transits
[14:30:05] and it's possible to use jumbo frames with them
[14:30:14] their customer support said no
[14:30:33] and then I got a private email from one of their network architects
[14:30:42] who reads the support emails apparently (!?)
[14:30:52] and said it's possible, but they don't advertise it
[14:31:02] so we could have a 4k or 9k MTU with the transit
[14:31:08] maybe support asked him, and he gave two answers via two channels :)
[14:31:14] and then tunnel 1500 bytes over that link
[14:31:26] it's still in a very gray zone though
[14:31:49] you're not really supposed to be using the "connected" subnet of a transit link, for example
[14:31:59] most carriers don't block that, but you're not really supposed to do that
[14:32:06] (it's their IP space)
[14:32:34] more random conversational sidetrack: is it worth trying to do jumbo frames on our local network, with MTUs just for our own networks' routes in iproute2? It seems possible, but slightly complicated, and maybe not worth much in the real world.
[14:33:09] I don't think you can, you'd have to do the opposite, no?
[14:33:26] set the interface's MTU to 9000 and then set specific routes to 1500
[14:33:30] (such as 0/0)
[14:33:51] you can't leave the interface at 1500 and then set specific routes to 9000?
[14:34:03] I wouldn't think so, but I've never tried it
[14:34:07] me either
[14:34:23] anyways, we'd still want specific routes for the other subnets in the DC, and if our transport supported it, to the remotes too
[14:34:54] our backbone is jumbo-frames enabled
[14:35:07] but as much as it sounds awesome, I know some have said the small gains aren't worth pursuing
[14:35:21] it's still possible there are misconfigured links somewhere though, we've never really tried it :)
[14:36:55] bbl
[14:37:20] https://www.reddit.com/r/networking/comments/3nvvrw/what_advantage_does_enabling_jumbo_frames_provide/cvrpsd7
[14:37:28] ^ thinking like that, I've heard often before
[14:38:11] and we'd probably have strange corner cases, too, like mgmt network or power strips not supporting the MTU.
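(Concretely, the "opposite" approach paravoid describes looks something like the sketch below; the interface name and addresses are made up. Note that a per-route mtu can lower the effective MTU below the link's but not raise it above it, which is why the 1500-interface/9000-route variant doesn't work:)

    # Sketch: jumbo link MTU, with the default route clamped back to
    # 1500 so only our own subnets see 9000-byte frames. Interface and
    # gateway are placeholders, not production config.
    import subprocess

    def ip(*args: str) -> None:
        subprocess.run(["ip", *args], check=True)

    ip("link", "set", "dev", "eth0", "mtu", "9000")
    # internet-bound traffic (0/0) stays at 1500 to avoid PMTU blackholes
    ip("route", "replace", "default", "via", "10.0.0.1",
       "dev", "eth0", "mtu", "1500")
    # intra-DC / backbone destinations simply inherit the 9000-byte link MTU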
[14:39:42] 10Traffic, 06Operations, 13Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181#2480158 (10BBlack) Copying in an old argument never recorded, I think from @faidon: While many services on cache_misc are obvious targets for "mid", phabricator its...
[14:50:15] 10Traffic, 06Operations, 13Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181#2480206 (10BBlack) Going into that list a little deeper, though, there's a secondary pragmatic issue. Most (all?) of the servers for the services above are still i...
[15:10:08] 10Traffic, 06Operations, 13Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181#2480349 (10Dzahn) gerrit will be replaced by https://gerrit-new.wikimedia.org/r/#/q/status:open soon-ish, then it will be jessie (and use Letsencrypt for certs)....
[18:56:55] 10netops, 06Operations, 10ops-codfw: audit network ports in a4-codfw - https://phabricator.wikimedia.org/T140935#2481487 (10RobH)
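(On the T118181 cipher phase-out above: "forward-secret" means ECDHE/DHE key exchange, so a recorded session can't be decrypted later even if the server's long-term key leaks. An illustrative way to express such a policy with Python's ssl module; the cipher string is only an example, the real lists live in the production puppet config, not here:)

    # Example only: restrict a server context to forward-secret (ECDHE)
    # cipher suites; not the production cipher list from the task above.
    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+AES:!aNULL:!MD5")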