[11:40:22] 10HTTPS, 10Traffic, 10Operations, 10Wikimedia-Shop: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#3626730 (10Jseddon) Hey @BBlack, Been working on this over the last week. The short: We have HSTS but it's set to 90 days. Shopify have confirmed that this can be extended in le...
[13:07:51] 10HTTPS, 10Traffic, 10Operations, 10Wikimedia-Shop: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#3626970 (10BBlack) Thanks for the updates! Even a 90d HSTS without the preload/includeSub flags is better than nothing. If we can get the time extended out to 1y that's even be...
[13:09:41] 10HTTPS, 10Traffic, 10Operations, 10Wikimedia-Shop: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#3626971 (10BBlack)
[13:16:41] so while looking for possible causes of the netdev watchdog thing, I got nerdsniped off into the land of re-reviewing our various network tweaks for the LVS/cache cases, things like sysctls and card settings and blah blah...
[13:17:42] a few ideas for potentially-positive changes came from all that, but they need some thinking (or in some cases, some feedback from ancient history, and/or feedback from the more-networky types!)
[13:18:27] the easiest one (because it's not harmful, we can just test whether it's a positive perf impact or not) is:
[13:19:45] sysctl net.ipv4.ip_early_demux = 0 (default 1). Sysctl docs and whatever other reading I could find on this say the default is an optimization that's positive for end-hosts which are terminating most of their traffic, but 0 is better for routers. Of course nobody says for sure about the ipvs case, which is sort-of a router in that it's not terminating most of its socket traffic, but ipvs also
[13:19:51] doesn't use the standard ip_forward way of routing things either.
[13:21:54] next up: disabling ethernet-level flow control on the cards for both the lvs and cache cases (possibly broader than that, really). e.g. "ethtool -A eth0 autoneg off rx off tx off". default on bnx2x is "autoneg off rx on tx on"
[13:22:27] internet advice is that ethernet-level flow-control is kind of a busted idea anyway, and that most switches (stats indicate probably ours too) have a policy of not sending pause frames, but they will obey pauses sent by the host.
[13:22:59] the host would send the switch a pause frame if its receive ring buffer gets above some high water mark, causing the packets to then buffer up a bit at the switch (and then I guess if that fills up they get discarded)
[13:23:38] but this just contributes to bufferbloat, and it's better to turn it off and let things drop immediately in those cases so that higher-level things like tcp can see and respond to the bottleneck accurately (e.g. bbr)
[13:25:15] the third (and final) thing to look into is turning off LRO (but not GRO) on the lvses and/or caches (the case is stronger for the LVSes, assuming it doesn't cause too much CPU increase handling traffic)
[13:25:44] we do have new-enough kernels (3.7 or higher) that ipvs is *compatible* with LRO and GRO (doesn't cause malfunction).
[13:26:49] but the common advice is that on router-like hosts, turning off LRO is worthwhile even if you keep GRO. This is because GRO's rules for merging packets together are quite strict (most header fields must match), so it works fairly transparently in cases like ipvs where frames get aggregated by GRO, routed by ipvs, then de-aggregated back by GSO before sending.
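(Taken together, the three tweaks discussed above amount to a handful of commands. A minimal sketch, assuming a bnx2x card named eth0 — the interface name is an assumption, and changes would be applied per-host only after measuring:

    # 1) ip_early_demux: the default (1) favors hosts terminating their own
    #    traffic; router-ish/ipvs hosts may do better with it off
    sysctl -w net.ipv4.ip_early_demux=0

    # 2) ethernet-level flow control: stop sending/honoring pause frames
    #    ("ethtool -a" shows the current pause settings, "-A" changes them)
    ethtool -A eth0 autoneg off rx off tx off

    # 3) LRO off, GRO left on (GRO merges conservatively and is undone by GSO)
    ethtool -K eth0 lro off
)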
[13:27:26] but LRO has looser rules about aggregating: it sometimes aggregates packets that have header flags that differ in ways that might confuse TCP metrics, congctl, etc. on the end-hosts.
[13:27:37] (e.g. merging up two packets with non-identical tcp timestamps, and other minor headers)
[13:28:49] as end hosts this matters less on the cache boxes, but there's still the issue that even locally, LRO might be merging away important header bits that help BBR accurately control congestion and such.
[13:33:12] so TL;DR-ing all of that up, my instincts are:
[13:33:30] ip_early_demux=0: try it, see if LVS cpu usage drops a little
[13:33:52] ethernet autoneg: disabling it sounds right, probably for all hosts, but at least for the lvs+cache cases
[13:34:38] LRO: tricky. we'd expect some cpu% increase in both the lvs and cache cases as there will be less packet aggregation than before. but it's probably manageable, GRO will still be doing some of that job, and it's probably beneficial to BBR and such.
[13:35:21] I checked the bnx2x code btw, and when LRO+GRO are both on as options, LRO takes precedence for traffic that can use LRO (which is only TCPv4, whereas GRO can handle others like ipv6)
[13:35:59] it's not immediately clear, actually, whether with LRO enabled GRO gets used at all, even for that other traffic. I think so, but not sure.
[13:37:27] (our current settings are both enabled by default on bnx2x)
[13:38:12] XioNoX: paravoid: mark: ema: ^ any related thoughts/arguments/info welcome :)
[13:38:43] s/ethernet autoneg:/ethernet flowcontrol:/ above heh
[13:43:15] oh brandon wrote another book on irc
[13:43:22] * mark goes back to page 1
[13:46:42] amen?
[13:48:10] :P
[13:48:41] yeah disable flow control everywhere
[13:48:50] <_joe_> bblack: I honestly don't know enough about LRO/GRO, but the rest makes sense.
[13:48:54] i have no opinion at all about lro/gro, too far out
[13:49:16] 10Traffic, 10Discovery, 10Maps, 10Maps-Sprint, and 2 others: Make maps active / active - https://phabricator.wikimedia.org/T162362#3627067 (10debt) 05Open>03Resolved Thanks @BBlack and @Gehel !
[13:50:11] we used to have both disabled on the lvses (LRO + GRO) long ago. Later when we got upgraded jessie kernels we turned both back on.
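(Any before/after comparison would start from the current offload state. A quick way to confirm the "both enabled by default on bnx2x" claim on a given box, again assuming eth0 (flag names follow standard ethtool -k output; mpstat is from the sysstat package):

    # show the two relevant offload flags; expected today: both "on",
    # after the proposed change: lro off / gro on
    ethtool -k eth0 | grep -E 'large-receive-offload|generic-receive-offload'

    # watch per-CPU load while toggling, to gauge the expected cpu% cost
    mpstat -P ALL 5
)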
[13:50:37] i disabled it
[13:50:37] the reasoning in the old commits/comments was that GRO was incompatible with LVS in general (which was true back then) and bnx2x had some kind of LRO bug causing kernel lockups (since fixed)
[13:50:41] yes
[14:04:13] lol, there are TCP jokes too: https://twitter.com/seismictc/status/571085852358656000
[14:12:50] haha
[14:21:38] there are a bunch of UDP jokes too but you might not get them
[14:22:27] :P
[14:50:16] 10netops, 10Operations, 10fundraising-tech-ops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627380 (10Jgreen)
[14:50:34] 10netops, 10Operations, 10fundraising-tech-ops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627397 (10Jgreen)
[14:50:39] 10netops, 10Operations, 10fundraising-tech-ops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627399 (10Jgreen) a:05Jgreen>03None
[14:50:50] 10netops, 10Operations, 10fundraising-tech-ops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627380 (10Jgreen) p:05High>03Triage
[16:04:51] 10netops, 10Operations, 10fundraising-tech-ops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627580 (10Jgreen) This also requires an update to the firewall policy; I added the new database and generated the new policy. com...
[16:36:43] 10netops, 10Operations, 10fundraising-tech-ops, 10Patch-For-Review: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627694 (10ayounsi) a:03ayounsi Vlan changed on pfw-eqiad (old) Vlan changed on fasw-c-eqiad (new) Security p...
[16:51:58] 10netops, 10Operations, 10fundraising-tech-ops, 10Patch-For-Review: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627748 (10ayounsi) 05Open>03Resolved new policy file worked fine, committed. Don't forget to update rackta...
[17:12:21] bblack: https://grafana.wikimedia.org/dashboard/db/tcp-fast-open?orgId=1&from=now-7d&to=now
[17:13:15] TCP fast-open failure in esams is quite high, no?
[17:13:46] but it seems to be regular
[17:14:32] bblack: do we track the TCP connect time somewhere? We got an email from Airtel saying that "TCP connect time degraded recently"
[17:15:26] I didn't look at the email yet, but perhaps they're referring to the recent depool of esams?
[17:15:37] (or ulsfo, which is still depooled)
[17:15:46] (for upload, anyways)
[17:16:35] in any case, TFO is still fairly rare, probably not what anyone's asking about
[17:18:46] hmm yeah the email isn't about a depooled state
[17:19:14] I mean, it sucks that they are ~420ms on a good day, but it sounds like they're talking about tcp connect times worse than that in their graph, recently
[17:19:47] yeah, and I found the email I sent them on april 25th about the same issue, asking them to peer with us to solve it
[17:20:16] are they at amsix?
[17:21:19] yeah
[17:21:26] and palo alto
[17:21:38] the email they sent us is really similar to the previous one
[17:22:04] anyways, hopefully once we have eqsin, maybe it will be slightly better than esams for India?
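(For the TFO question, the numbers on a dashboard like that presumably derive from the kernel's TCP fast-open counters, which can also be sanity-checked directly on a cache host. A minimal sketch using standard Linux netstat-MIB names; this is host-side only and says nothing about Airtel's connect-time measurements:

    # current TFO mode (bitmask: 1 = client, 2 = server)
    cat /proc/sys/net/ipv4/tcp_fastopen

    # kernel TFO counters, including passive (server-side) failures
    nstat -az | grep -i tcpfastopen
)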
[17:22:30] obviously long term, we'd love to have another edge site somewhere closer to India and/or the ME in general, but eqsin may improve things slightly
[17:22:52] yeah it should
[17:24:55] https://grafana.wikimedia.org/dashboard/db/navtiming-count-by-country?orgId=1&var-C=India&from=now-30d&to=now
[17:25:12] ^ that's something to correlate with as well. the units of that graph have no true meaning.
[17:25:33] but it's roughly a graph showing the shape of "how many users from country X are hitting our servers?"
[17:25:52] there's an interesting total dropout there from like 9-14 to 9-18 too
[17:26:40] oh the same dropout exists for other countries, likely that's some artifact of the stats source being broken, or something
[17:26:46] that's an interesting graph, thanks
[17:27:22] all the countries have this drop
[20:05:43] 10Traffic, 10Operations, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3628244 (10Dzahn) Hey @Lydia_Pintscher Happy to work on this and talk to you maybe on IRC as well. I would say one of the...