[07:46:43] HTTPS, Traffic, Operations: wmflabs.org should enforce HTTPS - https://phabricator.wikimedia.org/T144790#2610285 (abian)
[09:50:39] godog: hi :) Anything weird with swift yesterday between 17:35 and 23:23?
[09:51:21] that's the codfw upload upgrade window, there should have been an increase in requests to swift but not too crazy
[09:55:40] ema: hi! no I'm not seeing a jump in requests in swift codfw yesterday, https://grafana.wikimedia.org/dashboard/file/swift.json?from=1473069320166&to=1473155660166&var-DC=codfw
[09:56:55] godog: in the end we routed cache_upload codfw to swift eqiad
[09:57:58] ah ok, that's likely why then!
[09:59:13] but yeah in eqiad it peaked 20% higher than usual, https://grafana.wikimedia.org/dashboard/file/swift.json?from=1472551122863&to=1473155862863&var-DC=eqiad
[10:02:45] ok, thanks. It seems to handle that load very well though, I haven't seen an error-rate increase
[10:06:34] HTTPS, Traffic, Operations: wmflabs.org should enforce HTTPS - https://phabricator.wikimedia.org/T144790#2610533 (Aklapper) @abian: What does this task cover which is not already covered by T102367 ? Or what is its relationship to T102367?
[10:11:36] as FYI, Varnishkafka seems to be behaving fine, no alerts about data inconsistencies received so far
[10:11:56] (meanwhile we had a lot of them when ulsfo got upgraded)
[10:19:47] ema: yeah it can handle the load, it is "just" slow for the most part
[10:41:19] HTTPS, Traffic, Operations: wmflabs.org should enforce HTTPS - https://phabricator.wikimedia.org/T144790#2610579 (abian) >>! In T144790#2610533, @Aklapper wrote: > @abian: What does this task cover which is not already covered by T102367 ? > Or what is its relationship to T102367? I think T102367 c...
[10:44:57] Traffic, Varnish, Operations, Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2610589 (ema) >>!
In T131502#2608658, @ema wrote: > We suspect that the bug(s) encountered while upgrading ulsfo might have been caused by running a mix of Varnish 3 and...
[12:07:33] ori: nice link :) you can see that their stats are biased in favor of FF clients over Chrome/IE though :)
[12:08:47] (e.g. they only get 2.5% chapoly because FF doesn't yet ever prefer it)
[12:14:20] ema: 17:35 -> 23:23, that's the window of codfw switching over to v4?
[12:16:44] bblack: yep
[12:17:34] the miss rate seems a little higher, but yeah no 503
[12:17:42] the miss rate could just be natural fallout of fewer clients
[12:17:55] (and no ulsfo backend traffic to help vs earlier, either)
[12:18:51] I'd say if we make it 24h with no signs of 503 or CL:0 issues, we have two possible next steps:
[12:19:14] 1. We could depool ulsfo in gdnsd, which sends all those clients to codfw for more load, to be more certain, or...
[12:19:35] 2. We could convert ulsfo to v4 again, which gives us the same client load testing and also be->be v4 traffic
[12:19:57] (2) is more-comprehensive, but (1) is simpler to actually do (and to revert if necc)
[12:20:33] (and in 2, I should've said, also pointing ulsfo at codfw again)
[12:20:45] we could also do (1) first, wait 24h and if everything is fine go for (2).
[12:20:53] yeah
[12:21:15] are you logging somewhere for the CL:0 stuff on codfw too?
[12:21:57] nope, let's do that
[12:22:18] assuming we get through some blend of 1+2 and it turns out everything's fine with v3 out of the mix, we'll need to have a real plan for making the rest of the conversion go smoothly-ish.
[12:23:41] probably something like: point esams->codfw temporarily and then immediately roll through upgrading esams and then eqiad to v4, then switch things back to their usual routing (esams->eqiad)
[12:25:35] well either that or we assume if it's just for a few hours, the v4+v3 fallout isn't too horrible, and just upgrade eqiad followed by esams without re-routing.
[12:26:01] but we'd probably want to wait a few hours in between without re-routing to be sure eqiad is re-populated well before esams misses a lot
[12:30:01] also interesting in the mozilla/cloudflare stats is they're not supporting ECDSA or DHE
[12:30:53] (which means a sizeable number of clients that could potentially do forward secrecy, but only with ECDSA (some outdated IE/Win combos) or DHE (old Android, old OpenSSL), end up falling back to non-forward-secret options with them
[12:30:57] )
[12:31:49] that may be one of the reasons for their substantially higher (as in, order of magnitude) 3DES percentage than ours
[12:32:16] although I'm surprised more of it didn't go to AES128-SHA... their AES128-SHA is lower than ours, which is also surprising (by about half)
[12:33:57] but if their interest in these stats is minimizing their total list without losing compat, they're missing half the picture by only logging what was negotiated, rather than what was advertised (which is what we sample periodically). You might miss out on the fact that all of the X% of clients that pick CrapCipherA can also fall back to EquallyCrapCipherB that you have to support, and therefore C
[12:34:03] rapCipherA can be dropped.
[12:35:24] maybe they have 3des ahead of aes128-sha? would seem an odd choice
[12:38:02] more things to include in the ever-delayed blog post about TLS things @ wikimedia :)
[12:42:50] in any case, the blend of concerns at everywhere major (including us) on compat-vs-sec have been different.
[12:43:54] most of the other major sites have opted to drop DHE entirely because of the DHE-2048 compatibility nightmare with certain very ancient / crap clients (including Java6), whereas we went the other direction long ago and kept DHE-2048 in play to give FS to some middling clients that would otherwise never attain FS with us, cutting out the DHE-2048-incompatible ones in the process.
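A minimal sketch of the 12:33:57 point about sampling what clients advertise rather than only what gets negotiated. It assumes each sampled client is recorded as the set of suites it offered and that the server picks by its own preference order. The function names, the cipher labels, and the sample data are all hypothetical and are not taken from any Wikimedia tooling.

```python
from typing import Iterable, List, Optional, Set


def negotiated(server_pref: List[str], offered: Set[str]) -> Optional[str]:
    """Pick the first suite in server preference order that the client offered."""
    for suite in server_pref:
        if suite in offered:
            return suite
    return None


def droppable(server_pref: List[str], clients: Iterable[Set[str]]) -> List[str]:
    """Suites that could each be removed on its own without locking out a sampled client."""
    # Only clients that can connect today are relevant to the comparison.
    samples = [c for c in clients if negotiated(server_pref, c) is not None]
    result = []
    for candidate in server_pref:
        remaining = [s for s in server_pref if s != candidate]
        if all(negotiated(remaining, c) is not None for c in samples):
            result.append(candidate)
    return result


# Hypothetical sample of advertised suite sets, one set per sampled client.
SERVER_PREF = ["GoodCipher", "CrapCipherA", "EquallyCrapCipherB"]
CLIENTS = [
    {"GoodCipher"},
    {"GoodCipher", "CrapCipherA"},
    {"CrapCipherA", "EquallyCrapCipherB"},
    {"EquallyCrapCipherB"},
]

print(droppable(SERVER_PREF, CLIENTS))
```

With negotiated-only logging, the third client would simply show up as a CrapCipherA user; the advertised set is what reveals that it can fall back to EquallyCrapCipherB. Note that the check is per suite: two suites that are each droppable alone are not necessarily droppable together.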
[12:45:12] (and I mis-spoke above, I don't think ECDSA certs gain FS with anyone, it's that certain IE/Win combinations only do TLSv1.2 FS+AEAD ciphers if you have either ECDSA certs or DHE available, but not with ECDHE-RSA)
[12:46:13] for example: https://www.ssllabs.com/ssltest/viewClient.html?name=IE&version=11&platform=Win%207&key=133
[12:46:42] ^ in the above, you can see IE11/Win7 can only do forward-secret AES-GCM type stuff with ECDHE-ECDSA or DHE-RSA, but not ECDHE-RSA.
[12:47:25] it has TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 and TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, but the best it does for ECDHE-RSA is CBC modes like TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
[12:48:40] and as one of the most-knowledgeable and trustworthy people on this topic has said multiple times before: "every­thing less than TLS 1.2 with an AEAD ci­pher suite is cryp­to­graph­i­cally broken"
[12:48:54] heh stupid blog type justification
[12:53:19] bblack, ema: I saw https://phabricator.wikimedia.org/T143539 when triaging some tickets for clinic duty, any objections to just closing the ticket?
[12:53:31] there's nothing really to be done there
[12:54:22] moritzm: yeah nothing to do there for sure
[12:56:52] HTTPS, Traffic, Operations, Browser-Support-Internet-Explorer: Internet Explorer 6 can not reach https://*.wikipedia.org - https://phabricator.wikimedia.org/T143539#2610885 (MoritzMuehlenhoff) Open>declined Closing the ticket. IE6/XP is unsupported for quite a while now and we're not plan...
[13:29:03] HTTPS, Traffic, Operations, Performance-Team, and 2 others: HTTPS-only for stream.wikimedia.org - https://phabricator.wikimedia.org/T140128#2610970 (BBlack) Have we sent any announcement about this to the community? We might have already, just not tracked in here.
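A small sketch of the IE11/Win7 observation from 12:46:42 and 12:47:25, classifying suites purely by their IANA-style names. It only uses the three suite names quoted in the log; the helper names are hypothetical, and the name-based heuristics are an assumption that holds for TLS 1.2-and-earlier naming only (TLS 1.3 suite names carry no key exchange).

```python
from typing import Iterable, List

# The three suites quoted in the log for IE11 on Windows 7 (not the client's full list).
IE11_WIN7_SUITES = [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
]


def key_exchange(suite: str) -> str:
    """Return the key-exchange/auth part of the name, e.g. 'ECDHE_ECDSA'."""
    return suite[len("TLS_"):].split("_WITH_")[0]


def is_forward_secret(suite: str) -> bool:
    """Ephemeral (EC)DH key exchange is what provides forward secrecy."""
    return suite.startswith(("TLS_ECDHE_", "TLS_DHE_"))


def is_aead(suite: str) -> bool:
    """Name-based check for AEAD modes (GCM, CCM, ChaCha20-Poly1305)."""
    return any(tag in suite for tag in ("_GCM_", "_CCM", "CHACHA20_POLY1305"))


def fs_aead_key_exchanges(suites: Iterable[str]) -> List[str]:
    """Key exchanges under which the client can get both forward secrecy and AEAD."""
    return sorted({key_exchange(s) for s in suites
                   if is_forward_secret(s) and is_aead(s)})


print(fs_aead_key_exchanges(IE11_WIN7_SUITES))
```

For these three names the result contains DHE_RSA and ECDHE_ECDSA but not ECDHE_RSA, which matches the point made in the log: a server offering only RSA certs with ECDHE leaves this client on a CBC suite.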
[13:32:15] HTTPS, Traffic, Operations, Performance-Team, and 2 others: HTTPS-only for stream.wikimedia.org - https://phabricator.wikimedia.org/T140128#2610981 (BBlack) Nevermind, found it via google: https://lists.wikimedia.org/pipermail/wikitech-l/2016-June/085928.html
[15:15:30] HTTPS, Traffic, Operations, Performance-Team, and 2 others: HTTPS-only for stream.wikimedia.org - https://phabricator.wikimedia.org/T140128#2611233 (BBlack) Looking at the past couple days of access logs from ori's nginx patch above, it looks like the current split is still 88% insecure, 12% secu...
[16:46:09] Traffic, Operations: Strong cipher preference ordering for cache terminators - https://phabricator.wikimedia.org/T144626#2611554 (BBlack) Link to mozilla NSS bug on doing AES-NI pref hacks on the client side (no real info there yet): https://bugzilla.mozilla.org/show_bug.cgi?id=1279584
[16:53:31] Traffic, Citoid, ContentTranslation-CXserver, Operations, and 4 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001#2611591 (BBlack) `rest.wikimedia.org` is gone. We've still got `citoid.wikimedia.org` and `cxserver.wi...
[16:57:13] ema: I'd need to point yarn.w.o to stat1001 since we added basic auth + ldap today
[16:57:19] (it proxies to analytics1001)
[16:57:38] I saw that there is a vtc use case about it though :)
[16:57:44] should I remove it?
[16:58:57] the rest looks easy in VCL
[17:00:03] ema: still looking like v3+v4 was the issue?
[17:00:50] elukey: does it need to be pass-mode if we re-enable it?
[17:00:54] ema: still looking like v3+v4 was the issue?yarn)
[17:01:12] grrrr @ keyboard
[17:01:20] elukey: (yarn)
[17:02:39] bblack: yeah I think so
[17:03:50] but I could be wrong. Do the other analytics websites use something different than pass?
[17:04:48] elukey: well, in theory if the app Does The Right Thing, we shouldn't need to pass-mode it, things should work right automatically, at least for basic auth if not for authcookies.
[17:05:45] elukey: in practice, we don't care much about hitrate there, and if you're worried at all about accidental caching of authenticated content, pass-mode is safer.
[17:06:45] bblack: still no 503 spikes, the varnishlogs did produce some output but what I've checked so far looks like 416s
[17:07:21] ema: good news \o/ at least maybe now we can see the light at the end of this recent tunnel
[17:07:33] indeed! :)
[17:07:47] bblack: we could try not to pass and see how it goes, and maybe switch to pass if things get inconsistent. Yarn is not that critical, we use it to see the status of the Hadoop cluster
[17:09:36] elukey: sounds fine to me. I double-checked and cache_misc does have code to auto-pass if Authorization headers are present or Cookies (with some cookie exceptions for common non-auth cookies)
[17:10:12] all right I can try to send a code review, will probably get a -1 but worth a try :)
[17:37:47] Traffic, MediaWiki-Stakeholders-Group, Operations, Developer-notice, and 2 others: Get rid of geoiplookup service - https://phabricator.wikimedia.org/T100902#2611800 (BBlack) Bump - the geoiplookup JSON service (the hostname and the /geoiplookup) path will go away sometime this week, preferably t...
[20:28:43] Traffic, Operations: OpenSSL 1.1 deployment for cache clusters - https://phabricator.wikimedia.org/T144523#2612589 (BBlack) On the "potential post-1.1.0 issues" front so far, nothing too serious, but: * This double-free probably impacts our currently-under-testing nginx-1.11+openssl-1.1 build: https://g...
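A rough Python sketch of the auto-pass rule described at 17:09:36: skip the cache when a request carries an Authorization header, or any cookie that is not on a list of known non-auth cookies. This is not the actual cache_misc VCL. The function name `should_pass` and the cookie names in `NON_AUTH_COOKIES` are hypothetical placeholders, since the log does not say which cookies are exempted.

```python
from typing import Dict

# Hypothetical exemption list; the real exceptions are not given in the log.
NON_AUTH_COOKIES = {"_ga", "_gid"}


def should_pass(headers: Dict[str, str]) -> bool:
    """Return True if the request should bypass the cache (pass-mode)."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Any Authorization header means the response may be user-specific.
    if "authorization" in lowered:
        return True

    # Parse "name=value; name2=value2" and ignore the exempted cookie names.
    cookie_names = [
        part.split("=", 1)[0].strip()
        for part in lowered.get("cookie", "").split(";")
        if part.strip()
    ]
    return any(name not in NON_AUTH_COOKIES for name in cookie_names)


print(should_pass({"Authorization": "Basic dXNlcjpwYXNz"}))
print(should_pass({"Cookie": "_ga=GA1.2.3"}))
```

Under a rule like this, a basic-auth service such as the yarn UI would be passed automatically on authenticated requests without an explicit pass-mode setting, which is the reasoning behind trying the non-pass configuration first.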
[23:41:24] Traffic, MediaWiki-ResourceLoader, Operations, Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#2613260 (Krinkle) I synced with @BBlack on IRC just now and resolved the confusion @Catrope and I were...
[23:41:44] Traffic, MediaWiki-ResourceLoader, Operations, Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#1448575 (Krinkle) a:Krinkle>None