[12:51:29] 10Traffic, 06Operations, 06Project-Admins: Create #HTTP2 tag - https://phabricator.wikimedia.org/T134960#2295661 (10BBlack) I think maybe some of the confusion here is about the various roles of tags here. Most of our tags are more about management of teams and projects. This is very different from the tra... [14:11:23] 10Traffic, 06Discovery, 06Operations, 10Wikidata, 10Wikidata-Query-Service: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2295722 (10ema) Update: setting default_grace to 0 makes the issue unreproducible. 4.1.0 is not affected, while... [15:22:59] 10Traffic, 06Discovery, 06Operations, 10Wikidata, 10Wikidata-Query-Service: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2295734 (10ema) I haven't been able to reproduce the bug so far after reverting the commit mentioned above. cp10... [15:26:14] ema: nice find! :) [15:27:04] I was just about to start turning your repro into a VTC test. I may still try, initially with 100s of re-attempts to trigger, but maybe eventually whittle it down to something that fails right away with the right conditions. [15:28:05] ema: the patch you've backed out to fix is odd anyways, it seems like the author may have made some logic mistake, as the if-conditions are strange [15:28:35] there's 2x "} else if (oc->flags & OC_F_PASS) {" in a row in that chain of if-conds. One that does things, followed by another that does nothing... [15:29:26] that's not really a problem in itself (just a dead if-condition the compiler would probably take away), but that the patch looks weird like that could mean some other mental mistake happened writing it... [15:30:38] yes! [15:31:31] it seems like that patch is a fairly-well isolated feature patch though (a nice feature too, if it worked right) [15:32:16] I'd say if all the remaining stock VTC tests still pass on 4.1.2 with that patch reverted (including its tests), we should probably deploy that way for now while this gets worked out upstream. [15:33:52] I haven't managed to reproduce right away, only after (3-4) minutes from the first request for some reason [15:33:58] on the feature being nice: we actually use hit-for-pass quite extensively on the text cluster. especially e.g. for logged-in users. for a given URI, once every 10 minutes or so the HFP has to be refreshed, which stalls concurrent logged-in clients on refreshing the HFP. [15:34:13] having grace work for that would take away the global stalls there on HFP-refresh. [15:35:36] yeah I'm sure this is gonna be fixed before we tackle text :) [15:36:45] knowing the bad patch should help narrow down the VTC anyways. I think we could probably repro this in two requests with a single varnishd, if the conditions/timing and server-response are lined up perfectly [15:39:03] oh also, it looks like upstream nginx is finally going to implement multi-cert, taking away our need for our local patches. [15:39:11] nice [15:39:37] their implementation is better anyways (ours only works for the parameters of our situation: the two certs come from the same issuer and issuing chain cert, and their OCSP server can staple the 2x certs in a single stapling response) [15:39:54] theirs is generic to any N certs with different algorithms, from disparate providers with separate stapling [15:40:11] http://mailman.nginx.org/pipermail/nginx-devel/2016-May/008231.html [15:41:27] also notably tacked onto the end of his patch chain there: OpenSSL 1.1.x -related things about supporting multiple ECDH curves and defaulting to preferring X25519 if client supports over prime256v1 (awesome) for the ephemeral part even if the cert is still prime256v1 for compat [15:41:53] and removal of default DHE stuff (won't do DHE unless you give an explicit dhparam, which we do anyways, but it's nice to not have an insecure default anymore) [15:43:02] I'm expecting these end up in near-future 1.11.x release, and we'll want to move on to 1.11.x for this and for draft-ietf-tls-chacha20-poly1305 support with openssl 1.1.x too (once that's out and stable) [15:46:00] OK no repro in 2k iterations on cp3007:80, we should be fine [15:46:09] awesome [15:46:34] vtc tests also seem ok, only one exploded because I don't have libvmod-debug installed [15:46:58] well, they seem ok till the one that failed, now I have to check the others :) [15:47:24] ok so yeah, if we didn't break unrelated VTCs, let's plan this week to update our package to include a new debian/patches/ that backs out the bad patch, and then update on maps and re-upgrade misc to v4 [15:47:58] yep (the package is ready btw) [15:48:07] IMHO we should keep the two new fixes backported during this too, they seem legit and useful [15:48:58] agreed [15:53:11] alright, 4.1.2-1wm5 reverts the patch and is uploaded to carbon. Tomorrow I'm officially out (bank holiday) but I'll surely show up every now and then :) [15:53:14] see you! [15:54:12] cya