[04:42:32] 07HTTPS, 10Traffic, 06Operations: Secure connection failed when attempting to preview or save pages - https://phabricator.wikimedia.org/T134869#2280304 (10MSJapan) I would just like to add that I've having the same issue, but it occurs when saving almost any edit, as well as trying to XfD in Twinkle, and I'm... [06:39:19] curl -I https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz [06:39:22] [...] [06:39:25] content-length:25175751 [06:39:30] age:203 [06:39:30] x-cache:cp1045 hit(2), cp3010 hit(1), cp3007 frontend miss(0) [06:39:39] with https://gerrit.wikimedia.org/r/#/c/288350/ applied [06:40:06] I've currently upgraded cp3007 only, but I'd say we found a solution [07:32:29] ema: I've uploaded the new Linux package to carbon yesterday, the meta packages don't need to be changed this time [07:32:41] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2288343 (10Gehel) a:05Gehel>03None [07:58:08] moritzm: thanks [07:59:03] gehel: I've upgraded all misc caches to a patched version of varnish that I think solves the issue [07:59:11] please try to reproduce when you have a sec [07:59:20] ema: trying right now... [08:12:08] gehel: and? :) [08:12:43] ema: 2 errors on 100 requests, lemme check [08:14:04] https://phabricator.wikimedia.org/P3048 and https://phabricator.wikimedia.org/P3049 [08:14:20] I still see the 3x Age header... [08:14:55] yeah that is likely unrelated [08:15:45] mmh, both failed requests got misses in esams and a hit on cp1061 [08:19:31] ema: was that with all the CL-sensitive VCL still gone? [08:19:48] bblack: yes I haven't touched that [08:19:57] but I could repro the problem even with CL-stuff gone [08:20:17] oh ok [08:21:05] after upgrading I cannot reproduce the issue with https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz anymore [08:21:42] yeah but gehel still hits it [08:21:47] indeed [08:22:11] right after re-(re-)re-moving the last of the CL-stuff I couldn't reproduce, and everyone else said it was ok at the time too heh [08:22:38] it may be hard to hit at times. I think the CL-stuff just made it easier to repro. [08:22:48] so a good order for cache wiping would be: eqiad, codfw, esams, ulsfo, right? [08:23:43] yeah for misc: eqiad, codfw, (esams+ulsfo) for the backends [08:24:35] try adding a50c99f6 "Make sure hp->body is always initialized." maybe. That one looks related to me too. [08:24:46] anyways, I'm going back to sleep, back in a few hours :) [08:24:52] sleep well! [08:26:24] bblack: and come back rested! [08:27:57] oh, I had to take one more peek. I didn't notice last night that that patch is to varnishtest. so, not related :( [08:28:34] uh? bin/varnishd/http1/cache_http1_proto.c [08:29:13] https://github.com/wikimedia/operations-debs-varnish4/commit/9560c50fbeca1c32a0ddf52ef84ff9776c8d8099 [08:30:30] oh, you mean the hp->body one [08:33:45] let me try to wipe the caches again just to be sure [08:40:33] gehel: could you please give it another go? [08:40:40] sure ... [08:45:40] looks good so far ... [08:50:35] 100 requests, no error. Looks good. (Or I might just be lucky again...) [08:51:02] yeah given the nature of this bug I wouldn't get too excited (just mildly happy) [08:51:48] * gehel is midly happy [08:54:11] gehel: is your test still running or does it stop at 100 reqs? [08:54:32] it stops at 100 requests, but I launched it again, just to be sure... 
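A minimal sketch of the kind of repro loop in use here — the actual test script isn't shown in the backscroll, so the URL, request count, pacing, and output format are assumptions modeled on the curl commands and SIZE/X-Cache lines quoted later in the log:

    #!/bin/bash
    # Repeatedly fetch a test URL and print the headers relevant to the bug,
    # plus the number of body bytes actually received; a truncated or empty
    # response shows up as SIZE smaller than Content-Length (often 0).
    url="https://query.wikidata.org/style.css"   # assumed test URL
    for i in $(seq 1 100); do
        curl -s -o /dev/null -D /tmp/headers.$$ -w 'SIZE: %{size_download}\n' "$url"
        grep -Ei '^(age|x-cache|content-length):' /tmp/headers.$$
        echo '---'
        sleep 1
    done
    rm -f /tmp/headers.$$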
[08:54:38] great, thanks [08:55:29] still no errors... What is the level above "mildly happy"? [08:56:00] Damn, I spoke too soon, 1 error again [08:56:28] https://phabricator.wikimedia.org/P3050 [08:57:33] just 1 error on this batch of 100 requests... [08:59:01] mmh [09:11:41] this also might be relevant https://varnish-cache.org/trac/ticket/1858 [09:12:00] Stale hit-for-pass objects generating 200 responses with missing content [09:17:47] yeah but it should be in 4.1.2 [09:22:56] and now style.css got cached as empty on cp3008 [09:23:08] X-Cache: cp1061 hit(1), cp3010 miss(0), cp3008 frontend hit(9) [09:23:12] curl: (18) transfer closed with 6304 bytes remaining to read [09:24:08] so yeah, certainly not solved :( [09:41:42] cached with Age: 995 BTW [09:41:45] * ema is confused [09:42:39] flushing cp3008's frontend I get a proper response again [10:26:03] there shouldn't be any HFPs for these test URLs right now, though [10:29:28] another thing to look at: this probably doesn't affect all backends/URLs [10:29:39] we have examples in wdqs, downloads, stats [10:29:57] is there a common factor in them? [10:30:37] re: Age, keep in mind that doesn't indicate how long it's been in the FE (or anywhere) [10:31:05] that's just the Age since that response was first generated at some origin (some backends may even send our backend-most cache non-zero Age) [10:35:47] wdqs uses nginx, download uses apache [10:35:59] so it's probably not some common subtle bug in origin output [10:36:27] so for a while I thought there was a difference between T134989 and T135038 given that the latter seemed to be fixed by https://github.com/varnishcache/varnish-cache/commit/e142a199c53dd9331001cb29678602e726a35690 [10:36:27] T135038: Inconsistently unable to download https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz (returns zero-byte response) - https://phabricator.wikimedia.org/T135038 [10:36:28] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989 [10:36:37] s/download/releases/ [10:36:52] sorry, I meant the releases one seemed to be fixed [10:37:06] however now I can reproduce both [10:37:19] the trick is triggering misses to find repros faster [10:37:39] I donno if hit->miss at different layers is an important part too, though [10:37:43] triggering full-miss is easy [10:38:46] but if you get an FE cache hit with valid output, and it's a default 120s cache object, you're probably not going to get a reproduction there for 120s [10:38:57] right [10:38:57] which can be a lot of test requests and make you think it's 100x ok and all's good :) [10:39:27] have we ever had a reproduction with miss,miss,miss? 
[10:40:21] (or pass,pass,pass for that matter, using a pass-only backend) [10:40:35] I don't think I've had one yet, no [10:40:38] https://config-master.wikimedia.org/ is explicit return-pass, so it always has 3x pass [10:40:58] oh ok, I was adding query params to get misses :) [10:41:40] well adding query params will get you 3x miss, but doesn't help much with finding a hit-miss-miss [10:42:03] yep I was doing that to trying a repro with 3x miss [10:44:09] for x in {1..1000}; do curl -s "https://query.wikidata.org/style.css?asdf=${x}" >x; ls -l x; done [10:44:18] (but pick your own unique 'asdf') [10:44:37] I don't think I've ever gotten a miss/miss/miss or pass/pass/pass repro, though [10:45:04] so it's likely only triggering on a miss/pass that's fetching a cache hit from beneath [10:46:13] in the style.css case, we know the origin just sends etag, but no expires/CC headers, which means it uses the default 120s TTL [10:46:45] for style.css I get: [10:46:46] X-Cache: cp1058 miss(0), cp3010 miss(0), cp3008 frontend hit(164) [10:47:01] why 164 then? Shouldn't it expire? [10:47:02] to help debug on that URL, could hack in some frontend-only VCL (for that req.http.host / req.url) that caps ttl down to like 30s [10:47:16] not if it's been hit 164 times in under 120s [10:47:31] ah, I forgot to sleep in the loop :) [10:47:46] if backends do it for 120 and frontends only for 30, we'll be able to get more hit-miss-miss reprods [10:47:47] and I got a repro on pass pass miss [10:47:57] what did pass/pass/miss? [10:47:59] < X-Cache: cp1045 miss(0), cp3010 pass(0), cp3008 frontend pass(0) [10:48:04] without Content-Length [10:48:08] what URLs? [10:48:13] https://query.wikidata.org/style.css?xxxxx [10:48:28] why would it ever be pass at all? this is what had me questioning X-Cache the other day [10:48:56] it clearly really was a pass, though [10:49:17] right after that one I got another repro with: [10:49:18] < X-Cache: cp1058 hit(1), cp3007 miss(0), cp3008 frontend pass(0) [10:49:23] because the BE->BE jump there is 3010->1058 on miss, but 3010->1045 on pass (pass is random) [10:49:57] yeah once you hit pass, pass picks random backends, much more likely to find intermediate miss/pass, more likely to repro, etc [10:50:05] the question is, what is causing those to pass at all? [10:53:12] so at a given layer (let's say cp3010 in the middle), we're pretty sure it did a hit at some point in the past (for the miss(0)->cp3008 hit) [10:53:35] then when cp3008 finally expired, it ask 3010 again, and 3010 did a pass to 1058 when we'd expect a miss... [10:53:49] there may have been a miss we didn't record first, but that miss generated a hit-for-pass [10:54:34] same story for cp3008 as a frontend: after the obect expires I always get a pass [10:54:38] the only explicit hit_for_pass we have is in text/upload -specific VCL. and with misc's removal of CL-logic, we don't have hfp/beresp.uncacheable in that code either, and not in shared [10:54:58] the pass will go away in 120s, should get another chance to become a real hit instead of an hfp [10:55:27] something in shared VCL must be doing it somehow, or in v4 builtin VCL? [10:56:08] perhaps! But why do we sometimes get hits and sometimes passes? 
[10:57:34] got a repro with 3x miss [10:57:47] < X-Cache: cp1045 miss(0), cp3008 miss(0), cp3008 frontend miss(0) [10:57:50] and no CL [10:58:06] the request immediately afterwards was good though [10:58:09] < Content-Length: 6304 [10:58:10] < X-Cache: cp1045 miss(0), cp3008 miss(0), cp3008 frontend hit(1) [10:58:14] for https://query.wikidata.org/style.css?xxxx-ema [10:58:39] I'm assuming no-CL == repro here [10:58:55] not necessarily at all [10:59:15] the response can be chunked, too. can only tell by whether final output is zero-length or 6304 [11:00:07] anyways, back on the pass-mystery: I just varnishlogged on cp3010-backend and cp3008-frontend while putting style.css reqs through [11:00:56] where cp3010 indicated pass on the way through... [11:01:25] - Debug "VSLP picked preferred backend 3 for key cf143d05" [11:01:28] - VCL_return hash [11:01:30] - VCL_call HASH [11:01:31] so the pass is due to hit-for-pass [11:01:34] - VCL_return lookup [11:01:36] - Debug "XXXX HIT-FOR-PASS" [11:02:19] and I've just caught 2x reqs going through cp3010 serially that were miss then pass [11:03:16] first one: [11:03:16] - VCL_return lookup [11:03:17] - VCL_call MISS [11:03:17] - ReqHeader X-CDIS: miss [11:03:17] - VCL_return fetch [11:03:19] - Link bereq 19859904 fetch [11:03:29] next one: [11:03:30] - VCL_call HASH [11:03:30] - VCL_return lookup [11:03:31] - Debug "XXXX HIT-FOR-PASS" [11:03:31] - HitPass 19859904 [11:03:33] - VCL_call PASS [11:03:35] - ReqHeader X-CDIS: pass [11:03:38] - VCL_return fetch [11:03:40] - Link bereq 9458032 pass [11:03:45] I assume HitPass 19859904 == Link bereq 19859904 fetch [11:03:55] which means the fetch in the miss above created the HFP object used in the next [11:05:02] the fetch from the eqiad backend (the fetch on miss, which probably created hfp), had: [11:05:05] - RespHeader Age: 226 [11:05:16] which would be a negative-TTL object if we're still dealing with 120s [11:05:33] right [11:05:40] I didn't catch the backend-side, I wonder if it tried to do an ETag/If-None-Match to get that? 
[11:06:15] (as in, I wonder if 3010->cp10xx was a 304-type response for the "hit") [11:06:25] on that note: https://varnish-cache.org/trac/ticket/1858 [11:06:28] even then, hmmm [11:06:33] which should be fixed already though [11:06:53] yeah there's another related thing though that's not fixed already [11:07:36] https://github.com/varnishcache/varnish-cache/commit/d828a042b3fc2c2b4f1fea83021f0d5508649e50 [11:07:43] is in 4.1 branch but no release yet [11:08:12] but regardless of that, I think the IMS issues are probably minor [11:08:32] the real problem here is in appropriate handling of negative TTLs in our VCL code, which worked before and doesn't work now [11:09:19] probably because we don't use the builtin vcl_foo tails, and while that's well-analyzed in the text/upload case, it's probably full of holes (where we don't adequately replace the builtins) in misc/maps [11:10:36] another related VCL nit: shared VCL in the wikimedia-common says this in vcl_backend_response: [11:10:39] /* Don't cache private, no-cache, no-store objects */ [11:10:42] if (beresp.http.Cache-Control ~ "(private|no-cache|no-store)") { [11:10:45] set beresp.ttl = 0s; [11:10:47] /* This should be translated into hit_for_pass later */ [11:10:50] } [11:11:08] only text/upload have the custom VCL to translate that into hit_for_pass later [11:11:21] found something perhaps interesting: [11:11:21] < X-Cache: cp1058 hit(4), cp3008 pass(0), cp3008 frontend pass(0) [11:11:21] SIZE: 6304 [11:11:30] < X-Cache: cp1058 hit(5), cp3008 miss(0), cp3008 frontend miss(0) [11:11:30] SIZE: 0 [11:11:45] the second request also had CL set, the first one didn't [11:11:46] I guess that explains why I can't repro on pass/pass/pass :) [11:12:16] how is it possible that a hit on cp1058 is returned with the right size first and the wrong one shortly later? [11:12:37] well we don't know that for sure, at the cp1058 level [11:12:56] probably cp1058 has the same output both times, but cp3008 handles it differently on miss-vs-pass [11:13:17] or, between those two requests the age passed into negative territory [11:13:45] I think we're really missing (anywhere - text/upload have it explicit, builtin VCL probably has it but we skip builtin, but misc has it nowhere...) [11:13:57] the standard if (ttl <= 0s) { hfp } [11:14:14] oh, which is in default vcl [11:14:29] ? 
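For reference, the standard "if (ttl <= 0s) { hfp }" being referred to lives in Varnish 4's builtin VCL rather than in the wikimedia-common code; paraphrased from the 4.x builtin (treat the exact condition list as approximate), it looks roughly like the sketch below. This is how a backend fetch that arrives with its TTL already gone can quietly become a 120s hit-for-pass object — if the builtin tail actually runs here, which is exactly the open question:

    sub vcl_backend_response {
        if (beresp.ttl <= 0s ||
            beresp.http.Set-Cookie ||
            beresp.http.Surrogate-Control ~ "no-store" ||
            (!beresp.http.Surrogate-Control &&
             beresp.http.Cache-Control ~ "no-cache|no-store|private") ||
            beresp.http.Vary == "*") {
            /* Mark as "hit-for-pass" for the next 2 minutes */
            set beresp.ttl = 120s;
            set beresp.uncacheable = true;
        }
        return (deliver);
    }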
[11:14:50] or conversely, we do have it from builtin perhaps, and we need to answer the question of why an Age:226 object made it out of a cache with 120s TTLs [11:15:24] hmmmm [11:15:39] ok so, this object has no expiry info we have to remember (no CC, no Expires) [11:15:47] it only has an etag to match [11:16:02] default TTL applies, which sets 120s object life in a single varnishd [11:16:30] I guess it's still psosible to get values up to 360s at the frontend [11:16:48] if you fetch a nearly expired object at each layet and give it a fresh 120s life in this layer [11:17:07] but also, the TTL is coming from default, not from Age:, so there should be no negative-TTL issues [11:19:04] ok wait, new candidate VCL bug idea [11:20:33] our cluster-shared vcl defines vcl_hit (for both layers) [11:21:17] all we really do there is set X-CDIS for X-Cache and call the cluster-specific hit function, which is empty on misc [11:21:33] we do explicit return in our cluster-shared vcl_hit though, bypassing builting [11:21:36] yes [11:21:49] the builtin for varnish4 is not simple [11:21:51] and we say that default VCL is just "return (deliver)" anyways [11:21:52] # sub vcl_hit { [11:21:52] # if (obj.ttl >= 0s) { [11:21:52] # // A pure unadultered hit, deliver it [11:21:52] # return (deliver); [11:21:52] # } [11:21:55] # if (obj.ttl + obj.grace > 0s) { [11:21:57] # // Object is in grace, deliver it [11:22:00] # // Automatically triggers a background fetch [11:22:02] # return (deliver); [11:22:05] # } [11:22:05] yes [11:22:07] # // fetch & deliver once we get the result [11:22:10] # return (miss); [11:22:12] # } [11:22:15] apparently VCL in varnish4 is what handles hits on expired objects [11:22:27] I don't think we should return! [11:22:48] probably [11:22:57] needs some thinking to be sure if we fix it in shared VCL though [11:23:54] the default vcl_hit's grace logic is simplistic [11:24:05] we probably want a better version anyways, basic bugs aside [11:24:26] our grace logic even in varnish3 was awful/missing anyways [11:24:56] https://info.varnish-software.com/blog/grace-varnish-4-stale-while-revalidate-semantics-varnish [11:25:02] mmh do we ever return(miss) then? [11:25:54] not from vcl_hit [11:26:09] I think the v4 flowchart can reach vcl_miss without going through hit though [11:26:33] the path through hit->miss is just for when the cache has an expired object that hasn't been completed erased from lookup indices/storage whatever [11:28:27] back on the general topic of grace: [11:29:01] we probably won't bother with the health-check logic, as for most backends we don't have health anyways [11:29:11] (well, we do for inter-cache, but not for applications usually) [11:30:31] all of that topic around TTLs is confusing when you consider that TTL != time-to-HTTP-object-expiry [11:33:58] I'm assuming a backend with no healthcheck is "healthy" in the vmod_std healthy() sense [11:34:26] anyways, that's all long-term problems [11:34:52] for now just to help debug on misc, we could just add to its cluster-specific vcl_hit: [11:34:56] should we try if the default vcl_hit changes anything on misc? 
[11:35:03] if (obj.ttl < 0s) { return (miss); } [11:35:14] it would do the same basic thing [11:35:16] it sounds good [11:35:26] I think it will clear up confusion about some things in our testing [11:35:34] I'm not sure it's really going to affect CL:0 [11:36:10] grace doesn't matter much for this scenario, for misc, for now [11:36:18] alright [11:36:20] but serving expired objects as hits might be very confusing [11:36:51] food is needed here, bbiab [11:37:17] ok [11:44:30] https://gerrit.wikimedia.org/r/288372 is salt-puppeting [11:45:06] oh while I'm at it, I'll merge the X-Cache fixup too heh [11:53:00] ok so with those patches in place, I've got an interesting result [11:53:13] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2288816 (10Jonas) Now I get content-length 0 for query.wikidata.org ``` jonkr@C134:~$ curl -v 'https://query.wikidata.org/'... [11:53:54] req #1: directly to cp3010 backend: [11:53:55] bblack@palladium:~$ curl -sv -H 'Host: query.wikidata.org' -H 'X-Forwarded-Proto: https' http://cp3010.esams.wmnet:3128/style.css >x [11:54:33] < Age: 9 [11:54:33] < Via: 1.1 varnish-v4 [11:54:33] < X-Cache: cp1045 hit+miss(0), cp3010 hit(1) [11:54:33] < Accept-Ranges: bytes [11:54:33] < Content-Length: 6304 [11:54:36] < Connection: keep-alive [11:54:38] < [11:54:41] { [data not shown] [11:54:43] * transfer closed with 6304 bytes remaining to read [11:55:14] I wasn't there for the initial miss, but either way this was a hit in cp3010 backend, and it gave CL:6304, but never sent the damn bytes and timed out [11:55:52] req #2: same basic query + result: [11:55:53] < Age: 35 [11:55:53] < Via: 1.1 varnish-v4 [11:55:53] < X-Cache: cp1045 hit+miss(0), cp3010 hit(2) [11:56:01] slightly older hit object, same lack of bytes [11:56:13] req #3: try cp3008: [11:56:14] bblack@palladium:~$ curl -sv -H 'Host: query.wikidata.org' -H 'X-Forwarded-Proto: https' http://cp3008.esams.wmnet/style.css >x [11:56:36] < X-Cache: cp1045 hit+miss(0), cp3010 hit(3), cp3008 frontend miss(0) [11:56:48] < Content-Length: 6304 [11:56:49] < Connection: keep-alive [11:56:57] and it did transfer all 6304 bytes correctly [11:57:25] req #4: back to cp3010-backend directly to re-verify: [11:57:25] < X-Cache: cp1045 hit+miss(0), cp3010 hit(4) [11:57:26] < Accept-Ranges: bytes [11:57:26] < Content-Length: 6304 [11:57:36] and still: [11:57:36] * transfer closed with 6304 bytes remaining to read [11:58:09] so apparently when cp3008 queries cp3010, it can get the object and pass it successfully to curl [11:58:18] but when curl hits cp3010 directly, the transfer fails [11:58:20] wtf? [11:59:19] direct fetch to cp1061 or cp1045 :3128 does work [12:08:13] on those ones with "transfer closed with 6304 bytes remaining to read", it's definitely varnish not curl that closes, 5 seconds later [12:08:18] I've watched it in a packet capture [12:09:39] bblack: I really like that you take the time to write your thoughts on IRC. It let us mere mortals learn a lot in the process! 
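Concretely, the debugging change that went out above amounts to something like the following in the misc cluster's hit handling (the sub name and surrounding structure here are illustrative, not the actual puppet template):

    sub cluster_misc_hit {
        // never serve an object whose TTL has already expired as a hit;
        // fall through to a real miss/fetch instead (grace is deliberately
        // ignored for now while debugging)
        if (obj.ttl < 0s) {
            return (miss);
        }
    }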
[12:10:20] again: direct to cp3010:3128, I get: < X-Cache: cp1061 hit+miss(0), cp3010 hit+miss(0) + failed to transfer w/ 5sec timeout [12:10:54] (also, that failed transfer had < Age: 0) [12:11:09] (which makes sense, it was a miss/miss) [12:11:31] but shortly after, I hit cp3008 frontend and got: [12:11:31] < Age: 116 [12:11:33] < X-Cache: cp1061 hit+miss(0), cp3010 hit(2), cp3008 frontend hit+miss(0) [12:11:37] and successful transfer [12:12:04] so analyzing just that pair of test requests: [12:12:44] the first one is a full miss, no valid cache contents in the way. cp1061 and cp3010 both pulled in the content and created cache objects, but cp3010 failed to deliver it outbound [12:13:03] the second one is a miss in 3008 that hits in 3010, and does get the content out of it and delivers it to me [12:13:50] if I go back to 3010:3128 right after, once again it still stalls [12:14:02] so apparently 3008 can fetch the object from 3010, but curl cannot [12:15:04] another interesting point: if I request from cp3010:3128 with --compressed, I get < Content-Length: 0 and no delay and an empty file [12:15:29] whereas without --compressed, I get CL:6304 and 5 seconds of radio silence with no bytes output, then varnish closes connection [12:19:04] either way it's bad, but I guess this explains how we get two different kinds of failed outputs [12:20:16] tracing a 3010:3128 fetch which results in failed transfer + < X-Cache: cp1061 hit+miss(0), cp3010 hit+miss(0) [12:20:52] varnishlog on 3010 shows the expected hit->miss->fetch cycle, and then fetches from the backend with AE:gzip, and backend gzips for us [12:21:10] - RespHeader Content-Encoding: gzip [12:21:18] - RespHeader Via: 1.1 varnish-v4 [12:21:18] - RespHeader X-Cache: cp1061 hit+miss(0) [12:21:19] - RespHeader Content-Length: 1829 [12:21:24] - RespHeader Age: 0 [12:21:51] those are the initial respheader in the 3010 client-facing varnishlog, then: [12:21:54] - VCL_call DELIVER [12:22:08] - RespUnset X-Cache: cp1061 hit+miss(0) [12:22:08] - RespHeader X-Cache: cp1061 hit+miss(0), cp3010 hit+miss(0) [12:22:08] - VCL_return deliver [12:22:08] - Timestamp Process: 1463055557.951748 0.084348 0.000031 [12:22:08] - RespUnset Content-Encoding: gzip [12:22:11] - RespHeader Accept-Ranges: bytes [12:22:13] - RespUnset Content-Length: 1829 [12:22:16] - RespHeader Content-Length: 6304 [12:22:18] - Debug "RES_MODE 42" [12:22:21] - RespHeader Connection: keep-alive [12:22:23] - Gzip U D - 0 0 0 0 0 [12:22:26] - Timestamp Resp: 1463055557.951792 0.084392 0.000044 [12:22:28] - ReqAcct 205 0 205 427 0 427 [12:22:57] so.... 
it found an 1829 byte gzipped object (ok), it gunzipped it for the client at deliver-time (ok), rewrite content-length to 6304 (ok), the the Gzip line says gzip generated zero bytes of output (wtf) [12:24:02] if I do the same right after with --compressed: [12:24:04] - VCL_call DELIVER [12:24:04] - RespUnset X-Cache: cp1061 hit(1) [12:24:04] - RespHeader X-Cache: cp1061 hit(1), cp3010 hit+miss(0) [12:24:04] - VCL_return deliver [12:24:06] - Timestamp Process: 1463055807.020175 0.083673 0.000057 [12:24:09] - RespHeader Accept-Ranges: bytes [12:24:11] - RespUnset Content-Length: 1829 [12:24:13] - RespHeader Content-Length: 0 [12:24:16] - Debug "RES_MODE 2" [12:24:18] - RespHeader Connection: keep-alive [12:24:21] - Timestamp Resp: 1463055807.020196 0.083693 0.000021 [12:24:23] - ReqAcct 237 0 237 454 0 454 [12:24:26] there's no gzip transform at deliver time, yet we see it rewrite content-length to zero [12:25:59] trying again through cp3008 frontend while still tracing cp3010:3128 [12:26:18] it's a sucess again, but the 3008->3010 req gave a 304 not modified, it didn't have to transfer any bytes anyways. [12:27:29] because cp3008 had an expired object laying around, so it asked: [12:27:29] - ReqHeader If-Modified-Since: Sat, 07 May 2016 20:29:34 GMT [12:27:30] - ReqHeader If-None-Match: W/"572e502e-18a0" [12:29:38] ok so that explains why hit+miss tends to work better than a pure miss [12:29:57] and how it's working around 3010's lack of output. it doesn't need output, it gets a 304 [12:30:55] so I wiped cp3008's frontend (varnish-frontend restart), fetch from 3008 again while tracing 3010:3128's connection to its own backend [12:31:17] curl fails with CL:0, and it's < X-Cache: cp1061 hit+miss(0), cp3010 hit+miss(0), cp3008 frontend miss(0) [12:31:38] 3010's request to cp1061 had IMS/INM and got a 304 response [12:33:45] sooooo.... [12:34:35] I can't explain the gzip connection yet, there may not really be one other than gzip and ungzipped outputs are going to look different but both be broken [12:34:49] but it seems a lot like 304's are the problem here [12:36:50] client (no IMS/INM) asks varnishd for x -> varnishd asks other varnishd for x using IMS/INM headers based on (possibly expired) object -> gets 304 response which naturally has no content-length, which means it can use its cached content -> returns zero bytes to client (either because the object the storage should come from is expire, or more likely just passing through the zero-byte-ness of the 3 [12:36:56] 04 response) [12:41:35] this from 4.1 is possibly-related: https://github.com/varnishcache/varnish-cache/commit/d828a042b3fc2c2b4f1fea83021f0d5508649e50 [12:41:46] (from 4.1 branch I mean, after our version) [12:41:52] ok [12:42:11] perhaps the affected headers include Content-Length somehow [12:42:28] so the object gets created properly but varnish returns a 0-length response [12:42:38] well, not quite [12:42:50] what I'm seeing most-recently is more like this: [12:43:50] when varnish has an expired object in storage and has to fetch from backend (another varnish), it sends IMS/INM headers based on its expired object (ok), the other varnish replies with 304 Not Modified (ok), and then the requesting varnish replies to the client with 0-length 200 OK response [12:44:05] I don't think it's creating an object, I think it's re-using the expired one on 304 [12:44:26] which is fine if it will send the contents too :P [12:45:03] does this only happen between varnishes or also with backends? 
[12:45:09] probably the 0-length of the 304 from upstream is causing the 0-length when returning 200 OK to the client [12:45:29] I think only between varnishes, because when backend-most in eqiad also has an expired object, it still gives me full output [12:45:44] but maybe that's because the applayer backend doesn't do 304s in this case? I haven't checked. [12:47:08] somehow then it gets confused and returns a 0-length response because the 304 is 0-length, instead of returning the cached item [12:47:17] ? [12:47:30] yeah, I think [12:47:37] or something related is happening anyways [12:47:43] which gets back to looking at 304/IMS-ish patches [12:47:58] the expire object it's trying to reuse has a stored length [12:48:05] the 304 has a different (zero) length [12:48:08] https://varnish-cache.org/trac/ticket/1858 and https://github.com/varnishcache/varnish-cache/commit/d828a042b3fc2c2b4f1fea83021f0d5508649e50 [12:48:21] the patch linked about is about duplicate headers on 304/IMS stuff... [12:48:25] and how they're merged [12:48:34] yes, that one :) [12:49:35] I'm still trying to confirm varnish<->varnish vs varnish<->app heh [12:52:41] confirmed: for style.css test queries, even when handed a valid IMS/INM, applayer nginx still sends 200 OK [12:52:59] so it's varnish<->varnish [12:53:05] even though the ETag on the fresh 200 matches the ETag asked for in INM [12:53:17] well it's varnish<->varnish because varnish is supported 304 on INM correctly [12:53:30] nginx is ignoring its opportunity to respond with 304 and giving a pointless full 200 [12:53:49] which is not good but might be helpful to pinpoint the issue :) [12:53:58] yeah [12:54:07] and explains why I never see bad output from the backend-most varnish [12:54:17] only from the intermediate one and/or frontend [12:54:47] I wonder if there's something dumb in our shared VCL that can affect the 304 case, too [12:55:49] there is a beresp.was_304 VCL variable which could perhaps be useful for logging/debugging if needs be [12:55:52] https://github.com/varnishcache/varnish-cache/commit/c17c701b6ecf89a331d1c495991f6e2997d31a12 [12:56:00] FTR: our shared VCL does unconditionally: set beresp.grace = 60m; [12:56:27] TTL < 0 objects would never get purged before grace (and I assume they're only purged when convenient/necessary after grace) [12:56:40] so that's why expired objects keep hitting vcl_hit for an hour afterwards [12:58:07] there's nothing obviously-stupid in our shared VCL for 304s in varnish4, i don't think [12:58:19] I say try the patch and see what happens [12:59:28] Proper handling of duplicate headers on IMS headers merge? [13:01:06] yeah [13:01:31] the commitmsg on that patch seems to have logical consistency issues, I'm not even sure which direction they fixed things, but "sounds related" [13:01:44] " [13:01:45] This fixes a problem when a backend replies with 304 Not Modified [13:01:46] (after a http conditional request from varnish) and does not supply a [13:01:46] header that was duplicate in the cached object. [13:01:46] Before this patch, varnish would only supply (by copying from the [13:01:48] expired object) the first instance of a duplicate header. [13:01:50] " [13:02:30] it says it fixed a problem that happens when the 304 does not supply a copy of a header that was in the object. Then the next sentence says that before this patch, varnish would only supply the the object's version of a duplicated header in the 304... 
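The beresp.was_304 idea mentioned a little earlier could be as small as the following, assuming the running 4.1 build actually exposes the variable (the linked commit adds it) and that a response header is an acceptable place to surface it:

    sub vcl_backend_response {
        // tag responses that were refreshed via a backend 304, so truncated
        // client responses can be correlated with conditional fetches
        if (beresp.was_304) {
            set beresp.http.X-Was-304 = "true";
        }
    }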
[13:03:18] but either way, I could totally see this impacting obj.http.Content-Length: 6304 vs beresp.http.Contenth-Length: 0 (or missing) [13:03:43] (beresp being the 304) [13:04:54] switching gears a bit back to general object TTL issues. So varnish4, in addition to the grace timer, also has a keep timer [13:05:15] the keep timer is used for attempting 304s [13:05:37] so you can do things like obj.ttl = 300, obj.grace = 600, obj.keep = 9999999999 [13:06:23] when the object's inside normal TTL, it can be served without any check. within grace, you can serve it stale while busy refreshing it (or failing to refresh it due to dead backend). [13:06:59] and while within keep, it can't normally be served, but it can still be used for an IMS/INM check against a backend, hoping for a 304 to refresh its TTLs instead of a full 200. [13:07:40] you'd think on the surface of things, it would always be good to set keep extremely-large even in the fact of shorter TTL/grace policies. [13:08:12] but that assumes your applayer backends do ETag and IMS stuff sanely. a long keep time means we can't recover from applayer sending duplicate ETag for different content, etc.... [13:08:23] (which we could totally do with a code bug in PHP or whatever) [13:08:48] or applayer updating content but not updating the timestamp for IMS [13:10:39] my initial thoughts on grace (wrt https://phabricator.wikimedia.org/T124954 + Varnish 4) would be to go for something like: cap obj.ttl at 1d, set obj.grace at 7 or 14 days to cover operational scenarios. [13:12:02] but even when backends never fail, DCs are never offline, etc.... you can still have a very cold URL that usually only gets hit once every 3 days. And it would get served 3-days-stale all the time due to grace, even though you're thinking 1d is the TTL outside of exceptional circumstances. [13:12:31] so without some concept of healthcheck, there's no good answer to that problem [13:13:40] (and we don't probe applayer LVS backends for health) [13:14:22] (because doing so is problematic: there's only one host, but LVS round-robins a large count of machines. we don't want it to fail the whole backend because of transient issues on one or even a few) [13:15:52] with a healthcheck we can do logic like https://info.varnish-software.com/blog/grace-varnish-4-stale-while-revalidate-semantics-varnish and use a short grace when healthy and a long one when unhealthy [13:18:37] Surrogate-Control and Cache-Control both are capable specifying grace-time too [13:18:54] (but not keep time) [13:19:50] bblack: let's see https://gerrit.wikimedia.org/r/#/c/288387/ [13:21:59] building on copper [13:22:24] I think for us, we probably want exceptional-case-grace in operations control rather than app-header control [13:22:48] probably ideally we do something like this: [13:23:12] 1) app response has CC/expires, which sets obj.ttl through normal varnish cache logic [13:23:56] 2) in vcl_backend_response, we actually cut obj.ttl in half, and set obj.grace = 7d [13:25:24] 3) on vcl_hit, we do stale-while-revalidate if we're still within the doubled TTL (the original applayer TTL), and for unhealthy backends we use the full grace [13:25:42] I guess we'd have to save the initial obj.ttl setting in an object header somewhere, to use as a dynamic short-grace, too [13:26:48] basically we'd rather use the whole second half of the applayer TTL to cause a refresh-on-stale-fetch, not just a 10s window. should cut down a lot on users actually pausing to wait on a missfetch. 
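A rough sketch of that scheme, following the shape of the varnish-software grace post linked earlier — the fixed 120s stale window below stands in for "the second half of the applayer TTL" (the TTL-halving arithmetic and where to stash the original TTL are left out), and std.healthy() is only meaningful where a probe actually exists:

    import std;

    sub vcl_backend_response {
        // 2) long grace to cover operational emergencies; the "cut obj.ttl
        //    in half and remember the original" step is elided here
        set beresp.grace = 7d;
    }

    sub vcl_hit {
        if (obj.ttl >= 0s) {
            // still within the (shortened) TTL: a plain hit
            return (deliver);
        }
        if (std.healthy(req.backend_hint)) {
            // 3) healthy backend: serve stale only within a short window
            //    past expiry, which also kicks off a background refresh
            if (obj.ttl + 120s > 0s) {
                return (deliver);
            }
            return (miss);
        }
        // unhealthy backend: fall back to the full grace window
        if (obj.ttl + obj.grace > 0s) {
            return (deliver);
        }
        return (miss);
    }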
[13:27:43] and we're still honoring applayer TTL regardless of grace. we don't have to tell them "oh we set grace to 5 minutes always, therefore your advertised cache TTL is actually +5m in practice) [13:28:02] the app-advertising TTL is always obeyed as a max limit, unless backends are down [13:29:56] come on copper! Slow fella [13:32:11] OK, 4.1.2-1wm4 available [13:32:40] bblack: should we upgrade all misc or do you want to focus on certain hosts/DCs? [13:32:56] I'd just upgrade them all [13:33:01] and wipe caches again [13:33:05] and then we can start testing again [13:33:09] alright! [13:44:07] upgrade+puppet done, I'm gonna restart the backends with -b 1 [13:48:38] eqiad done, now codfw [13:50:39] what is wrong with salt, also superslow [13:51:21] codfw done, on ulsfo+esams now [13:53:57] restarting the frontends with -b 2 [13:55:06] bblack: done [13:57:11] hey curious, yall are currently messing with misc, right? [13:57:27] yep [13:57:52] have been seeing fairly consistent loss on misc webrequest logs for the past day, not every hour, but fairly frequent [13:58:01] small loss being about 1% or less [13:58:37] also on maps, are you doing things with maps? [13:58:53] ottomata: we're not, but both misc and maps run varnish 4 [13:58:59] hm [14:00:02] hm, how long have they been on varnish4? [14:01:32] just this week, since sometime tuesday, it wasn't all at once [14:02:04] misc, maps has been v4 for a while now [14:02:11] ema hm, how long for maps? [14:02:16] but it was only two hosts at first [14:02:22] (am looking back farther to correlate [14:02:22] oh [14:02:23] hm [14:02:25] which 2 hosts? [14:02:32] don't pay attention to anything going wrong with misc this week, there's too many restarts and abusive debugging going on, etc [14:02:32] cp1043 and cp1044 IIRC [14:02:37] hehe, bblack, ok [14:02:50] ema: yep those were the ones [14:02:53] yeah maps has been on varnish4 for a long time [14:03:11] May 12 15:59:43 < Content-Length: 6304 [14:03:11] May 12 15:59:43 < X-Cache: cp1045 miss(0), cp3008 miss(0), cp3008 frontend hit(142) [14:03:11] cp1043 and 1044 were the only maps caches, until a couple of weeks ago, when they were replaced by 16 others [14:03:14] May 12 15:59:43 SIZE: 6304 [14:03:16] May 12 15:59:43 --- [14:03:19] May 12 15:59:44 < Content-Length: 6304 [14:03:21] May 12 15:59:44 < X-Cache: cp1045 hit+miss(0), cp3008 hit+miss(0), cp3008 frontend hit+miss(0) [14:03:24] May 12 15:59:44 SIZE: 6304 [14:03:27] May 12 15:59:44 --- [14:03:29] May 12 15:59:45 < Content-Length: 6304 [14:03:32] May 12 15:59:45 < X-Cache: cp1045 hit+miss(0), cp3008 hit+miss(0), cp3008 frontend hit(1) [14:03:35] May 12 15:59:45 SIZE: 6304 [14:03:37] can't repro so far [14:03:45] that looks promising [14:04:19] hmm, ok, i'm not going to delve into this for now, since you are moving lots of stuff around, but we should keep our ey on this [14:04:30] in my high level look at this right now [14:04:42] ottomata: we upgraded maps to v4 on April the 4th https://phabricator.wikimedia.org/T122880#2177437 (the two lonely systems) [14:04:49] i'm looking at all of may for percent loss [14:04:51] hourly [14:05:04] maps has the same pattern that misc does, but misc doesn't start til tuesday [14:05:31] ottomata: what's "percent loss" in this context anyways? [14:05:42] its calculated using the vk sequence number [14:05:43] so [14:05:55] max_seq - min_seq should == count(*) group by host [14:05:56] in an hour [14:06:00] so what you're saying is there used to be no gaps in vk seqnos, and now there is? 
[14:06:16] that may not be loss at all, may just be a change in behavior. I really don't know. [14:06:19] gehel: could you try to reproduce the issue again please? [14:06:20] possible [14:07:00] ottomata: how can I check the data that you are comparing? [14:07:32] elukey it sin hive [14:07:34] two tables [14:07:36] wmf_raw db [14:07:43] webrequest_sequence_stats [14:07:43] and [14:07:45] webrequest_sequence_stats_hourly [14:07:51] ahhh okok makes sense [14:07:52] webrequest_sequence_stats has it per host [14:08:02] hourly is aggregated across all hosts [14:08:40] I'm running some specific tests on the 304 issue too [14:10:56] ema: this works, it fixed the 304 thing [14:11:10] it looks like [14:11:22] I made the intermediate request from the backendmost, and backendmost returned 304, and intermediate re-used its object and did not screw up output length [14:11:29] \o/ [14:12:19] I'd say let's leave things exactly as they are for right now, and get more parties to re-verify that they can't reproduce it [14:12:30] sounds good [14:12:39] if this looks good, we can step back through generalizing the VCL solutions, trying to re-apply the CL-specific stuff, etc [14:14:14] the duplicate Age headers are also gone, it seems [14:15:23] on the CSS file at least [14:15:55] ema: elukey, looking a little more, i see few results for loss before april 4th, then between april 4th and the 26h, i see cp1043 and 1044 with what the looks like the same pattern of more frequent loss, then on april 26th other hosts start joining the pattern of loss [14:16:45] ema: test in progress... [14:17:24] ottomata: who joined the party on the 26th? [14:17:38] i see [14:18:54] cp1046.eqiad.wmnet, cp1047.eqiad.wmnet, cp1059.eqiad.wmnet, cp1060.eqiad.wmnet, cp3003.esams.wmnet, cp3004.esams.wmnet, cp3005.esams.wmnet, cp3006.esams.wmnet, cp2009.codfw.wmnet, cp4019.ulsfo.wmnet, cp4020.ulsfo.wmnet [14:19:06] that's hosts that showed loss on the 26th-29th [14:19:20] they also seem to be maps machines [14:19:27] oh i'm just looking at maps [14:19:28] right now [14:19:35] :) [14:19:43] since i was trying to correlate vk4 with loss [14:19:56] seeing if the loss matches up for when you upgraded maps machines [14:19:59] seems like it does :/ [14:20:31] ema: of course, it makes sense now. when we saw duplicate Age:, that was an indicator that the backend fetch in question was a 304... [14:20:44] and the IMS/304 header-merge fix fixed both that and CL issues [14:21:02] (want to note when I say 'loss' i mean reported loss, this could be some other weird issue) [14:21:34] what are vk's seqnos based on? [14:21:37] if vk sequences are off, this metric will be funky [14:21:53] will have to look at code, afaik its just an internal incrementing counter for each message it logs [14:22:20] yep this is what I remember [14:22:52] starts from 0 and the increments itself during the life of vk [14:23:15] ja, eatch time it calls render_match it increments sequence_number [14:23:24] so it should only happen for each message it sends [14:23:32] on phabricator we still get multiple Age headers [14:24:32] hmmm [14:24:58] ok, yeah [14:25:07] and earlier we had 3xAge on pass-only too, which can't be 304s [14:25:31] e.g. 
https://config-master.wikimedia.org/ has 3xAge: 0 on pass,pass,pass [14:25:51] I guess IMS header merge fix just incidentally merges those up in the 304 case [14:27:43] yes HTTP_Merge is called on if (http_IsStatus(bo->beresp, 304)) [14:28:49] I see varnish4 didn't bring about any change in varnish code conventions re: clarity :) [14:28:55] :) [14:29:02] so the issue was duplicate headers on 304 [14:29:47] yeah [14:29:50] well [14:30:07] if we're being really really pedantic: [14:30:38] 1) There was some 304resp vs staleobject duplicate/missing headers issue for someone, which that patch fixed by doing more-proper header merging in that case [14:31:09] 2) We had an issue where varnish<->varnish 304 on the backend fetched caused missing body content on the client response, in at least some cases [14:31:40] 3) The fix probably fixed it because there was a duplicate headers issue on the 304/object merge to do with content length and/or encoding [14:31:56] but it may just be hiding the issue (and maybe perfectly effectively) [14:32:06] that might not be the right real fix in the right place in the code [14:32:35] it's also possible this is an issue with varnish's header filter lists [14:33:06] in include/tbl/http_headers.h , there's a table like: [14:33:07] H("Cache-Control", H_Cache_Control, F ) // 2616 14.9 [14:33:07] H("Connection", H_Connection, P|F|I) // 2616 14.10 [14:33:07] H("Content-Encoding", H_Content_Encoding, 0 ) // 2616 14.11 [14:33:09] etc... [14:33:29] which filters which headers get preserved or filtered out in various scenarios: backend fetches, passes, responses, etc... [14:34:08] it could be that some header related to encoding/length is not correct in that table, or that code using that table isn't using it right, etc [14:34:33] and some encoding/length header was making it through a filter when it shouldn't, and then the patch made HTTP_Merge paper over the problem accidentally [14:35:08] ema, bblack: just one error on 100 GET: https://phabricator.wikimedia.org/P3054 [14:35:36] :( [14:35:47] hmmmm [14:35:55] Age: 120 is right on the TTL boundary too [14:35:56] yeah, even a single error is not really a good news [14:36:14] I'm assuming this was style.css? [14:36:23] yep [14:36:36] yes [14:37:00] it's a miss->200 rather than a hit+miss->304, and it hit an object right on the expiration boundary [14:37:43] I now have a log more errors. I might be hitting a cache of the bad one... [14:37:45] I'm pretty sure the current operational setup of grace in this case is ok too [14:37:49] gehel: probably [14:38:28] we should be getting grace=1h, and even if we checked grace in vcl_hit, we'd still be returning regular deliver for a while after initial ttl expiry, which is the same as we did just now for that 120s object [14:38:56] still, it has to be a hint that the first failure was on a fetch of a 120s object from the backend-most where ttl would run out at age=120 [14:39:48] it seems that I consistently have a failure now... [14:39:55] from the same cache hit? [14:40:25] yes, always cp3009, so make sense... [14:40:30] X-Cache reads right to left. the first one that says just "hit" (as opposed to e.g. hit+miss) is where the object came from [14:44:57] extracting just Age and X-Cache, seems the cache was refreshed, but still with no data [14:44:58] https://phabricator.wikimedia.org/P3055 [14:45:41] in that paste, which of those are fail/success? 
[14:45:48] only fails [14:46:27] so after your first failure, it kept failing up through the expiry near age=120, then refreshed everything and still refreshed it bad [14:46:38] as far as I can see, yes [14:47:16] cp3010 has an expired cache object now, and it's serving it with zero bytes [14:47:21] (confirmed with a direct fetch) [14:47:58] the underlying eqiad cache is still serving it correctly [14:49:20] even on hit+miss(0), backend-most gets it right [14:50:22] only cp3010 gets it wrong anymore, and that's because it has that expired stale object [14:50:40] it keeps doing 304 checks to backends and confirming them and sending out bad objects [14:51:21] I think this is a distinct sub-case and we still fixed it for some cases [14:51:51] the one bad response was cached into cp3010 at Age=120 in https://phabricator.wikimedia.org/P3054 [14:52:01] and now it's stuck there and getting re-used via 304 [15:12:48] I'm presently out of bright ideas [15:13:19] I think the 304/IMS patch did paper over some cases, but as said earlier probably isn't the "right" fix, and I guess doesn't even cover all cases. [15:14:00] it's interesting we havne't heard of this affecting maps [15:14:08] (which has been on multi-tier v4 longer) [15:14:20] you'd think people would at least occasionally complain of CL:0 tiles being served [15:14:48] I've looked at the varnish->app requests for that, and they do return ETag/LM, so 304s are applicable there too [15:15:52] maps VCL is extremely minimal. other than our cluster-shared VCL, all it does is set its applayer backend, and 403s on some referrers in frontend recv [15:16:48] misc VCL is similarly minimal on things that could affect all of this, though [15:24:33] https://phabricator.wikimedia.org/T135121 [15:24:40] ^ complaints about stats.wm.o [15:25:14] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2289372 (10BBlack) [15:27:22] merging into the main ticket [15:27:22] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2285020 (10BBlack) In the merged ticket above, it's browser access to status.wm.o, and the browser's getting a 304 Not Modifie... [15:28:04] ema: if one of us doesn't come up with a better bright idea soon, we'll have to revert v4 on cache_misc till we figure out some better plan [15:30:24] mmh [15:32:05] if you want to take a real longshot stab.... [15:32:36] rebuild with varnish upstream master-branch (post-4.1 feature-work). it looks like they refactored a lot of related things. they may have incidentally fixed some things too :) [15:33:22] it seems very strange to me that nobody but us would notice 4.1 breaks inter-cache 304. varnish trying for and serving 304s is common, and multi-layer varnish isn't all that uncommon either [15:33:58] that's what I was thinking as well, perhaps there is something funny in our VCL that we haven't identified yet? 
[15:34:00] and again, maps doesn't seem broken [15:34:48] I'm afraid packaging master might break more stuff than it fixes :) [15:34:51] but then again maps' req/resp scenarios are more-limited, and tbh nobody might notice minor tile breakage there [15:35:20] I'm sure it would, but if we end up reverting v4 for being broken anyways, may as well take a moonshot and see if it works better than 4.1 :) [15:35:54] that idea aside, we could try cutting out all unecessary fat from our shared VCL config [15:36:01] just for misc, I mean [15:36:22] I'm not sure what a good way to do that is, other than manually-ish [15:37:35] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2285020 (10hashar) [15:38:14] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2285020 (10hashar) {T135086} does not have much details beside the layout of 15.wikipedia.org being broken. [15:39:16] ema: here's another thing you could try for debugging: [15:40:06] on some other random machine, set up 2x varnishd with just malloc caches. give them minimal VCL to backend to wdqs1001.eqiad.wmnet for all requests and do basically nothing non-default. and pass test requests through 2x varnish to wdqs for style.css, and see if you can repro [15:40:21] could steal cp1008 for something that's already in the prod network [15:40:34] put the test instances on alternate ports [15:40:57] (so we don't spam wdqs with pinkunicorn reqs) [15:41:23] oh cp1008 doesn't have v4 packages though [15:41:33] better idea: [15:41:39] steal one of the eqiad misc caches [15:41:41] depool it [15:41:49] stop confd on it with puppet disabled [15:42:01] hack directors.vcl so its frontend uses its own backend only [15:42:07] hack VCL to remove all cruft [15:42:13] and pass test reqs through two layers there [15:42:15] I like it [15:42:55] I've quickly tried to build master and we would have to forward-port/remove some of the patches (at least 0003-varnishd-nukelru.patch, perhaps others as well) [15:43:17] I'm really not sure the nukelru patch is all that useful anyways [15:44:13] the others are persistent storage mostly, hopefully it didn't change much since it's deprecated [15:44:20] heh there's something else to try too: [15:44:51] does the bug go away if we s/persistent/file/ ? maybe varnish 4.x is starting to break assumptions with that deprecated storage engine's code and they're not maintaining it, and cache objects through it are now broken in subtle ways [15:45:09] oh shit [15:45:26] ? [15:46:05] * ema was thinking of the subtle issues with broken persistent storage [15:46:12] yeah [15:46:24] it would be difficult to make that change right now via puppet without some heavy refactoring [15:46:40] but manually with puppet disabled, probably not hard at all with some systemd unit edits [15:46:53] how about disabling puppet in, say, eqiad and esams and applying the change by hand? [15:47:15] could, sure [15:47:34] basically replace: [15:47:35] -s main1=deprecated_persistent,/srv/sda3/varnish.main1,360G,0x500000000000 -s main2=deprecated_persistent,/srv/sdb3/varnish.main2,360G,0x540000000000 \ [15:47:42] with -s main1=file,.... 
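A minimal sketch of the stripped-down two-instance test VCL suggested a few lines up — two local varnishd instances with malloc storage, the backend-layer one pointing straight at wdqs and the frontend-layer one pointing at it; hostnames and ports here are assumptions:

    vcl 4.0;

    # backend-layer test instance: fetch straight from the wdqs app server
    backend wdqs {
        .host = "wdqs1001.eqiad.wmnet";
        .port = "80";                     # assumed applayer port
    }

    # the frontend-layer test instance would instead use something like:
    #   backend local_be { .host = "127.0.0.1"; .port = "3128"; }

    sub vcl_recv {
        set req.backend_hint = wdqs;
        # deliberately nothing else: everything falls through to the builtin
        # VCL, so a repro here would rule our shared VCL out
    }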
[15:48:14] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2289479 (10Kghbln) >>! In T134989#2289372, @BBlack wrote: > due to missing character encoding supposedly, but it's entirely li... [15:48:17] let's try [15:48:41] also noted while doing a related google search [15:48:56] apparently after they deprecated the old open source persistent engine, they've added a new commercial one :P [15:48:59] https://www.varnish-software.com/plus/massive-storage-engine [15:49:11] the "massive" storage engine, which can do big things, and optionally persists [15:49:36] yep they were talking about it at the summit [15:50:34] more open-core bullshit, IMHO [15:50:49] open-source is a toy/demo product, real features for real sites must be paid for :P [15:52:13] they should do what gwicke said: AGPL it instead of closing it up. Let profiteers buy alternate licensing + support in a package. [15:52:34] except of course they don't use GPL-like licensing [15:53:45] anyways, I'm taking a break for a few, will check back in a bit and see what happens! [15:54:01] OK, I'll test s/deprecated_persistent/file/ [15:54:08] ok [15:54:31] if that doesn't help, try gutting large swaths of the shared VCL on a depooled backend->self machine in ieqad [15:54:34] *eqiad [16:19:14] gehel: time for me to bug you again! [16:19:35] just deploying one last patch... brb [16:19:53] we're now testing a different storage engine in eqiad+esams [16:20:11] sure, no hurries [16:22:16] ema: so, want me to start my test script again? [16:22:21] yes please [16:22:36] assuming you always go through esams, (cp3*) which should happen if you're in EU [16:22:48] most probably... [16:23:12] so yeah please fire your test up [16:23:43] oh, got a fail right now [16:24:14] no error on my side... [16:25:00] for the record, I got: [16:25:01] May 12 18:23:20 < X-Cache: cp1061 hit(2), cp3010 hit(40), cp3010 frontend pass(0) [16:25:05] May 12 18:23:20 SIZE: 6304 [16:25:07] May 12 18:23:20 --- [16:25:10] May 12 18:23:21 < Content-Length: 6304 [16:25:12] May 12 18:23:21 < X-Cache: cp1061 hit(2), cp3010 hit(41), cp3010 frontend miss(0) [16:25:15] May 12 18:23:21 SIZE: 0 [16:25:43] with SIZE coming from curl -w 'SIZE: %{size_download}' [16:29:28] 100 requests, no errors. I'll start another 100 requests, just for fun... [16:34:16] 07HTTPS, 10Traffic, 06Operations: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#366296 (10faidon) Option (3) sounds like the easiest way forward to me and an acceptable option. My only concern would be whether it could handle a surge of traffic (the kind of traffic it'd... [16:34:20] perhaps the error I got was unrelated, it's looking good now [16:34:47] ema: I still can't reproduce any error [16:35:04] moderate optimism [16:35:18] 200 requests, no error... [16:35:28] hm, ema, were cp1043 and cp1044 the only maps hosts at all before april 4? [16:35:35] I'll keep running that test every now and then... [16:35:39] gehel: thanks! [16:35:53] * gehel is crossing fingers... 
[16:36:55] ottomata: yes [16:43:35] for the time being I haven't been able to reproduce in esams+eqiad (using file storage) [16:43:49] however I got a repro in ulsfo relatively quickly [16:44:25] ulsfo repro FTR [16:44:26] May 12 18:42:19 --- [16:44:26] May 12 18:42:21 < X-Cache: cp1045 hit+miss(0), cp2006 pass(0), cp4001 hit(22), cp4001 frontend pass(0) [16:44:29] May 12 18:42:21 SIZE: 6304 [16:44:32] May 12 18:42:21 --- [16:44:34] May 12 18:42:23 < Content-Length: 0 [16:44:37] May 12 18:42:23 < X-Cache: cp1061 hit(4), cp2006 miss(0), cp4004 miss(0), cp4001 frontend miss(0) [16:44:40] May 12 18:42:23 SIZE: 0 [16:44:53] ema: elukey, was gonna make a task about this loss, so i started diving deeper into historical sequence stats for maps [16:45:08] i think this is not vk4 related, it looks like maps just has a small amount of loss going back before vk4 [16:45:43] for misc, the loss is only showing up over the last day or so, which i'm going to assume is related to the work yall are doing now [16:45:58] sorry [16:46:02] dunno why i'm typing vk4 [16:46:05] but you know what I mean :) [16:46:08] yep! [16:46:23] ottomata: ok!! [16:46:42] yeah we are not treating misc well this week [16:50:57] bblack: gotta leave soon, file storage is behaving well in esams+eqiad [16:51:40] if you want to try it on other DCs, in /lib/systemd/system/varnish-frontend.service: main1=deprecated_persistent,/srv/sda3/varnish.main1,220G,0x500000000000 -> main1=file,/srv/sda3/varnish.main1,220G [16:52:13] 07HTTPS, 10Traffic, 06Operations, 13Patch-For-Review: letsencrypt puppetization: upgrade for scalability - https://phabricator.wikimedia.org/T134447#2289769 (10Dzahn) [16:52:24] then systemctl daemon-reload and the usual backend restart/flush [17:08:31] ema: ok, cya later [17:08:53] I was about to ask if you wiped caches on switch, but I guess switching to file wipes itself [17:09:05] but did frontends get wiped post-switch? [17:09:42] looks like it based on varnishd start time [18:23:18] so I did move the rest to file storage [18:23:24] and restarted appropriate frontends to wipe them [18:27:32] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2290254 (10BBlack) So we're currently have several experiments in play trying to figure this out: 1. We've got 2x upstream bu... [18:28:10] ^ ticket updated, need to find proof we're still broken again [18:54:22] I've been torture-testing style.css with loops of numeric query params, through backends, through esams frontends, through ulsfo frontends (which is worst-case, there's more cache layers) [18:54:32] and loop though them over the 120s expiry boundary, etc [18:54:41] all kinds of combinations of hit/miss/hit+miss/etc [18:54:51] everything is SIZE:6304 so far [18:56:10] which means, if nobody can reproduce now, persistent storage bugs is the issue [18:56:46] if that holds for a while, we could next try putting back the CL-sensitive VCL. It should've made the old problems easier to hit if anything, and would confirm that they're not part of it [18:57:17] going to go spelunking in the persistent and file storage backend commit logs... 
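Spelled out, the manual persistent-to-file switch described above looks roughly like this (done with puppet disabled on the host; the unit path and flag values are the ones quoted in the backscroll, the restart step is the usual backend restart plus frontend wipe):

    # in /lib/systemd/system/varnish-frontend.service change the storage flag
    #   from: -s main1=deprecated_persistent,/srv/sda3/varnish.main1,220G,0x500000000000
    #   to:   -s main1=file,/srv/sda3/varnish.main1,220G
    $EDITOR /lib/systemd/system/varnish-frontend.service
    systemctl daemon-reload
    # then restart the varnishes as usual, so the (now non-persistent) storage
    # starts empty and the frontend cache is wiped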
[19:19:29] I don't see anything super-obvious there [19:19:47] if anything, I'm surprised at the amount of work they continue to put into the deprecated persistent storage code [19:21:48] still, if we can't reproduce this anymore, there's clearly a persistent-storage-related bug [19:23:07] <_joe_> what's the difference between file storage and persistent storage? [19:23:29] <_joe_> that file storage doesn't survive restarts? [19:23:57] yeah [19:24:09] file storage uses a disk for space, but wipes on restart/crash of the daemon/host [19:24:32] persistent storage engine takes care to be able to recover most of the cache on crash/restart, like a database would with its binlogs, etc [19:25:24] <_joe_> I don't see what's the reason not to make your storage recoverable [19:25:30] <_joe_> to be honest [19:25:39] well, a few things: [19:25:52] <_joe_> recoverable, with the option to wipe it out [19:26:02] 1) it's way way harder and more-complicated to make disk storage recoverable [19:26:45] (they have to split up the storage into smaller bins and work through writing to them and sealing them, so that only the latest unsealed one is corrupted on crash and lost, that sort of thing) [19:26:46] <_joe_> if you want to make persistency completely reliable, yes [19:27:01] it's not completely reliable, but it's "all but the most-recent stuff is reliable" [19:27:27] <_joe_> yeah I would be ok with "doesn't get lost on clean restarts" [19:27:49] we wouldn't be, unfortunately [19:28:02] <_joe_> not for uploads, no [19:28:06] <_joe_> I get that [19:28:08] but in any case, that side-argument aside, in terms of code complexity by competent varnish developers [19:28:21] 600 SLOC for storage_file, 2K SLOC for storage_persist [19:28:26] <_joe_> ok [19:28:29] <_joe_> fair enough [19:28:45] <_joe_> it's still 3x, not two orders of magnitude [19:28:56] <_joe_> in terms of SLOC [19:29:13] SLOC is very rough anyways. the logic is *way* more complicated, and more bugs happen, etc... [19:29:42] it has a lot more tie-in with all other parts of the code, too [19:30:01] there are lots of things that are just "related" to cache storage, which with a file storage backend that doesn't persist, can just live in memory [19:30:16] but with a persistent store, all of those things have to be hacked into the persistence and stay in sync with it too [19:30:41] like the long-term cache-ban lists, and xkey indices (which don't persistent, that's one of our upcomign xkey pain points) [19:31:27] <_joe_> sigh [19:31:37] well from an implementor's point of view, I guess point 1 is about it... it's just way more complicated to implement and keep it bug-free [19:32:01] from a user's point of view: if storage dies when the daemon/host stops, things are easier to reason about [19:32:15] because you probably lost purges while it was down too, since they weren't received by the storage... [19:32:47] unless you ahve some external mechanism to queue and replay them (which we do for daemon down, but not for host down) [19:34:14] but then we get into the user's point of view on benefits of why you want persistence, and what kinds of persistence matter: [19:35:07] IMHO, persist on clean shutdown is nice-to-have, but not really critical. 
if it was a clean shutdown by choice, we could've chosen to take the operational hit of doing that much slower instead of a bunch of caches at once, and spread the effect of cache contents loss out so it's negligible [19:35:43] the case where it really matters more is unclean shutdown due to daemon crash, host crash, hardware malfunction, and/or DC power loss [19:36:00] because sometimes those happen unplanned, and can happen to more than one machine at a time... [19:36:15] and then you're left with a shitload of missing cache contents all at once by surprise [19:36:30] with no workaround [19:37:11] so that's been our argument for persistent storage with reliability on hard crash: if all caches in eqiad suddenly power off, when we're recovering from that event we really want them to still have a lot of their cache contents [19:37:48] my thinking currently on trying to attack persistence (trying to argue we don't need it) goes like this: [19:38:34] 1) In a single-machine event, it's tolerable to lose cache. We have enough caches (in terms of #machines per DC, #DCs) that it's not that bad a percentage, and more importantly we have layering to make up for it... [19:39:07] e.g. if we lose a middle-tier backend, the frontends directly in front of it are still absorbing most of the user-facing load, and the next tier of backends is re-filling it, not MediaWiki [19:39:28] if we lose a back-tier backend, we've got both frontends and middle-tier backends covering the bulk of the load while it refills from MW [19:39:42] if we lose a frontend, it's got all of the DC-local backends to refill from at near-zero latency [19:40:29] basically there's really no scenario in which loss of a single daemon or machine's cache contents is a big deal to us. the rest of the infra absorbs it fine. [19:41:42] 2) In a single-DC event (so, powerloss at a DC, or a crash of all machines/daemons at a DC from an exploit/bug that only hit one DC), things are trickier, but we can still manage. [19:42:39] we'd obviously depool the dead DC in DNS and have to deal with recovering, but we can offload users to other DCs, and we can refill the lost DC's caches from another DC too, no matter which DC it actually is (modulo some eqiad special-ness presently, but we're on-track to get rid of that specialness, and we can work around it today with extra work and have codfw refill things) [19:43:18] 3) So the scenario where it still really really matters is if we lose all caches globally, all in a short timeframe [19:43:47] it's unreasonable to try to plan for the kind of rare event it would take for that to be power loss [19:44:42] so we're mostly talking about a software bug in varnishd that causes them all to crash quickly from some normal request, a software bug someone figures out how to maliciously trigger and sends to all our global LB IPs repeatedly. in either case could be a varnishd bug or a kernel bug, etc... [19:44:57] or a major screwup by us in performing some operation [19:45:11] or a very urgent need for us to restart every varnishd in minutes to patch some exploit [19:45:52] so breaking those down further (after tossing out global power loss) [19:46:55] varnishd bug: even most varnishd bugs wouldn't do this. they'd kill the frontends but not reach the backends, or similar. 
it would have to be a very special varnishd bug where the triggering request actually traverses all the layers down to the application, the response makes it all the way back out to the user sanely, and it kills every varnishd along the *response* path *after* they've successfully sent their response upstream. [19:47:58] I think the odds of that and someone triggering it globally are pretty remote, but it can happen, and persistence would probably save us trouble when recovering from it. [19:48:44] kernel bug: probably more likely than the above: triggering traffic could hit all frontends, which is all cache machines, and crash them all through a kernel bug.... [19:49:01] us screwing up: no matter what storage engine we use, it's possible for us to screw up [19:49:38] urgent patch: I don't foresee a scenario in which we can't span that over a few hours, which is enough to make this not a critical issue in practice. [19:49:51] it's not like we can insta-patch code anyways [19:50:50] the kernel/varnishd bug scenarios are both the main concerns, and those are very remote possibilities. they can happen, but what's the cost/benefit on protecting ourselves from them? [19:51:10] have we ever witnessed such a bug hit the WMF since the varnish clusters went live years ago? [19:51:27] N amount of pain for persistent storage vs M odds of this ever being an issue [19:51:58] (and in any case, even with persistence, those will be hugely problematic events. we're just making one part of the recovery slightly less-painful) [19:52:46] also, hhvm and RB APIs and blah blah, we hope, have made the amount of uncached user traffic we can theoretically handle in a spike to refill caches higher than it was historically [19:53:48] so to me, that long rambling argument says that our rationale for wanting persistence is weak. it's there, but it's not much. [19:54:31] the downsides are: more-complex backend with more bugs, more local patches == less-stable. upstream maintainer has deprecated it going forward. xkey vmod doesn't natively work with it. [19:55:22] and apparently now we can add to the pile: in varnish4 testing on cache_misc, apparently persistent storage has novel bugs that the undeprecated backends don't... [20:06:58] I was just thinking further: without persistence, what would you do to solve the software-crasher threat? [20:07:19] probably break up each cluster in each DC into two sub-pools, A/B, with half the caches in each. [20:08:20] once every X (let's say every 12 hours?) we go through an automated hour-long process of having LVS shift traffic from pool A to B (or vice-versa). So at any given time, the public world is only hitting one half, and the other half is traffic-dormant, but still has lots of fresh objects cached when traffic comes back 12h later. [20:08:43] the 1h process smooths the spike in misses to negligible [20:09:11] and then if someone sends us a kernel crasher request or an all-varnishd-layers-crasher request, it only kills the currently-active side and we switch to the other. [20:09:22] (and half the usual switches). 
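A rough sketch of the A/B rotation idea just described, purely illustrative: the host lists, the pool/depool commands and the timings all stand in for whatever LVS pooling mechanism and schedule would actually be used.

    # hypothetical illustration of the A/B rotation; "pool"/"depool" are placeholders
    # for the real LVS pooling mechanism, and the ~1h ramp is a fixed per-host sleep
    POOL_A=(cp1001 cp1002 cp1003 cp1004 cp1005)   # placeholder host names, half the cluster
    POOL_B=(cp1006 cp1007 cp1008 cp1009 cp1010)   # the other half
    shift_traffic() {
        local -n from=$1 to=$2
        for h in "${to[@]}";   do pool "$h"; done               # dormant half comes in, cache still warm
        for h in "${from[@]}"; do depool "$h"; sleep 720; done  # drain the old half over ~1h
    }
    while true; do
        shift_traffic POOL_A POOL_B; sleep 12h    # B serves for ~12h (intervals could be randomized)
        shift_traffic POOL_B POOL_A; sleep 12h    # A serves for ~12h
    done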
[20:09:30] unless they spam us during the transition time I guess [20:10:05] I guess you could work on reducing the switchover window time and the frequency, and making them randomized so they can't be predicted [20:11:27] if a globally-spammed crasher comes through, it obviously still takes us down, but whoever responds can stop the switching process where it's at, find a way to mitigate the issue and/or block the traffic, then flip user traffic to the still-living side to recover. [20:12:13] sounds like a pain to engineer, but it's more general and reliable than the persistence solution anyways. [20:13:57] not a reasonable enough plan for us to put it in a ticket and work on it, but it's something to think interesting thoughts about. [20:26:24] <_joe_> bblack: uploads are the main culprit here [20:26:41] <_joe_> we're nowhere near being able to handle twice the traffic we get today [20:28:14] <_joe_> so even with HHVM, I guess we can handle 3 times the traffic we get today on the appserver layer [20:28:33] <_joe_> since we've been reducing the size of the applayer noticeably [20:29:51] <_joe_> and honestly, it's not like varnish doesn't have huge limitations already [20:30:07] <_joe_> (no ssl either to clients or to backends being the biggest one) [20:32:22] <_joe_> but yes, I guess we can live with a non-persistent storage for everything excl. maybe upload [20:36:44] well it would probably be a lot more than 3x initially... [20:37:03] but, every successful request creates a cache item that blocks some further requests [20:37:35] so, one would think the pattern would be a very sharp and very brief spike, which exponentially quickly ramps back down to normal levels, if the app can handle the initial burst. [20:37:53] if it can't, then we have to mitigate that by ratelimiting outbound from varnish, basically [20:38:14] we already have a (poorly-tuned) max_connections parameter per service backend, which we could use to put a cap on how hard varnish can hit backend servers. [20:38:39] and if requests time out waiting on those max_connections, they'll 503 the user and all the while the cache keeps filling in and things keep getting healthier [20:39:09] if things go to plan, then the percentage of otherwise cached backend requests will also grow in the next years [20:39:21] yeah, hopefully [20:39:22] which should help reduce the remaining load on in-memory cache miss [20:41:39] <_joe_> bblack: say we have to restart just one server [20:41:48] <_joe_> isn't that like when we depool one server now? [20:41:56] <_joe_> from the backends [20:42:03] <_joe_> all its storage is suddenly lost [20:42:09] <_joe_> and we do that all the time, right? 
[20:42:44] yeah [20:43:13] as long as we don't try to do some large fraction of all global servers in minutes, we should be ok in those scenarios though [20:44:36] the layering and multiple DCs tend to protect well against one, or even a few, sudden cache machines losing contents [20:45:45] <_joe_> as I said, my main concern is upload [20:45:56] <_joe_> we're way more tight on backend capacity there [20:46:25] the only scenarios in which backend capacity is significantly in play are the global-crash scenarios [20:46:26] <_joe_> (on the applayer, it has been shown repeatedly now the databases are what hit a performance wall, rather than appservers, after all) [20:46:40] (well, assuming other minor nits/plans get resolved in the near-ish term) [20:47:00] <_joe_> well in a global crash scenario we try to recover to the best of our abilities, probably [20:47:23] <_joe_> it would take some time to recover text, and a few hours to recover uploads, probably [20:47:23] basically the global crash scenario is the only one where persistent-storage vs non really matters [20:47:38] <_joe_> having both swift clusters available for reads would help [20:47:44] in all the other scenarios, we can (and in many cases, automatically) prevent impact on load to app backends [20:48:57] for anything else: we go slow enough on willful restarts (and I think even for upload, we can make that work over a few hours at most). if we lose a DC, we switch users in DNS, switch cache routing for refills in cache::route_table, etc. [20:50:19] theoretically even if we lose all eqiad caches today to power loss... by the time we sort out the other impact of that, we can fix the caching [20:50:45] we can update cache::route_table to send everyone in ulsfo/esams through codfw -> (whichever side appservers we're recovering on), and all those still have full hot caches [20:50:55] <_joe_> yes [20:51:08] and then with traffic blocked inbound to eqiad (after DNS depool), we can route eqiad's backend fill to codfw, and then let it re-warm itself from those caches, etc [20:51:18] <_joe_> honestly that's why I said at the start it seemed acceptable [20:51:45] well it's a big change, dropping the safety net of persistent cache storage [20:51:47] <_joe_> but then I thought about uploads [20:51:57] <_joe_> it is indeed [20:52:11] this is like the 3rd or 4th time I've rambled through the pros and cons on IRC, and I've not convinced myself enough to argue for it hard and formally and make the switch :) [20:52:50] <_joe_> eheh [20:53:00] <_joe_> well, it seems phk decided for you :P [20:53:21] bblack: are any of the alternatives like ATS still in the running to potentially replace Varnish? [20:53:53] gwicke: not in any kind of predictable or soonish timeframe [20:53:55] <_joe_> gwicke: that would be a larger jump into the unknown [20:54:26] <_joe_> so I'd think trying to keep varnish working is a better immediate option [20:54:30] I still want to explore a switch to ATS, but realistically we don't have spare engineering capacity to take away from other things to play with it on the side at this time [20:54:45] makes sense [20:54:47] and even if we were more-definite about long-term ATS plans, switching to v4 first at this point makes more sense [20:55:04] I'd still really like to fully evaluate ATS as a replacement [20:55:07] at some point [20:55:41] when I've looked before, it looks capable, and even better on some fronts. 
undoubtedly it may not be as good on others [20:55:52] something with native http/2 support would also be nice, but that has some implications for # of connections and thus best architecture [20:55:54] it requires some dramatic re-architecting of our stuff too [20:56:35] the biggest argument for ATS exploration is that it conceivably might work, conceivably would replace our nginx+varnish stack with one thing, and that one thing wouldn't be tied up in commercial open-core bullshit [20:56:46] their licensing and philosophy (ASF) is much closer to ours [20:57:06] both nginx and varnish do the open-core thing now, and it's a thorn in our side with both :P [20:57:36] yeah, but I am quite optimistic that the time window for that is closing [20:57:58] that nginx and varnish will switch course back to open-source-friendly? [20:57:59] solid http/2 support is becoming commoditized in Go & probably soon Rust [20:58:38] I would expect more hackable alternatives emerge in the medium timeframe as a result [20:58:54] eh I donno, we'll see [20:59:16] yup [20:59:19] that would be nice, but writing a solid, general-purpose caching revproxy with all the bells and whistles of ATS or Varnish is a really really hard problem [20:59:36] anything new has mountains of lessons from the past to re-learn, too... [20:59:55] and they're important ones, it's not the kind of thing where you can deploy first and sort those out as you hit them [21:00:08] the barrier to entry has been very high so far, as there hasn't been any clear framework to build on that handles the common low-level http stuff [21:00:27] (and has decent performance) [21:00:38] there still isn't one, for this kind of purpose [21:01:21] implementing an http/[12] server or client is hard. implementing a proxy between the two is harder. implementing a revproxy between the two is even harder. implementing any kind of cache is hard. implementing an HTTP cache is harder. [21:01:28] yeah, but I have hope that this will improve [21:01:36] it's all of those hardnesses put together, in one very inter-twined mess of difficulty-level [21:02:22] most of the time even with varnish having done it all "right" as an example, I'm constantly having to dig through standards documents and experimentally verify behavior to know wtf is right or wrong in thousands of corner cases. [21:02:23] it's already fairly easy in high-level languages, but of course the performance is lacking then [21:02:49] I'd say even if you throw perf out the window, writing something varnish-like in a high-level language is still very hard. [21:03:40] there is a long tail, sure [21:03:51] yeah but all of the long tail matters [21:04:12] just here at the WMF we hit a large chunk of the long-tail cases between different kinds of internal appservers and codebases and requests and clients and protocols and .... [21:05:11] if you're going to have the world use it, all of the long-tail matters [21:07:57] if I really thought I had a shot at writing a decent http caching revproxy of my own from scratch, in say a year or two timeline to a decent beta release, and doing better than varnish/ATS has done even ignoring performance completely... I'd probably be working on it on weekends now. [21:08:18] but when I think about that, I know I'm not capable of handling that project and doing it justice. [21:08:58] and I know more than a lot of people do about that task really, other than core developers of existing ones. 
[21:10:07] of course, there's always the argument that blind arrogance is necessary to do great things. maybe someone who's not aware of how hard it is, is the kind of person that can tackle it anyways and eventually get it done in the face of all evidence to the contrary :) [21:56:50] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2290928 (10BBlack) Has anyone been able to reproduce any of the problems in the tickets merged into here, since roughly the ti... [21:59:56] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2285020 (10Smalyshev) I don't see any weirdness anymore on query.wikidata.org/ @Jonas ? [22:01:56] 10Traffic, 06Discovery, 06Operations, 10Wikidata, and 2 others: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2290947 (10Jonas) Problem seems to be fixed. Thank you! [22:16:00] bblack: at the very least, it's fun to ponder what it would take [22:19:09] the async http stuff in rust is getting more interesting recently; a toy server is doing about 150k req/s on my laptop, with the wrk client using 1/3 of cpu -- but, http/2 is still missing, and so is basic stuff like connection pooling for the client portion [22:32:08] 07HTTPS, 10Traffic, 06Operations: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#2291063 (10BBlack) Yeah I tend to agree too. I think if we're concerned at all about status.wm.o perf during outages, we could probably also tack on a secondary task to extend the apache co... [22:32:58] 07HTTPS, 10Traffic, 06Operations: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#2291073 (10BBlack) Also note: while in there, should convert wikitech-static to cron'd letsencrypt (using our prod script!), and then use that for the status.wm.o cert as well. [22:37:38] 07HTTPS, 10Traffic, 06Operations: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#366296 (10Krenair) Yes, we should. Unfortunately wikitech-static might be a pain since it does not use puppet (and for obvious reasons cannot reach the production puppetmaster). :/ [22:38:46] 07HTTPS, 10Traffic, 06Operations: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#2291120 (10BBlack) it's ok, we can just copy down the acme-setup script as it exists today (well, and acme-tiny). for a 1-2 cert setup like this, it's not hard to use it puppet-free from cr... [23:10:55] bblack: so no issues with file storage? [23:14:53] ema: nobody's been able to reproduce with it, yeah [23:15:05] I puppetized the systemd unit file change and turned puppet back on [23:15:10] awesome [23:15:23] should I try to put back the CL-sensitive stuff tomorrow? [23:15:35] yeah I think so, after triple-checking nobody can reproduce [23:16:29] if the problem was indeed deprecated_storage that might explain why others didn't run into the issue, I was thinking [23:16:37] yeah [23:17:07] even then, it's probably specific to having two layers of varnishd talking to each other and deprecated_persistent at one or both of them. [23:18:25] this probably, in the net, just bumps the priority of making the switch away from it anyways. it's, I think, the logical course of action eventually anyways. 
[23:18:31] just maybe eventually's getting here sooner [23:19:01] we'll really need to fix the TTL-capping/surrogate-control/grace-mode/etc intertwined mess of issues better first though, before assuming we can go file-backend on text and/or upload varnish4 deploys. [23:20:46] (and the active/active cache routing and outbound-TLS problems become higher priority to solve faster if we don't have persistence too. not that they were low prio before) [23:21:44] but the upside is we can dump all of our custom debian/patches/ I think, and not have to worry about the ugly problem of xkey+persistence interactions [23:22:04] alright, I'm already quite happy we made some steps forward today :) [23:22:12] (well the legacy patches anyways. all we had left there was 3x persistence one and the questionable nukelru one) [23:23:52] time to sleep, see you tomorrow! [23:29:47] nite! [23:51:15] 10netops, 06Operations: codfw-eqiad Zayo link is down (cr2-codfw:xe-5/0/1) - https://phabricator.wikimedia.org/T134930#2291418 (10faidon) 05Open>03stalled The issue recovered before they had a chance to investigate. I have been asked whether we are interested in an formal RFO (reason for outage), which is...
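For the "triple-checking nobody can reproduce" step mentioned at 23:15, a repro check in the spirit of gehel's 100-request runs and the 18:54 torture tests might look like the sketch below. The URL is one of the known repro targets from T135038; the count, the sleep and the output format are just illustrative choices, not the script actually used.

    # rough repro-check sketch; curl exit code 18 is the "transfer closed with N bytes
    # remaining to read" failure seen in the tickets, and a 200 with a short
    # size_download would mean a truncated/empty cached body
    URL=https://releases.wikimedia.org/mediawiki/1.26/mediawiki-1.26.2.tar.gz
    for i in $(seq 1 100); do
        out=$(curl -s -o /dev/null -w '%{http_code} %{size_download}' "$URL")
        rc=$?
        echo "request $i: rc=$rc $out"
        sleep 2    # spread the run across the 120s default-TTL boundary, as in the 18:54 tests
    done
    # (the 18:54 torture tests additionally varied a numeric query param, e.g. "$URL?$i",
    # to force cache misses faster)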