[01:25:41] 10netops, 06Discovery, 06Operations, 03Discovery-Search-Sprint: deploy elasticsearch/plugins to relforge1001-1002 servers - https://phabricator.wikimedia.org/T141085#2486473 (10Dzahn) renaming the network broke ferm on neon (icinga) -> T141957
[06:35:05] 10Traffic, 10MediaWiki-extensions-UrlShortener, 06Operations: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2518039 (10Legoktm) 05Resolved>03Open @BBlack this doesn't seem to be working yet? https://w.wiki/?search=X still shows search results...
and bblack: confirmed, swift responds with 206 to valid Range requests
[09:23:27] 10netops, 10Datasets-General-or-Unknown, 06Operations: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2518282 (10Nemo_bis) Tried another: {P3628}
[10:05:40] bblack: yesterday I interrupted you while you were saying:
[10:05:41] 19:30 < bblack> maybe *that's* the missing magic our 3.x-plus had that v4 doesn't have.
[10:05:44] 19:31 < bblack> if that turns out to be the case, we might want to slightly-alter our VCL on the backends more like:
[10:09:39] 10Traffic, 10Varnish, 06Operations, 13Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2518408 (10ema) On the swift side of things: - Range requests are correctly handled, swift responds with 206 Partial Content - Swift always sends CL, which removes the ne...
[10:11:49] 10Wikimedia-Apache-configuration, 06Operations, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2518412 (10elukey) After a long chat with upstream we decided not to go ahead w...
[10:37:09] 10Traffic, 10Varnish, 06Operations, 13Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2518467 (10ema) Interestingly the second Range request mentioned in [[https://phabricator.wikimedia.org/T131502#2515835 | my previous comment ]] does *not* stall on varni...
[12:40:57] ema: who knows, I probably would've changed my mind again anyways :)
[12:41:16] :)
[12:42:36] yeah so the missing-magic bit isn't ideal.
[12:43:01] but we can probably live without it, perhaps setting our ignore/pass tunables a little lower to compensate
[12:43:11] eventually the large objects do make it into caches
[12:43:28] (the ones that fit and are hot, anyways)
[12:44:06] still, it might be worth peeking again at the old v3plus code, and seeing if the concept is or isn't relatively easy to bring into v4 code.
[12:45:05] agreed
[12:45:46] 10Traffic, 10MediaWiki-extensions-UrlShortener, 06Operations: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2518733 (10BBlack) Yeah I checked as well, and it doesn't work. Most likely their simple patch is subtly broken...
[13:04:28] 10Traffic, 10Varnish, 06Operations, 13Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2518759 (10mark) >>! In T131502#2518467, @ema wrote: > Interestingly the second Range request mentioned in [[https://phabricator.wikimedia.org/T131502#2515835 | my previo... ]]
[13:05:48] that surprises me
[13:05:57] didn't they rewrite most of that code for v4?
[13:06:45] or actually
[13:06:50] wasn't that the patch that I wrote for varnish-plus
[13:07:04] there was stock varnish, and there was a varnish patch to add streaming support
[13:07:08] but then it didn't work with range requests
[13:07:11] so I added that, I think?
[13:09:07] mark: https://github.com/wikimedia/operations-debs-varnish/tree/3.0.6-plus-wm/debian/patches
[13:09:35] those are the patches we apply to 3.0.6-plus
[13:09:50] i think they were incorporated into -plus though
[13:09:57] oh I see
[13:10:13] http://mediawiki-commits.wikimedia.narkive.com/2fhrHZRK/gerrit-add-range-support-to-varnish-in-streaming-mode-change-operations-varnish-patches-streaming
[13:10:55] sad if that needed to be redone for v4
[13:11:02] but perhaps you can use it as inspiration
[13:11:03] it might be that 4.x-plus includes those changes
[13:11:08] yes
[13:11:33] but that code I wrote is GPL
[13:12:05] oh really :)
[13:21:20] would be good to find an old varnish3-plus tarball to reconfirm that
[13:21:39] but that was my understanding at the time anyway, before they changed it and locked it up
[13:22:10] but, we already had varnish deployed on upload (instead of squid) and hit such issues in production
[13:22:24] so I wrote and deployed that patch within a couple of days, I think, to alleviate it
[13:22:34] I remember that!
[13:22:53] i remember using a lot of metrics meeting videos (1 GB+) to test it
[13:22:58] yes
[13:22:59] i've seen erik moeller walk around with his mug a LOT
[13:30:06] https://info.varnish-software.com/blog/caching-partial-objects-varnish
[13:30:14] that suggests varnish4 should support streaming of range requests
[13:31:42] > Varnish 4.0 will by default go to the backend and fetch the entire object, and when the part that the client asked for arrives, serve that as fast as it comes in.
[13:32:01] this is true; what it doesn't say is what happens if another client arrives before the whole object is cached
[13:32:20] hmm
[13:33:34] so it does enter a queue for that object then
[13:33:44] perhaps you can simply break that in that case and convert it into a miss for the range request?
[13:42:51] uh! I think there might be something wrong in our VCL actually
[13:43:43] I've tried the same test (two successive requests for slightly different ranges) with default vcl and it works as expected
[13:43:57] without stalling, that is
[13:53:03] well that would be good :)
[14:14:59] ema: yeah maybe re-test all our assumptions on stock vcl, since upload vcl is designed to work around related things.
[14:15:31] confirmed, our frontend VCL is doing something funky there
[14:15:51] well our frontends intentionally return (pass) on all Range requests (and copy the Range header through)
[14:16:18] the logic there is that heavy range usage is mostly large objects, FE storage is small, and there's a network-local backend to take advantage of, etc...
[14:16:30] most of this is about what happens with our backends
[14:22:06] 2.5k lines of vcl
[14:27:32] 2811
[14:28:09] plus all the templating
[14:28:19] this was wc -l on a random upload cache :)
[14:28:26] ah
[14:28:35] 2811 lines of erb vcl templates all total
[14:28:49] some of the caches have lots of unused files in /etc/varnish/ though, because of renaming
[14:28:59] yeah I thought I saw some duplication
[14:29:06] (renaming vcl file outputs, and also switching a machine from one cluster to another without reinstall)
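A minimal VCL 4.0 sketch of the frontend behavior described at [14:15:51]/[14:16:18] above: pass every Range request straight through so the Range header reaches the network-local backend instead of trying to satisfy it from the small frontend cache. The backend definition and the X-Range header name are illustrative assumptions, not the actual production upload VCL (which does this via upload_common_set_xrange):

```
vcl 4.0;

# Hypothetical network-local cache backend on the same host.
backend local_be {
    .host = "127.0.0.1";
    .port = "3128";
}

sub vcl_recv {
    if (req.http.Range) {
        # Stash the client's Range header so it can be re-attached to the
        # backend request after the pass.
        set req.http.X-Range = req.http.Range;
        # Heavy Range usage is mostly large objects and frontend storage is
        # small, so don't try to cache these in the frontend at all.
        return (pass);
    }
}

sub vcl_backend_fetch {
    # Copy the stashed Range header through to the backend request.
    if (bereq.http.X-Range) {
        set bereq.http.Range = bereq.http.X-Range;
        unset bereq.http.X-Range;
    }
}
```

Passing rather than caching here trades frontend hit rate for simplicity: the Range header survives intact and the much larger backend storage handles any reuse.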
[14:29:31] tsk tsk tsk, when will we run varnish in kubernetes? ;p
[14:29:48] behind its default port forwarder/load balancer of course
[14:29:55] commenting out 'call upload_common_set_xrange; return (pass);' in cluster_fe_miss fixes the problem heh
[14:30:57] i would remove all the varnish 3 related range vcl hacks as varnish4 works differently anyway
[14:31:15] does it still have do_stream etc?
[14:31:30] yes but it should be true by default
[14:31:49] do_stream still exists, but streaming lacks all the caveats it had in varnish3, so it's on-by-default and you can turn it off if you want.
[14:31:55] right
[14:32:03] the only good reason we've found to turn it off is content-length issues
[14:32:26] (as in, if the applayer doesn't send CL, but we'd really prefer the upper layers get a CL, we can turn off streaming at the backend-most layer so that varnish will create a CL)
[14:32:33] which we don't care about in cache_upload because swift returns CL as a good citizen
[14:33:10] mark: so far the evidence is that we'll still need at least some of our existing range hacks
[14:33:20] i'm sure there will be some
[14:33:33] varnish still doesn't handle a ranged response AFAIK, for instance
[14:33:50] so on a true cache miss on a high-range request, we'll still want to (pass) and pass Range through
[14:34:21] you can get into quite a few distinct cases to handle
[14:34:28] yeah
[14:35:42] All tests successful.
[14:35:42] Files=81, Tests=446, 53 wallclock secs ( 0.86 usr 0.19 sys + 26.53 cusr 2.00 csys = 29.58 CPU)
[14:35:45] Result: PASS
[14:35:56] ^ openssl-1.1.0 patch for chapoly draft mode :)
[14:36:07] :-)
[14:36:08] nice
[14:36:10] I really need to test interop with the cloudflare patch before thinking about pushing it up though
[14:36:25] writing crypto code is scary :P
[14:36:34] s/writing/minorly patching/
[15:23:57] yeah
[15:24:06] I can imagine that being scary :)
[15:25:04] the way that cloudflare factored their chapoly code, they made it extremely trivial (3x little if conditions covering a handful of lines of C) to re-use the same code for draft and RFC modes
[15:25:26] the openssl-1.1.0 chapoly code was factored with that in mind, so sharing it and making it switch modes isn't quite as trivial :(
[15:25:32] but it's not awful, just not trivial
[15:25:43] s/code was factored/code wasn't factored/
[15:27:05] anyways, the best outcome is that deploying the cloudflare patch teaches us that the draft-mode traffic isn't enough to care about anyways (either now, or by the time we want 1.1.x deployed)
[16:08:09] in the very limited/tiny data set of "a few minutes of cache_maps + cache_misc traffic", draft-mode has ~half the traffic that RFC mode has.
[16:08:28] no idea on how either fares as a percentage of total traffic until upload+text though
[16:32:25] on initial deployment of the chapoly stuff to text+upload (all clusters now), the initial point-in-time stats changes look like:
[16:33:11] before: ecdhe-ecdsa-aes128-gcm-sha256 ~83%, ecdhe-rsa-aes128-gcm-sha256: ~1%
[16:35:03] after: 61% and ~0.1% for the above. the drop moved to these chapoly variants: ecdsa-rfc: ~18%, ecdsa-draft ~4%, rsa-rfc ~1%, rsa-draft ~0.1%
[16:35:21] so far, zero usage of the DHE (as opposed to ECDHE) versions of either draft or rfc chapoly
[16:36:31] long term we want to kill all the DHE ciphers anyways, but that's far off. it's tempting to eliminate more of them now early while their stats are near-zero, but we might find their usage increases just before or after we eventually dump the remaining non-forward-secret options.
[16:36:46] so basically it's best not to kill viable DHE options until after the forward-secret-only transition sometime off in the future.
[16:38:36] (the only exception I made on that was the dhe+aes256 options; they're categorically different from the other virtually-unused cases, because those clients always implement aes128 as well, and a policy decision to not support aes128 can/should include a policy decision to use better ciphers than dhe+non-aead anyways :P)
[16:40:16] anyways, so now chapoly rfc-mode using an ecdsa key is our second-most-popular cipher :)
[16:40:33] (behind ecdsa-keyed aes128-gcm)
[16:41:46] and whenever we switch to 1.1.0, very modern clients will be able to do that using x25519 for the ECDHE part as well, essentially freeing the transaction from all historically NIST-tainted algorithms and being all-djb
[16:42:16] (except for our NIST-256 ECDSA certificate, but that's just used to authenticate that we're wikimedia, not to exchange session keys or encrypt traffic)
[16:47:32] 10Traffic, 06Operations: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2519626 (10BBlack) Quoting myself from IRC: ``` 16:32 < bblack> on initial deployment of the chapoly stuff to text+upload (all clusters now), the initial point-in-time stats changes look like: 16:33...
[16:47:46] 16:32 < bblack> on initial deployment of the chapoly stuff to text+upload (all clusters now), the initial point-in-time stats changes look like:
[16:47:49] 16:33 < bblack> before: ecdhe-ecdsa-aes128-gcm-sha256 ~83%, ecdhe-rsa-aes128-gcm-sha256: ~1%
[16:47:52] 16:35 < bblack> after: 61% and ~0.1% for the above. the drop moved to these chapoly variants: ecdsa-rfc: ~18%, ecdsa-draft ~4%, rsa-rfc ~1%, rsa-draft ~0.1%
[16:47:55] 16:35 < bblack> so far, zero usage of the DHE (as opposed to ECDHE) versions of either draft or rfc chapoly
[16:47:58] oops
[16:48:01] this:
[16:48:03] https://phab.wmfusercontent.org/file/data/5vxwz3fj46dq62ilqv6m/PHID-FILE-qbxb6ofg73leynbgvyh2/2016-08-03-164516_1864x742_scrot.png
[16:55:31] another way to think of draft-vs-rfc stats: of the total client population that supports either, about 20% are stuck on draft-mode right now
[16:55:40] but stats will be better once we have a few days of history, too.
[17:07:55] maybe a little lower, more like 16-18% ish. again, better data after a few days. I added a chapoly draft% graph to the grafana tls-ciphers stuff to keep track.
[17:22:51] no notable cpu bump on our end, either
[17:24:37] also notable expected confirmation: apparently every client that does chapoly also already did aes-gcm. so there's no fundamental categorical shift in either direction.
[17:41:17] ok today I thought I found the cause for the Range issue 1k times but I actually haven't :/
[17:44:00] eg: I've tried running the backend varnishd with default.vcl and couldn't reproduce the issue, but only because I was using -smalloc,256m
[17:44:34] with default.vcl and -sdeprecated_persistent / -sfile the problem is reproducible
[17:50:15] ok
[17:50:41] so, it still stalls a second low-range (or identical low-range) request while fetching the whole big object for the first low-range request, right?
[17:50:57] right
[17:51:30] and it doesn't stall with the Range-header hack in the frontend commented out, as mentioned earlier
[17:51:35] another important detail: when it does that, does it consider it a miss or a hit? (basically, is it still ok to wait to deal with the stalling in VCL until vcl_miss time?)
[17:52:22] it's a hit
[17:53:02] hmmm ok
[17:53:08] I guess hit is more categorically-correct
[17:53:38] so what our VCL does today (on the backends) is:
[17:54:11] 1. If it's a high-byte range request, set hash_ignore_busy in vcl_recv (now it will never stall on anything, even if that means firing off excess redundant backend fetches)
[17:54:44] 2. If it's a high-byte range request and it's a 'miss', in vcl_miss we also return (pass) and pass on the Range: header to the next backend.
[17:54:58] yep
[17:55:31] those make sense, but they don't really account for the huge stall on a concurrent low-range request
[17:56:11] as I said before, lowering the "high byte range" cutoff would help alleviate it a little in practice, maybe
[17:57:05] it might be easier to gauge the relative importance of fixing various behaviors if we had some idea about range stats
[17:57:51] as in, what % of upload requests are Range: requests to begin with (carefully excluding stupid-range requests like "Range: bytes=0-")
[17:58:08] and what % of those are high-range requests beyond some bytes cutoff (say the existing 32MB cutoff)
[17:58:11] yep. Also we might want to study whether it makes sense to cache the 206 responses themselves
[17:58:31] I'm not sure varnish is capable of usefully caching a 206 from the backend
[17:58:48] if it is, there's some question about how it does that (as a separate partial object, a partially-satisfied object?)
[17:59:08] I think it can: https://info.varnish-software.com/blog/caching-partial-objects-varnish
[18:00:13] yeah ok
[18:00:46] so in that approach, they're hacking around all of stock varnish's prevention of that kind of thing, and then also explicitly hashing on the Range: part to split the cache.
[18:01:03] which means the cache would separately handle the whole-file case and each range any client asks for
[18:01:41] if one client asks for range 0-4K, one asks for 0-1K and one asks for 3K-4K, those are all separate objects, and separate from any cached copy of the whole object.
[18:02:19] but they're probably right that in the large-scale averages, there are only a handful of video buffering behaviors (how many bytes fetched per chunk)
[18:02:33] and a lot of people are only going to play the first bit of a video, or the deeplinked most-interesting bit.
[18:03:17] if we assume there are, say, 3 popular buffering strategies for chunking up ranges, at worst we cache 4 copies of the whole video if it's very popular.
[18:03:22] and it does avoid all the stalling problems.
[18:04:02] it would still be interesting to understand the stalling behavior, but it might really be important to take a look at the current Range requests and see what they look like first
[18:04:03] and we can optimize a bit by capping range-object TTLs lower than normal ones, too
[18:04:11] agreed
[18:05:24] a bonus point would be, if any range on the object misses, some way to fire off a whole-object fetch independently of all the clients.
[18:05:55] so that eventually the whole object gets backend-cached and starts satisfying the range requests (if we only do the range-object-caching crap on misses to the whole object somehow)
[18:07:06] actually that bonus point might be premature and misguided even if possible
[18:08:02] still, we can improve on the blog's VCL substantially I imagine. at least filter out the fake-range requests that are really for all bytes
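A rough VCL 4.0 sketch of the blog's approach as discussed above: hash on the Range header so each distinct range is cached as its own 206 object, filter out fake ranges like "bytes=0-" first, forward the range to the backend, and cap the TTL of range objects below normal ones. The backend definition, header names, and the 1h cap are illustrative assumptions, untested against the real cache_upload setup:

```
vcl 4.0;

# Placeholder backend standing in for swift.
backend swift {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.http.Range == "bytes=0-") {
        # A "range" covering the whole object: treat it as a normal request.
        unset req.http.Range;
    }
    if (req.http.Range) {
        # Stash the range and hide it from varnish's own range handling, so a
        # cached 206 body is delivered as-is instead of being re-sliced.
        set req.http.X-Range = req.http.Range;
        unset req.http.Range;
    }
}

sub vcl_hash {
    # Each distinct range becomes its own cache object (falling through to
    # the built-in vcl_hash, which adds URL and Host as usual).
    if (req.http.X-Range) {
        hash_data(req.http.X-Range);
    }
}

sub vcl_backend_fetch {
    # Forward the client's range to the backend so it answers with a 206.
    if (bereq.http.X-Range) {
        set bereq.http.Range = bereq.http.X-Range;
    }
}

sub vcl_backend_response {
    if (bereq.http.X-Range && beresp.status == 206) {
        # 206 isn't cacheable by default; force it in, but with a shorter TTL
        # than whole objects to limit the cost of the split cache.
        set beresp.ttl = 1h;
    }
}
```

The obvious cost is the cache split described at [18:01:41]: the whole object and every distinct range are separate objects, which is why the lower TTL cap (and keeping this to the backends) matters.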
[18:09:05] with the blog approach we could in theory let the frontend cache small ranged-objects, too, but there's a chance of them overwhelming storage of normal small files. we could split storage though, like we do for large/small objects in the upload backends...
[18:10:13] lots of options!
[18:10:46] if only there was a single elegant software solution that handled this correctly out of the box
[18:10:54] oh wait, there is, but we can't use it :P
[18:11:11] :)
[18:13:12] mmh, if we start caching partial objects though, would a Range request for a previously cached full object be considered a cache hit? I guess not, given that Range is added to the hash
[18:13:23] right
[18:13:42] unless we pull some crazy stunt involving detecting hit/miss before deciding on which hashing behavior to use
[18:14:02] (which would involve a frontend-side request-restart in one of the two cases, in varnish4's fe/be split model)
[18:14:10] /o\
[18:15:09] basically we'd start with ignoring Range on the initial request for hashing purposes, and see if it hits (on a whole-object, which varnish can do). if it didn't hit, we'd restart the request with a reqheader flag changing VCL behavior the second time through and hashing on the Range:.
[18:15:34] sounds complicated though, it'd have to really be worth it
[18:16:07] yeah given the amount of complexity in our vcl I'd do that only in case of absolute necessity :)
[18:16:40] anyways, that approach probably still fails on the stalling crap. it would hit a half-filled whole object and stall.
[18:17:02] (unless we hash_ignore_busy, which causes redundant fetches, which isn't awesome either)
[18:17:34] I guess we could hash_ignore_busy only on the first whole object cache check, and then not do so after? but it's getting crazy-complex at that point.
[18:18:04] tomorrow morning I'll give another shot at the stalling issue and take a look at the Range patterns we get at the moment
[18:18:14] sounds like a good plan
[18:18:28] nice find on that blog article too :)
[18:18:39] I wonder if that approach works in V3 as well
[18:18:47] (hash on range and use 206 from backend)
[18:19:00] not terribly relevant at this point, but interesting to know
[18:19:22] yeah, no clue :)
[18:25:50] (I've started doubting everything today, including the storage backend)
[18:26:45] if you don't end up doubting everything routinely you're doing something wrong :)
[18:30:46] at least the shell is always true to me! :)
[18:31:21] I'd like to test the stalling issue on a real machine in prod (depooled obviously). Would that be ok?
[18:31:44] yeah
[18:31:56] be sure to depool fe+be
[18:32:02] +nginx, the whole thing
[18:32:20] will do
[18:34:11] on my labs instance I have 2G persistent storage files and the object I'm serving is 1G. It would be nice to test with decent storage capacity :)
[18:35:41] see you tomorrow o/
[18:39:24] cya :)
[19:12:22] 10Traffic, 10MediaWiki-extensions-UrlShortener, 06Operations: Strip query parameters from w.wiki domain - https://phabricator.wikimedia.org/T141170#2520208 (10Legoktm) 05Open>03Resolved Thanks! It works now :)
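Referring back to the restart idea floated at [18:15:09]: a speculative VCL 4.0 sketch of looking up the whole object first (ignoring Range for hashing) and, only on a miss, restarting the request with a flag that switches vcl_hash to per-range objects. Header names and the placeholder backend are assumptions, the second pass relies on the per-range 206 caching from the previous sketch, and as noted at [18:16:40] this still doesn't solve the stall on a half-filled whole object by itself:

```
vcl 4.0;

# Placeholder backend so the sketch compiles on its own.
backend be {
    .host = "127.0.0.1";
    .port = "3128";
}

sub vcl_recv {
    if (req.http.Range) {
        if (req.restarts == 0) {
            # First pass: whole-object lookup. Leave req.http.Range alone so a
            # whole-object hit is sliced into a 206 by varnish's own range code.
            unset req.http.X-Range-Hash;
        } else if (req.http.X-Range-Hash) {
            # Second pass: per-range lookup. Stash and hide the Range header,
            # as in the previous sketch, so a cached 206 is served as-is and
            # the header can still be forwarded to the backend on a miss.
            set req.http.X-Range = req.http.Range;
            unset req.http.Range;
        }
    }
}

sub vcl_hash {
    if (req.http.X-Range-Hash) {
        hash_data(req.http.X-Range);
    }
}

sub vcl_miss {
    if (req.http.Range && !req.http.X-Range-Hash) {
        # No whole object in cache: retry, this time keyed on the range. The
        # per-range fetch/caching from the previous sketch handles that pass.
        set req.http.X-Range-Hash = "1";
        return (restart);
    }
}
```

The extra state and the restart are exactly the kind of added complexity flagged at [18:16:07], so this would only be worth pursuing if the range stats show whole-object reuse actually matters.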
[20:20:17] 10Traffic, 06Operations: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2520524 (10Krinkle)
[20:40:31] 10Traffic, 06Operations: SSL certificate for policy.wikimedia.org - https://phabricator.wikimedia.org/T110197#2520562 (10RobH)
[21:06:12] 10Traffic, 06Operations, 06Wikipedia-iOS-App-Backlog: Wikipedia app hits loads.php on bits.wikimedia.org - https://phabricator.wikimedia.org/T132969#2215676 (10BBlack) >>! In T132969#2429719, @Fjalapeno wrote: > @Krinkle the version of the iOS app that made those requests is a legacy version - the iOS app no...
[21:18:25] 10Traffic, 06Operations, 06Wikipedia-iOS-App-Backlog: Wikipedia app hits loads.php on bits.wikimedia.org - https://phabricator.wikimedia.org/T132969#2520673 (10Mholloway) @BBlack, there's no policy on supporting un-upgraded versions of which I'm aware (but I'll add @Dbrant as product owner here for comment)....
[21:19:52] 10Traffic, 06Operations, 07Mobile, 13Patch-For-Review: Replace bits URL in Firefox app, if possible - https://phabricator.wikimedia.org/T98373#2520676 (10Krinkle)
[21:19:54] 10Traffic, 06Operations: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2520675 (10Krinkle)
[22:03:55] 10Traffic, 06Operations, 05WMF-deploy-2016-08-09_(1.28.0-wmf.14): Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#1494832 (10Krinkle) The Commons app for Android (previously by Wikimedia, now community-maintained) also uses `bits.wikimedia.org/event.gif` still. Fix pending at
10Traffic, 06Operations, 05WMF-deploy-2016-08-09_(1.28.0-wmf.14): Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2521183 (10Tbayer) >>! In T107430#2184485, @BBlack wrote: > We need to start making progress on this again and kill cruft at some point... > > bits.wikimedia.org st...