[10:28:15] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (ema) I've downgraded Varnish to version 5.2.1-1wm1 on cp3054 and had to revert due to an issue with varnishlog. After t...
[12:38:27] Traffic, Operations, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (ema) >>! In T264398#6675887, @ema wrote: > It's likely that 5.2.1 is affected by T264074, we need to backport [[ https://gerrit.wikimedia.org...
[14:55:52] https://blog.cloudflare.com/oblivious-dns/
[14:56:29] basically adding a proxy layer and request-encryption inside of DoH, so that no single 3rd party sees both the client IP and the name being queried
[14:57:33] either way we still want to do basic DoH first, then look at things like this :)
[15:15:45] bblack: geodns can't be done this way, right?
[15:16:47] unless there are mechanisms to forward some sort of geo-based information to the resolver without disclosing the client IP, that is
[15:26:16] well I think CF's assumption/hope is that proxies will be close-enough to approximate
[15:28:07] in any case, we should someday fix this all up on our end as well with alt-svc
[15:28:51] I'd love to put some concerted effort/time in our official plans to make user->edge geotargeting better. There's multiple viable ideas that compound on each other to improve the situation there.
[15:30:25] one is to make the existing geodns stuff more-pluggable/flexible/smart than it is today, which enables things like ASN-level overrides and others, and datacenter load weighting and smooth ramp-in/out
[15:31:35] and another is to use generic client-side JS or NEL (or other similar mechanisms) to gather our own sampled RUM from $user_networks->$all_edges and use it to override the geographic-based defaults for networks where we get decent data
[15:32:13] and another is to also pull this same data and decision making into the FE caches to enable them to make the same IP-based decisions DNS does.
[15:32:34] since they can always see the client IP, while DNS can only sometimes see it. and redirect with Alt-Svc
[15:33:14] all of those efforts can proceed in parallel with each other really, but their results have synergies
[15:36:49] another interesting idea is to have a secondary mapping for logged-in sessions
[15:37:52] where we take our geo-estimates or our acquired RUM data, and append the edge->core latency to the calculation as well. We can't/shouldn't use this in geodns, but for Alt-Svc we could decide that once they have a Session and become uncacheable, they should be using a different DC.
[15:38:31] (e.g. some network in Australia may get better cache hit latency to eqsin than ulsfo, but an uncacheable session goes faster via ulsfo->core)
[15:40:07] there's a whole world of things we could be doing better in this space, but we're preoccupied with all our other problems :)
[15:43:11] this to me looks like reinventing single-hop Tor :) (albeit with differences in the encryption between the "nodes")
[15:43:21] CF says the reason for doing this is "reliability" and "logging", while the draft says "logging"
[15:43:52] "clients cannot send DNS queries and receive answers from servers without revealing their local IP address, and thus information about the identity or location of the client." -> in this case, a simpler solution that currently works for us is that we don't log, and hence cannot make the correlation
[15:44:10] I am not dismissing this but it's not a problem unless you make it one :)
[15:44:59] Domains, Traffic, Operations: Okapi Domains - https://phabricator.wikimedia.org/T269686 (RBrounley_WMF)
[15:45:10] if the concern is that an on-path observer can observe the DoH encrypted traffic and try to infer the domain from it, that's another thing. but I also feel that if you are introducing a proxy at this stage, might as well use a VPN to take care of the SNI and the IP address as well
[15:46:22] well, to take their side for a moment, the crux of their argument for ODoH necessity is:
[15:46:28] "Cloudflare is committed to end-user privacy. Users of our public DNS resolver service are protected by a strong, audited privacy policy. However, for some, trusting Cloudflare with sensitive query information is a barrier to adoption, even with such a strong privacy policy."
[15:46:48] basically "some people will never trust us, so this is a way for them to use us without having to trust us"
[15:47:05] right, I think they are trying to alleviate concerns people have around their service specifically
[15:48:09] honestly, SNI should never have existed in the first place
[15:48:55] if nobody had bothered to specify and implement SNI, we would've just run out of IPv4 faster and we'd be much further along the v6 adoption curve, and v6 has enough server-side IPs that it doesn't have the problem SNI addresses.
[15:49:42] pragmatic hacks usually win, though :)
[15:50:52] the less charitable way to state that is "short-term thinking wins arguments"
[15:52:09] that's a valid restatement, and also in practice pretty true
[15:53:01] but other than the no-SNI/faster-IPv6-adoption point, there is something else that really bothers me about this entire thing. it's a problem they created and now they are adding another layer on top to solve it
[15:53:12] yeah
[15:53:24] I am not trying to criticize CF here but they knew that the DoH thing would centralize requests. that wasn't an unknown for them
[15:53:42] they could have simply said: we won't log _anything_, not even for 24 hours. and then this creates a culture where no one logs anything, unless they really need to
[15:53:45] much like ECH is another layer to fix the SNI problem we've created, which leads to the same result (IP-based filtering for censorship) in the end, just via a longer and more complex historical route
[15:53:46] I realize that it's easier said than done
[15:54:11] yeah
[15:54:29] sukhe: they can't legally say that
[15:54:51] although to be fair, the scenarios are slightly different. in the no-SNI alternate reality, every site would have a distinct IPv6 address and IP censoring would be "easy" for all sites
[15:54:54] paravoid: yeah I know it's tricky. it's mostly wishful thinking on my part
[15:55:11] law enforcement & court system can just ask them to log, and they can contest it in court, but if they lose they have to do it
[15:55:18] in the SNI->ECH world, we'll still have the case that many common sites will share cloud service IPs and be hard to disambiguate, which helps prevent the censorship via fate-sharing
[15:56:24] and so in the no-SNI alternate reality, we'd still probably end up inventing some way to use some crypto key to share a proxy IP between many sites, not unlike ECH in split mode.
[15:58:00] paravoid: yeah, that's one of the upsides of the ODoH mechanism I guess. If it's a new way of doing things that makes logging and analysis much harder even under court order.
[15:58:19] but they'd probably just compel both the proxy and the DoH server and get logs from both and correlate on time and packet size, etc
[15:58:55] maybe if they're in different jurisdictions from each other, it helps more
[15:59:38] right, in the case of Tor, one of the reasons that timing attacks become more difficult is that the guard (first) and the exit (last) may be, and mostly are, in different jurisdictions
[15:59:39] someone wise said that "it looks like reinventing single-hop Tor" :P
[15:59:56] bblack: I think in the case of CF, being an American company, US law applies
[16:00:22] regardless of where the servers are
[16:00:38] yeah, and the idea that perf and geodns are preserved by using "close" proxies doesn't jibe well with hoping for jurisdictional separation in many cases.
[16:04:34] but back to first principles, I think pushing for decentralization is generally a win. It's much easier for all of these things to be compromised when only a few large companies' cooperation is needed to compromise most of everything.
[16:09:43] there certainly is value in the criticism of DoH that it centralizes the requests that were otherwise decentralized to various ISPs around the world. but at least for a given individual they are centralized to their ISPs and that's where the problem lies in many cases :)
[16:10:01] I think it will be good if Firefox and others have multiple DoH providers they support and that they can randomly cycle between for requests
[16:10:48] or at least, that there are multiple providers other than CF, ones that can match the performance 1.1.1.1 provides, which is pretty great. at the end of the day, unless users are really looking for privacy and all that, they will default to whatever works the fastest for them
[16:12:26] https://blog.cloudflare.com/encrypted-client-hello/ another blog post from them today
[16:16:54] yeah it's a pretty decent primer on all related things!
[16:26:47] the SVCB stuff is mentioned in passing too, which is probably going to become a big thing
[16:27:18] https://datatracker.ietf.org/doc/draft-ietf-dnsop-svcb-https/
[16:27:41] basically a new DNS query type intended for widespread adoption for browser traffic
[16:28:32] it's kinda like SRV on steroids + ECH info
[16:28:45] (and future flexibility)
[16:33:04] thanks! (will read the SVCB stuff)
[16:44:40] Domains, Traffic, Okapi, Operations: Okapi Domains - https://phabricator.wikimedia.org/T269686 (Nintendofan885)
[17:49:51] Domains, Traffic, Okapi, Operations: Okapi Domains - https://phabricator.wikimedia.org/T269686 (Reedy) I'm presuming these aren't going to be MediaWiki wikis underneath etc? Where do these domains want to point? While they can be "parked", there's not a great deal to do until there's something to...
[20:56:41] hi, quick question wrt https://gerrit.wikimedia.org/r/c/operations/dns/+/643983 - looks like the MaxMind DB has been updated today, so I assume there's nothing more left other than waiting for cron, right?
[21:07:26] yeah :)
[21:07:52] I believe we now poll files from MaxMind daily as well?
[21:08:11] https://github.com/wikimedia/puppet/blob/production/modules/geoip/manifests/data/maxmind.pp#L78
[21:08:13] looks like it
[21:08:36] I might be still awake when that runs :P
[21:14:09] hilariously, back in 2018 the problem was exactly reversed - RES was geolocating wrongly and SJC was correct, so that was fixed, then somehow SJC got all messed up... in any case, hopefully it should be good from now on
[21:14:47] I'll submit a patch to remove that old 2018 workaround once I can verify that things seem OK after the update
[21:15:09] thanks :)
[21:15:25] geoip mapping is ... there's a lot of things on the internet where you wonder how it ever works at all
[21:15:29] and wow is geoip mapping on that list
[21:16:26] yeah :)