[09:20:47] Traffic, Operations, RESTBase: envoy overwrites the server header - https://phabricator.wikimedia.org/T238050 (Joe)
[10:04:46] Traffic, Operations, Performance-Team, Patch-For-Review: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (ema) As a way to identify more specifically where the TTFB regression comes from, in particular to understand precisely how much ats-be co...
[10:18:58] Traffic, Operations, Performance-Team, Patch-For-Review: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (Gilles) Could you generate a separate stats table for misses and passthroughs?
[10:25:07] Traffic, Operations, Patch-For-Review: Traffic Server packaging and initial puppetization - https://phabricator.wikimedia.org/T200178 (hashar) > pristine-tar: delta is version 3, newer than maximum supported version 2 @ema the CI debian-glue jobs are now running on Buster instances and thus come wit...
[10:25:24] Traffic, Operations, Performance-Team, Patch-For-Review: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (ema) >>! In T238494#5717652, @Gilles wrote: > Could you generate a separate stats table for misses and passthroughs? Certainly. Non-hits...
[10:27:22] Traffic, Operations, Performance-Team, Patch-For-Review: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (Gilles) I meant specifically misses (ATS/Varnish did a lookup and didn't find the object) vs passthroughs (ATS/Varnish merely acted as a pro...
[10:38:35] Traffic, Operations, Performance-Team, Patch-For-Review: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (ema) >>! In T238494#5717657, @Gilles wrote: > I meant specifically misses (ATS/Varnish did a lookup and didn't find the object) vs passthrou...
[10:41:51] Traffic, Operations, Performance-Team, Patch-For-Review: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (Gilles) Yes, from a cache application perspective they are different tasks and therefore the issues affecting each could have different ca...
[13:23:46] Traffic, DNS, Operations, Patch-For-Review: Internal DNS resolver responds with NXDOMAIN for localhost AAAA - https://phabricator.wikimedia.org/T125170 (BBlack) Open→Resolved a:BBlack I'm not sure how long it's been fixed in our infra, but it definitely works correctly now in our new...
[13:52:01] Traffic, Operations: Implement machine-local forwarding DNS caches - https://phabricator.wikimedia.org/T171498 (BBlack) In these past couple of weeks we've had a real about-face on this issue, and I think there's a pretty strong consensus and rationale to pursue some kind of host-level caching, but there...
[14:00:49] Traffic, Operations: Decom LVS recdns - https://phabricator.wikimedia.org/T239993 (BBlack)
[14:01:11] Traffic, Operations, Pybal: DNS recursors TCP retransmits - https://phabricator.wikimedia.org/T211131 (BBlack) Open→Declined These are still present AFAIK, and we're fairly certain it's just due to pybal healthchecks using blank/broken TCP connections to monitor them. That will be cleaned up...
[14:02:46] Traffic, Operations: Make authdns-update compatible with local emergency changes - https://phabricator.wikimedia.org/T219400 (BBlack) Sorry I hadn't remembered we had this existing ticket. Will merge into the other newer one since it has patches already and some deeper context, and copy the main text over.
[14:03:53] Traffic, Operations: Make DNS operations resilient against predictable failures - https://phabricator.wikimedia.org/T239711 (BBlack)
[14:03:55] Traffic, Operations: Make authdns-update compatible with local emergency changes - https://phabricator.wikimedia.org/T219400 (BBlack)
[14:04:29] Traffic, Operations: Make DNS operations resilient against predictable failures - https://phabricator.wikimedia.org/T239711 (BBlack) Thoughts from the main text of the merged ticket: ------------ We should improve our current [1] support of deploying an emergency DNS change when other dependent services...
[14:06:26] Traffic, DNS, Operations, SRE-tools: Include zone+subnet checks for DNS validation - https://phabricator.wikimedia.org/T238727 (BBlack) Open→Declined Declined in favor of netbox integration ( T233183 ? ) making this problem go away.
[14:08:28] Traffic, DNS, Operations, Core Platform Team Legacy (Watching / External), Services (watching): icinga alerts on nodejs services when a recdns server is depooled - https://phabricator.wikimedia.org/T162818 (BBlack)
[14:09:03] Traffic, DNS, Operations, serviceops, and 2 others: icinga alerts on nodejs services when a recdns server is depooled - https://phabricator.wikimedia.org/T162818 (BBlack)
[14:11:44] Traffic, DNS, Operations, serviceops, and 2 others: nodejs / restbase services (mobileapps, aqs, recommendation-api, etc?) fail persistently after short windows of DNS unavailability - https://phabricator.wikimedia.org/T162818 (BBlack)
[14:13:28] Traffic, DNS, Operations, serviceops, and 2 others: nodejs / restbase services (mobileapps, aqs, recommendation-api, etc?) fail persistently after short windows of DNS unavailability - https://phabricator.wikimedia.org/T162818 (BBlack) While we'll work on improvements that make this less-likely i...
[14:16:09] Traffic, DNS, Operations: Consider DNSSec - https://phabricator.wikimedia.org/T26413 (BBlack) Since we haven't updated this in two years, I figured I should post again: * DNSSEC is still awful * DNSSEC is still basically all the world has to solve certain problems, for better or worse. * DNSSEC has...
[14:21:01] Traffic, Operations: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605 (BBlack)
[14:21:40] Traffic, Operations: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605 (BBlack) a:faidon→None
[14:21:55] sorry for the spam, trying to get some global cleanup done on everything in our "DNS Infra" column :)
[14:22:39] nice!
[14:23:12] Traffic, Operations: Lower geodns TTLs from 600 (10min) to 300 (5min) - https://phabricator.wikimedia.org/T140365 (BBlack)
[14:23:14] Traffic, Operations: Implement GeoDNS smooth repooling in gdnsd - https://phabricator.wikimedia.org/T228678 (BBlack)
[14:23:17] Traffic, Operations: Set up LVS for current AuthDNS - https://phabricator.wikimedia.org/T101525 (BBlack)
[14:23:28] * volans|off loved the DNSSEC summary fwiw
[14:23:30] Traffic, Operations: Lower geodns TTLs from 600 (10min) to 300 (5min) - https://phabricator.wikimedia.org/T140365 (BBlack) This is still something we want to pursue, but we really need to get past the smooth repooling issue first, so I've added that as a subtask (consider it blocking this one).
[14:24:17] hello volans|off o/
[14:24:23] are you off? :)
[14:24:31] Traffic, Operations: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605 (BBlack)
[14:24:37] Traffic, netops, Operations, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (BBlack)
[14:25:02] I'll leave it to you to tell :-P
[14:26:10] * ema takes it as a yes and wishes volans a nice weekend!
[14:27:49] Traffic, Operations: Implement DNS-over-TLS for AuthDNS - https://phabricator.wikimedia.org/T239994 (BBlack) p:Triage→Normal
[14:29:13] thanks, you too!
[14:30:32] I feel like while our phab board in general is still very much: https://i.pinimg.com/originals/54/7e/e7/547ee7604ee4cb023b3c20930b1e6914.jpg
[14:30:50] I have now fully mopped and cleaned the least-important square meter in one corner of the room :P
[14:32:06] oh man I can only imagine how pleasant it must be to help someone move all those records to a new flat :)
[16:45:55] Traffic, Operations, Performance-Team: 15% response start regression starting around 2019-11-11 - https://phabricator.wikimedia.org/T238494 (ema) We can now distinguish between hit, miss, and pass in text@esams ATS too. An important caveat when looking at these numbers is that Varnish supports hit-f...
[16:56:45] Traffic, Operations, Patch-For-Review: Decom LVS recdns - https://phabricator.wikimedia.org/T239993 (BBlack) In a sample I just took across all recdns for a little over 15 minutes of sniffer time looking for requests to the legacy LVS-based recdns IPs: * ulsfo, eqsin, and esams had no traffic to them...
[18:16:07] Traffic, Operations, Patch-For-Review: Decom LVS recdns - https://phabricator.wikimedia.org/T239993 (BBlack) Dug into the odd cases from `install2002` and `kraz` - the common pattern here is that there are some daemons in the world which both (a) parse `/etc/resolv.conf` for themselves because they u...
[20:41:57] Traffic, Gerrit, Operations, Phabricator, Security-Team: Add gerrit.wikimedia.org to the Phabricator CSP - https://phabricator.wikimedia.org/T218308 (Dzahn)
[23:19:31] Traffic, Operations, Patch-For-Review: Decom LVS recdns - https://phabricator.wikimedia.org/T239993 (colewhite) p:Triage→Normal
[23:50:02] Traffic, ContentSecurityPolicy, Gerrit, Operations, and 2 others: Add gerrit.wikimedia.org to the Phabricator CSP - https://phabricator.wikimedia.org/T218308 (Bawolff)