[01:13:56] Traffic, Operations: ats-tls-restart failed on cp4027 - https://phabricator.wikimedia.org/T237425 (Vgutierrez) this is a known issue caused by update-ocsp-all
[03:24:01] Traffic, Operations, Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (Vgutierrez)
[04:21:22] Traffic, Operations: Create a second text-lb IP address for test purposes - https://phabricator.wikimedia.org/T237492 (BBlack) p:Triage→Normal
[05:09:14] Traffic, Operations, observability: Add ats-tls status and availability graphs to frontend-traffic - https://phabricator.wikimedia.org/T236482 (Vgutierrez) Added ats-tls status panel: https://grafana.wikimedia.org/d/000000479/frontend-traffic?refresh=1m&panelId=12&fullscreen&orgId=1&var-site=All&var-...
[09:18:13] Traffic, CX-cxserver, Citoid, Operations, and 4 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001 (hashar)
[15:52:22] hm am following some instructions at https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service
[15:52:24] i think some are out of date
[15:52:29] Confd Define service conftool-data/service/services.yaml:
[15:52:32] that file doesn't exist
[15:52:42] ah yes
[15:52:53] services are no longer a first-class entity in conftool
[15:53:30] should I just remove that step then?
[15:53:32] you should be able to just skip that step
[15:53:35] yes please
[15:53:38] k danke
[15:54:37] https://wikitech.wikimedia.org/w/index.php?title=LVS&type=revision&diff=1843937&oldid=1836453
[16:18:38] some interesting stats/research on dns ttl issues relevant to us (from lobsters): https://00f.net/2019/11/03/stop-using-low-dns-ttls/
[16:19:38] and I'm not saying it's necessarily good conclusions either, I think I take issue with a lot of it
[16:19:55] like a recommendation at the bottom that local users should extend minimum TTLs on their own
[16:19:58] "If you use a local DNS cache such as dnscrypt-proxy that allows minimum TTLs to be set, use that feature. This is okay. Nothing bad will happen. Set that minimum TTL to something between 40 minutes (2400 seconds) and 1 hour. This is a perfectly reasonable range."
[16:20:15] like, we totally don't want users doing their own 40-minute caching of our intentional 10-minute TTLs :P
[16:20:25] but still, lots of interesting data and thinking in there to read through
[16:24:53] lol "nothing bad will happen"
[16:25:43] bblack: I saw that one during RIPE, interesting as well, but not much we can change if we want quite site depool https://ripe79.ripe.net/archives/video/184/
[16:35:55] Traffic, netops, Operations, observability: Network port utilization alerts should be paging - https://phabricator.wikimedia.org/T224888 (ayounsi)
[22:37:22] netops, Operations, observability: Determine & implement near-term method for escalating network alerts - https://phabricator.wikimedia.org/T237587 (herron) p:Triage→Normal
[22:43:17] netops, Operations, observability: Determine & implement near-term method for escalating network alerts - https://phabricator.wikimedia.org/T237587 (herron) In terms of “what” should be escalated, so far we discussed * Fastnetmon “Potential DDOS” * Interface saturation What else is in scope here?...