[01:47:35] Traffic, MediaWiki-Cache, Operations, Page Content Service, and 3 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (AntiCompositeNumber) https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Edits_invisible_or_variably_vis...
[04:22:14] Krenair, bblack a simple reload isn't enough to trigger a TLS cert reload with ATS
[04:24:10] you need to touch ssl_multicert.config and then issue the service reload
[04:27:43] and yes.. it looks like that if clause at the end of tls_material.pp is buggy and $do_ocsp shouldn't be there
[06:52:18] Traffic, MediaWiki-Cache, Operations, Page Content Service, and 3 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (ema) >>! In T249325#6060912, @AntiCompositeNumber wrote: > https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(tech...
[11:58:17] Traffic, Operations: Implement TTL cap for ats-be - https://phabricator.wikimedia.org/T249627 (ema) I was under the impression that ATS had no config setting to impose a TTL cap. The reason for this is that the [[https://docs.trafficserver.apache.org/en/8.0.x/admin-guide/files/records.config.en.html#prox...
[12:25:04] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589291
[12:34:43] Krenair: I think we can refactor that even more
[12:34:52] cause we don't use nginx anymore
[12:34:58] hmm good point
[12:35:36] so we can remove that if branch completely, keep the Exec and just trigger it using the puppet_rsc parameter of acme_chief::cert
[12:35:50] I can do that if you want
[12:35:55] just let me know
[12:51:33] Traffic, Operations: Servers freezing across the caching cluster (November 2019) - https://phabricator.wikimedia.org/T238305 (Vgutierrez)
[12:53:06] Traffic, Operations: Servers freezing across the caching cluster (November 2019) - https://phabricator.wikimedia.org/T238305 (Vgutierrez) cp1087 crashed a few minutes ago (12:30) showing the same symptoms, running buster and with firmware upgraded according to T243167
[12:59:28] BTW it looks like we could close T170567
[12:59:29] T170567: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567
[12:59:32] \o/
[13:00:05] yay!
[13:00:18] nice!
[13:02:33] Traffic, Operations, Goal, Patch-For-Review, Performance-Team (Radar): Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (Vgutierrez) Open→Resolved TLSv1.3 is now available on both text and upload clusters :)
[13:02:34] HTTPS, Traffic, Operations, Security: Investigate our mitigation strategy for HTTPS response length attacks - https://phabricator.wikimedia.org/T92298 (Vgutierrez)
[13:14:40] vgutierrez: \m/
[13:32:08] nice work :)
[13:32:44] just narrowing to the most-recent 30m in turnilo, the TLS version split looks like ~80/20 (1.3/1.2)
[13:33:16] yep
[13:33:45] we should ping our labs users and recommend them to upgrade their bots to something able to talk TLSv1.3
[13:34:06] I'll wait a day or two to gather some data and early next week I'll do some TLSv1.3 ambassador work
[13:34:17] sounds like a plan :)
[13:34:58] "but I'm using some $veryoldversionofalanguage, I can't possibly do that!"
[13:37:36] so (again just 30m of data) - the top ISP is "unknown" (because private address space from our own stuff), and the top few IPs are labs IPs
[13:37:41] (172.16.x.x)
[13:38:00] but also interesting if you extend the list of those unknowns out a bit, there's lots of production internal IPs too
[13:38:10] for services like wdqs, ores, maps, etc
[13:38:51] so yeah, definitely some internal research+advocacy to do )
[13:39:12] in theory, we should also move their traffic away from the cache layer, I think?
[13:39:44] well, yeah, that would be ideal, but it's not simple to do it right
[13:39:56] once we have an internal routing layer that makes sense for their easy use, yes
[13:40:03] bblack: yup.. I've been filtering by ^172. :)
[13:40:08] yeah
[13:40:37] we should get some automagic improvement as we get rid of stretch systems
[13:40:58] yeah true
[14:49:41] XioNoX: is this librenms email about `cr1-codfw Primary outbound port utilisation over 80%` a false alarm?
[14:50:58] it must be, the utilization numbers are nonsense
[15:19:48] cdanis: either monitoring glitch or a very brief spike of traffic causing the device to fail the polling
[15:20:01] I'd say the former though
[15:20:54] seems like a glitch, it was for so many interfaces
[15:58:46] Traffic, MediaWiki-Cache, Operations, Page Content Service, and 3 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (ema) @Urbanecm, @AntiCompositeNumber: esams has now been running with the new system for the past few hours, the sit...
[15:59:37] ema: <3
[16:00:28] cdanis: fingers crossed! :)
[19:07:19] netops, Operations, homer: Homer: Netbox driven switch interfaces - https://phabricator.wikimedia.org/T250429 (ayounsi) p:Triage→Medium
[19:16:37] netops, Operations, homer, Patch-For-Review: Homer: Netbox driven switch interfaces - https://phabricator.wikimedia.org/T250429 (ayounsi)
[19:22:54] netops, Operations, homer, Patch-For-Review: Homer: Netbox driven switch interfaces - https://phabricator.wikimedia.org/T250429 (ayounsi)
[20:14:57] Traffic, Core Platform Team, MediaWiki-Cache, Operations, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Krinkle) a:Krinkle
[20:46:46] vgutierrez, I've updated https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589291
[21:23:19] Traffic, MediaWiki-Cache, Operations, serviceops, and 3 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (daniel)
[23:15:28] HTTPS, Traffic, Operations, Toolforge, cloud-services-team (Kanban): Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367 (bd808) We have left the POST loophole open for more than a year. Now that we have introduced [[https://wikitech.wikimedia.or...
[23:20:26] HTTPS, Traffic, Cloud-VPS, Operations, cloud-services-team (Kanban): Set "https_upgrade" configuration flag for domainproxy to enforce HTTPS upgrade for GET|HEAD requests - https://phabricator.wikimedia.org/T120486 (bd808)
[23:20:59] HTTPS, Cloud-VPS, cloud-services-team (Kanban): Set "https_upgrade" configuration flag for domainproxy to enforce HTTPS upgrade for GET|HEAD requests - https://phabricator.wikimedia.org/T120486 (bd808)
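
Note on the `^172.` filter mentioned at 13:40:03: a prefix match on "172." also catches public addresses outside the private 172.16.0.0/12 block. The sketch below shows one way to separate the two with Python's standard `ipaddress` module. It assumes the labs addresses fall inside 172.16.0.0/12 (the log only shows "172.16.x.x"); the function name `classify` and its labels are hypothetical and not part of any tooling discussed above.

```python
import ipaddress

# Assumption: labs/cloud clients use addresses inside the RFC 1918
# block 172.16.0.0/12. The log itself only shows "172.16.x.x".
LABS_NET = ipaddress.ip_network("172.16.0.0/12")


def classify(addr: str) -> str:
    """Return a rough label for a client address (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    if ip in LABS_NET:
        return "labs"
    if ip.is_private:
        # Other private space, e.g. 10.0.0.0/8.
        return "private"
    return "public"


if __name__ == "__main__":
    for sample in ("172.16.4.2", "172.32.0.1", "10.64.0.1"):
        print(sample, classify(sample))
```

An address such as 172.32.0.1 matches the `^172.` pattern but is classified as public here, which is the case a plain prefix filter would get wrong.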