[01:02:19] Traffic, Operations, Research: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (bmansurov) @Vgutierrez, thanks for working on this task. Please let me know if I can help move the task forward.
[03:58:48] netops, Operations, ops-codfw: codfw: Delete cloud interface-range - https://phabricator.wikimedia.org/T244196 (Papaul)
[08:53:32] Traffic, Operations: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (Vgutierrez)
[09:01:34] Traffic, Operations, Wikimedia-Logstash, observability, User-fgiunchedi: Port varnishlog consumers to log to syslog / logging infra - https://phabricator.wikimedia.org/T227108 (fgiunchedi) >>! In T227108#5844341, @fgiunchedi wrote: > Had to revert in https://gerrit.wikimedia.org/r/c/operation...
[12:37:59] netops, Operations, ops-codfw: codfw: Delete cloud interface-range - https://phabricator.wikimedia.org/T244196 (ayounsi) Yep, it's fine to delete it if there are no more member interfaces.
[12:48:01] good morning! I'm going to depool ulsfo to upgrade the routers there
[12:48:09] https://gerrit.wikimedia.org/r/c/operations/dns/+/570036
[13:34:25] Traffic, Operations, Patch-For-Review: Fix acme-chief DNS validation correctly - https://phabricator.wikimedia.org/T240614 (Vgutierrez) Open→Resolved a:BBlack→Vgutierrez Solved in acme-chief 0.22, now we can set an arbitrary DNS port to validate the DNS-01 challenges on the acme-chief...
[13:41:38] Traffic, Operations: acme-chief should be able to refresh OCSP stapling response even if the renewal process fails - https://phabricator.wikimedia.org/T244232 (Vgutierrez)
[13:42:05] Traffic, Operations: acme-chief should be able to refresh OCSP stapling response even if the renewal process fails - https://phabricator.wikimedia.org/T244232 (Vgutierrez) p:Triage→Normal
[13:42:35] Traffic, Operations: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has X seconds left - https://phabricator.wikimedia.org/T243948 (Vgutierrez) Open→Resolved a:Vgutierrez After solving T240614, acme-chief has been able to renew non-canonical-redirect-3 so OCSP stapling refresh is fi...
[13:54:33] Traffic, Operations, Wikimedia-Incident: cp3050 depooled due to explosion in CPU usage and inuse sockets - https://phabricator.wikimedia.org/T241001 (ema) Open→Resolved text@esams has had no similar issues since we disabled xdebug in December, closing.
[13:59:18] Traffic, Operations: Remove debug proxies once all Varnish backends are gone - https://phabricator.wikimedia.org/T237932 (ema)
[14:04:53] Acme-chief, Traffic, Operations: acme-chief is unable to renew certificates against LE staging environment - https://phabricator.wikimedia.org/T244236 (Vgutierrez)
[14:04:56] Acme-chief, Traffic, Operations: acme-chief is unable to renew certificates against LE staging environment - https://phabricator.wikimedia.org/T244236 (Vgutierrez) p:Triage→High
[14:50:01] https://lwn.net/Articles/809333/ - "Accelerating netfilter with hardware offload"
[14:58:19] Traffic, Operations: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (ema) We now know that this is 100% traffic induced, and the culprit seems to be FortiGate. Compare the last 24 hours of FD growth: {F31547496} And ulsfo requests with UA: FortiGate durin...
[16:11:13] netops, Operations, ops-codfw: codfw: Delete cloud interface-range - https://phabricator.wikimedia.org/T244196 (Papaul) okay thanks will do that
[16:36:35] Traffic, Operations, Patch-For-Review, Performance-Team (Radar), Performance-Team-publish: Varnish HTTP response from app servers taking 160s (only 0.031s inside Apache) - https://phabricator.wikimedia.org/T181315 (Krinkle) Open→Declined We've migrated from varnish-be to ats-be (see <...
[16:47:57] Traffic, Discovery, Operations, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (ArthurPSmith) @Addshore and others - the problem has deteriorated since Saturday - see this discussion...
[20:35:25] I was digging through ATS code a bunch today, and while I was at it I took a good look at some of the experimental plugins...
[20:35:39] I don't know how stable they are, but some of them look very useful!
[20:36:28] in particular we could use the fq_pacing one to try to cap outbound tcp stream rates better for fairness (i.e. even if the client is 0.05ms away in eqiad, don't let them stream commons content at multi-gigabit rates...)
[20:36:53] and the memcached module looks interesting. the documentation is poor, but it may support all the usual operations.
[20:37:14] when I first stumbled across that one, I assumed it was to use memcached as storage for the http cache....
[20:37:48] but it's actually a memcached server protocol implementation, for the existing http cache. meaning you can connect to it with anything that talks to memcache and use it to query and delete http cache keys, etc...
[20:38:49] there's some cache keying ones that sound similarly-useful to x-key stuff, there's a collapsed_forwarding one to fix up the loophole in read_while_writer waiting on headers, etc..
[20:38:53] lots of interesting things laying around in there
[22:27:43] netops, Operations, cloud-services-team (Kanban): WMCS: cleanup network allocations - https://phabricator.wikimedia.org/T240670 (ayounsi) Open→Resolved I think everything is done here?
[22:32:33] netops, Operations, SRE-tools, Goal, and 2 others: Configuration management for network operations - https://phabricator.wikimedia.org/T228388 (ayounsi)
[22:34:07] netops, Operations, SRE-tools, Goal, and 2 others: Configuration management for network operations - https://phabricator.wikimedia.org/T228388 (ayounsi) Everything here is done. Doc is there https://wikitech.wikimedia.org/wiki/Homer and has been tested by other SREs than Riccardo or me. Future d...
[22:34:18] netops, Operations, SRE-tools, Goal, and 2 others: Configuration management for network operations - https://phabricator.wikimedia.org/T228388 (ayounsi) Open→Resolved