[05:50:30] Domains, Traffic, Operations, Patch-For-Review: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (HakanIST) Nameservers list doesn't look updated. @CRoslof could you check?
[08:10:48] https://scotthelme.co.uk/introducing-another-free-ca-as-an-alternative-to-lets-encrypt/
[08:43:58] XioNoX: nice, the other ca issuing free certs is https://www.buypass.com/ssl/products/acme apparently
[08:51:44] Traffic, Operations, Technical-blog-posts: 2nd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T266857 (ema) >>! In T266857#6641846, @srodlund wrote: > @ema do you have any preference for a featured image? Nope! Very unimaginativel...
[09:06:29] Traffic, Operations, Technical-blog-posts: 2nd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T266857 (Aklapper) https://commons.wikimedia.org/wiki/File:NCDN_-_CDN.png is the closest I could find (but a bit boring I guess)
[09:58:15] Traffic, Operations, Patch-For-Review: Decom LVS recdns - https://phabricator.wikimedia.org/T239993 (Volans) @BBlack FYI the step: > Merge and deploy the DNS patch https://gerrit.wikimedia.org/r/#/c/operations/dns/+/556230/ (removes the last comment lines noting that these IPs are still in use) Has...
[10:05:23] netops, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (ayounsi) I think we can remove: Everything that we remove from T231339#6612105 (which mean `net_cidr_src`, `net_cidr_dst` as...
[10:20:40] Traffic, Operations, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (Gilles) I pulled the November data for the experiment, for reference: | Host | hit-front rate 2020-11-01 -> 2020-11-23 | | cp4027 | 67.23% |...
[11:23:03] XioNoX, ema: interesting... what about their root certificates? are those trusted by somebody?
[11:23:59] vgutierrez: even on old android for what I read
[11:25:08] sadly buypass.com doesn't support wilcard certs issued over ACME
[11:25:15] even wildcard
[11:25:38] zerossl.com does though
[11:25:50] XioNoX: yup.. that sounds like a cross signed cert
[11:26:08] like the one that LE got at the beginning and that's expiring at the beginning of 2021
[11:26:20] * vgutierrez sending a vacation request for January 2021...
[12:10:50] and finally after having migrated all to netbox we're down to more manageable violations in the dns zone validator:
[12:10:53] RESULT: 0 Errors, 126 Warnings, 0 Ignored violations, 0 Ignored lines
[12:13:53] volans: \o/
[12:14:34] ema: hey, for when you have a bit of time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/642587
[12:18:35] nice, thanks Amir1. Please include `Hosts: cp3050.esams.wmnet` (or any other hostname) in the commit log and add "check experimental" as a gerrit comment to double-check that this is a noop
[12:18:57] oooh nice
[12:18:57] sure
[12:21:10] ema: noop https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler-test/626/console
[12:22:13] Amir1: excellent, thank you!
Will merge later today
[12:22:19] coool
[12:38:12] Traffic, DNS, Operations, netbox, cloud-services-team (Kanban): Move some of wikimediacloud.org 185.15.56.0/23 to Netbox - https://phabricator.wikimedia.org/T268621 (ayounsi) p:Triage→Low
[12:55:56] Traffic, DNS, Operations, netbox, cloud-services-team (Kanban): Move some of wikimediacloud.org 185.15.56.0/23 to Netbox - https://phabricator.wikimedia.org/T268621 (ayounsi) Note that 185.15.56.0/24 is delegated to designate directly from the RIPE, so the PTR is out of scope here.
[14:14:57] Traffic, Operations: ATS-BE Lua mitigations for cacheable responses w/ Set-Cookie seemingly not working - https://phabricator.wikimedia.org/T264378 (ema) Open→Resolved a:ema >>! In T264378#6520360, @ema wrote: > > 1) the history API responds with `Cache-Control: max-ag...
[14:59:08] Traffic, Operations, Technical-blog-posts: 2nd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T266857 (srodlund) For the featured image, we typically use a photo (though there have been some exceptions). I decided on this image of...
[15:18:46] Traffic, Operations: Broken package state on cp4032 - https://phabricator.wikimedia.org/T268243 (ema) Open→Resolved >>! In T268243#6644258, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log/iMyH-XUBpU87LSFJXZ86} [2020-11-24T09:13:...
[15:27:23] volans: congrats
[15:28:20] chaomodus: congrats too
[15:28:23] 🎉
[15:28:35] that was an epic journey to a nice win :)
[15:29:32] thanks, was a team effort :) bblack, there are some leftovers that we need to decide if managing via netbox or not, no strong opinions on my side
[15:30:00] yeah
[15:30:19] they are listed in: https://phabricator.wikimedia.org/T258729#6644509
[15:30:33] the svc ones are in discussion in -serviceops, I'll open a separate task
[15:30:37] there are pro/cons
[15:30:57] wikimedia.org-global contains only nsa.w.o
[16:08:56] nsa is basically the same as ns[012]
[16:09:17] whatever we do with those three, we should do with it
[16:13:11] what happened to TCP Fast Open? :)
[16:13:33] we seem to be getting just a very few TFO requests, and I can't even find the option under chrome://flags/
[16:13:57] has it just quietly disappeared?
[16:14:28] I looked at it when I thought of enabling it in wikidough. my understanding was that neither Chrome nor Firefox support it any more, so I didn't enable it
[16:15:35] my vanilla FF config has network.tcp.tcp_fastopen_enable to false
[16:16:35] sukhe: fascinating, yeah we seem to be getting about 0.0something TFO requests per second https://grafana-rw.wikimedia.org/d/000000257/tcp-fast-open?viewPanel=1&orgId=1
[16:17:05] that's node_netstat_TcpExt_TCPFastOpenPassive, which should be the right counter I think
[16:17:19] use case replaced by http/2?
[16:19:29] cdanis: probably a bit of that and a bit of middle-boxes dropping TCP packets with "unknown" options
[16:19:34] yeah
[16:19:48] there don't have to be all that many of those for Chrome to decide it isn't worth keeping the code around
[16:20:35] yeah it's kinda sad.
TFO really is an improvement if it works
[16:20:45] (even with h/2 and/or TLS1.3)
[16:21:26] we should probably also drop the setting from ats-tls, no point in having the kernel going through tcp_try_fastopen() all the time
[16:21:31] the FF referenced bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1398201 ["Based on https://bugzilla.mozilla.org/show_bug.cgi?id=1393327 and an interaction some network devices are having with TFO and TLS 1.3, we are going to pref off TFO for 57. It will be back :)"]
[16:21:37] but it never did!
[16:21:44] I'm asking a Chrome dev friend of mine if she knows offhand
[16:21:47] probably the other reason nobody cares to push to fix TFO in the real world, is that it's obviated/unnecessary in the QUIC/H3 world we're probably headed for
[16:24:09] ema: tcp_try_fastopen is for outbound I think?
[16:24:39] oh no, I read it wrong
[16:25:59] bblack: it's inbound I think
[16:26:41] yeah, it looks at the SYN
[16:28:04] I wonder if TFO is getting used in general elsewhere in our infra
[16:28:14] not that the function is called that often, just ~4K times per second, but if TFO is gone what's the point? :)
[16:29:07] looks like envoy is configured to accept TFO in puppet
[16:29:27] I wonder if it's actually working for our conns to it (or inter-service)
[16:29:44] although, unless they've put together a solution for a shared cookie, probably not
[16:30:38] (because LVS randomization to N hosts with different cookies)
[16:43:39] yeah, not very useful there
[16:44:28] probably even detrimental, as we'd pretty much always send the wrong cookie :)
[16:46:01] based on the sysctls, I think our client side will always try, right?
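(editor's note on "the sysctls" at 16:46:01) A minimal sketch, assuming the setting in question is the standard Linux `net.ipv4.tcp_fastopen` bitmask as described in the kernel's ip-sysctl documentation. It decodes a value into the client-side and server-side behaviours it enables. The function names and the example value are illustrative only; the log does not say what value these hosts actually use.

```python
from typing import List

# Bit meanings of net.ipv4.tcp_fastopen, paraphrased from the Linux kernel
# ip-sysctl documentation. The dict and function names are hypothetical.
TFO_FLAGS = {
    0x1: "client: enable sending data in the opening SYN",
    0x2: "server: accept data in SYN before the handshake finishes",
    0x4: "client: send data in SYN regardless of cookie availability",
    0x200: "server: accept data in SYN without any cookie option",
    0x400: "server: enable TFO on all listeners by default",
}


def decode_tcp_fastopen(value: int) -> List[str]:
    """Return the descriptions of every flag bit set in `value`."""
    return [desc for bit, desc in sorted(TFO_FLAGS.items()) if value & bit]


def read_tcp_fastopen(path: str = "/proc/sys/net/ipv4/tcp_fastopen") -> int:
    """Read the current sysctl value from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        current = read_tcp_fastopen()
    except OSError:
        current = 0x3  # assumed example value, used when procfs is unavailable
    for line in decode_tcp_fastopen(current):
        print(line)
```

If bit 0x1 is set the kernel's client side is enabled, and if bit 0x2 is set the server side is. Whether a given application actually attempts TFO is a separate question that this sketch does not answer.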
[16:46:06] I don't think that needs app-level config even
[16:46:57] we had wanted a shared cookie solution for our public-facing TFO as well, but without broad external adoption it wasn't worth working on (and in any case, source hashing fixes it most of the time)
[16:47:41] it's harder than it sounds, since it's a shared secret that needs occasional rotation. we had a ticket somewhere once (probably still do), about implementing a generic solution for rotating shared secrets, as this applies to other similar things and not just TFO
[16:49:06] https://phabricator.wikimedia.org/T240866
[16:49:51] yeah I think given the sysctls we probably always try
[16:50:21] will look deeper tomorrow, it's playtime now :)
[17:43:03] netops, DC-Ops, Operations: Juniper network device audit - all sites - https://phabricator.wikimedia.org/T213843 (RobH) So I have another task nearly identical to this, T266053. However it is just for active/planned/inventory gear and correcting their locations now that the most recent renewal is co...
[18:59:38] FYI anycast migrated too, nsa.w.o marked manual in Netbox (so no zonefile is created)
[19:39:23] awesome, thanks
[20:07:24] netops, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (mforns) OK, cool. Knowing that you'd be open to reducing the retention period for Druid storage if necessary, what I'll do i...
[22:00:06] Wikimedia-Apache-configuration, Android-app-Bugs, Fundraising-Backlog, Operations, and 4 others: Deal with donatewiki Thank You page launching in apps - https://phabricator.wikimedia.org/T259312 (MattCleinman) Let's try copying the file so that it's accessible via `https://thankyou.wikipedia.org/...
[22:25:22] Traffic, Operations, Performance-Team, Wikimedia-Incident: 15% response start regression as of 2019-11-11 (Varnish->ATS) - https://phabricator.wikimedia.org/T238494 (BBlack) @Gilles - please excuse the extremely long response! :) I've reviewed the history in this ticket with an eye to seeing wh...
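(editor's note on the 16:17:05 message) The `node_netstat_TcpExt_TCPFastOpenPassive` metric named there corresponds to the `TCPFastOpenPassive` field of the `TcpExt` group in `/proc/net/netstat` on Linux. A minimal sketch of reading that counter directly on a host follows. It assumes the usual procfs layout of alternating header and value lines. The function name and the sample numbers are made up for illustration and are not data from these hosts.

```python
from typing import Dict


def parse_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat-style text into {group: {counter: value}}.

    The file consists of line pairs: a header line "Group: name1 name2 ..."
    followed by a value line "Group: v1 v2 ...".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    counters: Dict[str, Dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        group, names = header.split(":", 1)
        value_group, numbers = values.split(":", 1)
        if group != value_group:
            raise ValueError("header/value lines out of sync: %r vs %r"
                             % (group, value_group))
        counters[group] = dict(zip(names.split(), map(int, numbers.split())))
    return counters


# Made-up sample in the same layout as the real file (values are not real).
SAMPLE = (
    "TcpExt: SyncookiesSent TCPFastOpenActive TCPFastOpenPassive\n"
    "TcpExt: 0 5 42\n"
    "IpExt: InNoRoutes InOctets\n"
    "IpExt: 0 1000\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as f:
            stats = parse_netstat(f.read())
    except OSError:
        stats = parse_netstat(SAMPLE)  # fall back to the sample off Linux
    print(stats["TcpExt"].get("TCPFastOpenPassive"))
```

The counter is cumulative, so a per-second rate like the one on the dashboard would come from sampling it twice and dividing the difference by the interval.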