[08:15:58] Traffic, Operations, Goal: Verify ATS handling of DNS TTLs - https://phabricator.wikimedia.org/T261312 (ema) Open→Resolved >>! In T261312#6412856, @Volans wrote: > @ema as one of the requester for this test thanks a lot for the effort. It looks like we're in good shape here. Thank you for ra...
[08:33:59] Traffic, Operations: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema)
[08:34:28] Traffic, Operations: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema)
[08:34:30] Traffic, Operations, Patch-For-Review: Analyze custom varnish 5.1 patches considering the migration to varnish 6 - https://phabricator.wikimedia.org/T260702 (ema)
[08:34:38] Traffic, Operations: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema) p:Triage→Medium
[09:09:36] Traffic, Operations, Patch-For-Review: Varnish 6.0 needs a SONAME version bump - https://phabricator.wikimedia.org/T261487 (ema)
[10:10:51] netops, Analytics, Analytics-Kanban, Operations: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (fdans) @CDanis that makes sense. In that case what we propose is adding an intermediate data augmentation step to add these dimensions about 6-7 h...
[10:13:00] Traffic, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Abbe98) Aren't the referenced patch blocking access to all services on maps.wikimedia.org and not only osm-intl? This issue and the deprecation mess...
[10:36:46] netops, Operations, ops-eqiad: eqiad row D switch fabric recabling - https://phabricator.wikimedia.org/T256112 (Marostegui)
[10:36:50] Traffic, netops, Operations, Patch-For-Review: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459 (Marostegui)
[12:28:39] netops, Analytics, Analytics-Kanban, Operations: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (CDanis) Yes, it would. There's two use cases here: * DoS attack analysis, for which real-time is essential. Here, the augmented data would be hel...
[13:28:40] netops, Analytics, Analytics-Kanban, Operations: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (fdans) Thanks for clarifying. A correction from my end: the extra dimensions would actually take significantly less then 6 hours since they would...
[15:18:54] Traffic, Maps, Operations, Wiki-Loves-Monuments (2020): wikimedia.pl returns a HTTP 429 error (let it access varnish maps_domains) - https://phabricator.wikimedia.org/T261506 (TOR)
[15:25:40] Traffic, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (AntiCompositeNumber) There are some uses of maps.wikimedia.org by websites that are definitely movement-affiliated but may not be hosted on Wikimedi...
[15:27:23] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (AntiCompositeNumber)
[18:03:35] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Dzahn) Isn't this already the case? I mean we just recently had to add wikilovesmonuments to the regex of allowed domains to let it use m...
[18:06:12] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Dzahn) It seems like this statement: > only support requests from the Wikimedia cluster is in conflict with this statement: > Cloud S...
[18:08:09] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (JMinor) @AntiCompositeNumber our intention was to create a clear, maintainable, line around what domains would be supported. We'd really l...
[18:10:15] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (CDanis) My two cents: I suspect it's merely the case that the language in the announcement was somewhat imprecise. That's understandable...
[18:10:30] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (JMinor) > " from the Wikimedia cluster" is a naive Product person's use of that term. I apologize if that imprecision is causing misunders...
[18:16:27] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (BBlack) Based on the email to maps-l linked at the top, it's not clear to me that cases like wikilovesmonuments and wikimedia.pl were mean...
[18:21:53] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (BBlack) //[Removed my earlier comment because I had failed to refresh this ticket and catch up on all the other recent traffic, which chan...
[18:37:15] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (JMinor) > "So if "limit maps serving to Wikimedia hosted sites only" wasn't already the status quo then why would we have to add to this...
[18:43:21] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (CDanis) @JMinor @Elitre Although some of this is already mentioned upwards in this task, here's a summary of the community objections I'm...
[20:20:33] Traffic, Maps, Operations, Product-Infrastructure-Team-Backlog: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Multichill) I noticed the announcement on the maps-l list and I also noticed https://twitter.com/krmaher/status/1299203640188690434 where...
[20:40:47] There's some context in #wikimedia-sre, but I've got warnings in icinga for cert renewal related to `cloudelastic1005.wikimedia.org` and `cloudelastic1006.wikimedia.org`
[20:40:51] copy-pasting the relevant part:
[20:42:49] There's 12 warnings total, 6 per node (corresponding to # of elasticsearch clusters on each node), they all look like `SSL WARNING - Certificate cloudelastic.wikimedia.org valid until 2020-09-02 19:55:16 +0000 (expires in 4 days)` where the actual command is `$USER1$/check_ssl --warning 7 --critical 3 -H $HOSTADDRESS$ --cn cloudelastic1005.wikimedia.org -p 9443`
[20:43:25] the relevant nodes/hostnames are in `acme-chief`: https://github.com/wikimedia/puppet/blob/822259dc2de2eb5dc9669b6ad6f5f04b24dba2ba/hieradata/role/common/acme_chief.yaml#L18-L30
[20:44:59] BTW scrolling up in this channel I'd had a similar question around aug 18 but it was for the inverse problem. quick context, we have `cloudelastic100[1-6]` as the nodes, 1005 and 1006 were added later in time from `1001-1004`
[20:45:59] last time I'd asked because there was a warning that `cloudelastic100[1-4]` was going to expire, and it was explained by vgutierrez that (copy-pasting):
[20:46:02] > from let's encrypt point of view we have 2 certs, one including 1001-1006 and one with 1001-1004, the latter is going to expire (as expected) and that's why LE warns us
[20:46:52] so I'm thinking that now maybe it thinks `cloudelastic100[5-6]` are going to expire for a similar reason, the reason being that LE is managing all 6 of `cloudelastic100[1-6]` as a single block?
[20:53:14] Captured the above context in https://phabricator.wikimedia.org/T261528
[21:01:37] So in conclusion, I've sticky-acked the 12 warnings which should prevent them from going critical. If this is a real issue and not just an alerting issue then certs will expire on `2020-09-02 19:55:16 +0000`. I've set a reminder to circle back here on monday so we'll have time to figure it out before the potential expiration
[21:18:52] * vgutierrez checking
[21:21:08] ryankemper: are you sure that you're using the proper cert on those machines?
[21:21:14] from acmechief1001
[21:21:59] root@acmechief1001:~# openssl x509 -dates -noout -in /var/lib/acme-chief/certs/cloudelastic/live/ec-prime256v1.crt
[21:21:59] notBefore=Aug 3 19:00:35 2020 GMT
[21:21:59] notAfter=Nov 1 19:00:35 2020 GMT
[21:21:59] root@acmechief1001:~# openssl x509 -dates -noout -in /var/lib/acme-chief/certs/cloudelastic/live/rsa-2048.crt
[21:21:59] notBefore=Aug 3 19:00:43 2020 GMT
[21:22:00] notAfter=Nov 1 19:00:43 2020 GMT
[21:22:19] and
[21:22:22] root@acmechief1001:~# openssl x509 -text -noout -in /var/lib/acme-chief/certs/cloudelastic/live/rsa-2048.crt |grep DNS
[21:22:23] DNS:cloudelastic.wikimedia.org, DNS:cloudelastic1001.wikimedia.org, DNS:cloudelastic1002.wikimedia.org, DNS:cloudelastic1003.wikimedia.org, DNS:cloudelastic1004.wikimedia.org, DNS:cloudelastic1005.wikimedia.org, DNS:cloudelastic1006.wikimedia.org
[21:26:29] also... the puppetization on cloudelastic servers is reloading/restarting the affected services after acme-chief renews the certs?
[21:42:15] vgutierrez: was that last question you asking if that is happening (re restarting services), or are you saying that you've observed that it is happening?
[21:42:33] I'm asking if it's happening
[21:42:45] cause the cert is being renewed as expected
[21:43:51] and I can see those certs renewed as expected on cloudelastic1006
[21:43:58] vgutierrez@cloudelastic1006:~$ sudo -i openssl x509 -dates -noout -in /etc/acmecerts/cloudelastic/live/rsa-2048.crt
[21:43:59] notBefore=Aug 3 19:00:43 2020 GMT
[21:43:59] notAfter=Nov 1 19:00:43 2020 GMT
[21:43:59] vgutierrez@cloudelastic1006:~$ sudo -i openssl x509 -dates -noout -in /etc/acmecerts/cloudelastic/live/ec-prime256v1.crt
[21:43:59] notBefore=Aug 3 19:00:35 2020 GMT
[21:43:59] notAfter=Nov 1 19:00:35 2020 GMT
[21:44:20] I'm still getting familiar with how everything fits together so not sure off the top of my head, will take a look
[21:44:27] so I'm guessing that the services using those certs are missing a reload ryankemper
[21:44:29] Does the above mean the certs were renewed on aug 3?
[21:44:33] indeed
[21:44:55] Aug 3rd at 20:00 GMT
[21:45:07] the cert shows 19:00 cause Let's Encrypt backdates them 1 hour
[21:45:17] (to avoid huge clock skew issues)
[21:45:41] ack, the relevant elasticsearch services haven't been restarted since 2020-06-04 so that is sounding promising
[21:46:18] I dunno if a reload would suffice to trigger a TLS material reload on elasticserach
[21:46:22] *elasticsearch
[21:46:44] Only thing is `cloudelastic1004` (which is not alerting) has the same aug 3 renewal date and yet services weren't restarted there
[21:47:40] cloudelastic1004 icinga checks are showing the expected date
[21:47:42] notAfter=Nov 1 19:00:43 2020 GMT
[21:47:57] (copy pasted from icinga) SSL OK - Certificate cloudelastic.wikimedia.org valid until 2020-11-01 19:00:35 +0000 (expires in 64 days)
[21:48:16] Oh I see
[21:48:35] Any guesses what needs to be reloaded to get the new certs to take effect?
[21:48:40] nginx? :)
[21:48:54] at least according to cloudelastic puppetization
[21:48:58] Ah of course, we do have nginx in front of elasticsearch
[21:50:19] Will depool and then reload nginx and see if that helps
[21:50:51] a nginx reload shouldn't require a depool
[21:50:52] Come to think of it not sure if it even needs to be depooled for a reload but probably better to be paranoid anyway
[21:50:58] Good timing
[21:51:00] Okay :D
[21:51:06] nginx is smart enough :)
[21:51:47] Traffic, DNS, Operations: Configure subdomain foundation.wikimedia.org to enable *:foundation.wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T261531 (Krinkle)
[21:54:48] vgutierrez: alerts have resolved, thanks for all the help!
[21:55:07] i'm gonna jot down a TODO to mention nginx reloads in our elasticsearch documentation
[21:55:13] please donate 🍺
[21:55:19] ;P
[21:55:40] so... acme_chief::cert puppet resource is able to trigger the required service reload upon cert renewal
[21:55:54] please check the elasticsearch puppetization
[21:56:29] as I'm not über familiar with it
[21:56:45] * ryankemper sends a virtual IPA your way
[21:57:11] <3
[21:58:26] I hear you guys like your lagers in europe but I'm an IPA zealot (or stouts)
[21:58:34] thanks for jumping on this given your tz, much appreciated
[21:58:49] yeah.. you got me after 2 DIPAs
[21:58:51] :P
[21:59:05] it's Friday night after all
[21:59:10] hopefully hit the ballmer peak? https://xkcd.com/323/
[21:59:27] and I will follow-up to figure out what's going on with puppet, there's a related issue where `pool` and `depool` scripts aren't present on those boxes so I suspect something is missing somewhere
[21:59:33] well.. I didn't trigger a bigger problem
[21:59:38] :D
[21:59:41] that's my job
[22:01:17] ping me on Monday if you need help with the acme_chief::cert puppetization
[22:01:20] see you!
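The expiry figures icinga reported in this exchange can be reproduced from the certificates' `notAfter` dates. Below is a minimal shell sketch, not the real `check_ssl` plugin. It assumes bash and GNU `date`, and it assumes the check compares whole days remaining against the `--warning 7 --critical 3` thresholds from the command quoted earlier; `cert_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of check_ssl): classify a certificate's
# notAfter date against warning/critical thresholds in whole days.
# Assumes bash and GNU date (for `date -d`).
cert_status() {
    local not_after="$1"        # openssl format, e.g. "Sep  2 19:55:16 2020 GMT"
    local now="$2"              # reference time, e.g. "2020-08-28 20:42:49 UTC" or "now"
    local warn_days="${3:-7}"   # mirrors --warning 7 from the icinga command
    local crit_days="${4:-3}"   # mirrors --critical 3
    local end_s now_s days

    end_s=$(date -u -d "$not_after" +%s) || return 2
    now_s=$(date -u -d "$now" +%s) || return 2

    if (( end_s <= now_s )); then
        echo "CRITICAL: expired"
        return 0
    fi

    # Whole days left, rounded down (4 days 23 hours counts as 4).
    days=$(( (end_s - now_s) / 86400 ))

    if (( days < crit_days )); then
        echo "CRITICAL: expires in $days days"
    elif (( days < warn_days )); then
        echo "WARNING: expires in $days days"
    else
        echo "OK: expires in $days days"
    fi
}

# Example against a certificate file on disk (path taken from this log):
# cert_status "$(openssl x509 -noout -enddate -in /etc/acmecerts/cloudelastic/live/rsa-2048.crt | cut -d= -f2)" "now"
```

For example, `cert_status 'Sep  2 19:55:16 2020 GMT' '2020-08-28 20:42:49 UTC'` prints `WARNING: expires in 4 days`, matching the 20:42 alert. The renewed certificate's `Nov  1 19:00:35 2020 GMT` at 21:47:57 gives `OK: expires in 64 days`, as icinga showed for cloudelastic1004. Counting back Let's Encrypt's 90-day certificate lifetime from the stale expiry of 2020-09-02 lands on 2020-06-04, the day the services were last restarted. That is consistent with nginx still serving the certificate it loaded then.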
[22:02:17] cheers with a can of cheap Mexican lager, vgutierrez
[22:02:25] good weekend https://en.wikipedia.org/wiki/Grupo_Modelo#Estrella_Jalisco
[22:04:52] Traffic, InternetArchiveBot, Operations: Support TLSv1.3 in IABot - https://phabricator.wikimedia.org/T251414 (Cyberpower678) Open→Declined This is not something I believe I have control over.
[22:38:38] Traffic, DNS, Operations: Configure subdomain foundation.wikimedia.org to enable *:foundation.wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T261531 (Krinkle) Based on the [Matrix docs](https://github.com/matrix-org/synapse/blob/v1.19.1/docs/delegate.md#srv-dns-record-delegation) a...
[23:08:56] Traffic, DNS, Operations: Configure subdomain foundation.wikimedia.org to enable *:foundation.wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T261531 (bcampbell) I heard back from the vendor regarding DNS and the rep said "I have not found the DNS way for delegation in our internal...
[23:42:42] Traffic, InternetArchiveBot, Operations: Support TLSv1.3 in IABot - https://phabricator.wikimedia.org/T251414 (Krenair) >>! In T251414#6420161, @Cyberpower678 wrote: > This is not something I believe I have control over. Could you be more specific? What challenges do you see in implementing this?