[09:00:18] Krenair: please, take a look at the debian part of the acme-chief renaming: https://gerrit.wikimedia.org/r/c/operations/software/certcentral/+/489993
[09:00:33] I've tested the package build in boron and it looks good
[10:43:49] vgutierrez, done
[10:47:37] thanks!
[11:07:19] <_joe_> vgutierrez: talking about PKI, what do you think we should do for internal usage?
[11:07:29] <_joe_> I think it's a major infrastructural deficiency we have right now
[11:22:58] right now cfssl + some ACME layer
[11:23:09] and use acme-chief as a client
[11:26:18] if we write the acme layer in go we could reuse some code from pebble/Boulder
[11:28:46] I should run this train of thought by bblack
[11:28:52] but IMHO it could make sense
[12:43:19] we could even try to run a boulder instance... after all, judging by the docker-compose setup they provide for testing / development, the only external service you need to run boulder is MariaDB
[12:43:29] and I guess that we know how to run mariadb
[12:48:23] <_joe_> sure
[12:48:48] <_joe_> i'll list my requirements :)
[12:49:29] <_joe_> one thing that kinda scares me are short-lived certs
[12:49:56] <_joe_> for things like nginx that's ok, but not every piece of software in the world can reload the certs at runtime
[12:50:20] <_joe_> one notable example: etcd
[12:51:46] hmm actually Boulder is built on top of cfssl
[12:51:46] I was talking about PKI with jbond42 yesterday
[13:00:11] _joe_ there is nothing in acme that stops us from issuing long term certs
[13:00:39] so that shouldn't be an issue
[13:00:59] I'd argue that we should not issue long term certs :)
[13:01:04] for various reasons
[13:02:06] <_joe_> paravoid: and I'd agree, but I'm not going to fix every piece of software we already use that's not designed to do so. So realistically we should have a PKI that allows us to do that and then work on automation to make quick renewal seamless in most cases
[13:02:16] I agree.. but we could allow them to make the transition easier
[13:02:28] we don't have to "fix" them, just work around them
[13:02:36] <_joe_> but we can also go the other way around, fix what's fixable first, and work on a pki afterwards
[13:02:38] but exercising those workarounds often instead of every 2 years sounds more prudent
[13:03:10] <_joe_> paravoid: I agree FWIW, read above :)
[13:03:48] <_joe_> for clusterized software it could be as easy as running a cookbook
[13:04:00] I don't think it should be manual :)
[13:04:02] <_joe_> as you'll need some level of coordination
[13:04:15] anyway, that's a bigger, longer-term and non-traffic-related project I'd say
[13:04:20] <_joe_> yeah, I'm trying to think of a way to do it with etcd and elasticsearch
[13:04:22] <_joe_> yes
[13:04:32] <_joe_> or mysql, FWIW
[13:05:18] I've been wanting to check out Vault's PKI
[13:05:46] vault is interesting to think about more broadly anyway, maybe we should start there
[13:06:24] <_joe_> last time I looked at vault, the pki capabilities were pretty basic, but it was like 4 years ago
[13:07:24] <_joe_> I'll look into it as well. I would like our PKI to be able to manage multiple CAs, to allow for arbitrary expiration, SANs and utilization
[13:07:38] sure
[13:08:44] acme support would be awesome.. we could deploy internal and external certs in the same way
[13:09:54] are there any CAs supporting ACME atm?
[13:10:08] any FL/OSS CA software, I mean
[13:10:14] Boulder
[13:10:32] yeah well :P
[13:10:34] that's what's running let's encrypt
[13:10:39] i'm aware
[13:11:10] I think he meant apart from Boulder :P
[13:12:19] yeah I meant apart from Boulder (or Pebble)
[13:12:27] none afaik
[13:13:15] but maybe we can benefit from boulder itself
[13:14:52] dunno, I don't think it's built for that purpose
[13:15:31] might be a weird fit, and not supported well by upstream
[13:16:16] I do wonder whether ACME is a good fit for this really
[13:17:24] like does it make sense to be doing DV verification across the wikimedia network?
[13:19:49] <_joe_> the interesting feature would be to have a single way to issue certs
[13:20:19] yeah that it works the same way as public certs
[13:20:43] I can see the appeal
[13:20:52] not convinced though
[13:21:07] also it can be one of many/several CAs under a hypothetical WMF root CA
[13:21:30] so not everything has to be handled by it
[13:22:08] I'm just thinking about the actual process of issuing certs right now compared to with ACME and domain validation
[13:23:54] wondering if the whole thing just opens up more flaws in the process
[13:25:36] well.. even if you consider another kind of validation it could be implemented on top of ACME
[13:26:20] presumably we'd be talking about something custom then, or at least not just boulder
[13:26:40] but okay, maybe ACME itself is fine, it's just the exact verification method used that needs thought
[13:26:56] <_joe_> so a rapid look at vault's CA makes it look interesting because it allows managing certs from k8s directly via service account tokens
[13:31:15] <_joe_> as far as a puppet integration goes, yuck, there are some things
[13:33:01] <_joe_> https://www.hashicorp.com/blog/how-to-use-vault-with-hiera-5-for-secret-management seems like a sane-ish architecture, and also has quite interesting Q&A
[13:33:23] <_joe_> the TL;DR is: we'd have to write such an integration ourselves, but that doesn't seem insurmountable
[14:23:05] Traffic, Cloud-VPS, Operations, Toolforge, Patch-For-Review: Wikimedia varnish rules no longer exempt all Cloud VPS/Toolforge IPs from rate limits (HTTP 429 response) - https://phabricator.wikimedia.org/T213475 (akosiaris) Change has been deployed across the fleet. WMCS IP space `172.16.0.0/1...
[15:32:54] i'm late to the party but i think both (vault and a cfssl ACME API provider alike) will probably be needed; i don't know much about the rest of the use cases but for k8s: https://blog.digitalocean.com/vault-and-kubernetes/
[15:34:01] you will need to create long-term certificates (sorry, but i already lived through one expired-k8s-CA party and i don't plan to have another soon) and then distribute them; vault can help with distribution while an ACME certificate provider could manage the creation and renewal of certs
[16:42:34] Traffic, Varnish, Operations, Reading-Infrastructure-Team-Backlog, Maps (Tilerator): Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776 (Mholloway)
[16:53:41] Traffic, Acme-chief, Operations: Upgrade acme-chief to run in debian buster - https://phabricator.wikimedia.org/T215925 (Vgutierrez)
[16:53:53] Traffic, Acme-chief, Operations: Upgrade acme-chief to run in debian buster - https://phabricator.wikimedia.org/T215925 (Vgutierrez) p:Triage→Normal
[16:57:21] cool.. acme-chief tests are happy with python 3.7 and buster dependencies \o/
[16:57:53] Krenair: could you take a look? https://gerrit.wikimedia.org/r/c/operations/software/certcentral/+/490093
[17:02:28] Traffic, ExternalGuidance, Operations, Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (dr0ptp4kt) @santhosh ^ would you please review and verify it has the intended effect? I need to reset my Vagrant stuff, but figured this wa...
[17:08:17] Krenair: also the puppet code https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/489719/
[17:08:33] PCC seems happy but I'm pretty sure I'm missing something
[17:10:09] it will run in brand new instances so we won't break anything
[18:16:14] vgutierrez, done first
[18:16:16] puppet later
[18:24:49] thx
[18:53:59] Traffic, Wikimedia-Apache-configuration, Operations, VisualEditor: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200) - https://phabricator.wikimedia.org/T213214 (matmarex) I did not experience this issue since my last report in January. Has anyone else ran into it again?
[19:11:20] Traffic, ExternalGuidance, Operations, Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (dr0ptp4kt) @BBlack ^ would you please review the enwiki VCL patch? We'll only want to merge it after ExternalGuidance has been tested with...
[19:24:43] netops, Operations, ops-eqiad, ops-eqsin: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (RobH) >>! In T213121#4944125, @RobH wrote: > Chris shipped this, and I just put in an inbound shipemnt ticket for EQ Singapore SG#: 1-185487164544 > UPS tracking 1Z291X71DG27842078 EQ SG3...
[19:57:55] Traffic, Cloud-VPS, Operations, Toolforge, Patch-For-Review: Wikimedia varnish rules no longer exempt all Cloud VPS/Toolforge IPs from rate limits (HTTP 429 response) - https://phabricator.wikimedia.org/T213475 (Cyberpower678) Not hitting anymore Varnish error messages. Cyberbot's operation...