[06:52:02] Certcentral: Rename the Certcentral project - https://phabricator.wikimedia.org/T207389 (Joe) I **think** the original reference was indeed https://en.wikipedia.org/wiki/Acme_Corporation#Animated_films,_TV_series which is most appropriate for something we integrate with puppet. So my suggestions, if we want...
[07:49:34] Traffic, Operations, Patch-For-Review: ATS path normalization - https://phabricator.wikimedia.org/T210295 (ema) Looking at the logs on cp1071 and other cp-ats hosts, it seems that the patch above is working as expected: ` modified path: /api/rest_v1/page/mobile-sections/Wikipedia%3AWikiProject_Spide...
[10:31:56] Traffic, ChangeProp, Operations, RESTBase, and 3 others: ATS path normalization - https://phabricator.wikimedia.org/T210295 (mobrovac)
[14:44:49] elukey: so does this make sense to you? https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475692/
[14:48:44] vgutierrez: nice! So I am fairly ignorant about the certcentral puppet code, but the only thing that I'd ask is if letsencrypt::cert::integrated needs to be kept or not
[14:48:56] elukey: that should be removed two commits later
[14:49:05] elukey: after that one, this one https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475757/ should be merged
[14:49:16] and after that, one removing the old LE puppetization
[14:50:10] ahhh so certcentral deploys under /etc/certcentral, so we deploy the new certs first, then we flip nginx to use the newer ones and we clean up
[14:50:18] /etc/centralcerts
[14:50:23] indeed :)
[14:50:24] yes sorry
[14:50:29] sounds wonderful
[14:51:17] if you want feel free to deploy the new certs whenever you have time
[14:51:26] yup, I'll do it right now
[14:55:06] Traffic, Operations: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (ema)
[14:55:10] Traffic, Operations, Patch-For-Review: ATS: log inspection at runtime - https://phabricator.wikimedia.org/T204225 (ema) Open>Resolved
[14:55:25] Notice: /Stage[main]/Archiva::Proxy/Certcentral::Cert[archiva]/File[/etc/centralcerts/archiva.rsa-2048.crt]/ensure: defined content as '{md5}32325f2c3cf7a07d864a1c882a01a03e'
[14:55:35] cert an related files have been deployed in archiva1001
[14:55:38] *and
[15:01:44] elukey: next step https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475757/
[15:02:59] checking
[15:03:50] +1
[15:04:02] Krenair: correct me if I'm wrong, but we currently lack a notify => Service[$puppet_svc] in the file resources defined in certcentral::cert, right?
[15:06:39] Krenair: something like https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475762/
[15:17:22] elukey: yey, archiva is now using the certcentral managed TLS certificate \o/
[15:18:20] * elukey dances
[15:18:23] great!!!
[15:20:12] elukey: and this is the commit that cleans the old stuff: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475765/
[15:23:33] +1!
[15:25:01] <3
[15:33:32] Traffic, Operations, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Vgutierrez)
[15:44:56] vgutierrez, right yes we should have that... not sure how that got missed
[15:45:04] that's what puppet_svc is for :)
[15:45:09] np :)
[15:45:39] vgutierrez, apt next?
[15:45:45] Krenair: also please take a look at https://gerrit.wikimedia.org/r/#/c/operations/software/certcentral/+/475713/
[15:45:57] Krenair: if moritzm is ok with that, we can follow with apt.wm.o :)
[16:00:27] Traffic, Operations: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema) p:Triage>Normal
[16:03:24] Traffic, Operations: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (ema)
[16:04:17] Traffic, Operations, Patch-For-Review: ATS backend-side request-mangling - https://phabricator.wikimedia.org/T209021 (ema)
[16:04:23] Traffic, ChangeProp, Operations, RESTBase, and 3 others: ATS path normalization - https://phabricator.wikimedia.org/T210295 (ema) Open>Resolved Deployed and working fine. Closing.
[16:05:59] ema: nice T210411, I was wondering the other day while moving librenms and netbox to certcentral while smokeping was still running in plain text
[16:06:00] T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411
[17:02:29] moritzm: is it ok with you if we proceed with apt.wm.o?
[17:25:38] Traffic, Operations: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Joe) I think we need to solve somehow T194031 before this will be feasible. We could as well keep using the puppet CA for everything, but I feel we're at the point where we really need a real PKI solution.
[17:26:09] <_joe_> vgutierrez: I was thinking about bblack's proposal on https://phabricator.wikimedia.org/T194031#4217303 and it seems more and more appealing to me
[17:26:26] <_joe_> vgutierrez: as the person who makes the sausage, what do you think?
[17:27:11] <_joe_> the idea would be to install an internal CA with ACME support, and then use certcentral to deliver those to the hosts that need them
[17:27:59] that should be feasible
[17:28:10] but deploying an internal CA with ACME support is not a trivial task
[17:28:27] <_joe_> yup
[17:28:35] dunno if you have boulder in mind or another implementation
[17:28:38] <_joe_> anyways, we need to prioritize work on a PKI
[17:28:44] that's right
[17:28:50] <_joe_> vgutierrez: no idea, tbh
[17:29:15] <_joe_> it seemed nice to leech certcentral's ability to interact nicely with puppet
[17:29:44] yep
[17:30:00] sadly cfssl doesn't support ACME (yet)
[17:30:26] dunno if we could work in that direction, or analyze if boulder could be a good fit here in WMF
[17:32:18] I'm not sure I'd describe it as 100% 'nicely', the metadata thing on the server is using a different format from the usual puppet one, the client end requires style lint ignore lines, and I think I guessed at some of the parameters for the metadata API part until Puppet was happy
[17:33:52] yeah.. but TBH it would be nice to benefit from those efforts for both internal and publicly trusted certificates
[17:52:56] Traffic, Operations, ops-eqsin: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (BBlack) Update from SRE meeting today - memtest was successful, and we're asked to put it back in production and see if the error happens again or not. Re-pooling!
[18:04:36] Traffic, Operations, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Dzahn)
[18:05:26] Traffic, Operations, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Dzahn) icinga-old has been removed from DNS and was only temporary. T209738 https://gerrit.wikimedia.org/r/#/c/operations/dns/+/474392/ so it does...
[18:08:10] Traffic, Operations, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Krenair)
[18:08:30] Traffic, Operations, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Krenair)
[19:19:31] bblack: eqord-codfw and codfw-ulsfo links have overlapping maintenances at 7am UTC tomorrow (and possibly 7am Tuesday), which means no more physical link redundancy for ulsfo (but still a backup GRE tunnel), and ulsfo bouncing through eqiad to reach codfw. I think we should either depool ulsfo, or redirect its caches to eqiad.
[19:20:06] (thanks Jaime for the heads-up)
[20:02:31] Traffic, netops, Operations, Goal: Increase network capacity (2018-19 Q2 Goal) - https://phabricator.wikimedia.org/T207668 (ayounsi)
[20:03:02] netops, Operations: Rack/Setup new codfw QFX5100 10G switch - https://phabricator.wikimedia.org/T197147 (ayounsi)
[20:03:05] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[20:03:43] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[20:42:34] netops, Operations, ops-codfw: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (ayounsi) p:Triage>Normal
[20:43:06] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[20:43:13] netops, Operations, ops-codfw, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi)
[20:43:40] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[21:50:59] netops, Operations, ops-codfw: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (ayounsi) p:Triage>Normal
[21:51:09] netops, Operations, ops-codfw: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (ayounsi) a:ayounsi>Papaul
[21:51:16] netops, Operations, ops-codfw: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (ayounsi) a:ayounsi>Papaul
[21:51:38] netops, Operations, ops-codfw: codfw row B recable and add QFX - https://phabricator.wikimedia.org/T210456 (ayounsi)
[21:51:40] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[21:52:03] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[23:01:03] XioNoX: it's interesting that those links have identical overlapping maintenances scheduled 2 days in a row, with one doing 0-6 and the other 1-5....
[23:01:48] XioNoX: if I had to guess with no other information, I'd assume they share a physical cable that's being worked on by whoever provides them the cable, and just have differing policies about how many hours to leave at the edges
[23:02:15] bblack: it's maybe internal policy, the 2nd maintenance is only a backup date if the 1st day can't happen
[23:03:10] yeah they do cite incompatible supposed reasons for the maintenance
[23:03:34] anyways, so we're 9 hours out from the first 6 hour window?
[23:04:47] oh 7 hours
[23:04:59] and the maintenance locations are different, one is in Phoenix, the other Dallas & Kansas City
[23:05:30] 7h from the 1st one, 8 from the overlap
[23:05:51] ok
[23:07:17] yeah, I think we might as well depool it now and then see how things look after the window during the US daytime tomorrow and decide then whether to repool or just wait for the next window.
[23:07:39] ulsfo's role isn't generally very critical for anyone's perf at the moment
[23:07:46] (assuming no other sites are lost!)
[23:10:09] 40 extra milliseconds for me to go to codfw!
[23:11:34] :)
[23:14:04] bblack: https://gerrit.wikimedia.org/r/c/operations/dns/+/475910
[23:14:19] go for it!
[23:15:28] merged and logged
[23:19:14] I'm surprisingly not seeing a big drop on https://grafana.wikimedia.org/dashboard/db/load-balancers?orgId=1&from=now-5m&to=now
[23:19:41] it's usually more pronounced, no?
[23:21:11] whatever resolver that coffee shop uses still redirects me to ulsfo
[23:34:56] netops, Operations, ops-codfw: codfw row D recable and add QFX - https://phabricator.wikimedia.org/T210467 (ayounsi) p:Triage>Normal
[23:35:10] netops, Operations, ops-codfw: codfw row D recable and add QFX - https://phabricator.wikimedia.org/T210467 (ayounsi)
[23:35:12] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)
[23:35:27] netops, Operations, ops-codfw: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489 (ayounsi)