[07:53:05] Acme-chief: acme-chief fails to issue certificates against LE staging environment - https://phabricator.wikimedia.org/T219414 (Vgutierrez) The same behaviour is observed with acme 0.32.0
[07:58:08] Acme-chief: acme-chief puppetization fails to provide a valid config with multiple accounts and an explicit default account - https://phabricator.wikimedia.org/T219482 (Vgutierrez) p:Triage→Normal
[08:10:41] https://www.irccloud.com/pastebin/x1xCyNRU/
[08:10:55] as suspected the unified certificate has been issued on the first attempt against LE production environment
[08:14:22] Acme-chief, Traffic, Operations, Goal, Patch-For-Review: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (Vgutierrez) ` root@acmechief1001:~# openssl x509 -text -noout -in /var/lib/acme-chief/certs/unified/live/rsa-2048.crt Certificate:...
[08:59:54] Traffic, Operations, Performance-Team: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (Gilles)
[09:15:29] Krenair: I think that a simpler approach by now would be enough
[09:15:54] aka adding acme_chief::cert resources and configuring ocsp stapling properly
[09:16:15] Krenair: but maybe it's useful to do it on a different CR
[09:16:23] and resume yours later in the future
[09:16:45] vgutierrez, this is for the acme-chief unified cert puppetisation?
[09:16:52] indeed
[09:18:19] true
[09:23:19] I've summed up the experience with ATS so far on https://phabricator.wikimedia.org/phame/post/view/115/switching_production_traffic_to_apache_traffic_server/ FYI!
[09:26:37] \o/
[11:22:37] nice!
[13:10:09] Krenair: BTW, I've fixed authorized_regexes for the wikiba.se cert, nice catch
[13:10:22] it should be ready to be merged
[13:10:32] vgutierrez, was this the eqsin one with hostnames from other regions?
[13:10:37] er, DCs
[13:11:14] indeed
[14:09:58] certificate for wikibase. issued successfully
[14:10:35] wooohoo.
this is cool vgutierrez.
[14:10:39] Mar 28 14:08:58 acmechief1001 acme-chief-backend[18655]: Number of certificates per status: Counter({'VALID': 28, 'NEEDS_RENEWAL': 2, 'INITIAL': 2})
[14:10:50] on config reload it detected that gerrit cert needed to be renewed as well
[14:11:14] everything went as expected.. gerrit cert got renewed and wikiba.se cert issued without issues
[14:17:45] Acme-chief, Traffic, Operations, Goal, Patch-For-Review: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (Vgutierrez) ` vgutierrez@acmechief1001:~$ sudo -i openssl x509 -text -noout -in /var/lib/acme-chief/certs/wikibase/live/rsa-2048.crt...
[16:00:53] Traffic, Operations, Performance-Team: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (ayounsi) The first step when looking at peering with a provider is to check if we're both present at a common exchange point. You can see where we are present on https:...
[16:07:04] Krenair: hmmm so as bblack wisely reminded me, we need the unified cert just hanging in the cp servers BUT we need to use the acme-chief wikibase cert
[16:07:37] I think I'm going to use your change as a base to accomplish that
[16:13:40] ok
[16:24:17] anyone know what's up with the cr2-eqdfw SomeLoss alerts from smokeping half an hour ago? https://smokeping.wikimedia.org/smokeping.cgi?target=codfw.Core.cr2-eqdfw
[16:24:29] XioNoX: ^^
[16:25:25] yeah, I've been looking at it, seems like we're seeing some v6 packet loss between codfw and eqdfw
[16:25:49] ah, and v4, that's better
[16:38:29] Traffic, Operations, Performance-Team: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (Gilles) Thanks for the details. I don't have permission to access T186835 Is this manual work something I can do myself?
[16:44:05] I'll monitor and contact CyrusOne if the issue persists
[16:49:26] netops, Operations: Add eqsin routing special cases to jnt - https://phabricator.wikimedia.org/T211930 (ayounsi) a:faidon→ayounsi Moving forward on that as the latest plan (taking the feedback into consideration) is anyway better than what we currently have deployed in Singapore.
[16:55:58] vgutierrez, ?
[16:56:19] Krenair: yeah, I've used your change and modified it to allow deploying both of them (certs + acme-chief) and only use acme-chief if certs is empty
[16:57:21] ok
[16:58:37] f
[16:59:01] https://www.irccloud.com/pastebin/2FDnWnQy/
[16:59:20] that's the unified config for cp5007 according to pcc https://puppet-compiler.wmflabs.org/compiler1002/15411/
[17:01:13] thanks for the warning though, you make me realise that in PS11 I forgot to update the unified.pp :)
[17:03:06] *made
[17:24:06] Traffic, Operations, Wikidata, Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Addshore) >>! In T217897#5062728, @Smalyshev wrote: >> the cache we are talking about there would be unnecessary...
[17:25:35] Traffic, Operations, Performance-Team: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (ayounsi) (Added you to the task) In some measure, yes. Getting the routing table is the most complicated part. As a one of, I've been SSHing directly to the routers, bu...
[17:35:13] pcc wikibase site config for cp1008 looks promising
[17:35:36] https://www.irccloud.com/pastebin/YSLIJXzf/
[17:44:09] bblack: initial approach for serving wikiba.se in cp1008 -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/499825
[17:44:48] I've reused Krenair's job to deploy acme-chief certs with tlsproxy::localssl
[17:45:46] but allowing certs + acme_chief at the same time (certs are used for serving traffic and acme_chief certs hang in the server), if certs is empty then acme_chief certs are used (like in the wikiba.se use case)
[17:46:25] after validating on cp1008, it should be as easy as moving the include profile from role::cache::canary to role::cache::text
[17:47:40] vgutierrez: ah maybe I misspoke earlier
[17:47:59] so there's two different kinds of things we're trying to get going in tlsproxy with the acme certs:
[17:48:41] 1) Getting the new unified to deploy alongside the static ones (tricky, and if we can just deploy it alongside for now and not config nginx yet, that's ok)
[17:49:16] 2) Getting the wikiba.se case to deploy "alongside" all the unifieds in a different sense of "alongside", meaning two different localssl's with unified as default
[17:49:44] yeah
[17:50:08] oh, I see now
[17:50:16] you did get that, I just mis-read what you wrote above a bit :)
[17:51:11] that's what I've implemented
[17:51:15] :)
[17:51:17] yeah, got it now
[17:51:25] ok, so yeah, that looks pretty sane
[17:51:53] maybe try tomorrow morning, with puppet disabled on all of cache_text, then depool one node and puppetize it to check sanity?
[17:52:09] (tomorrow morning your time I mean)
[17:52:23] and then roll it out progressively in chunks if things keep looking fine
[17:52:53] we also need the hookup from cache_text -> webserver_misc_static, but that's easy to do (like 15.wikipedia.org config for cache_text hierdata)
[17:53:30] current code only messes with the canary role
[17:53:33] and then we can go back to the ticket and tell them we're ready for moving wikiba.se and get them to manually test (/etc/hosts hacks or curl --resolve, etc) and update the git repo if they haven't in a while, etc.
[17:53:49] vgutierrez: yeah, what I mean is push that + move it to cache::text
[17:53:53] so we can play with cp1008 without affecting the text nodes
[17:53:54] ack
[17:53:59] (tomorrow)
[17:54:12] yeah.. I'll coordinate with ema
[17:54:16] awesome
[17:59:29] Traffic, Operations, Wikidata, Wikidata-Query-Service, and 2 others: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Smalyshev) > WDQS does know what the latest version of the entity that it is trying to get updates for is, But "...
[22:54:48] Traffic, Wikimedia-Apache-configuration, Operations, VisualEditor, User-Ryasmeen: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200/Loading failed for the