[00:44:53] netops, Operations: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (ayounsi) p:Triage>Normal
[05:48:43] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3038.esams.wmnet', 'cp3040.esams.wmnet'] ``` The log can be found in `/var/l...
[05:53:51] ema: lvs1016 has puppet disabled since 11 days, I think related to the asw2-a-eqiad issue, but might be a forgotten leftover
[06:02:09] volans: it might be, yes
[06:04:15] also, interestingly, the message says bblack-pybal-stop but pybal is running
[06:04:52] there are a couple of SAL here: T201694
[06:04:52] T201694: Move servers off asw2-a-eqiad - https://phabricator.wikimedia.org/T201694
[06:04:59] including restarting pybal :D
[06:05:49] bblack: see above, I think puppet can be re-enabled on lvs1016 but waiting for you to come online before doing so
[06:06:07] volans: thanks :)
[06:06:13] yw :)
[06:21:21] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3038.esams.wmnet', 'cp3040.esams.wmnet'] ``` and were **ALL** successful.
[06:24:28] netops, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey)
[07:17:45] vgutierrez: o/ - whenever you have time I'd like your opinion on https://phabricator.wikimedia.org/T192639#4537062
[07:28:17] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3039.esams.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/20...
[07:36:19] Traffic, Operations: Make cp1099 the new pinkunicorn - https://phabricator.wikimedia.org/T202966 (ema)
[07:36:28] Traffic, Operations: Make cp1099 the new pinkunicorn - https://phabricator.wikimedia.org/T202966 (ema) p:Triage>Normal
[07:59:49] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3039.esams.wmnet'] ``` and were **ALL** successful.
[08:58:17] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3043.esams.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/20...
[09:29:43] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3043.esams.wmnet'] ``` and were **ALL** successful.
[09:34:57] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3046.esams.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/20...
[10:06:22] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3046.esams.wmnet'] ``` and were **ALL** successful.
[10:11:35] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema) Open>Resolved a:ema Done! \o/ The only cache host running jessie is cp1008, which will be replaced soon by cp1099: T202966.
[11:33:30] ema: yeah sounds like just an error, can re-enable on 1016
[12:07:57] if anybody has time, I'd need some advice/review about the archiva TLS LE cert switch (https://phabricator.wikimedia.org/T192639#4537062)
[12:14:16] netops, Operations, Patch-For-Review: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (faidon) This was logged every time a login was attempted, in netmon1002's /var/log/auth.log with this: `Aug 28 00:08:07 netmon1002 /ssh-agent-proxy[12127]: [
elukey: in https://phabricator.wikimedia.org/T192639#4537062 there's a missing step in there somewhere. Around the same time you merge the DNS-level change, you also need to merge a puppet-level change to switch: hieradata/hosts/archiva1001.yaml:profile::archiva::proxy::certificate_name: 'archiva-new'
[12:15:27] elukey: or else when you re-enable and run puppet on archiva1001, it will still try to fetch a cert for archiva-new and fail
[12:16:46] bblack: yes sorry I forgot to add https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/455761/!
[12:17:13] just added :)
[12:17:43] ok :)
[12:18:00] my main questions were: 1) do I need to revoke archiva-new's cert? 2) will archiva1001 be able to get the archiva's tls cert without the need for me to copy the certs over from meitnerium?
[12:25:26] 1) I don't think so. Just delete the keys when you're done on meitnerium (easiest just to rm -rf /etc/acme/)
[12:26:08] 2) Yes, assuming LE doesn't have the old resolution cached. Our recdns have no impact on this (I'm guessing you're purging them for other internal/testing reasons).
[12:26:55] On that point, the CNAME is currently 1H TTL. If you want to reduce the risks, you could first lower that to something much smaller like 5M, and then wait 1H before proceeding for the original 1H TTL to go away everywhere.
[12:27:17] then worst case after the DNS+puppet changes, it should take up to 5 minutes before the new LE issuance works
[12:27:51] ahh yes you are right, I didn't think about the TTL for "external" DNS resolutions to our auth servers
[12:28:04] right, LE is external
[12:28:26] yes yes, I wanted to clean up our rec dns servers for consistency, nothing more
[12:28:28] of course LE probably hasn't looked it up recently anyways, but yeah
[12:28:47] better to be safe and not have to explain why everyone's waiting an hour for the new archiva to come online
[12:28:59] definitely
[13:37:26] fyi i just merged https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/455743/
[13:37:49] should be no-op, just adds some new directors
[13:37:49] will be adding routes for them this week (probably one today)
[14:50:54] Traffic, Analytics, Operations, Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (mobrovac) While the proposed solution will work for us in this case, I second @Ottomata's thoughts that having this header (or the lack thereof) included in the lo...
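The TTL reasoning from the 12:26:55 and 12:27:17 messages can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name `cutover_timeline` is hypothetical and not part of any tooling mentioned in the log, and it models nothing beyond the two waits described there (the wait for the old TTL to expire after lowering it, and the worst-case staleness after the DNS and puppet changes).

```python
from datetime import timedelta
from typing import Optional, Tuple


def cutover_timeline(
    old_ttl: timedelta, lowered_ttl: Optional[timedelta] = None
) -> Tuple[timedelta, timedelta]:
    """Return (wait_before_switch, worst_case_stale_after_switch).

    Hypothetical helper: if the TTL is lowered first, caches may hold the
    record for up to old_ttl, so that is the wait before switching; after
    the switch, a cache can be stale for at most lowered_ttl. If the TTL is
    not lowered, there is no wait, but staleness can last up to old_ttl.
    """
    if lowered_ttl is None:
        return timedelta(0), old_ttl
    if lowered_ttl >= old_ttl:
        raise ValueError("lowered_ttl must be smaller than old_ttl")
    return old_ttl, lowered_ttl


if __name__ == "__main__":
    # Values taken from the discussion: 1H CNAME TTL, lowered to 5M.
    wait, stale = cutover_timeline(timedelta(hours=1), timedelta(minutes=5))
    print(f"wait before switching: {wait}, worst-case stale window: {stale}")
```

With the values from the log, lowering first trades a one-hour wait up front for a stale window of at most five minutes after the switch. Switching directly needs no wait but could leave an external resolver with the old answer for up to an hour.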
[15:18:08] netops, Operations, Patch-For-Review: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (thcipriani) ugh! Seeing the actual error I realize that a 3rd party user brought this up previously and it was fixed in the 3rd party repo (https://phabricator.wikimedia.org/sou...
[15:45:59] netops, Operations, Patch-For-Review: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (Volans) Is there any reason why we are keeping two versions of the same code in different places? We should unify and use only one of them IMHO.
[16:55:08] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair) ```alex@alex-laptop:~$ openssl s_client -starttls smtp -connect mx1001.wikimedia.org:25 2>/dev/null | openssl x509 -noout -text | grep Issuer: Issuer: C = US, O = Let's Enc...
[16:57:39] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair) And actually that's the whole list on this ticket. Anything else missing @bblack or can this be closed?
[17:06:48] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (BBlack) Yeah I think this is closeable. This was just our initial "convert all the low-hanging fruit" ticket for the previous iteration of LE support.
[17:11:30] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair)
[17:12:14] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair) Open>Resolved Yep cool
[18:25:50] Traffic, DNS, Operations, Mobile, Patch-For-Review: Many misc wikis lack mobile domains - https://phabricator.wikimedia.org/T152882 (Dzahn) Meanwhile new wikis have been created without "m" like: wikimania.wikimedia.org has address 198.35.26.96 Host wikimania.m.wikimedia.org not found: 3(NXD...
[23:46:08] Traffic, DNS, Operations, Mobile, Patch-For-Review: Many misc wikis lack mobile domains - https://phabricator.wikimedia.org/T152882 (Krenair)