[07:41:50] Traffic, DNS, Operations: rack/setup/install authdns1001.wikimedia.org - https://phabricator.wikimedia.org/T196693 (Vgutierrez) @MoritzMuehlenhoff ack, thanks for pinging us
[07:58:43] so.. seeing https://grafana.wikimedia.org/dashboard/db/dns?orgId=1
[07:58:47] we just need to decomm radon
[08:03:04] Traffic, Operations, ops-codfw: Decommission baham - https://phabricator.wikimedia.org/T199247 (Vgutierrez)
[08:03:11] Traffic, Operations, decommission, ops-codfw: Decommission baham - https://phabricator.wikimedia.org/T199247 (Vgutierrez)
[08:05:23] Traffic, DNS, Operations: rack/setup/install authdns1001.wikimedia.org - https://phabricator.wikimedia.org/T196693 (Vgutierrez) Open>Resolved
[08:05:36] vgutierrez: yeah, yesterday when we rebooted the dnsauth hosts for kernel updates, Arzhel redirected the eqiad traffic to dnsauth1001
[08:05:36] Traffic, Operations, decommission, ops-eqiad: Decommission radon - https://phabricator.wikimedia.org/T202040 (Vgutierrez)
[08:05:54] moritzm: yup.. I've seen that on the SAL :D
[08:41:04] Traffic, Operations, decommission, ops-eqiad, Patch-For-Review: Decommission radon - https://phabricator.wikimedia.org/T202040 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` radon.wikimedia.org ``` The log can be found in `/var/...
[09:19:06] Traffic, Operations: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (ema)
[09:19:35] Traffic, Operations: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (ema) p:Triage>Normal
[09:19:55] Traffic, Operations, decommission, ops-eqiad, Patch-For-Review: Decommission radon - https://phabricator.wikimedia.org/T202040 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['radon.wikimedia.org'] ``` and were **ALL** successful.
[09:20:08] Traffic, Operations, ops-esams: cp3032 PS Redundancy Lost - https://phabricator.wikimedia.org/T202046 (ema)
[09:23:15] Traffic, Operations, decommission, ops-eqiad: Decommission radon - https://phabricator.wikimedia.org/T202040 (Vgutierrez)
[09:24:22] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3049.esams.wmnet', 'cp2001.codfw.wmnet'] ``` The log can be found in `/var/l...
[09:26:08] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` cp4023.ulsfo.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/201808...
[09:55:34] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2001.codfw.wmnet', 'cp3049.esams.wmnet'] ``` and were **ALL** successful.
[09:58:14] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp4023.ulsfo.wmnet'] ``` and were **ALL** successful.
[09:59:12] what? ALL successful?
[09:59:13] crazy
[10:06:48] yeah, go home!
[10:06:49] :P
[10:07:10] volans is that good that we are being deprecated by cumin
[10:07:25] rotfl
[13:54:56] vgutierrez: https://gerrit.wikimedia.org/r/q/topic:%2522tls-levels-sanity%2522 is a stab at cleaning up ssl_ciphersuite() tlsproxy::* usage a bit, with the goal that internal-network-only things should be 'strong', public caches (and whatever toollabs stuff, leaving their stuff alone for now) are 'compat', anything else public but more one-off/technical stays 'mid' as it is.
[13:56:12] and then we can look at changing things up in ssl_ciphersuite() itself, e.g. removing TLSv1.[01] in 'mid', possibly moving DHE down to 'compat', etc
[13:57:00] XioNoX: are we still basically a go for moving servers off asw2-a during the upcoming hour?
[13:57:45] bblack: yes, Chris is running a bit late because of traffic
[13:57:47] vgutierrez: (of course jerkins-bot hates my changes, but whatever)
[13:57:52] ok
[13:58:46] we can sync up on -dcops
[14:22:44] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp4024.ulsfo.wmnet', 'cp2002.codfw.wmnet'] ``` The log can be found in `/var/l...
[14:42:26] bblack: ack
[14:44:44] vgutierrez, I think we should keep very basic write-file-to-local-dir challenge support in certcentral.py for the time being
[14:45:38] if nothing else it's a nice test mechanism to sort out issues with other challenge methods
[14:47:02] the latest commits just have "TODO: push challenges to the servers / gdnsd"
[14:47:12] meaning if merged, master won't work anymore
[14:48:18] Krenair: acme_tiny was writing the challenges to disk?
[14:49:05] I definitely missed that. You're right, I'll implement it
[14:49:54] Traffic, Operations, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp4024.ulsfo.wmnet'] ``` Of which those **FAILED**: ``` ['cp4024.ulsfo.wmnet'] ```
[14:50:40] vgutierrez, yep, acme_tiny is the thing being used in prod now for various misc services
[14:50:59] BTW, regarding how we should name the certificate, with and without the chain and so on, we could follow the naming proposed here: https://github.com/wikimedia/puppet/blob/production/modules/letsencrypt/manifests/cert/integrated.pp#L18-L21
[14:51:22] Krenair: yep, I'm aware, but in certcentral I've missed the fact that it was persisting the challenges on disk
[14:51:35] yeah
[14:51:46] acme_tiny lines 132-134
[14:51:47] wellknown_path = os.path.join(acme_dir, token)
[14:51:47] with open(wellknown_path, "w") as wellknown_file:
[14:51:47]     wellknown_file.write(keyauthorization)
[14:52:11] oh right, it was being done implicitly
[14:52:28] I'll add the functionality in the acme_tiny replacement commit then
[14:52:57] cool
[15:03:13] asw2-a moves running over a bit into meeting time, will join after
[15:03:33] ack
[15:47:14] netops, Operations, Wikimedia-Incident: asw2-a-eqiad FPC5 gets disconnected every 10 minutes - https://phabricator.wikimedia.org/T201145 (ayounsi)
[15:47:19] netops, Operations, Patch-For-Review: Move servers off asw2-a-eqiad - https://phabricator.wikimedia.org/T201694 (ayounsi) Open>Resolved a:Cmjohnson
[15:55:09] FYI I'm away from keys for a bit now, I have some things to take care of around the house/yard + lunch. Textable and can get back to keys pretty quick though.
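A minimal sketch of the write-file-to-local-dir challenge step discussed above, assuming an http-01 style flow where the key authorization is written to a file named after the token. The three core lines mirror the acme_tiny excerpt quoted in the log; the function name `write_http01_challenge`, the token check, and the directory creation are hypothetical additions for illustration and are not taken from certcentral or acme_tiny.

```python
import os


def write_http01_challenge(acme_dir: str, token: str, keyauthorization: str) -> str:
    """Write the key authorization to acme_dir/token and return the file path."""
    # Hypothetical safety check (not from acme_tiny): reject tokens that
    # contain a path separator or would resolve outside acme_dir.
    if not token or os.path.basename(token) != token or token in (".", ".."):
        raise ValueError(f"unsafe challenge token: {token!r}")

    # Assumption: the challenge directory may not exist yet on a test host.
    os.makedirs(acme_dir, exist_ok=True)

    # Same logic as the acme_tiny lines quoted in the log.
    wellknown_path = os.path.join(acme_dir, token)
    with open(wellknown_path, "w") as wellknown_file:
        wellknown_file.write(keyauthorization)
    return wellknown_path
```

Keeping this step as its own small function is one way to leave the local-dir method available as a test mechanism while other challenge delivery methods are still a TODO.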
[15:57:46] Traffic, Operations, ops-codfw, Patch-For-Review: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (Papaul)
[16:01:09] netops, DC-Ops, Operations, cloud-services-team: Refresh switch ports descriptions for recently renamed cloud servers - https://phabricator.wikimedia.org/T201444 (RobH) p:Triage>Normal
[16:52:32] netops, Operations, ops-eqiad: Move asw2-a<->cr1 uplink back to asw-a - https://phabricator.wikimedia.org/T202075 (ayounsi) p:Triage>High
[21:12:07] Traffic, Operations, Patch-For-Review, User-notice: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202 (Jdforrester-WMF)
[21:12:10] Traffic, Operations, Goal, Patch-For-Review: Begin execution of non-forward-secret ciphers deprecation - https://phabricator.wikimedia.org/T192555 (Jdforrester-WMF) Open>Resolved a:Vgutierrez Please re-open if I'm wrong.
[21:12:22] Traffic, Operations, Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181 (Jdforrester-WMF)
[21:12:25] Traffic, Operations, Patch-For-Review, User-notice: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202 (Jdforrester-WMF) Open>Resolved a:Vgutierrez Please re-open if I'm wrong.
[21:13:09] Traffic, Operations: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181 (Jdforrester-WMF) I believe that the planning and execution of the work is all now complete?
[21:21:24] Traffic, netops, Operations, ops-ulsfo: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030 (RobH) Link-level type: Flexible-Ethernet, MTU: 9192, MRU: 9200, Speed: 40Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled, Flow contr...