[07:05:50] Traffic, Analytics-Cluster, Analytics-Kanban, Operations, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4252716 (Vgutierrez) >>! In T182993#4248709, @Ottomata wrote: > Hm, ya, sounds like a way off before we get that in Debian then, ya? Is that...
[07:11:01] vgutierrez: o/
[07:11:50] thanks a lot for the work in --^
[07:12:27] if there are other steps to take (like package + deploy the new librdkafka, change vk's tls config etc..) can we make a list in the task's description?
[07:12:44] something like: required to drop IPSEC vs nice to have vs etc..
[07:13:05] so we'll (we as analytics/traffic) know how to prioritize
[07:17:05] elukey: I've still work to do on my side, basically reviewing the JVM side
[07:18:08] vgutierrez: yep yep, I was just suggesting a list of things so we'll know more or less the steps to do
[07:18:40] the vk/tls part to deploy your librdkafka feature could be done in parallel right
[07:18:43] ?
[07:19:09] indeed
[11:33:03] Traffic, Operations: Package libvmod-re2 - https://phabricator.wikimedia.org/T196355#4253372 (ema) p:Triage>Normal
[11:54:07] HTTPS, Traffic, Operations, Performance-Team: TLS certificates renewal process - https://phabricator.wikimedia.org/T196248#4253412 (Krinkle)
[12:49:03] Traffic, netops, Operations, ops-ulsfo: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030#4253597 (ayounsi) >>! In T196030#4246171, @RobH wrote: > I replaced both of the optics with wholly different optics and a wholly different fiber cable. So these are using a second set o...
[13:30:21] HTTPS, Traffic, Wikimedia-Site-requests: Wikimedia Hungary's website should use HTTPS - https://phabricator.wikimedia.org/T196368#4253743 (Bencemac)
[13:30:39] wut?
[13:31:46] HTTPS, Traffic, Operations, Wikimedia-Site-requests: Wikimedia Hungary's website should use HTTPS - https://phabricator.wikimedia.org/T196368#4253761 (Urbanecm) This domain is not controlled by Wikimedia Foundation, is it?
[13:33:51] HTTPS, Traffic, Operations, Wikimedia-Site-requests: Wikimedia Hungary's website should use HTTPS - https://phabricator.wikimedia.org/T196368#4253743 (Vgutierrez) we don't control the domain AFAIK nor the server where is hosted (193.218.98.220 / dyna-220.sx5.cable.tolna.net)
[14:09:29] Traffic, Operations: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4253860 (Vgutierrez) p:Triage>Normal
[14:26:28] Traffic, Operations: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4253860 (Jdforrester-WMF) Maybe combine the two so as to be something to give said IT admins something to go on? > Wikipedia is tightening its security measures,...
[14:32:38] Traffic, Operations: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4253947 (Vgutierrez) @Jdforrester-WMF the short message should be addressed to non-technical users on their language (if possible) but we will be also providing a...
[14:38:55] Traffic, Operations: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4253954 (BBlack) English grammar nits: it would be `forward secret ciphers` (meaning "ciphers which have the property of forward secrecy"). But these terms "forwa...
[14:49:11] <_joe_> !log restarting low-traffic pybals in eqiad,codfw for adding the videoscaler VIP
[14:49:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:45] Traffic, Operations: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4254124 (Vgutierrez) Long explanation: ``` We have removed support for non forward secret ciphers, specifically AES128-SHA, which your browser software relies on...
[15:52:40] do we currently have a list of performed TLS deprecations on TLS? like a timeline? or just https://wikitech.wikimedia.org/wiki/HTTPS/3DES_Deprecation ?
[15:55:13] 3DES is the only real one we've been through
[15:55:43] there's a very long and tangled list of changes that happened before that, but the difference is none of them amounted to any notable user impact
[15:56:00] ack, so I think I'm going to create something like HTTPS/Hardening timeline on wikitech
[15:56:10] we'd analyze and find we could make some improvement to the situation and not really affect anyone (or the affected were a rounding error several zeros below 0%)
[15:56:26] 3DES, AES128-SHA && TLS 1.0 being the main events
[15:56:40] if we're trying to paint a longer-view timeline though, probably some of those other past improvements should be there
[15:56:46] and at some point I guess we get rid of TLS 1.1 as well
[15:56:49] (the ones that didn't kick out significant users)
[15:57:14] so it's less a "timeline of dumping users", and more a "timeline of positive changes, a few of which necessarily dumped some users"
[15:57:37] "timeline of improving users privacy"
[15:57:46] it could begin on 2014 moving everything to HTTPS :)
[15:57:59] right
[15:58:10] or even earlier. there were initial efforts just to deploy optional TLS at all
[15:58:23] 2015 we forced it on w/ redirects+HSTS, then we extended HSTS
[15:58:44] and there's been a bunch of other little changes along the way since, most of which didn't cut off any userbase fraction.
[15:59:11] arg.. meeting /o\
[15:59:14] * vgutierrez running late
[15:59:30] e.g. when we got STS-preload going for various major domains we own, when we started rejecting insecure POST traffic, when we turned on OCSP-stapling, when we started embedding SCTs, etc....
[16:00:02] and then maybe a few of the major ciphersuite change highlights, e.g. when we flipped DHE to 2048-bit and broke Java6
[16:00:19] (which wasn't so much user facing, but did face some other clients)
[16:18:12] Traffic, Operations, Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820#4254340 (Joe) >>! In T91820#4218746, @Krinkle wrote: > There are cases where a cookie doesn't work (specifically, for th...
[16:21:49] Traffic, Operations, Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820#4254387 (BBlack) Well, a potential lesser goal that involves fewer moving parts would just be to loadbalance non-session...
[16:28:35] Traffic, Operations, Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820#4254410 (Joe) >>! In T91820#4254387, @BBlack wrote: > Well, a potential lesser goal that involves fewer moving parts wou...
[16:28:52] <_joe_> bblack: I agree that would be a better solution, not 100% sure if that covers all edge cases
[16:29:50] <_joe_> but intuitively I'd say that's the case
[16:30:58] _joe_: right, there could still be some edge case involving login/session-creation where some critical un-sessioned GET happens that must be master-only
[16:31:24] I don't have a good handle on the flow of session-creation and migration between domains by centralauth, etc...
[16:31:35] <_joe_> me neither
[16:32:33] but maybe given all these doubts and questions, we should at least start with this simpler variant, and then if we can achieve that, look at how much further we can go from there...
[16:33:00] <_joe_> yes
[16:33:04] <_joe_> that makes sense
[16:33:23] <_joe_> and improve progressively
[16:35:38] <_joe_> bblack: any objections to making proxy_read_timeout parametrizable in tlsproxy::instance?
[16:35:48] <_joe_> I need to raise that value for the videoscalers :P
[16:36:48] fine by me, just make sure my template outputs don't pointlessly-change for the current/default cases :)
[16:36:55] <_joe_> yeah ofc
[16:37:15] <_joe_> I need a proxy_read_timeout of 86400 there, specifically
[16:37:43] that seems insane, but whatever :)
[16:38:10] lol @ grep results, there are worse examples!
[16:38:12] modules/profile/templates/etcd/tls_proxy.conf.erb: proxy_read_timeout 365d;
[16:38:21] <_joe_> I was about to tell you :D
[16:38:24] :O
[16:38:31] human backed services!
[16:38:36] <_joe_> that's because etcd can be watched indefinitely
[16:38:51] <_joe_> so you have an open connection with no data flowing forever :P
[16:39:01] <_joe_> 365d seemed like a safe default
[16:39:12] yeah but surely you want connections to refresh once in a blue moon on their own, otherwise graceful restarts never completely on their own, etc...
[16:39:19] s/completely/complete/
[16:40:02] maybe there are better places to control that though, proxy_read_timeout is probably more like a between-bytes timer
[16:40:12] <_joe_> bblack: it is exactly that
[16:40:16] http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
[16:40:31] <_joe_> and if you don't raise that, the connection for watchers will be closed abruptly
[16:40:37] <_joe_> and pybal doesn't like that
[16:40:52] <_joe_> we will get rid of that damn nginx once we move to etcd3
[16:42:09] software that makes HTTP-over-TLS-over-TCP connections should handle random disconnect gracefully and correctly
[16:42:36] <_joe_> yeah, but sadly reality kicks in :P
[16:43:19] <_joe_> I will try to get to work on a migration to etcd3 next FY
[16:43:26] <_joe_> probably not in the first quarter
[16:43:57] <_joe_> that can work without the proxy
[16:44:13] <_joe_> as it can do RBAC properly and without a perf hit on unauthenticated reads
[16:44:21] well there's arguably downsides to moving away from proxies, too
[16:44:33] but it's just something we'll have to look at case-by-case
[16:44:46] proxies have the general downside of complexity and separated moving parts, etc....
[16:44:51] <_joe_> yeah, in the case of etcd, it has pretty good encryption defaults AFAIR
[16:45:09] but the upside is we control compatibility and bugfixes and security with one shared TLS implementation/configuration
[16:45:31] <_joe_> that's why I use tlsproxy::instance whenever I can
[16:45:49] take the proxy out and use 42 different applayers' own TLS implementations -> configuring them all sanely, and vetting that they use sane libraries and use the APIs well, etc, etc...
[16:46:01] imagine the kafka java-TLS sec review stuff X lots of cases.
[16:46:07] <_joe_> yeah
[16:46:12] *sigh*
[16:46:30] _joe_: etcd3 gRPC cannot play properly with nginx?
[16:47:09] <_joe_> vgutierrez: nginx still doesn't properly support grpc IIRC
[16:47:35] https://www.nginx.com/blog/nginx-1-13-10-grpc/
[16:47:39] out there in 1.13.10
[16:47:42] <_joe_> but even if it does, do you prefer we use that *instead* of the native TLS+RBAC functionality? it's a very bad idea IMHO
[16:47:43] this is all the fallout of a deeper problem in this area, which is that there are a number of distinct TLS (and various underlying crypto layers) implementations in the world, they're all imperfect, their APIs are all hard to use for sane/best-practice usage, their configurations misalign, etc....
[16:47:54] <_joe_> and no, you can't separate RBAC from TLS on etcd either
[16:48:11] it's really on the field of TLS implementation libraries that the blame for this rests, not the application developers necessarily (although they can be faulty at their own level separately as well!)
[16:49:07] <_joe_> go tls libraries seemed well implemented last I checked
[16:49:12] https://golang.org/pkg/crypto/tls/#pkg-note-BUG
[16:49:25] as long as we go with the proper cipher suites O:)
[16:50:36] <_joe_> vgutierrez: I guess they use grpc, probably with defaults
[16:50:43] <_joe_> anyways, we will check when it's time
[16:50:48] sure
[16:51:04] <_joe_> for now, we must live with the long proxy_read_timeout on v2
[16:51:12] anyways.. go tls implementation at least on the performance side is pretty awesome :)
[16:55:36] yeah I guess you could dig a layer deeper and even say that the standards have historically been awful too. Even if you try to envision the best APIs/Configs you can for tlsv1.[012], you've got to sanely offer varying levels of security config without bogging down in 100 little details
[16:55:48] about compat/sec tradeoffs and such, varying risk models
[16:56:06] I think tlsv1.3 helps a lot, but we're a long way from killing tlsv1.2 I think
[16:56:36] hopefully when tls 1.3 is on the image, we'll drop completely support for < tls 1.2
[16:56:42] s/image/picture/
[17:00:33] yeah something like that
[17:00:54] and then the next battle will be dropping all the odler ciphers from tlsv1.2 (the ones that don't exist in tlsv1.3 because they're not AEAD)
[17:01:00] s/odler/older/
[17:01:06] right now tls 1.1 usage is pretty low.. I think is something feasible
[17:08:28] (IE11 and some android devices)
[20:01:26] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248#4255076 (Imarlier)
[20:04:17] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248#4255119 (Krinkle) p:Triage>Low
[20:25:42] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248#4255150 (BBlack) Speaking for the big unified certs we get from commercial vendors: we generally do wait ~24h (usually longer?) , between the issue date of new major...
[20:30:12] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248#4255184 (BBlack) As for the rest, especially with the one-offs using LetsEncrypt scripting today, we definitely don't have this kind of resiliency, or any kind of dep...
[21:26:40] Traffic, Operations, User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4255350 (Johan) a:Johan The message should also clearly state that this means they won't be able to access Wikipedia in the future (or won't be...
[22:21:09] Traffic, Operations, ops-eqiad: rack/setup/install cp1075-cp1090 - https://phabricator.wikimedia.org/T195923#4255477 (BBlack)
[22:27:28] Traffic, Operations, Wikimedia-Hackathon-2018, Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962#4255518 (Krenair) I looked at the puppetmaster apache config and noticed this line: ``` # If Apache complains about invalid signature...
[22:29:41] ^ vgutierrez: We *are removing*, otherwise they won't be able to view the error…