[00:04:59] netops, Operations, observability: Determine & implement near-term method for escalating network alerts - https://phabricator.wikimedia.org/T237587 (ayounsi) > Interface saturation See also T224888 > What else is in scope here? That's everything I have in mind right now. > In terms of “how” I can t...
[00:27:04] netops, Operations, Wikimedia-Incident: Improve resiliency of the eqsin transport link - https://phabricator.wikimedia.org/T236878 (ayounsi) Open→Resolved a: ayounsi Damping configured.
[01:40:38] In case anyone is still working… wmcs could use a review of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/549058/
[01:40:49] (It's blocking us rotating a cert, which is soon to expire)
[01:41:14] I made a note to ping here at about 9 this morning; apparently this is how long it takes me to remember a thing
[01:51:38] andrewbogott, Brandon already +1'd it?
[01:52:52] hm, so he did!
[01:52:55] so… nevermind :)
[08:40:14] Traffic, Operations: ATS skipping certain logs due to lack of buffer space - https://phabricator.wikimedia.org/T237608 (ema)
[08:40:23] Traffic, Operations: ATS skipping certain logs due to lack of buffer space - https://phabricator.wikimedia.org/T237608 (ema) p: Triage→Normal
[09:02:59] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: Update webrequest_128 dataset in turnilo to include TLS fields once available - https://phabricator.wikimedia.org/T237117 (JAllemandou) Done! Hiding the awful turnilo link under [[ https://turnilo.wikimedia.org/#webrequest_sampled_...
[09:04:42] vgutierrez: --^
[09:11:58] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (JAllemandou) Done! Example: ` spark2-shell --master yarn --driver-memory 4G --executor-memory 8G --executor-cores 4 --conf spark.dynamicA...
[09:33:33] elukey: \o/
[09:41:55] Traffic, Analytics, Analytics-Kanban, Operations, observability: Update webrequest_128 dataset in turnilo to include TLS fields once available - https://phabricator.wikimedia.org/T237117 (Vgutierrez) @BBlack I'm seeing some "nil" values on the TLS KeyExchange field when AES128-SHA is being us...
[09:43:23] elukey: \o\ |o| /o/
[09:51:29] Traffic, Analytics, Analytics-Kanban, Operations, observability: Update webrequest_128 dataset in turnilo to include TLS fields once available - https://phabricator.wikimedia.org/T237117 (Vgutierrez) I'm already loving the data, thanks @JAllemandou <3
[10:15:22] elukey: it looks like we have a lot of requests with all TLS fields set to "null"
[10:16:35] with http status != (301, 403)
[10:16:41] that's pretty weird
[10:21:35] vgutierrez: I think that only recent data has the new fields set, the rest might be null
[10:22:28] yes, try splitting by, say, time and TLS version
[10:22:35] you'll see when the new data comes in
[10:22:59] yesterday at around 21 UTC more or less
[10:23:11] yep
[10:23:15] thx :)
[10:23:49] np!
[10:25:23] it's wonderful to see that data there already :D
[10:25:45] TLSv1.0 deprecation suddenly looks almost feasible I'd say
[12:52:54] yeah the data is already fascinating, thanks so much analytics
[12:53:20] we'll need a week of data to really draw useful conclusions, and probably some hours of digging
[12:54:12] but early insights are interesting, e.g. the bulk of DHE key exchanges are not, as feared/expected, legit Android 2.x devices. The top UAs are a known bot we can contact and a bunch of modern browsers (meaning they're just reporting as DHE because they're stuck behind awful "security" gateways)
[13:08:38] or another: in the data so far, ~83% of requests using TLS < v1.2 belong to the user agent "-", with the next most-popular UA at sub-1% for some java apache-httpclient thing (not a human)
[13:10:28] (and over half of the TLSv1.[01] with UA "-" are using X25519 key exchange, which is even more modern than TLSv1.2, so they're very fishy and it's likely a conscious choice on some vendor's part...)
[13:13:52] or another angle: of the ~3% TLS <1.2 in the current data, ~96% of those are marked as is_pageview:true :)
[13:14:04] err I meant, is_pageview:false
[13:14:17] (because they lack a UA or have other indicators that make us say not a normal pageview request)
[13:14:49] fascinating stuff, and paints a whole different picture than the limited and erroneous stats we had before
[13:17:11] (but again, we'll need a week of data just to see reliably all the day/night/timezone and workweek/weekend cycles and get a fuller picture of the data)
[14:30:46] bblack: expect possible emails from people at NS1 I met at Velocity who want to chat with you about DNS. I encouraged them to open-source some stuff they've made
[14:31:22] gilles: ok, thanks for the heads up :)
[14:34:20] they have something proprietary that does similar things to gdnsd + supports DNSSEC with on-the-fly signing
[14:35:46] I don't have more details than that, but they seemed open to the idea of open-sourcing some of that stuff if it's useful to us and we'd potentially contribute upstream
[14:41:36] we can always take a look and see!
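The TLS < 1.2 breakdown bblack describes between 13:08 and 13:14 comes down to three steps: filter the sampled webrequest rows to legacy TLS versions, then group them by user agent and by is_pageview. The following is a minimal sketch of that grouping, assuming a hypothetical, simplified `Request` row type with OpenSSL-style version strings ("TLSv1", "TLSv1.1", ...). It is not the actual webrequest schema or the query the team ran. The toy rows at the bottom are illustrative only, not real traffic data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical, simplified stand-in for one sampled webrequest row.
    tls_version: str   # OpenSSL-style name, e.g. "TLSv1", "TLSv1.2"
    user_agent: str    # "-" when the client sent no User-Agent
    is_pageview: bool


# "TLS < v1.2" in the discussion above.
LEGACY_VERSIONS = {"TLSv1", "TLSv1.1"}


def legacy_breakdown(rows):
    """Share of legacy-TLS requests overall, per UA, and marked non-pageview."""
    legacy = [r for r in rows if r.tls_version in LEGACY_VERSIONS]
    if not legacy:
        return {"legacy_share": 0.0, "ua_shares": {}, "non_pageview_share": 0.0}
    total = len(legacy)
    ua_counts = Counter(r.user_agent for r in legacy)
    return {
        "legacy_share": total / len(rows),
        # Ordered from most to least common user agent.
        "ua_shares": {ua: n / total for ua, n in ua_counts.most_common()},
        "non_pageview_share": sum(not r.is_pageview for r in legacy) / total,
    }


# Illustrative toy rows only.
rows = [
    Request("TLSv1.2", "Mozilla/5.0", True),
    Request("TLSv1", "-", False),
    Request("TLSv1.1", "-", False),
    Request("TLSv1", "Apache-HttpClient/4.5", False),
]
stats = legacy_breakdown(rows)
```

As noted at 12:53 and 13:17, figures like these only become meaningful after about a week of data, once the day/night and weekday/weekend cycles are covered.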
[14:43:26] (but obviously there are a lot of details to look at there before we'd consider a switch, on the technical/features front, on the resiliency/quality of the code, and what "open sourcing" really means in terms of throw-it-over-the-wall vs really building for a community of public users)
[15:25:42] Traffic, DC-Ops, Operations, ops-esams: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (herron) Since it looks like cp3056 might be down for some time could we remove it from the config until fixed? It would be good to let the ipsec checks in icinga return to green. https://i...
[15:37:20] test-lb/text-lb
[15:42:05] Traffic, DC-Ops, Operations, ops-esams, Patch-For-Review: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (BBlack) Sorry I missed that you already had a patch! But in any case, we only need commenting from cache::nodes to fix up this case (there's no good reason to e.g. chu...
[15:43:17] XioNoX: yes? :)
[15:44:54] bblack: on one side I like the simplicity of the name, on the other I'm thinking that it might confuse people
[15:45:18] but I don't want to bikeshed :)
[15:45:43] yeah but nobody should be really looking at this or touching it anyways, it's just a convenience label for an IP we're going to use in some very manual testing
[15:45:58] indeed!
[15:46:14] maybe I should put comments on all of it that it's temporary and will be removed again shortly or something?
[15:46:55] (so people don't get the wrong idea and infer they should set up some other testing-related things against it or whatever)
[16:02:21] is someone working on cp3055? Icinga puppet check is UNKNOWN - https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cp3055&service=puppet+last+run
[16:03:09] well, nevermind, back to green
[16:13:30] Traffic, Operations: Renew and deploy GlobalSign unified cert (2019) - https://phabricator.wikimedia.org/T237650 (BBlack) p: Triage→High
[16:13:49] Traffic, Operations: Renew and deploy GlobalSign unified cert (2019) - https://phabricator.wikimedia.org/T237650 (BBlack)
[16:25:13] ema: yoohooo
[16:25:28] got some qs about tls termination and discovery urls and routing with ats vs varnish etc.
[16:25:43] i'm talking to _joe_ and he said I should ask you
[16:25:49] basically
[16:25:49] https://gerrit.wikimedia.org/r/c/operations/puppet/+/549177
[16:25:55] Traffic, DC-Ops, Operations, ops-esams, Patch-For-Review: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (herron) >>! In T236497#5644446, @BBlack wrote: > Sorry I missed that you already had a patch! But in any case, we only need commenting from cache::nodes to fix up this...
[16:26:17] my q atm is: should I set up https for my nginx server on the schema[12]00x nodes?
[16:26:35] or is there some frontend LVSish related tls termination i can use?
[16:26:47] so clients can do e.g. https://schema.svc.eqiad.wmnet
[16:28:47] ottomata: I don't know details / haven't used it myself but profile::tlsproxy::envoy might be of some help to you
[16:28:56] cool ty looking
[16:29:06] <_joe_> cdanis: he already has nginx
[16:29:11] oh okay
[16:29:22] <_joe_> and envoy doesn't serve static files either
[16:29:23] ya but it might be nice to use a separate thing
[16:29:27] oh oh
[16:29:31] but it's just a proxy, no?
[16:29:41] <_joe_> yes
[16:29:46] <_joe_> I mean it can't natively
[16:30:02] <_joe_> please don't put two layers of http servers in front of static files :P
[16:30:04] aye, might be nice to have in front of nginx? the choice of nginx vs apache or whatever was really just a whim.
[16:30:11] might be nice to have the https separate in case we change
[16:30:18] hah
[16:32:26] if I use nginx ssl, do I need some stuff to provision a custom cert?
[16:32:30] or can I use the wildcard somehow?
[17:31:24] Traffic, Operations, Puppet, User-jbond: Serve volatile uri from local site - https://phabricator.wikimedia.org/T235427 (jbond) Open→Resolved a: jbond This has now been implemented
[18:15:44] Traffic, Operations, Wikimedia-Logstash, observability, and 2 others: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (Krinkle) Using the above method, I now get 11,086 unique fields in the dropdown menu. That's significantly more than last month. The fie...
[18:17:19] Traffic, Operations, Wikimedia-Logstash, observability, and 2 others: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (Krinkle)
[19:05:58] Traffic, Operations, Wikimedia-Logstash, observability, and 2 others: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (EBernhardson) >>! In T189333#5488005, @Krinkle wrote: >>>! In T189333#5483346, @fgiunchedi wrote: >>>>! In T189333#5481492, @Krinkle wrot...
[19:20:17] Traffic, DC-Ops, Operations, ops-esams: cp3056 hardware issue - https://phabricator.wikimedia.org/T236497 (wiki_willy) a: RobH
[20:58:17] Traffic, Operations: ATS doesn't support X-Wikimedia-Debug - https://phabricator.wikimedia.org/T237687 (BBlack) p: Triage→High
[22:38:51] Traffic, Operations: ATS doesn't support X-Wikimedia-Debug - https://phabricator.wikimedia.org/T237687 (BBlack) Reading up on the `debug_proxy` stuff a bit more.... currently hassium/hassaleh are proxies into mwdebug[12]00[12], and use the header to select the destination host, and also has some backward...
[22:39:47] Traffic, Operations, Performance-Team (Radar): ATS doesn't support X-Wikimedia-Debug - https://phabricator.wikimedia.org/T237687 (Krinkle)
[23:06:25] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria) Open→Resolved
[23:06:46] Traffic, Analytics, Analytics-Kanban, Operations, observability: Update webrequest_128 dataset in turnilo to include TLS fields once available - https://phabricator.wikimedia.org/T237117 (Nuria) Open→Resolved
[23:06:50] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria)
[23:06:55] Traffic, Analytics, Analytics-Kanban, Operations, observability: Update webrequest_128 dataset in turnilo to include TLS fields once available - https://phabricator.wikimedia.org/T237117 (Nuria)
[23:13:57] vgutierrez: yeah I see the tiny keyexchange=nil in turnilo too ... but maybe we should try to figure out what those are
[23:14:10] or ignore them, if they remain tiny
[23:14:31] so they come from aes128-sha users
[23:17:37] dhe-rsa-aes-128 shows up with tiny usage on the ats instance drilldown as well
[23:18:24] https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?panelId=47&fullscreen&orgId=1&var-site=eqiad%20prometheus%2Fops&var-instance=cp1075&var-layer=tls
[23:35:38] Traffic, Operations, Performance-Team (Radar): ATS doesn't support X-Wikimedia-Debug - https://phabricator.wikimedia.org/T237687 (BBlack) Maybe this is closer to a Lua replacement for all of it, although it still has issues! ` local debug_map = { '1' => 'mwdebug1001.eqiad.w...
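BBlack's notes on T237687 (22:38 and 23:35) describe the existing debug proxies as using the X-Wikimedia-Debug header to select a destination host among mwdebug[12]00[12], via a lookup map (`debug_map`). The following is a minimal Python sketch of that lookup idea only. It is not the Lua script from the task, which is truncated here and is not reproduced. The map entries other than `'1'`, the bare host names without domain suffixes, and the function name `select_debug_backend` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical lookup table. Only the '1' -> mwdebug1001 pairing appears in
# the (truncated) log; the other keys are illustrative, and domain suffixes
# are deliberately omitted.
DEBUG_BACKENDS: Dict[str, str] = {
    "1": "mwdebug1001",
    "mwdebug1001": "mwdebug1001",
    "mwdebug1002": "mwdebug1002",
    "mwdebug2001": "mwdebug2001",
    "mwdebug2002": "mwdebug2002",
}


def select_debug_backend(headers: Dict[str, str]) -> Optional[str]:
    """Return the debug host selected by X-Wikimedia-Debug, or None.

    HTTP header names are case-insensitive, so keys are compared lowercased.
    None means "no debug routing": the request takes the normal path.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-wikimedia-debug")
    if value is None:
        return None
    return DEBUG_BACKENDS.get(value.strip())
```

Returning None for an unknown value, rather than falling back to some default debug host, keeps a mistyped header from silently landing on a debug server. Whether the real proxies behave this way is not stated in this log.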