[09:32:17] vgutierrez: i had to slightly modify your bgp unique logging test cases to make it work across different modules
[09:32:42] hmm I remember reviewing that
[09:33:00] it looked good to me
[09:36:13] ok, do you want to review or should I merge?
[09:38:35] ha perfect, thanks
[09:38:54] np :)
[09:43:53] bblack: I made a fool out of diff by moving log_xcps_info far away from vcl_recv https://gerrit.wikimedia.org/r/#/c/428580/
[10:15:59] Traffic, Operations, ops-eqiad, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4156787 (Vgutierrez) @Cmjohnson we will go with stretch and raid1-lvm (modules/install_server/files/autoinstall/netboot.cfg). Could you add the production dns entries for l...
[13:22:53] hmmmm
[13:23:01] there's one thing I don't like about this is_pooled removal thing
[13:23:11] so there's pybal's instrumentation which dumps the state of pools/servers as json
[13:23:19] and it appears to be doing the right thing, not renaming etc
[13:23:30] and it uses Server.dumpState:
[13:23:31] def dumpState(self):
[13:23:31]     """Dump current state of the server"""
[13:23:31]     return {'pooled': self.pool, 'weight': self.weight,
[13:23:31]             'up': self.up, 'enabled': self.enabled}
[13:23:43] ...but really that feels like it should be 'pooled': self.is_pooled
[13:41:14] sigh... IE11 cipher order puzzles me.. https://www.ssllabs.com/ssltest/viewClient.html?name=IE&version=11&platform=Win%207&key=36
[13:58:17] yes, IE has always been a notable pain point for its combination of popularity and TLS-deficiency
[13:58:59] note that IE<11 can't even do AEAD ciphers at all
[13:59:42] and IE==11 can only do AEAD ciphers with ECDSA certs, but not RSA certs (this is one of the key reasons we run dual-cert instead of just going RSA-only until some future ECDSA-only transition makes sense)
[14:00:51] (I really wish ssllabs would get on the ball about signalling the superiority of AEAD over non-AEAD among modern-ish forward-secret ciphers, but I guess there's lower-hanging fruit for the world in general for now!)
[14:00:54] luckily IE11 got a minor update at some point to improve ciphersuite preference
[14:01:13] yeah but who knows where the updates get applied or not :)
[14:01:34] note we don't actually care about the order of their preferences (we enforce our ordering), only what's present or not in their list.
[14:01:45] hmmm
[14:01:59] you're right
[14:02:30] then these IE11 I'm seeing are behind some kind of evil proxy(TM)
[14:02:49] yes, I've even seen lots of latest-stable-Chrome in the past like that
[14:03:01] yup
[14:03:05] UA says latest-stable-chrome, but negotiation is e.g. TLSv1.2+AES128-SHA
[14:03:14] I've a lot of Chrome 65 behaving like that
[14:03:15] which is a near-certain case of evil proxy
[14:03:43] interesting to follow: 400-int increase after merging the 400-on-stupid-host-header patch https://grafana.wikimedia.org/dashboard/db/varnish-caching?orgId=1&from=1524663447735&to=1524664945916&var-cluster=All&var-site=All&var-status=4&panelId=7&fullscreen
[14:04:43] in general based on what I've observed before, I tend to think anything doing TLSv1.2+AES128-SHA is almost certainly a proxy. For real implementors trying to maximize security, it's not a sensible implementation combo (in the era that they could've implemented TLSv1.2, bare AES128-SHA would not be the best cipher available from any standard TLS library...)
[14:05:12] no significant 400 increase overall though, which means we're now stopping those earlier on https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=2&fullscreen&orgId=1&var-site=All&var-cache_type=All&var-status_type=4&from=1524661616395&to=1524665100457
[14:06:01] but some proxy vendors chose to keep up with TLS protocol standards to 1.2 to better support clients, but then willfully limit ciphers to older ones (presumably to save CPU costs, but also possibly to eliminate forward-secrecy to make certain kinds of corp-internal eavesdropping easier, since their fake certs can decrypt all past-logged traffic...)
[14:06:33] ema: hmm but they should increase... before we were seeing HTTP 200 with a 400 body
[14:08:45] shouldn't aggregate-client-status-codes and varnish-caching always align on status code counts?
[14:08:52] (same POV?)
[14:09:46] oh, it makes sense now
[14:10:00] varnish-caching of course is just recording the hit-v-miss-ness, not really reporting counts
[14:10:13] right
[14:10:44] and this patch would shift some portion of them from some kind of hit/miss/pass status to internal status on varnish-caching
[14:10:56] (which the graph seems to reflect!)
[14:11:53] I doubt they were a significant fraction of traffic anyways, but it may have caused a more-notable drop on the be->applayer side as a fraction of miss/pass
[14:14:21] vgutierrez: I don't think the empty-host case is frequent enough to be noticeable in aggregate-client-status-codes really
[14:15:07] right
[14:15:27] btw, what about those nagios/icinga checks not setting the Host header?
[14:15:38] vgutierrez: I haven't seen any!
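The 14:03 to 14:06 heuristic (a UA claiming a modern Chrome but negotiating TLSv1.2 with bare AES128-SHA is almost certainly an intercepting proxy) can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the function names and the `MIN_MODERN_CHROME` threshold are hypothetical, cipher names are assumed to follow OpenSSL's naming, and "suspicious" is taken to mean a suite that is neither AEAD nor forward-secret, which is exactly what bare AES128-SHA is.

```python
import re

# Hypothetical threshold (not from the log): a UA claiming Chrome at or above
# this major version is treated as a "modern browser" claim worth checking.
MIN_MODERN_CHROME = 60

_CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def is_aead(cipher: str) -> bool:
    """True for OpenSSL-style AEAD suite names (AES-GCM or ChaCha20-Poly1305)."""
    return "GCM" in cipher or "CHACHA20-POLY1305" in cipher


def is_forward_secret(cipher: str) -> bool:
    """True for OpenSSL-style TLSv1.2 suites using ephemeral (EC)DH key exchange."""
    return cipher.startswith(("ECDHE-", "DHE-"))


def likely_intercepting_proxy(user_agent: str, protocol: str, cipher: str) -> bool:
    """Flag handshakes where the claimed UA and the negotiated suite disagree."""
    match = _CHROME_RE.search(user_agent)
    if match is None or int(match.group(1)) < MIN_MODERN_CHROME:
        return False  # no modern-browser claim to contradict
    # Bare suites like AES128-SHA are neither AEAD nor forward-secret.
    return protocol == "TLSv1.2" and not is_aead(cipher) and not is_forward_secret(cipher)


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36")
    print(likely_intercepting_proxy(ua, "TLSv1.2", "AES128-SHA"))
    print(likely_intercepting_proxy(ua, "TLSv1.2", "ECDHE-ECDSA-AES128-GCM-SHA256"))
```

Since the server enforces its own cipher ordering (14:01:34), only the presence or absence of suites in the client's list matters. A genuine modern Chrome always offers AEAD suites, so ending up on a bare AES128-SHA points at whatever actually performed the handshake rather than at the browser named in the UA.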
[14:15:55] I guess they are external checks
[14:16:16] aka somebody checking their internet connectivity against us
[14:16:19] yeah
[14:16:34] I've only seen "Host: varnishcheck" and "Host: $ipaddr"
[14:16:58] we probably broke someone's icinga :)
[14:17:23] should have waited for Saturday evening!
[14:17:29] lol
[14:20:39] bblack: ok to upgrade a text node? (https://phabricator.wikimedia.org/T192368#4153519)
[14:20:47] I think our regex still accepts Host: $ipaddr
[14:20:55] (not that we need to support that, it just happens to)
[14:21:12] (because stupid allowance of leading digits in hostnames!)
[14:21:16] bblack: yes it does (and we do have check_http requests with Host: $ip)
[14:21:29] heh, we probably shouldn't have such checks
[14:21:45] +1
[14:21:51] ema: yes upgrade a text
[14:22:45] that's probably the best rational argument against leading digits in hostnames: confusing naive software that tries to sort out true hostnames from IPs or might treat one as the other.
[14:23:03] here's one: GET http://10.168.19.133:8080/ HTTP/1.1 "Host: 10.168.19.133"
[14:23:26] once you allow leading digits in parsing hostname labels "192.0.2.1" is both a legal hostname and a legal IPv4 addr
[14:26:14] mmh the "Host: $ip" case also is someone's icinga
[14:26:37] nice to see that various people rely on us not messing things up
[14:31:55] upgrading cp3030 then
[14:32:54] btw, for coherence's sake, our varnish 4xx error page should be different from the 5xx error page :)
[15:33:10] netops, Operations, ops-eqiad, Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4158084 (Marostegui)
[15:38:46] netops, Operations, ops-codfw, ops-eqiad: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158122 (Cmjohnson)
[15:53:19] netops, Operations, ops-codfw, ops-eqiad: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158245 (ayounsi)
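The 14:22 to 14:23 point (once hostname labels may start with digits, "192.0.2.1" is both a legal hostname and a legal IPv4 address) suggests checking for IP literals before applying any hostname pattern. This is a minimal sketch assuming a hypothetical RFC 1123-style pattern; `HOST_RE`, `is_ip_literal`, and `classify_host` are illustrative names, not the actual VCL regex or any production code. Bracketed IPv6 literals such as `[::1]:80` are deliberately not handled.

```python
import ipaddress
import re

# Hypothetical Host-header pattern, NOT the production VCL regex: dot-separated
# RFC 1123 labels (which may start with a digit) plus an optional :port.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
HOST_RE = re.compile(rf"^(?:{_LABEL}\.)*{_LABEL}(?::\d+)?$")


def is_ip_literal(host: str) -> bool:
    """True if host parses as an IP address (an IPv4 host may carry a :port)."""
    # Exactly one colon means "name:port"; bare IPv6 has several colons.
    name = host.rsplit(":", 1)[0] if host.count(":") == 1 else host
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def classify_host(host: str) -> str:
    """Classify a Host header value, testing for IP literals first.

    The order matters: "192.0.2.1" also matches HOST_RE, because every one
    of its labels is a digit-leading (and thus legal) hostname label.
    """
    if not host:
        return "empty"
    if is_ip_literal(host):
        return "ip-literal"
    if HOST_RE.match(host):
        return "hostname"
    return "invalid"


if __name__ == "__main__":
    for h in ["192.0.2.1", "10.168.19.133:8080", "varnishcheck", "", "bad_host!"]:
        print(repr(h), classify_host(h))
```

Swapping the two checks in `classify_host` reproduces the naive behaviour described in the log, where an IP address is silently accepted as a hostname.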
[15:54:53] netops, Operations, ops-codfw, ops-eqiad: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158258 (Cmjohnson)
[15:55:36] netops, Operations, ops-eqiad, Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4158271 (Marostegui) >>! In T187962#4119429, @Marostegui wrote: >>>! In T187962#4119423, @jcrespo wrote: >> I would honestly move x1 replica (or the master d...
[15:57:25] netops, Operations, ops-eqiad, Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4158278 (jcrespo) I agree, first one will probably be a direct decommission, but next one could be used for that.
[16:01:47] netops, Operations, ops-codfw, ops-eqiad: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158327 (ayounsi)
[17:12:34] netops, Operations, ops-codfw, ops-eqiad: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158687 (Cmjohnson)
[17:14:57] bblack: now that the atop thingie seems sorted out, can I follow upgrading lvs instances? maybe esams next? :)
[17:46:18] vgutierrez: yes :)
[19:49:50] netops, Operations, ops-codfw, ops-eqiad: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4159243 (ayounsi) a: ayounsi>None
[20:09:32] Traffic, netops, Operations, ops-ulsfo: Rack/cable/configure ulsfo MX204 - https://phabricator.wikimedia.org/T189552#4159296 (ayounsi)
[20:29:39] Traffic, netops, Operations, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4159322 (ayounsi)
[22:15:21] Traffic, netops, Operations, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4159636 (ayounsi)