[01:00:38] Traffic, netops, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2082783 (Dzahn)
[01:01:02] Traffic, netops, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2082348 (Dzahn) {F3510969}
[01:01:11] Traffic, netops, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2082789 (Dzahn)
[05:24:41] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2083273 (Dzahn) I'm interested and happy to be a CC: but Brandon Black should definitely be the TO:
[07:43:23] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2083401 (elukey) Finally we should have a good picture of what changes between 3.0 and 4.0 api from a general overview: * https://www.varnish-cache....
[08:30:55] Traffic, Operations, Performance-Team: Segment Navigation Timing data by continent - https://phabricator.wikimedia.org/T128709#2083445 (ori)
[09:50:31] Traffic, Operations: 3 Varnish cache_upload servers crashed in a short time window - https://phabricator.wikimedia.org/T125401#2083609 (MoritzMuehlenhoff) > IMHO, 4.4.x is getting close anyways, we may as well see if this problem just goes away after the switch to it. Agreed, I'm mostly done with 4.4.3 e...
[10:23:53] Traffic, netops, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2083645 (faidon) Open>Invalid Our BGP sessions with the office are indeed down, but this is an OIT matter, you should ask them about it. If the pro...
[12:37:36] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2084041 (Florian) I would also like to get notified about the progress, but I'm probably the last person to think about t...
[13:33:55] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2084136 (elukey) Summary after today's brainstorming with @ema: * We progressed a bit the remaining unclear points about the parse* functions, some...
[13:34:41] Traffic, netops, Office-IT, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2084138 (Peachey88)
[14:02:11] I
[14:03:07] FYI, I've updated cp1008 (and also some other servers) to openssl 1.0.2g earlier today, seems all fine so far
[14:10:49] thanks!
[14:14:43] will add it to carbon in a bit
[14:18:31] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: VCL source-DC switching: add "direct" capability - https://phabricator.wikimedia.org/T127483#2084248 (BBlack)
[14:18:33] Traffic, Operations, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for 3-tier capabilities and source-DC switching - https://phabricator.wikimedia.org/T127481#2084247 (BBlack)
[14:19:17] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: VCL source-DC switching: add "direct" capability - https://phabricator.wikimedia.org/T127483#2044873 (BBlack)
[14:19:19] Traffic, Operations, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for 3-tier capabilities and source-DC switching - https://phabricator.wikimedia.org/T127481#2044829 (BBlack)
[14:19:55] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Enable VCL source-DC switching via confd - https://phabricator.wikimedia.org/T127482#2084252 (BBlack)
[14:19:57] Traffic, Operations, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for 3-tier capabilities and source-DC switching - https://phabricator.wikimedia.org/T127481#2044829 (BBlack)
[14:20:17] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Enable VCL source-DC switching via confd - https://phabricator.wikimedia.org/T127482#2044847 (BBlack)
[14:20:19] Traffic, Operations, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for 3-tier capabilities and source-DC switching - https://phabricator.wikimedia.org/T127481#2044829 (BBlack)
[14:20:51] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for applayer datacenter-switching - https://phabricator.wikimedia.org/T127484#2084262 (BBlack)
[14:20:53] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Enable VCL applayer datacenter-switch via confd - https://phabricator.wikimedia.org/T127485#2084263 (BBlack)
[14:21:07] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Refactor VCL for applayer datacenter-switching - https://phabricator.wikimedia.org/T127484#2044926 (BBlack)
[14:21:09] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Enable VCL applayer datacenter-switch via confd - https://phabricator.wikimedia.org/T127485#2044954 (BBlack)
[14:21:55] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Enable VCL source-DC switching via confd - https://phabricator.wikimedia.org/T127482#2084268 (BBlack)
[14:21:58] Traffic, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: Switch ulsfo to backend to codfw rather than eqiad - https://phabricator.wikimedia.org/T127492#2084267 (BBlack)
[14:49:47] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2084318 (Ottomata) > Would it be feasible to just reuse varnishncsa and pipe its output (apache like log format) to a piece of software that just par...
[15:06:15] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2084349 (elukey) @Ottomata the code is really optimized as you were saying to map VSL structures to log tags in apache format, but it is essentiall...
[15:37:27] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2084445 (Ottomata) > the code is really optimized as you were saying to map VSL structures to log tags in apache format, but it is essentially what n...
[15:58:36] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2084490 (faidon) >>! In T124278#2084445, @Ottomata wrote: >> the code is really optimized as you were saying to map VSL structures to log tags in apa...
[15:58:52] Traffic, Analytics, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2084491 (Ottomata) Ja! Am happy to help too! I’ve poked around in that code a bit too, so I might be able to offer some help.
[16:51:04] re codfw-switcher stuff: https://phabricator.wikimedia.org/T127481#2084678
[16:51:07] paravoid: ^
[16:59:55] \o/
[16:59:56] you rock
[17:00:00] has everything been merged?
[17:00:18] I saw a flood of patches, need reviews on any?
[17:00:39] everything so far is merged, mostly with lots of help from puppet-compiler :)
[17:01:04] heh ok
[17:01:08] I'm back to a blank slate of pending related patches, I can rest a bit and then go after ipsec, and then go after the per-service eqiad|codfw switches for talking to the applayer
[17:01:11] automated review > manual review anyway :)
[17:02:57] oh I probably need to see what I broke in poor ema's varnish4 patch, too. my refactors tend to abuse its ability to rebase heh.
[18:09:56] Traffic, Analytics-Kanban, Operations, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2085127 (Milimetric)
[18:16:47] ema: v4 patch rebased onto all the merged refactor work again
[18:17:05] (probably imperfectly!)
[18:17:25] bblack: thank you!
[18:18:44] let's see what varnishtest thinks about that :)
[18:24:05] bblack: fantastic, not a single issue
[18:24:26] nifty!
[18:30:34] paravoid: thanks for your comment on T124278!
[18:30:52] so, I've tried using varnishncsa a little on cp2001, and it looks like CPU usage is around 2%
[18:31:20] which is in line with what the various python scripts consume I'd say
[18:31:57] how varnishncsa could go into double-digit CPU usage territory is beyond me :)
[18:34:27] paravoid: in your comment you mentioned 'varnishncsa (with UDP)'. What does that mean?
[19:06:03] Traffic, netops, Office-IT, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2085479 (bbogaert) Thanks Faidon. I'll troubleshoot, and let you know what I find. -Byron
[19:12:59] Traffic, netops, Office-IT, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2085543 (ori) >>! In T128669#2083645, @faidon wrote: > Our BGP sessions with the office are indeed down, but this is an OIT matter, you sho...
[19:26:20] ema: we used to have methods of transporting stats other than kafka, which used UDP
[19:29:08] bblack: oh I see. And how were the stats generated? varnishncsa or something else? Any pointers to find out more?
[19:34:08] Traffic, Operations, Performance-Team: Segment Navigation Timing data by continent - https://phabricator.wikimedia.org/T128709#2085635 (ori)
[19:46:07] ema: it's actually still in our current varnishncsa (as in, in our current varnish3 patches list)
[19:46:12] ema: https://github.com/wikimedia/operations-debs-varnish/blob/3.0.6-plus-wm/debian/patches/0010-varnishncsa-udplog.patch
[19:46:23] we're just not actually using it, so we dumped it from the patches to worry about for v4, I think
[19:47:28] (it being "udplog" outputs)
[19:54:26] oh yes now I remember this one
[20:29:26] Traffic, netops, Office-IT, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2085884 (bbogaert) Hi, What I have found out so far: Routes are being advertised from Monkey Brains: router1# show ip bgp neighbors 208.9...
[20:38:08] Traffic, Operations, Performance-Team: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2085945 (ori) @bblack, @ema: can you confirm that we are on track for having HTTP/2 support on or by May 15th? Also, can you recommend a way for us to project the impact of this change ahead of the actual s...
[20:38:53] Varnish, Operations, RESTBase, Services, Mobile-Content-Service: Enable caching for the Mobile Content Service's RESTBase public endpoints - https://phabricator.wikimedia.org/T113591#2085960 (Pchelolo) Open>Resolved a:Pchelolo Mobile endpoints are cached in varnish and actively purged no...
[20:48:57] Traffic, Operations: Port varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T128788#2085985 (ema)
[20:51:05] Traffic, Operations: Port varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T128788#2086002 (ori)
[20:51:08] Traffic, Operations, Patch-For-Review: Upgrade to Varnish 4: things to remember - https://phabricator.wikimedia.org/T126206#2086001 (ori)
[20:51:18] Traffic, Operations: Port varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T128788#2085985 (ori)
[20:51:20] Traffic, Operations, Patch-For-Review: Evaluate and Test Limited Deployment of Varnish 4 - https://phabricator.wikimedia.org/T122880#2086003 (ori)
[21:01:42] Traffic, Operations, Performance-Team: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2086020 (BBlack) >>! In T96848#2085945, @ori wrote: > @bblack, @ema: can you confirm that we are on track for having HTTP/2 support on or by May 15th? Also, can you recommend a way for us to project the imp...
[21:03:46] Traffic, Operations, Performance-Team: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2086028 (ori) Thanks for the update, @BBlack, and for your work in this space. I think we're on the same page. Let's check in again in April.
[21:25:51] Traffic, Operations, Performance-Team: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2086147 (BBlack) I took another quick 5-minute sample just now in the EU like last time, but different time of day, and currently it's: | Protocol | Percentage | | --- | --- | | h2 only | 3.0% | | spdy3 o...
[21:28:32] bblack: how did you generate this data? https://phabricator.wikimedia.org/T96848#2086147
[21:29:06] ema: you don't want to know, because it's really really really hacky
[21:29:12] :)
[21:29:24] sniffing the handshake?
[21:29:29] well yes
[21:29:57] I wrote a small shellscript, which invokes tcpdump|tshark|perl, where the perl is an ugly long oneliner
[21:30:00] the meat of it is:
[21:30:06] PUSER=nobody
[21:30:06] BPF='dst port 443 and (tcp[((tcp[12:1] & 0xf0) >> 2)+5:1] = 0x01) and (tcp[((tcp[12:1] & 0xf0) >> 2):1] = 0x16)'
[21:30:12] /usr/sbin/tcpdump -Z $PUSER -npi eth0 --direction=in -s 0 -W 1 -G $SECS -w - "$BPF" 2>/dev/null \ | su $PUSER -s /bin/sh -c "/usr/bin/tshark -n -Tfields -e ssl.handshake.extension.type -e ssl.handshake.extensions_alpn_str -E separator=% -r -" \
[21:30:16] | /usr/bin/perl -Minteger -lne '$n = 0;$s = 0; $h = 0; if($_ =~ m,\b13172\b, && $_ =~ m,%$,) { $n=1 } else { if($_ =~ m,spdy/3,) { $s=1 } if($_ =~ m,h2,) {m $h=1 } } if($n) { $x{"s"}++; } else { if($s) { if($h) { $x{"b"}++; } else { $x{"s"}++; } } else { if($h) { $x{"h"}++; } else { $x{"n"}++; } } } END{while(($k,$v)=each %x){print"$v;$k"}}'
[21:30:34] <3
[21:30:52] and then it goes like this:
[21:30:52] root@cp3030:/home/bblack# ./proto.sh 300
[21:30:52] 118860;n
[21:30:53] 94156;b
[21:30:53] 58472;s
[21:30:55] 8454;h
[21:31:04] and then I sum those numbers and divide them out of the total
[21:31:17] n = neither, b = both, s = spdy, h = h/2, in the output
[21:31:27] confusingly, $n means npn in the midst of the perl oneliner too
[21:31:35] yeah I need to paste the oneliner in vim
[21:31:39] (well, npn-only)
[21:32:40] that first conditional with 'if($_ =~ m,\b13172\b, && $_ =~ m,%$,)' means "if NPN was sent, but no ALPN protocol list was sent"
[21:32:54] the other matches are checking the ALPN protocol list
[21:33:14] right the other ones make sense
[21:35:09] hmm the paste looks faulty on my screen anyways, there's a stray 'm' after an '{'
[21:35:16] but you can grab the script from cp3030 in my homedir too
[21:36:10] * ema is looking
[21:40:47] anyways, I didn't invent this for that. There's some puppet-committed utils scripts I did a while back for raw ciphersuite lists from clienthello here: https://github.com/wikimedia/operations-puppet/tree/production/modules/tlsproxy/files/utils
[21:40:57] I stole from that and modified for NPN/ALPN usage
[21:41:05] yeah I've seen the comment
[21:41:22] it's pretty cool though (albeit yes, the oneliner...)
[21:42:00] the cipher_cap + cipher_sim thing lets us capture the full client cipher list from the chellos, and then simulate them against different server-side cipher lists to see how it affects our negotiated-cipher stats
[21:42:24] "Pay no attention to that man behind the curtain"
[21:43:24] this is all really nice
[21:43:58] the oneliner in cipher_cap wasn't so bad. the error was trying to irrationally extend it for the alpn/npn case heh
[21:44:33] yeah I meant the proto oneliner
[21:45:48] anyways, back in early November, I actually ran the cipher version on all the caches for an extended period and gathered up a broad corpus of data to then go simulate against
[21:46:21] we could do something similar with NPN/ALPN, but I donno about doing it routinely via automation, it scares me a little for reasons I can't really rationally back up heh
[21:46:43] in any case, it needs more thought to make sure we're even analyzing the data correctly
[21:51:07] that all reminds me I haven't stared at https://grafana-admin.wikimedia.org/dashboard/db/tls-ciphers much lately
[21:51:22] looks like our FS+AEAD peaks have reached 85% now, which is pretty awesome
[21:51:52] ~ 6 months ago, FS+AEAD peaks were more like 65%
[21:53:01] it would be really nice to be able to publish this data somewhere and keep it updated :)
[21:53:18] grafana's open, it's been linked elsewhere before
[21:53:22] just don't use the -admin hostname
[21:55:37] oh I hadn't seen the ciphers board yet!
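
Editor's note on the proto.sh tally discussed at 21:30-21:33: the classification and the "sum and divide" step can be sketched in Python as below. This is a minimal sketch, not the script on cp3030. It assumes each tshark line has the form `<extension types>%<ALPN protocols>` with multiple values joined by commas (tshark's default aggregator); the function names `classify` and `percentages` are made up for illustration. The counts are the ones pasted at 21:30:52.

```python
NPN_EXTENSION_TYPE = "13172"  # TLS extension number matched by the oneliner


def classify(line: str) -> str:
    """Return 'n' (neither), 'b' (both), 's' (spdy) or 'h' (h2) for one line.

    Assumption: the line is '<ext types>%<alpn protocols>', comma-joined.
    Unlike the perl, which matches against the whole line, this only looks
    for protocol names in the ALPN field.
    """
    ext_field, _, alpn_field = line.partition("%")
    npn_sent = NPN_EXTENSION_TYPE in ext_field.split(",")
    if npn_sent and alpn_field == "":
        # NPN offered but no ALPN list: the oneliner counts this as spdy.
        return "s"
    has_spdy = "spdy/3" in alpn_field
    has_h2 = "h2" in alpn_field
    if has_spdy and has_h2:
        return "b"
    if has_spdy:
        return "s"
    if has_h2:
        return "h"
    return "n"


def percentages(counts: dict) -> dict:
    """Divide each bucket by the total and express it as a percentage."""
    total = sum(counts.values())
    return {key: round(100.0 * value / total, 1) for key, value in counts.items()}


if __name__ == "__main__":
    sample = {"n": 118860, "b": 94156, "s": 58472, "h": 8454}
    for key, pct in percentages(sample).items():
        print(f"{key}: {pct}%")
```

With the pasted counts this works out to roughly 42.5% neither, 33.6% both, 20.9% spdy only and 3.0% h2 only.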
[21:55:43] Traffic, MobileFrontend, Operations, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2086258 (Jdlrobson) >>! In T124356#2080027, @Jd...
[21:56:54] but no spdy/h2
[22:04:21] bblack: where does this data come from? https://grafana.wikimedia.org/dashboard/db/client-connections
[22:05:19] ema: it comes ultimately from: https://github.com/wikimedia/operations-puppet/blob/production/modules/tlsproxy/templates/localssl.erb#L48
[22:06:10] it's questionable data when comparing to other sources, though, as the data is inherently per-connection data (and that's how we see similar things in e.g. CHello captures), but nginx is sending it on towards stats per-request, for multiple reqs/conn.
[22:10:21] bblack: with varnish4 we can group by connection I guess
[22:12:03] yeah, no, I'm tired :)
[22:12:06] :)
[22:13:10] one possibly interesting road to explore might be systemtap for all these things
[22:14:02] if nginx has the right probe points...
[22:22:04] yeah
[22:22:40] well in some cases, it would be whether openssl does
[22:23:09] I was now thinking of h2/spdy
[22:25:34] Traffic, netops, Office-IT, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2086474 (bbogaert) Found problem to be ports for uplinks to ulsfo were in err-disable. These have been recovered (thanks cajoel), and the l...
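
Editor's note on the per-connection versus per-request caveat raised at 22:06:10: a small worked example may make the skew concrete. This is a minimal sketch with entirely hypothetical data; the connection list, the request counts and the function names are invented for illustration and are not taken from the dashboards or from nginx.

```python
from collections import Counter

# Hypothetical sample: (negotiated protocol, requests served on that connection).
CONNECTIONS = [
    ("h2", 40),
    ("h2", 60),
    ("http/1.1", 5),
    ("http/1.1", 3),
    ("http/1.1", 2),
]


def share(counter: Counter) -> dict:
    """Convert raw counts into percentages of the total."""
    total = sum(counter.values())
    return {key: round(100.0 * value / total, 1) for key, value in counter.items()}


def per_connection(conns) -> dict:
    """Each connection counts once, as in a ClientHello capture."""
    return share(Counter(proto for proto, _ in conns))


def per_request(conns) -> dict:
    """Each request counts once, so busy connections are weighted more."""
    counter = Counter()
    for proto, request_count in conns:
        counter[proto] += request_count
    return share(counter)


if __name__ == "__main__":
    print("per connection:", per_connection(CONNECTIONS))
    print("per request:   ", per_request(CONNECTIONS))
```

In this made-up sample h2 is 40% of connections but about 91% of requests, which is the kind of mismatch to expect when a per-connection property is reported once per request.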
[23:17:01] Traffic, Wikimedia-Apache-configuration, DNS, Operations, and 4 others: Move oldwikisource on www.wikisource.org to mul.wikisource.org - https://phabricator.wikimedia.org/T64717#2086727 (TTO)
[23:31:59] Traffic, Operations, RESTBase, Services, and 2 others: Split slash decoding from general percent normalization in Varnish VCL - https://phabricator.wikimedia.org/T127387#2086805 (GWicke) Summary of the latest status: - Path normalization is applied to regular & PURGE requests. - We verified purgin...
[23:47:07] Traffic, Operations: cache_misc's misc_fetch_large_objects has issues - https://phabricator.wikimedia.org/T128813#2086850 (BBlack)
[23:48:33] Traffic, Operations: cache_misc's misc_fetch_large_objects has issues - https://phabricator.wikimedia.org/T128813#2086850 (Smalyshev)