[06:59:21] vgutierrez: so the LVSs are now 11 jessie vs 11 stretch? :)
[06:59:49] yup :D
[06:59:57] nice
[07:02:49] re: T192555, I've been gathering some 10-minute samples to get a rough idea about our current AES128-SHA users
[07:02:50] T192555: Begin execution of non-forward-secret ciphers deprecation - https://phabricator.wikimedia.org/T192555
[07:03:11] after discussing the results with bblack, I need to gather 24h data
[07:04:34] I could go for the hacky/ninja/quick&dirty way but maybe it would be interesting to log that info for 24h in logstash
[07:04:48] ema: is it hard to add a new log there?
[07:06:18] 10Traffic, 10Operations, 10Pybal, 10Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4167254 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` lvs3002.esams.wmnet ``` The log can be found in `/var/lo...
[07:07:34] vgutierrez: currently we're sending logs to logstash with specific python daemons (varnishospital, varnishslowlog)
[07:08:15] so yeah it's not particularly hard, it's a matter of writing a new daemon and installing it on all cp hosts
[07:09:02] for the 10-minute thingie I've been using cumin + timeout/varnishncsa
[07:09:13] but we don't think that's reliable for a whole day
[07:10:14] did you send the info to logstash? Why do you think it's not reliable?
[07:10:35] cumin keeping 91 ssh connections open for 24 hours?
[07:10:57] yes?
[07:11:17] hmm any tcp issue would stop the gathering on some nodes...
[07:11:50] also I was logging into /tmp, and for 10m it's acceptable.. but for 24h we'd need some summarization in place
[07:11:56] right
[07:12:14] is it actually important to get precise numbers or could you get a sample on one node per dc/cluster?
[07:13:05] we're discussing user impact, so I feel more comfortable providing accurate data
[07:13:26] how about adding the info to varnishxcps instead?
[07:13:44] hmmm won't work, I need user agents
[07:13:54] that doesn't fit in prometheus :)
[07:14:54] alright then I'm out of ideas! :)
[07:16:00] I think something like varnishospital/varnishslowlog could fit here
[07:16:12] sgtm
[07:16:28] it's only going to be used for 24h and then reverted, but we are going to need similar stuff in the future
[07:16:45] i.e. when we need to discuss TLSv1.0 deprecation
[07:20:31] can just go for `ensure: stopped` instead of reverting the
[07:20:33] *then
[07:20:55] 10Traffic, 10Operations: Gather 24h data cluster wide of AES128-SHA usage - https://phabricator.wikimedia.org/T193376#4167263 (10Vgutierrez) p:05Triage>03Normal
[07:21:18] right
[07:49:34] vgutierrez: a long-running command with cumin on so many hosts is possible but not advisable... I think two better options are:
[07:49:38] - puppetizing it
[07:50:09] - running it with cumin in the background: 'your-command &> /tmp/foo & exit' (cumin will launch it and exit immediately)
[07:59:07] 10Traffic, 10Operations, 10Pybal, 10Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4167320 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['lvs3002.esams.wmnet'] ``` and were **ALL** successful.
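For reference, a minimal sketch of the varnishncsa-based collection discussed above, with the summarization the 24h run would need. It assumes varnishncsa accepts the usual Apache-style %{Header}i format items and the -n/-F flags; the header name, instance name and tallying approach are illustrative, not the actual daemon that ended up being written.

```
#!/usr/bin/env python
"""Rough sketch: tally TLS connection properties + user agents from
varnishncsa output instead of dumping raw lines to /tmp."""

import collections
import subprocess

# Hypothetical format string: TLS info forwarded by nginx, plus the UA.
FORMAT = '%{X-Connection-Properties}i\t%{User-agent}i'


def collect(varnish_instance='frontend'):
    counts = collections.Counter()
    proc = subprocess.Popen(
        ['varnishncsa', '-n', varnish_instance, '-F', FORMAT],
        stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            props, _, user_agent = line.decode(
                'utf-8', 'replace').partition('\t')
            # Summarize in memory so a 24h run stays small.
            counts[(props.strip(), user_agent.strip())] += 1
    finally:
        proc.terminate()
    return counts


if __name__ == '__main__':
    for (props, ua), n in collect().most_common(20):
        print('%8d %s %s' % (n, props, ua))
```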
[08:09:28] <_joe_> I suggest puppetizing anything that needs to run for longer than a weekly experiment
[08:09:55] <_joe_> anything shorter is ok to run from tmux
[08:18:37] 10Traffic, 10Operations, 10Pybal, 10Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4167357 (10Vgutierrez)
[09:23:23] 10Traffic, 10Operations, 10Pybal, 10Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4167455 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` lvs3001.esams.wmnet ``` The log can be found in `/var/lo...
[10:01:48] elukey: crazy question, what would be the impact of logging TLS information in webrequest?
[10:03:01] vgutierrez: in theory we would need to change the webrequest format itself and store it in HDFS tables etc., so there would be a lot of work to do :)
[10:03:22] ack
[10:03:34] plus the main issue is that we wouldn't have that info logged by varnish (unless nginx inserts a special header)
[10:03:38] but!
[10:04:03] that info is actually logged by nginx + varnish :)
[10:04:24] what we could do instead is find a way to have the TLS info at the varnish level, and then create a special varnishkafka instance only for that, which pushes whatever format we want to Kafka
[10:04:32] ah sorry, didn't know it :)
[10:05:48] maybe it could be interesting to explore that way in the future
[10:08:10] elukey: https://github.com/wikimedia/puppet/blob/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb#L111-L117
[10:08:38] very nice
[10:09:26] so adding a new vk instance is relatively cheap, and even collecting data on a regular basis to HDFS is not that hard (me and Arzhel worked for a bit on collecting netflow data, for example)
[10:10:02] so if you need something like that because regular metrics are not enough, let me know!
[10:10:07] 10Traffic, 10Operations, 10Pybal, 10Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4167518 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['lvs3001.esams.wmnet'] ``` and were **ALL** successful.
[10:10:42] elukey: right now I'm working on sending UA data + TLS data to kibana, it shouldn't be a problem because it's going to match only <0.09% of our traffic
[10:11:14] ack
[10:11:18] but basically I only need to do it because webrequest lacks TLS data :)
[10:11:42] so I was wondering about the actual cost of including that info in webrequest
[10:12:38] with some love on that info we could provide reports on MiTM victims visiting wikipedia and stuff like that
[10:16:20] I can definitely ask my team this question
[10:16:27] thx :)
[10:32:13] 10Traffic, 10Operations, 10Pybal, 10Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4167568 (10Vgutierrez)
[10:32:15] only codfw to complete T191897 \o/
[10:32:16] T191897: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897
[10:32:23] *only codfw missing
[10:32:26] :)
[10:37:53] nice!
[10:39:00] if bblack approves, that could be done on Wednesday
[10:50:25] all tests pass w/ the 4.1 backport of https://github.com/varnishcache/varnish-cache/pull/2555 as well as with the version in master, while backporting the patch to 5.1 makes https://github.com/wikimedia/operations-debs-varnish4/blob/debian-wmf/bin/varnishtest/tests/c00041.vtc#L85 occasionally fail
[10:50:43] it seems to me that we need https://github.com/varnishcache/varnish-cache/pull/2422 too. Backported, rebased and waiting for jenkins
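As a rough illustration of pulling TLS version and cipher out of the X-Connection-Properties header linked above (wikimedia-frontend.vcl.erb), here is a small parser sketch. The exact key names (SSL, C, etc.) and the semicolon-delimited key=value layout are assumptions based on the discussion, not a confirmed format.

```
def parse_connection_properties(value):
    """Return a dict like {'SSL': 'TLSv1.2', 'C': 'ECDHE-...'} from a
    semicolon-delimited key=value header value."""
    props = {}
    for field in value.split(';'):
        field = field.strip()
        if '=' in field:
            key, _, val = field.partition('=')
            props[key.strip()] = val.strip()
    return props


# Example (assumed header layout):
# >>> parse_connection_properties('H2=1; SSL=TLSv1.2; C=ECDHE-ECDSA-AES128-SHA')
# {'H2': '1', 'SSL': 'TLSv1.2', 'C': 'ECDHE-ECDSA-AES128-SHA'}
```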
[10:51:19] https://gerrit.wikimedia.org/r/#/c/429762/ is green which is a good start :)
[10:54:16] yay, success! https://gerrit.wikimedia.org/r/#/c/429440/
[10:54:24] :D
[10:55:03] lunch &
[13:40:49] logging TLS to webreq doesn't sound like a bad idea. but maybe in simplified form so it's just a single string or something (we have a ton of fields).
[13:41:25] bblack: maybe just the CP-Full-Cipher from VC_Log
[13:41:44] well TLS version would be useful too
[13:42:25] hmmm
[13:42:39] in that case, X-Connection-Properties from nginx, I guess it contains everything
[13:43:18] it has some other bits too though, which might be nice to filter on separately if at all
[13:43:21] hmmmm
[13:44:19] BTW, bblack / ema, what's the best way to define a systemd service that only affects the frontend instance? I'm aiming at the varnish::instance define with an if $instance_name == 'frontend' {} block
[13:45:11] I guess I don't understand the context for that q
[13:45:23] you're trying to add a new systemd service that depends on the fe instance, or?
[13:46:19] I'm adding a varnishslowlog/varnishospital-like daemon, but I only want it running and collecting data from the varnish-frontend instance
[13:48:44] I think our existing pattern for that is just to ref the class for your daemon in profile::cache::base, and give it a fixed parameter for the instance name it should connect to, like kafka::webrequest.
[13:48:58] which has this in it:
[13:48:59] $varnish_name = 'frontend'
[13:49:00] $varnish_svc_name = 'varnish-frontend'
[13:49:43] ack
[13:55:57] oh.. even better... varnish::logging :)
[13:57:49] so, in existing webrequest we already have an ";http=1" field (missing if not HTTPS)
[13:57:53] sorry ";https=1"
[13:58:22] we could talk about perhaps repurposing that to specify the protocol
[13:58:42] ";https=TLSv1.0", etc (which will cover future things like DTLS or QUIC)
[13:59:39] and then separately an https_cipher="ECDHE-ECDSA-AES128-SHA" or whatever, which is also not set if https is not set.
[14:00:23] it would preserve their existing query behavior where they're just checking NULL-ness of https, but any queries relying on the explicit value https=1 would need updating to "https IS NOT NULL"
[14:02:53] or we could leave ";https=1" alone as a legacy field to eventually remove, and make a new one for protocol, which has the crypto-layer protocol if https, and "http" otherwise
[14:03:22] ";proto=http" or ";proto=TLSv1.2", etc...
[14:03:43] it's kind of a mixed meaning though, as there are two layers of "protocol" and we could be putting e.g. H/2 info there...
[14:04:09] I donno
[14:33:42] bblack: thoughts on https://gerrit.wikimedia.org/r/#/c/429394/ and related changes? I'd merge those and prepare 5.1.3-1wm8 if you agree
[15:09:26] 10Traffic, 10Operations, 10Goal: Begin execution of non-forward-secret ciphers deprecation - https://phabricator.wikimedia.org/T192555#4168193 (10Vgutierrez) After running several small captures (10-minute lapses over 2 days), we've got the following results: * 56% MiTM victims * 32% deprecated human-operat...
[15:16:19] gilles: hi! I'm looking at varnishlogconsumer.py, nice one
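To make the two webrequest options from the 13:57-14:04 discussion above concrete, here is a toy serialization sketch; the https_cipher and proto field names are only the hypothetical ones floated in that exchange, not anything that exists in webrequest today.

```
def tls_fields_repurposed(tls_proto, tls_cipher):
    """Option 1: repurpose https= to carry the protocol, add a cipher field.
    Queries keying on https=1 would need to become 'https IS NOT NULL'."""
    if tls_proto is None:
        return ''  # plain HTTP: both fields stay unset
    return ';https=%s;https_cipher=%s' % (tls_proto, tls_cipher)


def tls_fields_new_key(tls_proto, tls_cipher):
    """Option 2: leave https=1 alone as a legacy field and add a proto= key
    carrying the crypto-layer protocol for TLS and "http" otherwise."""
    if tls_proto is None:
        return ';proto=http'
    return ';https=1;proto=%s;https_cipher=%s' % (tls_proto, tls_cipher)


# >>> tls_fields_repurposed('TLSv1.2', 'ECDHE-ECDSA-AES128-SHA')
# ';https=TLSv1.2;https_cipher=ECDHE-ECDSA-AES128-SHA'
# >>> tls_fields_new_key(None, None)
# ';proto=http'
```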
I'm looking at varnishlogconsumer.py, nice one [15:16:33] there are a couple pep8 errors apparently [15:16:42] puppet/modules/varnish/files/varnishlogconsumer.py:53:17: E128 continuation line under-indented for visual indent [15:16:52] puppet/modules/varnish/files/varnishlogconsumer.py:55:80: E501 line too long (84 > 79 characters) [15:16:55] puppet/modules/varnish/files/varnishlogconsumer.py:58:80: E501 line too long (82 > 79 characters) [15:18:01] hmmm jenkins pep8 sets the limit in 100 chars per line IIRC [15:18:37] so the offending one it's on line 53 [15:19:15] ema: BTW, i remember you mentioning gilles work on some meeting, maybe it could be adopted for varnishtlsinspector before merging? [15:19:42] vgutierrez: yes so his work is currently here (mediawiki/vagrant) https://gerrit.wikimedia.org/r/#/c/427641/ [15:20:39] once we're happy and it's merged there we need to add it to the puppet repo [15:21:49] ack [15:46:31] ema: +1 on the patch series [15:46:46] ema: I guess we should roll up all of this before trying to improve on the current vcl_hit situation? [15:47:45] bblack: yes, the current (not particularly elaborate!) plan is https://phabricator.wikimedia.org/T192368#4153519 [15:52:03] with s/1wm7/1wm8/ as 4.1.10 was released two days after that comment :) [16:14:39] 10Traffic, 10DNS, 10Operations, 10Release-Engineering-Team, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4168429 (10demon) a:03demon I'll handle this. Should just be a domain swap--no need to bother doing renames... [16:34:52] 10Traffic, 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Deprecate python varnish cachestats - https://phabricator.wikimedia.org/T184942#4168470 (10ema) @Krinkle I've pushed https://gerrit.wikimedia.org/r/429833 to remove varnishmedia, my understanding is that there's only [[ https://grafa... [16:38:56] that reminds me, we should finish pushing numa_networking to rest of caches sometime (complicated by need for downtimes, etc) [16:43:57] bblack: we can perhaps do that together with the varnish wm8 upgrades [16:59:18] ema: vgutierrez: ok, I'll fix the pep8 issues tonight and create the same patch for puppet [17:11:40] gilles: cool :D [17:24:47] 10Traffic, 10Fundraising-Backlog, 10Operations, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561#4168747 (10cwdent) SSL certs are what allow your browser to show you a green bar and guarantee that if you see that, you are talking to the Wikimedia Fo... [17:27:05] 10Traffic, 10Fundraising-Backlog, 10Operations, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561#4168760 (10Ejegg) cwdent we formerly had silverpop-hosted urls in the email links, and lots of people thought they were phishing spam [17:28:51] 10Traffic, 10Fundraising-Backlog, 10Operations, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561#4168775 (10CCogdill_WMF) We used a Silverpop URL for a few months and got enough complaints from donors that our Donor Services team asked us to turn cl... 
[18:47:26] 10Traffic, 10Fundraising-Backlog, 10Operations, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561#4169093 (10cwdent) @Ejegg @CCogdill_WMF ok scratch that idea :)
[21:18:16] 10Traffic, 10Wikimedia-Apache-configuration, 10Operations, 10Patch-For-Review: Remove wildcard vhost for *.wikimedia.org - https://phabricator.wikimedia.org/T192206#4169688 (10EddieGP) a:03Joe Assigning to joe - it seems you're the one most comfortable (or only one comfortable?) on apache changes. Also p...