[00:08:11] Traffic, Operations, Beta-Cluster-reproducible: PHP fatal errors causing Varnish to return 503 - "Junk after gzip data" - https://phabricator.wikimedia.org/T125938#2208836 (BBlack) >>! In T125938#2005042, @BBlack wrote: > In general, it's probably best to disable gzip output compression in the applic...
[03:05:41] HTTPS, Traffic, Operations: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#2208959 (Dzahn) The actual blocker for 2. was that Catchpoint was able to replace almost all features of Watchmouse, _except_ that it doesn't have that kind of status page. So maybe an opt...
[03:13:39] HTTPS, Traffic, Operations: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2208960 (Dzahn) I think we should go with 4. short term. Then for mid/long-term maybe we want to have this redundant, one in each DC and that could possible solve the chicken-eg...
[03:28:14] Traffic, Analytics, DNS, Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2208983 (MZMcBride) Thank you for the explanations and clarifications here. I really appreciate them.
[03:35:12] Traffic, Analytics, DNS, Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2197243 (Dzahn) also see T126281 (i think we should not fix/redirect stats.wikipedia.org, but say that there is just stats.wikimedia.org and this new analytics.wikimedia.org
[03:42:54] netops, Monitoring, Operations: graph interface drops in ganglia - https://phabricator.wikimedia.org/T80515#2209008 (Dzahn)
[03:43:45] netops, Monitoring, Operations: graph interface drops in ganglia - https://phabricator.wikimedia.org/T80515#876457 (Dzahn) Are we still interested in graphing the interface drops in Ganglia nowadays?
[04:01:41] Traffic, Operations, Parsoid, RESTBase, and 3 others: Support following MediaWiki redirects when retrieving HTML revisions - https://phabricator.wikimedia.org/T118548#2209015 (MZMcBride) >>! In T118548#2201380, @GWicke wrote: > We'll initially deploy this without caching for the `?redirect=no` re...
[04:08:32] Traffic, Operations, Parsoid, RESTBase, and 3 others: Support following MediaWiki redirects when retrieving HTML revisions - https://phabricator.wikimedia.org/T118548#2209016 (Pchelolo) >>! In T118548#2209015, @MZMcBride wrote: > The mailing list post mentioned `?redirect=false`. Will any falsey...
[04:09:44] HTTPS, Traffic, Operations: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2209017 (Chmarkine) I suggest we use Let's Encrypt. It can issue SAN certificates. > Can I get a certificate for multiple domain names (SAN certificates)? > Yes, the same certi...
[04:12:18] Traffic, Operations, Parsoid, RESTBase, and 3 others: Support following MediaWiki redirects when retrieving HTML revisions - https://phabricator.wikimedia.org/T118548#2209018 (MZMcBride) >>! In T118548#2209015, @MZMcBride wrote: > I have some vague memory that the value of some URL parameters, wh...
[04:14:57] Traffic, Operations, Parsoid, RESTBase, and 3 others: Support following MediaWiki redirects when retrieving HTML revisions - https://phabricator.wikimedia.org/T118548#2209019 (Pchelolo) > What happens with `?redirect=yes` or other truthy values? It's considered to be true and the redirect happen...
[09:27:46] netops, Monitoring, Operations: graph interface drops in ganglia - https://phabricator.wikimedia.org/T80515#876457 (fgiunchedi) I don't think so, if it is network devices interfaces drops those are in librenms, if it is host interface drop those are in graphite
[09:35:19] HTTPS, Traffic, Operations: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2209325 (faidon) Option 4 sounds like the sanest (and easiest) to me too. apt and mirrors/ubuntu are different services really and might be split in the future (cf. T84817) so I...
[09:49:31] Traffic, Operations, Parsoid, RESTBase, and 3 others: Support following MediaWiki redirects when retrieving HTML revisions - https://phabricator.wikimedia.org/T118548#2209336 (BBlack) I guess the question remains, though: if the Varnish redirect-stripper sees ?redirect with a non-falsey value, sh...
[09:52:09] netops, Monitoring, Operations: graph interface drops in ganglia - https://phabricator.wikimedia.org/T80515#2209339 (akosiaris) Open>declined I concur. Declining
[09:55:58] paravoid: so, the LE suggestion isn't awful either. most of our issue with using LE more-broadly is dealing with our infrastructure/CI, but for the one-off public machines like carbon, it's relatively straight-forward.
[09:56:47] could probably puppetize the automatic cert renewals as a script running from cron once a day that tries to renew when the certs have <= 30d on their expiry. and run the script once manually at the start to get the first certs using webroot protocol stuff.
[09:56:49] have they lifted the top-N domain exception?
[09:57:05] ubuntu/apt/mirrors aren't going to be top-N, right?
[09:57:20] or does wikimedia.org count no matter what subdomain we use?
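The once-a-day renewal check floated at 09:56:47 (renew when the certs have <= 30d left) could look roughly like the sketch below. This is only an illustration, not the script that was later puppetized: the function name `needs_renewal` and the sample dates are hypothetical, and both reading the expiry date off the real cert and calling an actual LE client are left out.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the chat: try to renew once <= 30 days remain.
RENEW_THRESHOLD = timedelta(days=30)


def needs_renewal(not_after, now=None, threshold=RENEW_THRESHOLD):
    """Return True when the time left before `not_after` is <= threshold.

    `not_after` is the cert's expiry as a timezone-aware datetime.
    An already-expired cert also counts as needing renewal.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= threshold


if __name__ == "__main__":
    # Hypothetical cert: issued on a made-up date, valid for 90 days.
    issued = datetime(2016, 4, 15, tzinfo=timezone.utc)
    expiry = issued + timedelta(days=90)
    for day in (0, 59, 60, 61):
        today = issued + timedelta(days=day)
        print(f"day {day}: renew={needs_renewal(expiry, now=today)}")
```

Run daily from cron, a check like this does nothing for most of the cert's lifetime and only triggers the renewal step inside the last 30 days, which leaves plenty of retries if one attempt fails.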
[09:57:29] the original idea it was for the second-level domain, but no clue really
[09:57:40] this is years old information :)
[09:59:32] yeah I'll look into it
[10:00:05] if it "works" for this kind of purpose, it might be nice to go down this road and doc/puppetize how to do it, and eventually replace all our little one-off public servers' certs with LE at least
[10:15:31] hmmm yeah so the mirrors(/ubuntu) part is pretty obvious. the installserver web stuff just sets up a default port 80 listener serving /srv from whatever hostname hits it, presumably including carbon
[10:15:46] but importantly also "apt"
[10:25:28] so I might refactor the installserver + mirrors nginx configs a little, and make those server names explicit to start with, start looking at how to HTTPS-ize them in general regardless of where the certs come from
[10:26:39] then maybe take a stab at one of the names as an LE cert and see if it flies. apparently they don't blanket ban alexa top-1000 anymore, but they have some shorter blacklist they don't publish, some of which came from top-1k, in order to protect high-value sites from being hijacked via abuse of LE or something
[10:26:57] we can probably still contact manually and get ourselves unblacklisted or something if necc
[10:28:08] this is a great test-case though. it lacks the complexities of going through our standard termination, it's not spread over several servers, not in labs, etc. it's the closest sort of thing we have to a simple direct webhost on the internet.
[10:29:07] and really it's low on criticality too, after all, all the current clients use HTTP to reach these anyways, and we can't enforce redirects at this time
[10:29:28] and browser access isn't OMG-critical if we break these certs the first time doing something dumb
[10:42:47] HTTPS, Traffic, Operations: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#2209409 (BBlack)
[11:01:18] Traffic, Operations, Parsoid, RESTBase, and 3 others: Support following MediaWiki redirects when retrieving HTML revisions - https://phabricator.wikimedia.org/T118548#2209442 (mobrovac) >>! In T118548#2209336, @BBlack wrote: > I guess the question remains, though: if the Varnish redirect-stripper...
[11:16:34] HTTPS, Traffic, Operations: enable https for (carbon|ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2209456 (BBlack)
[11:17:05] HTTPS, Traffic, Operations: enable https for (carbon|ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2198925 (BBlack) Added carbon to the list, since that actually is the HTTP hostname we use for some of the access to this service (contents are the same as apt.wm.o though).
[13:22:58] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2209749 (BBlack)
[13:37:16] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2209853 (BBlack) So, carbon now has working certs for apt, mirrors, and ubuntu, from Letsencrypt. I ran the cert generation manually, and that part's not...
[13:41:48] Traffic, Operations, Phabricator, hardware-requests: We need a backup phabricator front-end node - https://phabricator.wikimedia.org/T131775#2209885 (mark) I think a backup Phabricator host in codfw would make a lot of sense, and is something we strive for (nearly) every service, anyway. - cod...
[13:58:33] ema: once your initramfs::hook is in place, we could probably reuse it for the bnx2x thing too
[13:59:16] well maybe
[13:59:30] it doesn't really do an initramfs hook, but it runs update-initramfs after setting a module param
[13:59:48] class interface::rps::modparams in modules/interface/manifests/rps.pp
[14:00:12] be nice to get initramfs consolidated down to just 1 run from all triggers anyways
[14:01:03] right, we could remove the exec from rps.pp and notify Exec['update-initramfs'] instead
[14:07:15] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2209952 (BBlack) @faidon noted on IRC https://github.com/diafygi/acme-tiny might be a better client option, and is debianized already for stretch+
[14:39:35] Traffic, Operations: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2210011 (BBlack) Getting a little closer on the standard front! It's `Submitted to IESG for Publication` and the IESG state is `On agenda of 2016-05-05 IESG telechat // Needs 9 more YES or NO OBJEC...
[15:28:41] mmmh puppet is failing with Invalid resource type initramfs::script at /etc/puppet/modules/base/manifests/initramfs.pp:5
[15:28:54] it worked on a test instance / pcc though...
[15:30:24] oh, it works fine when running it again by hand with puppet agent -tv
[15:30:51] maybe just race condition on merge to masters
[15:32:32] (the merge of commits to the masters aren't transactional - you can end up with a client's catalog state reflecting only part of your change. seems especially prevalent when deleting or adding whole files, esp manifests)
[15:34:33] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2198925 (Krenair) I'd like to see it puppetised for T97593#2115226
[15:34:35] yeah, that explains it then
[15:38:26] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2210442 (BBlack) >>! In T132450#2210430, @Krenair wrote: > I'd like to see it puppetised for T97593#2115226 Yeah me too for a lot of things, but labs will...
[15:46:09] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2210476 (Krenair) >>! In T132450#2210442, @BBlack wrote: > there's no "webroot" to go stuff a file in and have it appear publicly Varnish should be able t...
[15:48:20] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2210480 (Krenair) >>! In T132450#2210476, @Krenair wrote: >>>! In T132450#2210442, @BBlack wrote: >> there's no "webroot" to go stuff a file in and have it...
[16:13:38] HTTPS, Traffic, Operations, Patch-For-Review: enable https for (ubuntu|apt|mirrors).wikimedia.org - https://phabricator.wikimedia.org/T132450#2198925 (Southparkfan) I have a setup where I tell Varnish to redirect all /.well-known/acme-challenge traffic to one backend server (practically any serve...
[16:53:19] Traffic, Analytics, Analytics-Cluster, Operations: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 (or 0.10?) - https://phabricator.wikimedia.org/T121562#2210775 (Nuria)
[18:14:00] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211018 (BBlack)
[18:16:55] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211018 (valhallasw) Some notes on my ideas on how to do this for tool labs are at {T122403}, but that's a more complex scenario than a simple single webserver.
[18:24:36] Traffic, Analytics, DNS, Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2211054 (Nuria) @BBlack This would not be a full-fledged service. What we would be deploying either via puppet of fab is just html/js so we only really need an apache install via pup...
[18:32:11] Traffic, Analytics, DNS, Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2211069 (BBlack) @Nuria - thanks for the details! We still need to sort out an actual place for the js/html to live at in production (which, if it's as simple as it sounds, can probabl...
[18:47:48] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211085 (BBlack)
[18:50:48] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211103 (BBlack) Edited description - having to stop the existing service is a problem for renewals, we still have a challenge to do there. Also, we could support nginx/...
[19:04:30] HTTPS, Traffic, Operations: Preload STS for wikimedia.org - https://phabricator.wikimedia.org/T132685#2211138 (BBlack) Note that T132450 is already resolved in practice. The ticket is just still open because we need to puppetize decent administration of the solution before the certs expire 90 days f...
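Southparkfan's 16:13:38 comment (send all /.well-known/acme-challenge traffic to one backend server) boils down to a path-prefix routing rule. Below is a rough Python model of that decision, not actual VCL and not anyone's real config: the function `pick_backend` and the backend names `acme_backend` and `default_backend` are made up for illustration.

```python
# Path prefix under which ACME HTTP challenge files are requested.
ACME_CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def pick_backend(path, acme_backend="acme_backend", default_backend="default_backend"):
    """Pick a backend name for a request path.

    Challenge requests go to the single (hypothetical) backend that holds
    the challenge files; everything else goes to the normal backend.
    """
    # Ignore any query string when matching the prefix.
    path_only = path.split("?", 1)[0]
    if path_only.startswith(ACME_CHALLENGE_PREFIX):
        return acme_backend
    return default_backend


if __name__ == "__main__":
    for p in ("/.well-known/acme-challenge/some-token", "/ubuntu/dists/"):
        print(p, "->", pick_backend(p))
```

Matching on the full prefix including the trailing slash keeps unrelated /.well-known/ paths on the normal backend.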
[19:08:14] Traffic, Analytics, DNS, Operations: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2211146 (Nuria) @BBlack : ganeti sounds fine as really the majority of the time requests are going to be served by varnish. The fabfile we use to deploy to labs is here: https://github...
[19:13:12] Traffic, Operations, Performance-Team, Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2211153 (ori) >>! In T96848#2203163, @BBlack wrote: > Notable: there's an ongoing report of 1.9.14 causing an HTTP/2 proto error in Chrome. We may need to be wary and stick with .13...
[22:28:37] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211472 (BBlack) Rough notes from thinking about implementation more: ``` # id = uniq id for this cert, e.g. puppet $title # names = foo.wm.o[,bar.wm.o[,baz...]] # mode...
[23:50:13] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211601 (Dzahn) https://github.com/aloyr/acme-tiny-automator "automates deployment of letsencrypt certs using acme-tiny library This relies on the acme-tiny library, do...
[23:57:00] HTTPS, Traffic, Operations: Sort out letsencrypt puppetization for simple public hosts - https://phabricator.wikimedia.org/T132812#2211602 (BBlack) It's similar in scope to what's going on in my paste above, but it's still missing a few bits we'll need on the webserver config chicken/egg thing even f...