[07:42:18] 10Traffic, 10Operations, 10User-notice: Rate limit requests in violation of User-Agent policy more aggressively - https://phabricator.wikimedia.org/T224891 (10ema) Thanks @Legoktm and @Quiddity!
[08:04:15] 10netops, 10Analytics, 10Operations, 10ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10ayounsi) Great, over to @Cmjohnson then!
[10:50:55] 10Traffic, 10Operations, 10Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp3043.esams.wmnet'] ` The log can be found in `...
[11:30:28] 10Traffic, 10Operations, 10Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp3043.esams.wmnet'] ` and were **ALL** successful.
[13:17:50] 10Traffic, 10Operations, 10Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp3039.esams.wmnet'] ` The log can be found in `...
[13:49:22] 10Traffic, 10Operations, 10Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp3039.esams.wmnet'] ` and were **ALL** successful.
[13:50:32] bblack: I've found several non-canonical domains configured in redirects.dat parked in the wrong way, so they won't ever reach the apache servers
[13:51:03] for example wiki-pedia.org
[13:51:32] one way of fixing it could be changing the parking from templates/parking to templates/wikipedia.org
[13:52:25] but I think that it could make more sense to add a specific DNS template zone for domains parked and configured in redirects.dat
[13:52:43] right now we can point those domains to text-lb and after the ncredir service is deployed, just change that zone
[13:57:48] bblack: so.. I guess that we have an easy way of pointing *.wiki-pedia.org to text-lb?
[14:00:54] so
[14:01:13] I think in those cases, they were probably once in the past linked to wikipedia.org or whatever
[14:01:37] the parking template is newer, and I think daniel went through some effort over time to park off unimportant ones
[14:03:02] we probably can/should eventually replace the "parking" zone contents with something that points at ncredir for some generic redirect to wikimedia.org or something, later.
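The template-zone idea above could look roughly like this. Everything in the sketch is an assumption: the file name, the gdnsd-style record syntax, and the geoip resource names are illustrative, not the actual contents of the operations/dns repo:

```
; hypothetical templates/ncredir-parking snippet (names and resources are
; illustrative assumptions, not the real repo layout)
@    600 IN DYNA geoip!text-addrs   ; point parked domains at text-lb for now
*    600 IN DYNA geoip!text-addrs
; once the ncredir service is live, the whole template flips in one place:
; @  600 IN DYNA geoip!ncredir-addrs
```

The appeal of a dedicated template is exactly what vgutierrez notes at 13:52:43: the cutover to ncredir becomes a one-file change instead of per-domain edits.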
[14:03:24] but for now, if it's already in "parking" mode and it happens to also still exist in redirects.dat, IMHO we can just remove the redirects.dat entries as they're currently useless
[14:03:45] right now I've identified the following domains: wiki-pedia.org, wiikipedia.com, wikiepdia.com, wikiepdia.org, voyagewiki.com, voyagewiki.org, wikimediacommons.net, wikimediacommons.info, wikimediacommons.mobi, wikimediacommons.org, wikimediacommons.jp.net, wikimediacommons.co.uk
[14:03:56] right
[14:03:57] those are configured in the redirects.dat and lack proper DNS config
[14:04:10] I can fix them either way
[14:04:15] the lack of proper DNS config is intentional, in the current state of affairs lacking an ncredir service at all
[14:04:28] getting proper A records or getting rid of them in redirects.dat
[14:04:29] 10Traffic, 10Operations, 10Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in esams - https://phabricator.wikimedia.org/T222937 (10ema) 05Open→03Resolved a:03ema cp3039 was the last node in upload@esams still running Varnish. With its upgrade to cp3039, only the defunct cp3037...
[14:04:52] I'd say kill them from redirects.dat for now, and later after ncredir is up and running and we've taken care of important things, we can mass convert all the parked domains to some simple common foundation redirect or whatever.
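For concreteness, the entries under discussion would look something like this in redirects.dat syntax ("rewrite <source> <target>", as quoted later in the log). The exact rules and targets here are guesses; only the domain names come from the conversation:

```
# Parked in DNS, so these rules can never be reached today -- the exact
# targets below are illustrative assumptions, only the domains are real.
rewrite wiki-pedia.org       //www.wikipedia.org
rewrite wiikipedia.com       //www.wikipedia.org
rewrite wikimediacommons.org //commons.wikimedia.org
```

Commenting such lines out (rather than deleting them) preserves the intended mapping for the later ncredir mass-conversion.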
[14:05:05] ok, thx
[14:05:20] (and probably add hundreds to that list at that time, for all the lame delegations we currently own in whois but don't even park)
[14:07:30] pretty nice exercise though, I've created a small script that checks the current redirects.dat behaviour (backed by Apache) and compares it with the behaviour of the compile_redirects() output on an nginx running in a docker container on my laptop
[14:07:46] so far the discrepancies I've found are caused by DNS
[14:17:30] https://gerrit.wikimedia.org/r/c/operations/puppet/+/515080
[14:28:31] <_joe_> vgutierrez: oh that's nice
[14:28:39] <_joe_> what does the script check?
[14:29:49] basically that the behaviour matches between apache & nginx
[14:29:53] <_joe_> vgutierrez: uhm but most of those domains are owned by us
[14:30:02] yup
[14:30:06] <_joe_> so one could argue we should configure the dns
[14:30:07] all of them AFAIK
[14:30:12] <_joe_> rather than removing the funnels
[14:30:16] <_joe_> or
[14:30:27] <_joe_> we can comment them out while we fix the DNS I guess
[14:34:15] _joe_: agree.. let's comment them out instead of removing them
[14:37:51] vgutierrez: still need your patches on traffic-puppetmaster.traffic.eqiad.wmflabs or can I drop them and rebase the prod branch?
[14:38:02] drop them all
[14:38:09] ack
[14:43:15] _joe_: commit amended
[14:46:32] _joe_: https://phabricator.wikimedia.org/P8599 that's the output of my script before fixing some stuff
[14:47:55] <_joe_> interesting that your script fixes some stuff apparently
[14:48:12] :?
[14:48:18] my script just tests stuff
[14:48:32] the DNS vs redirects.dat discrepancies aren't random mistakes though
[14:48:33] <_joe_> no I mean your nginx config
[14:48:46] they're stuff that was in DNS pointed at text-lb and useful from redirects.dat sometime in the distant past
[14:49:01] and over time we moved them to DNS parking as "useless" and didn't bother cleaning up redirects.dat, basically.
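A minimal sketch of the kind of check vgutierrez describes. The real script isn't shown in the log, so the hostnames, ports, and the HEAD-based probe are all assumptions; the core idea is just "ask both backends for the same Host and diff the redirect they return":

```python
import http.client

def location_for(server, port, host, path="/"):
    """HEAD `path` on `server:port` with a Host override, returning
    (status, Location) without following the redirect."""
    conn = http.client.HTTPConnection(server, port, timeout=5)
    try:
        conn.request("HEAD", path, headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def diff_redirects(hosts, apache_probe, nginx_probe):
    """Run both probes for every Host under test and collect the ones
    whose (status, Location) behaviour differs."""
    mismatches = {}
    for host in hosts:
        a, n = apache_probe(host), nginx_probe(host)
        if a != n:
            mismatches[host] = (a, n)
    return mismatches

# Usage sketch (endpoints are illustrative): production Apache vs. the
# local docker nginx serving the compile_redirects() output.
# apache = lambda h: location_for("text-lb.eqiad.wikimedia.org", 80, h)
# nginx  = lambda h: location_for("localhost", 8080, h)
# print(diff_redirects(["wiki-pedia.org", "wikipedia.com"], apache, nginx))
```

Keeping `diff_redirects()` separate from the network probe makes the comparison logic trivially testable, and explains the paste at P8599: any DNS-parked domain shows up as a mismatch because Apache never sees it.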
[14:49:19] <_joe_> bblack: we still own those domains, but we just don't care?
[14:49:24] (because they're too low-traffic to matter or whatever, given we have no secure solution for them)
[14:49:46] we still own those domains, and kinda care, but can't do many useful things about it until later after what valentin is working on now is all deployed and done, basically
[14:49:59] but for initial deploy, no sense changing the status-quo of them from parked-in-DNS
[14:50:43] <_joe_> ok so commenting the lines out seems like the most sensible thing to do?
[14:50:45] (there's also hundreds of other such domains we own, which we don't even bother to create parking zonefiles for and are just lame delegations, to also add at that phase. probably all as generic redirects to some generic WMF landing page)
[14:51:26] you can comment them out I guess, but we'll probably never uncomment them in their current form
[14:52:10] (we'll add them back with the other 700 missing domains en-masse as a generic redirect of a standard type or whatever)
[14:52:47] (or whatever the number is, but it's at least 700-ish, maybe higher)
[14:53:56] what's the source of truth for those ~700 domains?
[14:54:20] the legal team, who manages our registrations (and is responsible for the bulk of the non-canonicals for various trademark/etc reasons)
[14:54:40] they've dumped us a spreadsheet of them before, when we get to this phase we can ask for an updated one and set up a process to stay synced up going forward.
[14:55:54] ack
[14:56:28] so.. compile_redirects() works for nginx so it can be merged
[14:56:47] the discrepancies I've found are basically DNS "issues" and a typo in my nc_redirects.dat
[14:57:09] nc_redirects.dat being a version of redirects.dat containing non-canonical domains only
[14:57:33] so I think I can safely work on the puppetization of the ncredir service
[14:57:37] ok
[14:57:48] sounds awesome :)
[14:58:10] as we discussed yesterday in the -traffic meeting, I'm holding off on the acme-chief/TLS support for compile_redirects() cause it's not a trivial problem
[14:58:46] I don't remember that bit and can't parse it above :)
[14:59:00] so.. without a better name.. I'll call it "ncredirectoid"
[14:59:04] * vgutierrez hides
[14:59:14] yeah we can come up with a name quickly
[14:59:20] what's the TLS support thing above?
[14:59:27] oh
[14:59:42] obtaining a list of SNIs to issue the certificates automatically
[14:59:47] (from a redirects.dat file)
[14:59:48] oh ok
[15:00:12] I'll assume the burden of checking that manually on the first iterations
[15:01:40] yeah it's tricky. I think the root issues are (a) we don't want to duplicate the whole domain list in two places for nc_redirects.dat and the cert-issuance list but (b) if we want sanity of issuance going forward with nc_redirects.dat as our source, we need the parsing of nc_redirects.dat to be predictable (e.g. always parses/generates in file order) and append-only (nobody adding new things in
[15:01:46] the middle of the file)
[15:01:48] which are... interesting constraints.
[15:02:55] even then I'm not sure how we sanely handle all possible future changes without explicit cert-grouping in a config somewhere
[15:03:41] BTW, is there any reason not to change every redirects.dat rule to enforce https://?
[15:03:43] (e.g. what if one of the existing ones needs to change? e.g. we need to change the rules for wiki-pedia.org from "wildcard" to "just root+www", and it causes a push-down of the SNI list of 40 other certs after it in the file?)
[15:04:23] for example, replacing rewrite wikipedia.com //www.wikipedia.org with rewrite wikipedia.com https://www.wikipedia.org
[15:04:32] vgutierrez: no, there's no reason not to... so long as we're talking about nc_redirects.dat + the canonical targets in redirects.dat
[15:04:54] but there is a temporary reason not to, in the case of a current live non-canonical target in the existing redirects.dat
[15:05:02] (because we don't even have https for them, so it would just break the redirect)
[15:05:06] yup
[15:05:31] assuming that the target is a WMF canonical domain
[15:05:34] right
[15:06:10] in some future world I hope to one day inhabit, there will never be questions of which things do or don't support fully-enforced HTTPS :P
[15:15:34] yeah :)
[17:00:04] * elukey off!
[17:00:10] (wrong chan ;)
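The SNI-derivation problem bblack outlines can be sketched like this. The nc_redirects.dat syntax ("rewrite"/"funnel" directives with optional "*." wildcard sources) is inferred from the examples quoted in the log, and this function is an illustration of the ordering concern, not the eventual acme-chief integration:

```python
def sni_list(text):
    """Extract, in file order, the server names certificates for an
    nc_redirects.dat file would need to cover.  File order matters:
    if issuance groups names into certs by position, an insertion
    mid-file pushes every later name into a different cert -- hence
    the append-only constraint discussed above."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments/blanks
        if not line:
            continue
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] not in ("rewrite", "funnel"):
            continue                               # not a redirect rule
        source = parts[1]
        if source.startswith("*."):
            wanted = [source[2:], source]          # root name + wildcard SAN
        else:
            wanted = [source]
        for name in wanted:
            if name not in names:                  # dedupe, keep first-seen order
                names.append(name)
    return names
```

Even with deterministic parsing like this, the 15:02:55 point stands: a rule change such as narrowing wiki-pedia.org from wildcard to root+www alters the emitted list, so some explicit cert-grouping config is probably unavoidable.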