[00:04:04] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11435137 (Dzahn) This change should have been linked here. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215240 (thanks cdanis!) It added...
[00:05:31] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11435160 (Dzahn) This should conclude the box: ` Prepare tcpproxy VMs for accepting traffic on the new public IPs ` on the parent task "Move Ge...
[00:06:50] Traffic, collaboration-services, SRE, Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11435162 (Dzahn) In progress→Resolved from here on anything would be just updating 2 tickets at a time. This is done and if there are s...
[03:21:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1019 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[03:25:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp3073:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=esams&var-instance=cp3073 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:26:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1019 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[03:30:40] FIRING: [6x] VarnishHighThreadCount: Varnish's thread count on cp3067:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:35:40] FIRING: [9x] VarnishHighThreadCount: Varnish's thread count on cp3067:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:55:40] FIRING: [5x] VarnishHighThreadCount: Varnish's thread count on cp3067:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:15:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp3072:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=esams&var-instance=cp3072 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:53:06] netops, Infrastructure-Foundations, SRE: rancid: message has lines too long for transport - https://phabricator.wikimedia.org/T410606#11436260 (cmooney) Resolved→Open Thanks for the work on this @MoritzMuehlenhoff! From what I can see we still have a small number of these mails coming throug...
[14:07:49] Traffic, DNS, Patch-For-Review: Request to create the donate.wikipedia25.org domain + 301 redirect to a donate.wiki page - https://phabricator.wikimedia.org/T408168#11436391 (SCampos-WMF) Hi @Dzahn, just wanted to check donate.wikipedia25.org will still work and redirect to `https://donate.wikime...
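The FermMSS alerts in this log point at the LVS realserver MSS check for IPIP-encapsulated services. Here is a rough worked example of the arithmetic behind such a check. It rests on three assumptions: a 1500-byte MTU, fixed-size headers with no IP or TCP options, and encapsulation that adds exactly one outer IP header of the same address family. The helper name `expected_mss` is hypothetical and is not taken from the alert's implementation.

```python
# Fixed header sizes in bytes, assuming no IP or TCP options.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def expected_mss(mtu: int = 1500, ipv6: bool = False, encapsulated: bool = True) -> int:
    """Largest TCP payload that fits in a single packet of `mtu` bytes.

    Assumption (hypothetical model, not the alert's actual rule): IPIP-style
    encapsulation adds exactly one outer header of the same family as the
    inner packet (IPv4-in-IPv4 or IPv6-in-IPv6).
    """
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    overhead = ip_header + TCP_HEADER
    if encapsulated:
        overhead += ip_header  # extra outer tunnel header
    return mtu - overhead
```

Under these assumptions, an IPv4 service such as 10.2.2.27:80 would clamp to 1440 bytes and an IPv6 one to 1400. The value the alert actually expects is documented on the wikitech LVS page linked from the alert.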
[15:43:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1010 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:48:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1010 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[16:03:51] Traffic, DNS, Patch-For-Review: Request to create the donate.wikipedia25.org domain + 301 redirect to a donate.wiki page - https://phabricator.wikimedia.org/T408168#11436746 (Dzahn) @SCampos-WMF Thanks for pointing this out. Yes, we will leave that unchanged, we will move wikipedia25.org but not...
[18:03:01] jelto: mutante: it looks like maybe gerrit's firewall rules aren't willing to accept https traffic from the caches? I can ping it but get connection timed out to 443
[18:03:39] https://phabricator.wikimedia.org/P86430
[18:08:28] cdanis: I think the current firewall config allows 443 just for the service ip (208.80.154.151 gerrit.wikimedia.org)
[18:08:56] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/gerrit.pp#91
[18:09:30] gerrit has an additional set of public ipv4 and ipv6 , different from gerrit1003.wikimedia.org
[18:18:59] ack, we need to open that up to the caches
[18:19:03] and tcpproxies :)
[18:19:13] (for those on the ssh port)
[18:19:29] I'm having lunch now but will send a patch later if needed
[18:29:52] yep, as far as I can see apache (:::443) and java (:::29418) are binding on both public addresses, so it should be possible to . I'll send a patch in a sec
[18:37:08] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1215673
[18:37:54] I saw the patch before this chat:) just now
[18:37:59] yes, good catch! that is needed
[18:38:47] looking at it
[18:38:49] lgtm
[18:39:00] I think as soon as you ship that, we should be able to have this work instead of timing out:
[18:39:14] 💔cdanis@cp1100.eqiad.wmnet ~ 🕜☕ curl https://gerrit.wikimedia.org/r/ --connect-to ::localhost
[18:40:34] estimated shipping time: 5 min :p
[18:41:28] I'll hand the merge over to mutante and head out to my weekend
[18:41:39] cheers jelto, have a good one
[18:41:45] thx
[18:41:57] I'll have notes for you & Daniel on how to do the Liberica side of things together with Sukhbir next week
[18:42:25] I'll be in Lisbon but will try my best to be accessible for consults if needed
[18:42:50] that's great thank you
[18:46:26] compiled. and want to deploy. having my own network issues it seems. just a minute
[19:25:40] deployed, now a new issue to fix :D
[19:27:49] ack! firewall good. cert issue is next.
[19:30:22] I can also ssh to servers again (something on my side)
[20:24:06] mutante: can you please take a read over the not-yet-checked items & the intermediate acceptance criteria in https://phabricator.wikimedia.org/T411895 and make sure it all makes sense to you?
[20:26:59] cdanis: I was wondering about the varnish patterns.. in "Ensure Varnish VCL includes gerrit.wm.o in any relevant instances of its many hostname regex patterns ...?"
[20:27:21] yeah I was going to quickly check that today
[20:27:26] cool
[20:27:36] but also something traffic could do
[20:28:02] mostly I forget exactly *what's* in there, and we might just be covered for anything relevant by *.wikimedia.org
[20:31:57] the acceptance criteria tests make sense to me. though I think the "Prepare cache_text servers for accepting traffic" I would want to include traffic team / not Friday
[20:32:08] yes
[20:32:14] so, to be clear -- the remaining stuff should happen Monday
[20:32:18] I think including that ATS patch
[20:32:29] liberica I have never touched but it does sound like it makes sense :)
[20:32:43] the one thing here.. I will actually be out on Monday
[20:33:00] ack, if Jelto wants to continue that's great
[20:33:02] so if you guys want to go ahead it would be with Jelto or Sukhe maybe
[20:33:04] or I can on Tuesday
[20:33:08] either way would be coool
[20:33:11] cool!
[20:34:22] and I assume once we get to "people can opt-in to check it out" we should pause for maybe more than just 1 day
[20:34:44] I'll leave that up to you, I think this is the kind of thing it makes sense to ship before the holidays though
[20:34:47] it's really cool that we can offer the opt-in model vs one stressful failover meeting
[20:34:54] yeah!
[20:35:09] before the holidays seems to be a go, indeed
[20:35:31] I was going to switch myself over as soon as it's live in at least one DC :)
[20:35:59] :) thank you! a big yay to where we got already
[20:36:23] thanks for all the work :)
[20:39:30] cdanis: note that the current gerrit addresses are on the cloudvps egress nat exemption list, so assuming we want to keep that then I need to update some network acls and configs before the DNS record is flipped
[20:40:54] taavi: ack! I'll add in a coordination step
[21:21:02] Traffic, collaboration-services, Gerrit: ATS: validate TLS hosts for gerrit (revert workaround that skips validation) - https://phabricator.wikimedia.org/T411904#11437641 (Dzahn)
[21:23:40] Traffic, collaboration-services, Gerrit: ATS/Gerrit: validate TLS hosts for gerrit (revert workaround that skips validation) - https://phabricator.wikimedia.org/T411904#11437644 (Dzahn)
[23:07:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1010 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[23:12:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1010 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
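The 18:03 report ("I can ping it but get connection timed out to 443") is the usual signature of a firewall silently dropping packets. A host with nothing listening on the port would instead answer immediately with a reset. Below is a minimal sketch of telling those cases apart from a client host, using only Python's standard library. The `probe` helper is hypothetical and is not the tooling used in this conversation; the actual check there was the `curl --connect-to` one-liner.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (hypothetical helper).

    Returns "open" if the handshake completes, "refused" if the peer answers
    with a RST (nothing listening), "timeout" if no answer arrives in time
    (typical of a firewall that drops packets), or "error: ..." otherwise.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # DNS failure, unreachable network, etc.
        return f"error: {exc}"
```

For instance, run `probe("gerrit.wikimedia.org", 443)` and `probe("gerrit.wikimedia.org", 29418)` from a cache or tcpproxy host. Before the firewall change they would be expected to return "timeout", and afterwards "open". A TLS certificate problem like the one hit at 19:27 would not show up here, because the probe stops at the TCP handshake.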