[09:08:07] should beta cluster cache hosts still have varnish installed or not? they currently have, not sure if they're still used somehow or it's just because those hosts have not been reimaged in some time
[09:14:44] Majavah: hi! Yes, they should, we do use varnish for in-memory caching: https://wikitech.wikimedia.org/wiki/Caching_overview
[09:31:27] ema: thanks!
[09:32:53] Majavah: yw! wikitech is generally outdated but at least the upper part of that article is valid :)
[09:33:51] netops, SRE: Higher latency on Lumen eqiad/esams link - https://phabricator.wikimedia.org/T277654 (ayounsi) And it's permanent: > As mentioned by George/Lumen Technician. The route had to be changed to accommodate an equipment decommissioning. This is a permanent move. Thank you.
[09:34:39] those hosts are horribly broken, since puppet is failing on them because some varnish config can't be reloaded, which does not exist because confd is failing because they're configured to find etcd using a domain that is not visible on horizon, and the only thing that looks like it's even somewhat related to that points to deployment-confd* host,
[09:34:39] while I just removed a deployment-etcd host as it was on Jessie :/
[09:35:35] oh, is puppet broken on them?
[09:36:11] that deployment-etcd host doesn't even contain the key this confd instance is looking for ://
[09:36:14] yes
[09:36:56] I noticed yesterday that puppet in the 'traffic' cloud project was indeed broken due to various ACLs being added to prod while I was away but not to the (sadly separate) hieradata for the cloud part
[09:37:21] and yes you're right, puppet is broken in deployment-prep too
[09:37:57] I think the varnish failures are caused by confd not working
[09:38:41] confd is configured to use "-srv-domain deployment-prep.eqiad.wmflabs" for discovery, and that does not exist and isn't delegated to deployment-prep on horizon
[09:46:10] Majavah: so, a varnish-frontend restart fixed puppet.
The confd issue wasn't related to the puppet errors, and we arguably shouldn't even use confd on cache nodes in beta given that the backends are statically assigned
[09:50:23] hashar: do you happen to remember why you did this hiera change? https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/e3cb08507cce71b742aeaa977c4c589bdcfd4109%5E%21/#F0
[09:55:18] hmm
[09:56:04] Majavah: apparently that was to unbreak / fix the Varnish 6 upgrade on beta https://phabricator.wikimedia.org/T267561
[09:56:24] if I remember well it was left in a bad shape after some packages got upgraded
[09:56:42] Majavah: https://phabricator.wikimedia.org/T267561#6620301 refers to the addition of that hiera variable
[09:57:14] it was apparently already present at the instance level and I moved it to a prefix puppet so that it applies to any instance named '^deployment-cache.*'
[09:57:59] and there is something further below stating that the variable is required for confd service https://phabricator.wikimedia.org/T267561#6620596
[09:58:55] hashar: those hosts have varnish 5 installed
[09:59:22] doh
[09:59:33] and that task was about having varnish 6 :-\
[10:00:09] https://paste.debian.net/1191065/
[10:00:44] deployment-prep.eqiad.wmflabs does not have etcd/confd discovery records set
[10:01:08] beta.wmflabs.org has, but they refer to an instance that I've never heard of, deployment-conf*
[10:01:29] maybe it never got set up?
[10:01:34] for context: I just migrated deployment-etcd-01 to deployment-etcd02, etcd-01 was Jessie
[10:01:49] deployment-prep.eqiad.wmflabs is not available in horizon
[10:02:21] the current deployment-etcd02 host does not even have the keys that this confd template is asking for
[10:03:57] I made T278007 some time ago about potentially setting up conftool in beta, but haven't acted on it yet
[10:03:58] T278007: Configure etcd/confd/conftool in beta/deployment-prep like production - https://phabricator.wikimedia.org/T278007
[10:14:08] Majavah: neat.
Though I have absolutely no idea what confd/etcd is used for :-\ So I can't really help on that front unfortunately
[10:23:40] hashar: for lots of things :) - for instance the list of cache nodes and their pooled/depooled status is in etcd, both when it comes to cache frontends and cache backends
[10:23:45] eg: https://config-master.wikimedia.org/pybal/esams/text-https
[10:24:38] PyBal on the LVS servers uses that information to configure/update IPVS
[10:26:06] for the cache backend part, instead, there's a thing called confd running on each cache node and reading from etcd another key to get the list of ATS backends (eg: /conftool/v1/pools/esams/cache_text/ats-be)
[10:27:02] this is true for prod: in beta, instead, the list of cache backend nodes is static and simply configured in hiera
[10:27:48] (beta has just one cache node, deployment-cache-text06)
[11:11:35] moritzm: re:https://gerrit.wikimedia.org/r/c/operations/puppet/+/668026 we don't use varnish::setup_filesystem anymore at all, see b3ce6eab
[11:11:47] I'm gonna get rid of it later this afternoon
[11:42:44] ema: even better :-)
[12:21:27] Wikimedia-Apache-configuration, Fundraising-Backlog, SRE, Thank-You-Page, and 3 others: Deal with donatewiki Thank You page launching in apps - https://phabricator.wikimedia.org/T259312 (Pcoombe) Hi, can anyone confirm if this is fixed on iOS or is there further work needed?
[18:52:56] Traffic, DNS, SRE, Abstract Wikipedia team (Phase δ): Establish wikifunctions.org - https://phabricator.wikimedia.org/T275904 (Dzahn) re: "Transfer domain". From where/who to where/who? As of today the domain has WMF as registrant and MarkMonitor as Registrar, so that seems like a transfer is...
[23:43:25] Traffic, SRE, SRE-tools, IPv6, User-crusnov: Some Traffic clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271144 (crusnov) The point of the project is to get as many hosts to have an IPv6 address (and, obviously, to be functional on that address) as we can, and, i...