[07:48:13] greetings
[09:34:47] if you wonder about the toolsdb replag alert, I don't know what caused it but it was kind enough to recover on its own :)
[09:34:56] (probably a query took a bit longer to replicate)
[09:36:00] I was wondering the same yeah, but judging from "innodb pages read" I'd say a db missing a primary key or similar and thus triggering table scans on replication
[09:37:26] like we had a little while ago with that tool I forget the name of, missing the primary key on a table
[10:06:20] yes that's usually the cause for replag, this time it was not too bad, it recovered in about 1 hour
[13:29:27] patch to remove the old ingress-nginx nodes from puppet: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1272714
[13:38:32] could someone please look into https://phabricator.wikimedia.org/T423598 in the coming 2-3 weeks? I was unsure how to tag it in Phabricator
[13:46:35] moritzm: I assigned it to myself
[13:47:25] thanks Andrew
[13:47:36] moritzm: I have an unrelated firewall question which you probably know all about. I'm using a class that sets up its firewall using src_sets => but for my purposes I only want to open the firewall to three hosts (cloudcontrol*)
[13:47:52] I can't just enumerate those hosts because src_sets expects a predefined SETNAME
[13:48:02] you can simply use srange
[13:48:19] and then pass the hostnames, firewall::service will resolve them on the Puppet server side to IPs
[13:48:25] ah, but that will require me to refactor or rewrite the profile which is already being used elsewhere :)
[13:48:42] firewall::service is fully backwards compatible
[13:48:51] if used with ferm, it creates ferm::service
[13:49:06] if used with nftables, it creates nftables definitions
[13:49:32] which class is that? I can have a look, have a meeting soon
[13:49:37] so reading between the lines, you would not like me to add a new SRC_SET definition for just this one case?
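As an aside on the replag diagnosis above: tables missing a primary key (which force replication into full table scans) can be spotted from information_schema. This is a generic MySQL/MariaDB sketch, nothing Wikimedia-specific:

```sql
-- Hypothetical helper query: list user tables that lack a PRIMARY KEY,
-- since row-based replication falls back to table scans on these.
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON  c.table_schema    = t.table_schema
  AND c.table_name      = t.table_name
  AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys');
```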
[13:49:50] profile::zookeeper::firewall
[13:50:51] andrewbogott: I assume this is about https://gerrit.wikimedia.org/g/operations/puppet/+/1eb1ca2844bdb1966e25420963620376fd4d0ab9/modules/profile/manifests/openstack/codfw1dev/designate/service.pp#59?
[13:50:52] it's not that deep of a dependency chain but I was hoping not to rewrite it entirely
[13:51:07] taavi: that's right.
[13:51:10] if so, just replace the profile::zookeeper::firewall call with a firewall rule defined there? (or add srange to that profile)
[13:51:35] yep, there are lots of ways I can modify or work around the existing profile
[13:51:49] adding a set would mean one more place to maintain that list of servers, which seems silly if we already have the list of hosts around in the place where we define the rule
[13:51:53] I just wanted to see if the /proper/ solution is to define myself a src_set restricted to cloudcontrols
[13:52:34] yeah, I agree, unless moritzm thinks that sets are the future of all firewall rules everywhere (which was the question I was trying to get to :) )
[13:56:44] src sets and ferm macros are mostly meant for really common groups of servers we use in multiple places
[13:57:11] since we deploy these to every host and they cause a ferm/nftables restart on every host if a definition changes
[13:57:25] ok. That's /almost/ true here but I won't bother with it if you don't care :)
[13:57:40] so let's just use srange, I can have a closer look at a patch later or tomorrow
[13:58:12] ok! thanks
[13:58:37] also I am out of the loop here clearly, why does designate now use Zookeeper?
[13:58:48] * taavi is not terribly excited about java-based software on cloudcontrols
[13:58:52] I don't need you to review, just wanted to make sure I wasn't undoing some kind of master plan that was happening along with the nftables refactor
[13:59:37] taavi: designate uses tooz which uses a coordination backend, currently memcached.
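For reference, a minimal sketch of the srange-based rule discussed above, assuming the usual shape of the firewall::service define; the resource title, port, and hostnames here are illustrative, not taken from the actual profile:

```puppet
# Hypothetical sketch: open the zookeeper client port only to the
# cloudcontrol hosts, using srange with explicit hostnames rather than a
# predefined src_set. firewall::service resolves the names to IPs on the
# Puppet server side and works with both the ferm and nftables backends.
firewall::service { 'zookeeper-clients':
    proto  => 'tcp',
    port   => 2181,  # standard zookeeper client port
    srange => [
        'cloudcontrol1005.example.wmnet',  # hostnames are placeholders
        'cloudcontrol1006.example.wmnet',
        'cloudcontrol1007.example.wmnet',
    ],
}
```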
But the memcached backend doesn't actually handle node failure properly, so anytime one cloudcontrol goes down designate freaks out and stops updating pdns
[13:59:59] the zookeeper driver is the reference backend for tooz so I'm trying that out to see if it addresses the issue
[14:01:06] here's the full menu: https://docs.openstack.org/tooz/latest/user/drivers.html
[14:01:42] also T422646
[14:01:42] T422646: memcache is a SPOF for designate/tooz coordination - https://phabricator.wikimedia.org/T422646
[14:02:02] I'm not attached to zookeeper specifically although it is already nicely puppetized
[14:03:01] fair
[14:03:16] would that mean getting rid of memcached entirely? or would we then be stuck with both of them?
[14:03:49] probably stuck with both since keystone uses memcached for session cache
[14:06:34] :(
[14:07:17] FWIW I don't recall any zk problems either from jvm or zk itself
[14:07:34] or kafka for that matter, and we do ask quite a bit from it
[14:07:59] my main concern is that cloudcontrols are already full of various different pieces of software that already need to be upgraded all at once, adding java versions to the mix will not make it any easier at least
[14:08:16] * taavi still occasionally wishes for virtualization options for the control plane components
[14:09:41] that's a fair concern yeah, I think if we stick with zk versions shipped with Debian upgrades should be fine, we can look back at how much adaptation zk configs required over the years though IIRC not many
[14:09:52] I don't love adding One More Dependency but at least it won't be just our team using it
[15:25:14] godog: https://phabricator.wikimedia.org/P89878 is the last janky script I used to restart all the webservices, it has some filters to only restart things that weren't restarted during that iteration, which you'll need to adjust
[15:28:04] taavi: nice!
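For context on the tooz switch discussed above: the driver is selected by the URL scheme of the coordination backend, so the designate-side change is essentially one config option. A hedged sketch of what that might look like in designate.conf, with placeholder hostnames:

```ini
# Hypothetical designate.conf fragment: tooz picks its driver from the
# backend_url scheme, so swapping memcached for zookeeper is a small change.
[coordination]
# current setup (memcached driver; no proper failover when a node dies):
#backend_url = memcached://cloudcontrol1005:11211
# zookeeper driver (the tooz reference backend); listing several hosts
# lets coordination survive a single cloudcontrol going down:
backend_url = zookeeper://cloudcontrol1005:2181,cloudcontrol1006:2181,cloudcontrol1007:2181
```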
thank you that's quite useful
[15:51:32] andrewbogott: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1272766 is a WIP; I'll properly test it tomorrow, and after it's merged we can use it for the cloudcontrol hosts
[15:52:08] fancy! thank you