[11:24:09] hello folks!
[11:25:01] I'd like to bounce some ideas around for Kartotherian, so I'll write some thoughts in here. If you have time to review/answer, lemme know :)
[11:25:42] so Kartotherian is currently deployed via scap on mapsXXXX nodes, where we co-locate a postgres cluster (one master, n replicas)
[11:26:02] kartotherian runs only on the replica nodes, and it fetches data via localhost:5432
[11:26:11] so everything is self-contained on the same host
[11:26:37] moving to k8s will mean having something like an LVS endpoint for read-only traffic in front of the postgres replicas
[11:26:49] because I don't see any other feasible solution
[11:27:13] I can't think of any reason why postgres read-only traffic couldn't be exposed via LVS, but maybe I am missing something
[11:27:18] ideas/suggestions?
[11:32:32] elukey: I recall e.ffi looking into this for tegola (https://phabricator.wikimedia.org/T286494) but it seems it did not end up being implemented
[11:33:01] in general I'd try to avoid LVS if possible
[11:35:18] IIRC gitlab uses pgbouncer for this, and there is also pgcat - which I don't know anything about
[11:35:21] jayme: any specific reason why not?
[11:36:18] mainly because netops is basically trying to phase it out, and because of the config overhead that comes with it
[11:37:53] definitely yes, I'll verify with Valentin and netops that the LVS replacement will be able to support this. But the config overhead seems minimal, and it will make future maintenance on the maps nodes easier, in my opinion
[11:38:37] ideally I'd love not to touch kartotherian too much :D
[11:38:43] but the maintenance improvements would be the same with any proxy, no?
[11:39:01] proxy/loadbalancer
[11:39:33] yes sure, but pgbouncer/pgcat would need to be packaged/puppetized/deployed somewhere, etc.
[11:39:42] meanwhile LVS is already there and working
[11:39:47] this is my point :)
[11:39:53] yeah, understood
[11:40:04] and this is not read/write traffic, only read, otherwise I would have avoided it
[11:42:43] is that the same set of postgresql nodes that tegola uses?
[11:44:53] jayme: yeah, after you mentioned it I checked tegola and I noticed a tcp_services_proxy set up for it
[11:45:11] # Temporarily we will use envoy as a L4 tcp proxy until envoy's
[11:45:12] # Postgres proxy filter is production ready
[11:45:14] yeah...I was thinking there's something in envoy
[11:45:50] so yes, the only downside is having to list the maps nodes in deployment-charts
[11:46:06] and depooling means filing a change, deploying, etc.
[11:46:10] yeah..ofc :/
[11:46:31] maaaybe that can be circumvented by leveraging external-services?
[11:46:39] but maybe tegola also needs to write to postgres
[11:46:51] mmm no, those are all replicas, scratch that
[11:47:24] at this point, whatever solution we decide on we'll apply to tegola too
[11:50:16] I'll follow up with Effie to figure out what was done and why, maybe there was a reason to avoid LVS
[11:51:54] sgtm
[12:24:13] another way of doing things might be running your own https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/CloudnativePG/ cluster ?
[12:24:22] in Kubernetes, I mean
[12:29:20] sorry, I meant to send https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/CloudnativePG
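
A rough illustration of the CloudnativePG suggestion just above: a minimal Cluster manifest could look something like the sketch below. The cluster name, namespace, instance count, and storage size are hypothetical placeholders, not an actual maps deployment.

    # Hedged sketch only; names and sizes are placeholders.
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: maps-postgres        # hypothetical cluster name
      namespace: maps            # hypothetical namespace
    spec:
      instances: 3               # one primary plus two read replicas
      storage:
        size: 100Gi              # placeholder sizing
    # The operator automatically creates maps-postgres-rw (primary) and
    # maps-postgres-ro (replicas only) Services; Kartotherian would point
    # at the -ro Service for the read-only use case discussed above.
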
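For comparison, the envoy L4 TCP proxy approach quoted above for tegola boils down to a plain tcp_proxy listener in front of the replicas. The snippet below is a generic, hand-written sketch rather than the actual tegola/puppet configuration; the cluster name and replica hostnames are placeholders.

    # Hedged sketch of an Envoy L4 TCP proxy for postgres read-only traffic.
    static_resources:
      listeners:
        - name: postgres_ro
          address:
            socket_address: { address: 0.0.0.0, port_value: 5432 }
          filter_chains:
            - filters:
                - name: envoy.filters.network.tcp_proxy
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                    stat_prefix: postgres_ro
                    cluster: postgres_replicas
      clusters:
        - name: postgres_replicas
          type: STRICT_DNS
          lb_policy: ROUND_ROBIN
          load_assignment:
            cluster_name: postgres_replicas
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address: { address: maps-replica1.example.wmnet, port_value: 5432 }  # placeholder host
                  - endpoint:
                      address:
                        socket_address: { address: maps-replica2.example.wmnet, port_value: 5432 }  # placeholder host
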
[12:40:23] elukey: good catch on the Istio dep. AFAICT, v1.12.4 would be the latest version that doesn't have that dep on 1.20
[14:00:50] hi folks, I'm now puppet-merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094489, which (in a very stupid way) prevents a misconfiguration of mostly the wikikube clusters. we can do it a better way in the future but this is a fine bandaid for now, since we've over-filled wikikube nodes (and run out of pod IP space) at least twice now, which leads to many interesting and annoying
[14:00:52] failure cases
[14:10:06] On the positive side, that means y'all've migrated a ton of stuff to k8s ;)
[14:31:08] klausman: upgrading istio is an option, I am pretty sure that we'd need to do it anyway to keep up with security fixes etc., so I'd align with what jayme has in mind for the upcoming k8s upgrade (and if no version is targeted yet, we can choose one together).
[14:31:35] but knative + istio + kserve need to be tested before even hitting staging, to narrow down all the incompatibilities
[14:34:13] brouberol: thanks for the link, TIL! In my case I'd just need something to load balance between read replicas, no failover / write traffic etc.
[14:34:43] np!
[14:36:10] elukey: my familiarity with minikube is very limited, I dunno if I can do sufficient testing of the three components on it.
[14:38:08] klausman: I am not an expert either, but it is pretty easy to do, and it allows you to build everything locally and just apply stuff via kubectl. We can even start from the vanilla config from upstream, ideally ending up with an isvc running. If we go ahead with merging images, helm charts, etc., then the upgrade window would need to be reasonably quick, since otherwise any urgent fix/patch for the
[14:38:14] current setup couldn't go through (because it would be blocked by the new version)
[14:38:45] yeah, we need a staging for staging
[14:38:54] that's minikube :)
[15:12:17] Honestly we need a minikube or kind recipe that resembles production closely
[15:12:42] with enough work it’d even let us experiment with calico talking to (virtual) junos
[15:15:10] should not be too hard... a standard kind setup with disableDefaultCNI: true and some values.yaml for our calico helm chart, I suppose
[15:19:41] What would be really amazing is if we could use dcl to run our puppetization of k8s and get a cluster ;)
[15:21:54] * elukey likes Chris' attempt to nerd snipe a wide group of people
[15:22:42] I thought that’s what “special interest group” meant
[15:34:38] ahahahha
[15:38:09] reproducibility? Nah, let's just have fun ;P
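
Following up on the kind idea at 15:15:10: a minimal kind cluster config that disables the default CNI, so that Calico can be installed from a helm chart afterwards, might look like the sketch below. The pod subnet and node layout are illustrative placeholders and would need to be aligned with the calico chart's values.yaml.

    # Hedged sketch; subnet and node roles are illustrative only.
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    networking:
      disableDefaultCNI: true          # skip kindnet so Calico can be installed instead
      podSubnet: 192.168.0.0/16        # placeholder; should match the Calico chart values
    nodes:
      - role: control-plane
      - role: worker

Something like `kind create cluster --config kind-config.yaml`, followed by installing the calico chart with the appropriate values, would then give a local CNI setup somewhat closer to production.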