[00:21:35] cdanis: I wonder if a dev environment similar to the https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo system that WMCS uses would be reasonably possible to maintain for the production cluster(s)?
[00:24:15] https://wikitech.wikimedia.org/wiki/Local-charts could probably be repurposed if that would be a useful start too.
[00:25:44] * bd808 would love to see some convergence in local dev of k8s things at WMF where possible
[11:01:19] Yeah, we can't use Istio 1.20 immediately, it requires Kubernetes 1.25
[11:06:46] Latest Istio we could use with k8s v1.23 is 1.17.8
[11:12:11] it really depends what your goal is
[11:12:36] if you want to prepare for the next upgrade, everything could be tested separately and packed when upgrading (like we did last time IIRC)
[11:13:03] or if you want to make a smaller and more compatible bump right now, and then do another one when the upgrade happens
[11:17:37] Well, at the moment I am looking at either upgrading all of it (including k8s itself) or trying to edge as close to latest with everything except k8s, and then... idk. Upgrading k8s and everything on top seems like a mountain of work.
[11:18:29] I can't even get Istio 1.17 to work on (mini) k8s 1.23 because the docs on the web are "too new" and some of the commands to prove the example app works fail
[11:30:01] well, checking out an old version of the docs and building them made that work
[11:36:32] And Knative v1.12.4 requires v1.26 (it could be overridden, but that seems like shoveling more tech debt onto the pile)
[11:36:43] k8s v1.26, that is
[11:37:47] Upgrading knative to something that we can run now is surely a good testbed to write down an upgrade strategy, which could re-occur periodically to avoid falling behind upstream too much
[11:38:24] but you'll probably have to do another one when upgrading k8s, and IIUC Janis is trying to figure out what could be a good istio target that satisfies all needs
[11:39:40] re: minikube and istio, you can use the binaries that we have to test if your target knative version runs fine, the commands to use are the ones that we usually run (init the cluster etc..)
[11:40:09] you can build your knative images locally, and there should be a minikube option to allow its docker to see them
[11:40:26] and then it is just a matter of kubectl apply-ing some yaml horror to make everything run
[11:40:30] then kserve etc..
[11:40:54] We did it in the past for the sandbox VMs, it should be easily scriptable after the first time
[11:41:14] Minikube lets you set the k8s version, so I've done that, downloaded an older Istio version (1.17.8 for now) and gone through the old docs for installing and testing it.
[11:41:59] I'd just install our own version, 1.17.8 is surely not the final target for the next k8s upgrade
[11:42:07] so you may end up testing something that we won't use
[11:42:38] and use our own docs, it should be way quicker
[11:42:45] I mean, using istioctl etc..
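The local-testing flow described above (pin minikube to the prod-like k8s version, build knative images locally, make minikube's Docker daemon see them, then apply the manifests) could be sketched roughly as follows; the exact patch version and the image name are assumptions, not values from the discussion:

```shell
# Pin minikube to a prod-like Kubernetes version instead of its (very recent)
# default, so local results resemble the current production cluster:
minikube start --kubernetes-version=v1.23.17

# Point the local docker CLI at minikube's internal Docker daemon, so images
# built here are visible to the cluster without pushing to a registry:
eval "$(minikube docker-env)"

# Build a knative image locally (name/tag are hypothetical):
docker build -t knative-serving-controller:dev .

# Then apply the rendered manifests and check that everything comes up:
kubectl apply -f dev-knative-serving.yaml
kubectl get pods -A
```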
[11:43:06] I can give that a try, my current approach is utterly frustrating
[11:44:27] In your current case, I'd proceed as you started doing: leave istio/kserve at the current version and bump knative to something that can run on them, so you'd have only a single variable to play with (knative)
[11:44:51] after the upgrade then you can plan the k8s upgrade, that will require a lot of work, like it happened the last time
[11:45:05] but staging will surely help when the time comes
[11:45:42] When you say that I should use our minikube and istio binaries, where can I find them?
[11:46:25] we have them in our apt repos, together with the cni stuff (but that is something skippable for minikube, to preserve mental sanity)
[11:46:40] https://wikitech.wikimedia.org/wiki/Istio#Istioctl
[11:47:36] basically what I mean is updating https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox/Configuration with something more up-to-date
[11:48:06] and if you make it scriptable, then testing the stack locally will be way easier over time
[11:48:12] it is a pain, I know
[11:48:30] Ah, I thought you meant we had our own minikube binary
[11:49:34] Note that using minikube the way the page suggests will result in it running a very recent version of k8s, and thus things might be smooth when they wouldn't be with the current prod version
[11:49:50] oh nvm, scrolling helps %-)
[12:15:03] elukey: is Calico setup missing from the sandbox page intentionally?
[12:43:35] klausman: yep, for the purpose of the test we don't need to have calico or any bgp config
[12:43:47] everything local on one node
[13:00:24] ack, just making sure
[13:01:01] I think that means that one of the earlier YAMLs needs tweaking, since it contains a ref to Calico, IIRC. I'll investigate and update the docs.
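The "create a manifest for istioctl" step mentioned later in the discussion could look like the sketch below. The file name, the `minimal` profile, and the manifest body are assumptions for illustration, not the actual manifest from deployment-charts:

```shell
# Write a minimal IstioOperator manifest for a single-node test install
# (contents are a placeholder, not the production configuration):
cat > istio-minimal.yaml <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: minimal
EOF

# Install Istio into the running minikube cluster from that manifest:
istioctl install -f istio-minimal.yaml -y

# Sanity-check the installation against the cluster:
istioctl verify-install
```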
[13:29:39] you can deploy a version that doesn't require calico, all good
[13:29:53] it is sufficient to just create a manifest for istioctl
[13:30:10] the one that we have in deployment-charts is not compatible
[14:47:39] I think the dep is in knative or kserve:
[14:47:41] $ grep -li calico dev-k*
[14:47:43] dev-knative-serving.yaml
[14:47:45] dev-kserve.yaml
[14:48:37] So the `helm template` includes calico-related stuff from the existing charts
[14:49:18] Specifically, NetworkPolicies
[15:52:14] but dev-knative-serving.yaml comes from upstream?
[16:02:54] Well, it's generated from the one in our deployment charts
[16:03:36] https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/ML-Sandbox/Configuration#Helm has `helm template "charts/knative-serving" > dev-knative-serving.yaml`
[16:03:43] (and then a sed line to fix the release name)
[16:04:07] That file, as generated, has Calico NetworkPolicy instances
[16:04:55] klausman: yeah IIRC you added those, no?
[16:04:56] It has unfortunately scrolled out of my buffer, but I think that made (parts of) what `kubectl apply -f dev-knative-serving.yaml` does fail
[16:05:07] Yes, a while back
[16:05:44] it is fine, for this test, to just use the knative upstream vanilla yaml shipped with the release
[16:06:00] alright
[16:06:06] that helm template command doesn't take into account any values.yaml file, so we may miss configs etc. already
[16:06:40] I'll also add a note to the wikitech page that, depending on the use case, one should use the upstream chart over the deployment-repo one
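The two problems identified above (Calico NetworkPolicy objects leaking into the rendered manifest, and the wiki's `helm template` invocation ignoring any values file) suggest a regeneration step along these lines; the release name and values path are assumptions based on the chart layout mentioned in the chat:

```shell
# Render the chart with an explicit release name (avoiding the sed fix-up)
# and with the chart's values applied, unlike the bare invocation on the wiki:
helm template knative-serving charts/knative-serving \
    -f charts/knative-serving/values.yaml > dev-knative-serving.yaml

# Check whether Calico NetworkPolicy objects ended up in the output, since a
# single-node minikube test runs without Calico and they would fail to apply:
grep -c 'kind: NetworkPolicy' dev-knative-serving.yaml || true
```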