[04:50:53] elukey: dbstore1005 upgraded
[06:17:12] nice!
[06:17:19] thank you :)
[06:31:48] "Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle."
[06:31:56] never seen that error before
[06:33:18] apparently it is a transient error that goes away on the next run
[06:35:59] yeah, we see them on random hosts from time to time
[08:58:03] akosiaris: `require ::k8s::infrastructure_config` is annoying me in `k8s::kubelet`, because we are now declaring our own /etc/kubernetes/kubeconfig without using that class
[08:58:30] any suggestion on how to work around it?
[09:04:55] depends if it's used on $kubeconfig == '/etc/kubernetes/kubeconfig' the way I see it
[09:06:00] so I guess the easy and lazy way out is an if guard
[09:06:28] but if all envs are overriding kubeconfig (I think they all do, need to audit though), then we can probably just ditch it
[09:06:44] * arturo nods
[09:06:59] lemme have a look
[09:07:15] thanks!
[11:13:09] akosiaris: any news?
[11:36:14] akosiaris: this is my proposal https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/519375
[11:36:50] we could even use `fail()`
[12:22:07] <_joe_> can I say I think you're attacking the problem from the wrong POV? Why is having infrastructure config a problem?
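[Editor's aside: the "if guard" floated at 09:06:00 could look roughly like this. Only the class and parameter names come from the discussion; the guard condition and surrounding structure are assumptions, not the actual patch.]

```puppet
class k8s::kubelet (
  String $kubeconfig = '/etc/kubernetes/kubeconfig',
  # ...other parameters elided...
) {
  # Hypothetical guard: only pull in the shared kubeconfig class when the
  # default path is in use; environments declaring their own
  # /etc/kubernetes/kubeconfig would skip it.
  if $kubeconfig == '/etc/kubernetes/kubeconfig' {
    require ::k8s::infrastructure_config
  }
}
```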
It probably needs to be properly configured to match your needs
[12:22:37] <_joe_> brr that class
[12:23:20] <_joe_> arturo: what I mean is that it should be possible to adapt k8s::infrastructure_config to your needs
[12:23:33] <_joe_> if somehow that's not possible, we should get rid of it completely
[12:23:46] <_joe_> and only check for the presence of the file
[12:23:59] <_joe_> and there is a more elegant way to do that :)
[12:25:13] mmm
[12:26:10] in general, I'm trying to avoid making modifications to the underlying k8s modules
[12:31:51] <_joe_> arturo: looking at your code specifically
[12:31:58] <_joe_> you can just remove the require
[12:32:26] <_joe_> Service['kubelet'] depends on File[$kubeconfig]
[12:32:41] <_joe_> so the catalog compilation will fail unless that file is declared
[12:33:02] ok
[12:34:33] I assume subscribe => implies that requirement
[13:04:21] arturo: yeah, I am working on it. I was detoured a bit while playing with that module https://gerrit.wikimedia.org/r/#/q/topic:k8s_data_types+(status:open+OR+status:merged)
[13:04:39] I don't think just removing it is good enough
[13:04:48] also note it is used in kube-proxy as well
[13:04:55] so just removing it from kubelet is not enough
[13:05:55] arturo: does the require btw break anything? or is it just redundant?
[13:06:29] cause the kubernetes profile/module is full of redundant stuff from the PoV of production that's there just to support Toolforge
[13:07:44] it would be cool to start removing those
[13:31:26] arturo: just to understand this, I am guessing toolforge has/will stop using profile::kubernetes::node?
[13:38:36] https://gerrit.wikimedia.org/r/#/c/519398/ fwiw
[14:34:19] akosiaris: yes, the new code we are writing doesn't use profile::kubernetes::node
[14:36:24] arturo: in that case, https://gerrit.wikimedia.org/r/#/c/519398/ should work for you.
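[Editor's aside: _joe_'s point, and arturo's follow-up about `subscribe`, rest on a Puppet rule: `subscribe` implies the same dependency as `require`, plus a restart of the service when the subscribed resource changes, so the catalog fails unless `File[$kubeconfig]` is declared somewhere. The resource titles below follow the discussion; the rest is an illustrative sketch, not the actual module code.]

```puppet
class k8s::kubelet (
  String $kubeconfig = '/etc/kubernetes/kubeconfig',
) {
  service { 'kubelet':
    ensure    => running,
    # subscribe implies require: the catalog fails unless
    # File[$kubeconfig] is declared by some class, and the service
    # restarts whenever that file changes.
    subscribe => File[$kubeconfig],
  }
}
```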
And it's not hurting production either
[14:37:16] ok
[14:37:45] anyway, the require itself doesn't prevent much, since there is a parameter in the infrastructure_config class that has no default
[14:37:56] so you need to explicitly declare the class somewhere with a proper parameter
[14:38:16] relying on the require doesn't work because it has no default
[14:38:23] the only effect I see is ordering, TBH
[14:39:23] so +1
[14:39:43] will run pcc just in case for the current toolforge deployment
[14:42:27] akosiaris:
[14:42:28] Error: Failed to compile catalog for node tools-worker-1028.tools.eqiad.wmflabs: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Class[K8s::Kubelet]: parameter 'extra_params' expects a String value, got Tuple at /srv/jenkins-workspace/puppet-compiler/17151/change/src/modules/profile/manifests/kubernetes/node.pp:44:5 on node
[14:42:28] tools-worker-1028.tools.eqiad.wmflabs
[14:43:34] it doesn't look related to this patch?
[14:43:54] hmm ok, easy to fix
[14:44:01] no, it's the addition of data types
[14:44:11] I wonder what that value is, though
[14:44:16] ok
[14:44:53] ah, ok, my mistake
[14:45:13] I read daemon_args.concat(@extra_params) wrong
[14:47:06] unrelated but related akosiaris, we are considering introducing kubeadm into puppet to build the cluster. You may have an opinion on that.
[14:47:15] (to build the new toolforge cluster)
[14:47:38] how do you intend to do that? It's not exactly written with puppet in mind
[14:48:14] I mean, you can use puppet to manage kubeadm's config
[14:48:21] we know that. I did some tests some months ago and apparently we only need a pre-shared token for bootstrapping
[14:48:29] but actually run it? not really...
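[Editor's aside: the pcc failure above is the standard shape of a Puppet data-type mismatch introduced by newly typed parameters. A minimal reproduction, with a hypothetical class name (only `extra_params` and the `String` type come from the log; the actual fix akosiaris applied is not shown in the discussion):]

```puppet
# A parameter newly typed as String (as in the k8s_data_types topic)...
class kubelet_example (
  String $extra_params = '',
) {}

# ...rejects an array value at catalog compile time:
class { 'kubelet_example':
  extra_params => ['--foo', '--bar'],
  # => parameter 'extra_params' expects a String value, got Tuple
}
```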
[14:48:29] (which can be stored in hiera or whatever)
[14:49:24] we could have a hiera key to indicate whether we need bootstrapping and only run the kubeadm command in that case by using `exec {`
[14:49:47] anyway, we have 0 puppet code for this, so this would be a brand new effort
[14:50:02] keep in mind it's probably going to time out
[14:50:23] as puppet has maximum execution times etc.
[14:50:41] I highly doubt you will manage to fully automate it (although I would love to be proven wrong)
[14:50:57] we already have very long exec times for some other stuff, like openstack or a simple exec node in SGE
[14:51:06] that being said, getting it to a state where a human has to run 1 command and wait it out should be ok
[14:56:07] arturo: should be fixed now
[14:58:02] it works! thanks akosiaris
[14:59:17] kubeadm in HA mode with external etcd is a great idea
[15:00:46] the hard part is PKI for the cluster certificates
[15:57:19] <_joe_> fsero: yeah, I was reading about it, it seems to be getting better
[15:58:11] <_joe_> I did chalk kubeadm up as a "nice toy" back in the day
[15:59:01] <_joe_> and no, I wouldn't use it via puppet
[15:59:18] <_joe_> I'd rather use puppet to set up the nodes so that kubeadm can act on top of them
[15:59:30] puppet should just put files in place and install packages
[15:59:36] the actual kubeadm dance via cumin
[15:59:40] <_joe_> although I still have doubts about how it gets the kube packages
[15:59:55] docker images
[16:00:07] it means a severe effort of repackaging docker images
[16:46:26] or alternatively you trust the upstream kubernetes repo and run the docker images directly
[16:46:39] for the apiserver, etc.
[16:47:44] using cumin is a good idea, but unfortunately we don't have spicerack inside CloudVPS :-(
[21:08:50] bored???
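[Editor's aside: the hiera-gated `exec` idea from the kubeadm discussion above could be sketched like this. Everything here is hypothetical (class name, hiera keys, token handling); only the general shape comes from the log. The `timeout` attribute addresses akosiaris's concern about Puppet's maximum execution times, and `creates` keeps the bootstrap from re-running.]

```puppet
class profile::toolforge::k8s::bootstrap (
  Boolean $do_bootstrap = lookup('profile::toolforge::k8s::do_bootstrap'),
  String  $token        = lookup('profile::toolforge::k8s::bootstrap_token'),
) {
  if $do_bootstrap {
    exec { 'kubeadm-init':
      command => "/usr/bin/kubeadm init --token ${token}",
      # Run only once: kubeadm writes admin.conf on a successful init.
      creates => '/etc/kubernetes/admin.conf',
      # kubeadm can take several minutes; raise exec's default 300s timeout.
      timeout => 900,
    }
  }
}
```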
review https://gerrit.wikimedia.org/r/c/labs/private/+/519051, it's blocking the hiera reorg, if that's motivation, and I will happily review in return :)
[21:14:40] that's a big diff
[21:27:57] yes, it's all the missing keys currently in the private repo not in labs/private. I think it's mostly harmless, but am really not sure about that
[21:29:28] that's a noble goal.. should fix puppet compiler on a lot of things then
[21:29:49] for production it's harmless, it may break pcc, but not sure about wmcs
[21:30:04] maybe you can separate "add missing private fake password" from the changes to common.yaml etc.
[21:30:08] to make it simpler
[21:30:20] yes, agree it should be harmless for prod
[21:30:35] yes, I'll catch up with wmcs tomorrow
[21:30:47] i think it probably has the biggest impact on them
[21:30:59] oh, i see stuff i added just for testing once
[21:31:25] ok
[21:32:25] stuff like "salt_key" makes me think it should be deleted, btw
[21:32:34] from the prod side, that is
[21:33:01] there shouldn't be salt anymore
[21:33:21] yes, i really have no comment on the data; i used a script to create the patch https://gerrit.wikimedia.org/r/c/labs/private/+/519050
[21:33:25] but that doesn't mean adding it to labs/private would be harmful
[21:33:31] it's good if both sides are matching either way
[21:33:38] ack
[21:34:32] but i have a low-prio idea to add to that script so we can audit these things a bit better