[17:03:38] on a given host that is not puppetserver, how do I look up a key using puppet lookup, as an external command (and not lookup())?
[17:03:48] my lookup fails on the host but it works on puppetserver
[17:04:12] what I am trying to do here is read a key in Python (and not Puppet)
[17:04:52] sukhe@puppetserver1001:~$ sudo puppet lookup --render-as s --compile --node "dns1004.wikimedia.org" "profile::dns::auth::authdns_servers_ips"
[17:04:55] {"dns1004.wikimedia.org"=>"208.80.154.6",
[17:04:57] on dns1004, it fails
[17:04:59] works
[17:05:18] the full context is that I want to read the values, say as in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092324/6/hieradata/hosts/dns4003.yaml
[17:05:51] and while I can put them in data/, I don't want to because I want to make use of the hiera hierarchies
[17:05:54] any thoughts on how to do that?
[17:06:05] you could materialize the hiera into a file
[17:06:18] you can't lookup hiera from a client
[17:06:20] to_yaml on the lookup even
[17:06:23] cdanis: do you mean a flat YAML file?
[17:06:36] volans: I figured as much but how do I achieve this?
[17:06:36] as chris says you can create a config file from it and read it
[17:06:40] I mean, that's kind of up to you
[17:06:55] or any other way puppet can materialize data on the host
[17:06:57] I see, so what you are suggesting is reading the hiera from lookup() but then dumping it to YAML
[17:07:00] hmmm
[17:07:01] defining a puppet file resource where the contents are yaml of the lookup is very reasonable
[17:07:03] yeah
[17:07:18] what's running the script?
[17:07:45] well, the original invocation is https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092324
[17:07:51] in which I was failing Puppet, so that works
[17:08:01] but we decided not to do that, so we will export metrics instead
[17:08:05] ah
[17:08:13] I know where your patch is coming from but I'm not a fan of hardcoding hardware specs into hiera
[17:08:29] volans: where would you define the source of truth?
[17:09:05] specifically, the ones we want there; see https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092324/6/hieradata/hosts/dns4003.yaml
[17:11:50] we've discussed multiple times in the past possible sources of truth for this kind of data and similar ones like partman recipes, raid config, etc. I think we have yet to find a solution, but whatever the solution is, it should be aware of clusters and generations and avoid hardcoding them for each host IMHO
[17:12:21] that would be nice. the idea is not to hardcode for each host though, but just clusters per-site.
[17:12:33] hardcoding 34359738368 doesn't seem a great idea if you want to check that the host has X dimms of Y GB each
[17:12:36] the only reason the host override is in the patch above is to test it out before rolling it out everywhere else
[17:13:03] and besides that, this seems like a provisioning check, not something that should run all the time
[17:13:06] volans: there is an individual per-DIMM calculation too though, which adds up to the total bytes
[17:13:43] volans: we decided we want to do it all the time though, so that bit is intentional
[17:13:55] which is why we are not failing anymore and just alerting
[17:14:03] what I meant is that if tomorrow a dimm breaks and you replace it with another one that has a slightly different number but is still Y GB
[17:14:11] the check would fail while it should pass
[17:15:11] what number do you mean though? I am comparing the size though and that's intentional
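A minimal sketch of the approach suggested above around 17:06-17:07: have Puppet materialize the lookup() result onto the host (a file resource whose content is to_yaml of the lookup), then read that file from Python instead of shelling out to `puppet lookup`. The path /etc/dns/authdns_servers_ips.yaml and the data shape are assumptions for illustration, not anything from the actual patch:

```python
#!/usr/bin/env python3
"""Read a hiera value that Puppet has already materialized on the host.

Assumes a (hypothetical) file resource on the Puppet side along the lines of:
  file { '/etc/dns/authdns_servers_ips.yaml':
    content => to_yaml(lookup('profile::dns::auth::authdns_servers_ips')),
  }
"""
import yaml  # PyYAML

MATERIALIZED_PATH = "/etc/dns/authdns_servers_ips.yaml"  # assumed path


def authdns_servers_ips(path: str = MATERIALIZED_PATH) -> dict:
    """Return the materialized lookup result as a dict of hostname -> IP."""
    with open(path) as fh:
        return yaml.safe_load(fh)


if __name__ == "__main__":
    for host, ip in authdns_servers_ips().items():
        print(f"{host} -> {ip}")
```

This keeps the hiera hierarchy logic on the puppetserver side (where lookup actually works) and leaves the client with a plain file to read.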
[17:15:25] like we want to see if we have 2x16 GB DIMMs and not one 32 GB one
[17:16:56] so while some other source of truth would be nice, until then we are doing hiera-per-cluster-per-site CDN hardware config and DNS hardware config (the DNS HW config being the same)
[17:18:17] (note that in the above, the DIMM value comes from lshw and not /proc/meminfo, which can vary, so the "bytes" should be the total size unless I am missing something)
[17:19:17] what I meant is, are you sure all the dimms will report the same exact size? why not broadly check that they are 32GB instead of checking the exact number
[17:19:23] also more readable in hiera by humans
[17:20:03] sukhe: why do the check *all the time* instead of just @reboot?
[17:20:41] I think that covers all possible scenarios of voluntary or involuntary changes
[17:21:43] volans: lshw should report the exact size, and we want to add the DIMMs up individually to avoid NUMA performance regressions due to imbalanced DIMMs, so we check total system memory by adding up the individual DIMMs
[17:22:10] volans: yeah, so in the first iteration my idea was to only fail() during provisioning, but Traffic is not a fan of it, so no fail, only alerting
[17:22:32] and the regular check is meant to cover HW maintenance and such, as we have observed disk changes and want to be aware of that and alert on it
[17:22:49] HW maintenance requires a reboot
[17:23:40] sure, most does, but we have considered hot-swapping in some cases. but I guess I also don't see why it's an issue to run this check periodically in general
[17:24:12] it's simply going to export the metrics that we alert on, once a day or something.
[17:24:19] and then of course on the first provisioning
[17:24:56] is your concern that it is resource intensive or not required?
[17:30:21] The original problem is to avoid host mismatches when racking/provisioning a new batch of hosts, right?
[17:30:34] as this is what happened
[17:30:34] yes, as we saw in magru
[17:30:43] so it stemmed from there
[17:30:45] this seems waaaay too overkill
[17:30:56] ok. I am curious to hear then how you would do it.
[17:32:24] I'm missing the context: why were they mismatched? how are they identified when unboxing them?
[17:33:53] not sure of those details exactly
[17:34:03] this is the key
[17:34:08] to avoid the problem :)
[17:34:19] well, the mistake happened during racking somehow and that's what we are trying to catch
[17:34:44] because I think, and we need to check again to be extra sure, we have a possible similar mixup in one other site
[17:35:04] so if we are saying cp hosts should be Gold and everything else should be Silver with X amount of RAM, we want to confirm that
[17:35:15] I can't comment on the racking part obviously
[17:35:51] and I also don't think we can fix this during racking, given we will use remote hands in some cases
[17:36:08] so I am still not sure why you are opposed to simply alerting if the HW config doesn't match :P
[17:36:26] the magru mixup cost us a lot of time and money, so that's one way to measure it
[17:45:51] in Italian there is a saying that goes along the lines of "prevention is better than cure". in some cases the first puppet run might even be too late; take the example where the reimage into insetup works fine, the onsite people go back home, and then the issue is discovered with the actual final role later
[17:47:25] that said, I'm not against having some HW source of truth, fully agree on that
[17:47:30] yes. it doesn't cover that case for edge sites at least (other than ulsfo). but then what does is the question
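A rough sketch of the per-DIMM check described around 17:21:43: sum the individual bank sizes reported by lshw and compare both the DIMM count and the per-DIMM size against expected values, which would come from the hiera data materialized as above. The expected numbers here are illustrative only, and the exact JSON layout of `lshw -json` varies between versions, so treat the parsing as an assumption:

```python
#!/usr/bin/env python3
"""Sum individual DIMM sizes from lshw and compare against expected values.

Sketch only: the expected count/size would come from the materialized hiera
config, lshw needs root, and older lshw versions emit slightly different JSON.
"""
import json
import subprocess

EXPECTED_DIMMS = 2                    # illustrative, e.g. 2x16 GB
EXPECTED_DIMM_BYTES = 16 * 1024 ** 3  # illustrative


def dimm_sizes() -> list[int]:
    """Return the size in bytes of every populated memory bank lshw reports."""
    out = subprocess.run(
        ["lshw", "-class", "memory", "-json"],
        check=True, capture_output=True, text=True,
    ).stdout
    data = json.loads(out)
    sizes: list[int] = []

    def walk(node):
        # lshw nests banks under the "memory" node; empty slots have no "size"
        if isinstance(node, list):
            for item in node:
                walk(item)
            return
        if isinstance(node, dict):
            if node.get("id", "").startswith("bank") and "size" in node:
                sizes.append(int(node["size"]))
            walk(node.get("children", []))

    walk(data)
    return sizes


if __name__ == "__main__":
    sizes = dimm_sizes()
    ok = (len(sizes) == EXPECTED_DIMMS
          and all(s == EXPECTED_DIMM_BYTES for s in sizes))
    print(f"{len(sizes)} DIMMs, {sum(sizes)} bytes total, match={ok}")
```

Checking count and per-DIMM size separately (rather than only the grand total) is what catches the "one 32 instead of 2x16" case discussed above.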
[17:47:38] but checking that the CPU hasn't changed every 30m seems silly to me :)
[17:48:03] I don't think it is every 30 mins or so
[17:48:11] isn't it at each puppet run?
[17:48:21] if you worry about everything silly that happens every 1.8e+9 microseconds on a computer you're gonna be worrying about a lot of things
[17:48:22] <_joe_> I have my 2 cents on this topic, neither of you will like them though :)
[17:48:28] that's in the current version, yes. but like I said, we are not doing that, we are alerting
[17:48:33] <_joe_> (volans and sukhe)
[17:48:45] and that alert is not going to be every 30 mins, more like once a day, but yes, alert and automated
[17:48:47] sukhe: so a prometheus metric with the CPU model?
[17:48:49] I also don't see the harm though
[17:49:07] volans: or just a CPU check overall, yes
[17:49:16] volans: i'm guessing a metric that says mismatch or not, probably just as a textfile exporter and systemd timer
[17:49:17] _joe_: let's hear it :)
[17:49:19] <_joe_> The problem you want to solve is misconfiguring/mislabeling hardware in a new POP, right?
[17:49:21] cdanis: yes
[17:49:25] _joe_: yes
[17:49:28] <_joe_> ok so
[17:49:43] <_joe_> this is a precise point in time when you should check
[17:49:49] when you label :D
[17:49:51] <_joe_> so, when the hardware is first racked
[17:49:57] what I asked before
[17:50:07] how to identify the hosts when unboxing
[17:50:09] <_joe_> volans: shut up for a second :)
[17:50:12] btw, my take is, I think we've already spent more engineering time discussing the details of what to do about this than any choice we make here could possibly matter
[17:50:18] <_joe_> ^^
[17:50:36] <_joe_> so, I would go with what the japanese rail company does
[17:50:42] the code is mostly out there but it seemed like volans is really opposed to it and I wanted to know why :)
[17:50:50] <_joe_> have two groups of people independently check the hardware
[17:50:57] <_joe_> at that precise time
[17:51:02] _joe_: not possible with remote hands
[17:51:13] <_joe_> sukhe: I was getting there
[17:51:16] and then the check has to be defined somewhere, and right now it's on a wiki or something
[17:51:37] more density could help... fewer hosts to check, less to go wrong
[17:51:47] <_joe_> if we don't have anyone actually physically installing the racks, maybe we should think of a way to ship the racks preconfigured
[17:52:07] inflatador: more density in PoPs is a reliability nightmare. IMO we're already too dense (or maybe just not-numerous-enough) there
[17:52:24] <_joe_> but AIUI we do send someone from dcops to finish the setup, right?
[17:52:25] that's how it is usually, except that's not what happened in magru for some reason (I am not sure why, and for what I know I am not confident discussing it in a public channel)
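Following the textfile-exporter idea at 17:49:16, a hedged sketch of what the exporting side could look like: a systemd timer runs a script that writes a 0/1 mismatch gauge into node_exporter's textfile collector directory, and alerting keys off that. The metric name, label and directory path are assumptions, not the actual deployment:

```python
#!/usr/bin/env python3
"""Export a hardware-config mismatch gauge for node_exporter's textfile collector.

Sketch only: metric name, label set and collector directory are assumptions;
a real check_hw_config() would wrap the lshw/expected-value comparison above.
"""
import os
import tempfile

TEXTFILE_DIR = "/var/lib/prometheus/node.d"  # assumed textfile collector dir
METRIC = "hw_config_mismatch"                # hypothetical metric name


def write_metric(mismatch: bool, component: str = "memory") -> None:
    """Atomically write the gauge so node_exporter never reads a partial file."""
    body = (
        f"# HELP {METRIC} 1 if detected hardware differs from the expected config\n"
        f"# TYPE {METRIC} gauge\n"
        f'{METRIC}{{component="{component}"}} {int(mismatch)}\n'
    )
    fd, tmp = tempfile.mkstemp(dir=TEXTFILE_DIR, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(body)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600, node_exporter needs to read it
    os.rename(tmp, os.path.join(TEXTFILE_DIR, f"{METRIC}.prom"))


if __name__ == "__main__":
    # e.g. mismatch = not memory_matches_expected()
    write_metric(mismatch=False)
```

An alert on this gauge firing once a day matches the "alert, don't fail puppet" direction discussed above.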
[17:53:06] not enough density sounds like a reliability nightmare too
[17:53:10] <_joe_> sukhe: ack, but my point is basically - once the servers are in the racks and you have oob access, make it an explicit part of the procedure to validate hardware installs
[17:53:19] _joe_: not always; for example, eqsin is all remote hands and I think so are esams and drmrs
[17:53:40] once you have oob you could check those via redfish for example :D
[17:53:50] _joe_: yes, but I think what doesn't help is that we don't have a source of truth anywhere, basically
[17:53:58] it's all on spreadsheets, phab tasks, and pages like https://wikitech.wikimedia.org/wiki/CDN/Hardware
[17:54:23] volans: that still comes with chances of human error though, unless you are piping some predefined values somewhere, and it's still manual
[17:54:43] I think I am beginning to regret asking the lookup() question now :P
[17:55:00] <_joe_> sukhe: tbh I think doing data entry in yet another place will also be a place where human errors can happen :)
[17:55:12] <_joe_> sukhe: you would've gotten my piece tomorrow anyways
[17:55:14] lol
[17:55:24] <_joe_> I already said this to valentin :P
[17:56:24] <_joe_> and yes, cdanis +100 on the fact we're currently underprovisioned at the edge (by not being overprovisioned for our regular traffic)
[17:57:05] we are overprovisioned too in some cases, at least for the DNS hosts, and part of doing the inventory for this was to figure out where/how and what we can do about it
[17:57:20] <_joe_> just this week, these damn AI techbros sucking our images created enough busywork for oncall people to justify my point for me
[17:58:11] sukhe: why manual? like the provision cookbook could access the source of truth and check that the values are correct, for example. anyway, as said, we've discussed this too much.
[17:58:42] sure, that's one idea too, but that would still mean a source of truth.
[17:59:06] volans: yeah, like I said, maybe I should have just asked the question and not given more context :)
[17:59:16] anyway, it's late for you and we can wrap this up
[18:11:58] re: unboxing earlier, it also matters after remote hands work on the site between hw refresh cycles (to revalidate)
[18:21:26] would it be practical to have a bunch of 1TB hypervisors running VMs <=128GB? fewer ports, fewer hosts, etc?
[18:22:40] err... >=128GB that is
[18:23:54] I think at that point it's mostly just math on efficiency and failure risks
[18:24:12] (when considering how large the metal machines should be)
[18:24:54] you probably get more compute/ram/etc per $$ with somewhat-larger-than-normal metal boxes, up to a point. beyond that point they start getting expensive.
[18:25:32] and fewer hosts and fewer network ports, etc. seems nice too, but you want to have all your stuff spread around a sufficient count of metal boxes that you're not creating large failure domains, where one electronics fault wipes out 17% of your capacity or whatever.
[18:25:50] (ditto racking/power/etc-level failure domains)
[18:43:43] yeah, it's all a trade-off for sure. I was just thinking about the large VM stuff as we just split wdqs into 3 services, and the shape of our hardware is sub-optimal ATM
[18:46:46] long-term solution is k8s, of course. we'll get there ;)
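And a sketch of the 17:53:40 idea: once OOB access exists, a provisioning-time check (e.g. in a cookbook) could compare what Redfish reports against the source of truth before the host is ever imaged. This assumes the standard Redfish ComputerSystem schema (MemorySummary.TotalSystemMemoryGiB, ProcessorSummary.Model/Count); the BMC hostname, credentials and expected values below are made-up placeholders, and TLS verification is deliberately simplistic:

```python
#!/usr/bin/env python3
"""Compare CPU model and total memory reported over Redfish with expected values.

Sketch only, using standard Redfish ComputerSystem fields; real code would
handle auth, TLS and per-vendor quirks properly.
"""
import requests


def system_summary(bmc: str, user: str, password: str) -> dict:
    """Return memory/CPU summary of the first ComputerSystem the BMC exposes."""
    auth = (user, password)
    base = f"https://{bmc}"
    systems = requests.get(f"{base}/redfish/v1/Systems", auth=auth, verify=False).json()
    first = systems["Members"][0]["@odata.id"]  # e.g. /redfish/v1/Systems/1
    system = requests.get(f"{base}{first}", auth=auth, verify=False).json()
    return {
        "memory_gib": system["MemorySummary"]["TotalSystemMemoryGiB"],
        "cpu_model": system["ProcessorSummary"]["Model"],
        "cpu_count": system["ProcessorSummary"]["Count"],
    }


if __name__ == "__main__":
    # expected values would come from whatever source of truth gets picked
    expected = {"memory_gib": 32, "cpu_model": "some expected model string", "cpu_count": 1}
    found = system_summary("dns4003.mgmt.example.org", "root", "secret")  # hypothetical BMC
    mismatches = {k: (expected[k], found[k]) for k in expected if expected[k] != found[k]}
    print("OK" if not mismatches else f"MISMATCH: {mismatches}")
```

Run before the first reimage, this would catch a magru-style mixup at racking time, while the periodic in-OS check above covers later drift.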