[15:40:17] herron, jbond42: I got a failure yesterday on the puppet compiler with a query_nodes that failed, returning Undef
[15:40:35] https://puppet-compiler.wmflabs.org/compiler1001/19374/cumin1001.eqiad.wmnet/change.cumin1001.eqiad.wmnet.err for the details
[15:41:42] volans: we discussed that yesterday in the SRE Foundations meeting, see John's entry from https://etherpad.wikimedia.org/p/SRE-Foundations-2019-11-13
[15:41:51] https://phabricator.wikimedia.org/T238053
[15:42:33] doesn't look like that one
[15:43:10] what is failing there is: $homer_peers = query_nodes('Class[profile::homer]').filter |$value| { $value != $::fqdn }
[15:43:41] which afterwards is used as
[15:43:45] private_git_peer => $homer_peers[0],
[15:43:46] ah, yes indeed, unrelated in fact
[15:44:03] volans: i believe the homer nodes were previously failing because they were missing secrets in the labs dir
[15:44:05] parameter 'private_git_peer' expects a Stdlib::Host value, got Undef
[15:44:17] this is not hiera
[15:44:17] if that's now been fixed you may need to run the populate-db script again
[15:46:28] is query_nodes needed for the use case? how dynamic are homer peers expected to be?
[15:47:13] volans: could be similar to this https://phabricator.wikimedia.org/T228266
[15:50:05] herron: I think so, to avoid hardcoding stuff in hiera
[16:01:03] how many home peers are there out of curiosity?
[16:01:12] homer*
[16:01:26] 2
[16:01:37] and in each of them I need the other one
[16:01:47] to replicate a private git repo between them
[16:02:08] so when compiling 1001 I need 2001, and when I compile 2001 I need 1001 as the value
[16:02:25] * volans break, bbiab sorry
[16:04:49] this isn't just because someone needs to re-replicate the facts into the compiler, is it?
[16:08:38] cdanis: no, the facts are fine, it's because the puppetdb running on the compiler hosts doesn't have the exported resources.
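A rough sketch of the pattern being discussed, assuming a class named `homer` with a typed parameter (the exact class and parameter layout are not shown in the log, only the query and the error):

```puppet
# Hypothetical reconstruction of the failing profile code. query_nodes()
# (from the puppetdbquery module) asks PuppetDB for hosts carrying the
# class; on a compiler host whose PuppetDB holds no such catalogs it
# returns an empty array, so $homer_peers[0] evaluates to undef.
$homer_peers = query_nodes('Class[profile::homer]').filter |$value| { $value != $::fqdn }

class { 'homer':
  # Undef here violates the parameter's Stdlib::Host type, producing:
  # "parameter 'private_git_peer' expects a Stdlib::Host value, got Undef"
  private_git_peer => $homer_peers[0],
}
```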
[16:10:02] volans: i have just tried to populate the db on compiler1001 manually and i get the same error. it seems there is a circular dependency here: the nodes that are trying to export their resources are also the nodes that are trying to query for the exported resources. As the catalogue never compiles, the resources never get exported to the db
[16:10:42] haven't we had this situation before though?
[16:10:45] modules/profile/manifests/cumin/master.pp: $cumin_masters = unique(concat(query_nodes('Class[Role::Cumin::Master]'), [$::fqdn]))
[16:10:47] modules/profile/manifests/debmonitor/server.pp: $hosts = unique(concat(query_nodes('Class[Role::Debmonitor::Server]'), [$::fqdn]))
[16:11:32] cdanis: it depends if the policy cares about getting undef. in the examples above, if nothing is returned from the db the array will still have [$fqdn]
[16:11:38] ahhh
[16:11:40] thanks
[16:12:52] although the change is working in production ...
[16:12:55] there are some other usages of query_nodes that don't have defaults, but it looks like those only get referenced by template files
[16:13:31] yes, i think this is causing a problem because it's a class param
[16:24:40] ok, so it looks like the circular dependency did not always exist, which is why i think it was able to work in production https://github.com/wikimedia/puppet/commit/864f858630ea5383eca7e53027ae7d1266340b25
[16:47:36] volans: i have applied https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/550874 and manually updated the db on compiler1001, and it's working now https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/19396/console
[16:58:24] jbond42: unrelated, do you have any idea of the timeframe on which cloud will be using puppet 5.x?
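The defensive pattern quoted from cumin/master.pp and debmonitor/server.pp can be illustrated in isolation. The key point from the log is that the result can never be empty or Undef, even against a PuppetDB with no matching catalogs:

```puppet
# Sketch of the non-Undef pattern from the log. If query_nodes() returns
# an empty array (e.g. on a fresh compiler PuppetDB), concat() still
# appends the local FQDN, and unique() deduplicates it when the query
# also returns this host. The result always has at least one element,
# so indexing [0] or iterating over it is safe.
$cumin_masters = unique(concat(query_nodes('Class[Role::Cumin::Master]'), [$::fqdn]))
```

This is why those two profiles compile on the compiler hosts while the homer change does not: homer indexes into a possibly empty array and passes the result to a typed class parameter.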
[17:00:04] i know it's been worked on https://phabricator.wikimedia.org/T235218, not sure what the timelines are; Krenair is your best bet for timelines
[17:05:08] ahh I see
[17:05:10] thanks
[17:11:22] jbond42: ack, but that's not a good fix, in the sense that if we don't have any it would not install homer
[17:11:26] that is a bit weird
[17:11:42] but yeah I understand the problem
[17:11:44] thinking about it
[17:12:19] yes, wasn't sure if it's what you wanted, just a bandaid to get it working
[17:12:42] yeah, thanks for that
[17:13:07] I don't think there is a solution though, and yeah that thing would not work on first installation of the cluster, my bad
[17:14:11] I don't think there is a clean way to do that in puppet that works for both first installation and subsequent ones
[17:14:12] i guess you want the git peer to be optional, and if it's not present don't install the cronjob to sync the repos???
[17:14:18] that doesn't require a double run at the start
[17:14:30] yes, either way it's a double run i think
[17:14:30] yeah, most likely that's the only option
[17:14:41] and that's bad :(
[17:15:13] I might end up hardcoding them in hiera :(
[17:15:13] if we solve this problem maybe we can also solve the icinga master monitoring the icinga replica ;)
[17:15:29] cdanis: lol
[17:20:00] homer would be in good company with the various other cluster nodes set in hiera
[17:21:24] ahahah
[17:22:52] alert deduplication is a way (IMO the way) to solve master/replica monitoring. if both hosts are actively sending alerts, and alerts are being deduplicated, they could both monitor each other pretty easily
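The "optional peer" idea floated at 17:14:12 could look roughly like this (the class, parameter, paths, and cron command are hypothetical; only the shape is from the discussion):

```puppet
# Hypothetical sketch: make the peer Optional and only manage the sync
# job when a peer is known. On first installation of the cluster the
# catalog compiles with no peer (so homer still gets installed); a
# later run, once both nodes are in PuppetDB, adds the sync cron.
# This is the "double run at the start" trade-off mentioned in the log.
class homer (
  Optional[Stdlib::Host] $private_git_peer = undef,
) {
  if $private_git_peer {
    cron { 'homer-private-git-sync':
      command => "/usr/bin/git -C /srv/homer/private push ssh://${private_git_peer}/srv/homer/private",
      user    => 'root',
      minute  => '*/10',
    }
  }
}
```

The alternative discussed, hardcoding both peers in hiera, avoids the double run entirely at the cost of a static list, which is the trade-off the rest of the conversation settles on.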