[06:50:02] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=deploy1001&service=Keyholder+SSH+agent "CRITICAL: Keyholder is not armed. Run 'keyholder arm' to arm it." Is it ok to do that or there is something more to know about it? I see a lot of things in SAL about deploy1001 but nothing like a reboot
[06:53:53] awwww. seriously ? :( my concern about adding zuul came true i guess. i'll try it
[06:56:38] XioNoX: fixed. yes, it is ok to do it. it will ask for a passphrase. that passphrase is in pwstore deployment-key-passphrase
[06:56:56] yep, just wanted to make sure as it's friday :)
[06:57:02] at least it's just one passphrase nowadays and not multiple different ones anymore
[06:57:59] it probably broke because a new key was added yesterday.. but with a delay
[06:58:17] ok, thanks!
[07:01:33] in the morning icinga round, there is also contint2001 that fails its puppet runs because: E: Unable to locate package blubber
[07:01:55] (and others)
[07:02:01] contint2001 is in downtime for multiple days. is that deleted again ???
[07:02:12] i can't confirm that as an unhandled issue
[07:02:21] as opposed to sodium and the cloud-dev thing
[07:02:55] i see exactly 3 remaining unhandled issues right now
[07:03:09] netbox reports .. as usual
[07:03:26] i fixed sodium and acked cloud-dev to reduce it
[07:03:59] contint2001 is in scheduled downtime as it should be
[07:04:02] mutante: it's more sneaky, it's in that one https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=puppetdb1002&service=Ensure+hosts+are+not+performing+a+change+on+every+puppet+run
[07:05:43] i see. that check. yea, i don't know what to do about that because it's about multiple hosts at once
[07:05:52] dunno what's the best way to handle those alerts, I guess that's why it's warning and not critical
[07:06:12] ideally they should be service check for their respective hosts, but it might be tricky to do
[07:06:17] yea, i agree
[07:06:45] the one on netbox1001 is legit because it git clones netbox-extras on every run
[07:07:02] yeah, I pinged Cas about it
[07:07:04] contint2001 is just broken because it waits for missing packages
[07:07:24] idp-test2001 is a candidate where i think "if test is in the hostname then icinga should ignore it"
[07:07:33] and idp-test I don't want to look at it says test
[07:07:35] haha yeah
[07:08:43] XioNoX: we have done that for "labtest" in the past
[07:08:51] in hiera regex.yaml
[07:09:01] __regex: !ruby/regexp /^labtest/
[07:09:11] do_paging: false
[07:09:11] etc
[07:44:18] <_joe_> tbh that alert should be an advice to the people maintaining the role applied to the host
[08:34:48] what alert?
[08:45:56] <_joe_> Ensure+hosts+are+not+performing+a+change+on+every+puppet+run
[15:10:27] mutante XioNoX, i responded to the CR, i agree that we should just manually pull the repo and ensure => present
[15:11:12] chaomodus: alright, thank you
[15:11:23] i'll update the docs when you merge it
[15:13:37] chaomodus: ok, done :)
[15:13:41] cool :)
[19:51:36] what is the difference between modules/base and modules/standard in operations/puppet?
[19:53:57] good question! :)
[19:54:06] my hunch is that base is more-standard than standard
[19:54:27] (as in base things should apply just about everywhere, while there are more known exceptions/options to things using the standard stuff)
[19:54:30] I would bet that one of the two is not included in wmcs, but I might be wrong
[19:59:02] I believe both are included in wmcs -- at least, I see changes from both teams in both places
[20:01:38] yeah I donno, every time I think I could say something definitive about them, I go do some puppet spelunking and find a counter-argument :)
[20:02:01] 🙃
[20:03:51] the only hard fact I can say, is that profile::standard includes profile::base (via the "standard" class in modules/standard/manifests/init.pp), while profile::base doesn't seem to include profile::standard.
[20:04:16] which seems to roughly align with my presumption that base is more-universal than standard
[20:06:34] is there a pre-existing icinga check for uncommitted local changes in a git repository after a certain time period?
[20:11:36] chaomodus: I only know of one that checks against an origin (e.g. for puppet-merge)
[20:13:02] I'm actually kind of surprised such a thing doesn't exist for the private-private repo
[20:14:07] modules/profile/files/dns/auth/authdns-git-pull has the git incantations you'd need (untracked files, unstaged changes, staged but uncommitted changes)
[20:14:53] chaomodus: happy to review such a check if you write it; you could piece it together pretty easily from ^ and also from modules/monitoring/manifests/icinga/git_merge.pp
[20:23:43] interesting
[20:24:25] thanks for the pointers!
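(editor's note: a minimal sketch of the three states mentioned above: untracked files, unstaged changes, and staged but uncommitted changes. It is not taken from authdns-git-pull or git_merge.pp. It assumes the check shells out to `git status --porcelain` and reads the two-character status prefix. The function names `classify_porcelain` and `repo_state` are hypothetical.)

```python
import subprocess


def classify_porcelain(output):
    """Split `git status --porcelain` (v1) output into three buckets.

    Each line looks like "XY path": X is the index (staged) state,
    Y is the working tree (unstaged) state, and "??" marks untracked.
    """
    result = {"untracked": [], "unstaged": [], "staged": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY " plus a path
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            result["untracked"].append(path)
            continue
        if index_state == "!":
            continue  # ignored files, only listed with --ignored
        if index_state != " ":
            result["staged"].append(path)
        if worktree_state != " ":
            result["unstaged"].append(path)
    return result


def repo_state(repo_dir):
    """Run git in repo_dir and classify its local changes."""
    proc = subprocess.run(
        ["git", "-C", repo_dir, "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return classify_porcelain(proc.stdout)
```

(a file that is both staged and further edited lands in both the "staged" and "unstaged" lists. Paths with unusual characters are quoted by git and are left as-is here.)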
[20:24:26] (another check that centrally checks that all private repos have the same HEAD sha1 would also be a fine idea)
[20:24:36] 👍
[20:24:58] yah i think that's more in the tooling of the like merge magic that i think v.olans was thinking about
[20:26:23] like ideally the repos we're switching to ensure=>present would alert after a few days or something if there are merged gerrit changes that aren't present locally
[20:26:30] and also alert if there are local changes
[20:26:49] (i'll just be looking at the latter but the former needs to happen too, along with the merge tooling:)
[21:00:29] what's the pattern in an icinga check for like 'it's been bad like this for n seconds'
[21:06:59] chaomodus: the usual idiom is to write a high number of retries 🙃
[21:07:14] oy
[21:07:18] i'll just let you pass an age and look at the ctime
[21:07:24] that is also fine, yeah
[21:07:31] not perfect but seems good as a first approximation of what i want
[21:07:37] you can't always determine the time that the badness started from things the check script can see
[21:08:41] how do you mean?
[21:13:07] if instead of inspecting a filesystem, you were calling a remote API, or just checking the current level of some metric, or similar, where there isn't any state
[21:13:10] or history
[21:13:25] then all you have is # of retries / retry duration etc in icinga
[21:14:31] ah
[21:14:43] that makes sense although of course not applicable here.
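(editor's note: a rough sketch of the "pass an age and look at the ctime" idea, not the check that was eventually written. It assumes the oldest ctime among the changed paths is an acceptable stand-in for when the badness started, and that the script reports the usual Nagios/Icinga plugin exit codes 0/1/2. The names `oldest_ctime` and `evaluate` and the two threshold parameters are hypothetical.)

```python
import os

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def oldest_ctime(paths):
    """Return the oldest ctime among paths, or None if none can be stat'ed.

    Paths that no longer exist (e.g. locally deleted files) are skipped.
    """
    ctimes = []
    for path in paths:
        try:
            ctimes.append(os.lstat(path).st_ctime)
        except FileNotFoundError:
            continue
    return min(ctimes) if ctimes else None


def evaluate(dirty_since, now, warn_age, crit_age):
    """Map 'seconds since the repo became dirty' onto a plugin status."""
    if dirty_since is None:
        return OK, "OK: no local changes"
    age = int(now - dirty_since)
    if age >= crit_age:
        return CRITICAL, f"CRITICAL: local changes for {age}s"
    if age >= warn_age:
        return WARNING, f"WARNING: local changes for {age}s"
    return OK, f"OK: local changes for {age}s, below threshold"
```

(a caller would pass `time.time()` as `now`, print the message and exit with the code. As noted in the discussion, this only works because the filesystem keeps a timestamp. A check with no state or history to inspect still has to lean on icinga's retry count and retry interval. Editing an already-dirty file again resets its ctime, so the reported age is only a first approximation.)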