[00:16:56] Anyone know if there's a way to have a `monitoring::graphite_threshold` without an associated critical threshold?
[00:17:24] If not I'll just set the critical threshold absurdly high, but basically I'm looking to add an alert that's just a warning (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/643362 for the patch in question)
[00:18:47] I do see a variable `$nagios_critical` so maybe if that's set to false then the critical doesn't lead to an actual physical page?
[00:24:46] ryankemper: from what I can tell, a critical threshold is necessary for the check command to render properly. `nagios_critical=true` means send a page for this alert
[00:25:20] ryankemper: that's all correct :)
[00:25:33] the word 'critical' is unfortunately very overloaded in this context
[00:25:37] shdubsh: Thanks, that's what I was thinking since the `critical` value gets used in the actual command itself. Thanks, I'll set a `critical` threshold without setting `nagios_critical` to true
[00:25:47] cdanis: ack
[00:25:52] that will just mean it gets reported on IRC, which is likely what you want
[00:26:25] warning > current value > critical means it won't get reported on IRC, but will show up on icinga.wm.o/alerts
[00:27:49] interesting
[15:56:39] Hi SRE types. Can I get a +2 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/642178 please?
[15:56:49] (already tested in production)
[15:58:46] dancy: done
[15:58:59] remind me, wmf-auto-reimage-host does not support VMs yet, or does it if you skip the MGMT FQDN question
[15:59:38] Thanks kormat!
[16:00:02] will do a manual "boot into PXE" for VM, manually removing and re-adding puppet cert, icinga downtime.. to achieve that for a Ganeti VM
[16:00:49] mutante: according to doc, it doesn't: https://wikitech.wikimedia.org/wiki/Ganeti#Reinstall_/_Reimage_a_VM
[16:01:16] jynus: oh, thanks for the right link, somehow missed that
[16:01:21] mutante: no it doesn't
[16:01:30] it is also warned here: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reimage
[16:01:52] ACK, thanks, i'll follow the docs
[16:02:15] that looks like what I was going to do, yep
[16:02:42] to be fair, workflow is highly simplified for vms
[16:03:43] it's all good, that wasn't even meant as a request that we necessarily need the cookbook, more like making sure i do it still the right way
[16:05:33] the only extra thing seems icinga downtime
[16:20:20] well, and the puppet cert revoke/resign
[16:58:50] mutante: would a decom + makevm work in this case or you have to upgrade in place?
[17:08:05] volans: it would work too but i feel like it might take actually a bit longer to get the same result. fine with either
[17:08:20] I don't want to change the name in this case
[17:08:26] ok
[21:59:48] "Watchmouse" has been renamed for like the 5th time. what had many names and was most recently "DX App Synthetic Monitor" will be "Broadcom Service Status", heh. in case anyone will be surprised and still wants the mails
[22:00:28] just because I am reading maint-announce as part of clinic duty and clicked "no action needed". they say it will migrate automatically somehow
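
The alerting behaviour described in the 00:16 to 00:26 exchange about `monitoring::graphite_threshold` can be sketched roughly as below. This is only an illustrative Python model, not the actual Puppet define or Icinga logic. The function names `classify` and `notification`, the state and channel labels, the inclusive threshold comparison, and inferring the check direction from the ordering of the two thresholds are all assumptions made for the example.

```python
def classify(value: float, warning: float, critical: float) -> str:
    """Return "ok", "warning" or "critical" for a metric value.

    Assumption: if critical >= warning, higher values are worse;
    otherwise lower values are worse. Thresholds are treated as
    inclusive, which is also an assumption.
    """
    higher_is_worse = critical >= warning
    if higher_is_worse:
        if value >= critical:
            return "critical"
        if value >= warning:
            return "warning"
    else:
        if value <= critical:
            return "critical"
        if value <= warning:
            return "warning"
    return "ok"


def notification(state: str, nagios_critical: bool = False) -> str:
    """Map a check state to where it surfaces, per the chat above.

    - critical with nagios_critical=True  -> someone gets paged
    - critical with nagios_critical=False -> reported on IRC
    - warning (between the two thresholds) -> visible on the alerts
      web page only, not reported on IRC
    The return labels are hypothetical names for this sketch.
    """
    if state == "critical":
        return "page" if nagios_critical else "irc"
    if state == "warning":
        return "web_only"
    return "none"


if __name__ == "__main__":
    # A value between the thresholds only shows up on the web UI.
    state = classify(50, warning=10, critical=100)
    print(state, notification(state))
    # Past critical, without nagios_critical, it goes to IRC only.
    state = classify(150, warning=10, critical=100)
    print(state, notification(state, nagios_critical=False))
```

Under this model, the approach settled on in the chat (set a `critical` threshold but leave `nagios_critical` unset) means the worst case is an IRC report rather than a page.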