[08:29:13] anyone have a trick for mass removal of downtimes?
[08:30:31] write a cookbook :-)
[08:30:42] your definition of "trick" needs.. improving. :)
[08:30:43] I think the only way is the GUI, where you can tick off multiple ones
[08:31:22] kormat: I do!....let them expire
[08:32:37] like if you navigate to the full host list, then tick the affected servers and select "Remove Downtimes (..)" under "Commands for checked host(s)"
[08:34:20] hmm. i'll have a look.
[08:35:08] ahh, this is much better. thanks moritzm :)
[08:43:42] victory \o/ https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&style=hostdetail&hostprops=1 will show all hosts that are downtimed. bookmarking this
[12:26:46] I have a shell one-liner as well kormat
[12:30:07] cost: 0.25 souls
[12:31:20] it might have been lost to the sands of time, actually, if icinga1001 has been decommed (haven’t checked yet)
[12:39:04] still alive and kicking
[12:53:32] godog: poor prometheus2003 sent a mail 13 mins ago saying it was unable to fork
[12:55:26] :( indeed poor prometheus2003
[12:55:54] looks like it was due to a heavy query, mitigations are part of o11y OKRs this quarter tho
[12:58:29] godog: out of interest - do you have a dashboard that shows this clearly?
[13:00:17] kormat: yeah e.g. cluster overview for prometheus/codfw, https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&from=now-1h&to=now&var-site=codfw&var-cluster=prometheus&var-instance=All&var-datasource=thanos
[13:00:27] kormat: also 'prometheus server' https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?orgId=1&refresh=1m&var-Prometheus=%5E.*$&var-RuleGroup=All&var-datasource=codfw%20prometheus%2Fops&from=now-1h&to=now
[13:03:46] godog: mm. i'm not seeing a graph there that makes it clear that the machine hit max procs
[13:04:03] oh, or maybe it just ran out of ram?
[13:05:03] yeah ok, that's what it was
[13:11:31] it looks like prom node exporter 0.17.0 _just_ missed out on having metrics for number of processes: https://github.com/prometheus/node_exporter/pull/950/files
[13:12:37] :(
[18:02:31] mutante: I'm looking at bd5ce0aceaeba2a3dece91c9d36a0c9c1492928a and I see a renamed (or replaced?) argument:
[18:02:31] $puppetmasters = hiera('puppetmaster::servers'),
[18:02:40] replaced with
[18:02:40] + Hash[String, Puppetmaster::Backends] $servers = lookup(puppetmaster::servers),
[18:02:44] is that on purpose or a typo?
[18:03:48] do you mean hiera() vs. lookup(), or quotes vs. no quotes?
[18:04:07] oh wait, neither, you probably mean $puppetmasters vs. $servers, sorry
[18:04:28] yeah
[18:04:43] It looks like it was on purpose but I have a spot where the change wasn't tracked which made me wonder
[18:05:01] I guess I'll hope that it didn't change format as well
[18:08:42] andrewbogott: it looks like it's my change but it's actually not. but yea, it was on purpose
[18:08:57] ok, I'll try to fix in the class that calls it
[18:09:08] https://gerrit.wikimedia.org/r/c/operations/puppet/+/633215
[18:09:30] "we can infer the workers with `$servers[$facts['fqdn']]`"
[18:12:34] andrewbogott: yes, that looks correct. and no, format has not changed
[18:12:49] thx
[18:12:54] it's still getting it from the same key in hiera
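
For reference on the rename discussed above, a minimal sketch of a calling class consuming the renamed parameter. The class name here is made up and Puppetmaster::Backends is assumed to be a type alias already defined in the repo; only the hiera key and the $servers[$facts['fqdn']] lookup come from the change itself.

# Sketch only: hypothetical consumer of the renamed parameter.
class profile::puppetmaster_consumer_example (
  # Same hiera key as before; only the parameter name changed from
  # $puppetmasters to $servers, the data format did not.
  Hash[String, Puppetmaster::Backends] $servers = lookup('puppetmaster::servers'),
) {
  # Infer this host's workers from the hash, which is keyed by frontend fqdn.
  $workers = $servers[$facts['fqdn']]

  notify { 'puppetmaster workers':
    message => "workers for ${facts['fqdn']}: ${workers}",
  }
}
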
[19:36:11] jbond42: re: nmap (too many channels, not sure where the backlog was). but looks like it was already installed manually in multiple places:
[19:36:15] https://debmonitor.wikimedia.org/packages/nmap
[19:37:01] we should probably avoid doing that and make them official
[19:38:19] the only actual puppet classes installing it are a role pentest::tools which doesn't seem to be used anywhere and diffscan itself
[19:39:13] somewhat tempted to remove unpuppetized packages
[23:19:32] no page at 22:00 today :)
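
Following up on the nmap discussion above, a minimal sketch of what "making it official" could look like. The profile name is made up and this is only one possible shape; ensure_packages() from puppetlabs-stdlib is used so it does not clash with the existing declarations in diffscan or role pentest::tools.

# Sketch only: hypothetical profile to puppetize nmap instead of manual installs.
class profile::pentest_tools (
  # Set to 'absent' via hiera on hosts where a one-off install should be removed.
  Enum['installed', 'absent'] $nmap_ensure = 'installed',
) {
  ensure_packages(['nmap'], { 'ensure' => $nmap_ensure })
}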