[12:28:08] godog: I assume I can drop the upstart file from prometheus-pdns-rec-exporter for stretch, right?
[12:32:40] cc moritzm
[13:09:47] sorry, was in an interview, only saw it just now; yes, all Upstart jobs you find anywhere in our git repos can be removed now :-)
[13:35:13] :-) cool
[15:04:40] godog cdanis: do you have plans to move some data from icinga to prometheus using an nrpe exporter like https://www.robustperception.io/nagios-nrpe-prometheus-exporter?
[15:08:32] elukey: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=stat1007&service=puppet+last+run
[15:08:59] fsero: (in an interview)
[15:12:04] onimisionipe: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=maps2004&service=Disk+space
[15:12:38] XioNoX: I've extended the downtime. Thanks for the ping
[15:43:46] fsero: that could for sure be useful; I've also thought about somehow adding icinga alerts as grafana annotations, so you can see them on graphs
[15:50:06] cdanis: have we decided on an avenue for annotations yet, or is that still a question?
[15:50:13] I see you can stick them in a variety of data sources
[15:50:39] very much still an open question
[15:51:09] although XioNoX is already storing icinga alert output in logstash, so maybe we could just point grafana at that elasticsearch?
[15:51:16] but I haven't looked at the details
[15:51:22] cdanis: no, only the alerting ones
[15:51:33] not all icinga states
[15:51:39] mm
[15:52:27] I looked at it a bit and there is some configuration around an instance of ES on the grafana host (at some point in the past?)
[15:53:18] ES or mysql seems like the best way to store them for explicit events like puppet-merge and such
[15:53:30] That's not the case any more, though. Given the public nature of grafana, the logic around being picky with events made sense to me.
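For the explicit events mentioned above (e.g. a puppet-merge), one option is for tooling to push them straight to Grafana as annotations. The sketch below assumes Grafana's HTTP annotations API (`POST /api/annotations`, with `time` in epoch milliseconds plus `tags` and `text`) and bearer-token auth; the helper names, URL, token, and tag values are hypothetical, and it only builds the request without sending it.

```python
import json
import time
import urllib.request


def build_annotation(text, tags, when=None):
    """Build a JSON body for Grafana's POST /api/annotations (hypothetical helper).

    `when` is a Unix timestamp in seconds; Grafana expects `time` in epoch
    milliseconds. No dashboard/panel ID is set, which is assumed to make this
    an organization-wide annotation rather than one tied to a single panel.
    """
    if when is None:
        when = time.time()
    return {"time": int(when * 1000), "tags": list(tags), "text": text}


def make_request(grafana_url, payload, api_token):
    """Prepare (but do not send) the POST request; URL and token are assumptions."""
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,
        },
        method="POST",
    )


if __name__ == "__main__":
    body = build_annotation("puppet-merge: example change", ["puppet-merge"], when=1559000000)
    req = make_request("https://grafana.example.org", body, "EXAMPLE_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

Sending would then be a matter of `urllib.request.urlopen(req)` against a real Grafana instance with a valid token; agreeing on a small fixed set of tags is what would let dashboards filter these events consistently.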
[15:53:32] it'd be nice if we could agree on a standard so we could start pushing them from tooling
[16:01:07] fsero: indeed, what cdanis was saying; I had also toyed with having icinga's alerts themselves exported as metrics, i.e. the final result, not the nrpe check
[16:01:16] I'll be a bit late to the SRE meeting. Need to find a good connection
[20:16:27] random question: what monitoring does PoolCounter (https://www.mediawiki.org/wiki/PoolCounter) have? there is no grafana dashboard for it that I could find
[20:22:34] This should be an easy +1, but as it has a global impact (at least it updates ip6tables on many hosts), I'd like more eyes on it: https://gerrit.wikimedia.org/r/c/operations/puppet/+/514109 -- thanks!
[20:38:01] PoolCounter is one of those obscure but very important things
[20:39:08] IIRC it was once taken down by physical server maintenance on a benign-sounding server name or something.
[20:40:20] there is a nagios check that the port it exposes is reachable
[20:40:34] it wouldn't surprise me if there were no grafana dashboard, though
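The 16:01 idea of exporting Icinga's final alert state (rather than re-running the NRPE checks) could look roughly like the sketch below, which renders check results in the Prometheus text exposition format. The metric name `icinga_service_state`, the `(host, service, state)` input shape, and the example hosts are hypothetical, and how the states would be read out of Icinga is left out; the numeric codes follow the standard Nagios/Icinga convention.

```python
from typing import Iterable, Tuple

# Standard Nagios/Icinga service state codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text format (backslash, quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(results: Iterable[Tuple[str, str, str]]) -> str:
    """Render (host, service, state) tuples as a hypothetical gauge metric."""
    lines = [
        "# HELP icinga_service_state Icinga service state (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).",
        "# TYPE icinga_service_state gauge",
    ]
    for host, service, state in results:
        lines.append(
            'icinga_service_state{host="%s",service="%s"} %d'
            % (escape_label(host), escape_label(service), STATES[state])
        )
    # The exposition format expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example input only; these hosts and states are illustrative.
    print(render([
        ("host1001", "puppet last run", "CRITICAL"),
        ("host2001", "Disk space", "WARNING"),
    ]), end="")
```

Exposing the alert state as a gauge (rather than the underlying check values) keeps Icinga as the source of truth for thresholds, while still letting Grafana graph when and where alerts fired.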