[04:27:40] FIRING: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[04:32:40] RESOLVED: [2x] LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[10:36:13] tappof: godog can you review? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1159406
[10:36:22] fixes a missing entry in cloud.yaml
[10:37:42] dcaro: yep LGTM
[10:37:55] hmm.. that might not work though, as it expects it to be non-null I think
[10:38:26] mmhh maybe it should be nullable? not sure, or an empty hash too
[10:38:40] i think you need to set it to an empty hash
[10:39:01] or even {series: []}
[10:39:28] also the puppet logic is going to need to be adjusted to only run the service when there's a series configured
[10:40:14] maybe better to check if it's null, and if it is skip the whole instantiation?
[10:42:35] yeah making it nullable seems simpler to me
[10:47:01] dcaro: got to run to lunch, patch LGTM though
[10:47:05] ttyl
[10:47:26] okok, I just rephrased the commit message to match the new changes, I'll merge once tested
[13:39:27] FIRING: IcingaOverload: Checks are taking long to execute on alert1002:9245 - https://wikitech.wikimedia.org/wiki/Icinga#IcingaOverload - https://grafana.wikimedia.org/d/rsCfQfuZz/icinga - https://alerts.wikimedia.org/?q=alertname%3DIcingaOverload
[13:41:21] FYI, prometheus7002 will briefly go down in ca. one hour, I'm switching it to DRBD disk storage (when it was initially installed we only had a single node in the fresh cluster), the VM will go down for 20-30 seconds
[13:44:27] RESOLVED: IcingaOverload: Checks are taking long to execute on alert1002:9245 - https://wikitech.wikimedia.org/wiki/Icinga#IcingaOverload - https://grafana.wikimedia.org/d/rsCfQfuZz/icinga - https://alerts.wikimedia.org/?q=alertname%3DIcingaOverload
[20:58:58] Let's say you have a bunch of prometheus::blackbox::check::http and other checks together in a class.. and in the puppet world you would include that in the role on your hosts being monitored. Then the services move to k8s and you have no puppet role including it anymore.. but you still want to keep the monitoring as is. Where do you include the puppet class now? dump it all into
[20:59:04] profile::prometheus::ops which is already over 2700 lines long?
[21:00:07] some special virtual host just for that?
[21:04:54] I think that having a dedicated profile like `profile::prometheus::blackbox::standalone_checks` would be a good approach, especially since `profile::prometheus::ops` is already way too big. That dedicated profile could be included in a vhost dedicated to only running those blackbox checks.
[21:05:11] Tho let's see what others think about it. :)
[21:06:51] Thank you, denisse. Ok, I am going to wait for other timezones. Then let's see. Happy to make a patch if something like this gets upvotes.
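
For reference, a minimal Puppet sketch of the "nullable" approach discussed at 10:38-10:42: the profile class, parameter name, and hiera key here are hypothetical stand-ins, not the actual code touched by the Gerrit change.

```puppet
# Hypothetical profile illustrating a nullable hiera entry. Hosts whose
# hiera (e.g. cloud.yaml) has no 'profile::example_exporter::series' key
# get undef instead of a lookup failure.
class profile::example_exporter (
  Optional[Hash] $series = lookup('profile::example_exporter::series', { 'default_value' => undef }),
) {
  # Only run the service when a series is actually configured; when the
  # entry is null/absent ($series is undef, which is falsey in Puppet),
  # skip the whole instantiation, per the 10:40 suggestion.
  if $series {
    class { 'example_exporter':
      series => $series,
    }
  }
}
```

The empty-hash alternative floated at 10:38 ({} or {series: []}) would also compile, but the guard above is why making it nullable is simpler: absent config and disabled service become the same case.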
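
And a sketch of the dedicated profile proposed at 21:04. The resource title, check parameters, and values are illustrative assumptions drawn from memory of the prometheus::blackbox::check::http interface; the actual checks would be moved in from the retired service roles.

```puppet
# Hypothetical standalone profile grouping blackbox checks for services
# that moved to k8s and no longer have a puppet role to carry them.
# Intended to be included on a dedicated (virtual) host rather than in
# the already-oversized profile::prometheus::ops.
class profile::prometheus::blackbox::standalone_checks {
  # Illustrative check; target and parameter values are made up.
  prometheus::blackbox::check::http { 'example.svc.codfw.wmnet':
    team               => 'sre-observability',
    severity           => 'critical',
    port               => 443,
    force_tls          => true,
    path               => '/healthz',
    body_regex_matches => ['ok'],
  }
}
```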