[00:48:56] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:48:56] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:48:56] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:24:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:29:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[12:46:28] I've silenced the curator_actions_cluster_wide.service alert until next week btw
[13:50:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[13:55:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[13:56:18] thx go.dog
[13:56:41] hi folks, I'm trying to create an annotation query based on haproxy_process_build_info that includes the running haproxy version as the content of the label "version"
[13:57:19] label_values(haproxy_process_build_info, version) doesn't do the trick, what am I missing? probably because the value is always the same (aka 1)?
[14:02:59] sigh I see.. can't use label_values directly
[14:09:12] and last_over_time(haproxy_process_build_info[1h]) will flood the dashboard with annotations
[14:36:32] vgutierrez: ah, you're looking for changes in the value from the same instance?
[14:36:45] yes
[14:36:57] I'd like to get automatic annotations when haproxy gets updated
[16:13:14] vgutierrez: interesting, were you able to get something working?
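A minimal sketch of why the first two attempts fall flat, assuming the usual shape of haproxy_process_build_info (a constant-1 gauge whose version is carried in a label; the instance value below is made up): an upgrade only ever changes a label, never the sample value, so the annotation query has to spot new series appearing rather than values changing.

```
# Shape of the metric -- the value is always 1, the version lives in a label:
#   haproxy_process_build_info{instance="cp1001:9100", version="2.6.12"} 1

# Value-based functions therefore never see an upgrade:
changes(haproxy_process_build_info[1d])   # 0 for every series: the value never leaves 1

# And last_over_time simply re-emits a sample at every evaluation step,
# hence the flood of annotations on the dashboard:
last_over_time(haproxy_process_build_info[1h])
```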
[16:15:18] nope
[16:15:32] closest seems to be https://github.com/grafana/grafana/issues/11948#issuecomment-403841249
[16:15:50] `haproxy_process_build_info unless (haproxy_process_build_info offset 1m)`
[16:16:12] but it doesn't seem to work properly if you try to get it per instance/cluster/site
[16:16:40] the approach proposed here https://github.com/grafana/grafana/issues/11948#issuecomment-389743476 makes sense but needs a recording rule
[16:18:15] I see, yeah not ideal but workable at least
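One way the recording-rule route could be wired up, as a sketch only: the rule name, the `max by` label set and the 1m offset below are assumptions, not necessarily what the linked comment proposes. The idea is to record one series per site/cluster/instance/version and then have the annotation query return only series that did not exist a step earlier.

```yaml
# Hypothetical Prometheus recording rule (names are made up):
groups:
  - name: haproxy_version_annotations
    rules:
      # One constant-1 series per (site, cluster, instance, version);
      # an upgrade shows up as a brand-new series for that instance.
      - record: haproxy:build_info:by_version
        expr: max by (site, cluster, instance, version) (haproxy_process_build_info)
```

The Grafana annotation query would then be `haproxy:build_info:by_version unless (haproxy:build_info:by_version offset 1m)`, which only returns series that were absent a minute earlier; a longer offset may be needed to ride out scrape gaps.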