[00:05:40] FIRING: SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:22:27] FIRING: ThanosCompactHalted: Thanos Compact on titan2001 has failed to run and is now halted. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactHalted
[04:05:40] FIRING: SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:42:12] RESOLVED: ThanosCompactHalted: Thanos Compact on titan2001 has failed to run and is now halted. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactHalted
[06:45:25] FIRING: [2x] SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:23:41] FIRING: PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance prometheus4003:9900) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=ulsfo%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[08:28:41] RESOLVED: PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance prometheus4003:9900) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=ulsfo%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[08:45:25] FIRING: [2x] SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:45:40] FIRING: SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:45:40] FIRING: SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:45:40] FIRING: SystemdUnitFailed: grafana-ldap-users-sync.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:53:01] hello o11y - this quarter, we'll be migrating conf* hosts to trixie. among other things, those run zookeeper, so we'll also need to rebuild the JMX exporter package [0] for trixie.
[21:53:01] I'm happy to give that a try, but I realize your team has managed that package historically, so I wanted to check here for concerns before touching anything :) thanks!
[21:53:01] [0] https://gerrit.wikimedia.org/g/operations/debs/prometheus-jmx-exporter
[23:08:41] Yeah, I think that package is due for an update, as we're using an old release of it.
[23:11:06] Looking at the releases, breaking changes were introduced here [0]. Maybe we should package the existing version for Trixie and then upgrade it. [0] https://github.com/prometheus/jmx_exporter/releases/tag/1.0.1
[23:32:17] thanks for taking a look! yeah, that's what I was inclined to propose - rebuild 0.15.0 for trixie and then (later) update to a newer version from upstream (i.e., to unblock debian upgrades while decoupling them from any complexities of upgrading the exporter)