[00:17:50] FIRING: DiskSpace: Disk space kafka-logging1002:9100:/srv 3.155% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=kafka-logging1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:17:50] FIRING: DiskSpace: Disk space kafka-logging1002:9100:/srv 3.353% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=kafka-logging1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:17:50] FIRING: DiskSpace: Disk space kafka-logging1002:9100:/srv 3.721% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=kafka-logging1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:04:34] FIRING: DiskSpace: Disk space titan2001:9100:/srv 1.687% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=titan2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:14:34] RESOLVED: DiskSpace: Disk space titan2001:9100:/srv 0.01337% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=titan2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:48:12] FIRING: ThanosCompactHalted: Thanos Compact has failed to run and is now halted. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactHalted
[10:08:12] RESOLVED: ThanosCompactHalted: Thanos Compact has failed to run and is now halted. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactHalted
[11:07:34] FIRING: DiskSpace: Disk space titan2001:9100:/srv 1.643% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=titan2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[11:56:32] ^^ This is due to compactor operations
[12:17:50] FIRING: DiskSpace: Disk space kafka-logging1002:9100:/srv 3.602% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=kafka-logging1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:32:34] RESOLVED: DiskSpace: Disk space titan2001:9100:/srv 2.756% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=titan2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:37:12] FIRING: ThanosCompactHalted: Thanos Compact has failed to run and is now halted. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactHalted
[13:22:12] RESOLVED: ThanosCompactHalted: Thanos Compact has failed to run and is now halted. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactHalted
[14:57:35] RESOLVED: DiskSpace: Disk space kafka-logging1002:9100:/srv 2.993% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=kafka-logging1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:04:01] the k8s-mw-codfw topic has been growing in size considerably since the beginning of the month. for now I've set a max retention size of 2T for this topic to avoid exhausting kafka-logging storage. we can adjust as needed https://usercontent.irccloud-cdn.com/file/blj7zRVY/Screenshot%202026-02-05%20at%209.55.40%20AM.png
[15:18:25] FIRING: SystemdUnitFailed: statograph_post.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:23:25] RESOLVED: SystemdUnitFailed: statograph_post.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:59:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[21:09:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
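
Note on the 15:04:01 message about capping k8s-mw-codfw at 2T: the log does not say how the limit was applied. As a rough sketch only, Kafka's topic-level `retention.bytes` setting is enforced per partition, so a whole-topic budget would normally be divided by the partition count before being set. Everything specific below is an assumption for illustration: reading "2T" as 2 TiB, the partition count of 6, the broker address, and the use of the stock `kafka-configs.sh` tool rather than any site-specific wrapper.

```python
# Sketch only: the partition count, broker address, and the TiB reading of
# "2T" are assumptions, not values taken from the log.

TIB = 1024 ** 4


def per_partition_retention_bytes(total_bytes: int, partitions: int) -> int:
    """Split a whole-topic size budget evenly across partitions.

    Kafka applies retention.bytes to each partition separately, so a
    topic-wide cap has to be divided by the partition count.
    """
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return total_bytes // partitions


def build_alter_command(topic: str, retention_bytes: int, bootstrap: str) -> list[str]:
    """Build (but do not run) a kafka-configs.sh invocation as an argument list."""
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", f"retention.bytes={retention_bytes}",
    ]


if __name__ == "__main__":
    # 6 partitions and localhost:9092 are hypothetical placeholders.
    limit = per_partition_retention_bytes(2 * TIB, partitions=6)
    print(" ".join(build_alter_command("k8s-mw-codfw", limit, "localhost:9092")))
```

The command is only printed, not executed, so the real partition count and broker address can be checked before anything is changed.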