[01:25:40] FIRING: [2x] LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[01:30:40] RESOLVED: [2x] LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[08:40:18] Hi again, I know there was a change to thanos retention, but disk usage doesn't seem to have been impacted, and is still growing - https://grafana.wikimedia.org/goto/of1PBnGHg?orgId=1
[09:06:57] I notice in the past SAL entries suggesting further action to delete older blocks e.g. https://phabricator.wikimedia.org/T351927#10170614
[09:07:27] ...because the compactor isn't running automatically ATM, so needs doing manually
[09:08:33] So can you delete these older blocks, please?
[13:51:25] Emperor: bucket cleanup job is running now
[13:55:29] 👍
[14:49:28] Emperor: Cleanup finished - looks like we freed up about 1TB which might not be enough?
[14:51:53] cwhite: Mmm, the disks are still full enough to be alerting :(
[14:59:17] Ok. Instructions left for us were to keep whittling down 5m retention to a lower bound of "a few weeks". I'll assume we can bring 5m to 3w retention. After that, we'll have to start dipping into 1h and raw data retention.
[15:03:13] I really am hoping the new hardware will be here soon :-/
[15:04:07] Let's see what it's like once you've run the cleanup again?
[15:47:43] I'm not sure it's making much of a dent. :/
[15:48:33] I wonder if we should trim up the raw retention instead. What do you think?
[15:49:24] What's weird is after setting the 5m retention down another 5 weeks, the bucket cleanup command doesn't remove much.
[16:01:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[16:06:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[17:32:48] * kamila_ ^ not sure what the spike was about but looks happy now, will follow up if it happens again
[17:34:31] kamila_: thanks!
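
For reference, a minimal sketch of the manual retention-and-cleanup pass discussed above, using the upstream "thanos tools bucket" subcommands. This is an assumption-laden illustration, not the exact procedure used here: the objstore config path and the 1h/raw retention values are placeholders, only the "bring 5m down to 3w" figure comes from the conversation, and the delete-delay override is an assumption.

    #!/usr/bin/env python3
    """Sketch: apply reduced retention, then delete the marked blocks by hand,
    since the compactor (which would normally do the deletion) is not running.
    Paths and most retention values below are placeholders, not production settings."""
    import subprocess

    OBJSTORE_CONFIG = "/etc/thanos/objstore.yaml"  # placeholder path


    def run(args):
        """Echo a command, then run it, failing loudly on error."""
        print("+", " ".join(args))
        subprocess.run(args, check=True)


    # Step 1: mark blocks outside the retention window for deletion.
    # "5m resolution down to 3w" is from the conversation; 1h and raw are placeholders
    # (0d disables retention for that resolution, i.e. keep everything).
    run([
        "thanos", "tools", "bucket", "retention",
        f"--objstore.config-file={OBJSTORE_CONFIG}",
        "--retention.resolution-5m=3w",
        "--retention.resolution-1h=0d",   # placeholder
        "--retention.resolution-raw=0d",  # placeholder
    ])

    # Step 2: delete blocks carrying a deletion mark (the "bucket cleanup job").
    run([
        "thanos", "tools", "bucket", "cleanup",
        f"--objstore.config-file={OBJSTORE_CONFIG}",
        "--delete-delay=0s",  # assumption: delete immediately instead of after the default delay
    ])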