[07:23:50] greetings
[07:24:09] o/
[08:02:13] morning
[08:35:50] morning
[08:55:16] dcaro: thank you for the extended explanation on T403043, very useful
[08:55:16] T403043: [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once (cloudcephosd1048) - https://phabricator.wikimedia.org/T403043
[08:56:40] yw :)
[08:59:49] I'm still a bit confused on what `recovery`/`backfill`/`rebalance` really mean to ceph, as it seems that is not exactly what they say in the docs, or not completely (ex. recovery showing up when norecover is set)
[09:01:48] :|
[09:11:55] I'll go over the task and ceph docs again and think about more questions
[09:13:08] 👍
[09:49:49] silly exporter of the rados gw quotas, as it's not exported by default https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182780
[13:10:19] I would merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/989993 in five minutes unless it's a bad time
[13:11:23] LGTM, is there any way to see if many users will get affected?
[13:24:58] * andrewbogott waves good morning
[13:27:05] dcaro: Taavi checked the logs and noone logged in with a DSA key, so ideally/hopefully zero
[13:27:16] 👍
[13:27:26] * dcaro waves back
[13:28:05] * godog waves
[13:28:31] morning!
[13:50:34] I'm looking into JobUnavailable for ebpf_exporter_eqiad, new alert
[14:44:28] godog: thanks, it was a temporary thing
[14:46:03] dcaro: hehe sure np, yes the comment was very convenient, I didn't even have to look at git log
[15:48:40] oh, we have hit the object quota in toolsbeta-logging
[15:48:57] https://usercontent.irccloud-cdn.com/file/j8qv4Z22/image.png
[15:50:36] oh... it's one of those projects with hash id
[15:51:28] godog, dcaro, I just pooled 2 osds in cloudcephosd1048, seems to be balancing without complaint.
[15:51:41] 🎉
[16:15:20] hm, we should probably bump the quota for toolsbeta-logging then
[16:15:29] and that's a good reminder that I need to check on the alerting of that
[16:16:32] andrewbogott: neat!
[16:17:03] only 3,461,467 objects to go
[16:20:47] * godog off
[16:23:03] andrewbogott: did you tweak the number of backfills? if it's too slow we can try increasing a bit
[16:23:56] I didn't change anything. And it's not actually all that slow, should only take a few hours.
[16:24:11] I'll revise that opinion if it's not done by the end of my day :)
[16:24:20] 👍
[17:05:06] * dhinus off
[17:19:49] up for review and discussion https://gerrit.wikimedia.org/r/c/operations/alerts/+/1182900 , adding alerts for object storage quota
[17:25:42] * dcaro off