[08:10:44] morning!
[08:10:56] I'm going to start draining what's left of the ceph D5 nodes
[08:32:17] okok, so I set `norebalance` manually first, then I ran the drain rack cookbook to take out the whole D5 rack, that seemed to work well so far, now I manually removed the `norebalance` flag and the cluster is starting to recover, no problems so far, the space warning is also gone
[08:32:28] (as now it will move the data out of the overcrowded D5)
[09:50:10] cluster is in warning status (E4 starting to fill up), but so far so good :)
[11:27:18] the cluster is now reaching the point at which object creation is ~ the same as the recovery speed xd, so the total amount of misplaced objects is kinda stabilizing, but everything is stable and running \o/
[12:02:19] just stopped the osds on the D5 nodes that still had pgs hanging onto them, that forced ceph to place those pgs on the other nodes, currently copying data/rebalancing
[12:02:39] (still no issues, going well)
[13:20:08] dcaro: that all sounds good! I just now figured out that topranks wanted to reschedule the upgrade for tomorrow, so I guess there's nothing left to do for today, switch-wise.
[13:22:05] yep, we can let the cluster settle a bit more, but I think that's all :)
[13:23:42] And hope the switch lives another day in its current state
[13:24:25] Do you know why ceph was trying so hard to rebalance into D5? Hadn't you already told it not to do that?
[13:28:19] I think it strongly tries to keep the data close to where it was
[13:28:38] but I'm not certain why it went so far (up to the fill-up limit)
[17:12:52] * dcaro off
[17:12:54] cya tomorrow!
[20:27:54] I would like to play with opentofu in deployment-prep and need T372353 or something similar so that I can configure opentofu to store state in an S3-compatible bucket.
[20:27:57] T372353: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353
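
The 08:32 drain sequence corresponds roughly to the standard ceph flag handling around a drain; a minimal sketch, assuming the wmcs drain-rack cookbook is run in between (the exact cookbook invocation isn't given in the log, so it is left as a comment):

    # Prevent data movement while the D5 rack is being taken out
    ceph osd set norebalance

    # (run the drain-rack cookbook here to remove the D5 hosts/OSDs)

    # Once the drain is done, let the cluster start recovering/backfilling
    ceph osd unset norebalance

    # Watch recovery progress, misplaced-object counts and per-OSD fill levels
    ceph -s
    ceph osd df tree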
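
The 12:02 step (stopping the OSDs on the D5 nodes that still held PGs so ceph would place them elsewhere) could look roughly like the sketch below; the osd ids are hypothetical, and the explicit `out` is an assumption — ceph will also mark stopped OSDs out on its own after a timeout:

    # Hypothetical osd ids; check first which D5 OSDs still hold PGs
    ceph osd ok-to-stop osd.12 osd.13        # sanity check before stopping them

    # On the D5 host(s): stop the daemons
    systemctl stop ceph-osd@12 ceph-osd@13

    # Mark them out so their PGs get remapped onto the remaining nodes
    ceph osd out 12 13

    # Recovery/backfill should now move the remaining PGs off D5
    ceph -s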
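
For the 20:27 request, storing opentofu state in an S3-compatible bucket is usually done with an s3 backend block pointing at a custom endpoint; a sketch with hypothetical bucket, key and endpoint values (exact argument names vary between tofu/terraform versions, and credentials would come from the environment):

    cat > backend.tf <<'EOF'
    terraform {
      backend "s3" {
        bucket = "tofu-state"                         # hypothetical bucket name
        key    = "deployment-prep/terraform.tfstate"  # hypothetical state path
        region = "us-east-1"                          # placeholder; most S3-compatible stores ignore it

        endpoints = { s3 = "https://object.example.org" }  # hypothetical S3-compatible endpoint

        # Disable AWS-specific checks that a non-AWS store won't pass
        skip_credentials_validation = true
        skip_region_validation      = true
        skip_metadata_api_check     = true
        use_path_style              = true
      }
    }
    EOF

    # Credentials typically via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    tofu init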