[09:49:01] Amir1: auto schema on s3 completed, shall I move on to s1/2/4/7 ?
[09:49:41] federico3: sure, s7?
[09:49:45] ok
[14:23:28] Emperor: o/ re: https://phabricator.wikimedia.org/T390251#10743894, is there a way to add more capacity to APUS to be able to support the docker registry use case during the next fiscal?
[14:25:29] elukey: how much space & bandwidth do you need? We (ahem, j.elto is doing all the work) are currently moving gitlab onto apus and doing some perf testing which will give us a clearer idea of what the cluster can do now, but it's currently pretty small (so you get less of the scale-out gains from Ceph than we will when it's bigger)
[14:25:55] There's a bit of expansion due to happen this quarter.
[14:27:17] Emperor: atm IIRC we are talking about 5/6TBs of stored data, but that is likely way less since we haven't ever done a proper cleanup. For the bw requirements I don't have numbers yet, but I can come up with something.
[14:27:56] I am asking since regardless of the choice of the registry (keep docker distribution or use another tool) we'll have to migrate away from Swift, and I was assuming that APUS was the right target
[14:28:30] realistically we'll gradually migrate clients over (pushing new images and pulling them)
[14:28:31] elukey: FWIW when j.elto was pushing larger objects earlier we were seeing ~100 MiB PUSH throughput
[14:28:43] nice
[14:29:05] that seems more than enough :D
[14:29:17] but now they're pushing a lot of smaller objects and so the bulk throughput is much slower (more like 4-5MB)
[14:30:05] we are probably more concerned about pull times rather than push times in the registry case
[14:31:09] also we are currently keeping pooled the registry in codfw due to the swift replication, the async replication of apus seems to allow for an active/active solution (in the future)
[14:31:32] anyway, I just wanted to know if capacity-wise we'd be ok during next fiscal
[14:31:35] elukey: Mmm, but do be aware (given your current woes) that cross-DC replication is async
[14:31:48] yes yes, eventually consistent basically
[14:32:01] sorry, just seemed worth flagging :)
[14:32:55] definitely yes
[14:33:06] for the capacity, do you want me to open a task etc...?
[14:34:08] give me a mo, just reminding myself of what's happening next FY re apus h/w
[14:35:03] no rush :)
[14:35:25] probably it would be worth to follow up with observability to ask how many use cases have been abusing Thanos Swift up to now
[14:35:40] just today I've cut a task to Machine Learning to migrate away, and I suggested APUS
[14:35:48] but their use case is way smaller
[14:37:46] elukey: we're refreshing the two smaller storage nodes next FY, which will give us a bunch more effective capacity ('cos the replacements are bigger), so I think that should be doable without further expansion. But it'd be good to have a ticket to track this if you think that's where you want docker image storage to go next FY. I'd suggest you do a bit of testing to make sure you're happy with the available performance before
[14:37:46] committing, though? We could spin you up a test account with a more modest quota if that would be helpful
[14:38:05] (it's apus, not APUS, though, FTR)
[14:38:27] ack right :D
[14:38:39] yes perfect I'll open a task, a test account seems good, we may want to run some tests
[14:38:47] and yes, we may look at migrating other workloads off thanos-swift In Due Course, but I wanted us to only move a few things first to make sure we're entirely happy with how apus performs in practice rather than just in theory with my ad-hoc tests
[14:38:57] perfect thanks a lot!
[14:40:29] Oh, the other thing to say is: if you're likely to want to do this, we should ask Willy to bring forward the refresh (which is currently scheduled for Q3) - ping him and/or me on the task once you've made it?
[14:42:51] ack