[09:49:01] <federico3>	 Amir1: auto schema on s3 completed, shall I move on to s1/2/4/7 ?
[09:49:41] <Amir1>	 federico3: sure, s7? 
[09:49:45] <federico3>	 ok
[14:23:28] <elukey>	 Emperor: o/ re: https://phabricator.wikimedia.org/T390251#10743894, is there a way to add more capacity to APUS to be able to support the docker registry use case during the next fiscal?
[14:25:29] <Emperor>	 elukey: how much space & bandwidth do you need? We (ahem, j.elto is doing all the work) are currently moving gitlab onto apus and doing some perf testing which will give us a clearer idea of what the cluster can do now, but it's currently pretty small (so you get less of the scale-out gains from Ceph than we will when it's bigger)
[14:25:55] <Emperor>	 There's a bit of expansion due to happen this quarter.
[14:27:17] <elukey>	 Emperor: atm IIRC we are talking about 5-6 TB of stored data, but the real figure is likely much lower since we have never done a proper cleanup. For the bw requirements I don't have numbers yet, but I can come up with something.
[14:27:56] <elukey>	 I am asking since regardless of the choice of the registry (keep docker distribution or use another tool) we'll have to migrate away from Swift, and I was assuming that APUS was the right target
[14:28:30] <elukey>	 realistically we'll gradually migrate clients over (pushing new images and pulling them)
[14:28:31] <Emperor>	 elukey: FWIW when j.elto was pushing larger objects earlier we were seeing ~100 MiB/s PUSH throughput
[14:28:43] <elukey>	 nice 
[14:29:05] <elukey>	 that seems more than enough :D
[14:29:17] <Emperor>	 but now they're pushing a lot of smaller objects and so the bulk throughput is much slower (more like 4-5 MB/s)
[14:30:05] <elukey>	 we are probably more concerned about pull times rather than push times in the registry case
[14:31:09] <elukey>	 also we are currently keeping the registry pooled in codfw due to the swift replication; the async replication of apus seems to allow for an active/active solution (in the future)
[14:31:32] <elukey>	 anyway, I just wanted to know if capacity-wise we'd be ok during next fiscal
[14:31:35] <Emperor>	 elukey: Mmm, but do be aware (given your current woes) that cross-DC replication is async
[14:31:48] <elukey>	 yes yes, eventually consistent basically
[14:32:01] <Emperor>	 sorry, just seemed worth flagging :)
[14:32:55] <elukey>	 definitely yes
[14:33:06] <elukey>	 for the capacity, do you want me to open a task etc...?
[14:34:08] <Emperor>	 give me a mo, just reminding myself of what's happening next FY re apus h/w
[14:35:03] <elukey>	 no rush :)
[14:35:25] <elukey>	 it would probably be worth following up with observability to ask how many use cases have been abusing Thanos Swift up to now
[14:35:40] <elukey>	 just today I've cut a task to Machine Learning to migrate away, and I suggested APUS 
[14:35:48] <elukey>	 but their use case is way smaller
[14:37:46] <Emperor>	 elukey: we're refreshing the two smaller storage nodes next FY, which will give us a bunch more effective capacity ('cos the replacements are bigger), so I think that should be doable without further expansion. But it'd be good to have a ticket to track this if you think that's where you want docker image storage to go next FY. I'd suggest you do a bit of testing to make sure you're happy with the available performance before committing, though? We could spin you up a test account with a more modest quota if that would be helpful
[14:38:05] <Emperor>	 (it's apus, not APUS, though, FTR)
[14:38:27] <elukey>	 ack right :D
[14:38:39] <elukey>	 yes perfect I'll open a task, a test account seems good, we may want to run some tests
[14:38:47] <Emperor>	 and yes, we may look at migrating other workloads off thanos-swift In Due Course, but I wanted us to only move a few things first to make sure we're entirely happy with how apus performs in practice rather than just in theory with my ad-hoc tests
[14:38:57] <elukey>	 perfect thanks a lot!
[14:40:29] <Emperor>	 Oh, the other thing to say is: if you're likely to want to do this, we should ask Willy to bring forward the refresh (which is currently scheduled for Q3) - ping him and/or me on the task once you've made it?
[14:42:51] <elukey>	 ack