[05:36:06] 10Blocked-on-schema-change, 10MediaWiki-Change-tagging, 10MediaWiki-Database, 10Wikidata, and 2 others: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (10Marostegui)
[05:36:44] 10Blocked-on-schema-change, 10MediaWiki-Change-tagging, 10MediaWiki-Database, 10Wikidata, and 2 others: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (10Marostegui) db2048 (s1 codfw master) done. Waiting now for eqiad to be back to finish off this.
[05:47:09] 10DBA, 10MediaWiki-API, 10MediaWiki-Database: prop=revisions API timing out for a specific user and pages they edited - https://phabricator.wikimedia.org/T197486 (10Marostegui) For what it's worth, this is still happening on 10.1.36 ``` root@db1119.eqiad.wmnet[(none)]> show explain for 8203632; +------+-------...
[07:08:39] 10DBA, 10Data-Services, 10Datasets-General-or-Unknown: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (10Marostegui) Thanks guys! Just to clarify, I didn't mean to delete anything just yet, just wanted to know what was pending (specially for the DBA...
[07:22:07] my first plan for the morning is to deploy this: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458810/
[07:22:23] Cool, how will you do it?
[07:22:46] I'll disable puppet on labsdb1009|labsdb1010|labsdb1011
[07:23:05] and first deploy only on labsdb1010
[07:23:28] there are 2 analytics hosts, so in a worst case scenario I'll only mess up one
[07:23:34] Sounds good - how will you determine if it is working as expected?
[07:26:16] I'll change the config to be a bit more strict (on analytics the long-running query limit is defined too long for testing) and run a query locally
[07:26:43] if that gets killed I re-run puppet, which changes the config back to the one we need
[07:26:45] Isn't it 14400 seconds?
[07:26:51] yes, it is
[07:27:07] ok, you'll need to kill the pt-query-kill that is running on a screen
[07:27:15] yes
[07:27:19] I was thinking about that
[07:27:51] ok then
[07:27:52] is there anything else I should pay attention to?
[07:28:19] As soon as you get a query killed, get the config back to 14400 as otherwise you'll be affecting users
[07:28:55] Can't I depool that host somehow?
[07:28:59] I suggest once you get a query killed, go back to 14400 and monitor the log until the end of the day, to see which queries are getting killed and if the time is correct
[07:29:06] You can
[07:29:14] Then you'll need to simulate the queries yourself
[07:29:43] I was thinking of 'SELECT SLEEP(20)'
[07:30:00] I would capture some real queries too
[07:30:22] actually this is a good idea
[07:30:22] I am sure there are some long running queries right now that you can also capture
[07:30:33] I can extract those from the current logs
[07:30:41] ;)
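A minimal sketch of the test plan above: run a throwaway long query against the local instance while the stricter busy-time is in place, and pull a few real long-running statements out of an existing kill log to replay. The socket path and log path are assumptions (a log at /var/log/wmf-pt-kill.log only shows up later in the day), and the session has to match the killer's --match-user pattern for the kill to happen.

```
# Throwaway long-running query the (temporarily stricter) killer should catch.
# Assumes the local MariaDB socket path and a user matching --match-user.
mysql -S /run/mysqld/mysqld.sock -e "SELECT SLEEP(20);"

# Pull a few real long-running statements from an existing kill log to replay by hand;
# the path and the "# <timestamp> KILL <id> (...) <query>" line format are assumptions.
grep ' KILL ' /var/log/wmf-pt-kill.log | tail -n 5
```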
[07:31:20] one last question
[07:32:00] Where can I find the passwords of those users? I can run the lrq-s as root, but then I have to change not just the timeout but the user string too
[07:32:09] what?
[07:33:29] You cannot use those users' passwords
[07:33:48] MATCH_USER="'^[spu][0-9]'
[07:34:07] then I have to change this to 'root' as well during the test
[07:34:25] If you want to simulate with a user, you'd need to: log in through a wmcs bastion and, using the sql client, connect to an analytics host; once you are in the desired host, check the hostname, depool that one from puppet (while you remain logged in there) and then copy a query from the logs and run it as your user
[07:34:52] I am saying "remain logged in" because if you depool it, you will no longer be able to connect to it, as it is depooled :)
[07:35:35] I think I'll go with root then
[07:35:44] why?
[07:36:07] I suggest you learn how to use the "sql" client as you'll need it for further debugging
[07:38:11] yea, that makes sense
[08:06:28] jynus banyek any thoughts on: https://phabricator.wikimedia.org/T205257#4610104 you guys fine with that proposal?
[08:07:58] 10DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (10Banyek) I like this.
[08:14:27] deploying wmf-pt-kill on labsdb1010
[08:16:34] config and package in place, now checking if it works
[08:16:53] So in the end you didn't depool the host?
[08:17:14] no
[08:17:18] ok
[08:17:58] the template is not good
[08:18:13] I restarted the query killer in the screen
[08:18:31] ok
[08:18:56] Oct 01 08:16:08 labsdb1010 wmf-pt-kill[3562]: Use of uninitialized value $val in concatenation (.) or string at /usr/bin/wmf-pt-kill line 741.
[08:19:25] I didn't depool it, because in the worst case it just won't start - like now, so it isn't a dangerous operation afaik
[08:19:40] yep
[08:19:54] as long as it doesn't kill all the queries :-)
[08:20:47] the opposite :(
[08:34:47] so, I found what the problem is (the template's `<% @busy_time %>` tag wasn't closed properly) - now I am trying to fix that - I am in the middle of wrestling with git
[08:45:12] the change is out, and it seems to be working
[08:45:15] now I am testing it
[08:45:40] great+
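A quick, non-authoritative way to confirm that the fixed `<% @busy_time %>` rendering actually reached the daemon before relying on it: look at the command line the running killer was started with (assuming the process is up and the script name appears in its command line).

```
# Show the killer's full command line and the busy-time it was actually started with;
# 14400 is the production threshold discussed earlier this morning.
pgrep -af wmf-pt-kill
pgrep -af wmf-pt-kill | grep -o -- '--busy-time [0-9]*'
```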
[09:09:21] this is really interesting
[09:09:41] during testing the service starts without any problem, it runs
[09:10:04] however when I run a long-running query, it doesn't kill it
[09:10:05] BUT
[09:10:13] if I check the running service with PS
[09:10:19] *ps
[09:10:35] and stop the service and run the exact same command line, it works
[09:11:03] the log also doesn't work
[09:11:08] yes
[09:11:21] which query are you testing?
[09:11:39] ```select * from revision where rev_page IN (select page_id from page);```
[09:11:50] as which user?
[09:11:54] root
[09:12:03] don't run busy queries
[09:12:04] what if you run the exact same command on the CLI and use verbose? what do you see?
[09:12:12] when you can run select sleep(1000)
[09:12:27] root queries don't get killed
[09:12:39] only those of certain users - I guess you are changing the target?
[09:12:53] ```MariaDB [enwiki]> select * from revision where rev_page IN (select page_id from page);
[09:12:53] ERROR 2006 (HY000): MySQL server has gone away
[09:12:53] No connection. Trying to reconnect...
[09:12:53] Connection id: 65303068
[09:12:53] Current database: enwiki
[09:12:53] ERROR 2013 (HY000): Lost connection to MySQL server during query
[09:12:53] MariaDB [enwiki]>```
[09:13:05] for testing I changed the target
[09:13:10] ^that is normal
[09:13:39] banyek: what if you run the command on the CLI and not under systemd? what do you see?
[09:14:01] in which mode does systemd check the service started?
[09:14:04] ```root@labsdb1010:~# /usr/bin/wmf-pt-kill --print --kill --victims all --interval 10 --busy-time 5 --match-command 'Query|Execute' --match-user root --log /var/log/wmf-pt-kill.log -S /run/mysqld/mysqld.sock
[09:14:04] # 2018-10-01T09:13:27 KILL 65303518 (Query 337 sec) select * from revision where rev_page IN (select page_id from page)
[09:14:04] # 2018-10-01T09:13:47 KILL 65308263 (Query 10 sec) select sleep(30)```
[09:14:06] it works
[09:14:11] so it works on the CLI
[09:14:17] yes
[09:14:22] (not the log though)
[09:14:32] (not the log indeed)
[09:14:50] the cli command is copied directly from the output of ps
[09:15:22] jynus: first I tested with sleep(30) but it didn't work, that's why I turned to a heavier query
[09:15:50] banyek: so there are two problems, systemd doesn't seem to do what the CLI does. And the log, which doesn't work either way
[09:15:58] yes
[09:16:14] so the next iteration will be to change puppet to install the config and the package
[09:16:20] but keep the service stopped
[09:16:32] 10DBA, 10Patch-For-Review: Finish eqiad metadata database backup setup (s1-s8, x1) - https://phabricator.wikimedia.org/T201392 (10jcrespo) Tables reverted on s6 and compression finished, now compressing s8, only the following (largest) are missing: ``` $ mysql -BN -S /run/mysqld/mysqld.s8.sock -e "SELECT tab...
[09:16:44] then I can re-enable puppet on the labsdb hosts, and they can run, while I debug
[09:16:59] I would prefer if you debugged locally
[09:17:05] Yeah, I was going to say that
[09:17:19] ok
[09:17:21] makes sense
[09:17:24] banyek: Not a big deal if we leave puppet stopped on the labs hosts for a few more hours
[09:18:45] OK
[09:18:56] the service is stopped on labsdb1010
[09:19:02] or if this was a very complex issue
[09:19:03] puppet is disabled on all the labsdb hosts
[09:19:16] we could partially disable the daemon and enable puppet
[09:19:29] and I restarted the pt-kill-patched in a screen
[09:19:36] but probably it is something trivial?
[09:19:55] yea, my first thought was 'install the files, but don't start the service'
[09:20:04] we can puppetize that
[09:20:10] if necessary
[09:20:25] ensure => stopped
[09:20:30] yes
[09:20:35] and then we can enable puppet
[09:20:43] but check if it is a silly mistake first
[09:20:44] we can talk about this in the pers. meeting
[09:22:20] where is the systemd unit?
[09:22:36] I cannot find it
[09:28:33] there is one thing, the service runs as root, a dedicated user should be created
[09:28:43] `/lib/systemd/system/wmf-pt-kill.service`
[09:29:26] yes, sorry, I found it in the end
[09:29:39] it was put there with debhelper
[09:30:18] banyek: meeting?
[09:30:19] the root thing may need package changes
[09:30:28] joining
[09:30:46] jynus: you were already talking with Manuel about the root/user part
[09:31:04] ?
[10:32:31] A heads up, I'm running a script that is running on the user table on all wikis, is there anything I should be aware of?
[10:32:50] writing to the table I mean (The coffee hasn't kicked in yet :/)
[10:57:17] * banyek having a really quick lunch
[11:24:21] back
[11:26:39] marostegui: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458810/ you and Jaime were talking about this under patch set 1. sorry I can't link it
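For the "works on the CLI but not under systemd" symptom above, a usual first step is to compare what the unit actually executes, and as which user and environment, with the interactive command, and to read the unit's journal. A sketch with standard systemd tooling; the unit name comes from the path mentioned above.

```
# Show the unit file exactly as systemd sees it (including any drop-ins).
systemctl cat wmf-pt-kill.service

# Compare the service's ExecStart, User and Environment with the working CLI invocation.
systemctl show -p ExecStart -p User -p Environment wmf-pt-kill.service

# Read what the service logged, e.g. the "uninitialized value $val" warning seen earlier.
journalctl -u wmf-pt-kill.service --since today
```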
[12:15:34] I deployed wmf-pt-kill to all the labsdb hosts, with the service disabled
[12:18:23] 10DBA, 10User-Banyek: dbstore2002 tables compression status check - https://phabricator.wikimedia.org/T204930 (10Banyek) resuming s2 compression
[12:43:35] 10DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (10Banyek) @Papaul I'd like to coordinate with you about the BBU change from db2064. The host is an active backup host, it won't be a good idea to work on it during a backup, but if you can pick a good time window for the bbu ch...
[13:51:53] \o I just noticed a big speed up in the wikidata dispatching process between when we were running in eqiad vs now in codfw https://phabricator.wikimedia.org/T205865
[13:52:16] is there anything different between the s8 servers in each DC?
[13:52:30] addshore: hardware
[13:52:39] codfw is faster?
[13:52:43] speed up?
[13:52:51] a crazy speed up
[13:52:54] Actually not
[13:52:59] The hardware for s8 is the same
[13:53:02] hmmmmm
[13:53:07] well, if it is dispatch
[13:53:18] how important is s8 vs other hw
[13:53:21] s8 is the same
[13:53:29] the other is, on average, worse
[13:53:44] (because s8 is new on both dcs)
[13:54:46] addshore: what is the largest bottleneck, master speed?
[13:55:10] masters may (not 100% sure) be faster
[13:55:17] jynus maybe you enabled the --run-faster option for codfw?
[13:55:19] :-)
[13:55:29] masters are the same on codfw and eqiad
[13:55:33] so, it should be a select on a replica
[13:55:48] which loadgroup, the main one?
[13:55:51] yup
[13:55:52] addshore: replica hardware is the same for s8
[13:55:56] in both DCs
[13:56:28] the speed increase is so crazy we can probably reduce the number of dispatchers, but I guess this might revert once we get back on eqiad
[13:56:46] where do you record the dispatching state?
[13:56:49] redis?
[13:56:59] the state is in the sql tables
[13:57:04] *gets some names*
[13:57:09] the locks are in redis
[13:57:19] wb_changes and wb_changes_dispatch
[13:57:22] (I was thinking maybe some old events got lost)
[13:57:27] but sql is the same
[13:57:33] that I know for sure
[13:58:07] please give all technical details you can on the ticket
[13:58:28] and add me as subscriber
[13:58:45] in many cases, those surprise optimizations may be a sign of something wrong
[13:59:00] the script doesn't have super fine grained timing metrics, so I know the speed up is down to one of 2 db queries or the redis lock
[13:59:07] I'll add some details to the ticket now
[13:59:12] there are some normal explanations
[13:59:21] e.g. if most edits come from bots
[13:59:36] and most bots are on cloud (eqiad)
[13:59:47] bots may take a fraction of a second of extra roundtrip
[14:00:08] enough to make fewer edits or at a lower rate
[14:00:24] but that would be seen on the edit rate, which I think is not the case
[14:07:46] https://phabricator.wikimedia.org/T205865#4629863 << here are the queries that are run in the second of code that I am looking at the timing data for
[14:08:29] the edit rate of wikidata also appears to be the same
[14:08:35] addshore: is there any way to do a test "read only" dispatch
[14:08:49] are the other masters in codfw faster?
[14:08:53] like a custom script that does mostly what the dispatch does but doesn't write
[14:09:06] for production debugging
[14:09:16] banyek jynus I have made a draft for moving the wikis from s3 to s5 in our etherpad, starting on line #5
[14:09:27] Please take a look and add/remove/modify stuff as you wish
[14:09:36] checking
[14:09:41] I made this quickly, so mistakes are probably there, but at least we have a skeleton to work with
[14:10:17] we thought of doing it through the row based replication replica
[14:10:53] jynus: did we? add your comments and thoughts
[14:11:15] we == banyek and I
[14:11:18] ah XD
[14:11:22] when we were talking later
[14:11:35] you are not part of the "cool dbas" group, sorry
[14:11:49] I know :-(
[14:12:01] I am sure you have a proper etherpad for you too
[14:12:06] cool-dbas-etherpad
[14:12:14] how did you know?
[14:12:31] I'd vote for badass-dbas because it rhymes
[14:12:40] I installed netbus on your laptop during all hands
[14:13:50] So, the row based replication replica would be the master?
[14:14:07] it was just an idea, we didn't elaborate
[14:14:35] that also makes things more complicated
[14:14:46] with the s3 master, the s5 master, the s5 master master, and codfw
[14:14:56] yep
[14:15:18] not that the original setup is ok
[14:15:29] also the issue of fixing codfw
[14:15:49] which is also present with the original proposal
[14:16:06] which fixing codfw?
[14:16:10] what about not setting up replication but copying over binlogs, and applying them?
[14:16:14] oh, I see, you want to set it up
[14:16:23] and what, suffer lag?
[14:16:38] not 100% sure how codfw would work
[14:16:47] Are you talking to me or to banyek?
[14:17:00] both
[14:17:17] I don't understand what you mean
[14:18:03] so forget my proposal, which has nothing to do with my question
[14:18:12] ok XDD
[14:18:25] (for the dispatching thing) it could also be the specs of the mwmaint server, but I don't think that would have a drastic impact
[14:18:26] I am not 100% sure how the replication flow would work
[14:18:50] addshore: mwmaint1001 is also now set up, to replace the old one
[14:19:08] marostegui: not the basic one, which is understood
[14:19:12] but in case of a rollback
[14:19:27] of both the feature and of the dc switch
[14:19:30] jynus: Once eqiad is active and the new wikis are in s5... we will need to set a replication filter on s5 codfw until we reimport the wikis there
[14:20:16] ok, so if we start writing and then go back to codfw, we "lose" (temp) those changes, right?
[14:20:27] or we import quickly
[14:20:56] that's why I think -> dump/load -> copy over the binlogs, clean up the irrelevant databases, and then apply them with mysqlbinlog, and then we only have to put the original part into read-only mode when we fail over the databases
[14:20:57] Yeah, that's the problem we'd have in case of a rollback
[14:21:03] *thought
[14:21:21] marostegui: not a huge issue, just trying to see all possibilities
[14:21:34] or things that could go wrong
[14:21:36] Or rename the tables back and apply the binlogs in codfw
[14:21:56] so in my mind I thought of this, slightly differently but essentially the same
[14:22:22] and I have to think if it has any real difference
[14:22:35] take your time and add your thoughts once you have them
[14:22:44] as I said, this is a skeleton so we can at least start a discussion about it!
[14:22:57] yeah, I need to think more
[14:23:17] 10DBA, 10Core Platform Team, 10SDC Engineering, 10Wikidata, and 4 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (10daniel)
[14:23:21] not because I don't like it, I do, but I need to see all things that could go wrong
[14:23:54] Again, this is just a skeleton to kick off a discussion, otherwise we'd be discussing without anything on "paper"
[14:24:02] marostegui: give me some extra time!
[14:24:06] :-)
[14:24:09] jynus: you have 1 minute
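A rough sketch of the "copy the binlogs over and replay only the moved wikis" idea above. Every concrete value (binlog file names, wiki name, target host, start time) is a placeholder. Note that for statement-based events mysqlbinlog --database filters on the session's default database, while for row-based events it filters on the database the changed table belongs to, which is part of why the row-based replica came up.

```
# Replay only one moved wiki's writes from copied-over binlogs onto the s5 side.
# All names, times and hosts below are placeholders, not real production values.
mysqlbinlog --database=somewiki \
            --start-datetime='2018-10-10 06:00:00' \
            mysql-bin.000123 mysql-bin.000124 \
  | mysql -h s5-target.example.org
```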
[14:33:28] We've got a disk with errors on db2058
[14:33:37] I will ack the alert and let it fail by itself
[14:33:48] it is a codfw slave
[14:33:52] sorry, a codfw s4 slave
[14:34:02] so I will just let it fail and get the automatic ticket whenever that happens
[14:34:36] that was the one that was lagging the worst
[14:34:42] oh really?
[14:34:48] yep
[14:34:49] then worth getting it replaced
[14:34:56] I will get a task for papaul
[14:35:04] did you check the media errors?
[14:35:13] or does it only have smart errors?
[14:35:19] Let me seeee
[14:35:40] physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Predictive Failure)
[14:35:47] to be fair, normally smart -> hard only takes 1 week or so
[14:35:52] yeah
[14:35:54] or less even
[14:36:00] I will get a task up
[14:37:09] 10DBA, 10Operations, 10ops-codfw: db2058: Disk #1 predictive failure - https://phabricator.wikimedia.org/T205872 (10Marostegui)
[14:37:21] 10DBA, 10Operations, 10ops-codfw: db2058: Disk #1 predictive failure - https://phabricator.wikimedia.org/T205872 (10Marostegui) p:05Triage>03Normal
[14:45:19] I was able to reproduce the bug where wmf-pt-kill doesn't kill the queries if run as a systemd service but kills them when started in the foreground
[14:45:46] what is it!
[14:49:26] permissions?
[14:49:40] I did not find the cause, I can just confirm that the package & service work the same way in my virtualbox as on the servers :(
[14:50:02] but this is at least *something* to start with
[14:50:21] * marostegui misses init.d
[14:50:37] * banyek too
[14:51:12] go away you devuan zealots!
[14:51:40] * banyek sitting in a rocking chair with a banjo and singing some sad song about "ye olde good" init.d
[14:51:52] systemd substituted both the horrible init.d AND the even more horrible mysqld_safe
[14:52:14] 10DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (10Papaul) You can power the server off tomorrow at 10:00 am CDT
[14:54:50] Marostegui: jynus: where can I check when the backup is supposed to start? 5pm CEST looks good to me (I doubt the BBU change takes more than 30 minutes - 1 hour tops), I just want to be sure that it won't overlap with the backup
[14:55:08] (even better, where can I check it?)
[14:55:46] banyek: check the profile mariadb::backup
[14:55:55] perfect, thanks
[14:56:41] banyek: wikitech can also probably give you some hints on where to start looking :)
[14:56:41] I'll leave at 5 but I'll be back for the SRE meeting
[14:56:46] mariadb::backup::mydumper
[14:57:18] minute => 0, hour => 17, weekday => 2,
[14:57:27] everything UTC
[14:58:45] ok
[14:59:10] then it seems the 10:00am CDT is bad for us
[14:59:11] https://www.worldtimebuddy.com/cdt-to-utc-converter?qm=1&lid=6,100,3054643&h=6&date=2018-10-1&sln=10-10.5
[14:59:23] banyek: so what's the plan?
[15:00:25] I think I'll ask him to hold back the BBU change until the backup finishes, and then he can pick whichever date is good for him
[15:00:45] or you can re-schedule the backups
[15:01:08] shall I?
[15:01:16] I can bump them by 2 hours
[15:01:17] Don't know, what do you think? :)
[15:01:21] just FYI "the backup finishes" can be 12-14 hours after the start
[15:02:16] yea, I was thinking of postponing the bbu change for a day
[15:02:29] but that leads to a slower backup probably
[15:02:59] banyek: Probably worth checking when the backups for the sections of dbstore2002 finish, on zarcillo, to see what's the trend
[15:03:11] banyek: Keep in mind that DC people are normally quite busy too
[15:03:42] ^^^ That makes sense on bumping the backup start by two hours
[15:03:56] You decide! :)
[15:04:13] banyek: I agree with marostegui, you reach a point in which there is no good answer
[15:04:17] I decided, 2 hours bump
[15:04:20] just your call
[15:05:26] what I would suggest is not to edit the hours, but to disable the dbstore2002 backups
[15:05:41] 10DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (10Banyek) @papaul It will be good, thank you. The backups normally start at the same time, but I'll bump them by 2 hours for the next time, so you'll have plenty of time for this
[15:05:42] and then run them manually or edit after everything is good
[15:06:03] hm
[15:06:04] as 2 hours normally end up being 3 or more
[15:06:16] especially on a complex host like dbstore2002
[15:06:21] with 6 instances
[15:06:30] I'll ask about this at the beginning of tomorrow
[15:06:38] but now I have to leave for an hour
[15:06:46] bb at 6pm
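The cron quoted from puppet above fires at 17:00 UTC on weekday 2, so the proposed 10:00 CDT window can be sanity-checked from the shell as well; a small sketch with GNU date, using an example date.

```
# Express the proposed 10:00 CDT maintenance slot in UTC (the date is just an example).
TZ=UTC date -d 'TZ="America/Chicago" 2018-10-02 10:00'

# And where a ~14 hour dump starting at 17:00 UTC would end up.
date -u -d '2018-10-02 17:00 UTC + 14 hours'
```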
[15:55:47] As I understand it, the backup is triggered via cron and it uses /etc/mysql/backups.cnf as the config for backups. I *think* the best idea for handling this BBU change is to bump the backup time by one full day in the config, then shut down the server and get the BBU changed. After that, boot the server back up, set the backup start time back to the original one, and then start the backup process manually in a screen with
[15:55:58] `/usr/local/bin/dump_section.py --config-file=/etc/mysql/backups.cnf >/dev/null 2>&1`
[15:56:03] did I forget something?
[15:56:50] banyek|away: you can move backups on puppet with cron
[15:57:06] just change the 2 to a 3
[15:57:39] exactly, that's what I meant by 'bump the backup time by one full day'
[15:57:45] in config I meant puppet, sorry
[15:58:01] *as 'config' I meant puppet
[16:43:47] 10DBA, 10Patch-For-Review: Finish eqiad metadata database backup setup (s1-s8, x1) - https://phabricator.wikimedia.org/T201392 (10jcrespo) All sections should be available now, but s8 is still compressing the last 2 tables, and s6 and x1 are uncompressed.
[16:44:58] 10DBA, 10monitoring, 10Core Platform Team Kanban (Watching / External), 10Performance-Team (Radar): Improve database application performance monitoring visibility - https://phabricator.wikimedia.org/T177778 (10CCicalese_WMF)
[16:51:13] 10DBA, 10MediaWiki-Database, 10Core Platform Team Kanban (Watching / External), 10Performance-Team (Radar), 10Wikimedia-Incident: Fix mediawiki heartbeat model, change pt-heartbeat model to not use super-user, avoid SPOF and switch automatically to the real maste... - https://phabricator.wikimedia.org/T172497
[16:51:27] 10DBA, 10User-Banyek: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (10Banyek)
[17:02:12] see you tomorrow
[18:55:15] 10DBA, 10JADE, 10Operations, 10MW-1.32-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), and 3 others: Write our anticipated "phase two" schemas and submit for review - https://phabricator.wikimedia.org/T202596 (10awight) @Marostegui I'm not sure if this helps, but I'll try to better illustrate my question us...
[18:57:39] 10DBA: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (10Marostegui)
[18:57:44] 10DBA: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (10Marostegui) p:05Triage>03High
[18:58:01] 10DBA: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (10Marostegui) Setting it to high as we need to do this before we do the failover back to eqiad
[18:58:18] 10DBA: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (10Marostegui)
[18:58:27] 10DBA, 10Datasets-General-or-Unknown, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), and 2 others: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (10Marostegui)
[19:08:07] 10DBA: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (10Marostegui) Wikis created after 2009 will not need this change, so for instance, s8 (wikidata) doesn't need it. ``` root@neodymium:~# mysql.py -hdb1071 wikidatawiki -e "show create table change_tag\G" | grep -i key PRIMARY KEY...
[19:11:35] 10DBA: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (10Marostegui)
[20:25:23] there are 2 tables left uncompressed in dbstore2002 - I doubt they will be able to finish tomorrow early enough to let the replication catch up before the backup process starts, so I am stopping the compression now and restarting the replication. After the BBU change and backups I will resume this.
[23:26:17] 10DBA, 10Core Platform Team, 10SDC Engineering, 10Wikidata, and 5 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (10CCicalese_WMF)
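As a follow-up to the compression status above, a generic way to see which large InnoDB tables are still uncompressed on one instance before resuming; this is not necessarily the query truncated in the T201392 comment earlier, and the socket name is only an example of the per-section naming used on these multi-instance hosts.

```
# List the biggest InnoDB tables that are not yet compressed on one instance.
mysql -BN -S /run/mysqld/mysqld.s2.sock -e "
  SELECT table_schema, table_name,
         ROUND(data_length/1024/1024/1024, 1) AS data_gb
  FROM information_schema.tables
  WHERE engine = 'InnoDB' AND row_format <> 'Compressed'
  ORDER BY data_length DESC
  LIMIT 10"
```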