[05:23:47] ottomata: Either clouddb* or dbstore100* could be a good example of multiinstance setups, the failover part isn't that easy though
[06:51:27] in dewiki_p there is some index corrupted. If this is not the best channel, please tell me https://phabricator.wikimedia.org/T293855
[06:53:33] Wurgl: Thanks for the report, I am taking a look
[07:39:59] Wurgl: Fixed
[07:42:55] +1
[08:08:31] marostegui: that's a bit concerning re: clouddb1020. i did a full check on the data before signing off on it
[08:09:25] kormat: It is not strange to get indexes corrupted, meaning it is something we've seen before (with other versions too)
[08:09:33] ah i see
[08:09:34] But yes, we need to keep an eye on it
[08:09:46] sorry, i forgot we were dealing with quality software
[08:09:57] XDDD
[08:15:12] marostegui: any issue with me going ahead with T277116 in codfw?
[08:15:12] T277116: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production - https://phabricator.wikimedia.org/T277116
[08:15:25] kormat: go ahead!
[08:15:32] damn
[08:16:00] The E in EDP stands for Experimental (not Electronic)
[09:20:18] * Emperor patches recon.py to not outrageously lie about disk usage
[09:54:28] not entirely sure I believe it yet, but it's a bit less daft
[09:55:15] Disk usage: space used: 178 TB of 390 TB for Thanos isn't an Obvious Lie
[09:56:14] Emperor: more believable than if the 2 numbers were reversed, for sure.
[09:56:32] codfw-prod my patched version says Disk usage: space used: 2 PB of 3 PB
[10:12:25] Hm, yes, I think these numbers are correct.
[10:13:39] now let's see if upstream want my patch...
[10:24:32] sigh, is gmail helpfully throwing away the verification email?
[10:28:09] Emperor: slander. _obviously_ you don't need it.
[10:53:55] mm. not happy with how long that schema change is taking to replicate on s4.
going to extend the downtime by another 2h
[10:54:07] Bah, no sign of said email, still, even if I ask gmail to search spam too :(
[10:54:24] ... or nevermind. it just replicated :)
[10:54:32] Emperor: ugh :(
[11:11:18] oh FCOL following the instructions at https://docs.openstack.org/swift/latest/first_contribution_swift.html doesn't fscking work
[11:14:34] PROBLEM - MariaDB sustained replica lag on s4 on db2095 is CRITICAL: 1396 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2095&var-port=13314
[11:18:32] got there in the end, but really. I'm _almost_ grumpy enough to send them a docs fix too
[11:18:44] RECOVERY - MariaDB sustained replica lag on s4 on db2095 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2095&var-port=13314
[11:18:50] [you need to clone https://review.opendev.org/openstack/swift ]
[11:21:54] Emperor: is this work the result of our recent conversation about available space?
[11:22:04] * sobanski feels partly responsible if so
[11:55:39] sobanski: yeah, I'm trying to make swift-recon -d tell something like the truth.
[11:56:14] swift/cli/recon.py | 5 +++++
[12:17:16] FCOL => for crying out loud (took me a minute)
[12:24:40] looks like it took you 66 ;p
[13:30:31] marostegui: cool, so maybe analytics_multiinstance is good to emulate?
[13:30:51] oo but dbstore_multiinstance looks probably nicely maintained by yall since it is primary?
[13:31:36] ottomata: If you want to have more than one mysql process (what we call multi instance) on those hosts, either dbstore multiinstance or clouddb101* hosts are something you can use to get inspired :)
[13:31:50] yes multiinstance for sure
[13:31:58] ok
[13:32:04] both are equivalently good to copy?
[13:32:51] ottomata: oh hey
[13:32:58] hello!
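The recon.py fix Emperor describes earlier in the log only has to aggregate the per-device numbers swift-recon -d already collects and print them in sane units. A minimal sketch of that summary step, assuming a list of per-device used/size byte counts — this is not the actual five-line upstream patch, and the helper names and sample figures are made up:

```python
# Hedged sketch of what swift-recon -d's summary line has to do:
# sum per-device "used"/"size" bytes and format them in human units.

def human(n):
    # step down through binary units for display
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1024 or unit == "PB":
            return "%d %s" % (round(n), unit)
        n /= 1024.0

def summarize(devices):
    """devices: iterable of {'used': bytes, 'size': bytes} dicts."""
    used = sum(d["used"] for d in devices)
    size = sum(d["size"] for d in devices)
    return "Disk usage: space used: %s of %s" % (human(used), human(size))

# two fake hosts, 89 TiB used of 195 TiB each
sample = [{"used": 89 * 2**40, "size": 195 * 2**40}] * 2
print(summarize(sample))  # → Disk usage: space used: 178 TB of 390 TB
```

The bug being patched would then be anywhere a wrong field (or wrong aggregation) feeds `used`/`size`, which is consistent with "outrageously lie about disk usage" producing a plausible total but a daft used figure.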
;)
[13:33:06] ottomata: I would say, to adapt :)
[13:33:06] ottomata: can you maybe talk a bit about why you're going with multi-instance?
[13:35:10] kormat: sure (and maybe i'm wrong!)
[13:35:12] this is for https://phabricator.wikimedia.org/T284150
[13:35:31] aye, i was looking at that, and T280905
[13:35:31] T280905: Analytics coordinator failover improvements - https://phabricator.wikimedia.org/T280905
[13:35:54] i had assumed multi instance for:
[13:35:54] - db1108 is already multi instance backup
[13:35:54] - the dbs on an-coord1001 are mostly for separate services, so multi instance would make it easier to separate them later if we need to
[13:36:05] - easier to do failover for specific services?
[13:37:28] ottomata: failover or maintenance? (or both?)
[13:37:40] ottomata: my main concern would be re: failover. the tools we (data-persistence) have for changing primaries have been written assuming the primaries are single-instance. we don't have any multi-instance primaries
[13:37:42] both would be nice, maintenance is more likely
[13:38:06] so i wouldn't be at all surprised if you run into assumptions that break with that kind of setup
[13:38:12] hm
[13:38:24] if you're planning to use other tools, etc, then ignore this :)
[13:38:35] what tools do you mean, the dbproxy stuff?
[13:38:56] yes, we do not have dbproxy stuff adapted for multi-instance on masters
[13:39:25] ottomata: what gets shipped in wmfmariadbpy-admin. `db-switchover` and `db-move-replica` in particular
[13:39:30] (any docs i can read on how this all works?)
[13:40:01] ottomata: the closest we have to handle multiinstance with dbproxy are dbproxy1018 and 1019 which handle clouddb hosts, but that's a very complex setup so not sure if it would fit you
[13:40:38] ottomata: https://wikitech.wikimedia.org/wiki/HAProxy
[13:40:41] ottomata: *mumbles awkwardly* no docs, no
[13:40:44] reading that
[13:40:54] ottomata: also, note that there's _no_ automatic failover of any DBs in prod
[13:41:14] ottomata: that's what we use for misc services (one single instance master + multi-instance replica as a stand by)
[13:41:51] aye
[13:42:00] ok auto failover never mind then
[13:42:30] at some point in the far and fanciful future we might have orchestrator doing auto-failover. but that's a pipe-dream right now.
[13:42:38] ottomata: We have "some" sort of auto failover when using dbproxy, meaning if the master goes down, the proxy fails over to the replica (which is in read only, so we'd manually need to set it to RW if we wanted to)
[13:42:54] That doesn't handle replication though, so you'd need to rebuild the master if you set the replica to RW
[13:43:16] (and also change the rest of the replication tree)
[13:43:23] The reason not to have the replica in RW mode is to avoid any flapping creating a split brain, so we can decide: ok, the master is dead for sure, so we are going to failover to the slave
[13:44:17] right makes sense
[13:45:45] q about binlog replication
[13:45:55] if you have 3 nodes, 1 master 2 replicas (each pulling from the master)
[13:46:11] a (master), b, and c
[13:46:17] and you want to promote b to master
[13:46:29] if you stop writes to a, make sure a, b and c are all at the same binlog pos
[13:46:54] do you manually point c and a to replicate from b, and then enable writes to b?
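The promotion sequence being asked about here can be written down as an ordered plan of SQL statements per host. The sketch below is a made-up helper following the a/b/c example in the conversation — it is not wmfmariadbpy's db-switchover, it only collects the statements rather than executing them, and the catch-up wait (e.g. via MASTER_POS_WAIT) is elided:

```python
# Hedged sketch of promoting b (replica) to master, with a (old master)
# and c (other replica) repointed at it. Statements are only collected;
# in reality each runs on the named host, and you must wait for b and c
# to fully catch up on a's binlog before the CHANGE MASTER TO step.

def promotion_plan(old_master, new_master, other_replicas,
                   new_master_file, new_master_pos):
    """Return the ordered (host, sql) steps for the switchover."""
    steps = []
    # 1. stop writes on the old master
    steps.append((old_master, "SET GLOBAL read_only = 1"))
    # 2. (wait here until all replicas reach the old master's position)
    # 3. new master stops replicating and forgets its old master
    steps.append((new_master, "STOP SLAVE"))
    steps.append((new_master, "RESET SLAVE ALL"))
    # 4. repoint the other replicas and the old master at the new master,
    #    using the NEW master's own binlog file/position (positions are
    #    not comparable across hosts, as discussed below)
    change = ("CHANGE MASTER TO MASTER_HOST='{0}', MASTER_LOG_FILE='{1}', "
              "MASTER_LOG_POS={2}".format(new_master, new_master_file,
                                          new_master_pos))
    for host in other_replicas:
        steps.append((host, "STOP SLAVE"))
        steps.append((host, change))
        steps.append((host, "START SLAVE"))
    # the old master becomes a replica of the new one
    steps.append((old_master, change))
    steps.append((old_master, "START SLAVE"))
    # 5. finally re-enable writes, now on the new master
    steps.append((new_master, "SET GLOBAL read_only = 0"))
    return steps

for host, sql in promotion_plan("a", "b", ["c"], "b-bin.000042", 1234):
    print(host, "->", sql)
```

The next messages in the log confirm the shape of this plan: yes to repointing c and a, plus resetting replication on b.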
[13:47:07] yes, and reset replication on b
[13:47:10] right
[13:47:23] https://bugs.launchpad.net/bugs/1947862 seems to be causing some confusion
[13:47:34] does the dbproxy thing make it possible to not have to change the master hostname for the replicas?
[13:47:44] e.g. if you just changed the backend name for the master alias or something?
[13:47:57] ottomata: dbproxy doesn't handle replication in any way
[13:48:01] yes
[13:48:05] or dns or whatever
[13:48:18] just wondering, if you used a hostname alias for the master, would it work?
[13:48:25] like, mariadb.master.wmnet => a
[13:48:35] ottomata: not sure if I am following but what we do with it is point m1-master.eqiad.wmnet to dbproxy
[13:48:39] then you do the readonly, failover stuff, wait for binlog positions to match
[13:48:46] root@cumin1001:/home/marostegui# host m1-master
[13:48:46] m1-master.eqiad.wmnet is an alias for dbproxy1012.eqiad.wmnet.
[13:48:46] dbproxy1012.eqiad.wmnet has address 10.64.0.134
[13:48:47] then make master.wmnet => b
[13:48:47] ?
[13:49:03] k listening...
[13:49:36] ottomata: note that if you use haproxy _or_ cnames/aliases, you can't use ssl (so no cross-dc queries)
[13:49:36] if you change the master behind that, you don't need to care about dns (but you do need to change haproxy config files of course)
[13:50:20] i guess my question is more abstract: is it possible to swap masters from a replica's standpoint via a hostname change, as long as the new master is at the proper binlog position?
[13:50:20] what kormat said - which also means no SSL over replication
[13:51:11] ottomata: i'm not sure what the gain is.. you'd still need to stop and restart replication on the replica in any case
[13:51:32] basically, without doing any CHANGE MASTER TO
[13:51:37] on each of the replicas
[13:51:38] nope, you cannot
[13:51:54] you still need CHANGE MASTER TO to specify the file and binlog pos
[13:52:09] oh, yeah. that ^
[13:52:19] even if the master hostname resolves to the new master?
(and the binlog filename is the same?)
[13:52:26] ottomata: the binlog file is not the same
[13:52:30] it contains the hostname in it
[13:52:30] yes, you still need to specify the file and position
[13:52:54] oh, always? i thought that was configurable? or is that in the binlog messages itself?
[13:53:09] ottomata: it doesn't matter, the position won't be the same
[13:53:22] ah ^ this is the interesting part. ok interesting.
[13:53:26] ottomata: it is configurable, but sanity says put the hostname in it :P
[13:53:53] so an equivalent binlog pos on any replica might have a different pos #?
[13:53:56] ottomata: the file/position (if you don't use GTID, which I don't think you should) won't be the same in A's binlog as in B's own binlog
[13:54:02] ottomata: it will.
[13:54:18] hahah getting mixed signals... :p
[13:54:37] ottomata: i'm agreeing with manuel
[13:54:47] oh sorry, it will be different. got it.
[13:54:52] right.
[13:54:56] the equiv binlog pos on a replica _will_ have a different pos, short of a cosmic coincidence
[13:55:18] in a perfect world everything starts at 0 and is synced, but we are not in a perfect world :)
[13:55:23] What is the same is the last executed position on both replicas FROM A's binlog, but it won't be B's or C's position in their own binlogs
[13:55:31] right.
[13:55:33] ottomata: that's called GTID sort of, but it is a mess :)
[13:55:44] so swapping c to replicate from b with the same pos # would break stuff
[13:55:47] and be incorrect
[13:55:56] yes, that will corrupt your data
[13:56:16] you need to specify b's binlog position
[13:56:59] reading about GTIDs...
[13:57:01] ottomata: GTID was developed to fix all these problems, but we've found that GTID on mariadb (which follows a different implementation than mysql's) is pretty broken and unreliable at least in our env
[13:57:07] huh.
[13:57:50] ottomata: i wouldn't bother spending too much time on reading the GTID docs.
so much is left unspecified it's impossible to figure out how they work
[13:57:54] so when changing topologies, we do not use it, we go back to the file/pos method
[13:58:20] do you use GTIDs in some places now anyway?
[13:58:21] the so-called traditional replication coordinates :)
[13:58:25] right
[13:58:39] ottomata: we use GTID as a belt-and-braces approach
[13:58:44] ottomata: We only use GTID on the replicas as it stores the replication coordinates in a table, which makes it more resilient to crashes and corruption
[13:58:47] it's enabled in normal operation. but it gets disabled for any replication changes.
[13:58:52] aye
[13:59:09] Which in mysql is called crash-safe replication, but in mariadb it comes with GTID, so simply that
[13:59:21] It is more reliable than trusting a file storing coordinates
[13:59:30] (which is the old traditional method)
[14:00:02] ottomata: even if GTID were fine, you'd still need to issue a CHANGE MASTER TO though
[14:00:15] But yeah, no position or file
[14:00:30] interesting
[14:00:50] marostegui: even if the hostname alias changed (and you restarted replication after?)
[14:01:01] ottomata: GTID fixes the problem you were mentioning, a transaction would have the same ID (to call it that way) across all the hosts in a replication topology
[14:01:11] aye
[14:01:18] ottomata: i really would not use aliases for replication targets
[14:01:27] ottomata: I wouldn't recommend using aliases/cnames for that
[14:01:30] kormat: <3
[14:01:35] you're depending on mariadb's behaviour about re-resolving dns names
[14:01:55] i don't know what it is, i certainly wouldn't want to depend on it doing the right thing in all cases
[14:02:15] (fyi, i'm not suggesting to do so, just trying to understand)
[14:02:36] what if LVS?
[14:02:48] 😬
[14:03:08] ottomata: could you maybe explain why you're asking these horrifying questions? :)
[14:03:29] hahah, mostly just trying to understand why swapping a master sounds so annoying!
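For reference, the two CHANGE MASTER TO shapes the conversation contrasts — traditional file/position coordinates versus MariaDB's GTID mode — look roughly like this. The helper functions are made up for illustration; the statement syntax itself is standard MariaDB:

```python
# Sketch of the two ways to repoint a replica at a new master.

def change_master_traditional(host, log_file, log_pos):
    # The coordinates must come from the NEW master's own SHOW MASTER
    # STATUS; a position taken from the old master's binlog is
    # meaningless here (see the discussion above).
    return ("CHANGE MASTER TO MASTER_HOST='{0}', MASTER_LOG_FILE='{1}', "
            "MASTER_LOG_POS={2}".format(host, log_file, log_pos))

def change_master_gtid(host):
    # With MASTER_USE_GTID=slave_pos the replica resumes from the GTID
    # it has already recorded, so no file/position is needed -- but as
    # noted above, you still have to issue a CHANGE MASTER TO.
    return ("CHANGE MASTER TO MASTER_HOST='{0}', "
            "MASTER_USE_GTID=slave_pos".format(host))

print(change_master_traditional("b", "b-bin.000042", 1234))
print(change_master_gtid("b"))
```

This is the concrete difference behind "no position or file" later in the log: GTID removes the coordinate lookup, not the topology-change statement itself.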
[14:03:30] if you're doing LVS, i assume that also means you don't get ssl for replication
[14:03:36] ottomata: a variant of all this is having a floating VIP as a master, so you move it from host to host depending on which one is the master
[14:03:59] (mysql/mariadb's wire protocol is tragically hostile to anything sensible re: ssl)
[14:04:12] Which brings lots of possibilities and nightmares at the same time, including floating vips across different vlans XD
[14:04:25] _and_ ssl won't work
[14:04:42] and you need a good STONITH method
[14:04:54] otherwise you can really corrupt things in a bad way (/me has been there)
[14:05:30] I think we've drifted quite a lot from ottomata's original question :)
[14:06:32] i guess you could use one VIP for the replicas, but a different name or VIP for the master hostname the application uses? so you could do: readonly mode, swap the master VIP host for the replicas, make sure they all point at the proper master, then swap the application's master target hostname, then set read/write
[14:06:37] oh very far
[14:06:45] just always been curious why this isn't easier... :)
[14:07:00] having to manually issue commands on all the replicas out there just to swap a master sounds so annoying!
[14:07:14] ottomata: that's why automatic failover on mysql is so hard :)
[14:07:18] many things to consider
[14:07:25] orchestrator makes them a lot easier, but we are not there yet
[14:07:52] If you care about your data, it is very hard to do it right
[14:08:00] If you don't, it is very easy to do it :)
[14:08:31] ottomata: To be honest, I'd rather have a slow failover than data corruption
[14:08:42] indeed!
[14:09:13] ottomata: the sooner you stop trying to be fancy with VIPs/LVS/aliases, the sooner you'll come to accept the uncomfortable reality that is managing mariadb replication. ;)
[14:09:22] haha
[14:09:47] ottomata: db-switchover handles most of the work for you, btw. if you're using a straight-forward setup.
(and not multi instance :) )
[14:10:13] ottomata: it handles the multi-instance replicas' replication change
[14:10:19] ?
[14:10:23] oh single-instance is a founding principle of "straight-forward setup", i assure you. :)
[14:10:26] ahha
[14:10:38] _replicas_ can be multi-instance no problem
[14:10:49] but it's not designed/tested/used against multi-instance primaries/candidates
[14:11:39] still trying to understand GTIDs in the abstract...if you still need to manually change master to even with GTIDs, it doesn't really help that much (at least with automation of master swap), does it? it just makes doing the swap less error prone, because the GTID pos is the same everywhere, so it's mostly the same command on each replica?
[14:12:19] ottomata: correct; you don't need to specify the GTID position, because the replica already knows it. but you're still doing `CHANGE MASTER TO` etc.
[14:12:30] marostegui: i see, so db-switchover will help point replicas at a new master, but will not help make the old master become a replica?
[14:12:38] ottomata: finding the right position and file is a _massive_ task, so in that sense GTID is a great tool
[14:12:50] aye
[14:12:56] ottomata: correct, as kormat said, it doesn't handle multi-instance masters
[14:13:00] got it.
[14:13:21] ok, so back to my task then (thanks for the tangent :) )
[14:13:24] XD
[14:14:08] multi instance is probably not worth it here. if we do have to factor out a service's db later, we can, but for now we only have 1 master and 1 replica server anyway. so might as well have the dbs all in one? Or....maybe i should consider service usage and mariadb instance isolation too?
[14:14:31] most of these dbs are not heavy usage..although that might change slightly as we use airflow more
[14:15:05] ottomata: i'd definitely vote for keeping everything in a single instance
[14:15:17] ok. sounds good.
[14:15:18] ottomata: That's really up to you, having more databases means more downtime if a single host goes down, and having to coordinate more stuff if you need maintenance
[14:15:28] i'll go single instance then for now.
[14:15:32] But on the other hand it is easier to handle from a replication point of view
[14:15:43] It also means shared innodb buffer pool and such
[14:16:02] ok...should we enable GTIDs just because the risk is low and it sounds like a nice thing to have...even if it doesn't help automating maintenance (that much)?
[14:16:15] ottomata: you can if you like of course
[14:16:33] it sounds like a nice thing to have a little experience with :)
[14:16:33] We do have it in production, we simply don't use it when we do replication changes
[14:16:50] so, the GTIDs are there in the binlog, alongside the regular master position #s?
[14:16:53] in case of a replica crash, it is less likely replication will come up corrupted
[14:16:57] ottomata: yes
[14:17:05] and you could potentially change master to using either of them?
[14:17:19] ottomata: no, you change the replica
[14:17:32] marostegui: i think he meant `CHANGE MASTER TO`
[14:17:37] ^ ya
[14:17:39] ah!
[14:17:50] then yes :)
[14:17:54] got it.
[14:17:57] interesting
[14:18:08] (but our tools only do it via binlog+pos)
[14:18:22] marostegui: multi instance is 'easier to handle from a replication point of view'?
[14:18:28] what do you mean?
[14:18:36] i very much doubt that's what he meant :)
[14:19:41] ottomata: manuel's comments there were all about the single-instance route
[14:21:22] (oh maybe you meant the opposite, haha)
[14:25:24] yeah...i can see that, if we had 5 instances, that's 5 different binlog positions to consider when swapping masters
[14:25:24] i could see it being mayyybe useful to be able to just swap one instance's master if we have to do some upgrade for a particular service or something
[14:25:24] but, more likely it will be the whole host at a time
[14:25:24] so single instance would be fine.
[14:25:24] q: can I use the multiinstance stuff in puppet to set up a single instance, so it will be easier later to do multiinstance if we need to?
[14:25:36] aye ^ makes sense
[14:26:27] you could yeah, just double check the port configuration and all that as we do not use 3306 when multi-instance is in place, so you'd need to change that
[14:26:37] k
[14:26:50] another q: notes from luca on the multi instance thing here:
[14:26:51] https://phabricator.wikimedia.org/T280905
[14:26:57] It is not too difficult to go from single instance to multi-instance though, it is just a matter of mv /srv/sqldata to /srv/sqldata.whatever and not a lot more
[14:27:08] > The Analytics-Meta mariadb instance is running multiple databases, that is not what suggested by Data Persistence (better one instance per db, for isolation and better replication).
[14:27:17] i guess..that is not quite true given our convo today?
[14:27:36] also
[14:27:37] https://phabricator.wikimedia.org/T272973#6779459
[14:27:37] ottomata: it depends on how many databases that was, and whether they made sense from a logical/app point of view
[14:28:52] ottomata: both approaches (single vs multi) have pros and cons, it is really up to whoever needs to deal with it on a daily basis to consider them
[14:29:15] ottomata: i would use single-instance classes in puppet
[14:29:28] there are currently 11 databases on an-coord1001
[14:29:35] each for a different 'service' (or instance of a service)
[14:29:56] ottomata: So 11 services would be impacted if the host goes down or you need to put the master on RO, if that's fine with you then go ahead
[14:29:59] there are some logical groupings: there are 4 airflow databases, one for each airflow instance (and there will probably be more)
[14:30:29] If it is not a big deal to coordinate those 11 services' RO time at the same time, then you can go for single
[14:30:47] if it is a big deal, then you probably can't do failovers anyway :P
[14:30:47] i'm kinda leaning towards 2 instances, one for more backend-y things like hive metastore (which is really important), and another for more user-facing things like airflow and superset
[14:31:25] like; metadata management instance, and user service / jobs storage instance
[14:32:07] Really up to you as service owner :)
[14:32:23] OooOkKKKaayyyyy
[14:32:26] :)
[14:32:28] It is hard to give an answer without knowing much of the operational side of things
[14:32:40] I personally prefer to have things separated in logical ways
[14:32:44] yeah, and we aren't really sure about how hard airflow will hit the db
[14:32:49] So "same" things get impacted
[14:33:01] yeah...that's what i'm thinking too
[14:33:03] ottomata: also depends on where you think the bottleneck will be
[14:33:13] hive metastore + druid all manage things like "where does this data live"
[14:33:20] if it's disk i/o, then separate instances is _probably_ worse
[14:33:28] airflow is "job
history", superset is "dashboard configuration"
[14:33:46] most things don't do a lot of writes...but airflow we aren't sure
[14:34:00] how much data do you currently have?
[14:34:03] probably not that much, maybe max ~100 writes per second per instance
[14:34:08] data is smallish for all
[14:34:08] checking
[14:34:50] hive_metastore is the largest, 9.2G
[14:35:02] superset 1G
[14:35:03] 😶
[14:35:14] airflows are barely used yet, but ~50M currently
[14:35:24] * marostegui giggles
[14:35:41] you laughing at our teenie weenie dbs?!?!
[14:35:45] hahaah
[14:35:56] these are all 'metadata'
[14:36:01] real data is in hadoop
[14:36:24] So everything will be in the buffer pool for now, so it should be pretty fast anyways
[14:36:50] ottomata: i'd say that the _conversation_ has been overkill for a deployment on that scale, nevermind what you're thinking of actually doing :P
[14:37:05] haha
[14:37:15] well it's been educational either way :p
[14:37:28] ottomata: if you liked it, we are hiring!
[14:37:44] hive_metastore is pretty important and may one day be more important for more things
[14:37:51] given the data as a service OKRs
[14:37:59] but maybe it'll be a different one, who knows
[14:38:15] (a trend is hive metastore being used as a general-purpose data catalog even for things outside of hive...)
[14:38:29] marostegui: haha you don't want me, i'll try to do crazy things
[14:38:46] haha
[14:39:00] (I used to manage mysql replication for couchsurfing.org, so had to think about this stuff a lot...but that was 10 years ago)
[14:39:19] that explains why that site was always so slow!! i used it a lot!
[14:39:26] so now I know it was you!
[14:39:37] hahahah
[14:39:39] yup.
[14:39:48] exactly
[14:42:49] this was my mysql backup solution from long ago.
also used it to make new replicas: https://github.com/ottomata/dbbackup/blob/master/dbbackup.sh
[14:43:00] worked pretty well, i bet i'd read it now and shriek in terror
[14:44:24] aw mysqld_multi the good old times
[14:44:40] i remember really liking mylvmbackup
[14:45:02] much faster than mysqldump and easier to use for copying replicas
[14:45:21] yeah, we considered it here when we re-did all the backup work, but discarded it in the end
[14:45:29] they are not as flexible (among other things)
[14:45:30] how come?
[14:45:31] ah
[14:45:51] yeah only useful for restoring the exact same thing
[14:45:59] if you want to split out dbs, not useful i guess
[14:46:24] and the fact that you need to stop/start mysql all the time to make it consistent was also something we didn't want to do
[14:48:47] oh rly? i remember just locking tables
[14:49:12] but only for a second while lvm made the disk snapshot
[14:49:31] ottomata: and what about the stuff you had in memory? :p
[14:50:26] hmm yeah flush tables too
[14:51:07] and logs? (reading my dbbackup.sh script now :p)
[14:51:21] how do you do backups now? mysqldump?
[14:51:30] stop slave on replica & mysqldump?
[14:51:32] do you do incrementals?
[14:51:41] we do mydumper (which is similar to mysqldump) and we also do snapshotting via xtrabackup
[14:51:47] ah cool
[14:51:50] we don't do incrementals yet, that's to come
[15:00:03] ok, we actually already have a profile::mariadb::misc::analytics::multiinstance that i think will work, it's used for the dbstore1101 backup host
[15:00:15] i will try to apply that on the new hosts and set up multiinstance with it there
[15:00:30] that's what dbstore1003 and friends use too?
[15:00:31] will just have to compare an-coord's analytics-meta.my.cnf
[15:00:53] no that is profile::mariadb::dbstore_multiinstance
[15:01:12] which...is cool because it is the generic one?
[15:03:52] yeah, I believe so
[15:22:40] marostegui: it looks like the only real difference between mariadb::dbstore_multiinstance and misc::analytics::multiinstance
[15:23:09] is that the ::instances hiera param is more flexible with the analytics one; it takes key: value pairs rather than only being able to set the buffer pool size
[15:23:15] that and a different monitoring contact group.
[15:23:38] generic one does:
[15:23:39] $instances.each |$section, $buffer_pool|
[15:23:47] analytics does
[15:23:48] $instances.each |$section, $instance_params|
[15:24:11] seems to me like the latter is better...but i'd rather use the generic one and remove the analytics one altogether if possible.
[15:24:30] would it be ok to modify that part of profile::mariadb::dbstore_multiinstance to be more flexible?
[15:24:57] it should be a no-op, but i'd have to edit the hiera for all usages of profile::mariadb::dbstore_multiinstance to match
[15:28:37] that's for kormat to evaluate I would say cause that sounds risky
[15:28:53] i think it'll be easy to make the patch, i'll put it up and see what y'all think
[15:31:47] sounds good!
[16:13:23] kormat: marostegui https://gerrit.wikimedia.org/r/c/operations/puppet/+/732369
[16:13:25] lemme know what you think
[16:57:38] hm, just talked to luca and thought of another potential reason to do more multi-instance?
[16:58:01] what is the snapshot restore from backup process currently for single instance with multi db?
[16:58:27] e.g. if one database gets messed up somehow, is it possible to restore just that database from a backup, without messing with the state of the others?
[18:22:34] How does the main page parser cache being purged separately work
[18:22:36] Like enwiki
[18:22:39] Amir1: ?
[21:33:24] Hello everyone, could someone explain the difference between the runtimes and EXPLAINs for the queries shown above?
Of course, the right thing to do is to cast the IDs to integers before querying, but I'm wondering what makes MariaDB ignore the index when it gets mixed types, but use it in both all-strings and all-integers scenarios. https://www.irccloud.com/pastebin/i5j7mvX8/
[21:33:48] Context: Blogpost written at https://phabricator.wikimedia.org/T293452, draft available to WMF staff at https://docs.google.com/document/d/1OTcog8suaJ0CHkCDBhGDUUj44cY6Sook5auBM7X8eIA/edit# for now
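On that last question: under the MySQL/MariaDB type-conversion rules, a comparison between a string column and a numeric literal is performed numerically (both sides converted to floating-point), and numeric order disagrees with the string collation order the index is sorted by, so the optimizer cannot do a ref/range lookup on the index and falls back to scanning. An all-string or all-integer literal list keeps the comparison in one type, which is why those cases still use the index. A toy illustration of the principle (plain Python over a sorted key list, not MariaDB internals):

```python
# Why a mixed-type IN () list defeats a VARCHAR index: the index is
# ordered by *string* collation, but a numeric literal forces numeric
# comparison, whose matches need not be adjacent in string order.

keys = sorted(["007", "10", "7", "9"])  # how a VARCHAR index orders them
print(keys)                             # → ['007', '10', '7', '9']

# string equality: matches form one contiguous run, so an index seek works
print([k for k in keys if k == "7"])            # → ['7']

# numeric equality (what '... = 7' means on a string column): matches
# '007' and '7', which are NOT adjacent in the index order above, so a
# single index range can't find them
print([k for k in keys if float(k) == 7])       # → ['007', '7']
```

In other words, once any numeric literal appears in the list, every row's string value would have to be converted to a number to test it, which is effectively a full scan.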