[04:54:43] 10DBA, 10Data-Services, 10MediaWiki-Change-tagging, 10Patch-For-Review: Recent duplicate entries on change_tag on sanitarium hosts - https://phabricator.wikimedia.org/T200061 (10Marostegui)
[05:09:11] 10Blocked-on-schema-change, 10DBA, 10Wikidata, 10Patch-For-Review, 10Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (10Marostegui)
[05:09:26] 10DBA, 10Patch-For-Review, 10Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (10Marostegui)
[05:09:28] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review, 10Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (10Marostegui)
[06:34:29] I am going to take over db1118 and remove its contents - I need 3 servers to test topology changes
[06:34:37] +1
[07:18:08] 10DBA, 10Data-Services, 10MediaWiki-Change-tagging, 10Patch-For-Review: Recent duplicate entries on change_tag on sanitarium hosts - https://phabricator.wikimedia.org/T200061 (10Marostegui)
[07:19:56] 10DBA, 10Data-Services, 10MediaWiki-Change-tagging, 10Patch-For-Review: Recent duplicate entries on change_tag on sanitarium hosts - https://phabricator.wikimedia.org/T200061 (10Marostegui) @Ladsgroup the following sections have been checked and have reported no differences, so you can go ahead and run the...
[07:41:32] marostegui: Morning and Happy Monday, according to this can I run the script on s1, s5, s6, s7, and s8?
[07:43:57] oops, forgot to put the link: https://phabricator.wikimedia.org/T200061
[07:51:26] Amir1: yep, those were checked and reported no differences
[07:53:56] coolio
[09:04:48] The queries I'm running for the maintenance script on wikidatawiki are a little bit slow but basically there is no way around it :/ It's a one-time thing, I hope it finishes soon
[09:10:27] Ping me if the load on replicas of s8 is too much
[12:24:47] so I can confirm the replica movement worked nicely, just it tried to replay the same (single) query twice
[12:25:49] so I just have to add 1 to the transaction, because GTID logic is slightly different (transaction to run) from binlog position (gap between transactions)
[13:06:08] Nice!
[13:06:12] Very great news!
[13:06:57] 10DBA, 10Data-Services, 10MediaWiki-Change-tagging, 10Patch-For-Review: Recent duplicate entries on change_tag on sanitarium hosts - https://phabricator.wikimedia.org/T200061 (10Marostegui)
[13:07:10] 10DBA, 10Data-Services, 10MediaWiki-Change-tagging, 10Patch-For-Review: Recent duplicate entries on change_tag on sanitarium hosts - https://phabricator.wikimedia.org/T200061 (10Marostegui) @Ladsgroup s2 is also fixed, you can run it there too
[13:08:15] Once you have more time, can you elaborate a bit on what you meant by "adding 1"?
[13:13:50] 10DBA, 10Operations, 10ops-eqiad: db1069 bad disk - https://phabricator.wikimedia.org/T199056 (10Marostegui) Just talked to Chris - as this disk is on predictive failure but not failed yet, we are going to wait for the new disks to arrive in order to avoid trying again with used ones.
[13:13:53] so with binary log, coord X:Y means that it will search offset Y on file X
[13:14:18] it points to a gap between events, or a specific, infinitely small point inside the file
[13:14:29] the intention is to apply transactions after offset Y
[13:14:45] with GTIDs, the number applies to transactions
[13:15:24] which means that if you applied transaction X-Y-Z, the next thing you will probably execute is X-Y-(Z+1)
[13:15:33] I assume there can be gaps, etc.
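For context, here is a minimal sketch (not the actual wmfmariadbpy code) of the distinction being described, assuming MariaDB's domain-server-sequence GTID format; the function names are hypothetical:

```python
# Binlog coordinates (file, offset) already point at the gap *after* the
# last applied event, so they can be used as-is. A GTID names the last
# *applied* transaction, so the next transaction to look for is seq + 1 --
# hence "adding 1 to the transaction".

def binlog_start_position(file: str, offset: int) -> tuple[str, int]:
    """Binlog coordinates: replication simply resumes after this offset."""
    return file, offset

def gtid_next_transaction(gtid: str) -> str:
    """For a MariaDB GTID 'domain-server-seq', the next thing to execute is
    (probably) seq + 1 -- although, as noted above, there may be gaps or
    transactions from another domain/server in between."""
    domain, server, seq = gtid.split("-")
    return f"{domain}-{server}-{int(seq) + 1}"

print(gtid_next_transaction("0-171970580-123"))  # -> 0-171970580-124
```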
[13:15:40] Ah right, I get it now
[13:15:44] so it may not be Z+1, but another one
[13:15:46] yeah
[13:16:08] also I guess there could be A-B-C in between
[13:17:25] marostegui: Thanks!
[13:18:21] 10DBA, 10Patch-For-Review: Test database master switchover script on codfw - https://phabricator.wikimedia.org/T199224 (10jcrespo) What the first tests on production-like hosts look like: ``` root@neodymium:~/wmfmariadbpy/wmfmariadbpy$ time ./move_replica.py db1118 db1095 Purging binary logs to speed up coord...
[13:18:31] ^I am still breaking things
[13:26:32] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review, 10Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (10Marostegui)
[13:28:50] 10DBA, 10Patch-For-Review, 10Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (10Marostegui)
[13:29:30] 10Blocked-on-schema-change, 10DBA, 10Wikidata, 10Patch-For-Review, 10Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (10Marostegui)
[13:30:23] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review, 10Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (10Marostegui)
[15:07:38] so the whole script works, including the replica switch - with a very big BUT
[15:08:22] we assume that after writing transaction X, we are searching for X+1, and X+1 may not exist
[15:08:56] and if it doesn't exist, what does it do?
[15:09:05] it stalls searching for it
[15:09:11] reading the whole binlogs
[15:09:26] the method is not good
[15:09:30] we need to search for the gap
[15:09:39] but not for another transaction
[15:09:56] in theory I wait 1 second to at least have pt-heartbeat
[15:10:07] but once I forgot to migrate it, and it stalled
[15:10:42] also, by chance we could find a non-global transaction
[15:10:49] like a local-only transaction
[15:11:05] which we should normally not execute, but we do from time to time
[15:11:14] and of course that will not be found
[15:11:15] yeah
[15:11:27] so this is not so much a limitation of the script as of the model
[15:11:34] How confident are you for es?
[15:11:39] Like, to use the move as well
[15:11:41] I think this is ok for a first approach
[15:11:53] for the failover process 99.99%
[15:12:02] it is only the slave change that is tricky
[15:12:24] I think it is the same issue with repl.pl + gtid
[15:12:41] yeah
[15:12:46] and we ended up disabling it XD
[15:12:50] I would do that manually, and then refactor the move() later
[15:13:14] to be fair, pseudo-GTID has similar shortcomings
[15:13:16] sounds good
[15:13:28] if there is no replication, it will not work either
[15:13:39] or at least not easily
[15:14:06] so the actual failover is easy
[15:14:23] it is only moving the replicas to another master that is tricky
[15:14:36] yeah, without stopping things
[15:14:51] however, knowing that pt-heartbeat has to be running
[15:14:57] on the master
[15:15:12] I have done 20 failovers with movement already
[15:15:17] I thought we were going to kill it
[15:15:21] and it works without problem
[15:15:34] yeah, the issue is it has to be running for the slave movement
[15:15:42] but not running for the actual failover
[15:16:11] so there are transactions with which to understand what is going on
[15:16:45] the issue is not really so much technical as one of model
[15:16:56] so we may need to do changes to how we replicate
[15:19:15] Why does it need to run for the slave movement?
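As an aside, a hypothetical sketch of why the search for X+1 can stall, as described above; find_gtid_in_binlog() is an assumed helper that scans the new master's binary logs for a given GTID, not part of wmfmariadbpy:

```python
import time

def wait_for_next_transaction(find_gtid_in_binlog, next_gtid: str,
                              timeout: float = 5.0, poll: float = 1.0):
    """Poll the binlogs until next_gtid shows up, or give up after timeout.

    If nothing writes to the master (e.g. pt-heartbeat is not running there),
    next_gtid may never appear and, without a timeout, the scan would keep
    reading the whole binlogs forever -- the stall described above. Searching
    for the *gap after* the last applied transaction, instead of for a
    concrete next transaction, would avoid depending on new writes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        position = find_gtid_in_binlog(next_gtid)
        if position is not None:
            return position
        time.sleep(poll)
    raise TimeoutError(f"{next_gtid} never appeared; is pt-heartbeat running?")
```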
[15:19:20] pt-heartbeat I mean
[15:19:38] Ah, the slave movement
[15:19:39] we need some writes to happen so we have that X+1 transaction
[15:20:03] also I realized I was executing BINLOG PURGE
[15:20:08] and that created a local transaction
[15:20:14] which made it confused
[15:20:26] that creates a transaction? interesting
[15:20:34] so we have to run it with sql_log_bin=0
[15:20:36] and flush logs too?
[15:20:48] yes, sorry, I meant flush
[15:20:54] ah right
[15:22:13] 10DBA, 10Patch-For-Review: Test database master switchover script on codfw - https://phabricator.wikimedia.org/T199224 (10jcrespo) ``` root@neodymium:~/wmfmariadbpy/wmfmariadbpy$ for host in db1095 db1102 db1118; do mysql.py -h $host -e "set sql_log_bin=0; FLUSH BINARY LOGS;"; done; time ./switchover.py db109...
[15:22:32] for the most part, it works, after those disclaimers, but it is not ideal https://phabricator.wikimedia.org/T199224#4445146
[15:22:57] I meant https://phabricator.wikimedia.org/T199224#4445514
[15:23:09] What is: Testing if to migrate db1118.eqiad.wmnet:3306/(none)...
[15:23:10] ?
[15:23:21] probably garbage debugging
[15:23:29] it reads all replicas
[15:23:35] Ah ok :)
[15:23:38] (it says Nope below)
[15:23:42] impressive 1 second :)
[15:23:58] actually, 1 second of waiting for pt-heartbeat to run
[15:24:15] 1 second per replica to move
[15:24:19] of wasted time
[15:24:35] which is not that good - we could just stop in sync
[15:24:55] the good thing is that this move() method works on unrelated replicas
[15:25:16] so you could put sanitarium replicating from a separate datacenter
[15:25:37] (not labsdb because right now it doesn't support multisource or non-binlog replicas)
[15:25:45] but that should not be hard to add
[15:25:59] yeah, which is something we discussed already for T196367
[15:26:00] T196367: Implement a script to facilitate sanitarium failovers between DCs - https://phabricator.wikimedia.org/T196367
[15:26:19] the multisource is adding a default_master on start
[15:26:23] so trivial
[15:26:39] but the sanitariums aren't multisource anymore
[15:26:39] and the other is doing the scan based on the relay log or master binary log
[15:26:46] yes, those would work
[15:26:49] right now
[15:26:53] ah you meant for labsdb right
[15:26:56] (we can try :-P)
[15:27:00] haha
[15:27:29] so I am quite confident of the current method, it just has too many shortcomings
[15:27:40] and edge cases it doesn't handle
[15:27:59] Yeah
[15:28:04] and again, separate move() from the switchover script
[15:28:10] Do you want to go ahead and try the move on Wed?
[15:28:12] the switchover is rock solid
[15:28:21] it is the move that is not ideal
[15:28:37] which is only 1 step of the switchover
[15:28:57] we are going to use the switchover one
[15:29:14] Yes
[15:29:15] we can separate the move aside and do it beforehand
[15:29:20] Sounds good
[15:29:34] in fact, I think that is the right way
[15:30:03] it is as easy as commenting out one line of code
[15:30:12] (it can be an option, I guess)
[15:30:24] Yeah, an option would be nice
[15:30:37] I am also ok with 2 scripts
[15:30:38] To either use it separately or bypass it on a failover
[15:30:43] Up to you
[15:30:58] this was a sprint to have something done
[15:31:05] I know
[15:31:13] I did what I wanted to do (switchover.py)
[15:31:18] which is in the critical read-only time
[15:31:32] the rest is bonus, and not as easy as it looks
[15:31:44] hehe yeah
[15:32:13] maybe we should stick to repl.pl (or that method, reimplemented)
[15:32:22] For now?
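A minimal sketch of the workaround discussed above (and shown in the quoted one-liner): run FLUSH BINARY LOGS with sql_log_bin=0 so the statement does not create a local GTID/transaction that could confuse the replica-move logic. Connection parameters are placeholders and the real tooling uses mysql.py / wmfmariadbpy wrappers rather than raw pymysql:

```python
import pymysql

def flush_binary_logs_unlogged(host: str, user: str, password: str) -> None:
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            # Disable binary logging for this session only, then rotate logs,
            # so no extra transaction appears in the binlog.
            cur.execute("SET SESSION sql_log_bin = 0")
            cur.execute("FLUSH BINARY LOGS")
    finally:
        conn.close()

# Example (placeholder credentials):
# for host in ("db1095", "db1102", "db1118"):
#     flush_binary_logs_unlogged(host, user="root", password="***")
```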
[15:32:24] and focus on the use cases that it doesn't cover: non-direct replicas
[15:32:30] and master failure
[15:32:31] Yeah, that is also an approach
[15:32:44] repl.pl is relatively easy
[15:32:53] because it doesn't try to be fancy
[15:33:22] yeah, it stops slaves for a bit and does it safely
[18:30:48] 10DBA, 10JADE, 10Operations, 10TechCom-RFC, and 2 others: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (10awight) Here are the notes from our meeting, plus some more discussion afterwards: https://etherpad.wikimedia.org/p/JADE_scalabili...
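For reference, a hypothetical sketch (not repl.pl itself) of the "stop slaves for a bit and do it safely" approach mentioned above: to move replica B under its sibling A, stop both at the same master position, then point B at A. It assumes B is the replica that is ahead, that A has log_slave_updates enabled, and placeholder pymysql connections instead of the production wrappers:

```python
import pymysql

def query(conn, sql):
    # Run a statement and return any rows as dicts (empty for non-SELECTs).
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute(sql)
        return cur.fetchall()

def move_replica_under_sibling(a_host, b_host, user, password):
    a = pymysql.connect(host=a_host, user=user, password=password)
    b = pymysql.connect(host=b_host, user=user, password=password)

    # 1. Stop replication on both siblings.
    query(a, "STOP SLAVE")
    query(b, "STOP SLAVE")

    # 2. Let A catch up to B's executed master position (assumes B is ahead;
    #    a real tool would compare both positions and pick the direction).
    b_status = query(b, "SHOW SLAVE STATUS")[0]
    b_file = b_status["Relay_Master_Log_File"]
    b_pos = int(b_status["Exec_Master_Log_Pos"])
    query(a, "START SLAVE UNTIL MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d" % (b_file, b_pos))
    query(a, "SELECT MASTER_POS_WAIT('%s', %d)" % (b_file, b_pos))

    # 3. Both have now applied the same master transactions; read A's own
    #    binlog coordinates and point B there.
    a_master = query(a, "SHOW MASTER STATUS")[0]
    query(b, "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d"
             % (a_host, a_master["File"], int(a_master["Position"])))

    # 4. Resume replication: A from the original master, B now from A.
    query(a, "START SLAVE")
    query(b, "START SLAVE")
```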