[04:23:50] marostegui: jynus I can verify that I can access labsdb1004 from tools, so no need to massage VLANs or firewalls [04:24:00] I do find that it has less databases than 1005 tho. not sure if that's expected [04:46:53] jynus: marostegui https://gerrit.wikimedia.org/r/#/c/337775/ will switch the aliases we ask people to use to labsdb1004 from 1005 [07:00:11] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3028418 (10Marostegui) >>! In T153743#3026114, @jcrespo wrote: > I've added a workaround that makes no sense but that works for now ,we need to revisit it... [07:15:59] yuvipanda: https://phabricator.wikimedia.org/P4935 [07:16:03] I guess it is not too worrying [07:19:13] 10DBA, 06Labs, 10Labs-Infrastructure, 06Operations, 07User-notice: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3002516 (10Marostegui) ``` 04:23 < yuvipanda> marostegui: jynus I can verify that I can access labsdb1004 from tools, so no need to massage VLANs or fi... [08:30:49] 07Blocked-on-schema-change, 06Collaboration-Team-Triage, 10Notifications, 13Patch-For-Review, 07Schema-change: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428#3028544 (10Marostegui) x1 is done too, so I believe this ticket can be closed. I am not going to paste all... [08:31:00] 07Blocked-on-schema-change, 06Collaboration-Team-Triage, 10Notifications, 13Patch-For-Review, 07Schema-change: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428#3028545 (10Marostegui) 05Open>03Resolved [08:34:34] 10DBA, 06Operations: Adapt wmf-mariadb10 package for jessie or puppetize differently its service to adapt it to systemd - https://phabricator.wikimedia.org/T116903#3028551 (10MoritzMuehlenhoff) My two cents: From a high level view I personally prefer the systemd unit to be in the Debian package since it's part... [09:38:26] 10DBA, 06Labs, 10Labs-Infrastructure, 06Operations, 07User-notice: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3028611 (10Marostegui) After a chat with Jaime we have moved those old databases in labsdb1005 to: `labsdb1005:/srv/tmp/old_dbs` . They didn't have an... [10:00:44] 10DBA, 10Analytics, 06Labs: Discuss labsdb visibility of rev_text_id and ar_comment - https://phabricator.wikimedia.org/T158166#3028656 (10JAllemandou) [10:10:03] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028719 (10Marostegui) I assume the following is going to happen in all the hosts as we have MIXED everywhere (but some specific cases like the sanitarium2 masters): ``` root@neodymium:~#... [10:26:33] Hello! qq - I am reviewing the analytics ACLs on cr1/cr2, and there is a rule called prelabsdb-mysql listing some ips. One is not used anymore, one is now kubernetes1003, and the last one is db1057 [10:26:40] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028736 (10jcrespo) Use "--no-check-binlog-format"- pt-t-c forces binlog format already if using super for itself, it should only cause issues for multi-level slaves, but you can check lat... [10:26:51] I think it is all old garbage but I wanted to double check with you [10:26:59] (my team does not remember) [10:28:17] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028741 (10jcrespo) Also, let's centralize the dsn tables on tendril or any other central place- so we do not have garbage tables in the future scattered all around. [10:29:28] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028742 (10jcrespo) Are you sure also you are using an updated version of pt-table-checksum, one without the binary bug? [10:46:14] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028759 (10Marostegui) >>! In T154485#3028736, @jcrespo wrote: > Use "--no-check-binlog-format"- pt-t-c forces binlog format already if using super for itself, it should only cause issues... [11:03:25] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028802 (10jcrespo) > Good point: pt-table-checksum 2.2.20 I do not know when that was fixed- `grep -A 15 ' CREATE TABLE checksums' $(which pt-table-checksum)` should force the table or t... [11:06:41] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028813 (10jcrespo) >>! In T154485#3028759, @Marostegui wrote: > That makes sense, however we'd need to truncate the table after using it as it will be used to check specific slaves from d... [11:09:35] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028822 (10Marostegui) >>! In T154485#3028802, @jcrespo wrote: >> Good point: pt-table-checksum 2.2.20 > > I do not know when that was fixed- `grep -A 15 ' CREATE TABLE checksums' $(wh... [11:43:15] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3028898 (10Marostegui) For the record I have created the dsns tables on tendril (and the first test with pt-table-checksum on m3 is using it). The only one that has data so far is dsns_m3... [11:48:47] 10DBA, 13Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#3028935 (10Marostegui) Not sure if it is worth altering the master (db1049) anymore as it is going to be decommissioned soon: T134476 Probably not worth the risk and the time. [14:33:00] 10DBA, 06Operations: db1082 MySQL crashed - https://phabricator.wikimedia.org/T158188#3029269 (10Marostegui) [14:49:22] 10DBA, 06Operations: db1082 MySQL crashed - https://phabricator.wikimedia.org/T158188#3029289 (10Marostegui) Server rebooted fine it showed this on dmesg which I am not completely aware of what it means : ``` [ 32.823256] hpsa 0000:08:00.0: Acknowledging event: 0xc0000000 (HP SSD Smart Path configuration ch... [15:00:40] 10DBA, 06Operations, 10ops-eqiad: Replaced BBU for db1060 - https://phabricator.wikimedia.org/T158194#3029382 (10Marostegui) p:05Triage>03High [16:05:49] 10DBA, 10MediaWiki-Database, 10MediaWiki-Logging, 07Performance, 07Schema-change: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#723244 (10Huji) [16:06:28] 10DBA, 10MediaWiki-Database, 10MediaWiki-Logging, 06Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#723244 (10Huji) [16:24:53] 10DBA, 06Labs, 10Labs-Infrastructure, 06Operations, 07User-notice: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3029816 (10Marostegui) For the backup data: es1017 looks like a good candidate: ``` marostegui@es1017:~$ df -hT /srv Filesystem Type Size... [17:02:01] yuvipanda jynus time has come I believe :-) [17:03:33] I'm here [17:04:16] I guess we need for this: https://gerrit.wikimedia.org/r/#/c/337775/ to be deployed :) [17:04:28] before we can stop 1005 and copy it over [17:04:50] not yet, first we announce it [17:05:02] then we put the master in read only [17:05:07] ah :) [17:05:10] then we repoint [17:05:21] then we shutdown it [17:05:36] we also need a puppet change while the server is down [17:06:29] I think db2062 didn't boot to a proper state [17:06:39] a proper state? [17:06:46] hello [17:06:48] I'm here [17:07:14] marostegui, https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=db2062 [17:07:35] yuvipanda, let's announce the start of the work [17:07:40] on IRC [17:07:50] not sure if you usually do it twice on mail [17:08:01] jynus: oh wow…I will take care of it later then thanks for the heads up [17:08:25] jynus: yeah, I do. let me do that [17:09:08] done [17:09:30] jynus: am ready to merge and test the DNS failover whenever you want :) [17:10:01] yuvipanda, wait [17:10:14] we need to disable 3 users [17:10:24] ok! [17:10:37] (the 3 that cannot replicate to 4 [17:11:06] I've changed the topic on labs [17:11:17] thanks! [17:11:28] so [17:11:38] I got the screen with the nc commands on es1017 and the iptables rule for the transfer ready [17:11:43] good [17:12:20] I will change the permissions of the conflictive accounts to root now [17:13:00] That is s51412\_\_data.%,s51071\_\_templatetiger\_p.%,s52721\_\_pagecount\_stats\_p.% [17:13:12] on labsdb1004 [17:13:20] ping me if I am going to do something stupid [17:13:28] haha [17:13:34] so it makes all sense :) [17:16:02] [s51412__data]> create table test (i int); [17:16:08] ERROR 1005 (HY000): Can't create table `s51412__data`.`test` (errno: 13 "Permission denied") [17:16:32] \o/ [17:16:56] I checked that I can create and drop tables, still [17:17:17] so, next stop, setting master as read only [17:17:20] nice [17:17:23] and change replication direction [17:17:25] * marostegui taking notes to ask jynus tomorrow a few questions in our meeting [17:17:57] and note the binlog position of both servers [17:18:06] this should be done as fast as possible [17:18:14] yuvipanda, prepare the patch for merge [17:18:28] yes sir! [17:18:30] so this is as little disrruptive as possible [17:18:34] but wait for our ok [17:18:42] we have to confirm replication works the other way [17:18:51] while in read only mode [17:19:23] jynus: yup! [17:19:34] patch ready whenever you are [17:20:20] marostegui, I do the changes, I assume? [17:20:31] yep, I am taking notes on an etherpad [17:20:37] setting labsdb1005 in read only [17:20:39] but i can doble check the binlog position too [17:21:24] log.124340 | 21527357 for the current master [17:21:28] 1005: log.124340 21527357 [17:21:29] it is in read only mode [17:21:32] yep [17:21:37] copy that to the etherpad [17:21:51] done [17:21:53] and 1004 too [17:22:06] slave is up to that with that [17:22:28] local master pos on 4 is : log.059837 | 11232939 [17:22:29] https://etherpad.wikimedia.org/p/labs-migration [17:22:49] good, we agree [17:22:58] no, reseting the replication [17:23:01] on 4 [17:23:32] and running change master on 5 [17:25:21] check the etherpad [17:25:24] for coords [17:25:31] marostegui? [17:25:36] yes [17:25:40] did you see my comment? [17:25:43] maybe i got disconnected [17:25:49] I did [17:25:53] makes sense? [17:25:58] it was the same number [17:26:03] then it looks good [17:26:09] running on 5 [17:27:39] I see it now [17:27:45] replication looks good [17:27:54] yep :) [17:27:55] we will see if it breaks :-) [17:27:58] XD [17:28:01] no ssl yet [17:28:05] we can merge, yuvipanda [17:28:15] oh wait [17:28:17] ok [17:28:26] it is still read_only=ON [17:28:27] we can put 4 in read/write [17:28:34] yep [17:28:35] merge now, doing it now [17:29:12] doen now [17:29:19] let's repoint to labsdb1004 [17:29:43] see how replication and tools react, etc. [17:29:55] I've merged, it takes a little time for it to propogate anyway. let me force a puppet run [17:30:01] I know [17:30:10] we can put a proxy here in the future [17:30:19] if we get the money :-) [17:30:37] :D [17:31:38] some users seem to be using persistent connections [17:31:54] I can "help" changing the server once the change has been propagated [17:32:05] hahaha [17:32:07] "help" [17:32:07] XD [17:32:23] let's get that copy prepared meanwhile [17:32:30] and the puppet role change [17:33:19] once we stop the server I am ready to hit the enter and start the copy [17:33:26] good [17:33:38] then review my puppet change when it is ready [17:33:57] I will copy /srv/postgres and /srv/labsdb into two different tar.gz [17:34:12] it is already right? [17:34:14] the postgres one is just 67M :-) [17:34:28] No, I haven't started the copy [17:34:31] I do not think there is a real postgress there [17:34:48] I can do it now, there is no process no [17:34:49] have you checkjed the contents [17:35:03] indeed, there is nothing there XD [17:35:12] postres is on 1006 and 1007 [17:35:23] it has 4T for it there, I am glad we are going to reimage :) [17:35:36] ok, once we are ready we can stop mysql and I will start the copy of /srv/labsdb [17:35:44] and 1004 [17:35:56] I thought the role was wrong [17:36:00] but it is right [17:36:07] it may need a check [17:36:58] this is simpler than I thought [17:37:01] mariadb10 => false [17:37:13] to nothing (we have 10 as default) [17:37:55] primary DNS complete, awaiting secondary DNS puppet run to finish [17:38:01] cool [17:39:34] just curious - are we also upgrading to mariadb 10/ [17:39:39] yes [17:39:54] people asked/complained about it [17:40:06] about not being 10 [17:40:10] not about the upgrade [17:40:32] there are some tools blocked by it (e.g. wanting innodb fultext search) [17:41:08] nice [17:41:10] \o/ [17:41:21] jynus: marostegui ok, DNS done [17:41:29] create database test; and it apears on the slave! [17:41:49] I drop it and it drops! magic! [17:42:03] ok, I will kill all connections on the previous master [17:42:19] we've got a bunch yes [17:42:20] and wait for people to complain because they have not programmed [17:42:33] their services to reconnect [17:43:37] heh [17:46:20] 1 stuborn user reconnected [17:46:25] i see how the connections reduced yep [17:46:27] from 90 to 12 XD [17:46:32] several, actually [17:47:17] I see people using the swithover, though [17:47:53] and i see the same user again there [17:48:14] replication is up [17:48:37] yuvipanda, should we wait for checking important tools, or should we stop mysql already? [17:48:57] for me I would put it down already- less downtime [17:49:38] replication broke, but I think what it is [17:49:42] jynus: I think as long as we're sure we won't completely lose data, I say we stop it [17:49:52] we can fix it later [17:50:06] shutting down 1005 [17:50:09] yuvipanda: we won't lose data [17:50:21] I just checked PAWS, and it's just reconnected and come bac up [17:51:28] 5 is down [17:51:33] jynus: i see the process is now down, you want me to start the copy? [17:51:38] yes [17:51:43] ok [17:52:27] started [17:52:36] copying /srv/labsdb to es1017:/srv/tmp [17:53:26] I do not like using production hosts [17:53:38] but there is not much now in the dc [17:53:58] yeah, me neither, but until we get the new dbstores...:( [17:54:01] let me disable puppet [17:54:05] and merge the change [17:54:08] cool [17:56:18] yuvipanda, we do not really need you around anymore until 1-2 hours [17:56:23] marostegui, agree? [17:56:30] yep [17:56:35] I think it will take around 1h to finish the copy [17:56:42] ok then! I'll go shower and stuff :) [17:56:59] I will also check I can decompress the tar.gz (not the whole of it, but just a few files) [17:57:01] I'll check back in at most 1h but possibly earlier [17:57:07] feel free to call me if needed [18:51:38] 1h into the transfer and we have copied half of the dataset [18:51:48] :D [19:47:20] almost done [19:47:34] w00t [19:48:32] yeah, 60G to go :) [19:54:13] let verify the tar when we are done [19:54:18] unrelated, I'd like to do https://phabricator.wikimedia.org/T146718#3028336 later this week. [19:54:30] yeah, I will check I can decompress it [19:54:36] for a few minutes and then crtl+c [19:54:53] +1 [19:55:01] \o/ [19:55:59] yuvipanda, https://phabricator.wikimedia.org/T157359 has higher priority [19:56:06] it is part of our chosen goal [19:56:32] jynus: yeah, I agree. [19:56:39] they copy is done [19:56:42] let me verify the tar [19:56:49] yes, take your time [19:57:02] last thing we want is to lose it [19:57:08] * yuvipanda nods [19:57:37] extracting [19:59:07] let it reach something other than the binlogs [19:59:19] yeah [19:59:21] no worries [19:59:26] it is still extracting binlogs :) [19:59:34] i am going to let it run for a while [19:59:47] and meanwhile will merge this: https://gerrit.wikimedia.org/r/#/c/337911/ [19:59:53] if it gets verified sometime soon [19:59:56] :( [20:09:05] 10DBA, 06Operations, 13Patch-For-Review: mysql user and group should be a system user/group - https://phabricator.wikimedia.org/T100501#3030736 (10jcrespo) The user part should be fixed, or fixed when all trusties are decommissioned. The group part will take effect starting on stretch. This is mostly done... [20:10:53] oh, you merged it [20:11:06] I was waiting for the verify looking at: https://integration.wikimedia.org/zuul/ [20:12:04] it has extracted a few databases already, I think it is fine [20:12:19] ok for me [20:12:33] wait for dhcp change to apply [20:12:34] root@es1017:/srv/tmp/labsdb/data# du -sh . [20:12:34] 27G . [20:12:57] (not counting the binlogs) [20:13:59] dhcp updated [20:14:02] ok [20:14:06] let's reimage [20:14:15] * marostegui crosses his fingers [20:14:20] labsdb1005 [20:14:29] you do it or I do it? [20:14:34] I can du it [20:14:36] I shall cross fingers too [20:14:36] oki [20:14:46] oh, the reimage is the easy party [20:14:53] search me the ticket number, please [20:14:56] sure [20:15:25] https://phabricator.wikimedia.org/T157358 [20:15:33] T157358 [20:15:34] T157358: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358 [20:16:14] wmf-auto-reimage -p T157358 labsdb1005.eqiad.wmnet [20:16:18] ^ok ? [20:16:31] I had the fear of doing db1005 by accident [20:16:40] looks good to me [20:16:45] but there is no db1005, so it was not a huge issue [20:17:00] and we'd need to decomission it anyways if it existed :p [20:17:23] 10DBA, 06Labs, 10Labs-Infrastructure, 06Operations, and 2 others: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3002516 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['labsdb1005.eqiad.wmnet'] ``` The lo... [20:17:29] it is running now^ [20:17:30] \o/ [20:21:23] I can see it installing [20:21:27] nice! [20:21:41] I think it is jessi [20:21:57] hopefully! [20:32:41] finished, running puppet [20:33:11] yeah I am watching it live too [20:33:12] like a film [20:37:54] do you want to copy it back? [20:37:59] yep [20:38:03] is it back already? [20:38:24] wait [20:38:28] it may restart it once [20:38:30] 2017-02-15 20:38:15 [INFO] (jynus) wmf_auto_reimage::submit_job: Submitted job '20170215203815311253' on target '['labsdb1005.eqiad.wmnet']' with action 'system.reboot' and params '[]' [20:38:34] yep :) [20:38:36] it just did [20:38:51] that is useless for us [20:39:06] it would make sense if the data and mysql was already there [20:39:35] jynus: https://phabricator.wikimedia.org/T136192#3030801 is one of the users whose db isn't on 1004 (is large). I'm going to repsond to them saying it'll be back once maint completes. is that accurate? [20:40:34] yes, tell them it is under maintenance, and that it will be abvailable in some hour's time [20:41:09] ok [20:41:45] server is back [20:41:53] let's copy suff back [20:41:56] ok [20:42:09] 10DBA, 06Labs, 10Labs-Infrastructure, 06Operations, and 2 others: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3030805 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labsdb1005.eqiad.wmnet'] ``` and were **ALL** successful. [20:44:02] started [20:44:31] ETA? [20:44:39] 1:15 [20:44:56] good, will take a break, let's come back in 1 hour [20:45:06] for the "fun" part [20:45:06] yeah, going for a break as well [20:45:09] need some fresh air XD [20:45:32] (user ack'd my response) [20:46:05] good [20:46:15] again, nothing to see here until 1 hour [20:46:37] after that the actual upgrade, which is where thing can go wrong [20:46:52] 5.5->10 upgrade [21:17:30] 10DBA: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3030925 (10Marostegui) The first test with phabricator_file database and generated a peak of 500 seconds lag on db1048 and db2012 while checksuming the biggest table of the database: file_... [21:40:24] almost there, 12 minutes [21:42:01] w00t [21:44:07] marostegui, give a look at https://gerrit.wikimedia.org/r/337990 [21:44:58] checking [21:45:19] any particular reason? [21:46:10] well, it has a 5 MB one now [21:46:27] oh, i thought it had 128M XD [21:46:30] oh, no [21:46:36] that is some random file [21:46:40] on /src [21:47:20] 128M is ok, I suppose [21:47:23] 10DBA, 06Operations, 10ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3031005 (10Papaul) [21:47:29] I just didn't want only 5 [21:47:55] jynus: I am fine with 500M I was just wondering why the increase from 128 to 500M if it was for something specifically [21:47:55] we could do a general check of the options, that may have not been checked for a while [21:48:10] specially anything that requires a reboot [21:48:26] what did you do for all the migrations 5.5->10 that you did in the past? [21:48:34] what do you mean [21:48:35] start with —skip-networking and then mysql_upgrade and hope for th best? [21:48:39] I mean in this case [21:48:53] because tools was restricted to 5.5 options for a long time [21:49:21] start with skip networking and --skip-slave-start [21:49:48] then we'll see [21:49:58] * marostegui crosses his fingers again [21:50:07] 3 minutes left for the transfer [21:50:34] 10DBA, 06Operations, 10ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3031031 (10Papaul) [21:52:01] 10DBA, 06Operations, 10ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3031005 (10Papaul) [21:55:10] well, the transfer is done guys [21:57:50] so, can I start the db? [21:57:58] or will you? [21:58:09] I will do it [21:58:21] I am tailing the error log [21:58:41] it is up [21:58:44] yep [21:58:49] complaining about p_s [21:58:55] but that is to be expected [21:59:05] let's run upgrade [21:59:37] ok [22:00:11] running [22:02:09] it may take some time [22:02:16] but notice anything strange [22:02:20] *note [22:02:29] yeah, it is in a screen so we can easily check it [22:02:31] so far so good [22:03:41] unix socket authentication worked nicely [22:04:25] finished [22:04:29] let me scan the log [22:04:34] to see if there is anything strange there [22:04:58] log = output [22:06:06] it all went fine [22:06:26] that is good, no corruption, no anything [22:06:47] let's stop and start with only —skip-slave? [22:06:50] let's restart for the changes to take place, but still skip networking and slave start [22:06:57] oki :) [22:07:02] I do not want anything yet connecting [22:07:18] I do not care for production, but this is so public... [22:07:22] restarting [22:07:34] no errors [22:08:07] so I would start the slave [22:08:20] and I think the filters will cure the replication error we had later [22:08:47] we still have —skip-networking [22:09:02] oh, will that prevent the slave connection? [22:09:24] not sure, we can try to start the slave anyways :) [22:09:32] I thought that only affected the open port [22:09:38] try, please [22:10:00] started [22:10:03] and looking good [22:10:09] that is exactly what I wanted [22:10:13] catch up with no users [22:10:22] dup entry [22:10:30] :-( [22:10:31] is that the one you were expecting? [22:10:35] nope [22:11:02] but this happened to me before and I have not solved the problem [22:11:13] some kind of config incompatibility [22:11:35] but I supposed it was becase different versions [22:13:10] 37746 rows, I would say backup + ignore + start slave [22:13:39] go for it then [22:14:00] let's hope there are no more :| [22:14:08] backed up [22:14:11] let me know if / when you want to switchover tools.labsdb again [22:14:21] yuvipanda, if everyhing is ok, soon [22:14:55] ready for set global sql_slave_skip_counter = 1 then? [22:15:04] no no [22:15:08] we will not skip [22:15:14] just ignore the whole thing [22:15:14] https://gerrit.wikimedia.org/r/#/c/338012/ is ready to go whenever [22:15:45] not sure what you meant with ignore, sorry :( [22:15:59] you will see now [22:16:02]