[00:13:49] 10DBA, 06Labs, 10Tool-Labs: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2257889 (10Krenair) That's now resolved, but I still have trouble logging in: ```krenair@tools-bastion-03:~$ mysql -h labsdb1001.eqiad.wmnet Welcome to the MariaDB monitor. Commands end with ; or \g.... [00:18:08] 10DBA, 06Labs, 10Tool-Labs: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2970947 (10Krenair) Woah, wait, what? I can get in as `s52299` but not as `u2170`? ```tools.alex@tools-bastion-03:~$ mysql -h labsdb-web.eqiad.wmnet Welcome to the MariaDB monitor. Commands end with ;... [00:21:07] 10DBA, 10Fundraising-Backlog, 10Wikimedia-Fundraising-CiviCRM, 07FR-2016-17-Q2-Campaign-Support, and 2 others: Spike: Look into transaction isolation level and other tricks for easing db contention - https://phabricator.wikimedia.org/T146821#2970952 (10DStrine) p:05High>03Normal [07:48:59] 10DBA, 06Operations, 10netops, 13Patch-For-Review: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2971613 (10Marostegui) This has happened already. Times in UTC: Preparation of all the code, topology changes etc: 06:30-07:30 read only on: 07:30:40 do all the necessary c... [07:57:51] jynus: db1057 is still to be fixed, right? [07:58:07] (i can do that) i have the coordinates of db1052 before the read_only=off [07:58:15] btw: https://etherpad.wikimedia.org/p/switchover [07:58:53] yes, please take care of that [07:59:03] I will take care of the codfw link [07:59:07] ok [07:59:16] also [07:59:22] if touching 57 [07:59:25] i will start db1057 replicating from the coordinates right before the read_only = off on db1052 [07:59:26] enable gtid there [07:59:29] ok [07:59:41] it should be ok [07:59:47] sometimes, however [08:00:07] events start to go around with circular replication [08:00:12] mmm I am wondering one thing [08:00:13] which are a pain [08:00:27] so db1057 still has slaves (multisource etc) [08:00:29] because of our multi-master setup [08:00:38] yes, those are outdated now [08:00:59] yes, but what's going to happen with heartbeat? [08:01:10] i mean [08:01:11] we should disable it [08:01:28] i can kill heartbeat on db1057 [08:01:29] and downtime lag checks [08:01:34] no need [08:01:37] puppet does it for us [08:01:44] with master = false [08:01:53] that is more or less "well done" [08:01:54] yep we need to push that [08:02:02] remember that right now [08:02:05] but my question is [08:02:11] lag is shown as 0 [08:02:14] but it is unreal [08:02:20] yes [08:02:22] on all slaves depending on it [08:02:32] so downtime lag checks everywhere [08:02:44] so db1057 is going to replicate from the position of db1052 right before read_only=OFF, so it will replicate heartbeat events too [08:03:15] yeah, we want that [08:03:24] different rows [08:03:25] is that going to mess somehow with the current heartbeat table of db1057 and its slaves? [08:03:27] no collision [08:03:29] no [08:03:38] ah ok :) [08:03:54] that is why we allow several servers to write at the same time [08:04:03] in fact, we should have it on all the time [08:04:16] but we do not want 20 updates sent everywhere [08:04:22] etc.
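The lag question above turns on how pt-heartbeat is keyed: each master writes its own row, so several writers never collide, and a check only looks at the row matching its own shard and datacenter. A minimal sketch of such a check, assuming the usual pt-heartbeat layout (a `heartbeat.heartbeat` table with `ts`, `shard` and `datacenter` columns, matching the row pasted a few messages later):

```sql
-- Sketch only: take the freshest heartbeat row for this shard/datacenter pair
-- and ignore rows written by any other master.
SELECT TIMESTAMPDIFF(SECOND, MAX(ts), UTC_TIMESTAMP()) AS lag_seconds
FROM heartbeat.heartbeat
WHERE shard = 's1'
  AND datacenter = 'eqiad';
```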
[08:04:33] on all servers, I mean [08:04:52] not sure I expressed that clearly [08:05:07] yes yes [08:05:08] you did [08:05:11] you answered my question [08:05:22] 2017-01-26T08:05:07.000600 | 171974683 | db1057-bin.002885 | 576019402 | NULL | NULL | s1 | eqiad | [08:06:20] and a different row each for 52 and 2016 [08:06:30] checks are based on shard and datacenter only [08:06:56] indeed i see [08:07:38] give it priority, labs users have already created 2 tickets complaining [08:08:00] yeah, i am going to fix it right now [08:08:08] at least https://tools.wmflabs.org/replag/ should show the lag [08:09:36] jynus: change master to master_host='db1052.eqiad.wmnet', master_log_pos=210334207, master_log_file='db1052-bin.004556'; [08:09:40] https://etherpad.wikimedia.org/p/switchover [08:09:44] can you confirm? [08:10:09] you may need to enable gtid, but that can be done later [08:10:25] yeah, i can do that later [08:10:41] the coords are right [08:10:50] it is true that the slaves may differ [08:11:02] but only on heartbeat, which we do not care about [08:11:07] and only temporarily [08:11:15] yeah, that was my only worry [08:11:27] beware of the alerts [08:11:38] going to downtime the lag for all the slaves of db1057 [08:11:39] because of lag, I created some false alerts [08:11:47] last time [08:11:53] let me downtime them first [08:11:53] not a big deal, just annoying [08:11:56] yeah [08:12:05] also db1057 itself [08:13:19] I am going to break db1057 -> db1052 replication [08:13:26] ok, db1057 itself is silenced [08:13:31] no going back from here [08:13:31] before the switchover [08:13:43] mmm wait [08:13:52] db1057 is not replicating anymore from db1052 eh [08:13:59] no [08:14:03] but the other way yes [08:14:08] ah db1052->db1057 [08:14:08] sure sure [08:14:11] no [08:14:28] db1052 is technically replicating from 57 right now [08:14:39] just no writes are being sent [08:14:45] well, the heartbeat [08:15:08] I am going to stop that and migrate to 2016 (as a master) [08:15:09] yes [08:15:40] tendril will disappear here [08:17:29] yeah, you need to reset master all [08:17:35] it will not work otherwise [08:17:37] haha yeah [08:17:55] make sure you use ssl when connecting [08:19:26] there we go [08:19:29] ssl and replicating fine now [08:20:12] I didn't say anything because I thought you weren't pasting the secrets publicly [08:20:14] sorry [08:20:37] changing things like that only works for gtid [08:22:18] no worries [08:22:27] going to switch it to slave_pos [08:24:18] https://tendril.wikimedia.org/tree is bad, fixing tendril now [08:24:43] oki [08:25:36] looking good now [08:25:42] db1057 caught up now [08:25:45] going to enable gtid [08:25:51] "good" [08:25:55] haha [08:26:10] https://dbtree.wikimedia.org/ [08:26:16] ^this looks better [08:26:17] done [08:26:52] why aren't dbstore servers shown there? [08:27:02] they just aren't [08:27:11] that is supposed to help mediawiki [08:27:15] ah [08:27:20] so anything that is not mediawiki is not shown [08:27:27] but no special reason [08:27:36] ah ok ok [08:27:37] it was like that when I arrived, and I kept it like that [08:28:00] so, actually, now [08:28:15] to change db1057's masters [08:28:28] sorry [08:28:29] slaves [08:28:36] easier way is to stop slave :-/ [08:28:43] because no gtid [08:29:10] 47, 69 and 1002 at least [08:29:19] 1001 is its own problem [08:31:46] yeah [08:32:06] we need to use the same position we used for db1057 to start replicating from [08:32:13] heartbeat will repeat the events [08:32:15] but that is all [08:32:22] what?
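For reference, the repoint confirmed above, spelled out as a sketch: the file/position coordinates are the ones pasted in the log, while the replication credentials and SSL options ("make sure you use ssl when connecting") are omitted. The later move to MariaDB GTID is the `slave_pos` switch discussed at 08:22:27.

```sql
-- On db1057 (sketch): point it at the new s1 master using the coordinates
-- captured right before read_only was switched off on db1052.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'db1052.eqiad.wmnet',
  MASTER_LOG_FILE = 'db1052-bin.004556',
  MASTER_LOG_POS = 210334207;
START SLAVE;

-- Later, once it has caught up, switch from file/position to MariaDB GTID:
STOP SLAVE;
CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
START SLAVE;
```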
[08:32:31] oh [08:32:32] slaves [08:32:34] nevermind [08:32:47] no, we stop 57, use the current 57's master coords [08:32:55] yeah yeah, sorry [08:33:01] (wait for slaves to catch up) [08:33:38] I hope by this time you hate it enough [08:33:48] i am hating it yes XD [08:33:55] i was going to say: so many snowflakes [08:33:58] to want to automate it as much as I do [08:34:54] most of the issues come from code not being able to be committed and queried properly in a distributed way [08:36:31] ok, updating s1-master.eqiad.wmnet CNAME [08:37:00] ok [08:37:04] going to start fixing db1047 [08:38:03] (if it catches up..) [08:38:42] it looks good [08:38:46] in a few minutes [08:38:56] let me help by sending more load there [08:39:51] db1069 is up to date, I can fix that meanwhile :) [08:42:12] looking good? https://gerrit.wikimedia.org/r/#/c/334248/1/templates/wmnet [08:42:16] checking [08:42:30] that should not be used anywhere [08:42:38] yep [08:42:40] but I like it for easy ssh, etc. [08:43:03] emergency, don't remember s6 master, ssh s5-master [08:43:29] https://phabricator.wikimedia.org/P4815 -> master_log_file=db1052-bin.004556, master_log_pos=326009889 to change db1069 too? [08:43:47] I do not understand the question? [08:44:01] i am going to point db1069 to those coordinates [08:44:09] let me see [08:44:14] the show slave status is from db1057 [08:44:20] checked that 69 is up to date? [08:44:22] yep [08:44:42] Seconds_Behind_Master: 0 [08:44:57] you know I like coords ;-) [08:45:06] not an issue, it is stopped here [08:45:28] yes, Relay_Master_Log_File and exec are correct [08:45:49] oooook [08:49:44] important stuff [08:50:15] make sure semisync is working on 52 [08:50:49] it could be that the plugin is unloaded [08:51:01] in fact, we should check everywhere [08:51:07] but especially on s1 [08:52:48] rpl_semi_sync_slave_enabled | ON - | rpl_semi_sync_master_enabled | OFF | [08:53:25] good catch [09:05:20] only dbstore1001 hanging now from db1057 [09:10:57] ok, that can be handled later [09:12:01] 10DBA, 06Operations, 10netops, 13Patch-For-Review: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2971729 (10Marostegui) recap of the cleanup work: dns changed for s1-master.eqiad.wmnet multisource slaves changed (only pending dbstore1001): db1047, db1069,dbstore1002 rep... [09:12:02] jynus ^ [09:12:33] only one change [09:12:48] we want to change dbstore1001, not wait for it to catch up [09:12:54] it is one day delayed [09:13:06] and it has weird replication control [09:13:54] that means archeology, or how did you do it last time? [09:14:58] you were around, remember [09:15:29] pure archeology plus disabling replication and events 20 times [09:16:13] the only thing that worries me is having semisync in place before today's peak time [09:16:23] let me enable it now [09:16:44] it may need a stop/start on slaves, not that easy [09:16:57] not that difficult, I just mean not immediate [09:17:11] let's do it now and get it out of the way?
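The db1069 verification above (Seconds_Behind_Master at 0, Relay_Master_Log_File and the exec position matching) is the non-GTID way of proving a slave has applied everything its stopped intermediate master wrote, after which it can be repointed one level up. A condensed sketch of that step; these hosts are multisource, so in practice the statements would target the named replication connection, and the coordinates are the ones quoted from P4815.

```sql
-- On db1057 (its replication already stopped): note where it sits in db1052's
-- binlog (Relay_Master_Log_File / Exec_Master_Log_Pos).
SHOW SLAVE STATUS\G

-- On each remaining slave of db1057 (db1047, db1069, dbstore1002): wait until
-- Seconds_Behind_Master is 0 and its exec position matches db1057's frozen
-- binlog position, then repoint it at db1052.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'db1052.eqiad.wmnet',
  MASTER_LOG_FILE = 'db1052-bin.004556',
  MASTER_LOG_POS = 326009889;
START SLAVE;
```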
[09:17:15] sure [09:20:54] i am checking if the slaves have it enabled [09:21:41] the client may be [09:21:53] but we have to check on the stats/log if it is active [09:22:04] can't remember how, long time since last time [09:22:09] it may be documented [09:23:43] "09:33 jynus: renabling semisync replication throughout s4" [09:23:53] so probably that means stop-start [09:24:15] that is from 2016-07-15 [09:25:04] right [09:25:08] Rpl_semi_sync_master_clients | 9 [09:25:17] Rpl_semi_sync_master_status | OFF [09:25:55] ^that's 52 [09:26:44] rpl_semi_sync_master_enabled | OFF [09:27:47] https://dev.mysql.com/doc/refman/5.5/en/replication-semisync-installation.html [09:28:35] rpl_semi_sync_slave | ACTIVE | REPLICATION | semisync_slave.so | GPL | [09:28:39] | rpl_semi_sync_master | ACTIVE | REPLICATION | semisync_master.so | GPL | [09:28:42] ^db1052 [09:28:44] so the plugin is at least there [09:28:57] we have to enable it on the master, I think [09:29:05] and stop/start the slaves [09:29:13] the replication, I mean [09:29:29] yeah [09:29:33] 6 months and I do not remember [09:29:42] I remember being worried about performance, etc [09:29:44] i don't remember what i did 2 weeks ago XD [09:29:56] but it was pretty safe [09:30:36] I think I enabled it on the fastest slaves first just in case [09:30:49] yeah, let's try that [09:32:19] the master part is enabled on 57, and not on 52 [09:32:29] we have to switch that [09:32:55] the timeout was very low, however [09:33:08] yeah, timeouts are different now [09:33:12] in db1052 and db1057 [09:33:23] db1052:100, db1057:10000 [09:33:28] a low timeout may actually create worse lag [09:33:39] because it times out sooner [09:34:42] we can leave 10000 and do: rpl_semi_sync_master_enabled = ON [09:34:47] and stop/start one of the fast slaves [09:35:05] yes [09:35:24] looks like what I did, according to the docs [09:35:43] ok, let me enable it on the master [09:35:45] I really do not remember :-| [09:36:12] and let's reset db1080 for instance? [09:37:55] it is not on the master yet [09:38:07] no, i haven't pushed yet, doing it now [09:38:12] done [09:38:19] ok, ok, sorry [09:38:28] restarted db1080 [09:38:51] [Note] Slave I/O thread: Start semi-sync replication to master [09:38:54] ^db1080 [09:39:01] yep, SHOW GLOBAL STATUS like 'rpl%'; on the master shows stats [09:39:51] do you want help to roll it out to all mediawiki slaves? [09:40:01] yeah [09:40:07] s1 you mean? [09:40:11] yes [09:40:14] sure [09:40:16] I will take care of that [09:40:19] we can check others just in case [09:40:35] sure, let me enable it on s1 and then we can check other masters [09:40:37] but I think riccardo did a full check not a long time ago [09:40:48] so I do not think it will be a problem [09:40:58] after this, maybe we can take a break [09:41:49] I will check site.pp [09:41:54] ok [09:43:12] labs is replicating from 52 already, right? we can move 57 to mixed, at least in config? [09:43:39] labs? [09:43:48] you mean db1069?
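A condensed sketch of the semisync re-enablement worked through above, assuming the plugins are already loaded (as the db1052 plugin listing suggests); the 10000 ms timeout and which slave to bounce first are the operational choices discussed in the log, not fixed values.

```sql
-- On the new master, db1052 (sketch): confirm the plugin and enable the master side.
SHOW PLUGINS;                                      -- rpl_semi_sync_master should be ACTIVE
SET GLOBAL rpl_semi_sync_master_enabled = ON;
SET GLOBAL rpl_semi_sync_master_timeout = 10000;   -- in ms; the higher of the two values discussed

-- On a slave: the semisync handshake only happens when the I/O thread
-- (re)connects, hence the stop/start of replication.
SET GLOBAL rpl_semi_sync_slave_enabled = ON;       -- usually already on via config
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- Back on the master: verify it actually took effect.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%';   -- expect _status ON and _clients > 0
```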
[09:43:50] if so, yes [09:43:56] old labs, yes [09:44:10] old labs has to be statement [09:44:12] yes, only dbstore1001 still replicates from db1057 [09:44:20] new labs has to be ROW [09:44:22] the rest has been moved [09:47:04] https://gerrit.wikimedia.org/r/334256 [09:47:11] checking [09:47:40] looks good [09:47:40] don't know if you prefer to have it special (statement) for some time [09:47:52] it will be like that until restart anyway [09:47:55] not really, dbstore1001 doesn't care [09:49:53] the funny thing is that it is easier to do a full datacenter switchover than a master switchover [09:50:00] for dbs [09:50:37] haha yeah [09:50:39] a lot easier [09:50:46] ok, all the slaves restarted (including codfw master) [09:51:10] merging 334256 [09:52:36] maybe we have to update prometheus (it is not yet automatic) [09:53:46] Exec[pt-heartbeat-kill]/returns: executed successfully [09:53:56] -plugin_load = rpl_semi_sync_master=semisync_master.so [09:53:58] etc [09:54:03] :) [09:54:14] we have to disable it on db1057 [09:54:15] i can do that [09:54:21] thanks [09:54:44] done [09:56:55] we will find dozens of ones like: https://gerrit.wikimedia.org/r/334259 [10:00:05] so, break? [10:00:40] yeah [10:00:44] i need to get some breakfast XD [10:01:02] only pending dbstore1001 [10:01:09] ^ [10:01:12] 10DBA, 06Operations, 10netops, 13Patch-For-Review: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2971915 (10jcrespo) only pending: * change dbstore1001 to replicate from db1052 [10:01:18] ^ [10:01:19] haha :) [10:01:37] very smooth change - well done! :) [10:01:56] let's get some food [10:02:01] meet at noon or later? [10:02:10] we are supposed to have the meeting at 12, right? [10:02:15] but we can do it at 13 [10:02:21] unless m*rk was going to attend [10:02:31] we can ask [10:02:40] not sure if he is around, though [10:02:43] he mentioned he had a dr appointment [10:02:53] mark, you around? do you plan to be at the 12:00 meeting? [10:03:26] if he doesn't respond or doesn't attend, let's do it later [10:03:46] if not, 12 is ok (11 UTC) [10:04:30] sounds good :) [10:04:41] or whatever you prefer [10:04:48] I was just going for a proper break [10:04:58] yeah, I am going to get a shower, eat and all that [10:05:19] see you o/ [10:09:52] I see a couple of m3 slave lag alerts in icinga since 8h, good to ack or is it for realz? [10:10:17] those servers crashed [10:10:30] horrible issue [10:10:40] keep it there for now [10:10:51] no prod impact, but very bad news [10:11:42] sigh, ok thanks [10:13:09] I think this is the first job where I saw mysql repeatedly crashing heh [10:13:25] it wasn't repeatedly [10:13:35] but 3 slaves crashed at the same time [10:14:10] the other is dbstore2001 [10:44:25] we can do the meeting a bit later if you prefer [10:45:25] now manuel will be gone :-) [10:45:56] I would indeed prefer, at least :15 for a quick coffee [10:46:34] ok [10:48:24] sure, :15 is good for me [10:48:34] anytime [10:55:12] congrats on the successful switchover again [10:55:23] and thanks for all of the emergency work, really appreciated [10:55:44] just a quick status update since there are a lot of tasks: are there any critical/spof database systems left in c2?
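The puppet output quoted above (`Exec[pt-heartbeat-kill]` and the removal of `plugin_load = rpl_semi_sync_master=semisync_master.so`) only changes the demoted host's configuration; the running instance keeps the plugin until its next restart, so the immediate "disable it on db1057" step is presumably something like this sketch:

```sql
-- On db1057 (sketch): stop acting as a semisync master right away; the merged
-- puppet change only takes effect on the next mysqld restart.
SET GLOBAL rpl_semi_sync_master_enabled = OFF;
-- Optionally drop the plugin from the running instance as well:
-- UNINSTALL PLUGIN rpl_semi_sync_master;
```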
[10:56:27] paravoid: I was about to update the ticket, saying that what we have left shouldn't be a blocker for the switch replacement [10:57:38] it is only dbstore, the delayed one, which is 24 hours delayed, so it doesn't really matter if we don't move it today (it is super hard to change its master), and it gets a few more hours delayed [10:57:45] it is still replicating from the old master, which is in c2 [10:58:10] ok! [11:03:48] 10DBA, 06Operations: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2972088 (10Marostegui) a:03Marostegui [11:05:04] 10DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2972091 (10Marostegui) 05Open>03Resolved Should be good for now ``` root@db1015:/srv/sqldata# df -hT /srv Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/tank-data xfs 1.6T 1.1T 536G 67% /srv ``` [12:14:04] 10DBA, 10Cognate, 10Wikidata, 15User-Addshore, 03WMDE-QWERTY-Team-Board: Cognate DB review - https://phabricator.wikimedia.org/T148988#2972244 (10jcrespo) 05Open>03Resolved a:03jcrespo Yes, no major problem in the current state. [14:49:13] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972713 (10jcrespo) [14:50:11] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972739 (10jcrespo) p:05Triage>03High [14:51:08] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972713 (10jcrespo) I will backup current dataset and recover the last backup into db1048. [14:55:37] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972746 (10Marostegui) From looking at db2012 (uses GTID). When starting slave it always tries to start at the sa... [15:27:51] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972833 (10Marostegui) This is the original crash: ``` 2017-01-26 02:00:22 7fa9cebf6700 InnoDB: FTS Optimize Rem... [16:03:43] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972979 (10Paladox) Looking at the above ^^, upstream have created a new phabricator_search table. So we no longe... [16:16:42] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973018 (10Marostegui) >>! In T156373#2972979, @Paladox wrote: > Looking at the above ^^, upstream have created a... [16:20:34] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973027 (10Paladox) >>! In T156373#2973018, @Marostegui wrote: >>>! In T156373#2972979, @Paladox wrote: >> Lookin... 
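dbstore1001 is the delayed, multisource copy that is deliberately being left on the old master for now; a quick sketch of how one could confirm what it still points at before deciding the move can wait (the connection name and exact lag in the comments are assumptions for illustration):

```sql
-- On dbstore1001 (MariaDB multisource, sketch): list every replication
-- connection and check where the s1 one still points.
SHOW ALL SLAVES STATUS\G
-- Expected, roughly:
--   Connection_name: s1               (assumed name)
--   Master_Host: db1057.eqiad.wmnet   (still the old master, in c2)
--   Seconds_Behind_Master: ~86400     (the intentional one-day delay)
```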
[16:26:21] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973052 (10Paladox) I found this bug report https://jira.mariadb.org/browse/MDEV-11233 that looks very similar to... [16:34:51] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973070 (10Marostegui) >>! In T156373#2973052, @Paladox wrote: > I found this bug report https://jira.mariadb.org... [16:58:43] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973202 (10Marostegui) The alter table worked on 10.1.21. So to recap: 10.0.23 -> works 10.0.28 -> crashes 10.0... [17:39:21] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972713 (10greg) From the description, have the phab statistics crons been disabled? [17:44:47] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973390 (10Paladox) >>! In T156373#2973202, @Marostegui wrote: > The alter table worked on 10.1.21. > > So to re... [18:29:35] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973525 (10Marostegui) >>! In T156373#2973370, @greg wrote: > From the description, have the phab statistics cron... [18:33:51] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973539 (10greg) I don't have the rights, I believe @jcrespo did it (with verification from @mmodell) last time w... [18:35:11] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973540 (10Marostegui) >>! In T156373#2973390, @Paladox wrote: > > Maybe we should find the patch that fixed it... [18:38:41] 10DBA, 06Labs, 10Tool-Labs: Labs users reporting timouts when connecting to labsdb-web.eqiad.wmnet - https://phabricator.wikimedia.org/T156285#2973566 (10Superyetkin) Database connection works but queries end up with the following error. ``` SELECT command denied to user 's51698'@'10.64.37.15' for table 'wb... [18:40:22] 10DBA, 06Labs, 10Tool-Labs: Labs users reporting timouts when connecting to labsdb-web.eqiad.wmnet - https://phabricator.wikimedia.org/T156285#2973589 (10jcrespo) @Superyetkin Can you provide the server and the full query used? [18:49:44] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973641 (10Paladox) @Marostegui hi, i was talking about doing it upstream. 
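Since the crash signature quoted earlier points at InnoDB FTS ("FTS Optimize") and the MariaDB reports being linked concern fulltext indexes, a generic first triage step is to see which Phabricator tables even carry FULLTEXT indexes. A sketch; the schema name pattern is an assumption.

```sql
-- List FULLTEXT indexes across the Phabricator databases (sketch).
SELECT DISTINCT table_schema, table_name, index_name
FROM information_schema.statistics
WHERE index_type = 'FULLTEXT'
  AND table_schema LIKE 'phabricator_%';
```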
[18:55:09] 10DBA, 06Labs, 10Tool-Labs: Labs users reporting timouts when connecting to labsdb-web.eqiad.wmnet - https://phabricator.wikimedia.org/T156285#2973675 (10Superyetkin) mysql_select_db fails with "Access denied for user 's51698'@'%' to database 'trwiki_p'" on [[ http://tools.wmflabs.org/superyetkin/test.php |... [19:05:20] 10DBA, 06Labs, 10Tool-Labs: Labs users reporting timouts when connecting to labsdb-web.eqiad.wmnet - https://phabricator.wikimedia.org/T156285#2973711 (10jcrespo) @Superyetkin trwiki is s2, which is not yet part of the available wikis. right now only enwiki and the 800 s3 wikis are available. s2, s4, s5, s6... [19:19:33] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2972713 (10epriestley) Not sure this is helpful, but the `phabricator_search.search_documentfield` table is just... [19:24:57] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973770 (10jcrespo) @epriestley we are already doing that in parallel. But if it happens to be the... [19:27:30] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973783 (10Paladox) @jcrespo and @Marostegui i found the patch https://github.com/MariaDB/server/co... [19:33:57] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973802 (10Paladox) I got some merge conflicts when cherry picking to the 10.0 branch ~/server$ gi... [19:37:42] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973806 (10epriestley) @jcrespo Ah, sorry, I hadn't actually read the linked bug. As another possi... [19:39:29] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973808 (10Paladox) Actually, it's very easy to fix merge conflicts. [19:40:25] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973809 (10Paladox) remove this ``` -<<<<<<< HEAD - /* One variable length column, word... [19:52:38] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973904 (10jcrespo) As usual, @epriestley, I thank you a lot for the support (thank you so much!),... 
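For the Tool Labs access errors reported above, the first thing to separate is a replication problem from a plain missing grant or missing database on the new hosts; a sketch of what a tool account could run on the labsdb host it is connected to:

```sql
-- What can this account actually do here? (sketch)
SHOW GRANTS FOR CURRENT_USER();
-- And which sanitized wiki databases are visible on this replica at all
-- (per the log, only enwiki_p and the s3 wikis at this point):
SHOW DATABASES LIKE '%wiki_p';
```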
[20:03:43] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973962 (10jcrespo) https://jira.mariadb.org/browse/MDEV-11918 [20:04:37] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973965 (10Paladox) @jcrespo and @marostegui I've backported the fix here https://github.com/Maria... [20:12:41] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973994 (10jcrespo) >>! In T156373#2973965, @Paladox wrote: > @jcrespo and @marostegui I've back po... [20:13:07] 10DBA, 06Operations, 10Phabricator, 06Release-Engineering-Team, 07Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2973995 (10Paladox) Ok, you're welcome :)