[06:57:56] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#2984832 (Marostegui) Thanks @jcrespo and @Papaul. The server looked good yesterday night when I checked it :-)
[07:28:35] DBA, Analytics: Json_extract available on analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T156681#2983518 (Marostegui) `research` user doesn't appear to have EXECUTE privileges. Was that working before or is it the first time you've tried to play with that function? Thanks!
[08:22:18] DBA, Operations, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2984963 (Marostegui) Yesterday I recloned db1072 but it has not been able to catch up with the master whereas db1073 (the server it was recloned from) had no problems. In order to test if the stora...
[08:32:23] DBA, Operations, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2984980 (Marostegui) Looks like forcing it to be WriteBack works (but it is bad anyways if the BBU is really broken): ``` root@db1072:~# megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll Set Wr...
[09:29:11] DBA, Analytics: Drop m3 from dbstore servers - https://phabricator.wikimedia.org/T156758#2985275 (Marostegui)
[13:50:54] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#1979211 (Marostegui) Hi Is this ticket a consequence of: https://phabricator.wikimedia.org/T87661? So removing the autoincrement...
[14:32:08] I will want to restart tendril at some point, when it is not a blocker
[14:32:19] you can go ahead now if you like
[14:32:38] I am a bit worried if things go badly
[14:32:46] extended downtime, etc.
[14:32:54] but it has to be done at some point
[14:33:29] what can go wrong? as in: what can be the cause of a big downtime?
[14:33:53] I think I have never done it before
[14:34:08] so here be dragons
[14:34:11] haha
[14:34:59] when you say tendril you say the db1011 or all the stuff around tendril? ie: webserver..
[14:36:46] the db
[14:36:59] the webserver, we moved it several times
[14:37:26] and was puppetized (it didn't use to)
[14:39:00] well, we can always: stop db1011, copy the content somewhere else and then restart the host, if you feel safer that way?
[14:39:13] uptime
[14:39:13] 14:39:08 up 552 days,
[14:39:15] no
[14:39:18] nice
[14:39:25] I am not worried about the content
[14:39:47] I am worried about special unpuppetized stuff we may not know about
[14:40:12] which is only hypothetical
[14:40:23] on the other side
[14:40:30] mysql keeps crashing all the time
[14:40:38] so probably not worrying?
[14:40:47] yeah, it crashes almost every night XD
[15:02:59] db1095 went down
[15:03:21] WAT?
[15:03:42] can't connect to it
[15:04:12] it got rebooted
[15:04:16] it is booting up now :|
[15:04:30] it is back up
[15:04:36] I logged it
[15:04:41] please see SAL
[15:04:44] and I acked it
[15:04:44] oh sorry
[15:04:47] sorry
[15:04:48] lol
[15:04:52] and you just +1
[15:04:58] a code review that said
[15:05:06] I did, but I wasn't aware that you just rebooted it
[15:05:07] "this is in preparation for db1095 reboot"
[15:05:14] and I executed a script right when it went down XD
[15:05:17] that is why i noticed
[15:05:18] sorry
[15:05:47] it took me 10 minutes from the SAL till I restarted
[15:06:03] so not like I did it well in advance
[15:39:54] DBA, Operations: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2986612 (jcrespo) a:jcrespo
[15:43:20] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2986648 (Ottomata) Hah, hilarious.
Ok, IF all of the current tables that eventlogging is writing to have auto-increment IDs, then...
[15:45:46] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2986659 (jcrespo) > and use normal MySQL replication Which I'd love, but this application refuses to be compatible with, and I go...
[15:46:56] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2986686 (Nuria) @Marostegui : +1 to what @Ottomata said The prior incarnation of the system (which has been heavily upgraded now)...
[15:48:07] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2986694 (Marostegui) >>! In T125135#2986648, @Ottomata wrote: > Hah, hilarious. Ok, IF all of the current tables that eventloggin...
[15:57:21] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2986724 (jcrespo) > This is key, is that somehow doable from the application side? Note analytics has refused to acknowledge/decl...
[16:25:47] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2986872 (Ottomata) > Note analytics has refused to acknowledge/decline this problem several times Not sure this is an accurate rep...
[16:52:43] DBA, Operations: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2987013 (jcrespo) stalled>Resolved a:jcrespo>Marostegui I changed the master of dbstore1001.
Resolving now, but let's monitor dbstore1001 to make sure nothing broke (because its delayed rep...
[16:52:48] DBA, Labs, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2987016 (jcrespo)
[16:56:21] DBA, Labs, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2987020 (jcrespo)
[16:58:58] DBA, MediaWiki-Change-tagging, Operations: db1072 change_tag schema and dataset is not consistent - https://phabricator.wikimedia.org/T156166#2987023 (jcrespo) Open>Resolved a:Marostegui This is resolved, leaving T156226 open for pending issues, non-related. ``` SELECT ct_rc_id, ct_tag...
[16:59:03] DBA, Labs, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2987028 (jcrespo)
[19:38:08] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2987577 (Ottomata) > This is key, is that somehow doable from the application side? It would help to reduce this snowflake we have...
[19:39:25] DBA, Analytics, Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2987580 (Ottomata)
[19:39:28] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2987579 (Ottomata)
[19:40:43] DBA, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#1979211 (Ottomata) >> Ok, IF all of the current tables that eventlogging is writing to have auto-increment IDs, then yeah, no acti...
[19:44:33] DBA, Analytics, Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2987587 (Ottomata) @Marostegui ok! So the T125135 auto-increment thing is a very small piece of this larger issue. Let's see if we can hammer out a way to use regular MySQL replication...
[20:01:23] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987618 (Ottomata)
[20:02:21] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987634 (Ottomata)
[20:02:42] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987618 (Ottomata) a:jcrespo>None
[21:59:12] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2987956 (yuvipanda) Update: the replica script doesn't actually work...
[23:00:10] DBA, Labs, Labs-Infrastructure: Design a method for keeping user-created tables in sync across labsDBs - https://phabricator.wikimedia.org/T156869#2988296 (Halfak)