[07:32:57] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure: Rebuild old timestamp format tables - https://phabricator.wikimedia.org/T151607#3032303 (Marostegui) a:Marostegui>None Unassigning as there is nothing we can really do now, as we are not on 10.1 in general.
[07:33:51] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3032318 (Marostegui) Thanks @Papaul! I will get that ready!
[07:54:48] DBA, Operations, ops-codfw, Patch-For-Review: db2060 not accessible - https://phabricator.wikimedia.org/T156161#3032346 (Marostegui) I have repooled the server.
[08:11:40] thanks for working on https://gerrit.wikimedia.org/r/#/c/337018, that role is currently applied to neodymium and sarin and these have mariadb-client, mariadb-client-10.0, mariadb-client-core-10.0 and mariadb-common installed (the packages shipped by Debian). can these be removed now?
[08:11:54] apt-cache rdepends shows no other rev deps, so it seems safe to me
[08:13:11] also a few of the older DB servers have mariadb-client-5.5 installed: https://servermon.wikimedia.org/packages/mariadb-client-5.5
[08:13:45] probably not worth spending time on though, these will shake out over time with jessie reimages
[08:15:02] and one last thing: the phabricator role pulls in mariadb-client; if wmf-mariadb-client will be available for trusty this can be switched as well
[08:15:59] (it doesn't use the mariadb::client role, it has a manual require_package() on mariadb-client)
[08:21:34] and one more thing: we have quite a few places across our puppet repo where mysql-client is pulled in, shall these also be migrated to role::mariadb::client?
[08:31:37] DBA, Operations, ops-eqiad: Replace BBU for db1060 - https://phabricator.wikimedia.org/T158194#3032388 (Marostegui) @Cmjohnson were you able to find a replacement BBU in the end? Thanks!
[08:50:54] DBA, Operations, Patch-For-Review: db1082 MySQL crashed - https://phabricator.wikimedia.org/T158188#3032411 (Marostegui)
[08:50:57] DBA, Operations, Patch-For-Review: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#3032410 (Marostegui)
[08:51:17] DBA, Operations, Patch-For-Review: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2633433 (Marostegui) I have added the subtask of the last crash of this server, so we can have some tracking as it's happened twice already.
[08:52:19] DBA, Operations, Patch-For-Review: db1082 MySQL crashed - https://phabricator.wikimedia.org/T158188#3029269 (Marostegui) I will close this ticket after restoring the original weight for this server. Also added a parent task, which is the first crash this server had back in September (T145533). It wi...
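A minimal sketch of the reverse-dependency check behind the 08:11–08:21 exchange above, assuming the package names quoted there (mariadb-client, mariadb-client-10.0, mariadb-client-core-10.0, mariadb-common); the removal command at the end is illustrative only, not an agreed procedure for neodymium/sarin.

```
#!/bin/bash
# Illustrative check before removing the Debian-shipped MariaDB client
# packages; package names are the ones quoted in the chat above.
set -e

pkgs="mariadb-client mariadb-client-10.0 mariadb-client-core-10.0 mariadb-common"

for pkg in $pkgs; do
    echo "== ${pkg} =="
    # List reverse dependencies among installed packages only; an empty list
    # (apart from the other packages in $pkgs) suggests removal is safe.
    apt-cache rdepends --installed "$pkg"
done

# Only if the output above is clean would removal look something like:
# sudo apt-get remove --purge mariadb-client mariadb-client-10.0 \
#     mariadb-client-core-10.0 mariadb-common
```

The same check would apply to mariadb-client-5.5 on the older hosts, although, as noted at 08:13:45, those will shake out with the jessie reimages anyway.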
[08:52:33] DBA, Operations, Patch-For-Review: db1082 MySQL crashed - https://phabricator.wikimedia.org/T158188#3032416 (Marostegui) Open>Resolved a:Marostegui
[08:52:36] DBA, Operations, Patch-For-Review: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2633433 (Marostegui)
[08:59:42] Blocked-on-schema-change, DBA, MW-1.28-release-notes, Patch-For-Review: Clean up revision UNIQUE indexes - https://phabricator.wikimedia.org/T142725#3032424 (Marostegui)
[08:59:45] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#3032422 (Marostegui) Open>Resolved Going to mark this as resolved; if we want to alter it once we have done the switchover we can do it. But the risk isn't worth it at all if it is going...
[09:19:42] DBA: run pt-table-checksum before decommissioning db1015, db1035, db1044, db1038 - https://phabricator.wikimedia.org/T154485#3032453 (Marostegui) dbstore1001 struggled to execute the pt-table-checksum query: ``` REPLACE INTO `phabricator_file`.`__wmf_checksums` (db, tbl, chunk, chunk_index, lower_boundary, upp...
[09:42:25] DBA: run pt-table-checksum before decommissioning db1015, db1035, db1044, db1038 - https://phabricator.wikimedia.org/T154485#3032477 (Marostegui) Given this on dbstore1001: `0 1 * * 3 /usr/local/bin/dumps-misc.sh >/srv/dumps-misc.log 2>&1` means that it might have collided with a backup being taken at the same...
[09:52:06] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3032508 (Marostegui) @Papaul please review the DNS changes: https://gerrit.wikimedia.org/r/#/c/338087/
[10:40:22] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3032574 (Marostegui)
[10:40:25] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: db1019: Decommission - https://phabricator.wikimedia.org/T146265#3032572 (Marostegui) Open>Resolved I believe this is done
[10:46:04] DBA: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3032575 (Marostegui) @jcrespo after lots of thought on it, I think we should try to go for multi-instance rather than multi-source for the following reasons: - we can have GTID enabled (which is something I am not for...
[12:36:56] DBA: db1045 disk space - compression needed - https://phabricator.wikimedia.org/T155399#3032805 (jcrespo) a:Marostegui>jcrespo
[13:28:15] gtid_domain_id is interesting also for the sanitarium2 case
[13:28:42] in which sense?
[13:29:08] "Suppose we have two different masters M1 and M2, and we are using multi-source replication to have S1 as a slave of both M1 and M2. S1 will apply events received from M1 in parallel with events received from M2. If we now have a third-level slave S2 that replicates from S1 as master, we want S2 to also be able to apply events that originated on M1 in parallel with events that originated on M2."
[13:29:08] M1: MediaWiki Userpage - https://phabricator.wikimedia.org/M1
[13:29:08] M2: Confirm MediaWiki Account Link - https://phabricator.wikimedia.org/M2
[13:29:24] lol stashbot
[13:29:49] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3032918 (Marostegui) @Papaul db2070 off, mediawiki files changed with its new IP. If you review the DNS patch I will push it too.
[13:31:08] Ah, that is an interesting feature, if it works fine :)
[13:31:23] which with GTID and multisource I am a bit skeptical about
[13:31:24] my point being
[13:31:34] we should deploy it anyway
[13:31:52] without waiting for compatibility with multi-source
[13:32:19] gtid_domain_id, in theory, shouldn't cause issues if it is not used with multisource; my only concern would be when there is a switchover
[13:32:32] we tested it and it worked fine, but I would like to test it again
[13:32:41] just to be sure (again)
[13:33:02] going for lunch
[16:11:23] phew... https://grafana.wikimedia.org/dashboard/file/server-board.json?var-dc=eqiad%20prometheus%2Fops&var-server=db1045&panelId=17&fullscreen&from=1487166120465&to=now
[16:15:07] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3033362 (Papaul)
[16:26:05] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3033445 (Marostegui) Hey @RobH To clarify things, db2070 has been moved from row D to row C (as @Papaul updated on the original task description). Thanks for helping out!
[16:38:01] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3033481 (RobH)
[16:38:26] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3031005 (RobH)
[16:38:40] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3033483 (Marostegui) db2070: - DNS updated - network/interfaces changed - mediawiki files changed - MySQL up and replication up Pending: port configuration Once the...
[16:39:24] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3031005 (RobH) Ok, the new port is set up in row C. Please assign this back to me once db2070 is moved!
[16:40:32] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3033489 (Marostegui) Oh, I saw that @RobH already changed the port and the server is replicating fine! :)
[16:41:01] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3033504 (Marostegui) a:Papaul>RobH The server has already been moved to row C
[16:41:33] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3033509 (Marostegui) a:Marostegui Claiming this task to do the last checks, repool the server, etc. before closing it.
[16:42:49] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3033512 (RobH) >>! In T158246#3033504, @Marostegui wrote: > The server has already been moved to row C When? I just setup (as in when I put in my comment) that the port wasn't allocated or enabled,...
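A minimal sketch of the gtid_domain_id idea behind the 13:28–13:33 exchange above, for the quoted M1/M2/S1/S2 multi-source scenario; host names, domain ids, and credentials are placeholders for illustration, not any real configuration.

```
#!/bin/bash
# Illustrative only: give each master its own gtid_domain_id so that events
# from M1 and M2 stay in separate GTID domains. Host names and ids are
# placeholders; a permanent setting would normally live in my.cnf.

mysql -h m1.example.org -e "SET GLOBAL gtid_domain_id = 1;"
mysql -h m2.example.org -e "SET GLOBAL gtid_domain_id = 2;"

# S1 replicates from both masters (multi-source); events keep the domain id
# of the master they originated on, so a third-level slave S2 can apply the
# two domains in parallel instead of serializing them.
mysql -h s2.example.org -e "SHOW ALL SLAVES STATUS\G" | grep -i gtid
```

As noted at 13:32, how this behaves across a master switchover is the part that would still need re-testing before relying on it.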
[16:44:55] DBA, Operations, ops-codfw, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3033543 (RobH)
[16:44:58] DBA, Operations, ops-codfw: codfw: switch ports clean up - https://phabricator.wikimedia.org/T158246#3033541 (RobH) Open>Resolved
[19:08:08] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3034156 (Umherirrender) I cannot say anything about the task; my comment was just an informational note that the missing index is not...
[19:11:21] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3034163 (jcrespo) p:Normal>Low Based on the comments, I am going to lower the priority of this, without trying t...