[05:15:26] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui) The failover was done successfully. read-only start: 05:00:50 read-only stop: 05:02:21 Total read-only time: 01:31 minutes
[05:16:28] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui)
[05:48:20] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui) Open→Resolved Everything is looking good, so resolving this.
[05:48:29] DBA, Goal: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170 (Marostegui)
[05:48:37] DBA, Operations, Patch-For-Review: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[05:48:39] DBA: Failover DB masters in row D - https://phabricator.wikimedia.org/T186188 (Marostegui)
[05:49:05] DBA: Failover DB masters in row D - https://phabricator.wikimedia.org/T186188 (Marostegui)
[07:16:15] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[07:16:28] marostegui: when you get some time, what do you think of https://gerrit.wikimedia.org/r/c/operations/puppet/+/525535 ?
[07:16:53] godog: I will check later, is that ok?
[07:17:01] marostegui: sure no rush, thanks!
[07:17:05] thanks!
[07:23:06] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[07:26:21] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[07:46:14] DBA, Operations, ops-eqiad, Patch-For-Review: (2019-08-31)rack/setup/install db2131.codfw.wmnet - https://phabricator.wikimedia.org/T229251 (Marostegui) @RobH @Papaul I have merged: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/526378/ The only changes pending from your side to be able t...
[08:05:11] DBA, Goal, Patch-For-Review: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969 (Marostegui)
[08:06:19] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui) a:Marostegui
[08:21:40] DBA: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Marostegui)
[08:24:30] DBA: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Marostegui)
[13:45:11] DBA, Operations, ops-codfw: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 (Papaul) @Marostegui This system crashed again . This time the error is on DIMM A1 see below. ` "Correctable memory error logging disabled for a memory device at location DIMM_A1. Mon 29 Jul 2019...
[13:47:25] DBA, Operations, ops-codfw: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 (Marostegui) @Papaul interesting, that crash didn't make MySQL or the host to frozen this time. Good catch! It did kill other processes: ` [Tue Jul 30 00:47:38 2019] mce: Uncorrected hardware memory...
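A quick cross-check of the read-only window reported in the 05:15:26 failover comment on T227062: subtracting the start timestamp from the stop timestamp should reproduce the stated total of 01:31. This is a minimal sketch that assumes both timestamps fall on the same day; the helper name `read_only_window` is hypothetical and not part of any Wikimedia tooling.

```python
from datetime import datetime, timedelta

# Time-of-day format used in the failover comment (HH:MM:SS).
FMT = "%H:%M:%S"


def read_only_window(start_hms: str, stop_hms: str) -> timedelta:
    """Hypothetical helper: duration between two same-day HH:MM:SS stamps."""
    start = datetime.strptime(start_hms, FMT)
    stop = datetime.strptime(stop_hms, FMT)
    return stop - start


# Values copied from the failover comment.
read_only = read_only_window("05:00:50", "05:02:21")
total_seconds = int(read_only.total_seconds())
minutes, seconds = divmod(total_seconds, 60)

# Prints the duration in the same MM:SS style as the comment.
print(f"Total read-only time: {minutes:02d}:{seconds:02d} minutes ({total_seconds} s)")
```

Running it prints `Total read-only time: 01:31 minutes (91 s)`, matching the figure in the log. A window that crosses midnight would need full dates rather than bare times.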