[01:57:48] DBA, CirrusSearch, Discovery-Search, MediaWiki-Categories: Special:RandomInCategory does not return all pages with equal probability - https://phabricator.wikimedia.org/T200703 (Bawolff) >>! In T200703#5422151, @Marostegui wrote: > It would be helpful if you guys can come with some specific examp...
[05:25:59] DBA, Patch-For-Review: Replace db2044 (m2 codfw master) with db2067 - https://phabricator.wikimedia.org/T230705 (Marostegui) db2067 is the new m2 codfw master: ` ./replication_tree.py db2067.codfw.wmnet db2067, version: 10.1.39, up: 18h, RO: ON, binlog: MIXED, lag: 0, processes: 16, latency: 0.0429 + db2...
[05:26:14] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[05:26:58] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[05:39:16] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[05:47:28] DBA, Operations: Decommission db2044.codfw.wmnet - https://phabricator.wikimedia.org/T230761 (Marostegui)
[05:56:56] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[06:00:56] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Marostegui) @Cmjohnson this host is now OFF, so you can act on it whenever you get to the DC. Thanks!
[06:07:53] DBA, Operations: Switchover s8 (wikidata) primary database master db1104 -> db1109 - https://phabricator.wikimedia.org/T230762 (Marostegui)
[06:08:03] jynus: let me know if this day/time would work for you: https://phabricator.wikimedia.org/T230762
[06:08:14] DBA, Operations: Switchover s8 (wikidata) primary database master db1104 -> db1109 - https://phabricator.wikimedia.org/T230762 (Marostegui) p:Triage→Normal
[06:09:16] If it works, I will remove the "tentative" part and request a read-only window
[08:35:44] DBA, Goal, Patch-For-Review: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106 (Marostegui)
[08:36:05] DBA, Goal, Patch-For-Review: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106 (Marostegui) s5 codfw master has been swapped from db2052 to db2123
[09:19:04] DBA, Operations: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (Marostegui)
[09:19:36] DBA, Operations: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230778 (Marostegui)
[09:20:15] DBA, Operations: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (Marostegui) p:Triage→Normal
[09:21:03] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[09:21:26] DBA, Operations: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230778 (Marostegui) p:Triage→Normal
[09:40:02] DBA, Operations, Patch-For-Review: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (Marostegui)
[09:40:10] DBA, Operations, Patch-For-Review: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230778 (Marostegui)
[09:43:17] DBA, Operations, Patch-For-Review: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230778 (Marostegui)
[09:44:47] DBA, Operations, Patch-For-Review: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (Marostegui)
[09:53:12] DBA, Operations, Patch-For-Review: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230778 (Marostegui)
[09:53:23] DBA, Operations, Patch-For-Review: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (Marostegui)
[10:03:54] T230762 ok
[10:03:54] T230762: Switchover s8 (wikidata) primary database master db1104 -> db1109 - https://phabricator.wikimedia.org/T230762
[10:04:10] jynus: thanks!
[10:04:30] DBA, Operations: Switchover s8 (wikidata) primary database master db1104 -> db1109 - https://phabricator.wikimedia.org/T230762 (Marostegui)
[10:08:09] jynus: We also need to failover 3 more masters, so I am proposing these other dates: s2: 17th Sept, s3: 24th Sept, s4: 26th Sept, all at 05:00 UTC, would that work for you? (If so I will create the tickets and ask the liaisons team for all the read-only windows on the same ticket)
[10:10:58] ok
[10:11:15] thanks, I will get all the calendar, invites and all that
[10:11:36] DBA, Operations: Switchover s8 (wikidata) primary database master db1104 -> db1109 - 10th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230762 (Marostegui)
[10:13:58] DBA, Operations: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui)
[10:14:11] DBA, Operations: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui) p:Triage→Normal
[10:15:57] DBA, Operations: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 (Marostegui)
[10:16:08] DBA, Operations: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 (Marostegui) p:Triage→Normal
[10:21:22] DBA, Operations: Switchover s2 primary database master db1066 -> db1122 - 17th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230785 (Marostegui)
[10:21:34] DBA, Operations: Switchover s2 primary database master db1066 -> db1122 - 17th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230785 (Marostegui) p:Triage→Normal
[10:21:55] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[10:21:58] DBA, Operations: Switchover s2 primary database master db1066 -> db1122 - 17th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230785 (Marostegui)
[10:36:20] DBA, Operations: Switchover s2 primary database master db1066 -> db1122 - 17th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230785 (Marostegui)
[10:36:22] DBA, Operations: Switchover s4 (commonswiki) primary database master db1081 -> db1138 - 26th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230784 (Marostegui)
[10:36:28] DBA, Operations: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui)
[10:36:31] DBA, Operations: Switchover s8 (wikidata) primary database master db1104 -> db1109 - 10th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230762 (Marostegui)
[10:51:51] DBA, Operations: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (Marostegui)
[10:51:57] DBA, Operations: Decommission db2051.codfw.wmnet - https://phabricator.wikimedia.org/T230778 (Marostegui)
[10:54:26] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[10:55:04] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[14:29:10] db2102 currently runs a backport kernel (I did this to debug some hw issue and forgot to revert it), it uses the core_test role. is there anything to be considered for rebooting it? I'd like to revert it to a proper stretch 4.9 kernel
[14:29:47] let me check it
[14:30:32] moritzm: just downtime it, stop mysql and you should be good to go
[14:31:05] ack, thanks, will do that tomorrow morning, then
[14:31:09] cool!
[14:31:10] thanks
[14:46:20] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Cmjohnson) Swapped the DIMM B3 with A3 and B7 with A7. Powered on and cleared log. Let's see if the errors return or change.
[14:50:27] DBA, Operations, ops-eqiad: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T230682 (Cmjohnson) @Marostegui I had a used disk on-site and replaced it; it's currently in rebuild. Device Firmware Level: ES66 Firmware state: Rebuild
[15:28:57] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Marostegui) Thank you Chris! I have started MySQL, let's wait a few days before closing this, and if it happens again we can re-open!