[04:23:34] you just did the replica change?
[04:24:45] yep :)
[04:45:10] I just realised that the banner they set up on itwiki is wrong
[04:45:25] they put 15:00 UTC :)
[04:47:50] and 19th March :)
[04:59:53] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (jcrespo)
[05:10:44] DBA, Operations: Switchover s2 primary database master db1066 -> db1122 - 17th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230785 (Marostegui) This was done. Read-only start: 05:00:44, read-only stop: 05:01:34. Total read-only time: 50 seconds.
[05:16:50] DBA, Operations: Switchover s2 primary database master db1066 -> db1122 - 17th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230785 (Marostegui) Open→Resolved
[05:16:54] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[05:18:09] DBA, Operations: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (Marostegui)
[05:18:46] DBA, Operations: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (Marostegui) p:Triage→Normal This host has been switched over and is not a master anymore; let's give it some days before decommissioning it.
[05:37:28] is the icinga update fast, or did you run puppet on icinga on purpose?
[05:37:43] DBA, Operations, Patch-For-Review: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (Marostegui)
[05:38:06] what do you mean?
[05:38:23] the read-only check on icinga
[05:38:32] it normally takes quite some time to switch
[05:39:09] I guess it was a race condition, and puppet ran fast on icinga
[05:39:14] I didn't run it manually, no
[05:39:20] I did run it on the hosts themselves
[06:01:15] DBA: Failover DB masters in row D - https://phabricator.wikimedia.org/T186188 (Marostegui)
[06:36:45] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[07:14:09] 10.1.42 is scheduled for 2019-10-25
[07:22:21] oh wow, that's far away
[07:30:07] yeah, when making a release with a replication regression I'd rather do a minimal bugfix-only release and move all other changes to a 10.1.43
[07:31:06] they did that for 10.1.40
[07:32:43] I am going to do a half reimage of backup2001 to see what the current drive device layout is
[07:36:46] oh, cool, now not even the reimage works because "Unified Server Configurator does not support console redirection."
[07:37:45] how's that possible?
[07:37:49] it should have nothing to do with it, no?
[07:38:06] is there something in the BIOS to disable that maybe?
[07:38:17] or during boot itself?
[07:39:59] DBA, Operations, Patch-For-Review: Decommission db1063.eqiad.wmnet - https://phabricator.wikimedia.org/T232564 (Marostegui)
[07:40:12] I think I may be able to access the virtual console from the web
[07:40:57] it is a graphical interface, so it doesn't support the serial console
[07:41:05] but it doesn't let me exit either
[07:42:14] not even from the web console? That normally launches a Java virtual console
[07:42:18] with a virtual keyboard too
[07:43:18] yeah, it tried to start an ActiveX plugin
[07:43:23] I switched to HTML5
[07:43:33] but it is a pain that it is not text-based
[07:43:42] yeah :(
[07:45:53] honestly, I don't think this host is fully set up
[07:46:10] DBA, Operations, Patch-For-Review: Decommission db1063.eqiad.wmnet - https://phabricator.wikimedia.org/T232564 (Marostegui)
[07:46:11] it seems to have factory configs
[07:46:37] yeah, maybe a deep check on-site is needed, along with upgrading all the firmware, BIOS, etc.
[07:48:49] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[07:54:58] so the issue is that sda is detected as the RAID, and it tries to create a software RAID on top of the hardware RAID and an SSD
[07:55:10] hey, I heard you like raids
[07:55:12] is sda the SD card?
[07:55:22] so I put a raid on top of your raid
[07:55:35] We faced an issue with that, I had to disable the SD card
[07:55:36] it is raids all the way to the bottom
[07:55:46] no, it is the hw raid
[07:55:47] look
[07:56:05] /dev/sda1 2048 97656831 97654784 46.6G Linux RAID
[07:56:26] swap and
[07:56:32] /dev/sda3 99610624 257838544895 257738934272 120T Linux RAID
[07:57:04] the solution is easy, change sda to sdc in the recipe
[07:58:24] oh, interesting
[07:58:47] I guess it depends on how it was created
[07:58:59] But if you cannot access the RAID controller menu, there is not much that can be done indeed
[07:59:04] other than playing tricks with the recipe
[07:59:50] I think I will be able to set it up once I install the OS
[07:59:51] DBA, Operations, ops-eqiad: db1074 crashed: Broken BBU - https://phabricator.wikimedia.org/T231638 (Marostegui) This host's original weight was 200 in main traffic and 1 in API. I have only pooled it with weight 50 on main traffic, just to get it to do something.
[08:00:05] will you take care of the depools?
[08:00:06] you can try to set up the OS manually
[08:00:10] yep, I will do that
[08:00:16] I can review if you want
[08:00:22] I will do it later
[08:00:39] The maintenance is at 1pm our time, so I think I am going to take a break now, as I haven't had one since we started the switchover
[08:00:58] ok
[08:03:28] DBA, Operations, ops-eqiad, Patch-For-Review: db1074 crashed: Broken BBU - https://phabricator.wikimedia.org/T231638 (Marostegui) The BBU showed up again (usual behaviour with a broken BBU) ` root@db1074:~# hpssacli controller all show status Smart Array P840 in Slot 1 Controller Status: OK...
[17:39:17] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Anomie)
[18:32:18] DBA, CheckUser, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Schema-change: Schema changes for `cu_changes` and `cu_log` table - https://phabricator.wikimedia.org/T233004 (Anomie)
[18:37:44] DBA, CheckUser, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Schema-change: Schema changes for `cu_changes` and `cu_log` table - https://phabricator.wikimedia.org/T233004 (Anomie)
[18:43:12] DBA, CheckUser, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Schema-change: Schema changes for `cu_changes` and `cu_log` table - https://phabricator.wikimedia.org/T233004 (Anomie)
[20:08:20] DBA, Operations, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (Jclark-ctr)