[05:02:50] DBA: db2064 crashed - https://phabricator.wikimedia.org/T195228#4220093 (Marostegui)
[05:03:11] DBA: db2064 crashed - https://phabricator.wikimedia.org/T195228#4220103 (Marostegui) p:Triage>Normal
[05:10:40] DBA, Operations, ops-codfw, Patch-For-Review: db2064 crashed - https://phabricator.wikimedia.org/T195228#4220110 (Marostegui)
[05:23:16] DBA, Operations, ops-codfw, Patch-For-Review: db2064 crashed - https://phabricator.wikimedia.org/T195228#4220115 (Marostegui) a:Papaul Can you take a look at this server? Maybe power drain it? I am not even able to power it on: ``` hpiLO-> power status=0 status_tag=COMMAND COMPLETED Mon...
[05:27:08] DBA, Operations, ops-eqiad, Patch-For-Review: Possibly BBU issues on db1067 - https://phabricator.wikimedia.org/T194852#4220120 (Marostegui) After the weekend, everything looks fine: ``` root@db1067:~# megacli -AdpBbuCmd -a0 | grep Temper Temperature: 47 C Temperature...
[05:38:13] DBA, Operations, ops-eqiad, Patch-For-Review: Possibly BBU issues on db1067 - https://phabricator.wikimedia.org/T194852#4220125 (Marostegui) After the reboot: ``` root@db1067:~# megacli -AdpBbuCmd -a0 | grep Temper Temperature: 48 C Temperature : OK ```
[05:42:52] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for pmswikisource - https://phabricator.wikimedia.org/T195008#4220132 (Marostegui) p:Triage>Normal Let us know when this is created so we can redact it before handling it over to #cloud-services-team for the views creation.
[06:04:22] DBA: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (Marostegui)
[06:04:25] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#3940857 (Marostegui)
[06:04:27] DBA: BBU issues on db1054 (s2 primary master) - https://phabricator.wikimedia.org/T194867#4220148 (Marostegui) Open>declined I am going to close this as we will not replace the BBU or anything. We will failover the master (T194870) and decommission the host (T186320). This this task can remain on the...
[07:47:41] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Patch-For-Review, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4220264 (Marostegui)
[07:47:59] Blocked-on-schema-change, DBA, Patch-For-Review, User-Ladsgroup, Wikidata-Ministry-Of-Magic: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519#4220265 (Marostegui)
[07:48:07] Blocked-on-schema-change, DBA, Data-Services, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299#4220266 (Marostegui)
[09:06:20] so what is the deal with db2064?
[09:06:40] died and cannot be powered on
[09:06:46] hw then?
[09:06:53] most likely
[09:06:59] https://phabricator.wikimedia.org/T195228
[09:07:07] that's what I have seen so fafr
[09:07:08] far
[09:07:15] System Power Fault Detected
[09:07:31] yeah, not something new, what is "new" is that "power on" doesn't even work
[09:07:33] could be just the power supply or the actually power
[09:07:35] so maybe the PS burnt
[09:07:49] which is better than fully broken
[09:07:53] totally
[09:07:56] we'll see
[09:08:21] I was worried about a 10.1 crash
[09:09:12] maybe servers commit suicide when they realise they are running 10.1 :-)
[09:22:26] do you mind lending me db1077 for 30 minutes after the alter finishes?
[09:28:22] sure
[09:28:28] all yours :(
[09:28:31] ups: :)
[09:29:26] ping me when you are done with it and I will repool it for you
[09:29:28] i will let you know when it is finished
[09:29:28] yeah
[09:29:36] will ping you when I am done with it
[09:29:46] great
[11:23:20] jynus: db1077 is all yours - when you are done feel free to merge: https://gerrit.wikimedia.org/r/#/c/434318/
[12:45:12] ok, thanks
[14:28:24] DBA, Operations, ops-eqiad, Patch-For-Review: Possibly BBU issues on db1067 - https://phabricator.wikimedia.org/T194852#4220796 (Marostegui) Still looking fine ``` root@db1067:~# megacli -AdpBbuCmd -a0 | grep Temper Temperature: 47 C Temperature : OK ``` If by tom...
[15:35:30] DBA, Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4220956 (jcrespo)
[15:42:40] marostegui: db1077 is upgraded and repooled back
[15:43:42] \o/
[16:06:57] Blocked-on-schema-change, DBA, AbuseFilter: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295#4221007 (Daimona)