[06:57:56] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3881935 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1063.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/201801220657_maroste...
[08:43:23] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3916405 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1063.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['db1063.eqiad.wmnet'] ```
[08:43:58] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3916408 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1063.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/201801220843_maroste...
[08:44:39] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3916409 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1063.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['db1063.eqiad.wmnet'] ```
[08:44:57] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3916410 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1063.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/201801220844_maroste...
[09:04:29] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3916481 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1063.eqiad.wmnet'] ``` and were **ALL** successful.
[12:05:52] DBA, Patch-For-Review: Replace codfw x1 master (db2033) (WAS: Failed BBU on db2033 (x1 master)) - https://phabricator.wikimedia.org/T184888#3916959 (Marostegui) db2034 is now replicating directly from eqiad master it has no slaves hanging from it (that will be done once we do the failover)
[12:07:01] <3
[12:31:20] DBA, Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3917020 (Marostegui) db1030 is no longer serving vslow in s6. db1063 is now serving vslow there, let's leave it running for a week before proceeding and starting db1030 decommissioning process.
[13:48:26] DBA, Operations, ops-eqiad: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3917284 (Marostegui) This failed again - I have forced a relearn
[16:33:33] s3 full compare.py finished with no differences and no server crashes (that I can see)
[16:33:45] nice
[16:33:52] maybe it was just bad luck
[16:36:02] “There is no such thing as luck. There is only adequate or inadequate preparation to cope with a statistical universe.”
[16:36:20] xdddddddddddddddddddd
[17:48:06] DBA, Operations, ops-eqiad: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3917896 (Marostegui) After the relearn: ``` ˜/icinga-wm 18:42> RECOVERY - MegaRAID on db1016 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy ``` Not sure it will last like that f...