[04:48:07] DBA, DC-Ops, Operations, ops-eqiad: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (Marostegui) Sounds good thanks - I will have this host ready.
[05:04:31] DBA, DC-Ops, Operations, ops-eqiad, Patch-For-Review: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (Marostegui)
[05:19:55] DBA, Operations: db2068 is misbehaving (but is depooled) - https://phabricator.wikimedia.org/T235366 (Marostegui) Thanks for filling this out. This host had a storage crash sometime ago T180927 and it looks like it had another one: Logs from the 13th. ` description=An Unrecoverable System Error (NMI...
[05:27:13] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[05:28:32] DBA, Operations: Decommission db2068.codfw.wmnet - https://phabricator.wikimedia.org/T235399 (Marostegui)
[05:29:31] DBA, Operations: Decommission db2068.codfw.wmnet - https://phabricator.wikimedia.org/T235399 (Marostegui)
[05:30:54] DBA, Operations, Patch-For-Review: Decommission db2068.codfw.wmnet - https://phabricator.wikimedia.org/T235399 (Marostegui) p:Triage→Normal
[06:03:15] DBA, Operations, Patch-For-Review: Decommission db2068.codfw.wmnet - https://phabricator.wikimedia.org/T235399 (Marostegui)
[07:01:55] DBA, Operations, Patch-For-Review: Decommission db2068.codfw.wmnet - https://phabricator.wikimedia.org/T235399 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2068.codfw.wmnet` - db2068.codfw.wmnet (**FAIL**) - Downtimed host on Icinga - Downt...
[07:05:50] DBA, Operations, Patch-For-Review: Decommission db2068.codfw.wmnet - https://phabricator.wikimedia.org/T235399 (Marostegui) >>! In T235399#5571734, @ops-monitoring-bot wrote: > cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2068.codfw.wmnet` > - db2068.codfw.wmnet (*...
[07:14:52] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[07:17:42] DBA, Operations: db2068 is misbehaving (but is depooled) - https://phabricator.wikimedia.org/T235366 (Marostegui) Open→Resolved a:Marostegui Resolving this as the host has been labelled as broken and sent to DC-Ops for decommissioning T235399
[08:00:42] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (Marostegui)
[08:26:26] jynus: I am going to change db1125:3312 back to eqiad's sanitarium master, can I get another pair of eyes on the coordinates? https://phabricator.wikimedia.org/P9321
[08:29:50] let me see
[08:29:55] thanks
[08:41:13] yes, it looks ok, and I double checked with heartbeat location
[08:41:25] thanks <3
[08:41:41] I will change it then
[08:43:18] done
[08:43:59] looking good
[08:53:17] DBA, Operations, ops-eqiad: db1074 crashed: Broken BBU - https://phabricator.wikimedia.org/T231638 (Marostegui) Open→Resolved db1125:3312 has been moved under db1074 with the following coordinates (GTID also enabled): ` change master to master_host='db1074.eqiad.wmnet', master_user='repl', ma...
[09:13:17] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Marostegui) s7 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1003 [] db1136 [] db1125 [] db1116 [] db1101 [] db109...
[09:13:19] Blocked-on-schema-change, DBA: Schema change to rename user_newtalk indexes - https://phabricator.wikimedia.org/T234066 (Marostegui) s7 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1003 [] db1136 [] db1125 [] db1116 [] db1101 [] db1098 [] db1094 [] db1090 [] db1086 []...
[09:22:32] DBA, DC-Ops, Operations, ops-eqiad: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (Marostegui) @Jclark-ctr you can proceed and change the PSU now. MySQL has been stopped.
[12:43:46] DBA, Data-Services, Operations, cloud-services-team (Kanban): Prepare and check storage layer for banwiki - https://phabricator.wikimedia.org/T234770 (Ladsgroup) >>! In T234770#5550592, @Marostegui wrote: > Let us know when the database is created so we can sanitize it on labs hosts Done now \o/
[12:52:18] DBA, Data-Services, Operations, cloud-services-team (Kanban): Prepare and check storage layer for banwiki - https://phabricator.wikimedia.org/T234770 (Marostegui) a:Marostegui
[13:01:31] DBA: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 (Marostegui)
[13:10:25] DBA, DC-Ops, Operations, ops-eqiad: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (Jclark-ctr) Replaced failed PSU
[13:15:12] DBA, DC-Ops, Operations, ops-eqiad: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (Marostegui) Thanks John! The alert recovered: ` Sensor Type(s) Temperature, Power_Supply Status: OK This service is currently in a period of scheduled downtime View Extra Service Notes OK 20...
[13:19:52] DBA, Data-Services, Operations, cloud-services-team (Kanban): Prepare and check storage layer for banwiki - https://phabricator.wikimedia.org/T234770 (Marostegui) I have sanitized this wiki, but before adding the grants and creating the `_p` database I am running a check to make sure all the info...
[13:23:55] DBA, DC-Ops, Operations, ops-eqiad: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (Jclark-ctr) Open→Resolved
[18:35:21] DBA, DC-Ops, Operations, ops-eqiad: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (wiki_willy) @Jclark-ctr @Marostegui - thanks guys