[05:27:36] DBA, Operations, ops-codfw, Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (Marostegui) s1 finished checking - all good. Going to repool this host now.
[05:46:51] DBA, Operations, ops-codfw, Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (Marostegui) Open>Resolved Server repooled
[05:51:07] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) db2042 keeps recharging. Even if the BBU fails eventually, I will leave it as it is and not replace the BBU (we only have 1, which is the one from db2064). The reason I wouldn't use the BBU...
[05:56:36] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
[05:56:41] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[05:56:44] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui)
[06:00:56] DBA, Core-Platform-Team, Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 (Marostegui)
[06:02:55] Blocked-on-schema-change, DBA, Patch-For-Review: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 (Marostegui)
[06:35:53] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (Marostegui) Backups for nova_api_eqiad1 and nova_eqiad1 were generated correctly nova_api_eqiad1 -rw-r--r-- 1 dump dump 492 Aug 28 18:05 nova_api_eqiad1.build_requests-s...
[06:47:31] DBA, cloud-services-team: cloudvps: dedicated openstack database - https://phabricator.wikimedia.org/T202889 (Marostegui) p:Triage>Normal My thoughts on this: I agree that long term, ideally, we shouldn't share this service with wikitech, as they can affect each other in case we have issues (most...
[07:36:54] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (jcrespo) Resolved>Open We can reboot it again- it worked last time- at least as a short term measure.
[07:37:34] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) It is still recharging, it has not failed yet
[07:42:16] DBA, cloud-services-team: cloudvps: dedicated openstack database - https://phabricator.wikimedia.org/T202889 (jcrespo) Notice {T167973}
[07:46:48] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) This is the HW log from the first time the battery failed (16th Aug) ``` description=POST Error: 1705-Slot X Drive Array - Please replace Cache Module Super-Cap. Caching will be enabled...
[07:50:20] DBA, cloud-services-team: cloudvps: dedicated openstack database - https://phabricator.wikimedia.org/T202889 (Marostegui) >>! In T202889#4541255, @jcrespo wrote: > Notice {T167973} Good catch - forgot about that task That would pretty much unblock this task and would leave m5 pretty much for openstack...
[07:58:09] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) After the reboot it has finally marked itself as failed: ``` date=08/29/2018 time=07:54 description=POST Error: 1705-Slot X Drive Array - Please replace Cache Module Super-Cap....
[08:04:49] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui)
[08:17:35] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) I have forced the BBU to be WriteBack to let the server catch up: ``` root@db2042:~# hpssacli controller all show detail | grep "Drive Write Cache" Drive Write Cache: Disabled root@db...
[08:54:14] DBA, Patch-For-Review: Gather statistics about the backups on a database - https://phabricator.wikimedia.org/T198987 (jcrespo) Backups last week https://phabricator.wikimedia.org/T198987#4525868 vs this week (not all have finished yet): ``` root@db1115.eqiad.wmnet[zarcillo]> SELECT name, section, TIME...
[10:06:10] DBA, Operations: Investigate slow servermon updating queries on db1016 - https://phabricator.wikimedia.org/T165674 (Marostegui) Any objections to decline this ticket as per volans comment above? (T165674#4449561)
[10:09:58] DBA, Operations: Investigate slow servermon updating queries on db1016 - https://phabricator.wikimedia.org/T165674 (jcrespo) Open>declined
[10:59:19] hoo: do you still want to get alerted by dba contact groups?
[10:59:36] no problem on our side, just asking in case those could get annoying
[11:16:32] Yes, I'm not annoyed yet :)
[11:16:45] cool then
[11:17:01] we may be adding some dba-only alerts
[11:17:04] soon
[12:22:07] DBA, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata, Performance: [Task] Use an own load section for Wikibase DB queries - https://phabricator.wikimedia.org/T97414 (hoo) Open>declined I think this grew from some discussion back then, but given i...
[16:41:29] DBA, Cloud-VPS: VPS puppet enc 'prefix' field size too small - https://phabricator.wikimedia.org/T203104 (Andrew)
[16:44:22] DBA, Cloud-VPS, Patch-For-Review: VPS puppet enc 'prefix' field size too small - https://phabricator.wikimedia.org/T203104 (Andrew) (I altered it to 127 in the meantime so that @Krenair can get on with his work)
[16:48:19] DBA, Cloud-VPS, Patch-For-Review: VPS puppet enc 'prefix' field size too small - https://phabricator.wikimedia.org/T203104 (Krenair) Thanks!
[16:50:55] DBA, Operations, ops-codfw: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) I have forced db2042 to be WB again as it was lagging too much behind: ``` 16:45 < icinga-wm> PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag:...
[17:05:34] DBA, Cloud-VPS, Patch-For-Review: VPS puppet enc 'prefix' field size too small - https://phabricator.wikimedia.org/T203104 (bd808) The "ERROR 1709 (HY000): Index column size too large." error means that you are hitting the InnoDB table type maximum index length due to the encoding (utf8mb4 in this ca...
[17:11:32] DBA, Core-Platform-Team, Structured-Data-Commons, Wikidata, and 4 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (CCicalese_WMF)
[17:11:53] DBA, Core-Platform-Team, Structured-Data-Commons, Wikidata, and 4 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (CCicalese_WMF)
[17:12:24] DBA, Core-Platform-Team, Structured-Data-Commons, Wikidata, and 4 others: Deploy MCR storage layer - https://phabricator.wikimedia.org/T174044 (CCicalese_WMF)
[20:34:30] DBA, Core-Platform-Team, Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 (Jdforrester-WMF)
[21:41:24] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (daniel) Marking as "under discussion" on the RFC board for now. One thing that I believe would move this...