[05:33:20] <wikibugs>	 DBA: db2034 crash - https://phabricator.wikimedia.org/T137084#2356666 (RobH)
[05:36:48] <wikibugs>	 DBA: db2034 crash - https://phabricator.wikimedia.org/T137084#2356683 (RobH) I've rebooted the host in an attempt to bring it back online. This should be flagged in the notes for the host history (we don't really have a good way to do that now). For now I'm setting it to high priority and assigned to @jcre...
[05:39:30] <wikibugs>	 DBA: db2034 crash - https://phabricator.wikimedia.org/T137084#2356685 (RobH) P3211 has the ilom log
[05:44:02] <wikibugs>	 DBA: db2034 crash - https://phabricator.wikimedia.org/T137084#2356686 (RobH) mysql isn't online, but I'm not sure if it's as simple as just manually starting it, or if it has to be manually checked/synced. Since db2034 crashed and wasn't cleanly shut down, I don't want to assume I should just restart the db/m...
[06:26:14] <wikibugs>	 DBA: db2034 crash - https://phabricator.wikimedia.org/T137084#2356666 (jcrespo) p:Triage>High
[06:30:49] <wikibugs>	 DBA, Operations, ops-codfw: db2034 crash - https://phabricator.wikimedia.org/T137084#2356708 (jcrespo) It seems there was a RAID controller failure:  > A controller failure event occurred prior to this power-up  We had similar issues on T130702. We may need a general upgrade of all machines with simi...
[06:39:13] <wikibugs>	 DBA, Operations, ops-codfw: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2356722 (jcrespo) a:jcrespo>Papaul This host crashed today: T137084 due to a RAID controller failure. Are we still sure this was safe? Papaul, could you please follow up with support?
[06:41:21] <wikibugs>	 DBA, Operations, ops-codfw: db2034 crash - https://phabricator.wikimedia.org/T137084#2356726 (jcrespo) This host being down was creating log noise due to health checks (no users affected):  https://logstash.wikimedia.org/#dashboard/temp/AVUkao15_LTxu7wl9U3S
[08:20:44] <wikibugs>	 DBA, Performance-Team, Availability, Epic, Patch-For-Review: MASTER_POS_WAIT() alternative that works cross-DC - https://phabricator.wikimedia.org/T135027#2356817 (jcrespo)
[08:20:46] <wikibugs>	 DBA: Change dbstore1001 delayed slave to be a direct slave of the eqiad masters - https://phabricator.wikimedia.org/T133386#2356818 (jcrespo)
[08:20:48] <wikibugs>	 DBA, Operations, Epic: Eliminate SPOF at the main database infrastructure - https://phabricator.wikimedia.org/T119626#2356819 (jcrespo)
[08:20:52] <wikibugs>	 DBA, MediaWiki-Database, Operations, Performance: Implement GTID replication on MariaDB 10 servers - https://phabricator.wikimedia.org/T133385#2356814 (jcrespo) Open>Resolved a:jcrespo GTID rolled in on all production coredb servers. Resolving now, although it will still be applied to...
[08:20:54] <wikibugs>	 DBA, Availability: Look into Maria 10 parallel-replication - https://phabricator.wikimedia.org/T85266#2356820 (jcrespo)
[08:31:58] <wikibugs>	 DBA, Labs: Wrong page title in labs database replica enwiki page table - https://phabricator.wikimedia.org/T136618#2341449 (jcrespo) After seeing many cases like this, I can conclude that replication to labs breaks whenever there is a page move, an archival or an undeletion. It is not yet clear to me why, but...
[08:33:54] <wikibugs>	 DBA, Labs: Wrong page title in labs database replica enwiki page table - https://phabricator.wikimedia.org/T136618#2356836 (jcrespo) Of course, I can fix individual reports, although we should set up a more convenient way than a ticket per row with problems.
[09:15:21] <wikibugs>	 DBA, Operations, ops-codfw: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2356875 (jcrespo) Tuesday, whenever you start working and are available (my afternoon)?
[09:22:29] <wikibugs>	 DBA, Analytics: dbstore1002 crashed - https://phabricator.wikimedia.org/T136333#2356887 (jcrespo) Open>Resolved a:jcrespo
[09:24:16] <wikibugs>	 DBA: db1034 was killed 22-05-16 at 14:17:06 - https://phabricator.wikimedia.org/T135944#2356891 (jcrespo) Open>Resolved a:jcrespo
[09:39:31] <wikibugs>	 DBA: Identical EventLogging queries give different results on db1047 and dbstore1002 - https://phabricator.wikimedia.org/T131236#2356947 (jcrespo) Open>Resolved ``` MariaDB  db1047 log > SELECT COUNT(*) AS events FROM log.NavigationTiming_14899847 WHERE timestamp like '20151203%'; +--------+ | events...
[12:08:17] <wikibugs>	 Blocked-on-schema-change, DBA, Notifications: Temporary index for Echo backfillReadBundles.php? - https://phabricator.wikimedia.org/T137100#2357217 (Catrope)
[13:09:33] <wikibugs>	 Blocked-on-schema-change, DBA, Notifications: Temporary index for Echo backfillReadBundles.php? - https://phabricator.wikimedia.org/T137100#2357379 (jcrespo) > Is adding a temporary index for this kind of thing recommended / a thing we do?  It is indeed something we can do, but please combine it with th...
[13:11:07] <wikibugs>	 DBA, Notifications, Schema-change: Temporary index for Echo backfillReadBundles.php? - https://phabricator.wikimedia.org/T137100#2357382 (jcrespo) (this is not yet a proper DBA request -it is in planning phase-, please create a proposal of several actions to do and re-add the tag when ready)
[15:14:46] <wikibugs>	 DBA, Notifications, Schema-change: Temporary index for Echo backfillReadBundles.php? - https://phabricator.wikimedia.org/T137100#2357613 (Catrope) >>! In T137100#2357379, @jcrespo wrote: >> Is adding a temporary index for this kind of thing recommended / a thing we do? >  > So, please try to minimize...
[15:17:16] <wikibugs>	 DBA, Notifications, Schema-change: Temporary index for Echo backfillReadBundles.php? - https://phabricator.wikimedia.org/T137100#2357623 (jcrespo) > I'll line up all the things we talked about in the previous task, but unfortunately we can't fix the over-indexing until after this migration is complet...