[03:08:10] DBA, Wikimedia-Site-requests, Tracking: Database table cleanup (tracking) - https://phabricator.wikimedia.org/T18660#2460541 (Quiddity)
[07:28:35] DBA, MediaWiki-API, Performance: plwiki API request is excessively slow when including a badrevid - https://phabricator.wikimedia.org/T140302#2459623 (jcrespo) I think I have nothing to do here, you all solved it before I could have a look at it. Now, I am unsure about the perfect solution- ids are...
[08:01:24] DBA, MediaWiki-API, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-07-12_(1.28.0-wmf.10): ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2461156 (jcrespo) No errors last night (was something deployed or did something changed on...
[08:16:31] DBA, Operations, ops-eqiad: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337#2461162 (jcrespo)
[08:25:59] DBA, Operations, ops-eqiad: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337#2461192 (jcrespo) It is not the disk, I am going to rebuild it into the RAID.
[08:42:13] DBA, MediaWiki-API, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-07-12_(1.28.0-wmf.10): ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2453162 (Legoktm) >>! In T140108#2461156, @jcrespo wrote: > No errors last night (was somet...
[08:42:35] jynus: ^
[08:43:00] good
[08:43:36] that means we need to add those indexes
[09:01:37] DBA, Operations, ops-eqiad: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337#2461285 (jcrespo) On rebuild I am getting more and more media/other/predictive errors, I think the drive should still be replaced, but @Cmjohnson has the last word on this. I will deal with th...
[09:01:55] DBA, Operations, ops-eqiad: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2461287 (jcrespo)
[09:05:11] DBA: dbstore1002 lag on s7 - https://phabricator.wikimedia.org/T140341#2461305 (jcrespo)
[09:12:03] DBA: dbstore1002 lag on s7 - https://phabricator.wikimedia.org/T140341#2461357 (jcrespo) p:Triage>Low There seems to be a large amount of `WikiPage::updateCategoryCounts` on s7, and dbstore1002 is its slowest slave.
[09:29:42] DBA, MediaWiki-API, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-07-12_(1.28.0-wmf.10): ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2461372 (jcrespo) The difference can be seen at: https://logstash.wikimedia.org/#dashboard/...
[09:30:04] DBA, MediaWiki-API, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-07-12_(1.28.0-wmf.10): ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2461373 (jcrespo) p:High>Normal
[09:33:13] DBA, MediaWiki-API, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-07-12_(1.28.0-wmf.10): ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2461376 (jcrespo) I would add a third: * document properly all `FORCE/USE/IGNORE INDEX`ES...
[09:35:14] DBA, Operations, ops-eqiad: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2461377 (jcrespo) I think I killed the drive for good: ``` Rebuild Progress on Device at Enclosure 32, Slot 6 Completed 0% in 38 Minutes. Media Error Count: 777 Other Error Count: 2313 Predictive Failure...
[09:41:02] DBA: dbstore1002 lag on s7 - https://phabricator.wikimedia.org/T140341#2461387 (jcrespo) It seems that converting frwiktionary category to TokuDB did the trick (because a problem with the table, probably, not thanks to InnoDB specifically).
[09:49:12] DBA: dbstore1002 lag on s7 - https://phabricator.wikimedia.org/T140341#2461394 (jcrespo) Open>Resolved a:jcrespo
[10:46:23] DBA, Collaboration-Team-Interested, MediaWiki-extensions-CentralAuth, Notifications, and 2 others: CentralAuthCreateLocalAccountJob failing on meta due to Echo deadlocks - https://phabricator.wikimedia.org/T121161#2461528 (jcrespo) deadlocks are caused by inter-locks between InnoDB transactions,...
[12:12:52] DBA, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-Database, WorkType-NewFunctionality: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2461718 (hashar)
[12:12:55] DBA, Beta-Cluster-Infrastructure, Patch-For-Review, WorkType-NewFunctionality: Send deployment-db1 deployment-db2 syslog to beta cluster logstash - https://phabricator.wikimedia.org/T119370#2461716 (hashar) Open>Resolved Confirmed that both db1 and db2 MySQL send syslog. Messages ends up...
[13:14:19] DBA, Labs, Tool-Labs, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2461836 (jcrespo)
[13:14:22] DBA, Labs, Operations, Tool-Labs, Traffic: Antigng-bot improper non-api http requests - https://phabricator.wikimedia.org/T137707#2461834 (jcrespo) Open>Resolved a:jcrespo
[13:21:57] DBA, OTRS: OTRS database is "too large" - https://phabricator.wikimedia.org/T138915#2461847 (jcrespo) p:Triage>Low Low because the main issue has been work-arounded (rOPUP3ae9c670916f10621771cc2fc8ca454bfa15d4a0), but the issues with the size (mainly, a backup recovery) are still existing, so let...
[14:01:51] DBA, Collaboration-Team-Interested, MediaWiki-extensions-CentralAuth, Notifications, and 2 others: CentralAuthCreateLocalAccountJob failing on meta due to Echo deadlocks - https://phabricator.wikimedia.org/T121161#2461948 (jcrespo) This is either likely related or caused by the same issue: T139970
[14:26:03] DBA, Operations, ops-codfw: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2462072 (jcrespo) Open>Resolved A month without crashing, we will reopen if it happens again.
[14:28:55] DBA, Patch-For-Review, codfw-rollout: Automate database datacenter switchover steps - https://phabricator.wikimedia.org/T133337#2462079 (jcrespo) Open>Resolved
[14:30:17] DBA, Patch-For-Review, codfw-rollout: Automate database datacenter switchover steps - https://phabricator.wikimedia.org/T133337#2228601 (jcrespo) The pending steps will be tracked on T24923.
[15:27:49] DBA, Operations, ops-codfw: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714#2462375 (Papaul)
[15:55:59] DBA, Operations, ops-codfw: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714#2462444 (Papaul)
[16:21:05] Blocked-on-schema-change, DBA, MediaWiki-API, MW-1.28-release-notes, and 2 others: ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2462483 (jcrespo)
[16:21:28] Blocked-on-schema-change, DBA, MediaWiki-API, MW-1.28-release-notes, and 3 others: ApiQueryRecentChanges::run is spiking, nuking API servers - https://phabricator.wikimedia.org/T140108#2453162 (jcrespo)
[16:27:32] DBA, Operations, ops-codfw: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714#2462549 (jcrespo) I will check the logs and then close this if I do not see anything strange. We will reopen this or one of its child tasks if a crash happens.
[16:45:08] DBA, Operations, ops-eqiad: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2462602 (Cmjohnson) I submitted a work order for a new disk Congratulations: Work Order SR932776781 was successfully submitted.
[18:39:50] DBA, MediaWiki-API, Performance: plwiki API request is excessively slow when including a badrevid - https://phabricator.wikimedia.org/T140302#2463342 (Halfak) Is it a malformed parameter? Are there some docs that specify that the int must be less than 2^32 = 4,294,967,296? It seems that, from the M...
[19:01:27] DBA, Datasets-General-or-Unknown, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-07-05_(1.28.0-wmf.9): Select of revisions for stub history files does not explicitly order revisions - https://phabricator.wikimedia.org/T29112#2463490 (ArielGlenn)
[19:54:58] jynus: There's been a spike in "Cannot connect to db 10.64.16.191" from MW.
[19:55:12] Started a little bit before the deployment window
[19:55:49] (generally, that's not the only one, but appears to be the noisiest)
[19:55:50] https://logstash.wikimedia.org/#dashboard/temp/AVXq-oGtc8qLrUhXkHIm
[20:17:22] db1077, an s3 slave
[20:24:49] Eh, seems to be subsiding....
[20:52:00] DBA, Analytics, Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2463961 (mforns) @jcrespo I did some additions to the white list in the gerrit change set. But I still need the confirmation of 1 schema owner. Please do not activate the purging un...
[21:33:28] DBA, Analytics, Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2464200 (mforns) @jcrespo Ok, it's confirmed. We can proceed :]
[22:07:20] DBA, MediaWiki-API, Patch-For-Review, Performance: plwiki API request is excessively slow when including a badrevid - https://phabricator.wikimedia.org/T140302#2464343 (jcrespo) >>! In T140302#2463342, @Halfak wrote: > Is it a malformed parameter? Are there some docs that specify that the int mu...
[22:22:21] DBA, MediaWiki-API, Patch-For-Review, Performance: plwiki API request is excessively slow when including a badrevid - https://phabricator.wikimedia.org/T140302#2464453 (Halfak) Re. "ignore it", that'd be fine if so long as the response includes the "badrevids" parameter for it.
[23:25:51] DBA, Operations, Phabricator, Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2464723 (jcrespo) I do not feel confident with the current status- while I could rush it and do it today (the slave is ready), after checking that this shard is not yet...
[23:27:30] DBA, Operations, Phabricator, Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2464725 (jcrespo)