[02:25:52] DBA, Growth-Team, MediaWiki-Watchlist, Wikimedia-log-errors: Deleting large watchlist takes > 4 seconds causing rollback due to write time limit - https://phabricator.wikimedia.org/T171898 (Krinkle)
[04:39:18] DBA, Operations, ops-codfw: Degraded RAID on db2061 - https://phabricator.wikimedia.org/T199759 (Marostegui) p:Triage>Normal a:Papaul Can we get a replacement? Thanks!
[05:53:34] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[05:53:36] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[05:53:49] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui)
[07:11:55] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[07:11:56] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[07:12:10] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui)
[08:29:50] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui) s6 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1002 [] db1125 [] db1113 [] db1098 [] db1096 [] db1093 [] db1088 [] db1085 [] db1061
[08:29:52] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui) s6 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1002 [] db1125 [] db1113 [] db1098 [] db1096 [] db1093 [] db1088 [] db1085 [] db1061
[08:29:57] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui) s6 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1002 [] db1125 [] db1113 [] db1098 [] db1096 [] db1093 [] d...
[08:30:23] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui)
[08:30:25] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[08:30:43] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[11:31:42] DBA, Patch-For-Review: Gather statistics about the backups on a database - https://phabricator.wikimedia.org/T198987 (jcrespo) What files do we have related to the potential recovery of enwiki.categorylinks? ``` root@db1115.eqiad.wmnet[zarcillo]> select backups.id, backups.source, backup_files.file_name,...
[11:37:27] I see some stalls at 10:25 on s6
[11:38:13] not many, just 12 replication errors and a 1 minute and a 30-second latency alert
[13:11:18] DBA, Patch-For-Review: Gather statistics about the backups on a database - https://phabricator.wikimedia.org/T198987 (jcrespo) >>! In T198987#4430419, @Marostegui wrote: > What's the file_date vs backup_date? backup_date is when the backup started and file_date when the file was last modified on the file...
[14:28:21] DBA, Operations, ops-codfw: Degraded RAID on db2061 - https://phabricator.wikimedia.org/T199759 (Papaul) a:Papaul>Marostegui disk replaced
[14:32:29] DBA, Operations, decommission, ops-codfw: db2064 crashed and totally broken - decommission it - https://phabricator.wikimedia.org/T195228 (Papaul) @RobH we can not do disks wipe on this system. The system can't boot and we do not have any identical server not in use to put the disk in and do the...
[14:32:56] DBA, JADE, Operations, Scoring-platform-team (Current), User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (awight) @daniel Surprisingly, there is interest in this going through TechCom after all. I've been digesting...
[14:56:58] DBA, Operations, decommission, ops-codfw: db2064 crashed and totally broken - decommission it - https://phabricator.wikimedia.org/T195228 (RobH) We don't need an identical system, just any system we can install the disks into. I advise using a spare system to do this. Make sense?
[14:58:52] DBA, Operations, decommission, ops-codfw: db2064 crashed and totally broken - decommission it - https://phabricator.wikimedia.org/T195228 (Marostegui) Or servers that we already decommissioned?
[15:02:21] DBA, Operations, decommission, ops-codfw: db2064 crashed and totally broken - decommission it - https://phabricator.wikimedia.org/T195228 (Papaul) We have no spare system that can take 12 disks I will just use one of the Dell decommissioned server.
[16:32:06] DBA, Operations, ops-codfw: Degraded RAID on db2061 - https://phabricator.wikimedia.org/T199759 (Marostegui) Open>Resolved All good - thank you! ``` root@db2061:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337F3720) Port Name: 1I Port...
[20:32:38] DBA, Commons, MediaWiki-Database, MediaWiki-Special-pages, and 2 others: Special:Log/Fanghong results in fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" - https://phabricator.wikimedia.org/T199790 (Marostegui) I have narrowed this to this query: ``` SELECT /* IndexPager::buildQueryI...
[20:35:11] DBA, Commons, MediaWiki-Database, MediaWiki-Special-pages, and 2 others: Special:Log/Fanghong results in fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" - https://phabricator.wikimedia.org/T199790 (Josve05a)
[20:57:19] DBA, Commons, MediaWiki-Database, MediaWiki-Special-pages, and 2 others: Special:Log/Fanghong results in fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" - https://phabricator.wikimedia.org/T199790 (Anomie) > ``` > /*!50100 PARTITION BY RANGE (log_user) > ``` What's going to happen...
[21:02:14] DBA, Commons, MediaWiki-Database, MediaWiki-Special-pages, and 2 others: Special:Log/Fanghong results in fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" - https://phabricator.wikimedia.org/T199790 (Marostegui) >>! In T199790#4431953, @Anomie wrote: >> ``` >> /*!50100 PARTITION BY RA...
[21:02:51] DBA, Commons, MediaWiki-Database, MediaWiki-Special-pages, and 2 others: Special:Log/Fanghong results in fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" - https://phabricator.wikimedia.org/T199790 (Anomie) Looking at the queries, I see SHOW EXPLAIN says it's using an even worse plan...