[05:56:40] Blocked-on-schema-change, DBA, Patch-For-Review: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3254098 (Marostegui) s1 in codfw is done: ``` root@neodymium:~# mysql --skip-ssl -hdb2016.codfw.wmnet enwiki -e "show create tabl...
[05:56:49] Blocked-on-schema-change, DBA, Patch-For-Review: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#3254099 (Marostegui)
[05:57:34] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 4 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3254102 (Marostegui) s1 in codfw is done: ``` root@neodymium:~# mysql --skip-ssl -hdb2016.codfw.wmnet enwiki -e...
[05:58:06] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 4 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3254103 (Marostegui)
[06:34:04] DBA, Operations, Patch-For-Review: remove mira wikitech grants - https://phabricator.wikimedia.org/T164968#3254128 (Marostegui) Open>Resolved a:Marostegui I have dropped the mira user on silver (I have saved this info just in case we need to recreate it because something else has broken,...
[06:37:16] DBA, Operations, ops-eqiad: Decommission db1024 - https://phabricator.wikimedia.org/T164702#3254133 (Marostegui) Hello @Cmjohnson From the DBA side you can proceed whenever you like. We do not have to do anything else I believe. MySQL is down It has been added to spare role on site.pp Disabled and...
[06:37:24] DBA, Operations, ops-eqiad: Decommission db1024 - https://phabricator.wikimedia.org/T164702#3254134 (Marostegui)
[06:37:36] DBA, Operations, ops-eqiad: Decommission db1024 - https://phabricator.wikimedia.org/T164702#3242621 (Marostegui)
[08:31:18] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3254287 (Marostegui) huwiki is done and the only difference was: ``` Differences on dbstore1002 TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY huwiki.geo_tags 1 0 1 PRIMARY 33027 8263595 ```
[08:31:34] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3254289 (Marostegui)
[09:56:54] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3254508 (Marostegui) kowiki is done and these are the only differences: ``` Differences on db1028 TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY kowiki.archive 8613 0 1 PRIMARY 1011705 1011846 kowiki.archive...
[09:57:09] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3254509 (Marostegui)
[10:57:05] DBA: Unify revision table on s2 - https://phabricator.wikimedia.org/T162611#3254606 (Marostegui) a:Marostegui
[12:03:12] DBA, Wikidata, Patch-For-Review, Schema-change: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548#3254768 (Marostegui) dbstore1001 is done: ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdbstore1001 w...
[12:04:20] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3254769 (Marostegui) dbstore1001 is done: ``` root@neodymium:/home/marostegui/git/software/dbtoo...
[12:12:32] DBA, Wikidata, Patch-For-Review, Schema-change: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548#3254820 (Marostegui) Open>Resolved All hosts are done: ``` dbstore2002.codfw.wmnet dbstore2001.codfw.wm...
[12:22:10] DBA: dbstore2001 not in a healthy status - https://phabricator.wikimedia.org/T165033#3254845 (Marostegui)
[12:25:11] DBA: dbstore2001 not in a healthy status - https://phabricator.wikimedia.org/T165033#3254862 (Marostegui)
[12:30:18] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3254882 (Marostegui) rowiki is done. Differences found on `archive` table on most of the hosts and on dbstore1002 also `geo_tags` table ``` Differences on db2047 TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDA...
[12:30:36] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3254883 (Marostegui)
[13:44:19] DBA: dbstore2001 not in a healthy status - https://phabricator.wikimedia.org/T165033#3255109 (Marostegui) After a while the event scheduler started to complain (I still don't have access to MySQL): ``` OpenTable: (2002) Can't connect to local MySQL server through socket '/tmp/mysql.sock' (11 "Resource tempor...
[14:43:26] Blocked-on-schema-change, DBA, MediaWiki-extensions-ORES, Scoring-platform-team, User-Ladsgroup: Deploy uniqueness constraints on ores_classification table - https://phabricator.wikimedia.org/T164530#3255269 (Halfak) a:Ladsgroup
[14:43:31] Blocked-on-schema-change, DBA, MediaWiki-extensions-ORES, Scoring-platform-team, User-Ladsgroup: Deploy uniqueness constraints on ores_classification table - https://phabricator.wikimedia.org/T164530#3255271 (Halfak) a:Ladsgroup>None
[15:14:59] DBA, Operations, hardware-requests, ops-codfw, Patch-For-Review: codfw: (1) spare pool system for temp allocation as database failover - https://phabricator.wikimedia.org/T161712#3255387 (Papaul)
[17:50:13] DBA: dbstore2001 not in a healthy status - https://phabricator.wikimedia.org/T165033#3256159 (Marostegui) In the end MySQL started fine after almost 3 hours after finishing the InnoDB recovery: ``` 170511 13:17:59 [Note] InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in pe...
[17:52:33] DBA: dbstore2001 takes 3 hours to start MySQL after a crash - https://phabricator.wikimedia.org/T165033#3256164 (Marostegui) p:Triage>Normal
[17:53:02] DBA: dbstore2001 takes 3 hours to start MySQL after a crash - https://phabricator.wikimedia.org/T165033#3254845 (Marostegui) I have started all the slaves without any issues. Also, manually started event scheduler again.
[18:43:57] DBA: dbstore2001 takes 3 hours to start MySQL after a crash - https://phabricator.wikimedia.org/T165033#3256367 (Marostegui) I will test tomorrow a normal stop and a normal start to see how long it takes.
[18:46:19] DBA, Operations, Performance-Team, Traffic: Cache invalidations coming from the JobQueue are causing slowdown on masters and lag on several wikis, and impact on varnish - https://phabricator.wikimedia.org/T164173#3256370 (aaron) Open>declined >>! In T164173#3253516, @jcrespo wrote: > I th...
[20:56:14] DBA, Operations, Patch-For-Review: dbtree: don't return 200 on error pages - https://phabricator.wikimedia.org/T163143#3256809 (Dzahn) How about [[ https://gerrit.wikimedia.org/r/#/c/353388/1/index.php | this ]]?
[21:19:30] DBA, Operations, Traffic: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#3256864 (Dzahn) status update: nowadays terbium and wasat use the identical role and profile in site.pp, as in: ``` 2600 # mediawiki maintenance servers (https://wik...
[21:22:11] DBA, Operations, Traffic: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#3256871 (Dzahn) reason: `database connection to tendril on tendril-backend.eqiad.wmnet failed`