[05:59:20] DBA, Operations: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3156234 (Marostegui) Might be related to the work that has been done by some analysts with some SUPER heavy queries in the last few days...
[06:10:05] DBA, Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3156241 (Marostegui) db2061 is done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb2061.codfw.wmnet $i -e "show create table revision\G" | egrep "KEY";done arwi...
[06:28:42] DBA: Remove partitioning from db2019 (codfw master) commonswiki.templatelinks - https://phabricator.wikimedia.org/T161683#3156283 (Marostegui) Open>Resolved This finished and all the slaves caught up around 5:30AM ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdb2019.codfw.wmnet -e "show cre...
[06:43:19] DBA, Operations, ops-eqiad: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3156329 (Marostegui) p:Triage>Normal
[07:08:13] DBA, Patch-For-Review: Defragment s4: db1091, db1084, db1081, d1059 and probably the rest - https://phabricator.wikimedia.org/T161088#3156389 (Marostegui) I have started to defragment db1081, in order to at least have one healthy host with file per table to be able to reclone the new eqiad servers t...
[07:37:09] DBA, Operations: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3156441 (Marostegui) As I mentioned here: T159430#3153285 I would like to convert a couple of enwiki tables to InnoDB+compression to see if it helps this: https://jira.mariadb.org/browse/MDEV-9027 which we are suffering...
[07:42:11] DBA, Operations: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3156444 (jcrespo) Open>Resolved a:jcrespo Sure. For now I will close this as it seems healthy again.
[07:51:41] what was the deal with db1047 and db1070?
[07:55:05] db1047 had issues with /var/log/account
[07:55:13] and db1070 is doing compression since yesterday and came back from downtime
[08:00:49] no big issue, then
[08:00:58] no, thankfully no :)
[08:48:55] DBA, codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3156591 (Marostegui) For s2 the suggested host in: https://gerrit.wikimedia.org/r/#/c/338996/ was db1054. I have been doing some research about its history and the on...
[08:53:04] DBA, codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3156617 (jcrespo) Looks ok to me.
[09:06:54] DBA, codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3156640 (Marostegui) For s4 the suggested host in: https://gerrit.wikimedia.org/r/#/c/338996/ was **db1068**. Jaime mentioned there could be underlying issues with th...
[09:34:13] dbstore1002 is lagging again
[09:35:25] I am running an alter table
[09:35:36] oh, ok
[09:35:37] sorry
[09:35:45] did it page?
[09:35:55] I thought I was downtimed
[09:35:58] Maybe it failed
[09:35:59] no, I just thought it was a runaway query
[09:36:00] let me check
[09:36:02] ah
[09:36:25] no, I have been talking with the analyst all day yesterday and today
[09:36:28] he is going to change a few things
[09:36:30] while I alter that
[09:36:36] good
[09:37:01] I am running a long query on dbstore2002
[09:37:56] ok :)
[09:38:25] it will soon pop up to the top of the activity page, it is just me doing a full table scan of revision
[09:39:18] oh
[09:39:21] good luck
[09:39:33] how long will it take you think?
[09:41:30] if the table was warmed up, no more than 20 minutes
[09:41:44] it is probably mostly on disk, so who knows?
[09:56:21] DBA, ArchCom-RfC, MediaWiki-Database, RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3156758 (hashar) >>! In T161232#3150750, @Zppix wrote: > It couldn't hurt @reedy,@hashar would this affect CI? On the CI nodes we have: | Trusty | mariadb-serv...
[10:49:20] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3156921 (Addshore)
[10:49:38] Hi both! Would it be possible for you to give the above task a quick once over ^^ Thanks!!
[10:50:05] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3156921 (Addshore)
[10:50:14] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3156921 (Addshore) a:Addshore>None
[11:03:12] marostegui: jynus: Is one of you available tomorrow between 14:00 and 16:00 UTC?
[11:03:23] I need to disable and re-enable a cron job during that time
[11:03:29] that will be done via puppet
[11:03:38] yes
[11:03:44] also, it is puppet swat
[11:03:59] put it there so at least someone is available
[11:04:04] I will need my own window, we will test the fix for https://phabricator.wikimedia.org/T151681 on test.wikidata
[11:05:34] yes, add it to deployments
[11:06:11] that is used for coordination, even if technically it is not a deployment
[11:34:33] DBA, Labs, Labs-Infrastructure: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3157034 (jcrespo)
[12:26:54] DBA, Wikidata, Patch-For-Review, User-Daniel, and 2 others: Use redis-based lock manager for dispatchChanges on test sites. - https://phabricator.wikimedia.org/T159828#3157089 (hoo) a:daniel>hoo Deployment of this has been scheduled for [[https://wikitech.wikimedia.org/wiki/Deployments#Thu...
[12:27:05] DBA, Wikidata, Patch-For-Review, User-Daniel, and 2 others: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3157091 (hoo)
[12:43:24] DBA, Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3157126 (Marostegui) db2054 is done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb2054.codfw.wmnet $i -e "show create table revision\G" | egrep "KEY";done arwi...
[12:55:17] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3156921 (Marostegui) As per T148988#2742029 I assume this is all public info and no further filtering is req...
[13:01:51] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3157145 (Addshore) >>! In T162252#3157135, @Marostegui wrote: > As per T148988#2742029 I assume this is all...
[13:02:46] DBA, codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3157147 (Marostegui) For s5 the suggested host in: https://gerrit.wikimedia.org/r/#/c/338996/ was **db1063** db1063 currently lives in s2, so it'd need to be recloned...
[13:12:14] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3157170 (Ottomata) > How's the process to decommission db1047 going? I guess ok! I think we should just dump all the user created databases to a file and archive it before...
[13:14:32] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3157185 (Marostegui) >>! In T156844#3157170, @Ottomata wrote: >> How's the process to decommission db1047 going? > > I guess ok! I think we should just dump all the user c...
[13:14:56] FYI I'm merging ^ and will shepherd on db1047 and dbstore1002
[13:15:26] ok!
[13:24:33] done, looking ok so far
[13:25:21] best check would be monitoring the backfilling and checking it is no longer needed
[13:31:58] DBA, codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3157226 (Marostegui) For s6 the suggested host in: https://gerrit.wikimedia.org/r/#/c/338996/ was **db1061** The recent pt-table-checksum ran on s6 (T160509) revealed...
[14:07:12] DBA: convert dbstore1001 to InnoDB compressed by importing db shards to it - https://phabricator.wikimedia.org/T159430#3157313 (Marostegui) page, categorylinks and template links on dbstore1002 have been converted to InnoDB and compressed. I will talk to the analyst again and check if we got rid of those loc...
[14:37:24] DBA, Operations, ops-codfw: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3157431 (Papaul)
[15:04:55] DBA, Performance: Reduce max execution time of interactive queries or a better detection and killing of bad query patterns - https://phabricator.wikimedia.org/T160984#3157544 (jcrespo) > 364 took 300+ seconds That is probably biased of queries that complete successfully. 300 second queries are killed- so...
[15:05:51] DBA, Operations, ops-codfw: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3157547 (Marostegui) ` install_server module update (mac address and partitioning info,) Please provide partition schema` Please create a RAID10 with the following options (https://wikitech.w...
[15:10:04] DBA, codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3157548 (Marostegui) For s7 the suggested host in: https://gerrit.wikimedia.org/r/#/c/338996/ was **db1062** We haven't run pt-table-checksum on s7 yet, so data-wise...
[15:32:12] DBA, Operations, Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#3157631 (matmarex) a:matmarex>None There is nothing else I can do myself to resolve this. I do not have the access to run the two queries I pos...
[15:38:52] DBA, Operations, Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#2953992 (jcrespo) This was classified as a low priority task. It will be eventually done, do not worry, it is not forgotten, but at the cost of other,...
[15:41:10] DBA, Operations, Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#3157662 (Marostegui) For the record, I checked the "consistency" of that row across s4 (commons) and s1 (enwiki), and to make sure at least it is prese...
[16:23:19] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3157874 (Tbayer) >>! In T156844#3157170, @Ottomata wrote: >> How's the process to decommission db1047 going? > > I guess ok! I think we should just dump all the user creat...
[16:26:18] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3157875 (Ottomata) > I thought the plan was to import them (in particular the "staging" database) to dbstore1002, so that they can be queried there as before? Ah sure we can...
[17:21:39] DBA, Operations, ops-codfw: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3158019 (RobH)
[17:23:04] DBA, Operations, hardware-requests, ops-codfw: codfw: (1) spare pool system for temp allocation as database failover - https://phabricator.wikimedia.org/T161712#3140543 (RobH) Open>stalled p:Triage>Normal I'm setting this to stalled and normal priority, as this task will also serv...
[17:32:18] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3158082 (jcrespo) To clarify the state of this, we still need this ASAP for service implementation ahead of the switchover (that can take quite some time, it is more than just runn...
[17:43:41] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3158106 (RobH) I'm getting the OS installed today and handed off.
[17:44:12] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3158108 (RobH)
[17:59:25] DBA, Operations, ops-codfw, Patch-For-Review: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3158194 (jcrespo) ^the above should be enough for the recipe. In addition to what Manuel stated, given problems we had in the past, we need to check: * IPMI calls work a...
[18:05:49] DBA, Operations, ops-eqiad, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3158209 (jcrespo) The guidance is the same as T162159#3157547 (documented for databases on https://wikitech.wikimedia.org/wiki/Raid_and_MegaCli#Raid_setup_at_Wikimedia )....
[18:16:18] DBA: For switchovers: A way to check if slaves are up to date - https://phabricator.wikimedia.org/T156465#2975756 (jcrespo) Technically, volans Implemented already a way: https://gerrit.wikimedia.org/r/343270. We only need to steal it and focus on general master switchover, and optionally, use mysql as a tra...
[18:20:58] DBA, Patch-For-Review, Performance: Reduce max execution time of interactive queries or a better detection and killing of bad query patterns - https://phabricator.wikimedia.org/T160984#3158267 (Anomie) I ran the query of API request lengths again, for all of March this time. 15682708428 total queries...
[18:29:48] DBA, Patch-For-Review, Performance: Reduce max execution time of interactive queries or a better detection and killing of bad query patterns - https://phabricator.wikimedia.org/T160984#3158334 (Anomie) If I had to guess, I'd guess the big bump at 120 seconds is something killing the original query af...
[18:33:06] DBA, Patch-For-Review, Performance: Reduce max execution time of interactive queries or a better detection and killing of bad query patterns - https://phabricator.wikimedia.org/T160984#3158340 (jcrespo) You can double check my stats here: https://tendril.wikimedia.org/report/slow_queries?host=^db&use...
[19:52:45] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3158631 (RobH) a:RobH>jcrespo
[19:55:03] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3158633 (RobH) a:jcrespo>None So this is now ready for puppet key/salt key and service implementation by the #DBA team. This already has their tag for #DBA on the task, I...
[23:52:59] DBA, MediaWiki-Database, Schema-change: Update schema for wmf wiki's for the table archive, migrating to new index ar_usertext_timestamp - https://phabricator.wikimedia.org/T161252#3159309 (Reedy)
[23:57:44] DBA: For switchovers: A way to check if slaves are up to date - https://phabricator.wikimedia.org/T156465#3159336 (Volans) In the medium term I've in mind a bunch of things that should help towards this direction. Feel free to ping me to talk about it.