[06:04:36] DBA, MediaWiki-Categories, Community-Tech-Sprint, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3048866 (kaldari) >We could just hash only in the case that the key is too long It's kind of an awkward solution, but I would be willi...
[07:07:21] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3048905 (Marostegui) db2062: ``` *************************** 1. row *************************** Table: revision Create Table: CREATE TABLE `revision` ( `rev_...
[07:33:03] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3048928 (Marostegui) @Jgreen one last thing, if you are planning to migrate to 10.0.29 you might want to take a look at: ``` innodb_buffer_pool_load_at_startup innodb_buffer_pool_dump_at_shutdown ``` Those...
[08:02:39] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3048981 (Marostegui) I have started to run pt-table-checksum only for the logging table (17GB) and the expected time reported by pt-table-checksum is around 40 minu...
[08:09:52] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3048985 (jcrespo) > Obviously, it would increase the shutdown time a bit though as you need to dump It takes 3 additional seconds to shutdown a 512GB server, as it only dumps the LRU, which only contains p...
[08:37:39] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049010 (jcrespo) > The main problem we see at present is contention around long-running queries. I do not know much about your architecture, but consider, instead of buying only large servers, buying als...
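The 08:09:52 comment explains why `innodb_buffer_pool_dump_at_shutdown` adds so little shutdown time: the dump records which pages are cached, not the page contents. A rough back-of-envelope sketch, assuming the InnoDB default 16 KiB page size, a completely full pool, and a guessed ~24 bytes of text per dumped entry (all three are assumptions, not figures from the log):

```python
# Back-of-envelope arithmetic for an InnoDB buffer pool dump at shutdown.
# Assumptions (not from the log): 16 KiB pages, a completely full pool,
# and ~24 bytes of text per dumped (space_id, page_no) entry.
GIB = 1024 ** 3
KIB = 1024


def pages_in_pool(pool_bytes: int, page_bytes: int = 16 * KIB) -> int:
    """Number of pages, and therefore page ids, a full buffer pool holds."""
    return pool_bytes // page_bytes


pages = pages_in_pool(512 * GIB)
print(pages)  # 33554432

# Writing ~33.5M short id pairs is a few hundred MiB at most,
# versus rewriting 512 GiB of page contents.
approx_dump_mib = pages * 24 / (1024 ** 2)
print(round(approx_dump_mib))  # 768
```

The real dump may cover only part of the pool, so treat this as an upper bound under the stated assumptions.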
[08:43:04] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049017 (jcrespo) You seem to have high Aria activity, research that. However, you send me the session status, not the global status. Invest memory on performance_schema, you will get free usage metrics wh...
[08:50:57] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3049020 (Marostegui) It finished successfully in 44 minutes: ``` TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 02-23T08:45:43 0 0 2...
[09:24:54] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3049054 (Marostegui) Some seconds after starting the pt-table-checksum for the revision table I had to kill it as both recentchanges slaves in eqiad and codfw (with...
[13:25:57] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3049500 (jcrespo) I had some things in the way, and we have some important maintenance this afternoon (T153768) that could take more than expected, ok to delay until 27 m...
[13:26:30] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3049502 (Dereckson) Ok
[13:26:52] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3049503 (jcrespo) We should also warn in advance to Rel-eng of this potential breakage.
[13:27:39] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049507 (Jgreen) >>! In T158446#3049017, @jcrespo wrote: > You seem to have high Aria activity, research that. However, you send me the session status, not the global status. Whoops.
Here's global status:...
[13:29:53] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3049509 (chasemp) I think @akosiaris is the only person with a lot of context for this setup. If I understand correctly the users are https://wikitech.wikimedia.org/wik...
[13:31:55] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3049513 (Marostegui) The data copy has been finished, so we'd be good to go. @Cmjohnson we would need you to: install the disks and create the RAID10 for us if you can (last time I tried to do it...
[13:40:39] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3049532 (jcrespo) Please give me 5 additional minutes to check we have copied everything we wanted.
[13:41:55] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3049537 (Marostegui) >>! In T153768#3049532, @jcrespo wrote: > Please give me 5 additional minutes to check we have copied everything we wanted. Sure, no rush. I was not expecting Chris to be on...
[13:48:13] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049550 (Jgreen) >>! In T158446#3049017, @jcrespo wrote: > You seem to have high Aria activity, research that. However, you send me the session status, not the global status. Could this all be due to temp...
[13:54:13] DBA, Operations, hardware-requests, ops-codfw: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3049557 (mark) Approved.
[13:55:57] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049559 (jcrespo) > Could this all be due to temp table activity? Not from the session stats, probably, but yes, indeed.
On 5.5/10 we didn't have the feature of innodb temp tables. Again, on those versions...
[13:57:02] DBA, Operations, hardware-requests, ops-codfw: Decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#3049564 (jcrespo) a:mark>RobH
[14:09:37] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3049588 (dschwen) Yes, it is used by me! I'm pulling data from that server for the client-side rendered tiles and 3D buildings in WikiMiniAtlas.
[14:12:36] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049593 (Jgreen) >>! In T158446#3049559, @jcrespo wrote: >> Could this all be due to temp table activity? > > Not from the session stats, probably, but yes, indeed. On 5.5/10 we didn't have the feature of...
[14:32:33] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3049645 (chasemp) >>! In T157359#3049588, @dschwen wrote: > Yes, it is used by me! I'm pulling data from that server for the client-side rendered tiles and 3D buildings...
[14:33:40] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049646 (jcrespo) > a back of envelope calculation suggests we might recover 20-30GB total. I do not think that is accurate (eg. doesn't have into account ibdata1 overhead), to make sure: run on your produ...
[14:40:38] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049670 (jcrespo) I forgot- the other thing that you should deploy is GTIDs, the extra functionality is not that great, but it makes the slaves more reliable due to transactional replication control.
[14:42:43] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049688 (Marostegui) >>!
In T158446#3049670, @jcrespo wrote: > I forgot- the other thing that you should deploy is GTIDs, the extra functionality is not that great, but it makes the slaves more reliable due...
[14:44:54] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049690 (Jgreen) >>! In T158446#3049646, @jcrespo wrote: > I do not think that is accurate (eg. doesn't have into account ibdata1 overhead), to make sure: run on your production: Here's what we got: Datab...
[14:47:22] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3049693 (jcrespo) >>! In T158446#3049690, @Jgreen wrote: >>>! In T158446#3049646, @jcrespo wrote: >> I do not think that is accurate (eg. doesn't have into account ibdata1 overhead), to make sure: run on yo...
[14:58:46] dbstore1001 is cool now, right?
[14:59:05] yep
[14:59:13] I was extracting the last misc one
[14:59:19] the rest of them were fine
[14:59:30] I was about to update the ticket actually to see whether you were done on your side
[14:59:37] I am
[15:02:51] I was doing a last $ grep 'm[1-5].*222.*' MD5SUM | md5sum -c
[15:03:33] i told chris to give us 5 minutes more
[15:03:41] so we can be done
[15:03:51] it is ok
[15:03:59] we can go
[15:04:06] we can shut it down
[15:04:51] ok, let me talk to chris
[15:07:55] oh, you downtimed it till tomorrow - nice :)
[15:09:19] server off
[15:25:07] all files are ok
[15:25:28] \o/
[15:44:16] I am continuing the maintenance for db1026
[15:44:48] How long are the alters taking normally?
[15:45:09] on this very old servers? 13-30 hours
[15:45:14] *these
[15:45:25] we have to do 4 for s5
[15:45:33] 30 hours…wow
[15:46:07] at least it is done and can just be copied to the new servers
[15:46:42] there is something worse- I cannot redo the partitions and compress at the same time
[15:46:52] so I am compressing and adding indexes
[15:46:59] we can do the rest at another time
[15:47:45] yes, db1036 is s1?
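The final check at 15:02:51 (`grep 'm[1-5].*222.*' MD5SUM | md5sum -c`) verifies only the checksum-manifest entries whose line matches a pattern. Here is a minimal Python sketch of the same idea, assuming the standard `md5sum` manifest format (`<hex digest>  <filename>`, or `<hex digest> *<filename>` for binary mode). `check_manifest` and `md5_of` are hypothetical helper names, not part of any WMF tooling:

```python
import hashlib
import re
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB blocks so large dumps never load fully into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def check_manifest(manifest: Path, pattern: str, base_dir: Path) -> Dict[str, bool]:
    """Like `grep PATTERN MD5SUM | md5sum -c`: verify only the manifest
    lines whose text matches `pattern`. Returns {filename: digest_ok}."""
    regex = re.compile(pattern)
    results: Dict[str, bool] = {}
    for line in manifest.read_text().splitlines():
        if not regex.search(line):  # grep matches anywhere in the line
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in md5sum output
        results[name] = md5_of(base_dir / name) == digest.lower()
    return results
```

The pattern is applied to the whole manifest line, hash included, exactly as `grep` would; since hex digests only contain `0-9a-f`, a pattern starting with `m` can only match the filename part.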
[15:48:17] s5 hopefully will be smaller...
[15:51:09] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#2890451 (ops-monitoring-bot) Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['dbstore1001.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-r...
[16:11:27] I am with db1026
[16:11:29] s5
[16:11:32] ok
[16:14:38] DBA, MediaWiki-Categories, Community-Tech-Sprint, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3049962 (matmarex) >>! In T158724#3048866, @kaldari wrote: > I'm curious though what's wrong with just changing to varbinary(50)? That...
[16:28:10] DBA, MediaWiki-Categories, Community-Tech-Sprint, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3050000 (jcrespo) There are worse problems on *links tables, that is why I am ok with a poor man's patch until I send your way a complete...
[17:51:31] marostegui, maybe we should ignore the __wmf_checksums table on db1036? It is not getting any better, rev_id is going by id 39091161
[17:51:54] i was right now checking the lag
[17:51:59] it is huge yeah
[17:52:30] max rev id is 48654322
[17:52:45] maybe we can wait at this point
[17:52:57] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3050418 (demon) >>! In T126832#3049503, @jcrespo wrote: > We should also warn in advance to Rel-eng of this potential breakage.
Releng knows :)
[18:26:20] it is now doing a full table scan for every chunk
[18:27:01] I would ignore `nlwiki`.`__wmf_checksums`
[18:27:09] ok, give me a bit and I will do it
[18:27:20] I will
[18:27:26] ah, thanks :)
[18:29:27] that can actually be "a fix" if we still want to run pt-table-checksum in core
[18:29:39] to make the rc slaves ignore %checksums
[18:29:47] but of course we wouldn't be able to evaluate them
[18:29:49] is it depooled, right?
[18:29:53] it is
[18:29:58] just the stop slave took 1 minute
[18:30:07] the codfw has delay too, we can let that one finish
[18:30:09] for the query to finish or be reverted
[18:30:09] wow
[18:30:10] 1 minute?
[18:30:24] as I said, I did SHOW EXPLAIN
[18:30:31] and it was doing a full table scan
[18:30:41] for each chunk calculated
[18:31:36] lag is going down
[18:32:39] quite fast actually!
[18:32:43] thanks for taking care of it
[18:32:59] db2035 has the same issue, I can take care of it later, no worries
[18:33:04] will this page or something?
[18:33:07] no
[18:33:08] it is downtimed
[18:33:20] you did that before the alert?
[18:33:31] there was no alert :)
[18:33:35] i caught it on time
[18:33:42] yeah, that is what I meant
[18:34:59] let's review downtimes for dbstore1001, 2001, 26, 36 and others
[18:35:18] the rest can wait
[18:35:55] db1036 and db2035 are downtimed till monday
[18:36:04] dbstore os until 24 afternoon
[18:36:26] the ETA for the transfer is 12 hours
[18:36:30] maybe we can do a larger one for the mysql related ones?
[18:36:32] so tomorrow morning it should be there
[18:36:43] I just know myself
[18:36:46] I forget
[18:36:53] then it is a matter of mysql_upgrade script I assume
[18:36:59] yeah, I would downtime 1001 till monday
[18:37:04] yeah
[18:37:08] let me do it
[18:37:10] i am there already
[18:37:12] plus lag
[18:37:40] 26 is until 24, too
[18:37:41] done
[18:37:49] I will put it until monday, too
[18:37:52] ok
[18:38:01] 36 and 2035?
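The 18:26:20 and 18:30:24 messages describe the problem: each pt-table-checksum chunk should touch only its own primary-key range, but on the rc replicas SHOW EXPLAIN showed a full table scan per chunk, so the total cost grew to roughly (number of chunks) × (table size). The sketch below is an in-memory illustration under stated assumptions, not pt-table-checksum's code. It follows the general idea the tool is known for (per-chunk row count plus XOR of CRC32 over the row's columns joined with `#`). The `(rev_id, payload)` row shape and all function names are hypothetical.

```python
import bisect
import zlib
from typing import List, Tuple

# Hypothetical row shape for illustration: (primary_key, payload).
Row = Tuple[int, str]


def row_crc(row: Row) -> int:
    """CRC32 of the columns joined with '#', in the spirit of CONCAT_WS('#', ...)."""
    return zlib.crc32("#".join(str(col) for col in row).encode("utf-8"))


def chunk_by_range(rows: List[Row], keys: List[int], lo: int, hi: int):
    """Index-range style: jump straight to the chunk's key range."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    crc = 0
    for row in rows[start:end]:
        crc ^= row_crc(row)  # XOR-combine, so row order inside a chunk does not matter
    return end - start, crc, end - start  # (count, crc, rows examined)


def chunk_by_full_scan(rows: List[Row], lo: int, hi: int):
    """Full-scan style: read every row and filter, as seen on the lagging replicas."""
    count = crc = 0
    for row in rows:
        if lo <= row[0] <= hi:
            count += 1
            crc ^= row_crc(row)
    return count, crc, len(rows)


def checksum_table(rows: List[Row], chunk_size: int, full_scan: bool = False):
    """Checksum rows (sorted by a unique key) chunk by chunk.
    Returns ([(lo, hi, count, crc), ...], total rows examined)."""
    keys = [row[0] for row in rows]
    chunks, examined = [], 0
    for i in range(0, len(keys), chunk_size):
        lo, hi = keys[i], keys[min(i + chunk_size, len(keys)) - 1]
        if full_scan:
            count, crc, ex = chunk_by_full_scan(rows, lo, hi)
        else:
            count, crc, ex = chunk_by_range(rows, keys, lo, hi)
        chunks.append((lo, hi, count, crc))
        examined += ex
    return chunks, examined


table = [(rev_id, f"text-{rev_id}") for rev_id in range(1, 1001)]
ranged, cost_range = checksum_table(table, chunk_size=100)
scanned, cost_scan = checksum_table(table, chunk_size=100, full_scan=True)
print(ranged == scanned, cost_range, cost_scan)  # True 1000 10000
```

Both strategies produce identical checksums. Only the work differs: 1,000 rows read versus 10,000 for ten chunks. At the scale mentioned at 17:52:30 (rev_id around 48.6 million), that multiplier explains why replication lag grew instead of shrinking.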
[18:38:19] till monday too
[18:38:23] (they are already done)
[18:38:27] good
[18:38:34] that is all, the rest can wait
[18:38:43] only lag
[18:38:45] for those two
[18:38:52] sure
[18:40:24] I will bug you tomorrow with cleaning up and diogenes syndrome of screen sessions
[18:40:38] XDDDDD
[18:40:41] I know
[18:40:44] I have a problem
[18:41:10] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3050573 (Marostegui) Server got reinstalled, unfortunately manually as there were some issues with ipmitool (I will explain further in T150160 as that server was in that list of affected hosts). A...
[18:45:33] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3050636 (jcrespo) a:Marostegui
[18:46:33] db2035 added the repl filter
[18:46:36] and lag coming down
[18:47:14] that reminds me of db1001 and the others...
[18:55:21] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3050688 (MaxSem) Yes, I'm using it (from tool labs, actually), but feel free to take it offline any time.
[18:58:01] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3050709 (chasemp) Thanks @maxsem AFAIK we can take this down with proper notice (and really we must). My thinking is to send a general 1 week notice to labs-announce a...
[19:00:24] DBA, MediaWiki-Categories, Community-Tech-Sprint, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3050710 (kaldari) >Perhaps it would make very little difference, and perhaps I'm cargo-culting a bit, but it seems like a step in the...
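The replication filter added on db2035 (18:46:33), and the earlier idea of making the rc replicas ignore `%checksums`, can be expressed in MySQL/MariaDB with the `replicate_wild_ignore_table` option. Its `db.table` pattern uses LIKE-style wildcards: `%` matches any run of characters, `_` matches any single character, and a backslash makes the next character literal. The log does not say which filter or pattern was actually used, so the pattern below is hypothetical. The matcher is a simplified, case-sensitive Python model of those wildcard rules, not the server's implementation.

```python
import re
from typing import Iterable


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE-style pattern into a regex for fullmatch():
    '%' -> any run of characters, '_' -> exactly one character,
    backslash -> take the next character literally."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def is_ignored(db: str, table: str, patterns: Iterable[str]) -> bool:
    """Simplified model: match the whole 'db.table' string, case-sensitively."""
    name = f"{db}.{table}"
    return any(like_to_regex(p).fullmatch(name) for p in patterns)


# Hypothetical filter; the log does not show the pattern actually configured.
filters = [r"%.\_\_wmf\_checksums"]
print(is_ignored("nlwiki", "__wmf_checksums", filters))  # True
print(is_ignored("nlwiki", "revision", filters))         # False
```

The backslashes matter here. Without them, `%.__wmf_checksums` would also match names such as `nlwiki.XXwmfXchecksums`, because `_` is itself a wildcard.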
[19:01:36] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3050716 (Marostegui) db1036 and db2035 were still lagging behind more than 30k: ``` ˜/jynus 19:26> it is now doing a full table scan for every chunk ``` So we dec...
[19:05:42] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3050727 (dschwen) @chasemp yes a few days downtime should be OK. I have a cache layer that should serve most of the requests.
[19:10:02] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3050755 (jcrespo) > @jcrespo how does that sound? Good to me.
[19:32:14] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3050881 (RobH)
[20:42:55] DBA, Schema-change: Apply el_owner patch for SecurePoll on all wikis - https://phabricator.wikimedia.org/T158906#3051205 (demon)
[20:43:05] DBA, Schema-change: Apply el_owner patch for SecurePoll on all wikis - https://phabricator.wikimedia.org/T158906#3051220 (demon) p:Triage>Normal
[20:48:59] DBA, Schema-change: Apply el_owner patch for SecurePoll on all wikis - https://phabricator.wikimedia.org/T158906#3051240 (demon) Open>Invalid Actually, this was applied
[22:03:45] DBA, fundraising-tech-ops: fundraising database tuning - https://phabricator.wikimedia.org/T158446#3051390 (Jgreen) >>! In T158446#3048928, @Marostegui wrote: > @Jgreen one last thing, if you are planning to migrate to 10.0.29 you might want to take a look at: > innodb_buffer_pool_load_at_startup > innod...