[06:58:46] 10DBA, 10CirrusSearch, 06Discovery, 06Discovery-Search (Current work), 13Patch-For-Review: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2698690 (10jcrespo) The plan problems is a known issue of IN + a large list of items http://mysqlserverteam.com/you-... [07:02:02] I had problems yesterday with db1069 because the horrible memory tables [07:02:18] jynus: I saw it got restarted for every instance [07:02:21] What happened? :( [07:02:27] hopefully we can get rid of them soon [07:02:44] no, the restarts were planned so to apply the filter to all [07:02:51] By the way, the alter on commonswiki.revision has now been running for 1 hour, so it will be done soon - so far no issues [07:03:00] Ah [07:03:16] marostegui, it will probably will take more than 1-2 hours [07:03:57] db1040 << db1080 [07:04:26] Ah, I see :_( [07:05:03] So what happened with db1069? [07:06:31] the tmp file only has 175KB, which is scary [07:06:43] yeah, I was seeing that right now [07:06:56] but it is not file per table [07:07:35] you think it is because it will do an inplace update? [07:07:41] makes sense [07:07:42] I think so [07:08:01] the other trick is to check global handler writes [07:08:42] Access: 2016-10-07 05:57:54.802326035 +0000 [07:08:42] Modify: 2016-10-07 05:57:54.826326375 +0000 [07:08:42] Change: 2016-10-07 05:57:54.826326375 +0000 [07:08:52] So I want to believe it will be done in the end of the process [07:08:59] as it is rebuilding the table with file per table [07:11:02] it is strange, because it is a noop for innodb [07:15:03] It is strange indeed, but I think the non file per table is playing its role here [07:15:06] with the temporary file [07:15:08] and might be confusing us [07:28:10] 10DBA, 13Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2698765 (10Marostegui) Interesting links Jaime found: https://github.com/giacomolozito/ibdata-shrinker for defragmenting https://github.com/joshuaprunier/trite for transporting tables [07:39:06] jynus: I am scanning the graphs of db1040 to see if it is actually doing something and I do see a constant increase in innodb_history_list_length (which is good) and from an OS point of view, the disk reads and writes have increased too [07:39:20] And it matches when I started the ALTEr around 7:50AM [07:39:53] yes, but strangely, innodb io is low [07:41:02] the show full processlist says it is 99% done [07:41:37] State: altering table [07:41:37] Info: alter table revision DROP index rev_id, ADD PRIMARY KEY (rev_id) [07:41:40] Progress: 99.985 [07:41:50] But we know that cannot be trusted much .p [07:44:42] .991 now [07:44:44] we will see [07:48:18] 8.0 will contain per-stage information and prediction of all sql commands [07:48:36] prediction of how long they will take=? [07:48:48] yes [07:48:53] sweeeet [07:49:10] on performance_schema [07:49:28] It would be interesting to see how they do the calculations [07:49:38] To see how it changes with the configuration changes [07:50:24] yes, I suggested mixing it with the optimizer calculations [07:50:31] but it was too early for that [07:51:00] at least they are working towards that [07:51:02] which is useful [07:51:22] Did you go to Simon's Mudd talk? [07:51:23] They are not getting rid of all SHOW command yet [07:51:29] but they wish [07:51:47] because p_s has all the information in a non-locking way [07:52:04] no, I wasn't [07:52:09] yeah, the show commands are a pain [07:52:12] *didn't [07:57:07] it went to 0 and now the ibd file it is growing indeed, 2.5G :) [07:57:40] So looks like the non file per table thing was the issue [07:58:25] It is now inserting in the tablespace with a good rate, 3.7G now [07:58:39] ok, good [07:58:52] And it is not locking anything, as I have an "wc -l" running [07:58:57] with the connections from time to time [07:59:00] and it is constant [07:59:10] so we are good so far :) [07:59:23] do you want to have a hangout later for my findings at the conf? [07:59:29] sure [07:59:32] 10.30? [07:59:33] 11? [07:59:39] yes [07:59:48] Or after lunch, whenever you want :)= [08:02:46] I will mark 10:30 then :) [08:04:57] It is a 107G file [08:05:17] so hold your breath and check available space [08:06:01] Yeah, I checked that in a slave before, we have more than double so I think we should be good [08:06:49] but yes, at some point we will need a master with more disk [08:06:50] we should prepare all masters <50 for replacement before the next datacenter failover [08:07:45] yeah, and degrament the others, and get rid of that insane ibdata file [08:08:01] yes, that is a blocker for that [08:09:23] we could try the ibdata shrinker script [08:09:32] in db1082 or something (which is already running file per table) [08:23:17] root@dbstore2002:/opt# ./wmf-mariadb10/bin/mysqlbinlog -h [08:23:17] ./wmf-mariadb10/bin/mysqlbinlog: unknown variable 'ssl-ca=/etc/mysql/ssl/cacert.pem' jynus_ any hint before I start troubleshooting this? :) [08:23:46] that is a known bug [08:23:54] by mariadb [08:23:58] lovely :) [08:24:04] see my alias [08:24:12] to see why I do not care about it anymore [08:24:26] Ah I see it [08:24:27] oki :) [08:24:30] I will use it [08:24:34] thanks [08:25:04] works well :) [08:25:07] see reporter: https://jira.mariadb.org/browse/MDEV-9605 [08:26:40] Fix Version/s: [08:26:41] https://jira.mariadb.org/issues/?jql=project+%3D+MDEV+AND+fixVersion+%3D+10.0.27 [08:26:47] 10.0.27.. [08:27:28] yes, but we do not need a task for upgrading, that is granted [08:27:29] It is interesting that they mark it as Major and it takes 2 months to update the ticket :( [08:28:12] marostegui: you can add your own aliases in puppet ;) [08:28:53] in modules/admin/files/home/marostegui, see the others, ofc remember that is a public repo :) [08:29:03] volans: haha I was doing a grep now [08:29:06] for jynus :p [08:29:16] alias my_secret_password=12345 [08:29:24] first find, then grep :-P [08:29:28] XDDDD [08:29:45] oh, was that public? [08:29:59] now everybody knows it! [08:30:01] haha [08:30:04] lol [08:51:19] 10DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2698895 (10Marostegui) I have been investigating what happened and I have used dbstore2002 for this. I have renamed the table again (S2), one of the ones that broke yesterday as per the... [08:58:44] 10DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2698902 (10jcrespo) Let's just get rid of them ASAP. Including on non-mediawiki hosts such as db1069, analytics, labs, etc. [09:01:56] 10DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2698910 (10Marostegui) Yeah, so far S1 and S3 are clean. S2 only db1063, dbstore1002 and dbstore2002 (the ones that broke yesterday) have them for the given wikis. I will clean those thr... [10:39:00] 10DBA: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2699016 (10Marostegui) The master has now the new PK: ```MariaDB MARIADB db1040 commonswiki > show create table revision\G *************************** 1. row *************************** Table: revision Create Table: CREATE TABL... [11:38:54] 10DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2699090 (10Marostegui) dbstore1001 broke now too. I fixed it by doing the same thing I did with the hosts. ``` root@dbstore1001:~# for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i... [14:02:01] marostegui, Re: proxySQL https://engineering.tumblr.com/post/151350779044/juggling-databases-between-datacenters [14:03:15] checking [14:04:36] very nice one [14:04:46] it would be good to see it with more details about all that [14:05:03] but if it is being used at their scale, for such critical thingsā€¦it must be stable enough :) [22:31:17] 10DBA, 10MediaWiki-General-or-Unknown, 06Operations, 13Patch-For-Review: img_metadata queries for PDF files saturates s4 slaves - https://phabricator.wikimedia.org/T147296#2687819 (10brion) I strongly recommend investing in T32906 -- storing the text blobs and such for DjVu and PDF in a structured way inst...