[17:21:15] jynus: Have a second?
[17:22:53] yes, hoo
[17:24:48] I have queries like SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q1005059','Q10865115','Q10885463','Q10924077','Q10930884','Q10938107','Q11181347','Q14582744','Q14583061','Q14583132','Q14583192','Q148','Q391942','Q42200');
[17:25:20] aha
[17:25:29] Sometimes they are "Using where; Using index for group-by" and sometimes they're not (depending on the host)
[17:25:46] If they don't use the index for the group by, stuff gets extremely slow
[17:25:51] timing out, even
[17:26:32] E.g. zhwiki on db1054 is problematic, zhwiki on db1060 isn't... different maria versions?
[17:27:16] only on zhwiki?
[17:27:34] No, but that one just came to my attention through the error logs
[17:27:38] seeing that on various wikis
[17:28:44] it could be the statistics
[17:29:18] the idea would be to run analyze on a non-production slave, then copy the index stats to the other hosts
[17:30:33] How much work would that be for $everything
[17:31:10] depends on the size of the table, it could take a day for our largest table
[17:31:54] most of it is checking that, after the statistics have been rebuilt, they produce the right output
[17:32:12] I think these tables have less than 10m rows on all wikis
[17:32:17] we recently found a case in which we needed to force the plan because mariadb was picking the wrong one
[17:32:25] might peak slightly above that on a few big ones, but probably not much
[17:34:32] both boxes have 10.0.16
[17:34:58] Yeah, just saw
[17:34:59] :S
[17:37:35] holy crap
[17:37:41] that table has 50M entries on zhwiki
[17:38:50] https://phabricator.wikimedia.org/P2222
[17:39:41] Yeah, that's what I saw, thanks
[17:41:28] I will open a task; I haven't treated it as an unbreak-now, and will investigate it
[17:41:45] Yes, please do
[17:42:07] but it will probably have to wait until monday, I am in the middle of something
[17:42:12] $ grep -c 'SELECT DISTINCT eu_entity_id' exception.log
[17:42:12] 20
[17:42:21] Those are the ones timing out, so not too many
[17:42:34] but I guess we still have several running annoyingly long
[17:44:42] is this api activity or regular user requests?
[17:45:26] is this for the wikidata item usage?
[17:45:45] Job queue only, I think
[17:45:59] so it's not slowing down actual web requests, I think
[17:46:07] so in that case, it can wait
[17:46:30] I have higher priorities right now, but I will check it on monday
[17:46:36] Ok, good
[17:47:00] do you have the function name, for the ticket?
[17:47:10] the one that is shown in the query comment?
[17:47:26] EntityUsageTable::getUsedEntityIdStrings
[17:47:35] thanks
[17:47:36] or Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings if you want it qualified
[17:49:11] https://phabricator.wikimedia.org/T116404
[17:49:19] thanks for the heads up
[17:49:51] I normally monitor the slow queries, but this week I have been focused on pending schema changes, etc.
[17:49:56] *focusing
[17:52:15] I created a dashboard recently: https://logstash.wikimedia.org/#/dashboard/elasticsearch/Slow%20queries
[20:45:03] beta replication from deployment-db1 to deployment-db2 has broken
[20:45:34] I found it had stopped and tried to restart it, but without much luck
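
A minimal sketch of the diagnosis and remediation steps discussed for the eu_entity_id query above, assuming MariaDB 10.0 with persistent InnoDB statistics enabled; the table name and query come from the log, while the copy of index statistics via mysql.innodb_table_stats / mysql.innodb_index_stats and the final FLUSH TABLE are assumptions about how "copy the index stats to the other hosts" might be carried out:

    -- Compare the plan on the two hosts; the fast host shows
    -- "Using where; Using index for group-by" in the Extra column.
    EXPLAIN SELECT DISTINCT eu_entity_id
    FROM wbc_entity_usage
    WHERE eu_entity_id IN ('Q148', 'Q391942', 'Q42200');

    -- Rebuild the statistics on a depooled / non-production slave.
    ANALYZE TABLE wbc_entity_usage;

    -- Assumption: with persistent statistics, the rebuilt numbers live in
    -- mysql.innodb_table_stats and mysql.innodb_index_stats and can be dumped
    -- from the slave and loaded on the other hosts; FLUSH TABLE then makes
    -- the target host reload them.
    SELECT * FROM mysql.innodb_index_stats
    WHERE database_name = 'zhwiki' AND table_name = 'wbc_entity_usage';
    FLUSH TABLE wbc_entity_usage;

If refreshed statistics still do not produce the group-by plan, forcing the plan with a FORCE INDEX hint is the fallback mentioned in the log for an earlier case; the exact index name is not given here, so it is left out of the sketch.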