[16:58:00] marostegui: regarding the new rollout, you saw the rows read increased a lot, but that's sorta expected with normalization; it might even get higher (I'm trying to put some more caching into it). But besides that, was anything else problematic? Like IO, CPU, etc.
[17:18:13] Amir1: No, I saw a big spike on read_key handlers, but I guess that is expected as the indexes were probably not in memory yet
[17:18:41] https://grafana.wikimedia.org/d/000000273/mysql?panelId=3&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1109&var-port=9104&from=1564565159154&to=1564593430714
[17:24:06] DBA, Operations, ops-codfw: Degraded RAID on db2063 - https://phabricator.wikimedia.org/T229302 (Marostegui) Open→Resolved All good! ` logicaldrive 1 (3.3 TB, RAID 1+0, OK) `
[17:41:53] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Marostegui)
[18:19:47] marostegui: I don't know, the number of rows in the new tables is smaller than 100k
[18:25:49] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (wiki_willy) a: Cmjohnson
[18:36:24] Amir1: do you have a set of queries that might be bad so we can do some EXPLAINs?
[18:36:47] Amir1: otherwise we can deploy again and leave it for a bit longer to see how it goes
[18:37:03] the query latency remained the same, so that's good
[18:40:54] I don't think there are any bad queries, it's just that it's loooooots of queries and they are "slightly" worse, which adds up
[18:46:23] lots of queries aren't bad per se, if they are fast
[18:47:56] https://phabricator.wikimedia.org/T225053#5381325 This is one example
[18:48:40] that doesn't look bad
[18:49:50] yeah
[18:50:14] I can improve it even further by putting things into the APCu cache
[18:50:22] but that would take some time :(
[18:51:10] that particular set of queries isn't bad
[18:56:42] I agree
[19:00:29] marostegui: So I'll go live again tomorrow for a while
[19:01:04] Amir1: let's try, yes
[21:37:35] Holy shit, I found out what's causing the largest number of database errors in production: running the exact same query twice, once through a job and once through a deferred update
[21:37:57] Will write it up in depth
[21:38:12] T205045
[21:38:12] T205045: Exception from LinksUpdate: Deadlock found in database query (from Wikibase\Client\Usage\Sql\EntityUsageTable::addUsages) - https://phabricator.wikimedia.org/T205045
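
For context on the read_key spike mentioned at 17:18: Handler_read_key counts rows fetched via index lookups, so a jump right after a normalization rollout mostly reflects more (and initially cold) index reads, and it settles once the indexes are back in the buffer pool. A minimal way to check the raw counters on the server itself, which the linked Grafana panel graphs via Prometheus; this is an illustrative check, not something run in the conversation:

```sql
-- Snapshot the handler counters twice, a few seconds apart, and diff them.
-- Handler_read_key = rows read via an index lookup (the counter that spiked);
-- Handler_read_rnd_next = rows read during full table scans, for comparison.
SHOW GLOBAL STATUS LIKE 'Handler_read%';
```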
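A sketch of the EXPLAIN check proposed at 18:36, against a made-up query in the shape of the T225053 example; the table and column names below are placeholders for illustration, not the actual Wikibase schema:

```sql
-- Hypothetical lookup against one of the new normalized tables.
EXPLAIN
SELECT ts_text
FROM term_store              -- placeholder table name
WHERE ts_entity_id = 12345   -- placeholder columns
  AND ts_language = 'en';
-- A healthy plan shows type=ref/eq_ref/const, a named index under `key`, and
-- a small `rows` estimate. type=ALL (a full table scan) would be a genuinely
-- bad query, as opposed to the "slightly worse but very numerous" kind
-- described at 18:40.
```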
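On the deadlock found at 21:37 (T205045): two transactions running the identical INSERT concurrently, one from the job queue and one from a deferred update, can end up waiting on each other's row and gap locks. InnoDB records the most recent deadlock, which is one standard way to confirm that both sides are the same statement; a generic diagnostic, not quoted from the task:

```sql
-- Prints the "LATEST DETECTED DEADLOCK" section: the two competing statements
-- plus the locks each transaction held and waited for. Seeing the same
-- EntityUsageTable::addUsages INSERT on both sides would match the duplicated
-- job/deferred-update write described above.
-- (\G is the mysql CLI's vertical-output terminator; use ; in other clients.)
SHOW ENGINE INNODB STATUS\G
```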