[05:57:45] 10DBA, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3314068 (10Marostegui) This BBU failed again and the policy went back to WriteThrough: ``` Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad... [06:05:43] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3314078 (10Marostegui) [06:09:19] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3314080 (10Marostegui) [06:16:24] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3314087 (10Marostegui) [07:34:32] reimaging db2038 (s5) next [07:34:50] ok [07:58:10] what should we do about 89 ? [07:58:23] I want to get the firmware updated today [07:58:25] and then repool it [07:58:47] will chris be around today? [07:59:10] I was trying to see if there is a holiday in the us [07:59:16] because the deployment page doesn't say so [07:59:45] nope, today isn't supposed to be holiday there [13:01:19] 10DBA, 10Analytics, 10Analytics-EventLogging: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452#3314678 (10Marostegui) The server went nuts again and got stuck. I am going to leave it catch up (it is 3 days behind) and will attempt the ALTER once more. If it doesn... [13:31:50] depending how I see db2038, I may reimage db2037 too, today (s4) [13:32:15] ok, sounds good to me [13:35:39] I'll proobably disconnect for 1 hour and work until late [13:35:48] ok! have a good trip :) [13:35:54] don't work until late though!!! [13:36:47] remember also I will not be here on friday and monday [13:37:00] it is on the etherpad [13:37:07] oh yes :) [13:55:13] 10DBA, 10Operations, 10ops-eqiad: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935#3314805 (10Marostegui) ``` root@db1089:~# hpssacli controller all show detail | grep Firmware Firmware Version: 3.56 ``` [14:45:15] 10DBA, 10Operations, 10ops-eqiad: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935#3315004 (10Marostegui) I have started to slowly repool this server as I don't want to leave it out much longer. @Cmjohnson once you have time for the firmware upgrade, let us know and we will... [15:11:25] 10DBA, 10Commons, 10MediaWiki-Change-tagging, 10MediaWiki-Recent-changes, and 2 others: Allow filtering based on tag on Special:NewFiles - https://phabricator.wikimedia.org/T124214#3315088 (10MarkTraceur) p:05Normal>03Low [15:20:40] 10DBA, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 10Operations: Global rename of global account with > 50k edits - supervision needed - https://phabricator.wikimedia.org/T167031#3315110 (10MarcoAurelio) [15:50:37] 10DBA, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 10Operations: Global rename of global account with > 50k edits - supervision needed - https://phabricator.wikimedia.org/T167031#3315249 (10jcrespo) I won't be around on Friday or Monday. I would suggest starting it next Tuesday, assuming it is appr... [16:06:14] I will reimage db2037 now [16:09:58] 10DBA, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3315324 (10Nuria) Since this is the master for eventlogging machine. Can we move the refresh for this host to happen sooner? (ping @jcrespo) https://phabricator.wikime... [16:15:32] 10DBA, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 10Operations: Global rename of Idh0854 -> Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3315348 (10Framawiki) p:05Triage>03Normal [16:17:02] 10DBA, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3315357 (10jcrespo) Not really, we have almost decided the goals for Q1, and they are all quite urgent and for hardware that has been already bought. What we can do to... [17:14:06] BTW, for tomorrow, check the warning at https://mariadb.com/kb/en/mariadb/mariadb-10124-release-notes/ [17:14:31] probably related you the issue you have or had when importing between versions [17:21:13] volans: I could use a DB expert to provide some input. [17:44:50] Cyberpower678: cannot guarantee that I have the time now, but feel free to ask. Also did you see my reply few days ago? [17:46:01] volans: Yes, I've been using explains to help me figure out how to optimize the query. So I decided to use UNION. [17:46:10] But that also only seems to use one index. :/ [17:47:01] if the problem we're talking is the same of last week, doing 2 queries and cycle over both results in the code is probably the quicker solution [17:47:37] with optionally changing the index to have both queries column in it, but can be a second step improvement [17:47:38] volans: well this will be for an API output, and doing 2 requests will result in duplicates. I want to avoid that. [17:47:41] EXPLAIN SELECT * FROM externallinks_global LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE `live_state` IN (3,7) UNION SELECT * FROM externallinks_paywall RIGHT JOIN externallinks_global ON externallinks_paywall.paywall_id=externallinks_global.paywall_id WHERE paywall_status IN (3) LIMIT 0,1000; [17:48:04] Shows that the global table will not use the LIVE_STATE key. [17:49:18] But removing the UNION from the query to get EXPLAIN SELECT * FROM externallinks_global LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE `live_state` IN (3,7) LIMIT 0,1000; has it using the index. [17:49:20] Which I don't understand. [17:56:03] volans: Any suggestion on how to keep this with one query, otherwise I will deal with duplicates. [17:56:35] BTW the DB is s51059__cyberbot [17:58:11] yes I know (the db), I can take a quick look, the basic point is that one query is filtering on a column on the global table, while the other on a column on the paywall table [17:59:09] Correct. I've been experimenting on the best optimization, but my SQL skills are intermediate at best. [17:59:41] do you need all fields from both tables in the results? [18:00:01] All the columns? Not really. [18:00:16] But most [18:00:22] ok [18:03:30] Cyberpower678: the queries of last week were using also has_archive IIRC, did the requirements changed? [18:04:22] I'm adding filters so API requests can be filtered so they aren't downloading the entire DB. [18:04:42] So I do know I need to add indexes for both columns as it's own index. [18:05:43] I'm also trying to make this a reliable and fast API for possible future tools and gadgets as well as bots, which I already have using. :p [18:08:35] if the filters are generic and the users will be able to choose them independently it might be impossible to have a good query plan for all the combinations [18:09:28] and it might be data-dependent, like if for paywall_status = 3 there are 1k rows while for paywall_status = 5 there are 1M rows [18:12:09] volans: I'm not sure I follow. [18:12:53] Cyberpower678: for example in this specific case, if and only if the number or rows in externallinks_paywall with paywall_status IN (3) is not too large, like now that are less than 1k, a quick workaround to have an efficient query could be this one: [18:13:11] SELECT * FROM externallinks_global LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE `live_state` IN (3,7) OR externallinks_paywall.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHERE paywall_status IN (3)) limit 1000; [18:13:55] Ooh nested selects. I hadn't thought of that. [18:14:03] But why only if it is small? [18:14:11] but as the number of rows, hence IDs, in the subquery become larger, it become less efficient and heavier [18:15:54] Feeding a large list into a WHERE clause would kill the efficiency I imagine. [18:16:50] basically yes [18:17:08] Since the tables are joined, wouldn't it make more sense to use externallinks_global.paywall_id [18:18:19] it shouldn't make a difference, mysql should be smart enough to know they are the same field [18:18:28] Okay. [18:19:12] The API uses pagination, so I can't imagine how to effectively break the two requests apart. [18:19:23] Without effectively dealing with duplicates. [18:21:03] the other approach could be to have the users make 2 API calls, one for each, but I have no context of the tool itself to be able to give a good advice ;) [18:21:39] Well it's supposed to be a simple API, aside from the ridiculously tough OAuth layer to authenticate with. [18:23:48] volans: why won't the UNION command use the provided index, but the non-union will? [18:24:17] I was under the impression they were considered separate SQL queries and then combine to remove duplicates. [18:30:10] Cyberpower678: you have 15M rows in externallinks_global WHERE `live_state` IN (3,7)... and it has to get them all to make the union ;) [18:30:57] Thus resulting in a large execution time as if it were using a non-indexed. [18:32:05] mysql can decide to not use an index if it thinks that a full table scan is "better", in this case probably due to the fact that you're querying for * so it had to recover all the columns anyway [18:32:22] that actually reminds me that you will have anyway pagination issues [18:32:52] * Cyberpower678 could try session based pagination handlers that remember which values were already returned. [18:32:59] 1) because we didn't sort, so might make sense to ORDER BY url_id for example (but not desc) [18:33:18] and 2) because the data changes over time ofc [18:33:42] also, this approach become very heavy if going towards larger offsets [18:33:45] Okay how about a new idea. [18:34:14] a query that takes 0.01s to get LIMIT 0,1000 can take 1m to get LIMIT 100000,1000 [18:34:29] pagination is the killer of relational DBs ;) [18:36:31] volans: so how does one fix that? [18:36:47] I just tested that theory and it does indeed take 8.88s. [18:38:34] you might be able to improve it a bit querying only for the IDs and then doing a query SELECT * from FOO where ID in (LIST of IDs / subquery), ti depends [18:39:20] Nest a select statement inside another? [18:39:34] or gathering the IDs in the application and then querying for them [18:39:58] considering the filter query a "search" for the IDs and then getting the actual rows only for the few IDs you need [18:40:09] Well that's one lookup option. But I have it so you can do a general lookup of URLs that are either dead, have archives, etc... [18:41:46] Nope it takes just as long with just URL IDs. [18:42:58] which one? [18:43:19] SELECT url_id FROM externallinks_global WHERE `has_archive` = 1 LIMIT 1000000,1000; [18:43:19] * volans gotta go for dinner in few minutes [18:45:34] volans: ^ [18:47:39] Cyberpower678: you sure? I'm getting 0.35s for the above vs A LOT with select * [18:48:17] where A LOT just returned in 1m 34s [18:49:04] volans: well the DB caches requests doesn't it? [18:49:18] Try a different offset. [18:49:52] at most 0.5s for SELECT url_id FROM externallinks_global WHERE `has_archive` = 1 LIMIT 1012000,1000; [18:50:28] and no it doesn't cache requests, caches indexes and pages [18:52:33] sorry, gotta go for dinner [18:53:15] I notice that if I repeat a request right after making the first one, the result is instantaneous. [18:53:18] volans: ^ [18:58:00] yes, that's normal, but it doesn't cache the query results, it's a bit more complicated than that [18:58:25] hope this can help you, really gotta log off now, cya [20:33:57] 10DBA, 10Labs, 10Labs-Infrastructure: Decide whether back-compat views for upcoming major schema changes will be provided in the Labs replicas - https://phabricator.wikimedia.org/T166798#3316420 (10bd808) > I don't know if there are any statistics as to how many tools would be broken by these changes, or if...