[05:57:45] <wikibugs>	 10DBA, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3314068 (10Marostegui) This BBU failed again and the policy went back to WriteThrough: ``` Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad...
[06:05:43] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3314078 (10Marostegui)
[06:09:19] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3314080 (10Marostegui)
[06:16:24] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3314087 (10Marostegui)
[07:34:32] <jynus>	 reimaging db2038 (s5) next
[07:34:50] <marostegui>	 ok
[07:58:10] <jynus>	 what should we do about 89 ?
[07:58:23] <marostegui>	 I want to get the firmware updated today
[07:58:25] <marostegui>	 and then repool it
[07:58:47] <jynus>	 will chris be around today?
[07:59:10] <marostegui>	 I was trying to see if there is a holiday in the us
[07:59:16] <marostegui>	 because the deployment page doesn't say so
[07:59:45] <marostegui>	 nope, today isn't supposed to be holiday there
[13:01:19] <wikibugs>	 10DBA, 10Analytics, 10Analytics-EventLogging: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452#3314678 (10Marostegui) The server went nuts again and got stuck.  I am going to leave it catch up (it is 3 days behind) and will attempt the ALTER once more. If it doesn...
[13:31:50] <jynus>	 depending how I see db2038, I may reimage db2037 too, today (s4)
[13:32:15] <marostegui>	 ok, sounds good to me
[13:35:39] <jynus>	 I'll proobably disconnect for 1 hour and work until late
[13:35:48] <marostegui>	 ok! have a good trip :)
[13:35:54] <marostegui>	 don't work until late though!!!
[13:36:47] <jynus>	 remember also I will not be here on friday and monday
[13:37:00] <jynus>	 it is on the etherpad
[13:37:07] <marostegui>	 oh yes :)
[13:55:13] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935#3314805 (10Marostegui) ``` root@db1089:~# hpssacli controller all show detail | grep Firmware    Firmware Version: 3.56 ```
[14:45:15] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935#3315004 (10Marostegui) I have started to slowly repool this server as I don't want to leave it out much longer. @Cmjohnson once you have time for the firmware upgrade, let us know and we will...
[15:11:25] <wikibugs>	 10DBA, 10Commons, 10MediaWiki-Change-tagging, 10MediaWiki-Recent-changes, and 2 others: Allow filtering based on tag on Special:NewFiles - https://phabricator.wikimedia.org/T124214#3315088 (10MarkTraceur) p:05Normal>03Low
[15:20:40] <wikibugs>	 10DBA, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 10Operations: Global rename of global account with > 50k edits - supervision needed - https://phabricator.wikimedia.org/T167031#3315110 (10MarcoAurelio)
[15:50:37] <wikibugs>	 10DBA, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 10Operations: Global rename of global account with > 50k edits - supervision needed - https://phabricator.wikimedia.org/T167031#3315249 (10jcrespo) I won't be around on Friday or Monday. I would suggest starting it next Tuesday, assuming it is appr...
[16:06:14] <jynus>	 I will reimage db2037 now
[16:09:58] <wikibugs>	 10DBA, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3315324 (10Nuria) Since this is the master for eventlogging machine. Can we move the refresh for this host to happen sooner?  (ping @jcrespo) https://phabricator.wikime...
[16:15:32] <wikibugs>	 10DBA, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 10Operations: Global rename of Idh0854 -> Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3315348 (10Framawiki) p:05Triage>03Normal
[16:17:02] <wikibugs>	 10DBA, 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3315357 (10jcrespo) Not really, we have almost decided the goals for Q1, and they are all quite urgent and for hardware that has been already bought. What we can do to...
[17:14:06] <jynus>	 BTW, for tomorrow, check the warning at https://mariadb.com/kb/en/mariadb/mariadb-10124-release-notes/
[17:14:31] <jynus>	 probably related you the issue you have or had when importing between versions
[17:21:13] <Cyberpower678>	 volans: I could use a DB expert to provide some input.
[17:44:50] <volans>	 Cyberpower678: cannot guarantee that I have the time now, but feel free to ask. Also did you see my reply few days ago?
[17:46:01] <Cyberpower678>	 volans: Yes, I've been using explains to help me figure out how to optimize the query.  So I decided to use UNION.
[17:46:10] <Cyberpower678>	 But that also only seems to use one index. :/
[17:47:01] <volans>	 if the problem we're talking is the same of last week, doing 2 queries and cycle over both results in the code is probably the quicker solution
[17:47:37] <volans>	 with optionally changing the index to have both queries column in it, but can be a second step improvement
[17:47:38] <Cyberpower678>	 volans: well this will be for an API output, and doing 2 requests will result in duplicates.  I want to avoid that.
[17:47:41] <Cyberpower678>	 EXPLAIN SELECT * FROM externallinks_global LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE `live_state` IN (3,7) UNION SELECT * FROM externallinks_paywall RIGHT JOIN externallinks_global ON externallinks_paywall.paywall_id=externallinks_global.paywall_id WHERE paywall_status IN (3) LIMIT 0,1000;
[17:48:04] <Cyberpower678>	 Shows that the global table will not use the LIVE_STATE key.
[17:49:18] <Cyberpower678>	 But removing the UNION from the query to get EXPLAIN SELECT * FROM externallinks_global LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE `live_state` IN (3,7) LIMIT 0,1000; has it using the index.
[17:49:20] <Cyberpower678>	 Which I don't understand.
[17:56:03] <Cyberpower678>	 volans: Any suggestion on how to keep this with one query, otherwise I will deal with duplicates.
[17:56:35] <Cyberpower678>	 BTW the DB is s51059__cyberbot
[17:58:11] <volans>	 yes I know (the db), I can take a quick look, the basic point is that one query is filtering on a column on the global table, while the other on a column on the paywall table
[17:59:09] <Cyberpower678>	 Correct.  I've been experimenting on the best optimization, but my SQL skills are intermediate at best.
[17:59:41] <volans>	 do you need all fields from both tables in the results?
[18:00:01] <Cyberpower678>	 All the columns?  Not really.
[18:00:16] <Cyberpower678>	 But most
[18:00:22] <volans>	 ok
[18:03:30] <volans>	 Cyberpower678: the queries of last week were using also has_archive IIRC, did the requirements changed?
[18:04:22] <Cyberpower678>	 I'm adding filters so API requests can be filtered so they aren't downloading the entire DB.
[18:04:42] <Cyberpower678>	 So I do know I need to add indexes for both columns as it's own index.
[18:05:43] <Cyberpower678>	 I'm also trying to make this a reliable and fast API for possible future tools and gadgets as well as bots, which I already have using. :p
[18:08:35] <volans>	 if the filters are generic and the users will be able to choose them independently it might be impossible to have a good query plan for all the combinations
[18:09:28] <volans>	 and it might be data-dependent, like if for paywall_status = 3 there are 1k rows while for paywall_status = 5 there are 1M rows
[18:12:09] <Cyberpower678>	 volans: I'm not sure I follow.
[18:12:53] <volans>	 Cyberpower678: for example in this specific case, if and only if the number or rows in externallinks_paywall with paywall_status IN (3) is not too large, like now that are less than 1k, a quick workaround to have an efficient query could be this one:
[18:13:11] <volans>	 SELECT * FROM externallinks_global LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE `live_state` IN (3,7) OR externallinks_paywall.paywall_id IN (SELECT paywall_id FROM  externallinks_paywall  WHERE paywall_status IN (3)) limit 1000;
[18:13:55] <Cyberpower678>	 Ooh nested selects.  I hadn't thought of that.
[18:14:03] <Cyberpower678>	 But why only if it is small?
[18:14:11] <volans>	 but as the number of rows, hence IDs, in the subquery become larger, it become less efficient and heavier
[18:15:54] <Cyberpower678>	 Feeding a large list into a WHERE clause would kill the efficiency I imagine.
[18:16:50] <volans>	 basically yes
[18:17:08] <Cyberpower678>	 Since the tables are joined, wouldn't it make more sense to use externallinks_global.paywall_id
[18:18:19] <volans>	 it shouldn't make a difference, mysql should be smart enough to know they are the same field
[18:18:28] <Cyberpower678>	 Okay.
[18:19:12] <Cyberpower678>	 The API uses pagination, so I can't imagine how to effectively break the two requests apart.
[18:19:23] <Cyberpower678>	 Without effectively dealing with duplicates.
[18:21:03] <volans>	 the other approach could be to have the users make 2 API calls, one for each, but I have no context of the tool itself to be able to give a good advice ;)
[18:21:39] <Cyberpower678>	 Well it's supposed to be a simple API, aside from the ridiculously tough OAuth layer to authenticate with.
[18:23:48] <Cyberpower678>	 volans: why won't the UNION command use the provided index, but the non-union will?
[18:24:17] <Cyberpower678>	 I was under the impression they were considered separate SQL queries and then combine to remove duplicates.
[18:30:10] <volans>	 Cyberpower678: you have 15M rows in externallinks_global  WHERE `live_state` IN (3,7)... and it has to get them all to make the union ;)
[18:30:57] <Cyberpower678>	 Thus resulting in a large execution time as if it were using a non-indexed.
[18:32:05] <volans>	 mysql can decide to not use an index if it thinks that a full table scan is "better", in this case probably due to the fact that you're querying for * so it had to recover all the columns anyway
[18:32:22] <volans>	 that actually reminds me that you will have anyway pagination issues
[18:32:52] * Cyberpower678 could try session based pagination handlers that remember which values were already returned.
[18:32:59] <volans>	 1) because we didn't sort, so might make sense to ORDER BY url_id for example (but not desc)
[18:33:18] <volans>	 and 2) because the data changes over time ofc
[18:33:42] <volans>	 also, this approach become very heavy if going towards larger offsets
[18:33:45] <Cyberpower678>	 Okay how about a new idea.
[18:34:14] <volans>	 a query that takes 0.01s to get LIMIT 0,1000 can take 1m to get LIMIT 100000,1000
[18:34:29] <volans>	 pagination is the killer of relational DBs ;)
[18:36:31] <Cyberpower678>	 volans: so how does one fix that?
[18:36:47] <Cyberpower678>	 I just tested that theory and it does indeed take 8.88s.
[18:38:34] <volans>	 you might be able to improve it a bit querying only for the IDs and then doing a query SELECT * from FOO where ID in (LIST of IDs / subquery), ti depends
[18:39:20] <Cyberpower678>	 Nest a select statement inside another?
[18:39:34] <volans>	 or gathering the IDs in the application and then querying for them
[18:39:58] <volans>	 considering the filter query a "search" for the IDs and then getting the actual rows only for the few IDs you need
[18:40:09] <Cyberpower678>	 Well that's one lookup option.  But I have it so you can do a general lookup of URLs that are either dead, have archives, etc...
[18:41:46] <Cyberpower678>	 Nope it takes just as long with just URL IDs.
[18:42:58] <volans>	 which one?
[18:43:19] <Cyberpower678>	 SELECT url_id FROM externallinks_global WHERE `has_archive` = 1 LIMIT 1000000,1000;
[18:43:19] * volans gotta go for dinner in few minutes
[18:45:34] <Cyberpower678>	 volans: ^
[18:47:39] <volans>	 Cyberpower678: you sure? I'm getting 0.35s for the above vs A LOT with select *
[18:48:17] <volans>	 where A LOT just returned in 1m 34s
[18:49:04] <Cyberpower678>	 volans: well the DB caches requests doesn't it?
[18:49:18] <Cyberpower678>	 Try a different offset.
[18:49:52] <volans>	 at most 0.5s for SELECT url_id FROM externallinks_global WHERE `has_archive` = 1 LIMIT 1012000,1000;
[18:50:28] <volans>	 and no it doesn't cache requests, caches indexes and pages
[18:52:33] <volans>	 sorry, gotta go for dinner
[18:53:15] <Cyberpower678>	 I notice that if I repeat a request right after making the first one, the result is instantaneous.
[18:53:18] <Cyberpower678>	 volans: ^
[18:58:00] <volans>	 yes, that's normal, but it doesn't cache the query results, it's a bit more complicated than that
[18:58:25] <volans>	 hope this can help you, really gotta log off now, cya
[20:33:57] <wikibugs>	 10DBA, 10Labs, 10Labs-Infrastructure: Decide whether back-compat views for upcoming major schema changes will be provided in the Labs replicas - https://phabricator.wikimedia.org/T166798#3316420 (10bd808) > I don't know if there are any statistics as to how many tools would be broken by these changes, or if...