[06:52:48] 10DBA: Ensure InnoDB is compressed on the new clouddb hosts - https://phabricator.wikimedia.org/T270473 (10Marostegui)
[06:52:57] 10DBA: Ensure InnoDB is compressed on the new clouddb hosts - https://phabricator.wikimedia.org/T270473 (10Marostegui) p:05Triage→03High
[06:58:42] 10DBA, 10Patch-For-Review: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 (10Marostegui)
[07:01:04] 10DBA: Ensure InnoDB is compressed on the new clouddb hosts - https://phabricator.wikimedia.org/T270473 (10Marostegui)
[07:01:13] 10DBA: Ensure InnoDB is compressed on the new clouddb hosts - https://phabricator.wikimedia.org/T270473 (10Marostegui) s5 was easy, just the new wikis.
[07:06:24] 10DBA, 10decommission-hardware, 10Patch-For-Review: decommission es1013.eqiad.wmnet - https://phabricator.wikimedia.org/T268436 (10Marostegui)
[08:40:39] while testing our Percona package, I found out we are missing some extra dependencies, I will update the repo
[08:42:35] oh, they are in the repo, there just isn't an updated package uploaded
[08:43:47] yeah, who knows what's the last one we uploaded
[08:43:55] it's been a while, I think
[08:44:20] if I wasn't going to go on vacation soon, I would upload a new version, as I would like to have an "emergency alternative"
[08:44:36] but sadly I have little time available, I may do it after Christmas
[08:48:46] no worries
[08:48:54] Hopefully we won't need it :)
[08:56:13] I got it to run, at least: Version 8.0.17-8, Uptime 370s, read_only: True, event_scheduler: True, 10.60 QPS, connection latency: 0.001997s
[08:59:02] 10DBA, 10Patch-For-Review: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 (10Marostegui) s3 gave errors when copied from db1124 and upgraded, so needs also copying from sanitarium masters. ` Dec 18 08:52:53 db1154 mysqld[20635]: InnoDB: tuple DATA TUPLE: 3 fields;...
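The compression audit in T270473 amounts to finding InnoDB tables whose row format is not `Compressed`. A minimal sketch of just the filtering logic, assuming the input rows are shaped like the `TABLE_SCHEMA, TABLE_NAME, ENGINE, ROW_FORMAT` columns one would fetch from `information_schema.TABLES` (the sample data below is hypothetical):

```python
def uncompressed_innodb_tables(rows):
    """Given (schema, table, engine, row_format) tuples, return the
    InnoDB tables that still need ROW_FORMAT=COMPRESSED."""
    return [
        (schema, table)
        for schema, table, engine, row_format in rows
        if engine == "InnoDB" and row_format != "Compressed"
    ]

# Hypothetical sample rows standing in for an information_schema query result.
rows = [
    ("s5", "page", "InnoDB", "Compressed"),
    ("s5", "revision", "InnoDB", "Dynamic"),   # still uncompressed
    ("s5", "old_logs", "MyISAM", "Fixed"),     # not InnoDB, out of scope
]
print(uncompressed_innodb_tables(rows))  # → [('s5', 'revision')]
```

In practice the rows would come from a query against `information_schema.TABLES` on each clouddb host; the function is the same either way.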
[09:00:19] I also confirmed that the access error is fixed on cumin buster, mysql.py -h db2102 "just works"
[09:01:09] access error?
[09:02:16] yeah, the old client library, the one on stretch, didn't work with mysql8
[09:02:31] and I mentioned that it would be fixed once cumin was upgraded to buster
[09:02:43] not sure if you remember the conversation, it was many months ago
[09:05:55] ah yeah
[09:06:04] Indeed, that was a long time ago
[09:12:34] 10DBA: Ensure InnoDB is compressed on the new clouddb hosts - https://phabricator.wikimedia.org/T270473 (10Marostegui)
[09:19:46] https://phabricator.wikimedia.org/P13595#74967
[09:20:10] nice, so mariadb specific
[09:20:17] I would recommend sending a bug report
[09:20:20] yeah
[09:20:35] Copying your comments + my comment with the tracer should be enough
[09:20:36] this was to make sure I wasn't making any terrible mistake on my side
[09:20:53] like a charset mismatch or something
[09:21:15] The other day I filed another report and had to use a link to Phabricator, as I reached the max number of Jira lines, this might be the case here too
[09:21:18] So keep that in mind
[09:24:32] jynus: have you tried running an analyze on the table, or even an alter table engine=innodb,force, to see what happens when the table gets rebuilt?
[09:24:40] (And the index stats updated)
[09:24:47] analyze yes, it was on the paste
[09:24:59] as I understood it could be bad after a large import
[09:25:11] yeah
[09:25:21] if the table is not so big, I would also try a forced rebuild
[09:25:28] sometimes MariaDB asks for that right away XD
[09:25:37] it is not huge, but it is not small, 4GB
[09:25:41] but I can try too
[09:26:04] I have found cases where it changes the optimizer and cases where it doesn't, I would try it and mention it on the ticket too
[09:26:06] but note that on mysql it worked immediately after import
[09:26:16] yeah, it is likely to be an optimizer issue
[09:26:23] but shouldn't take long for 4GB
[09:26:37] well, it also has 8 indexes
[09:26:40] but I am running it now
[09:26:46] I think it is not the stats
[09:26:58] I think it is a lack of features to identify short orders
[09:27:01] Probably not, but MariaDB support usually asks whether that has been tried
[09:27:09] we have been hit with that in the past on production
[09:27:19] yep
[09:27:21] order by X limit N, where N is a low number
[09:27:45] it actually took very little to rebuild, as there is no ongoing traffic
[09:27:50] :)
[09:28:05] still filesorting
[09:28:19] which is fast now that it is in memory
[09:28:22] expected, yeah, but now we can say we've tried the analyze and the rebuild
[09:28:43] but normally not very fast (thousands of times slower than using the index)
[10:52:02] I have reverted db2102 to mariadb and it is replicating enwiki, but I left the Percona config and files in place, in case we need more tests at a later time
[11:10:59] hey marostegui, jynus! We decided not to restart our import yesterday because we were getting close to peak and it takes a while. Would it be okay for us to restart it with longer pauses and smaller batches today, or is an import on a holiday Friday something that would make you nervous?
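The `order by X limit N` pattern discussed above is cheap when an index on X lets the server stop after N rows, and much more expensive when it has to filesort the whole result first. The asymmetry can be sketched in pure Python, using `heapq.nsmallest` as a stand-in for the index-assisted top-N read versus sorting everything (an analogy, not MariaDB's actual execution plan):

```python
import heapq

def top_n_filesort(values, n):
    # What a filesort does conceptually: order the entire set, then keep N rows.
    return sorted(values)[:n]

def top_n_indexed(values, n):
    # An index-assisted read only ever materializes about N rows
    # (here, a bounded heap), regardless of table size.
    return heapq.nsmallest(n, values)

data = [7, 3, 9, 1, 4, 8, 2]
assert top_n_filesort(data, 3) == top_n_indexed(data, 3) == [1, 2, 3]
```

Both return the same rows; the difference is how much work touches the rest of the data, which is why the chat estimates the filesort path at thousands of times slower on a large table.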
[11:18:20] hnowlan: manuel is afk
[11:18:50] ack, not urgent
[11:19:14] if it is not urgent, probably consider waiting for next week, unless he says otherwise
[11:20:04] hnowlan: what I can tell you is the metrics to check while importing
[11:21:17] this is currently the server that ends up receiving the writes: https://grafana.wikimedia.org/d/000000273/mysql?var-server=db1107&var-port=9104&from=now-24h&to=now
[11:21:59] and this is one of the servers we would like to avoid having lag on: https://grafana.wikimedia.org/d/000000273/mysql?from=1608204104465&to=1608290504465&var-server=db1117&var-port=13322&orgId=1
[11:26:04] jynus: ah, very useful, thanks!
[11:26:23] also yikes, the import is *very* visible heh
[11:26:26] hey, quick answer from my phone: agreed with Jaime, let's wait for next week indeed, if that is ok
[11:33:56] PROBLEM - MariaDB sustained replica lag on db1106 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1106&var-port=9104
[11:34:16] I am looking at it
[11:35:34] RECOVERY - MariaDB sustained replica lag on db1106 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1106&var-port=9104
[11:49:17] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10jcrespo) 05Resolved→03Open
[11:54:17] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10jcrespo) Filed T270481, not yet UBN, but could become one as usage increases.
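The alert above fired at `2.4 ge 2`, and the recovery message shows the thresholds: critical at 2 seconds, warning at 1. The check logic itself is just a pair of comparisons; a minimal sketch (the real check is a monitoring alert, this only mirrors its threshold behaviour):

```python
def replica_lag_status(lag_seconds, warn=1.0, crit=2.0):
    """Map a sustained replication lag reading to a Nagios-style state,
    using the (C)2 ge (W)1 thresholds from the log above."""
    if lag_seconds >= crit:
        return "CRITICAL"
    if lag_seconds >= warn:
        return "WARNING"
    return "OK"

print(replica_lag_status(2.4))  # → CRITICAL, as in the PROBLEM line
print(replica_lag_status(0.0))  # → OK, as in the RECOVERY line
```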
[13:16:33] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10Marostegui)
[13:17:47] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10Marostegui) 05Open→03Resolved Thanks Jaime for filing that task. I have added it as a subtask for this one. I am going to re-close this one, so we can follow...
[13:17:59] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10Marostegui)
[13:18:13] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10Marostegui)
[13:18:47] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob bad database peformance on enwiki, commons, causing database lag? - https://phabricator.wikimedia.org/T270481 (10Marostegui)
[13:21:41] 10DBA: Ensure InnoDB is compressed on the new clouddb hosts - https://phabricator.wikimedia.org/T270473 (10Marostegui)
[13:30:44] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob bad database peformance on enwiki, commons, causing database lag? - https://phabricator.wikimedia.org/T270481 (10Marostegui) The master had a huge spike on deletes at 11:20, which matches the above graph...
[13:46:45] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob bad database peformance on enwiki, commons, causing database lag? - https://phabricator.wikimedia.org/T270481 (10kostajh) > @Joe told me those were (possibly) jobs generated by newly deployed feature "wa...
[13:49:38] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob bad database peformance on enwiki, commons, causing database lag? - https://phabricator.wikimedia.org/T270481 (10kostajh) cc @Samwilson @MusikAnimal
[14:30:39] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob bad database peformance on enwiki, commons, causing database lag? - https://phabricator.wikimedia.org/T270481 (10jcrespo) To summarize all my findings above, there is 3 "bad" things happening right now:...
[14:31:24] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (10jcrespo)
[14:33:05] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (10Marostegui) >>! In T270481#6701497, @jcrespo wr...
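The lag in T270481 traces back to deleting many watchlist rows at once. The usual mitigation for this class of problem is to delete in bounded batches and wait for replication between batches; a sketch of just the batching side, with `delete_batch` and `wait_for_replication` as hypothetical caller-supplied hooks (not MediaWiki's actual API):

```python
def delete_in_batches(ids, batch_size, delete_batch, wait_for_replication):
    """Split one large delete into bounded chunks, pausing between chunks
    so replicas can catch up. Both callbacks are injected by the caller."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        delete_batch(batch)          # e.g. DELETE ... WHERE wl_id IN (batch)
        wait_for_replication()       # block until replicas are caught up

# Exercise the chunking with recording stubs instead of a real database.
deleted = []
delete_in_batches(list(range(10)), 4, deleted.append, lambda: None)
print(deleted)  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The point is that no single statement ever touches more than `batch_size` rows, which is what keeps the binlog events small enough for replicas to apply without sustained lag.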
[14:34:09] thanks, I was about to add that, too
[14:34:25] I had to look at it 5 times, as they are pretty similar
[14:34:36] so not sure if you added it or not XD
[14:35:41] maybe it is a coincidence, but this is again possibly one of the downsides of aligning OKRs with the end of a quarter and rushing to meet a deadline
[16:47:27] consider voting also for https://jira.mariadb.org/browse/MDEV-13115 - while MariaDB is not a great queue system, I think a lot of MW logic uses it like one, and if it gets implemented now we could use it by 2025 or so :-)
[17:54:01] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (10aezell) The engineers on CommTech are looking i...
[18:41:14] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for niawiki - https://phabricator.wikimedia.org/T270414 (10nskaggs)
[18:45:58] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for skrwiki - https://phabricator.wikimedia.org/T268412 (10nskaggs) a:03nskaggs
[18:46:00] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for eowikivoyage - https://phabricator.wikimedia.org/T269427 (10nskaggs) 05Open→03Resolved This should be ready for use.
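The feature request linked above matters for queue-like workloads because, without something like `SELECT ... FOR UPDATE SKIP LOCKED` (which is, as I understand it, what MDEV-13115 asks for), competing workers block on each other's row locks instead of each claiming a different row. A toy in-memory simulation of the claim pattern, with a plain Python set standing in for row locks:

```python
def claim_next(rows, locked):
    """Claim the first pending, unlocked row, skipping locked ones --
    roughly the behaviour SKIP LOCKED would give each queue worker.

    rows is {row_id: state}; locked is a shared set of held locks."""
    for row_id in sorted(rows):
        if rows[row_id] == "pending" and row_id not in locked:
            locked.add(row_id)
            return row_id
    return None  # nothing claimable: queue drained or fully locked

queue = {1: "pending", 2: "pending", 3: "pending"}
locked = set()
print(claim_next(queue, locked))  # → 1 (worker A claims row 1)
print(claim_next(queue, locked))  # → 2 (worker B skips locked row 1)
```

Without the skip, the second worker would sit waiting on row 1's lock; with it, throughput scales with the number of workers.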
[18:46:04] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for madwiki - https://phabricator.wikimedia.org/T269440 (10nskaggs) a:03nskaggs
[19:01:31] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for niawiktionary - https://phabricator.wikimedia.org/T270410 (10nskaggs)
[19:10:14] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (10MusikAnimal) `removeWatchBatchForUser()` is onl...
[20:01:32] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for skrwiki - https://phabricator.wikimedia.org/T268412 (10nskaggs) 05Open→03Resolved views created
[20:01:35] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (10Marostegui) >>! In T270481#6702282, @MusikAnima...
[21:06:45] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items, 10Growth-Team, and 2 others: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (10MusikAnimal) > Is there a way to throttle those...
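On the throttling question raised in the last comment, one common shape is an adaptive pause: do one chunk of work, check replica lag, and back off while it is above a threshold. A sketch with the clock and lag reading injected as hypothetical hooks, so the policy can be exercised without a real database:

```python
def throttled_run(chunks, process, get_lag, sleep, max_lag=1.0, pause=0.5):
    """Process chunks one at a time, sleeping while replica lag is high.

    get_lag and sleep are caller-supplied hooks (hypothetical names);
    max_lag mirrors the 1-second warning threshold from the alert above."""
    for chunk in chunks:
        while get_lag() > max_lag:
            sleep(pause)          # back off until replicas catch up
        process(chunk)

# Drive the policy with a scripted sequence of lag readings.
done, naps = [], []
lags = iter([2.0, 0.2, 0.1, 0.0])
throttled_run(["a", "b"], done.append, lambda: next(lags), naps.append)
print(done)  # → ['a', 'b']
print(naps)  # → [0.5]  (slept exactly once, while lag was 2.0)
```

This is essentially what "longer pauses and smaller batches" from the import discussion earlier in the day amounts to, made explicit as a loop.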