[05:17:09] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[07:30:19] DBA, SRE, Wikimedia-Mailing-lists: Upgrade lists-next to bullseye mailman versions - https://phabricator.wikimedia.org/T280887 (Legoktm)
[11:41:45] tendril is acting weirdly since the morning, I will check later in the day or even tomorrow. It is not urgent at all
[13:19:23] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) Tendril has been acting weirdly for quite a few hours. I have narrowed the issue to `global_variables` and `general_log_sampled` tables This first table has never given any issues before. We don't use this...
[13:25:45] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) Tendril is back up, but very slow. I think we really need to truncate `general_log_sampled`
[13:27:43] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) Something started at 2am: https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=24&orgId=1&var-server=db1115&var-port=9104&from=now-24h&to=now
[13:29:42] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=3&orgId=1&refresh=5m&var-server=db1115&var-datasource=thanos&var-cluster=mysql&from=now-24h&to=now
[13:32:45] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) db1156 is corrupted and needs to be recloned, probably best to use a logical dump.
[13:35:22] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) db1156 needs to be built from a logical dump. The copy from db1074 looks corrupted, so best to build it from the logical dumps.
[13:38:02] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) @jcrespo may I offload the above ^ to you? that would help me a lot time-wise
[16:07:26] DBA, WMDE-Analytics-Engineering, Wikidata, Wikidata.org, and 4 others: [Story] Monitor size of some Wikidata database tables - https://phabricator.wikimedia.org/T68025 (Addshore)
[17:46:11] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) `processlist_query_log`, is holding lots of transactions. That table is partitioned and total size was around 50G. We obviously don't use that for anything, so I have truncated it. The server is not yet at...
[17:48:43] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) I have set `innodb_purge_threads = 3` to see if it helps reducing the purge lag the server started to increase since 2AM. So far it is slowly recovering: https://grafana.wikimedia.org/d/000000273/mysql?vie...
[18:06:32] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) Waited for the innodb_list_length to be 0, and I have started the event_scheduler again...we'll see how it goes.
[18:40:22] DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (Marostegui) I think tendril is definitely back and operating normally. It is hard to actually describe what was the culprit this time. There were definitely huge tables creating lots of contention (I have truncated al...
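
Note on the purge-lag check mentioned at 17:48 and 18:06: the log does not show how the "list length" was read. One plausible reading (an assumption, not confirmed by the log) is the `History list length` line printed in the TRANSACTIONS section of `SHOW ENGINE INNODB STATUS`. The sketch below only parses that line from status text that has already been fetched; the function names, the threshold parameter, and the sample text are hypothetical and for illustration only.

```python
import re
from typing import Optional

# Matches the "History list length N" line of SHOW ENGINE INNODB STATUS output.
_HISTORY_RE = re.compile(r"^History list length\s+(\d+)\s*$", re.MULTILINE)


def history_list_length(innodb_status: str) -> Optional[int]:
    """Return the history list length, or None if the line is absent."""
    match = _HISTORY_RE.search(innodb_status)
    return int(match.group(1)) if match else None


def purge_caught_up(innodb_status: str, threshold: int = 0) -> bool:
    """True when the purge backlog is at or below the (hypothetical) threshold."""
    length = history_list_length(innodb_status)
    return length is not None and length <= threshold


# Made-up sample text, not taken from db1115.
SAMPLE = """\
------------
TRANSACTIONS
------------
Trx id counter 123456
History list length 42
"""

if __name__ == "__main__":
    print(history_list_length(SAMPLE))
    print(purge_caught_up(SAMPLE))
```

Under this reading, something like the event scheduler would be re-enabled only once `purge_caught_up` returns True, which matches the "waited for it to be 0" step in the log.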