[04:34:40] PROBLEM - MariaDB sustained replica lag on s8 on db2167 is CRITICAL: 15 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2167&var-port=9104
[04:35:40] RECOVERY - MariaDB sustained replica lag on s8 on db2167 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2167&var-port=9104
[09:00:38] btullis: when would it be a good time to upgrade dbstore* to 10.6.20?
[09:00:42] (mariadb, that is)
[09:37:57] marostegui: Thanks. dbstore1007 and dbstore1009 can be upgraded to 10.6.20 any time before the end of the week. I'd rather hold off on dbstore1008 for a few days, while this enwiki dump is running, if that's OK. https://dumps.wikimedia.org/enwiki/20250123/
[10:09:47] btullis: Excellent, I will do those two today and then sync up before doing dbstore1008 next week
[10:09:51] I will create a task and we can coordinate there
[10:09:53] Thank you
[10:17:19] btullis: 1007 and 1009 done, and back up
[10:30:23] Great, many thanks.
[10:32:44] There might still be a window to do dbstore1008 this week, which would be before the 20250201 dumps kick off and use it on Saturday. I'll keep you informed.
[10:33:52] btullis: Sure, just let me know if it is possible; if not, we can do it next week
[10:58:50] Going to reboot db2185, the orchestrator/zarcillo standby DB
[11:14:16] federico3 jynus any objections if I upgrade the Orchestrator database? It will be down for a few minutes
[11:23:41] no objections
[11:23:47] thanks!
[11:34:58] none from me
[11:39:04] thanks guys
[11:46:18] Done
[11:53:53] db1171 booted through the network, but it didn't get any partman recipe
[11:54:16] has that ever happened to you? I checked installserver/preseed.yaml and it looks correct
[11:54:39] Should I just retry?
[11:55:03] jynus: when was that? there was an issue with partman due to a bad patch merged during the weekend
[11:55:15] this morning
[11:55:23] when was it corrected?
[11:55:35] moritz fixed it with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114349
[11:55:48] ah, that must be it
[11:55:58] thank you - and good that I asked
[11:56:07] I was confused
[11:56:12] I bet
[11:56:21] I will retry then
[12:05:06] I think it is now going on without issues
[12:22:00] glad it fixed it :)
[12:28:48] PROBLEM - MariaDB sustained replica lag on s3 on db2227 is CRITICAL: 69.5 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2227&var-port=9104
[12:29:35] ^ me
[12:31:50] RECOVERY - MariaDB sustained replica lag on s3 on db2227 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2227&var-port=9104
[14:28:21] <_joe_> I will be late to the meeting, I have to finish an ongoing conversation
[14:28:26] <_joe_> please start without me
[14:31:50] ack
[19:31:28] (non-urgent) I see the quick-fix procedure for index corruptions has been removed from the "Temporary incident response steps" doc.
[19:31:28] To confirm: is that bug something that we don't expect to happen anymore, or is it just that the preferred (default) guidance is to simply depool and open a task?
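
Editor's note on the PROBLEM/RECOVERY lines above: the alert fires when a replica's reported lag crosses the warning (5 s) or critical (10 s) threshold shown in the messages. The following is a minimal sketch of how such a lag reading can be taken by hand on a MariaDB replica. It assumes pymysql is installed and that a monitoring account exists; the hostname, user, and password are placeholders, and the production check evaluates sustained lag over time rather than the single sample shown here.

    # Sketch only: read replication lag from a MariaDB replica.
    # Credentials and thresholds below are illustrative assumptions,
    # not the actual Wikimedia monitoring configuration.
    import pymysql

    WARN, CRIT = 5, 10  # mirrors the (W)5 / (C)10 thresholds in the alert lines

    def replica_lag(host: str):
        """Return Seconds_Behind_Master from SHOW REPLICA STATUS, or None if not replicating."""
        conn = pymysql.connect(host=host, user="monitor", password="secret",
                               cursorclass=pymysql.cursors.DictCursor)
        try:
            with conn.cursor() as cur:
                cur.execute("SHOW REPLICA STATUS")
                row = cur.fetchone()
                return row["Seconds_Behind_Master"] if row else None
        finally:
            conn.close()

    if __name__ == "__main__":
        # hostname taken from the alert above; credentials are assumed
        lag = replica_lag("db2167.codfw.wmnet")
        if lag is None:
            print("not a replica, or replication is stopped")
        elif lag >= CRIT:
            print(f"CRITICAL: {lag} ge {CRIT}")
        elif lag >= WARN:
            print(f"WARNING: {lag} ge {WARN}")
        else:
            print(f"OK: lag {lag}")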