[08:52:46] PROBLEM - MariaDB sustained replica lag on s4 on db2236 is CRITICAL: 602.5 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2236&var-port=9104
[08:59:46] RECOVERY - MariaDB sustained replica lag on s4 on db2236 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2236&var-port=9104
[10:05:30] volans: how can I test parts of a new cookbook [safely] before committing the changes?
[10:05:55] https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Test_before_merging
[10:13:10] nothing that can be done during development, before committing? Also, some cookbooks use def run(args, spicerack): as entry point instead of classes, which is much simpler, can I use that?
[10:13:57] have you read https://doc.wikimedia.org/spicerack/master/introduction.html#api-interfaces ?
[10:15:20] technically speaking there is a way to have a shell to try things, I've been reluctant to make it generally available due to the implied risks of running random commands, not peer reviewed, not audited, etc...
[10:15:28] if you really need it I can show it to you in private
[10:16:12] federico3: I think the plan also was to have a cumin installation on the test vms
[10:16:22] *test db virtual machines
[10:16:31] jynus: that would be *really* helpful
[10:16:51] I didn't get to it, but it being a good idea, I encourage you to set that up
[10:16:55] I think I can give you access
[10:17:12] jynus: cumin or spicerack? I doubt you have all the pieces needed to make spicerack work
[10:17:27] well, anything you want
[10:17:35] as it is not production, we can install whatever we want
[10:17:48] are you creating a staging env from production? :D
[10:17:57] there is one already
[10:18:02] no there isn't :D
[10:18:03] but I don't know who was the last to use it
[10:18:13] I mean for databases
[10:18:43] sure but to have spicerack work in a way that you can test things you will need all the pieces of the infrastructure
[10:18:48] federico3: do you have access to the mariadbtest project on horizon?
[10:18:53] so basically it will need to be a staging mimicking production :)
[10:19:00] there is a cumin-1 host
[10:20:47] Amir1 marostegui any concern about giving sudo to federico3 on the mariadbtest cluster? who was the last one using it?
[10:21:24] volans: in any case, I believe the mariadb test cluster will be useful to him
[10:22:02] that's for sure, not saying it shouldn't be used :) just that it might not be helpful to test spicerack/cookbook related stuff
[10:22:17] that's for him to discover :-D
[10:25:32] jynus: I'm not sure I have access, one min...
[10:41:57] I have given him access, feel free to mention if you are using some vm that shouldn't be touched
[11:30:30] heads up I'm doing more clouddb reboots today, if you see any clouddb alerts
[11:51:40] jynus: the wmcs cluster? Sure go ahead!
[11:57:27] I need to reboot to print some travel stuff (don't ask)
[12:00:45] jynus went into secure printing mode :-P
[12:14:06] lol
[12:23:23] jynus is back from the secure printing mode :-P
[12:23:38] I wish
[12:24:30] I need a sticker that says "in my other Linux installation, the printer drivers work"
[12:56:50] jynus: whenever you have some time, can you check the hosts pending upgrade and rebuild from your side at https://phabricator.wikimedia.org/T385550
[12:58:11] there should be none left from my side
[12:58:21] I will update the ticket after lunch
[12:58:24] jynus: I see some backup sources there not running 10.6.20
[12:58:33] I can do them if you like
[12:58:35] ah, true
[12:58:44] But no worries, not in a rush, enjoy your meal
[12:58:58] they all should be on bookworm but not on 10.6.20
[12:59:28] jynus: Whenever you have time, it would be great if you can just upgrade to 10.6.20 and rebuild linter, recentchanges and pagelinks
[12:59:30] as those were upgraded months ago
[12:59:33] I can do it if you prefer
[12:59:54] it's ok, but I don't think I will be able to finish this week
[13:00:00] No worries, no rush
[13:13:14] 3 hosts are missing: (3) db[2198-2200].codfw.wmnet
[19:29:48] FIRING: MysqlReplicationLagPtHeartbeat: MySQL instance db2213:9104 has too large replication lag (12h 7m 22s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2213&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[21:00:52] I'll take a look
[21:06:35] candidate master for s5, not scary at all
[21:09:58] in orchestrator it's fine, I think it recovered