[03:21:25] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es1042:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:16:25] RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es1042:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:38:05] jynus for when you get online: PROBLEM - Disk space on dbprov2004 is CRITICAL: DISK CRITICAL - free space: /srv 449399 MB
[08:43:54] I have switched m1 master's proxy
[08:44:07] I will probably do m2 as well, but not today most likely
[09:09:56] <_joe_> q: how often do we switch these proxies?
[09:10:29] <_joe_> and, how automated is it?
[09:44:02] _joe_: Mostly when we need to reboot/upgrade
[09:44:05] _joe_: It is a DNS change
[09:44:47] We are in process of refreshing them now, so I am taking the opportunity to reboot and pick up the new kernels for https://phabricator.wikimedia.org/T376905
[09:48:19] so apparently someone upgraded db2197 mariadb version, which made backups fail since the 6th
[09:48:42] That was probably me
[09:48:46] But why did they fail?
[09:48:47] please don't do that without contacting me first, as dbprov and backup sources have to be done at the same time
[09:49:05] jynus: Ah I see, it got upgraded because there was a table corruption
[09:49:19] [03:06:04]: CRITICAL - Error while performing backup of /srv/backups/snapshots/ongoing/snapshot.s2.2025-01-09--01-10-11
[09:49:20] [03:57:02]: ERROR - xtrabackup version mismatch- xtrabackup version: {'major': '10.6', 'minor': 17, 'vendor': 'MariaDB'}, backup version: {'major': '10.6', 'minor': 20, 'vendor': 'MariaDB'}
[09:49:46] that's ok, I am not saying not to upgrade, just give me a heads up so I can make sure dbprovs are in sync
[09:49:47] I know nothing of backups, but I'm surprised the backup software cares about minor versions
[09:50:17] my check does, it wants xtrabackup to be the same or higher version, otherwise there may be problems
[09:50:17] (you may be able to infer the former from the latter :) )
[09:50:19] jynus: Yeah, will do. :(
[09:50:47] Emperor: probably xtrabackup would run ok, but it is a check I do just to be safe
[09:51:21] I keep the data so I could run it later, which is why it got its disk full
[09:53:15] Emperor: major features are created or disappear in minor versions for MariaDB
[09:56:16] o.O
[09:56:38] 🤷
[09:56:49] :)
[09:56:54] <_joe_> chaotic versioning is the only versioning
[09:58:32] but in any case, if Manuel upgraded the host because there was a bug, I don't want to run xtrabackup (which is just a mariadb instance) with the original version, so I created that check on backup software
[09:58:55] doesn't have to be the same version, but it cannot be lower
[10:02:59] so no harm done, alerts worked as intended, just that now I have to do the upgrade in a bit more of a rush
[10:24:25] marostegui: should I depool and reboot pc4? It's really fun
[10:24:34] Amir1: Go for it
[10:24:42] We have different opinions on what fun means though
[10:24:45] 😈
[10:57:51] the dbprov2004 issue should be fixed now, rerunning backups now to confirm
[10:58:06] <_joe_> marostegui: de gustibus non est disputandum
[11:03:53] Hahahaha
[11:04:00] Yeah we all know Amir1
[11:09:07] :D
[11:34:36] So probably starting next week (and should be fast, but not committing to a date for now) I will start upgrading pending backup hosts to bookworm
[11:35:11] https://phabricator.wikimedia.org/T376916
[11:36:35] (and mariadb 10.6.20, where applicable)
[16:05:18] backups finished, issues fixed
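
Note on the version check discussed at 09:50:17 and 09:58:55: the rule stated in the channel is that xtrabackup "doesn't have to be the same version, but it cannot be lower" than the version of the backup source. The following is only a rough sketch of that rule, not the actual backup software. It assumes versions arrive as dicts shaped like the ones in the 03:57:02 error message; the function names `version_key` and `xtrabackup_is_compatible` are made up for illustration, and treating a vendor mismatch as incompatible is an assumption not stated in the log.

```python
from typing import Dict, Tuple, Union

# Shape taken from the logged error, e.g.
# {'major': '10.6', 'minor': 17, 'vendor': 'MariaDB'}
Version = Dict[str, Union[str, int]]


def version_key(version: Version) -> Tuple[int, ...]:
    """Turn {'major': '10.6', 'minor': 17} into (10, 6, 17) for comparison."""
    major_parts = tuple(int(part) for part in str(version["major"]).split("."))
    return major_parts + (int(version["minor"]),)


def xtrabackup_is_compatible(xtrabackup: Version, backup: Version) -> bool:
    """Hypothetical check: xtrabackup must be the same or a higher version.

    Assumption (not from the log): a different vendor counts as incompatible.
    """
    if xtrabackup["vendor"] != backup["vendor"]:
        return False
    return version_key(xtrabackup) >= version_key(backup)


# The two versions reported in the 03:57:02 error message.
provisioning_host = {"major": "10.6", "minor": 17, "vendor": "MariaDB"}
backup_source = {"major": "10.6", "minor": 20, "vendor": "MariaDB"}

if not xtrabackup_is_compatible(provisioning_host, backup_source):
    print("xtrabackup version mismatch: xtrabackup is older than the backup source")
```

With the logged values the check fails, because 10.6.17 is lower than 10.6.20. Once both sides are at 10.6.20 it would pass.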