[11:07:56] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-ethtool-exporter.service on backup2014:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:11:29] apparently prometheus-ethtool-exporter is missing from trixie
[11:12:04] but puppet sets it up?
[11:22:35] ah, I think I fixed it, it was a leftover, and it was creating auto-dooming facts
[11:22:56] RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-ethtool-exporter.service on backup2014:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:24:16] I don't necessarily like the whole autorestart daemons, it would be nice to make it a bit more intelligent and create only 1 daemon per host
[12:02:56] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1169:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:11:27] the host is not lagging from what I'm seeing
[12:11:31] (s1)
[13:37:24] That host is being used for uefi
[13:37:27] testing from moritzm
[13:37:36] ah okay, have fun!
[13:38:55] shall I set notifications_enabled: false in hieradata/hosts/db1169 for it?
[13:39:20] I've run into what appears to be an unrelated Netbox traceback, so these tests will extend to the next week
[13:40:28] ah, it already has that set actually
[13:41:19] moritzm: yeah, the above doesn't respect it
[13:41:30] :-(
[13:50:07] ok
[14:53:02] Amir1, marostegui: I'm able to reproduce the get_columns bug reliably in functional tests but only with all_dbs=False, see https://gitlab.wikimedia.org/repos/data_persistence/dbtools/auto_schema/-/merge_requests/16
[15:00:04] Amir1: if you can find few mins to take a look, if we can gain confidence that the functional tests are reliable I can then fix the get_column bug
[15:09:15] federico3: unfortunately we rarely use it set to False :(
[15:10:38] I'll try more combinations of flags to see if I can reproduce it with all_dbs=True (as you said it's happening with True)
[15:14:09] Yeah :(
[15:32:41] We have issues in all dbs to True not the other way around.
[15:47:07] Amir1: I'm referring to this https://gitlab.wikimedia.org/repos/data_persistence/dbtools/auto_schema/-/merge_requests/16/diffs#6fab28bd2bf5f9ea5478ad78adf68e34e9f3d8f8_0_186
[16:04:59] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1169:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:05:10] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1169:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed