[05:54:23] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on db1261:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:28:10] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es2054:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:28:11] FIRING: [2x] SystemdUnitFailed: swift_dispersion_stats.service on thanos-fe1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:28:40] that'll be it's just been rebooted as part of the October reboots
[08:33:11] RESOLVED: [2x] SystemdUnitFailed: swift_dispersion_stats.service on thanos-fe1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:26:38] @marostegui @Amir1 I tried showing the weight in the mysql dashboard (on a copy not to modify the original) to monitor a pool-in: see the dbctl weight VS Queries Per Second on the left https://grafana-rw.wikimedia.org/d/876bb257-6cc2-4898-b7de-c797e9d4bbf8/mysql-federico-s-copy?forceLogin=true&from=now-3h&orgId=1&refresh=1m&timezone=utc&to=now&var-job=%24__all&var-port=9104&var-server=es2032
[11:58:52] federico3: that is useful
[12:28:58] db-mysql db1167 -e "SHOW REPLICA STATUS \G" | grep Master_Host -->>> Master_Host: db1193.eqiad.wmnet
[12:29:00] a leftover?
[12:29:38] I think I need more context
[12:35:31] is there a date for the network maintenance, I cannot see it on the task
[12:36:44] jynus: No, there is some info about it like: This work is slated to start after Oct 15th and extend through the end of the month. Data-Persistence feedback is needed.
[12:37:02] But nothing defined yet
[12:37:08] ok, because I am not worried about it, but I would like to be a bit on top
[12:37:23] bacula can be done for some time, just it cannot be for long
[12:37:26] *down
[12:37:35] same for backups in general
[12:38:07] Probably worth adding those to the migrations comments section on the doc
[12:38:30] I did, well, a summary
[12:39:36] I don't need a heads up for most, more like a "we are done" to check everything is good
[12:39:56] that's why I asked if there was some initial planning, but I think there is not
[13:41:05] Trying again, this time during the day :) Could I get a review of this very small patch for prometheus-mysqld-exporter? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1195769
[13:59:03] andrewbogott: I don't think it makes much sense to add the same config twice on 2 sides of a conditional?
[13:59:47] that's true, I'm just trying not to add yet a third variable. iirc everything in puppet is a constant?
[14:00:37] in any case, I am not sure that is the right fix
[14:01:29] but I haven't tested Trixie. If it has to go now, I would add it only for that os
[14:08:35] I think that could break multi-instance hosts
[14:11:35] ok, i'll limit it to trixie. I did test it on a bookworm host but only one!
[14:55:32] jynus: like https://gerrit.wikimedia.org/r/c/operations/puppet/+/1195769 better now?
[14:56:19] fine to me, but the dbas are the one to ok it
[14:56:24] *ones
[15:13:48] Amir1, can I get a quick look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1195769 ?
[15:22:44] I'm not really feeling well today. I try to take a look tomorrow
[15:30:56] ok! feel better