[03:14:16] PROBLEM - MariaDB sustained replica lag on s7 on db1236 is CRITICAL: 287 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1236&var-port=9104
[03:16:52] PROBLEM - MariaDB sustained replica lag on s4 on db1243 is CRITICAL: 18 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1243&var-port=9104
[03:16:54] PROBLEM - MariaDB sustained replica lag on s3 on db1189 is CRITICAL: 49.25 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1189&var-port=9104
[03:16:54] PROBLEM - MariaDB sustained replica lag on s4 on db1242 is CRITICAL: 66 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1242&var-port=9104
[03:16:54] PROBLEM - MariaDB sustained replica lag on s4 on db1252 is CRITICAL: 124.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1252&var-port=9104
[03:16:56] PROBLEM - MariaDB sustained replica lag on s4 on db1244 is CRITICAL: 130 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1244&var-port=9104
[03:18:16] RECOVERY - MariaDB sustained replica lag on s7 on db1236 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1236&var-port=9104
[03:18:54] RECOVERY - MariaDB sustained replica lag on s3 on db1189 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1189&var-port=9104
[03:18:54] RECOVERY - MariaDB sustained replica lag on s4 on db1242 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1242&var-port=9104
[03:18:56] RECOVERY - MariaDB sustained replica lag on s4 on db1244 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1244&var-port=9104
[03:19:54] RECOVERY - MariaDB sustained replica lag on s4 on db1243 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1243&var-port=9104
[03:19:56] RECOVERY - MariaDB sustained replica lag on s4 on db1252 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1252&var-port=9104
[03:32:59] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1262:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:16:03] federico3: which host were you referring to? if it is db1262, that is a perfectly fine host (it is super new), but it simply has a broken DIMM which needs replacement
[06:37:58] I've silenced the above alert
[10:19:08] I'm going to retry a few backups that failed due to network issue
[10:56:39] ok if I reload haproxy for dbproxy1024 and dbproxy1029 ?
[10:58:41] Yep!
[10:58:54] The network maintenance
[10:58:55] Thanks
[10:59:03] I can also do it if you are not already there
[11:14:59] nah, I was already logged in
[11:17:13] I was reviewing red staff/cleaning up after the net issue and so those
[11:17:26] *saw
[11:27:18] Thanks!
[13:23:26] there's a new warning for db2239 https://alerts.wikimedia.org/?q=%40cluster%3Dwikimedia.org&q=instance%3D~%5E(db%7Cpc%7Ces)%5B12%5D.*
[13:25:57] I'm not feeling well, nothing major but I might go offline early today
[13:34:40] take care
[13:56:50] Get rest
[15:00:25] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1264:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:15:25] RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1264:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed