[02:15:25] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db2230:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:46:03] FIRING: PuppetFailure: Puppet has failed on ms-be2094:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[06:15:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db2230:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:46:03] FIRING: PuppetFailure: Puppet has failed on ms-be2094:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:35:48] RESOLVED: PuppetFailure: Puppet has failed on ms-be2094:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:40:18] same issue from partial install by dc-ops yesterday evening; I've fixed the non-empty-spinning-disk and am now reinstalling
[08:45:25] RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db2230:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:54:35] FIRING: DiskSpace: Disk space ms-be1070:9100:/srv/swift-storage/sda3 3.112% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=ms-be1070 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:10:20] This is T377827 again (and ms-be1067 is in WARN)
[09:10:20] T377827: Disk near-full warnings on ms swift backends for container filesystems due to some bloated sqlite files - https://phabricator.wikimedia.org/T377827
[09:19:35] RESOLVED: DiskSpace: Disk space ms-be1070:9100:/srv/swift-storage/sda3 3.112% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=ms-be1070 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:20:25] FIRING: SystemdUnitFailed: swift-object.service on ms-be1070:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:15:25] FIRING: SystemdUnitFailed: swift-object-reconstructor.service on ms-be1088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:20:25] FIRING: [2x] SystemdUnitFailed: swift-container-sharder.service on ms-be1088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:27:42] that's the trixie test host, I've restarted the affected services
[13:30:25] RESOLVED: [2x] SystemdUnitFailed: swift-container-sharder.service on ms-be1088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:54:43] Hey, I'm trying to build a new wmf-debci and I can't invoke the correct docker-pkg command. I'm following https://wikitech.wikimedia.org/wiki/Kubernetes/Images#Production_images on build2001 - I can't get it to select wmf-debci at all, it returns the selection as blank.
[17:55:19] Removing the --select arg only has it claiming it would build a few images too.
[17:56:21] ....wrong chan, it seems
[18:22:38] PROBLEM - MariaDB sustained replica lag on s2 on db1233 is CRITICAL: 1334 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1233&var-port=9104
[18:23:10] PROBLEM - MariaDB sustained replica lag on s7 on db1174 is CRITICAL: 1135 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1174&var-port=9104
[18:23:41] PROBLEM - MariaDB sustained replica lag on x1 on db1224 is CRITICAL: 227 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1224&var-port=9104
[18:23:43] PROBLEM - MariaDB sustained replica lag on s2 on db1259 is CRITICAL: 819 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1259&var-port=9104
[18:24:41] RECOVERY - MariaDB sustained replica lag on x1 on db1224 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1224&var-port=9104
[18:27:39] RECOVERY - MariaDB sustained replica lag on s2 on db1233 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1233&var-port=9104
[18:28:09] RECOVERY - MariaDB sustained replica lag on s7 on db1174 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1174&var-port=9104
[18:28:45] RECOVERY - MariaDB sustained replica lag on s2 on db1259 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1259&var-port=9104
[18:30:39] PROBLEM - MariaDB sustained replica lag on s4 on db1221 is CRITICAL: 396 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1221&var-port=9104
[18:36:39] RECOVERY - MariaDB sustained replica lag on s4 on db1221 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1221&var-port=9104