[02:12:20] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:47:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:52:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [06:12:20] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:16:48] 10netops, 06Infrastructure-Foundations, 10observability, 10Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10556218 (10JMeybohm) From lunch discussion in Atlanta: It would be ideal if we could create a recordin... [09:18:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting [09:28:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting [09:31:46] 10netops, 06Infrastructure-Foundations, 10observability, 10Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10556225 (10fgiunchedi) Since we have to overwrite `instance` with the host instead of the router, that... [10:12:20] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:12:21] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:48:13] 10netops, 06Infrastructure-Foundations, 10Prod-Kubernetes, 06serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10557253 (10akosiaris) [18:12:21] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:12:55] FIRING: MaxConntrack: Max conntrack at 80.74% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack [18:17:55] RESOLVED: MaxConntrack: Max conntrack at 80.74% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack [21:39:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [21:44:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [22:12:21] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed