[10:19:53] FIRING: ProbeDown: Service webperf2003:443 has failed probes (http_performance_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#webperf2003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:24:53] FIRING: [2x] ProbeDown: Service webperf1003:443 has failed probes (http_performance_wikimedia_org_ip6) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:30:35] tappof: possibly your new probes?
[10:39:01] yeah, I'll check
[14:24:53] FIRING: [2x] ProbeDown: Service webperf1003:443 has failed probes (http_performance_wikimedia_org_ip6) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:34:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:44:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:16:25] FIRING: SystemdUnitFailed: check_icinga-alert1002.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:21:25] FIRING: [2x] SystemdUnitFailed: check_icinga-alert1002.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:26:06] yeah
[17:26:11] icinga needs a bounce probably
[17:26:40] SIGSEGV hmm
[17:27:06] any olly folks around and want to check? otherwise I can
[17:27:26] there is also this on the interface > Warning: Status data OUTDATED! Last status data update was 1013 seconds ago!
[17:33:36] restarted and then I realized I should have done https://wikitech.wikimedia.org/wiki/Service_restarts#Icinga instead
[17:33:47] sorry about the noise for the on-call folks, next time I will be more careful
[17:36:25] RESOLVED: [2x] SystemdUnitFailed: check_icinga-alert1002.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed