[00:08:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:08:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:12:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:12:55] FIRING: MaxConntrack: Max conntrack at 83.83% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack [02:17:55] RESOLVED: MaxConntrack: Max conntrack at 81.48% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack [06:12:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:12:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:28:19] 10netops, 06Infrastructure-Foundations, 10Observability-Alerting, 06SRE: Migrate port utilisation alert from LibreNMS to alertmanager - https://phabricator.wikimedia.org/T384052#10516521 (10ayounsi) Looks all good to me ! First start with non-paging, and revisit later on. I'm wondering if we could re-wri... [13:51:21] 10netops, 06Infrastructure-Foundations, 10observability, 10Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10517106 (10fgiunchedi) >>! In T384731#10511648, @cmooney wrote: >>>! In T384731#10511163, @fgiunchedi... [13:55:50] 10netops, 06Infrastructure-Foundations, 10Observability-Alerting, 06SRE: Migrate port utilisation alert from LibreNMS to alertmanager - https://phabricator.wikimedia.org/T384052#10517133 (10fgiunchedi) SGTM too, re: extracting hostname from interface description we could do it via regexp if the extraction/... [14:13:50] 10Mail, 06Infrastructure-Foundations, 06MediaWiki-Platform-Team, 10MediaWiki-User-login-and-signup, and 2 others: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#10517221 (10jhsoby) It's happening again: {T385467} [14:41:59] 10CAS-SSO, 06Infrastructure-Foundations, 10GitLab (Auth & Access): gitlab account maps to two different developer accounts - https://phabricator.wikimedia.org/T384025#10517383 (10SLyngshede-WMF) p:05Triage→03Medium @Jelto could you take a look at this. I think it probably works as intended, i.e. two OID... [15:22:43] 10netops, 10Hiddenparma, 06Infrastructure-Foundations, 10Prod-Kubernetes, 07Kubernetes: Allow reaching services on the aux k8s cluster bypassing the CDN - https://phabricator.wikimedia.org/T382269#10517586 (10CDanis) 05Open→03Resolved a:03CDanis Do we think we have any non-ingress, non-nodeport... [16:15:37] 10Mail, 06Infrastructure-Foundations, 06MediaWiki-Platform-Team, 10MediaWiki-User-login-and-signup, and 2 others: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#10517754 (10Tgr) >>! In T383047#10455692, @jhathaway wrote: > If possible... [16:17:00] 10Mail, 06Infrastructure-Foundations, 06MediaWiki-Platform-Team, 10MediaWiki-User-login-and-signup, and 2 others: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#10517759 (10Tgr) Btw there was a huge spike in these errors today, that's... [18:27:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:32:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:37:55] FIRING: [3x] SystemdUnitFailed: etcd.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:42:55] FIRING: [4x] SystemdUnitFailed: etcd.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:45:04] ^ expected, being turned up [18:47:55] FIRING: [5x] SystemdUnitFailed: etcd.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:48:10] FIRING: [5x] SystemdUnitFailed: etcd.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:52:55] FIRING: [4x] SystemdUnitFailed: etcd.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:08:48] FIRING: [2x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [20:13:48] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [20:27:55] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed