[00:13:48] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[00:27:55] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:27:55] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:13:48] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:13:48] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[10:44:22] netops, Infrastructure-Foundations, SRE: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#10520327 (ayounsi) For (1) we can have the `sre.ganeti.addnode` cookbook call the PuppetDBImport script towards the end. What do you and @MoritzMu...
[11:01:17] netops, Infrastructure-Foundations, ops-eqsin, SRE: WMF RIPE Atlas probe in Eqsin offline - https://phabricator.wikimedia.org/T382519#10520381 (ayounsi) Let's decom it and focus our efforts on spinning up VMs instead (T385560). It needs to be removed from the list on https://github.com/wikimedia/...
[11:01:55] netops, Infrastructure-Foundations, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10520385 (ayounsi) Let's decom it and focus our efforts on spinning up VMs instead (T385560). It needs to be removed from the list on https://github.com/wikimedia/operations-pupp...
[11:36:34] FIRING: DiskSpace: Disk space build2001:9100:/ 1.709% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:01:07] elukey: ^^^
[12:01:31] ahahhahah
[12:06:34] RESOLVED: DiskSpace: Disk space build2001:9100:/ 5.93% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:13:49] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:13:49] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:17:03] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10521665 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a50b2671-d855-40a0-8790-c502280b9115) set by cmooney@cumin100...
[16:17:16] CAS-SSO, collaboration-services, Infrastructure-Foundations, GitLab (Auth & Access): gitlab account maps to two different developer accounts - https://phabricator.wikimedia.org/T384025#10521666 (Jelto) I checked the user in the [GitLab admin menu](https://gitlab.wikimedia.org/admin/users/sstefano...
[16:47:37] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10521778 (cmooney) >>! In T384288#10505646, @RobH wrote: > Remote hands 01020815 scheduled for 2025-02-04 @ 0800 Pacific (1600 GMT). Ha...
[17:03:49] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:13:49] FIRING: [3x] PuppetFailure: Puppet has failed on aux-k8s-etcd2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:23:49] RESOLVED: [2x] PuppetFailure: Puppet has failed on aux-k8s-etcd2004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:54:55] FIRING: SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:04:55] FIRING: [2x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:09:55] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:58:37] Mail, Infrastructure-Foundations, MediaWiki-Platform-Team, MediaWiki-User-login-and-signup, and 2 others: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#10522657 (Krinkle) >>! **Task description** > `name=Log message > Could...
[23:09:55] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed