[03:11:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:11:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[11:15:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:53:48] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on puppetserver2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:03:48] FIRING: [4x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:08:48] FIRING: [5x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:13:48] FIRING: [6x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:08:48] FIRING: [6x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:13:48] FIRING: [6x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:15:40] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:18:48] FIRING: [6x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:28:48] RESOLVED: [4x] PuppetZeroResources: Puppet has failed generate resources on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:52:18] SRE-tools, Infrastructure-Foundations, SRE: Add an option to the reimage cookbook to also update firmware - https://phabricator.wikimedia.org/T410384#11401096 (LSobanski) p:Triage→Medium
[15:54:09] netops, Infrastructure-Foundations: rancid: message has lines too long for transport - https://phabricator.wikimedia.org/T410606#11401128 (LSobanski) p:Triage→Low
[16:03:15] SRE-tools, Infrastructure-Foundations, SRE: Add an option to the reimage cookbook to also update firmware - https://phabricator.wikimedia.org/T410384#11401180 (cmooney) For a little bit more background we most regularly encounter PXEboot failures due to a firmware version on hosts with Broadcom BCM57...
[16:17:30] netops, Infrastructure-Foundations, SRE: Codfw row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910 (cmooney) NEW p:Triage→Medium
[16:17:41] netops, Infrastructure-Foundations, SRE: Codfw row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910#11401242 (cmooney)
[16:27:09] netops, Infrastructure-Foundations, SRE: Codfw row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910#11401277 (cmooney)
[16:35:49] Puppet, Infrastructure-Foundations, serviceops-radar, SRE: Fix UIDs for deployment server users - https://phabricator.wikimedia.org/T163667#11401343 (LSobanski)
[17:05:43] netops, Infrastructure-Foundations, SRE: Codfw row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910#11401603 (cmooney)
[17:15:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:58:24] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11402451 (RobH) Day 9 Update: * 9 hosts moved, 10 remain - 300 hosts total at start of migration * John worked with Ben directly to migrate the (8) Data Pla...
[20:46:59] SRE-tools, Infrastructure-Foundations, SRE: Add an option to the reimage cookbook to also update firmware - https://phabricator.wikimedia.org/T410384#11402862 (bking) Hey Moritz and Cathal, Just wanted to add my .02 as someone who's been bitten a few times by the firmware stuff, including writing [[...