[01:28:48] FIRING: PuppetFailure: Puppet has failed on ms-be1092:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:33:48] FIRING: [2x] PuppetFailure: Puppet has failed on ms-be1092:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:33:48] FIRING: [2x] PuppetFailure: Puppet has failed on ms-be1092:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[07:17:51] that's one of the new nodes, presumably it didn't image properly last night for the DC team
[07:43:38] Could I get a sanity check on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164981 ? Alex seem to be on vacations
[07:48:36] yes
[07:51:44] thank you
[07:52:25] no review will be perfect, but the more eyes to spot simple mistakes, the better
[07:52:51] we are finally getting rid of backup1001
[08:30:53] Is it normal that ms-be2077 is so loaded?
[08:35:15] it's a bit unusual
[08:36:25] but it's not a full or obviously-yet-failing disk, so I don't think there's anything to do right now. The container-replicator is using a chunk of CPU, but there's plenty spare
[10:13:56] Finally completed the freeing of 16U T387892
[10:13:57] T387892: Decommission backup1001, backup1002, backup2001, backup2002 (and their arrays) - https://phabricator.wikimedia.org/T387892
[10:15:40] very close to finish T376916 too
[10:15:41] T376916: Upgrade backup hosts to Debian Bookworm 12.X - https://phabricator.wikimedia.org/T376916
[10:21:41] both ms-be10xx nodes reimaged fine once I'd cleaned out sda
[10:22:40] look out for CRs to put them in the rings soon :)
[10:49:23] Can I get a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1165851 to add new backends to swift::storagehosts and then https://gerrit.wikimedia.org/r/c/operations/puppet/+/1165852/1 to add them to the rings and drain one more old node, please?
[10:49:36] CR text should hopefully point at the relevant docs :)
[10:50:35] what about ms-be1091 ?
[10:50:58] that's the test SM node with the new controller card in; it does need bringing into prod, but that's a bit more complicated
[11:00:12] taking a break
[11:00:40] thanks for the reviews :)
[13:45:07] FYI @team I'm switching to IRCCloud