[07:38:36] <moritzm>	 headsup; I'll migrate the eqiad installserver to nftables in 10 minutes
[08:05:17] <godog>	 mmhh I launched this pipeline/job again and it is stuck in the same spot, thoughts/ideas on what it could be ? https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/jobs/829846
[08:09:47] <godog>	 ok I'll just wait I guess, looks like slow disk i/o
[19:33:18] <brouberol>	 I have a host (kafka-jumbo1016) seemingly stuck in PXE boot. I've been able to take a screenshot via the IDRAC web ui.  https://wikimedia.slack.com/archives/C055QGPTC69/p1779303718528979?thread_ts=1779274323.128399&cid=C055QGPTC69
[19:33:28] <brouberol>	 Does this ring a bell for anyone?
[19:34:56] <brouberol>	 hmm actually maybe I should ask #wikimedia-sre-foundations
[19:39:22] <sukhe>	 brouberol: I wonder if this is because you have a partman recipe specified for BIOS but it is attempting to do UEFI instead (which is the default for all reimages)
[19:39:56] <sukhe>	 and that probably requires a reprovision with --legacy 
[19:40:01] <sukhe>	   'kafka-jumbo101[0-8]':
[19:40:01] <sukhe>	     - reuse-parts.cfg
[19:40:01] <sukhe>	     - partman/custom/reuse-kafka-jumbo.cfg
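A quick way to confirm that kafka-jumbo1016 falls under the `kafka-jumbo101[0-8]` key quoted above. This is a sketch that assumes the key uses shell-style glob semantics, where `[0-8]` matches a single character in that range. Python's `fnmatch` stands in for whatever matcher the installer config actually uses, and `matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Hostname glob copied from the config snippet above.
# ASSUMPTION: matched with shell-style glob rules against the short hostname.
PATTERN = "kafka-jumbo101[0-8]"


def matches(hostname: str, pattern: str = PATTERN) -> bool:
    """Return True if the short hostname matches the glob pattern (case-sensitive)."""
    return fnmatchcase(hostname, pattern)


if __name__ == "__main__":
    for host in ("kafka-jumbo1010", "kafka-jumbo1016", "kafka-jumbo1018", "kafka-jumbo1019"):
        print(host, matches(host))
```

Under that assumption, 1010 through 1018 pick up the reuse-parts recipes and 1019 does not. A fully qualified name such as `kafka-jumbo1016.eqiad.wmnet` would not match a bare short-name pattern.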
[19:40:31] <sukhe>	 I may be wrong, but we saw someting similar for a recent LVS reimage, see https://phabricator.wikimedia.org/T421421#11914273
[19:40:35] <sukhe>	 I can't type, something
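sukhe's hypothesis depends on whether the machine boots via UEFI or legacy BIOS. On a running Linux system, including a rescue or installer shell, one common check is whether `/sys/firmware/efi` exists: the kernel only creates it when it was booted through UEFI firmware. The minimal sketch below implements that check. `boot_mode` is a hypothetical helper, not part of any cookbook.

```python
import os


def boot_mode(sysfs_root: str = "/sys") -> str:
    """Return "uefi" if the running kernel was booted via UEFI, else "bios".

    /sys/firmware/efi is present only when the kernel was started by UEFI
    firmware; if it is absent, the system booted in legacy BIOS mode.
    """
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "uefi" if os.path.isdir(efi_dir) else "bios"


if __name__ == "__main__":
    print(boot_mode())
```

The `sysfs_root` parameter only exists so the check can be pointed at a fake tree for testing.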
[19:41:20] <brouberol>	 Thanks! brb, my daughter can't sleep, I'll come back when I can
[19:46:59] <brouberol>	 Nice, so I should kill the current cookbook, and simply run the provisioning one with --legacy?
[19:47:24] <brouberol>	 At least there shouldn’t be any harm in trying this I guess 
[20:08:21] <brouberol>	 (famous last words, as I've never used the cookbook before. Here's to hoping I don't f things up some more)
[20:18:29] <brouberol>	 hmm, the cookbook does not seem to be able to shut down the host. I'm aborting
[20:18:38] <brouberol>	 welp, I probably made it worse then
[20:24:39] <jhathaway>	 sukhe: yeah that makes sense
[20:24:43] <jhathaway>	 nice find
[20:26:52] <brouberol>	 jhathaway if that's ok with you, I'll follow up here instead of #-sre-foundations to avoid too much x-channel activity 
[20:27:08] <jhathaway>	 please do
[20:27:14] <brouberol>	 so, I'm attempting to re-run the reimage cookbook, to see if it still fails the same way 
[20:28:11] <brouberol>	 I got to
[20:28:12] <brouberol>	 Running IPMI command: ipmitool -I lanplus -H kafka-jumbo1016.mgmt.eqiad.wmnet -U root -E chassis bootparam set bootflag none options=reset
[20:28:12] <brouberol>	 Running IPMI command: ipmitool -I lanplus -H kafka-jumbo1016.mgmt.eqiad.wmnet -U root -E chassis bootparam get 5
[20:28:12] <brouberol>	 Running IPMI command: ipmitool -I lanplus -H kafka-jumbo1016.mgmt.eqiad.wmnet -U root -E chassis bootparam get 5
[20:28:12] <brouberol>	 Checked BIOS boot parameters are back to normal
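The logged IPMI sequence first sets the boot flag, then reads boot option parameter 5 (the "boot flags" parameter in the IPMI spec) to verify it. The sketch below only rebuilds those argv lists and does not run `ipmitool`. `ipmitool_cmd` is a hypothetical helper, and the flags are copied from the log. `-E` makes ipmitool read the password from the `IPMI_PASSWORD` environment variable.

```python
import shlex


def ipmitool_cmd(mgmt_host: str, *args: str, user: str = "root") -> list:
    """Build an ipmitool argv shaped like the ones in the cookbook log.

    -I lanplus : IPMI v2.0 RMCP+ LAN interface
    -E         : password taken from the IPMI_PASSWORD environment variable
    """
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_host, "-U", user, "-E", *args]


MGMT = "kafka-jumbo1016.mgmt.eqiad.wmnet"

# Step 1: set the boot flag, with exactly the arguments seen in the log.
reset = ipmitool_cmd(MGMT, "chassis", "bootparam", "set", "bootflag", "none", "options=reset")
# Step 2: read boot option parameter 5 (boot flags) to verify the result.
check = ipmitool_cmd(MGMT, "chassis", "bootparam", "get", "5")

if __name__ == "__main__":
    for cmd in (reset, check):
        print(shlex.join(cmd))
```

Printing with `shlex.join` reproduces the command lines as they appear in the log, which makes it easy to rerun one by hand when debugging.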
[20:28:12] <brouberol>	 [1/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Reboot for kafka-jumbo1016.eqiad.wmnet not found yet
[20:28:25] <brouberol>	 and it's back to looping over these checks
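The `[1/240, retrying in 10.00s]` line means the cookbook polls for the reboot up to 240 times. If the delay stays constant at 10 s, the worst case is 239 sleeps, or 2390 s (about 40 minutes), before it gives up. The sketch below is a generic constant-delay retry loop that illustrates that behaviour. It is not spicerack's own retry machinery, and `retry_constant` and `RetryError` are hypothetical names.

```python
import time
from typing import Callable


class RetryError(Exception):
    """Raised when every attempt has failed."""


def retry_constant(check: Callable[[], bool], tries: int = 240, delay_s: float = 10.0) -> int:
    """Call check() until it returns True, sleeping delay_s between attempts.

    Returns the 1-based number of the attempt that succeeded.
    ASSUMPTION: constant delay, as suggested by the log line.
    """
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        if attempt < tries:
            print(f"[{attempt}/{tries}, retrying in {delay_s:.2f}s] not found yet")
            time.sleep(delay_s)
    raise RetryError(f"gave up after {tries} attempts")


if __name__ == "__main__":
    # Worst-case total sleep with the numbers from the log: 239 sleeps of 10 s.
    print((240 - 1) * 10, "seconds")
```

In practice, if the host is stuck in the installer, this loop can only end by exhausting its attempts or by an operator aborting it.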
[20:29:57] <brouberol>	 with the same data on screen visible through the IPMI web ui
[20:31:09] <jhathaway>	 from the serial console it seems to be stuck in the debian installer at the moment
[20:31:13] <jhathaway>	 [/dev/sda] ERROR: recipe partition count (2) != actual partition count (1)
[20:31:19] <jhathaway>	 is the error on the screen
[20:31:54] <brouberol>	 ooh, I might have been looking at the wrong thing then, aka the vconsole in the IPMI ui instead of the serial console
[20:34:05] <jhathaway>	 the first part of the error is "reuse-parts: Recipe mismatch with existing partitioning"
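The reuse-parts error compares the number of partitions the recipe expects on `/dev/sda` (2) with the number actually on the disk (1). The sketch below shows the same comparison. It is not the installer's code: it assumes partitions are counted via sysfs, where each partition of `/sys/block/<disk>/` is a subdirectory containing a `partition` file. `count_partitions` and `check_recipe` are hypothetical helpers.

```python
import os
from typing import Optional


def count_partitions(disk: str, sysfs_block: str = "/sys/block") -> int:
    """Count partitions of a whole disk (e.g. "sda") using the sysfs layout."""
    disk_dir = os.path.join(sysfs_block, disk)
    return sum(
        1
        for entry in os.listdir(disk_dir)
        # Partition subdirectories (sda1, sda2, ...) carry a "partition" attribute file.
        if os.path.isfile(os.path.join(disk_dir, entry, "partition"))
    )


def check_recipe(disk: str, recipe_count: int, sysfs_block: str = "/sys/block") -> Optional[str]:
    """Return an error string shaped like the installer's message, or None if counts agree."""
    actual = count_partitions(disk, sysfs_block)
    if recipe_count != actual:
        return (f"[/dev/{disk}] ERROR: recipe partition count ({recipe_count}) "
                f"!= actual partition count ({actual})")
    return None


if __name__ == "__main__":
    print(check_recipe("sda", 2))
```

Running a check like this from a rescue shell before reimaging could show whether the existing layout still matches the recipe the host will pick up.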
[20:34:20] <brouberol>	 tell you what, it's 10:30PM local time, way past the time to futz with parted. I'll pick this up in the morning w/ b.tullis and we'll see how we can fix this
[20:34:43] <jhathaway>	 nod, happy to help as well, enjoy your evening
[20:34:57] <brouberol>	 I'd rather we don't lose the data on the second disk, as otherwise we'd force some pretty large catchup from other brokers (about ~10TB of data)
[20:35:15] <brouberol>	 s/the second disk/the RAID array
[20:35:21] <brouberol>	 thanks again y'all