[10:43:11] moritzm: cp4026 buster installer is asking for input
[10:43:20] right now the partitioning config
[10:43:24] I guess that's not expected
[10:43:53] but maybe it was a L8 from my side, let me try again... O:)
[10:48:12] which partman recipe?
[10:48:30] custom/varnish.cfg
[10:48:40] modules/install_server/files/autoinstall/partman/custom/varnish.cfg
[10:50:27] I remember that we ran into a problem with one broken partman recipe during the initial buster tests (where updated d-i flagged an error which was previously silently ignored), let me check Phabricator
[10:50:57] ah, forgot about the broken search...
[10:51:34] I've restarted the process because I was worried that an early Enter I hit could affect the installer behaviour
[10:51:57] the installer first stops on "Download debconf preconfiguration file", asking for the netmask
[10:52:28] even though "ip addr" shows a valid config: inet 10.128.0.126/24 scope global enp5s0f0
[10:53:09] from the log..
[10:53:16] https://www.irccloud.com/pastebin/kO6dz9W2/
[10:53:23] that might be a recurrence of the issue you once had with the lvs/codfw? I can merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/564729/ and we can re-try
[10:54:06] same for the gateway
[10:54:21] moritzm: hmmm I don't think so
[10:54:28] dhcp works as expected
[10:54:33] ok
[10:54:35] and it's configuring the proper NIC
[10:54:36] it looks different from what we saw with those es hosts, indeed
[10:55:10] also the gateway prompt is prefilled with the proper value
[10:57:24] vgutierrez: can I have a look over the mgmt? idrac says it's currently in use
[10:57:42] let me close my ssh :)
[10:57:52] take over please
[10:58:06] right now you should see the installer waiting for partman instructions
[11:15:11] it's puzzling! can you re-trigger the reimage, I'd like to have a look over the serial console if there's anything printed error-wise in d-i which isn't in the logs
[11:16:29] moritzm: sure, triggering
[11:16:57] moritzm: it should reboot anytime now
[11:17:05] ack, indeed.
[11:18:04] the bug I remember earlier was probably https://phabricator.wikimedia.org/T229915, but not used in varnish.cfg
[11:41:05] sigh :)
[11:41:55] there's no error printed and also nothing obvious in the logs, the netcfg settings are correctly taken for private1-ulsfo.cfg
[11:42:58] looking at the partman d-i log next, but if there's no clue, we could narrow it down by temporarily switching the partman recipe to the standard.cfg to see whether there's a more fundamental issue with that host
[11:43:13] sure
[11:43:22] let me open a task first
[11:46:21] https://phabricator.wikimedia.org/T243506
[11:47:33] moritzm: cp4026) echo partman/standard.cfg partman/raid1-dev.cfg should do the trick, right?
[11:47:50] *raid1-2dev :)
[11:47:59] ack
[11:48:06] submitting CR...
[11:50:26] moritzm: https://gerrit.wikimedia.org/r/566711
[11:53:26] looks good, let's give that a shot
[11:55:06] triggering puppet on installservers...
[11:56:16] and triggering a new reimage
[11:56:23] ack
[11:56:28] moritzm: should I take care of the serial console?
[11:56:46] yep, I left the console
[11:56:50] ack
[11:57:17] for some reason /var/lib/preseed/log on cp4026 did not contain the netmask setting, I'll dig into what could have caused this
[12:05:48] interesting..
[12:06:01] same issue with standard raid1-2dev
[12:06:02] :/
[12:15:01] both for the netmask and the partition scheme?
[12:15:56] yep
[12:16:03] same behaviour
[12:17:44] vgutierrez: can I have a peek at the serial console again?
[12:18:14] sure, go for it
[12:45:35] moritzm: considering that the partitioning is already done from previous installations I'm tempted to just reuse it
[12:49:06] I think something more fundamental is broken here, it's odd that this affects the two setting blocks that are conditionally selected in http://apt.wikimedia.org/autoinstall/preseed.cfg (picking the subnet settings and the partman recipe)
[12:51:03] right
[12:52:25] hmm anybody else reimaged a buster host recently?
[12:53:46] we could e.g. reimage cp1008, which is a spare
[12:54:05] cp1008 really needs to be decom'ed :)
[12:54:15] it's one of the constant outliers in all of our planning/reports/etc.
[12:54:26] I know its hostname by heart at this point :)
[12:54:41] hmmm I think I got it..
[12:54:47] https://github.com/wikimedia/puppet/commit/eec313c1c39f32ed23ee200c889a63a53ff8a060#diff-b777cdc5868aac9dd7fbc98011bc614f
[12:54:56] it's 9 years old as of next week
[12:55:01] moritzm: it looks like a \ is missing there? on line 69
[12:55:42] let me fix that :)
[12:56:02] doh! good catch
[12:57:16] https://gerrit.wikimedia.org/r/c/operations/puppet/+/566723/1/modules/install_server/files/autoinstall/netboot.cfg
[12:58:11] paravoid: bring it to the all hands for an in person 9 year old party!
[12:59:20] fwiw traffic folks, I just pinged T229586 about it
[12:59:20] T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 - https://phabricator.wikimedia.org/T229586
[13:00:02] paravoid: ack :)
[13:00:07] moritzm: amended
[13:00:22] ack, looking
[13:09:32] moritzm: may I get the console back on cp4026?
[13:10:14] (triggering a new reimage now)
[13:11:37] ack, I'm off it
[13:12:05] thx <3
[13:16:44] moritzm: so, netmask/gateway issue solved.. let's see the partman...
[13:17:03] it'll be solved as well I'm pretty sure :-)
[13:17:19] fixed as well :)
[13:18:23] great :-)
[13:19:18] thx for the help debugging the issue
[13:19:23] I'm off for lunch now!
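The fix discussed above was a missing `\` in netboot.cfg, and the log notes that the breakage hit both conditionally selected blocks (subnet settings and partman recipe). As a rough illustration of the per-host selection only, here is a minimal shell sketch built around the `cp4026) echo ...` line quoted at 11:47. The function name `pick_recipe` and the two fallback branches are hypothetical and are not copied from netboot.cfg.

```shell
#!/bin/sh
# Sketch of hostname-based partman recipe selection.
# Only the cp4026 branch comes from the log; pick_recipe and the
# other two branches are assumptions made for illustration.
pick_recipe() {
    case "$1" in
        # Temporary override discussed in the log (with the raid1-2dev correction).
        cp4026) echo "partman/standard.cfg partman/raid1-2dev.cfg" ;;
        # Assumed default for the remaining cache hosts.
        cp*)    echo "partman/custom/varnish.cfg" ;;
        # Assumed generic fallback for any other host.
        *)      echo "partman/standard.cfg" ;;
    esac
}

# Print the recipe files that would be selected for cp4026.
pick_recipe cp4026
```

In the real file this logic is presumably embedded as one multi-line command in which every line but the last ends in `\`. If that assumption holds, a single missing backslash cuts the command short, which would be consistent with the installer prompting for both the netmask and the partitioning.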
[13:21:33] how do you usually count an array length in puppet?
[13:22:45] size()?
[13:31:09] arturo: yes
[13:31:38] although i think it may be a stdlib function and not part of puppet core
[13:36:28] thanks!
[15:48:55] anyone know of any examples of icinga check commands that also require a secret (like an API key)?
[15:49:55] ah, check_mysqlstatus is one
[15:57:23] ok, another dumb question, any preference between secret() and the passwords module for newer stuff?
[16:04:09] cdanis: if it can be looked up using hiera i would put it in /srv/private/hieradata. otherwise secrets
[16:05:09] no need for it to be hiera-izible
[16:09:46] cdanis: oh my god what have you done. I'll never unsee "hiera-izible"
[16:10:14] * jbond42 is still trying to mentally pronounce it
[16:10:23] jbond42: ema: at some point we should reify the hieraification of secrets
[16:11:47] cdanis: as part of the pki work I'm doing this Q I'll be evaluating vault so i will also look at its secret management. vault has a custom hiera backend so it should integrate nicely for secrets
[16:12:16] i just wanted to say 'reify' 🙃
[16:12:23] :)
[17:30:07] I'm not really sure who'd be interested in this, but we just had a hypervisor collapse due to memory issues...but the memory has been silently throwing errors for over a year. So I'm interested in ways to surface dell hardware issues (because this is something dell's idrac can surface) a bit better https://phabricator.wikimedia.org/T243533 <-- in case anyone has good suggestions or is interested in the issue