[01:17:50] Out of the 392 problematic AS#, 339 don't have a NOC contact, 199 don't have either a NOC or Abuse contact (and there have been 35 error 500s while running the script). So unless we can find a proper contact database (maybe throw PeeringDB in the mix?) it's not worth spending more time on it
[01:19:59] hah
[03:28:09] sigh, our puppetization of icinga monitoring has $notes_url ... and $dashboard_link
[03:29:01] cdanis: what's wrong specifically?
[03:29:16] volans asking me what's wrong with an inconsistency
[03:29:18] now i've seen everything
[03:29:40] :D
[03:29:50] * volans kinda off
[03:31:48] but it's actually a different type of link
[03:32:04] though we should use dashboard_url if that's what you mean :)
[03:33:45] ┌──────────│ Unable to install GRUB in /dev/sda │ ────────┐
[03:33:49] │ │ Executing 'grub-install /dev/sda' failed. │ │
[03:33:53] │ │ This is a fatal error. │ │
[03:33:56] │ │ │ │
[03:34:08] aww :p and here i will try that again tomorrow
[03:34:24] was installing an install server
[12:08:16] Today's router upgrades are cr2-eqsin and cr3-knams
[12:08:28] in ~52min
[12:51:34] note that eqsin perf will be poorer around the reboot as our main 2 transits are on cr2-eqsin, cr1 still has telia and peering (which alone is more than cr2's transits)
[13:01:06] cr2-eqsin is now rebooting
[13:31:53] cr3-knams is rebooting
[13:46:19] and all good!
[13:56:54] \o/
[15:02:57] Loading debian-installer/amd64/initrd.gz...
[15:02:57] Boot failed: press a key to retry, or wait for reset...
[15:03:08] ^^ I just got that trying to reboot ncredir4002 on the debian buster installer
[15:03:18] moritzm: any ideas?
[15:04:13] (retrying in the meanwhile...)
[15:05:35] (I do love how I manage to hit all the corner cases...)
[15:05:40] hmmh, no other error message? just the BIOS failing to PXE boot?
[15:05:54] just that yeah
[15:06:02] it's not the BIOS
[15:06:20] the network boot begins and fails after loading the initrd.gz
[15:06:31] hmm it's working now
[15:06:34] let me check the logs
[15:13:16] on bast4002 I can see lpxelinux.0 being served to 10.128.0.33 at 15:00 and 15:03, no idea what went wrong in the first attempt
[15:14:51] maybe a net glitch
[15:15:00] or a puppet run in the middle
[15:18:41] ah, yes. puppet run is actually the most likely explanation, in fact I merged https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/564729/ earlier, which could have triggered the puppet run while your install was in flight
[15:46:24] err we can work here I guess for more vis
[15:46:42] XioNoX: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570672/ and the others sent with it basically (eqsin, then ulsfo, then the rest of codfw, then eqiad)
[15:47:11] we've had half of codfw and all of esams for a while now, haven't seen any specific issues
[15:48:00] XioNoX: each per-dc step, I'll have to do some pybal restarts and waiting between them
[15:48:05] but eqsin first
[15:48:16] adding the neighbors
[15:49:24] bblack: interesting, cr2 already had the config for the 3 LVS
[15:49:58] heh
[15:50:28] anyway, added now
[15:50:38] so it's ready for you
[15:51:22] waiting for agent runs to complete, then will start logging pybal restarts for the 3
[15:56:15] XioNoX: looks good?
[15:56:51] bblack: all good!
[15:56:57] it took an extra bit of time before cr1 went established from lvs5001
[15:57:09] but it's there now
[15:57:17] cr1 is... special
[15:57:24] ok :)
[15:57:54] unrelated but one of the 3 links between knams-esams didn't come back up after the reboot, light is good, a bounce didn't help
[15:58:04] will investigate later
[15:58:14] on to ulsfo!
[16:01:31] opened https://phabricator.wikimedia.org/T244497 for the interface
[16:05:32] XioNoX: let me know when you're ready for ulsfo
[16:05:37] ready to commit on ulsfo
[16:05:38] eh
[16:05:40] bblack: ^
[16:05:54] alright, committed
[16:07:53] should see all 3 now
[16:08:02] they all established on the pybal end anyways
[16:08:44] bblack: all up and receiving prefixes
[16:09:05] ok moving on to the codfw primaries puppet part
[16:09:48] (recap: we did lvs200[456] a while back there already, just lvs200[123] need the change now)
[16:09:56] fyi I have to leave in 50min
[16:10:14] ok
[16:13:58] XioNoX: ready when you are for codfw
[16:17:14] 1min
[16:18:17] all of them were already configured on cr1
[16:18:37] nice
[16:18:48] added the ones on cr2
[16:19:04] ack
[16:20:00] 10.192.1.1 is still not up on cr2
[16:20:18] everything else is
[16:20:40] I hadn't restarted it yet then, have now :)
[16:21:22] for lvs2003, cr1/2 conns both established quickly
[16:21:25] confirmed, all good
[16:21:33] for lvs200[12], cr1 came like 30 seconds after cr2
[16:21:37] either way, all connected now
[16:22:51] XioNoX: got time for eqiad or want to do it later/tomorrow/whatever?
[16:23:03] yeah it's fine
[16:23:05] ok
[16:28:25] XioNoX: ready for eqiad when you are
[16:28:42] bblack: and done!
[16:32:21] XioNoX: looks all green here
[16:33:04] bblack: here too!
[16:33:21] thanks!
[22:34:08] have others had issues installing an OS on a Ganeti VM (with the standard flat.cfg/virtual.cfg partman we use for all of them) where at the very end it fails writing GRUB to the hard disk?
[22:34:24] i am mostly surprised it happens on a VM like any other and not some new hardware