[03:41:35] * Krinkle adds Grafana to [[Category:Software using the GNU AGPL license]]
[03:41:45] https://grafana.com/blog/2021/04/20/qa-with-our-ceo-on-relicensing/
[06:00:46] hello hello
[06:02:52] yo yo
[06:03:43] I was about to ask if anybody new about the cr1-2 codfw interfaces down, but I see Chris/Arzhel's chat
[06:03:47] *knew
[06:04:01] goood
[10:37:06] anyone familiar with `gbp buildpackage` oddities? I have a branch I'd like to build off of named envoy-future, and I have an upstream named envoy-future-upstream. I'm overriding these using --git-debian-branch and --git-upstream-branch but the command still ends up building off of master
[10:37:18] is there a flag I've missed or a behaviour I need to override?
[10:38:49] I've also got the envoy-future branch checked out when I start the build
[11:18:25] hnowlan: I usually use `sbuild` directly instead https://wiki.debian.org/sbuild which should just use whatever you have in the filesystem, without paying attention to git
[11:20:11] git-integrated build procedures are very convoluted in my experience
[11:20:40] the only tools I use from gbp are `gbp dch` and `gbp import-orig`
[12:00:35] hnowlan: do you have perhaps a .git/gbp.conf file? or a debian/gbp.conf? They might be overriding --git-upstream-tree to not be branch. Also note that --git-upstream-tree defaults to TAG (it needs to be set to BRANCH for --git-upstream-branch to be honored)
[12:22:24] akosiaris: there is a debian/gbp.conf now that you mention it but it doesn't have any overrides that might be throwing it off unfortunately. I specify --git-upstream-tag as well which should offset any side effects of --git-upstream-tree, but I'm trying to force BRANCH at the moment
[12:22:58] arturo: I might end up trying that, thanks! short-term I'm trying to stay within gbp to fit into the existing workflow
[12:44:26] hnowlan: if you need further assistance with sbuild, I could translate for you what gbp buildpackage does into specific git commands, so you just sbuild +
[12:44:32] + git
[13:19:27] _joe_: Hi! I updated https://gerrit.wikimedia.org/r/c/operations/puppet/+/674077/ to support the current cert system as well to not break production, could you take a look at some point when you have a moment? thanks in advance
[13:23:17] <_joe_> Majavah: sure :)
[15:06:43] _joe_: are there docs about how folks should access MW API in production?
[15:06:52] am searching wikitech... :)
[15:07:14] <_joe_> ottomata: sorry, meeting, but nothing updated and modern I think :)
[15:07:19] <_joe_> guilty as charged
[15:08:18] seve just asked me and i'm trying to find docs for him!
[15:08:36] <_joe_> ottomata: 301 to me :)
[15:08:48] haha ok
[16:59:02] While re-imaging `wdqs1006`, the reboot timed out after 60 minutes. Now trying to re-image again gives `01:16:37 | wdqs1006.eqiad.wmnet | Unable to run wmf-auto-reimage-host: Signed cert on Puppet not found for hosts ['wdqs1006.eqiad.wmnet'] and no_raise=False:`
[16:59:36] The error's pretty self-explanatory, but not sure on the best way to proceed. Should I just set the `--no-raise` flag or is there a better way to kick off another re-image?
[17:03:05] ryankemper: give me a sec
[17:03:29] add the --no-verify
[17:03:35] skips the first verification
[17:03:47] volans: ack, thanks
[17:03:54] the other related option is --new
[17:04:00] for new hosts
[17:06:18] ryankemper: also it's worth checking why it failed
[17:06:22] the first time
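A minimal sketch of the retry discussed just above: only the script name, the target host, and the --no-verify / --new flags are taken from the log; running it with sudo from a cumin host and the argument order are assumptions.

```
# Hedged sketch: re-run the reimage, skipping the initial verification that
# aborted the second attempt ("Signed cert on Puppet not found"). The flag
# names and host come from the conversation above; sudo and argument order
# are assumptions.
sudo wmf-auto-reimage-host --no-verify wdqs1006.eqiad.wmnet

# --new is the related option mentioned above, for hosts being imaged for
# the first time.
```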
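Back on the `gbp buildpackage` branch question from earlier in the log, a minimal sketch of forcing both branches explicitly, assuming the branch names quoted above; the key detail is that --git-upstream-tree defaults to TAG and has to be set to BRANCH for --git-upstream-branch to take effect. The debian/gbp.conf lines show the equivalent standard gbp option names, not the contents of the actual file in the repository.

```
# Hedged sketch: build from the envoy-future packaging branch, taking the
# upstream tree from a branch instead of a tag.
gbp buildpackage \
    --git-debian-branch=envoy-future \
    --git-upstream-branch=envoy-future-upstream \
    --git-upstream-tree=BRANCH

# Equivalent persistent overrides in debian/gbp.conf (standard gbp option
# names with the "git-" prefix dropped; the real file may differ):
#
#   [buildpackage]
#   debian-branch = envoy-future
#   upstream-branch = envoy-future-upstream
#   upstream-tree = BRANCH
```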
[17:14:14] volans: any tips on how to do that? the logs of the reimage itself don't give any detail, so I imagine I want logs from `wdqs1006` directly to figure out what was preventing the reboot, but it looks like the host must have rebooted (just not in time) since it's denying my pubkey when I try to ssh in
[17:14:20] do we ship syslogs to kibana perhaps?
[17:15:04] * volans in a call, can check the logs in a few minutes
[17:17:41] cool, no rush :) thanks for the help
[17:19:55] * ryankemper is glancing at `ryankemper@cumin1001:~$ sudo vi /var/log/wmf-auto-reimage/202104210006_ryankemper_14697_wdqs1006_eqiad_wmnet.log` rn
[17:20:40] so it did reboot into PXE and went into debian installer
[17:21:03] then the script waits for the d-i to reboot it again after the base installation is completed
[17:21:10] as it didn't do that, most likely it's stuck in d-i
[17:21:17] you can ssh into it with the install_console
[17:21:49] sudo install_console wdqs1006.eqiad.wmnet
[17:21:52] ryankemper: ^^^
[17:22:00] ah no, it's in a bysybox
[17:22:04] *busybox
[17:23:57] mdadm: cannot open /dev/sda3: No such file or directory
[17:24:02] Error creating array /dev/md1
[17:24:08] is the partman recipe correct?
[17:26:39] Let me check on netbox to make sure the drive count is correct, but it should be (at least drive #) since we're going from `raid10-8dev` -> `raid0-8dev`
[17:28:10] netbox has no knowledge of disks
[17:28:45] feature request!
[17:29:21] you can check /dev/disk on the host or the procurement task on phab
[17:29:40] I'll check on phab (and spot check the partman change)
[17:30:02] my paste above is from /var/log/syslog on the busybox
[17:30:09] you have to use `more` as there is no `less` ;)
[17:43:04] https://phabricator.wikimedia.org/T186349#4007529 / https://phabricator.wikimedia.org/T188432
[17:43:35] I only see 1 SSD drive listed on that but I must be reading/interpreting it wrong, I'll try glancing at /dev/disk on busybox
[17:45:31] Hmm I suppose busybox wouldn't tell me and I'd have to look on the actual host, which I can't access currently?
[17:46:37] the host doesn't exist anymore ;)
[17:46:43] can you run partman on the busybox?
[17:46:54] if not you can reboot into d-i forcing PXE
[17:47:01] connect via ssh or via the mgmt console
[17:47:15] and then see what happens at d-i time
[17:47:30] Never mind, /dev/disk does work
[17:47:39] I was being dumb and not looking at `/dev/disk/by-id` :P
[17:48:18] If I'm interpreting this correctly then it actually is just 4 disks? https://www.irccloud.com/pastebin/CfOu8wBE/
[17:48:51] looks like it
[17:49:52] sorry almost dinner time here, starting to prepare
[17:49:59] No worries! I'll take it from here
[17:51:29] Ah I should have glanced at partman earlier, we have this set for `partman/raid0-4dev.cfg` so it's supposed to be 4 devices
[17:52:14] So the number of devices is correct; this host went from `raid10-4dev.cfg` -> `raid0-4dev.cfg`, so at least the # of devices matches
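A minimal sketch of the disk check done above from the d-i busybox shell, assuming the usual /dev/disk/by-id naming on this hardware; the grep patterns used to drop partition entries and duplicate wwn-* aliases are assumptions, not taken from the paste.

```
# Hedged sketch: list physical drives via their by-id symlinks, filtering out
# per-partition entries (-partN) and wwn-* aliases that point at the same disk.
ls /dev/disk/by-id/ | grep -v -- '-part' | grep -v '^wwn-'

# In the busybox environment, page long output with `more`; `less` is not
# available (as noted above).
```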