[02:05:34] hi, is outreachdashboard down?
[02:05:52] https://outreachdashboard.wmflabs.org/ I got an error message
[02:14:41] Hey, we at traffic have had some difficulty setting up our own puppetserver in wmcs and would love a little bit of a helping hand setting it up. I'm starting from scratch over at https://wikitech.wikimedia.org/wiki/Help:Project_puppetserver#How_can_I_use_a_project_puppetserver? without much success - the initial puppet run won't even complete (and seems to want to use a commit from July)
[08:27:46] brett: if you have more details (ex. project, puppetserver host, etc.) I might take a look at some point, a task would also help keep track
[08:35:17] !log globaleducation reboot peony-web lost network connectivity due to OOM killing things
[08:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Globaleducation/SAL
[16:02:20] dcaro: Thanks, it's the "traffic" project, and the host is 'traffic-puppetserver-bookworm'
[16:03:09] hieradata/roles are set via "prefix puppet"
[16:44:50] brett: if you are still here, I can help you set up the puppetserver.
[16:45:15] Step one is generally to make sure that the puppetserver is not trying to manage itself. You can do that by setting 'puppetserver: puppet' in hiera for the instance
[16:45:16] andrewbogott: I'd love that, thank you!
[16:45:24] I did set that, yes
[16:45:49] ok. This is traffic-puppetserver-bookworm ?
[16:45:55] mind if I log in and see what I can see?
[16:45:57] oh wait, puppetmaster: puppet, not puppetserver
[16:46:05] absolutely, do what you need to
[16:46:05] oh you're probably right
[16:46:25] It says puppetmaster: puppet on the wt page
[16:46:43] I'm unable to SSH to the instance, though, it just disconnects me
[16:47:06] im confused by the whole "become" process, can someone give me a hand and walk me through it? ive created the tool account in toolsadmin and when i try to become the tool it says i need to log out and back in, but ive done that: https://i.imgur.com/5lGtXhn.png
[16:48:18] owuh: try `ssh -Snone owuh@login.toolforge.org`
[16:48:20] brett: ok, so backing up :) you only just built this VM, right?
[16:49:08] owuh: I can have a look in a few minutes. It might also be the kind of thing that starts working after you wait a bit :)
[16:49:32] -Snone worked, time to read the man pages to see what the hell -Snone is
[16:49:33] andrewbogott: Correct, fresh instance
[16:49:40] owuh: it disables connection sharing
[16:49:52] brett: is it re-using the name of an older instance by any chance?
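The `-Snone` trick works because OpenSSH can multiplex several logins over one shared "master" connection, and `-S none` tells a single invocation to skip the control socket entirely. A small sketch of the related standard OpenSSH commands, reusing the hostname from the discussion (nothing here is Toolforge-specific):

    ssh -S none owuh@login.toolforge.org    # ignore any shared master connection for this one login
    ssh -O check owuh@login.toolforge.org   # ask whether a master connection for this host is still alive
    ssh -O exit owuh@login.toolforge.org    # tell the master to exit, so the next ssh starts a genuinely new session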
[16:50:02] the “Shared connection to login.toolforge.org closed” message is a hint, but also pretty misleading: ssh will still reuse the connection for a future ssh
[16:50:12] so from the server’s perspective, you’re still in the same session and didn’t log in again
[16:50:21] i always wondered what that message meant
[16:50:32] assuming you didn’t configure some monstrously long lifetime for your shared connections, it should resolve itself pretty soon
[16:50:42] or you can manually delete the shared connection socket wherever it is
[16:50:46] depends on your ControlPath in your SSH config
[16:50:56] brett: ok, so this is barely even a puppetserver issue at this point, more of a 'puppet doesn't apply at all on the server' issue
[16:51:02] there’ll be a socket file in that directory, `rm` that and ssh won’t be able to reuse the connection anymore ^^
[16:51:06] I'm resetting the certs to take a second try
[16:51:38] (killall ssh or reboot would also work)
[16:52:26] brett: it's getting further now -- very possible that you happened to create this vm at the exact lucky moment when I was updating the central puppetserver and it rejected the cert request. But we'll see what happens next :)
[16:52:35] my ssh config has an entry for my homelab, plus SendEnv LANG LC_* and XAuthLocation /opt/X11/bin/xauth both on Host *
[16:53:28] other than that it's the default
[16:53:50] brett: is it your intent to store puppet config/certs/etc on a cinder volume? I see a volume in your project 'traffic-puppetserver' but it isn't mounted anyplace
[16:54:13] andrewbogott: I had tried destroying/creating a few times yesterday, so I wouldn't think it was clashing timing-wise
[16:54:29] were you using the same hostname each time by chance?
[16:54:37] (that should work but it's a possible explanation)
[16:54:37] the cinder volume was created some months ago when I was following the steps but I don't think we need it
[16:54:45] Yeah, same hostname
[16:55:01] ok
[16:55:17] Would you suggest a different one?
[16:56:20] not needed at this point. But sometimes puppetservers aren't great at recycling certs, and the cert name is bound to the fqdn.
[16:56:28] So I think at least one of the things you ran into was that.
[16:56:47] so now I'm getting "ERROR: Unable to obtain the current branch" is that what tripped you up the first time around?
[16:57:54] Where are you seeing that error?
[16:58:35] puppet output
[16:58:38] when I run-puppet-agent
[16:58:47] oh, you can ssh in?! odd
[16:58:57] oh, now I can
[16:59:13] Yeah -- you couldn't before because the initial puppet run failed so nothing was set up at all
[16:59:20] that was because of the cert conflict
[16:59:26] How did you fix that?
[16:59:51] I cleaned the cert on the central puppetserver -- not something you could have easily done (but it would've worked with a different hostname)
[16:59:57] aha, thanks
[17:00:23] and eventually I would've gotten an alert about there being a cert associated with a deleted VM and cleaned it up. eventually
[17:00:45] So now we're tripping over something with ownership. This is a familiar issue to me: git doesn't like root to look at repos that aren't owned by root
[17:01:00] and that's a newish git behavior so none of this was constructed with that in mind. Although I thought we had worked around it...
[17:01:47] Is there some sort of old deployment config happening here or something? Related to the fact that it's trying to apply from a puppet repo commit back in July?
[17:03:19] Where are you seeing that?
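The ownership problem described above is git's newer safe.directory check: when root pokes at a repository owned by another user, recent git refuses, and anything that asks for the current branch fails. A generic sketch of the symptom and one way around it; the repository path below is a placeholder, not the actual checkout location on these hosts:

    git -C /path/to/puppet-checkout rev-parse --abbrev-ref HEAD              # as root this fails with "detected dubious ownership" on newer git
    sudo git config --system --add safe.directory /path/to/puppet-checkout   # whitelist the checkout so root may read it again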
[17:03:58] > Info: Applying configuration version '(16128ca149) Andrew Bogott - validatecloudvpsfqdn.py: Support projects with project_name in fqdn'
[17:05:12] oh, hm, I could have sworn it was in operations/puppet
[17:06:07] oh, yeah, it is
[17:06:31] But that's this morning, so I was mistaken. It *was* doing an ancient copy last night
[17:06:37] just also authored by you :)
[17:08:34] owuh: interesting, maybe your distro configures a default ControlPath in /etc/ssh/ssh_config or similar? (AFAICT it’s not enabled by default on my system, I did it manually)
[17:09:12] if it’s still not working without -Snone, then you should be able to see the path it’s using in ssh -v and can keep looking from there
[17:09:28] if it is now working, then the connection probably expired in the background and nothing needs to be done :)
[17:10:20] it works without -Snone (however i did also restart because macfuse kext weirdness), but i included /etc/ssh/ssh_config too
[17:10:49] ok
[17:11:14] anyway, if it works now that’s okay, and I guess it’s not super important where connection sharing is being enabled ^^
[17:11:40] probably just macos being weird
[17:12:45] anyways, gotta bounce, lunch is over
[17:15:16] brett: I'm still looking, I see what looks like an obvious mistake in modules/profile/manifests/puppetserver/git.pp but unsure
[17:16:08] I really appreciate this
[17:17:42] no problem, it's at least partially my mess to clean up
[17:29:28] pretty sure https://gerrit.wikimedia.org/r/c/operations/puppet/+/1071038 has introduced a chicken/egg issue for new servers. jhathaway might have thoughts
[17:29:56] eyes...
[17:31:02] any chance you meant to 'require' rather than 'subscribe'? It's not obvious to me what's happening but subscribe feels wrong on first glance
[17:32:06] jhathaway: take me with a grain of salt; I don't have evidence that this worked properly before your patch either
[17:32:36] what error do we get at present?
[17:32:50] https://www.irccloud.com/pastebin/PM29O6so/
[17:33:08] that implies that puppetserver-deploy-code is running as the wrong user I think?
[17:33:17] or just running before things are set up properly
[17:34:08] hm, actually, on a working puppetserver /usr/local/bin/puppetserver-deploy-code doesn't work at all, as either gitpuppet or as root
[17:34:18] * andrewbogott more confused
[17:35:23] OK! the magic is that I have to su - to gitpuppet and then 'sudo /usr/local/bin/puppetserver-deploy-code'
[17:35:50] that works on a working (old) server
[17:35:52] but on the new server
[17:35:56] "Sorry, user gitpuppet is not allowed to execute '/usr/bin/sudo /usr/local/bin/puppetserver-deploy-code' as root on traffic-puppetserver-bookworm.traffic.eqiad1.wikimedia.cloud."
[17:36:12] so this is an ordering issue; the sudo rule needs to be set up before whatever else
[17:38:16] jhathaway: I suspect that https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121664 will fix
[17:38:28] let me try creating that rule by hand and see if it unbreaks things
[17:39:11] does the sudo rule exist on disk?
[17:39:33] huh yes it's there already
[17:41:03] ...and now it's getting further
[17:41:18] so maybe there's a delay for the sudo rule to take effect? I'm very puzzled.
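The sudoers question above can be checked directly with standard tooling; this is just the sequence already described in the conversation, run as root on the new puppetserver:

    sudo -l -U gitpuppet                           # list which commands gitpuppet is allowed to run via sudo
    su - gitpuppet                                 # become gitpuppet, the way the deploy flow does
    sudo /usr/local/bin/puppetserver-deploy-code   # the call that fails until the sudoers rule is actually in place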
[17:41:35] puppetserver crashes now, but that feels like progress
[17:43:04] there is no sudo daemon, so there should be no delay
[17:43:12] it reads its configuration every time
[17:43:50] yeah
[17:44:06] so I'm going to have to build a fresh server to see if I can get hung up back in that same spot
[17:44:16] since apparently deploying once totally changes what puppet does forever after
[17:44:32] meantime... thoughts about why the puppetserver won't start? I'm going to start by just rebooting the host, for superstitious reasons
[17:46:27] anything in the log file?
[17:46:58] It starts with "Feb 21 17:40:37 traffic-puppetserver-bookworm java[67708]: Exception in thread "main" clojure.lang.ExceptionInfo: throw+: {:kind :puppetlabs.kitchensink.core/io-error, :msg "Unable to create directory '/etc/puppet/puppetserver'"}"
[17:47:05] but that directory definitely already exists, and is owned by puppet
[17:47:31] hmm
[17:47:33] https://www.irccloud.com/pastebin/Dp6zmj7I/
[17:47:44] this is traffic-puppetserver-bookworm.traffic.eqiad1.wikimedia.cloud
[17:49:08] I don't have access, but happy to take a look
[17:49:09] Hey, I have to step out for the next hour - thanks so much for taking a look at this - I'll be back!
[17:50:17] jhathaway: you have access now
[17:50:24] \o/
[17:50:57] well, if the cache ever refreshes
[17:51:34] * andrewbogott reboots it again
[17:52:04] in
[17:52:26] great, if you restart the puppetserver you will see what I see
[17:54:54] trying..
[17:57:28] looks like this file, /etc/puppet/puppetserver/logback.xml, is missing
[17:57:36] which is part of the puppetserver package
[17:57:41] maybe blown away in testing?
[17:58:06] not sure why we don't manage that file ourselves in puppet
[17:58:30] oh, that would do it. I'll try reinstalling the package...
[17:59:21] that was from the journal log
[18:00:52] reinstalling the puppetserver seems to have not created that file? weird
[18:01:04] I will let you try :)
[18:01:58] fyi, you won't be able to build a new server from scratch if it uses the "ssldir_on_srv" setting and/or is in cloud
[18:02:24] at least not until we fix another dependency issue after that one from the other day
[18:02:40] one was fixed but there is a second one
[18:03:36] you'd get.. Unable to create directory '/etc/puppet/puppetserver'
[18:03:55] mutante: yep, that's what we're getting
[18:03:58] do you know the fix?
[18:04:14] no, I just fixed "Cannot create /srv/puppet/server; parent directory /srv/puppet does not exist."
[18:04:19] which was before this one
[18:04:28] but maybe I can create another patch
[18:04:38] oh actually that's not what I'm seeing, hang on...
[18:04:41] is ssldir_on_srv the default?
[18:04:52] I think it was hiera cloud.ya
[18:04:56] yaml
[18:05:04] but maybe I am wrong and it's not all of cloud
[18:05:55] it's also possible the /srv/puppet issue was only for ssldir_on_srv but the "/etc/puppet/puppetserver" is for any puppetserver
[18:05:56] so yeah, it's the default
[18:06:43] I can try to upload another patch and add you as reviewer. but need to stare at it first.
[18:06:52] so later today?
[18:06:59] I've built two fresh servers just now and haven't seen that error yet
[18:07:01] other ones though
[18:07:07] interesting
[18:07:35] then...I wonder what is different there compared to the puppetserver I created
[18:08:10] but also good that you don't get that one.. hmm
[18:08:55] they all use module puppetserver, not puppetmaster..
right
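For the crash above, the usual checks are ordinary systemd/dpkg commands; this sketch assumes the Debian puppetserver package that the conversation refers to:

    journalctl -u puppetserver -n 50 --no-pager    # the "Unable to create directory" exception shows up here
    ls -ld /etc/puppet/puppetserver                # confirm the directory exists and check its owner and mode
    dpkg -S /etc/puppet/puppetserver/logback.xml   # which package claims to ship the missing file
    dpkg -L puppetserver | grep -F logback         # list what the installed package is supposed to provide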
[18:10:30] puppet 7 is such a hog, I can barely test because my test server grinds to a halt when I ask it to compile a catalog
[18:11:01] andrewbogott: puppetserver is reinstalled, but puppet apply just seems to hang
[18:11:05] how were you running it?
[18:11:24] run-puppet-agent as root
[18:11:40] thanks
[18:11:44] doing so now
[18:13:55] jhathaway: can you let me know if you get "Unable to create directory '/etc/puppet/puppetserver'" or not
[18:14:17] yup, did the first time after the puppet run, trying again...
[18:14:59] yes, same error
[18:15:00] but you did get: /srv/puppet, /srv/puppet/server, /srv/puppet/server/ssl
[18:15:03] that's the progress :)
[18:15:29] 24 hours ago it would have failed before that
[18:15:40] so now it's ONLY that wall of text that the process is crashing, heh
[18:16:25] how about you try "mkdir /etc/puppet" and run it again.. just to see if that's all or there is more lurking beyond that
[18:16:33] then let's try to fix it regardless
[18:17:17] as andrewbogott mentioned, the dir exists, so something strange is going on
[18:17:24] is this really the first time we create new servers since we use module puppetserver though?
[18:17:45] mutante: no, but it has never worked quite right and every time we fix one thing another thing breaks
[18:17:53] got it
[18:18:19] jhathaway: that host only has 2G of ram, maybe it's running out of memory and giving us a fake error
[18:18:51] oooh, that would be nasty, can we bump the ram to confirm?
[18:18:51] soo. Unable to create directory '/etc/puppet/puppetserver'
[18:18:59] what happens when we manually create that dir ?
[18:19:08] well the dir is already there
[18:19:13] and matches the perms in prod
[18:19:20] oof, ok.. nod
[18:19:39] puppetserver has this weird code where it uses puppet to create all its configuration files
[18:19:52] that's not a network mount by any chance, is it?
[18:20:00] probably sounded like a good idea, but they have recently ripped that code out in puppet 8 I believe
[18:20:05] jhathaway: I'm starting to think we should just abandon this particular server since I may have knocked it into an inconsistent state while troubleshooting.
[18:20:21] I have a fresh build that is actually puppet-only so we should probably focus our interest there.
[18:20:31] It is abogott-puppetserver-test-1.testlabs.eqiad1.wikimedia.cloud
[18:20:37] and, step one, it's showing mutante's puppet error
[18:20:49] there is also puppetmaster-1004.devtools to compare to, fwiw
[18:20:51] Error: /Stage[main]/Puppetserver/File[/srv/puppet/server]/ensure: change from 'absent' to 'directory' failed: Cannot create /srv/puppet/server; parent directory /srv/puppet does not exist
[18:20:57] no agent uses it currently and it has that same issue
[18:21:19] andrewbogott: that one I fixed yesterday
[18:21:30] jhathaway: I will resize that server to satisfy my curiosity though :) Prepare to get kicked off when it reboots
[18:21:31] by https://gerrit.wikimedia.org/r/1121079
[18:21:45] andrewbogott: sounds good
[18:22:11] maybe double check which upstream puppetmaster is actually being used by the new master
[18:22:17] if any
[18:22:46] https://phabricator.wikimedia.org/T382960#10568942
[18:24:18] mutante: you're right, I had a local change on my puppetserver that was breaking syncs
[18:24:20] trying again...
[18:24:47] !ah, ack. good!
[18:24:58] I expect now you get the other issue about /etc/
[18:25:31] yup same error
[18:25:33] hmm
[18:26:25] trying a change...
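Two quick sanity checks for the low-memory theory raised above, plus the permissions comparison that was done by hand; all plain coreutils/systemd commands, nothing WMF-specific:

    free -h                                       # is the 2G host actually short on memory?
    journalctl -k | grep -i 'out of memory'       # any trace of the kernel OOM killer
    stat -c '%U:%G %a' /etc/puppet/puppetserver   # owner, group and mode, for comparison with a working server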
[18:27:20] jhathaway: fyi more RAM didn't help
[18:27:33] nope
[18:27:40] and test-1 still isn't picking up latest puppet, I'm fighting with the server
[18:27:47] * andrewbogott really hates the extra deploy step and unreliability thereof
[18:28:26] ok.. so what exactly makes it "unable" to create the dir
[18:28:37] the rest seems to all come from that
[18:28:46] trying to debug as well in devtools project
[18:29:14] at puppetlabs.puppetserver.certificate_authority$eval41610$ensure_directories_exist_BANG___41615$fn__41616.invoke(certificate_authority.clj:1541)
[18:30:05] jhathaway: ok, now it's running the latest puppet and is still failing. If you want to apply puppet patches, abogott-puppetserver-test-1.testlabs.eqiad1.wikimedia.cloud is a client of abogott-puppetserver.testlabs.eqiad1.wikimedia.cloud
[18:30:17] I am logging out so I don't step on toes
[18:31:17] tried chmod 775 /etc/puppet/puppetserver to let puppet group write to it. then puppet fixes it back to 0755
[18:32:58] seeing what happens if I completely remove that dir so puppet can try to create it from scratch
[18:33:11] and that fixes the puppet run :o
[18:33:28] and the server actually starts up?
[18:33:37] no :/ lol, that would have been too easy
[18:33:49] still crashes.. but the puppet agent finishes
[18:34:04] what server are you on?
[18:34:17] puppetmaster-1004.devtools
[18:34:20] ah, I see
[18:34:21] ok
[18:34:25] no toe-stepping then!
[18:34:32] and nevermind.. it just recreates the dirs and then it's back to the same error that says it... can't create the dir
[18:34:37] after it JUST did that
[18:34:46] exactly, definitely no toe-stepping
[18:34:51] I need to go turn over my laundry but I'm staging a couple more test servers so ping me if/when either of you has a patch that you want to test on a fresh host
[18:35:14] ok
[18:37:43] mkdir -p /srv/puppet/server/ssl/ca; chown puppet: /srv/puppet/server/ssl/ca
[18:37:47] allowed it to start
[18:37:54] the error was errouneous
[18:38:31] the issue was the dangling symlink from ca -> /srv/puppet/server/ssl/ca
[18:38:37] :o
[18:39:01] perhaps there is no puppet code to create that directory?
[18:39:20] !bash jhathaway> the error was errouneous
[18:39:20] mutante: Stored quip at https://bash.toolforge.org/quip/Od3OKZUBvg159pQrmUwT
[18:39:57] :)
[18:41:25] there is another case of "ensure_resource" that does this
[18:41:35] ensure_resource does not do dir trees
[18:41:38] mkdir_p does
[18:42:17] I think it's like the previous fix.. let me try a patch
[18:44:29] jhathaway: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121682
[18:44:43] let's just create the full /srv/puppet/server/ssl/ca ?
[18:45:03] instead of just to /srv/puppet/server/
[18:45:16] see the code right after that that creates the symlink
[18:45:35] yeah that seems correct
[18:45:51] this was the previous fix https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121079/4/modules/puppetserver/manifests/init.pp
[18:46:31] though I'm not sure why we have these sub dirs, if they are not used? why not just create /srv/puppet/server/ca and link to that?
not that it matters much
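The dangling-symlink diagnosis can be confirmed with something like the following; the exact location of the `ca` symlink is an assumption based on the error message, while the mkdir/chown pair is the manual fix quoted verbatim above:

    readlink /etc/puppet/puppetserver/ca                              # where the symlink points, e.g. /srv/puppet/server/ssl/ca
    test -e /etc/puppet/puppetserver/ca || echo "dangling symlink"    # -e follows the link, so a missing target is reported
    mkdir -p /srv/puppet/server/ssl/ca                                # recreate the missing target...
    chown puppet: /srv/puppet/server/ssl/ca                           # ...and hand it to the puppet user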
[18:47:25] ah never mind, looks like ssl is populated as well
[18:47:32] so your patch seems correct
[18:47:53] +1
[18:48:37] ok, and since this is all inside the "if $ssldir_on_srv" I am just going to merge
[18:48:42] because it won't affect prod
[18:49:12] yup
[18:50:46] deployed on puppetserver1001 / prod
[18:52:11] ffs
[18:52:23] Duplicate declaration: File[/srv/puppet/server/ssl] is already declared
[18:53:20] ok im back, is there a way to have my toolforge account (owuh) be able to write to files owned by the tool account (tools.owuhbot) so i can edit directly on toolforge?
[18:54:04] jhathaway: we need to drop the "file { $ssl_dir:" now but only if $ssldir_on_srv ... sigh
[18:54:39] hmm, maybe just create the ca path
[18:54:47] since we only need one more directory?
[18:55:03] mkdir_p is magical, but causes woes like this
[18:55:18] I tend to just create the directories manually
[18:55:35] that would mean creating the entire tree manually for each level
[18:56:11] our custom function exists to fix this
[18:57:13] imho it's an ordering problem. the whole if $ssldir_on_srv section is BEFORE file { $ssl_dir:
[18:57:33] right, but sometimes it causes other problems, because it tries to create a puppet resource for every directory in the chain, which is not always what you want
[18:57:41] owuh: making the file group-writable should work, I think (since owuh is in the tools.owuhbot group), but in general I think we don’t encourage editing files directly on toolforge these days ^^
[18:57:49] since the source code should be tracked on git and stuff
[18:58:09] and if you use the build service, the tool (or bot) might not even use the shared NFS files anymore
[18:58:39] mutante: we also have a specific mode on the ssl dir
[18:58:45] which we can't do with mkdir_p
[18:58:50] if we'd use ensure_resource also for the ssl_dir itself, instead of plain file{}
[18:58:59] then it should also go away?
[18:59:08] what should?
[18:59:14] the duplicate declaration
[18:59:27] because ensure_resource would not care that it already exists?
[18:59:35] lucaswerkmeister: makes sense, however this really makes me feel otherwise: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Tool_accounts#Mount_your_tools_home_directory_onto_your_local_machine
[19:00:00] well ensure_resource only works reliably if both resources are identical
[19:00:11] but since we have different directory modes
[19:00:14] jhathaway: we can do modes with mkdir_p. I am already doing that with the previous patch. line 141
[19:01:39] but ssl_dir is 771, so if we create it with mkdir_p on 141, we would get 751, right?
[19:02:20] owuh: heh, yeah I’d say that dates back to an older style of tool development ^^
[19:02:21] 2 mkdir_p? one for $ssl_dir with 771 and another one for $ssl_dir/ca with 751 ?
[19:02:31] but it probably still works
[19:03:19] alrighty. should it be changed then?
[19:04:00] mutante: I would rather just use explicit file resources and avoid the complexity
[19:04:00] I’d be fine with leaving it alone for now
[19:04:06] but interested what others think too
[19:04:06] alright
[19:04:13] or keep the mkdir_p for /srv/puppet/server
[19:04:25] and create the other two dirs, ssl & ca with file resources
[19:04:39] also, for whatever reason, im not able to log in to wikitech despite having SUL set up
[19:04:42] i.e. just add a ca file resource
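On the Toolforge side question: since the personal account is in the tool's group, making files group-writable is one way to edit in place, though keeping the source in git (or using the build service) is the encouraged workflow. A sketch using the account and tool names from the conversation; the filename is purely hypothetical:

    become owuhbot            # switch from the personal account to the tool account
    chmod g+w my-script.py    # as the tool: let members of the tool's group edit this (hypothetical) file
    exit                      # back to the personal account, which can now write to it
    id owuh                   # confirm that owuh really is a member of the tools.owuhbot group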
[19:07:10] jhathaway: so this?
https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121684/1/modules/puppetserver/manifests/init.pp
[19:07:50] I am making a sandwich and will be right back :)
[19:08:23] I would just use a normal file resource, stepping away for lunch as well
[19:11:29] cccccbukvgbcbfukvchcehngbutughngefcefcedlvue
[19:11:49] grrr:) cu
[19:12:50] does that mean the sandwich was good?
[19:16:14] no, it means I still did not get it even with sudo
[19:16:18] oh
[19:16:29] was the sandwich good?
[19:17:13] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121684
[19:17:27] I am going to find out once the eggs are boiled.
[19:17:34] afk
[19:17:37] fair enough
[20:00:34] !log lucaswerkmeister@tools-bastion-13 tools.lexeme-forms deployed 81611bc5dc (l10n updates: pa, tr)
[20:00:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[20:19:29] Thanks everyone for looking into the puppetserver issue :)
[20:51:17] of course
[21:23:29] andrewbogott: in -feed there's now a bunch of puppet failure alerts from different puppetservers :(
[21:59:28] thanks taavi. That could easily be from the backscroll or from my juggling project id and project name.
[21:59:30] I'll look
[22:02:43] mutante, jhathaway, puppet servers now say
[22:02:45] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: File[/srv/puppet/server/ssl] is already declared at (file: /srv/puppet_code/environments/production/modules/wmflib/functions/dir/mkdir_p.pp, line: 69); cannot redeclare (file:
[22:02:45] /srv/puppet_code/environments/production/modules/puppetserver/manifests/init.pp, line: 171) (file: /srv/puppet_code/environments/production/modules/puppetserver/manifests/init.pp, line: 171, column: 5) on node metricsinfra-puppetserver-1.metricsinfra.eqiad1.wikimedia.cloud
[22:02:56] Are you still around and working on this or should I take the torch?
[22:11:24] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121702
[22:16:27] updated
[22:17:24] mutante cut this patch, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121684
[22:17:32] which should fix the issue
[22:17:45] happy to take that over if that helps, andrewbogott?
[22:19:11] I don't understand how that helps, unless it includes removal of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121682?
[22:19:33] But yes, if you're still working then please follow up, I need to eat dinner :) My patch above is also an option although it is not deeply thought-out.
[22:20:31] ok, let me push a change to mutante's patch
[22:22:49] geohack seems down
[22:31:17] andrewbogott: patch merged
[22:38:14] well, traffic-puppetserver-bookworm's now running the agent happily :)
[22:39:00] brett: great
[22:39:17] still need to perform some rebuilds from scratch, but at least the dup def is gone
[22:41:35] jhathaway: existing puppetservers seem happier, let's see if I can build a scratch server....
[22:43:47] praise the sun, my client is now finally applying the catalog. Hooray :). Thanks so much for all your help.
[22:43:55] :)
[22:46:57] great!
[22:48:24] the initial deploy run still fails, but it's getting further!
[22:48:44] I'm vanishing again but you can see the status quo on abogott-puppetserver-test-2.testlabs.eqiad1.wikimedia.cloud
[22:50:36] I see the merge. And that the duplicate definition is gone. Thanks.
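Once the fix is merged, an affected project puppetserver should recover as soon as its code checkout has synced and the agent runs again; a sketch, where the /srv/puppet_code path comes from the error message above and run-puppet-agent from earlier in the discussion (the deploy step as gitpuppet may still be needed if the checkout is stale):

    git -C /srv/puppet_code/environments/production log -1 --oneline   # is the fix commit deployed on this puppetserver yet?
    sudo run-puppet-agent                                               # re-run the agent; the duplicate declaration error should be gone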
[22:51:04] yea, once again we just moved one issue to the next. But this one does not affect existing servers. so good.
[22:51:42] it's not like one thing breaks another. it's a series of problems we fix one at a time. so...
[22:52:05] now it's could not open /etc/puppet/puppetserver/logback.xml
[22:52:42] that file is supplied by the package, but I think we should just move it into the module
[22:53:00] I prefer to have all the config files under our own control
[22:54:28] the package is installed but it's "no such file or directory"
[22:54:40] does it get created on first run of the service?
[22:55:01] on my existing server it's there.. on the new one it's not
[22:55:15] both have the puppetserver package installed
[22:55:42] yeah, I needed to purge, then re-install to get the file back
[22:55:53] ah!
[22:57:48] !log devtools puppetmaster-1004 - apt-get remove --purge puppetserver; run-puppet-agent
[22:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[22:58:30] I have a running service and no more puppet errors on my instance. HELL YEA, finally :)
[23:00:29] it seems the purge was needed for multiple reasons! puppetserver config was now changed in a bunch of ways
[23:00:50] so puppet tried to start the service before it edited the configs
[23:24:19] Hm.... I'm having some more perplexing behavior.... I'm trying to get puppet to apply to traffic-acmechief01 but it's complaining about a missing hieradata lookup (profile::acme_chief::cloud::designate_sync_project_names), which is specified in the puppet prefix...
[23:26:57] brett: I'll have a look
[23:27:08] Thanks. Sorry for the trouble!
[23:30:26] The error I see there is about profile::acme_chief::cloud::designate_sync_password which is not defined in the prefix
[23:30:57] * brett facepalms
[23:31:04] easy one :)
[23:31:05] clearly the same words
[23:31:29] Thank you...
[23:34:35] hm, but that is set in labs-private under hieradata/labs/cloudinfra/common.yaml...
[23:36:03] Considering that was committed in 2022 I imagine it was functioning fine before
[23:36:16] brett: but are you in the project cloudinfra?
[23:36:33] I am not. Didn't realize that was its own project
[23:36:38] Thank you again!
[23:37:03] brett: try putting it in hieradata/cloud/eqiad1/traffic/common.yaml
[23:38:12] ah, labs-private. a real password? not a fake one? ok.. but you get the idea
[23:39:26] wait what about cloudinfra?
[23:39:34] the fake password in labs-private will be needed so you can use the puppet compiler
[23:39:48] * andrewbogott is lost
[23:40:28] but this sounds like you (also) need it in the actual project context in regular hiera data
[23:41:35] * andrewbogott is caught up
[23:41:55] andrewbogott: the designate_sync_password is set to a dummy value in labs/private. but it's project-level for project cloud-infra
[23:42:01] brett: on the traffic-local puppetmaster I would just stick it right in /srv/git/labs/private/hiera/clouds.yaml
[23:42:02] so would not apply to project traffic
[23:42:07] yep, you're right
[23:42:21] I would not do that because it means next time you have to do that again
[23:42:58] but I am also not sure if we are talking about a REAL or a fake password now
[23:43:09] Oh yeah, I was assuming a real password
[23:43:45] if it's a fake password it can go in the git repo or just be set to a dummy value on horizon hiera
[23:45:24] understood. Thanks!
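For hiera puzzles like this one, `puppet lookup` run on the project puppetserver shows which layer (prefix, project, labs/private, etc.) finally answers for a given key; the full FQDN below is an assumption pieced together from the project and hostname mentioned in the conversation, and the lookup may need cached facts for that node:

    puppet lookup profile::acme_chief::cloud::designate_sync_password \
      --node traffic-acmechief01.traffic.eqiad1.wikimedia.cloud --explain   # --explain prints every hierarchy level consulted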