[06:30:36] hello hello
[06:31:00] the logstash puppet warnings are due to a change that we (analytics) did yesterday probably, checking
[06:35:50] yep kafka and es are trying to declare openjdk-8 at the same time
[06:56:18] so there are two problems:
[06:57:35] 1) profile::java::java_8, included by kafka, declares openjdk-8, at the same time as elasticsearch::packages. With some conditionals this shouldn't be a big deal (not elegant but I don't have a better idea)
[06:58:47] 2) profile::java::java_8 adds a rule for update-alternatives, to set openjdk-8. This was done to prevent that openjdk-11, installed by accident when upgrading to buster, could cause problems
[06:59:22] I can surely remove it, but then we need to make sure that on the new logstash nodes on Buster the following doesn't happen
[06:59:37] 1) kafka deploys openjdk-8
[07:00:03] 2) es deploys openjdk-11, and update alternatives is set to 11 (when the package is installed)
[07:00:11] 3) kafka runs with 11 rather than 8
[07:00:15] <_joe_> so, 1 - use profile::java::java8 everywhere and remove it from elasticsearch::packages
[07:00:52] <_joe_> or, switch to 11
[07:00:58] I thought about it but they are different, since elasticsearch::packages allows also to set openjdk-11 for example
[07:01:03] we cannot sadly
[07:01:06] (switch)
[07:01:06] <_joe_> ok so
[07:01:36] <_joe_> create a profile::jdk which installs either or based on a parameter
[07:01:39] maybe we could have a more generic profile to include
[07:01:45] yes yes
[07:02:12] I am a bit afraid of messing up with all the es code, but I can try something and see what gehel thinks about it
[07:02:56] the generic profile should also take into account that we may need two different openjdks at the same time
[07:03:09] like on Buster for logstash10xx (we'll need 8 for kafka and 11 for es)
[07:06:54] <_joe_> ok, then it's ok to have two profiles, and include them both
[07:07:05] <_joe_> ofc they'll need to be properly differentiated
[07:20:10] * elukey starts battling with puppet
[07:21:56] <_joe_> https://imgflip.com/i/41jc7t
[07:43:47] what populates /srv/deployment/spicerack/cookbooks/ on cumin hosts?
[07:47:05] kormat: ssh://ayounsi@gerrit.wikimedia.org:29418/operations/cookbooks
[07:47:13] ahh, thanks :)
[07:47:32] kormat: see also https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Creating_your_local_environment
[07:49:17] cheers :)
[08:20:36] what was the solution to "DNS Discovery operations diffs"
[08:20:54] again ? the anchor to the link in icinga doesn't exist
[08:20:59] https://wikitech.wikimedia.org/wiki/DNS/Discovery#Discrepancy that is
[08:27:45] godog: iirc it's because the confd state has been changed from the default
[08:28:13] <_joe_> godog: I think that alert needs to be reviewed
[08:28:19] <_joe_> but lemme take a look
[08:28:20] you can see the current state when you expand the alert
[08:28:22] ah
[08:29:13] mmhh interesting, it might have to do with my thanos-query service addition to conftool earlier in the week
[08:29:45] <_joe_> godog: is it a critical or a warning?
[08:30:02] a warning
[08:31:07] <_joe_> ok, I think this script uses some configuration that needs updating (??)
[08:41:34] indeed, https://gerrit.wikimedia.org/r/c/operations/puppet/+/596607
[13:13:10] hmmm anybody else experiencing issues with webproxy.eqiad.wmnet:8080?
[13:16:06] my very basic dashboard says no: https://grafana.wikimedia.org/d/i5YA-BXWz/squid?orgId=1
[13:16:08] not so far. i can confirm squid is running on install1003 which is behind that
[13:16:17] vgutierrez: all ok here as well
[13:17:35] yup... it looked like a glitch on the OCSP responder for a few minutes
[13:18:09] and then acmechief crashed?
[13:18:55] yeah, acme-chief assumes that the OCSP responder is gonna be always there
[13:19:26] easy fixable though :)
[13:19:56] good
[14:57:47] huh, either ripe atlas is having problems, or the internet-at-large is having problems
[14:58:32] https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-target_site=All&var-ip_version=ipv4&var-country_code=All&var-asn=All
[15:06:44] hi! I have a question about creating a Ganeti VM. should I use the cookbook myself or is it recommended to open a task on Phabricator?
[15:08:19] sukhe_: the tasks are still recommended but it doesn't mean you can't create it yourself later
[15:09:22] mutante: ah I see. so the task is to document that it is being created and then I can just use the cookbook?
[15:09:57] the purpose of the tasks is to have a common understanding of needed resources and some oversight
[15:10:10] ok great, thanks. I will do that
[15:12:41] I am just going to create the task and let someone else do it for now, given it's the first time I am doing this
[15:13:39] fair enough, happy to be subscribed
[15:16:31] heh, CF thinks there is higher than normal packet loss on the internet at large, interesting
[15:16:40] mutante: thanks!
[15:21:44] gerrit is super slow for me, it hangs while doing
[15:21:45] trace: run_command: unset GIT_PREFIX; ssh -p 29418 elukey@gerrit.wikimedia.org 'git-upload-pack '\''/operations/puppet'\'''
[15:21:52] anybody else with the same issue?
[15:21:58] (that is after a git pull --rebase)
[15:22:30] I can reach the port via nc but I only see the sshd welcome
[15:25:29] elukey: I think the internet itself sucks right now
[15:25:57] probably, I am running traceroute and it doesn't look good
[15:26:24] https://i.imgur.com/D1kguOH.png in case grafana doesn't load either ;)
[15:30:01] hi all i have added a minor CR to add yamllint to the private repo, not sure who best to add so if you're interested please take a look and comment https://gerrit.wikimedia.org/r/c/operations/puppet/+/596649
[15:30:18] +1 to the idea for sure
[15:30:27] to clarify add yamllint as a pre-commit hook
[15:31:18] Facebook just bought Giphy for 400 million
[15:32:08] trying to make insta compete with tic-toc
[18:09:04] (following up from earlier; RIPE Atlas metrics now look reasonably back-to-normal as of about... 16:20 to 16:50, depending on the site)
[18:09:07] Has anyone done a successful pxe boot today or yesterday? I can't get it to work but haven't yet investigated if the issue is local to the (3, now) servers I'm trying it on