[00:27:30] mutante: what does malformed membership mean?
[00:27:31] I should have the same permissions as klausman, another recently-onboarded SRE
[05:23:45] volans: for when you are around: I was executing the decommissioning script and it has reached the point where it removes the DNS entries, however there are lots of things there I wasn't expecting, so I am not sure how to proceed further: https://phabricator.wikimedia.org/P12740
[05:31:56] marostegui: they are all good, you can go ahead
[05:32:14] I can give context in 10m
[05:32:56] thanks for checking, really appreciated!
[05:32:58] ok, going ahead then!
[05:32:59] thanks
[05:45:15] marostegui: so, at the time of the mass import from puppetdb, there were a few new hosts for which the DNS entries were already added to the dns repo but which had not been reimaged yet.
[05:45:56] given that after the reimage a netbox script is run to import all the host interfaces into Netbox, it was deemed we could just get those auto-imported at reimage time.
[05:46:35] I could make the reimage script run the sre.dns.netbox cookbook too, but in all normal cases it's not needed, so I thought to skip that.
[05:46:51] I'll follow up with dcops to ask them to run it when reimaging those specific hosts (just a few left)
[05:47:22] also, unrelated, I'm finishing a patch that adds a check that alerts on IRC if there are uncommitted changes
[05:47:25] volans: ah, I see, gotcha. Thanks for the context, I was surprised to see that large diff so I thought it'd be better to ask
[05:47:52] you did the right thing! we should always do that, it's like puppet merge
[05:48:56] :)
[06:09:22] mutante, razzi - the warning should be related to the fact that razzi is now in the 'ops' posix group, so membership in the other analytics-related ones is not needed; I'll do the clean-up
[06:12:34] mmm, but in theory https://github.com/wikimedia/puppet/commit/00c0f4fd0cb4309c38ccf0c5f661cfd201308ff3 should have fixed it
[06:49:22] I am trying to save a new page on wikitech and I get "Error contacting the Parsoid/RESTBase server (HTTP 404)"
[06:55:08] ah ok, so I had to create the empty page first, then I added the content and it worked
[07:02:54] elukey: lol, same here, see -serviceops
[07:06:16] <_joe_> it would be helpful if SREs were able to report bugs with some added information. Like, after opening the "network" tab in your browser's dev tools :P
[07:09:22] it is very easy to repro, I think: a new page, not yet saved, with some content (this was my use case)
[07:09:58] also I just finished my coffee, so I think I am excused from debugging mediawiki before it :D
[07:10:28] <_joe_> the issue is
[07:10:32] also I tried to nerd-snipe volans, still not sure if I managed it
[07:10:36] <_joe_> no response from the server is a 404
[07:10:52] <_joe_> mediawiki-api-error: apierror-visualeditor-docserver-http
[07:11:16] <_joe_> oh right, the action api
[07:11:19] <_joe_> 200 ok on error
[07:11:30] <_joe_> @$#%$#@%@!
[07:11:35] thanks for the fix elukey
[07:11:48] s/fix/workaround/
[07:14:24] <_joe_> ok so it's fixed on officewiki
[07:15:52] <_joe_> so you just need to wait for wikitech to be rolled to .10
[07:25:42] <_joe_> we have 12 logstash criticals on icinga, AIUI it's for kibana-next, but it's still clogging the icinga UI and should be acknowledged
[07:26:12] <_joe_> godog: do you know if anyone is looking into it?
[07:27:10] <_joe_> anyhow, I am uncomfortable with having a whole cluster with notifications disabled, so that no one is alerted when something goes bad, but still reporting all errors in the icinga UI
[07:27:57] _joe_: which icinga url? I'm looking at icinga.w.o/alerts but don't see the logstash alerts?
[07:28:29] <_joe_> https://icinga.wikimedia.org/icinga/
[07:28:41] <_joe_> if you click on "critical"
[07:29:16] <_joe_> logstash2020-2031 are all alerting with CRITICAL - logstash-2020.09.12[0](2020-09-19T02:21:15.243Z)
[07:30:07] ok, no, I'm not aware of anyone looking into it; also that cluster isn't in production yet afaik
[07:30:24] I usually look at /alerts, which has a higher SNR
[08:49:31] the ~100 warnings on icinga about cert expiry - are those real/being handled?
[08:49:38] they've been there for a few days now
[09:16:44] <_joe_> ema: ^^
[09:18:33] ack'ing
[09:19:17] <_joe_> so they're being handled :)
[09:19:58] yeah, they're in state "someone is aware" :)
[09:20:12] ah, this is like an accessor in python. merely by checking, a side-effect occurs \o/
[09:26:47] very low priority, but I'm looking for comments on a CR proposal to replace the puppet `os_version` function, if people have time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/626723 ( _joe_, akosiaris, shdubsh? thanks)
[09:27:12] <_joe_> jbond42: ack
[09:27:19] thx
[09:40:25] <_joe_> hey all, I've seen some confusion (understandably so, the style guide is not clear enough) on when it's ok to require a profile from another profile
[09:40:32] <_joe_> in our puppet styleguide
[09:40:55] <_joe_> I tried to clarify what my intentions were back in the day: https://wikitech.wikimedia.org/wiki/User:Giuseppe_Lavagetto/Profiles_Including_Profiles
[09:41:20] <_joe_> I would like to hear feedback, and maybe proceed to integrate some of that into the style guide to further clarify what's stated there
[09:42:37] _joe_: a 💡 went off in my head when I read it. suddenly things made sense to me
[09:42:59] <_joe_> yeah, I realized that the wording was really ambiguous
[09:43:16] <_joe_> and I think practical examples help explain the spirit of the advice
[09:44:10] <_joe_> also: it's always advice, YMMV, there can be special cases. But if you only stumble on special cases, maybe something else is wrong :)
[09:47:00] jbond42: ^ FYI
[09:51:04] kormat: thanks, and _joe_ that looks good to me. I'm wondering if it also covers https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/hadoop/worker.pp#L19 i.e. both hadoop::master and hadoop::worker need some set of scaffolding which is included in hadoop::common. this is arguably the same as _joe_'s example, but also a bit different
[09:54:54] <_joe_> jbond42: can you imagine a server with profile::hadoop::worker installed and without profile::hadoop::common?
[09:55:18] <_joe_> If not, then I think it's exactly the intended use case
[09:58:44] _joe_: (obviously a question better answered by elu.key) but no. if you think it's covered then ignore, I'm happy :)
[10:03:56] I can't tell if that means our profiles are bad or not
[10:03:57] :D
[10:04:51] ::common is a dependency (in my head) of worker: it installs a lot of configs and packages that are needed
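[Editor's note: for readers following along, a heavily simplified Puppet sketch of the pattern discussed above. The class bodies, parameters and role name are hypothetical stand-ins, not the real profile::hadoop manifests, which take many more parameters.]

```puppet
# Hypothetical, trimmed-down illustration of "a profile requiring a profile".

# Shared scaffolding every Hadoop node needs (packages, config, users).
class profile::hadoop::common (
    String $cluster_name = lookup('profile::hadoop::common::cluster_name'),
) {
    # ... common packages, /etc/hadoop configuration, system users ...
}

# A worker cannot exist without the common scaffolding, so the dependency is
# declared here with `require`, instead of relying on every role to remember
# to include both profiles.
class profile::hadoop::worker () {
    require ::profile::hadoop::common

    # ... worker-specific daemons (DataNode, NodeManager), monitoring ...
}

# The role stays a thin aggregation of profiles.
class role::hadoop::worker {
    include ::profile::hadoop::worker
}
```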
[15:41:04] elukey: ack, thank you
[18:21:59] Anyone have context on where `logmsgbot` lives? https://wikitech.wikimedia.org/wiki/Logmsgbot appears to be out of date given I can't find a systemd unit with that name (or any with the string "irc" or "log")
[18:24:03] Nevermind, it lives on `alert1001`
[18:24:23] yeah, that just changed today
[18:24:26] cc herron ^
[18:24:46] Appears to be broken now, it's restarting every few seconds and complaining about no ident response
[18:24:57] https://www.irccloud.com/pastebin/VZv56IVx/
[18:25:31] hmm, having a look
[18:26:13] https://www.irccloud.com/pastebin/FQVUA8bu/
[18:27:31] herron: a few notes for later: scap didn't complain that logmsgbot is not working, which means SAL may miss deploys :/
[20:41:05] I don't understand this warning:
[20:41:09] wmf-style: profile 'profile::openstack::base::barbican' includes non-profile class openstack::barbican::service
[20:41:16] that suggests that profiles can only include other profiles?
[20:48:00] are you `include`ing it?
[20:48:04] that is forbidden by the style guide, yes
[20:48:06] "No resource should be added to a profile using the include class method, but with explicit class instantiations. Only very specific exceptions are allowed, like global classes like the network::constants class."
[20:48:24] (there are also plenty of examples of it in the codebase)
[20:50:13] all the includes should be in the role class
[20:51:07] centralization
[20:51:47] andrewbogott: the hack for that is `class { 'openstack::barbican::service': }`
[20:52:33] ^
[20:52:39] I take it 'contain' is the same as 'include'?
[20:52:50] I don't really understand why it defines the class and then immediately contains it
[20:52:58] but lots of profiles do that, I figured it was some order-of-operations thing
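[Editor's note: a rough sketch of the distinction being discussed. The parameter and lookup key below are invented for illustration; the real profile::openstack::base::barbican takes different ones. The point is that wmf-style rejects a plain `include` of a non-profile class inside a profile but accepts an explicit resource-like declaration, and that `contain` is `include` plus containment, which is why profiles often declare a class and then immediately contain it.]

```puppet
# Sketch only -- the parameter and lookup key are hypothetical.
class profile::openstack::base::barbican (
    Stdlib::Fqdn $keystone_fqdn = lookup('profile::openstack::base::keystone_api_fqdn'),
) {
    # Rejected by wmf-style: a plain include of a non-profile class.
    #   include openstack::barbican::service

    # Accepted: an explicit, parameterized declaration.
    class { 'openstack::barbican::service':
        keystone_fqdn => $keystone_fqdn,
    }

    # `contain` behaves like `include` but also anchors the class inside this
    # profile: anything that does
    #   require Class['profile::openstack::base::barbican']
    # is then ordered after the resources inside openstack::barbican::service
    # as well, not just after the profile class itself. Declaring the class
    # first and then containing it gives a profile both explicit parameters
    # and that ordering guarantee -- the "define, then immediately contain"
    # pattern mentioned above.
    contain 'openstack::barbican::service'
}
```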