[08:56:03] who owns cloudelastic?
[08:56:57] is it the cloud team or the elastic team?
[08:57:30] afaik that'd be search
[08:59:00] ok, there are 4 hosts alerting since 20 to 7h ago
[09:00:40] gehel: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudelastic1001&service=Rate+of+JVM+GC+Old+generation-s+runs+-+cloudelastic1001-cloudelastic-chi-eqiad
[09:01:18] lemme silence that...
[09:01:37] cloudelastic is a bit overloaded during reindex
[09:02:59] we're waiting for more hardware, but unlikely to be racked any time soon...
[09:07:17] thanks!
[09:07:44] gehel: there is also this timeout https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudelastic1002&service=ElasticSearch+setting+check+-+9200
[09:08:03] maybe transient though
[09:12:30] yep, related, but most likely transient
[09:12:54] and low frequency check, I'm re-running it
[09:14:22] XioNoX: thanks for the ping!
[09:15:10] no pb, doing my morning round of /alerts
[09:29:21] I also opened https://phabricator.wikimedia.org/T248660 for the ftpsync emails but dunno who is in charge of them
[09:31:19] I was chatting with mo.ritz about that and apparently it was para.void that set up the thing
[09:31:38] should I assign the task to him? :)
[09:32:12] volans: Daniel took it
[09:32:45] ack, I commented with my context
[09:33:08] thanks for the task, you beat me
[09:33:13] and Daniel solved the mystery
[11:27:52] I have some hosts that need to get pooled and I'd like a sanity check on my confctl if anyone has a sec
[11:29:30] have they been scap pooled to make sure they have the updated code? (in particular if they were pooled=inactive)
[11:31:10] they are pooled=inactive atm - does that mean they won't receive scap deploys?
[11:32:24] they have been deployed to it seems, there is new code in /srv/deployment/restbase/deploy/
[11:32:56] I assumed mediawiki hosts
[11:33:24] for mediawiki, hosts with pooled=inactive are removed from the dsh groups and hence don't get updates
[11:33:32] ahh
[11:33:41] as they might cause errors because they might have issues
[11:35:52] <_joe_> volans: it's the same for restbase IIRC
[11:36:04] <_joe_> hnowlan: I'm here if you need a sanity check
[11:36:25] that I don't know but it wouldn't surprise me if we used the same approach
[11:36:54] <_joe_> volans: unless they keep the list in the deploy repo
[11:37:03] <_joe_> which is also possible, because scap3
[11:38:30] hnowlan: as a rule of thumb, check also the weight of the hosts in conftool, compared with the others in the same cluster.
[11:39:03] volans: yeah, they're inactive and weight 0 according to conftool
[11:39:23] <_joe_> volans: we've taken the discussion in private
[11:39:35] ack, all yours
[16:13:17] in case anyone is keeping track of whether I have any hope of understanding what's going on with Puppet and case-sensitive identifiers
[16:13:24] just noticed https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/systemd/types/timer/schedule.pp spells it "Datetime" and https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/systemd/types/timer/datetime.pp spells it "DateTime" but everything still seems to work anyway
[16:19:19] the correct one is Datetime IIRC as puppet does basically UCFirst of every part of the namespace
[16:19:31] actually lower()+UCFirst
[16:19:56] volans: which is funny because the puppet documentation itself refers to typenames including CatalogEntry and NotUndef 🤔
[16:20:07] lol
[16:21:00] I'm touching this anyway so I guess I'll change it to Datetime for consistency, thanks
[16:21:12] also, let the record show, >:|
[16:21:35] in cumin's puppetdb backend I use capwords() to do the trick
[16:39:30] <_joe_> rlazarus: yeah I guess it's my fault, and that's one of the many ways in which puppet is amazing
[16:40:23] it definitely isn't your fault, we should probably have a linter for this
[16:40:31] but I'm already a couple of yaks deep so I'm gonna put that thought on hold
[16:50:59] rlazarus: hey, wait, were you cleaning up systemd::timer ??
[16:51:00] 👀
[16:51:07] are you in need of any more yaks, sir
[16:52:49] https://gerrit.wikimedia.org/r/c/operations/puppet/+/551281 feels like forever ago now, but it does need to get fixed at some point
[16:55:06] unrelated work sorry
[16:55:46] haha fair enough
[16:56:00] I would love if Someone fixed it though 👀
[17:15:40] <_joe_> cdanis: I'm starting to think we should turn systemd::timer into a custom resource definition
[17:16:19] i don't know all that much about custom resources but from what little i do i'm inclined to agree
[17:17:06] <_joe_> cdanis: although your patch is almost-correct
[17:17:36] it felt like it was close, and I gave up when I realized I'd have to trace the hierarchy of which things instantiate which other things using pencil and paper
[17:19:31] cdanis: puppetboard might help a bit, not too much, just a bit :)
[17:22:33] <_joe_> cdanis: it's ugly I know, but I think systemd::service gives an overall nice interface as long as you don't need to look under the hood
[17:22:38] it does
[17:22:40] <_joe_> and I don't think you needed to either
[17:22:57] okay, well, amendments to my patch very encouraged ;)
[17:23:49] <_joe_> ahah
[17:26:40] that's some next-level "patches welcome"
[17:32:30] <_joe_> yeah tbh I think if anything that patch needs line removals
[17:49:38] <_joe_> cdanis: oh there is significant work to be done
[17:49:45] <_joe_> we now allow multiple intervals
[17:49:49] ahah
[17:50:10] sensible, but yeah, probably needs a rewrite
[17:52:09] XioNoX: re: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/584008 when were you thinking of deploying?
[17:52:41] next week
[17:53:04] +1!
[18:05:12] <_joe_> cdanis: {{done}}
[18:05:17] wow I didn't expect that to work
[18:10:46] <_joe_> well we're still unsure if it works :P
[18:11:00] <_joe_> puppet can surprise you, always
[18:11:30] <_joe_> oh and I forgot to do another thing
[18:11:37] <_joe_> apart from s/status/show
[18:11:48] <_joe_> quoting the unit argument
[18:14:49] ah yes
[18:20:29] gerrit pro tip: appending (for example) /8..9 to the URL
[18:20:41] _joe_: much better, thanks
[19:11:35] _joe_: augh, it turns out `systemd-analyze calendar` didn't exist until buster 🙃
[19:13:13] rlazarus: well then shipping that patch is going to be a stretch
[19:13:23] 😠
[19:13:44] hey, don't buster my chops
[19:14:42] well, any sujesstions?
[19:25:30] bbiab
[19:33:24] <_joe_> rlazarus: ahem, not sure tbh
[19:42:33] distro name puns -- https://bash.toolforge.org/quip/AVNCt4EC_GUtdAQqNVg4
[19:45:29] I think I'm just going to make the horrible regex incrementally horribler, and not sink any more time into this right now
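
Note on the 16:19 exchange about Puppet type names: the lower()+UCFirst rule mentioned there can be sketched in Python with string.capwords, the same standard-library helper the log says cumin's puppetdb backend uses. This is only an illustration. The function name normalize_type_name is hypothetical (it is not cumin's actual code), and the sketch assumes namespace segments are separated by "::".

```python
import string


def normalize_type_name(name: str) -> str:
    """Lowercase each '::'-separated segment, then uppercase its first letter.

    Hypothetical helper for illustration only; not taken from cumin or Puppet.
    """
    # string.capwords splits on the separator, applies str.capitalize() to
    # each piece (first character upper, the rest lower) and rejoins them.
    return string.capwords(name, "::")


if __name__ == "__main__":
    for raw in ("systemd::timer::DateTime", "Systemd::Timer::Datetime", "NotUndef"):
        print(f"{raw} -> {normalize_type_name(raw)}")
```

Under this rule both "DateTime" and "Datetime" normalize to the same string, which is consistent with the observation in the log that either spelling seemed to work.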