[08:26:51] <_joe_>	 moritzm: having to reinstall the laptop with ubuntu (only distro, apart from RHEL, officially supported by lenovo...) I decided to start fresh and use wmf-sre-laptop, that resulted in a few patches :P
[08:27:28] <_joe_>	 in particular - I've moved git-review to 'Recommends:'
[08:27:54] <_joe_>	 I hate git review and I have an alias in git to push to the refs/for/master instead
[08:28:35] <_joe_>	 there's also one patch to add a functionality to auto-update the ssh config whenever one updates the package
[08:30:26] <moritzm>	 cool, will check these late
[09:41:40] <apergos>	 thank you, I wish we could update the 'how to use git' docs to allow one to easier see the path for git-review or plain git instead of plain git just not bing covered
[09:41:52] <apergos>	 as a non git review user
[09:43:43] <volans>	 apergos: use gitlab :-P
[09:44:09] <apergos>	 wow the trolls are awake early today I see, and in good form too! ;-D
[09:44:40] <volans>	 :)
[16:46:29] <jynus>	 db1077 is running out of space, but this is a testing host, so don't worry too much (alerts disabled)
[17:11:54] <marostegui>	 just purged its binlogs
[17:19:54] <ottomata>	 shdubsh: can you link me to the ECS labels field?
[17:20:00] <ottomata>	 looking for it in https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html
[17:21:29] <ottomata>	 AH FOUND IT
[17:21:30] <ottomata>	 in base
[17:23:07] <shdubsh>	 ottomata: Yes, they nest their documentation which can make it a bit harder to find fields.  Because we need the option to extend ECS, we host docs here which (IMO) are a bit easier to search: https://doc.wikimedia.org/ecs/
[17:24:14] <ottomata>	 cool, thanks, just commented on https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/639529 to use labels
[17:35:02] <apergos>	 someone working while they are on vacation
[17:35:11] <apergos>	 I'm lookin at you, maro stegui
[17:50:45] <Krinkle>	 effie: multi dc meeting with SRE/CPT/perf settled last month on using dc-local redis for ChronologyProtector. And CPT are also leaning toward Redis for CentralAuth tokens long term. Is the new task meant to explore a different direction for those?
[17:51:26] <effie>	 for our redis needs that do not need to be multi-dc
[17:51:35] <effie>	 we have the rdb* hosts per datacenter
[17:52:23] <effie>	 this task is about the redis instances living within the memcached clusted
[17:53:49] <Krinkle>	 Right. So we want to merge those clusters. Ok
[17:53:56] <effie>	 not merge
[17:54:14] <effie>	 we want to eliminate one of them
[17:54:19] <effie>	 and keep the other one
[17:54:46] <effie>	 now, as we mentioned in the meeting we had about ChronologyProtector
[17:54:47] <Krinkle>	 Right. That's what I mean
[17:55:02] <effie>	 if you do to not need any specific redis features
[17:55:33] <effie>	 or there are serious reasons, it is preferable to use memcached, we already have a large cluster, mcrouter has been working well for us
[17:55:57] <effie>	 and it is fault tolerant, on our rdb cluster, if one rdb host goes down
[17:56:12] <effie>	 we have to manually elevate a secondary server to a master
[17:56:30] <effie>	 and there will be some data loss in the meantime
[17:56:36] <effie>	 it is not fault tolerant
[17:58:30] <Krinkle>	 Afaik very little is specific to anything in that area. Everything is possible :)
[17:58:46] <effie>	 what do you mean ?
[17:59:42] <Krinkle>	 If given time and priority we can move almost most any key value store to something else. Our uses are generally simple
[18:00:11] <effie>	 then, if this has not been implemented yet
[18:00:43] <effie>	 you should consider using memcached
[18:01:10] <Krinkle>	 No, it is implemented. That's not what I meant
[18:01:16] <effie>	 ah
[18:01:18] <effie>	 sorry
[18:01:20] <effie>	 :)
[18:01:28] <Krinkle>	 I'm saying it's not special, and that nothing else is special either
[18:01:57] <Krinkle>	 But also it's good enough and we need to paying attention to other fires
[18:02:37] <Krinkle>	 But moving around etc seems fine. I can ignore this mostly then.
[18:38:42] <ryankemper>	 Does anyone have a good handle on puppet class parameters? I've set some parameters for a specific class but it looks like the defaults are still getting taken (context incoming...)
[18:39:01] <ryankemper>	 At a high level, `icinga::monitor::elasticsearch::base_checks` is ultimately where the variables I care about end up getting used for real. `icinga::monitor::elasticsearch::base_checks` sets defaults that I don't want for my use case
[18:39:22] <ryankemper>	 so I've initialized the variables to the value I want in `profile::icinga`, which gets passed through to `icinga::monitor::elasticsearch::cirrus_cluster_checks`, which should then call `icinga::monitor::elasticsearch::base_checks` (code links incoming...)
[18:39:51] <ryankemper>	 My recent patch added these variables: https://github.com/wikimedia/puppet/blob/bfb5060cf85ef281d45d0dffd61b2f1de54b8db7/modules/profile/manifests/icinga.pp#L29-L32 which should get passed through from here: https://github.com/wikimedia/puppet/blob/bfb5060cf85ef281d45d0dffd61b2f1de54b8db7/modules/profile/manifests/icinga.pp#L72-L78
[18:40:32] <ryankemper>	 Passed through to https://github.com/wikimedia/puppet/blob/bfb5060cf85ef281d45d0dffd61b2f1de54b8db7/modules/icinga/manifests/monitor/elasticsearch/cirrus_cluster_checks.pp#L2-L6 which feeds into here https://github.com/wikimedia/puppet/blob/bfb5060cf85ef281d45d0dffd61b2f1de54b8db7/modules/icinga/manifests/monitor/elasticsearch/cirrus_cluster_checks.pp#L13-L22
[18:42:26] <ryankemper>	 So ultimately I'm trying to get those variables passed through to `icinga::monitor::elasticsearch::base_checks`, but the behavior I'm seeing makes me think it's still using these defaults instead: https://github.com/wikimedia/puppet/blob/bfb5060cf85ef281d45d0dffd61b2f1de54b8db7/modules/icinga/manifests/monitor/elasticsearch/base_checks.pp#L7-L8
[18:43:26] <ryankemper>	 So I guess to close out my line of questioning, given how I defined the variables in the initialization of `profile::icinga`, is my understanding correct that those variables end up being passed into `icinga::monitor::elasticsearch::cirrus_cluster_checks` and thus should be passed through to `icinga::monitor::elasticsearch::base_checks`?
[20:37:28] <shdubsh>	 ryankemper: what is your goal wrt those parameters?  Redefine defaults?
[20:59:32] <ryankemper>	 shdubsh: So, basically I want to pass in my own values for a specific codepath but not mutate the defaults incase they're relied upon elsewhere
[20:59:58] <ryankemper>	 Basically `icinga::monitor::elasticsearch::base_check` is used generically by anything using elasticsearch (not just cirrus), so logstash etc in addition to cirrus
[21:01:20] <ryankemper>	 cirrus is broken up into two codepaths, one that is an alert for cloudelastic and another is an alert that lives on the two `alert` hosts and checks `cirrus->codfw` and `cirrus->eqiad`
[21:02:23] <ryankemper>	 The former, cloudelastic I've already got working with this patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/634391 but the alerts for codfw and eqiad were still going
[21:03:14] <ryankemper>	 So to fix codfw and eqiad I made this patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/636811, and that patch didn't fix the issue for codfw/eqiad which led to my above line of questioning
[21:04:15] <shdubsh>	 ah, I think Hiera may be the way to go here.  Do you know the roles which you want to apply the new values?
[21:10:07] <ryankemper>	 Hmm, if we took a role approach it'd probably be whatever role includes `profile::icinga`, because `profile::elasticsearch::cirrus` isn't actually used by the icinga instances running the checks for eqiad/codfw cirrus
[21:10:39] <ryankemper>	 In this case though we have the file `modules/icinga/manifests/monitor/elasticsearch/cirrus_cluster_checks.pp` which handles the part I care about and nothing else
[21:11:05] <ryankemper>	 But I believe we don't want to avoid defining parameters in classes directly and want to do it at the profile level (IIRC)
[21:11:37] <ryankemper>	 Which is why with the current approach I was/am trying to change `profile::icinga` to have the values I want (which may not be the right way to do it to be clear)
[21:12:00] <ryankemper>	 Basically the flow looks like `profile::icinga` -> `icinga::monitor::elasticsearch::cirrus_cluster_checks` -> `icinga::monitor::elasticsearch::base_checks`
[21:16:49] <ryankemper>	 s/don't want to avoid/want to avoid
[21:24:11] <shdubsh>	 I think I follow.  Passing it through like that should be fine.  So you're seeing (for example) $shard_size_warning is still 50 when you expect it to be 80 from the default_value from profile::icinga?
[21:36:18] <ryankemper>	 shdubsh: Exactly
[21:37:35] <ryankemper>	 So I'm wondering if it's just a simple misunderstanding of puppet and the way I'm overriding it is wrong...like I'm imagining the earliest a variable gets set is the value it carries
[21:40:24] <ryankemper>	 I think my next step is to look at `monitoring::service`, find where the nagios service lives and verify which values I'm seeing
[21:41:20] <ryankemper>	 Since this is on the codepath that *doesn't* use nrpe: https://github.com/wikimedia/puppet/blob/5cfe03af70c6d0c4e1d413e394569df568e7fd0b/modules/icinga/manifests/monitor/elasticsearch/base_checks.pp#L35-L40
[21:42:26] <shdubsh>	 ryankemper: that's where I was going to go next.  Which hosts do you suspect aren't getting the right value?
[21:43:47] <shdubsh>	 AFAICT, check_elasticsearch_shard_size is warning 80 and crit 100 in all cases
[21:53:19] <ryankemper>	 shdubsh: It was warning on `search.svc.eqiad.wmnet` (which actually runs on the icinga servers) previously but I just checked back now and it's actually in an `Unknown` state: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=search.svc.eqiad.wmnet&service=ElasticSearch+shard+size+check+-+9243
[21:53:43] <ryankemper>	 Totally should have re-checked first :x it's weird though because I forced a re-check last friday and it stayed alerting
[21:54:13] <ryankemper>	 So now I need to figure out how `UNKNOWN - int() argument must be a string, a bytes-like object or a number, not 'NoneType'` is happening
[21:54:59] <ryankemper>	 shdubsh: Okay well I think I can actually figure it out from here, since now something's properly broken :P
[21:58:23] <shdubsh>	 right on :)