[07:18:52] <brouberol>	 welcome back jayme. I'll have a look!
[07:21:05] <jayme>	 thanks :)
[07:57:03] <brouberol>	 I'm caught up after a couple of days of OOO. can you point me to a gerrit URL showing the failure?
[08:08:39] <brouberol>	 >  You may run rake check_deployments[diff,'dse-k8s-services/postgresql-airflow-analytics-product'] to quickly repdoduce
[08:08:39] <brouberol>	 Sigh, seems like I'm not caffeinated enough
[08:09:03] <jayme>	 won't be very helpful though as it just says 'Template did not render correctly (HEAD of origin/master).'
[08:09:07] <jayme>	 https://integration.wikimedia.org/ci/job/helm-lint/24562/consoleFull
[08:16:22] <brouberol>	 I have an idea what might be causing this: we have isolated the PG releases in their own helmfile, itself with an associated kubeconfig file owned by root:root, to avoid accidental deletion by an unprivileged user
[08:16:35] <brouberol>	 so there might be a missing yaml file in CI, that we need to download
[08:18:39] <jayme>	 that sounds weird. Download from where?
[08:20:33] <brouberol>	 not sure, I'm struggling to run the rake tasks atm
[08:21:20] <brouberol>	 either in or out of docker, nothing seems to work
[08:23:44] <brouberol>	 alright, I think I should be able to reproduce. I'll report back when I know more
[08:24:40] <brouberol>	 hmm, it seems to work, with a diff regarding a newline
[08:24:48] <brouberol>	 would that newline diff fail the job though?
[08:26:12] <brouberol>	 nvm me. I had a rough night, I can't seem to be reading correctly this morning.
[08:26:12] <brouberol>	 >   +Template did not render correctly (HEAD of local branch).
[08:36:12] <brouberol>	 hmm, it's difficult to debug this without any additional debug information
[08:42:46] <brouberol>	 when running `rake "check_deployments[diff,dse-k8s-services/postgresql-airflow-analytics-product]"`, I'm seeing
[08:42:46] <brouberol>	 helmfile lint output:
[08:42:46] <brouberol>	   ----------------
[08:42:46] <brouberol>	     err: no releases found that matches specified selector() and environment(aux-k8s-eqiad), in any helmfile
[08:44:32] <jayme>	 I don't see that one :)
[08:45:26] <brouberol>	 do you see anything more than "Template did not render correctly" ?
[08:46:02] <jayme>	 no, unfortunately the CI does not output the actual error. CI will do something like 'helmfile -e dse-k8s-eqiad template' on both git revisions
[08:46:23] <jayme>	 execution error at (cloudnative-pg-cluster/templates/cluster.yaml:96:5): The s3.accessKey and s3.secreyKey values were not provided
[08:46:56] <brouberol>	 ok so that indicates a missing secret file
[08:46:57] <jayme>	 is what I get. So I would assume you need to provide some (additional) fixtures
[08:47:39] <brouberol>	 that's what I meant by "missing some YAML file that we might need to download" 
[08:48:30] <brouberol>	 the file itself is  /etc/helmfile-defaults/private/dse-k8s_services/postgresql-airflow-platform-eng/{{ .Environment.Name }}.yaml
[08:49:12] <brouberol>	 brouberol@deploy1003:~$ sudo cat  /etc/helmfile-defaults/private/dse-k8s_services/postgresql-airflow-platform-eng/dse-k8s-eqiad.yaml
[08:49:12] <brouberol>	 ---
[08:49:12] <brouberol>	 s3:
[08:49:12] <brouberol>	   accessKey: XXX
[08:49:12] <brouberol>	   secretKey: XXX
[08:49:12] <jayme>	 I think you can provide those values in .fixtures.yaml
[08:49:52] <brouberol>	 nice, I'll whip up a patch for that
[08:49:58] <brouberol>	 how did you get the error message btw?
[08:50:01] <jayme>	 so helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product/.fixtures.yaml
[08:50:16] <jayme>	 I changed CI code, which forces a 'rake all' run
[08:50:19] <brouberol>	 I got lost in rake/ruby, which I'm not super familiar with
[08:50:53] <brouberol>	 aah, we had these .fixtures.yaml files, we just didn't port them to the new helmfile PG dir
[08:50:59] <elukey>	 o/
[08:51:09] <brouberol>	 ./facepalms
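
The fix discussed above can be sketched as a fixture that mirrors the shape of the private values file with placeholder values, so that `helmfile template` can render in CI without the real secrets. This is a minimal sketch, assuming that .fixtures.yaml is read as plain chart values; the placeholder strings are hypothetical and the contents of the actual patch are not shown in this log.

```yaml
# helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product/.fixtures.yaml
# Assumption: CI loads this file in place of the private values under
# /etc/helmfile-defaults/private/, which only exist on the deployment host.
# The keys mirror the private file; the values are dummy placeholders.
s3:
  accessKey: dummy-access-key
  secretKey: dummy-secret-key
```

With both keys present, the chart's check for s3.accessKey and s3.secretKey should no longer abort the render.
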
[08:51:15] <elukey>	 as FYI I started a chain of changes to upgrade all charts to mesh.configuration:1.13
[08:51:18] <elukey>	 https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1144454
[08:51:21] <elukey>	 first batch of 20
[08:51:32] <elukey>	 so if you need to do something similar, please sync with me first :D
[09:03:53] <brouberol>	 jayme https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1144477 seems to work
[09:06:38] <jayme>	 brouberol: the diff is expected because the pg setup moves from the airflow deployments to the postgres-airflow ones?
[09:14:40] <brouberol>	 yep
[09:15:01] <jayme>	 cool. +1 then
[09:15:02] <jayme>	 thanks
[09:15:10] <brouberol>	 we separated the airflow and PG deployments, to harden the permissions on the airflow kubeconfig files, to make them only deployable/deletable by SREs
[09:15:16] <brouberol>	 thanks for the help!
[11:21:08] <jayme>	 brouberol: dse-k8s-services/airflow-wmde/dse-k8s-eqiad seems to still be broken
[11:21:18] <jayme>	 https://integration.wikimedia.org/ci/job/helm-lint/24687/console
[11:39:44] <brouberol>	 hmm, that's odd
[11:40:34] <brouberol>	 oh, I see why. I'll send a patch to btullis
[11:43:36] <brouberol>	 https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1144514 was merged. You should be fine after a rebase jayme
[12:07:04] <elukey>	 brouberol: o/ lemme know if https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1144454 is ok when you have a moment :)
[12:07:22] <brouberol>	 looking
[12:07:24] <elukey>	 I'll use it as first real test if you are ok (and if so, lemme know if we can roll it out)
[12:07:52] <elukey>	 the new config auto injects a custom histogram config for all the envoys basically
[12:08:07] <elukey>	 so we can reduce what we ingest on Prometheus
[12:08:36] <brouberol>	 this is only changing the statsd->prom exporter config for envoy itself, right?
[12:10:31] <elukey>	 the main change yes, but it may bring more due to the module update
[12:10:57] <elukey>	 ah no wait it is not statsd->prom, it is related to envoy's histogram bucket config
[12:11:25] <elukey>	 the rest in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1144454/1/charts/airflow/templates/vendor/mesh/configuration_1.13.0.tpl is basically what is already running elsewhere
[12:12:11] <elukey>	 in the diff it is under "histogram_bucket_settings"
[12:51:09] <brouberol>	 yep, that looks good to me, in the sense that I trust you on the histogram config and I'm not seeing any airflow config being affected
[12:51:23] <brouberol>	 do you want to try this out on a specific airflow instance? 
[12:52:55] <elukey>	 ideally yes, no idea what's best etc..
[12:53:57] <brouberol>	 I can checkout your patch locally and deploy it on airflow-test-k8s if you want
[12:57:24] <elukey>	 that would be great thanks!
[12:57:47] <elukey>	 the test is to check whether the envoy metrics have the histogram buckets stated or not
[12:59:45] <brouberol>	 sure, let me do this right now
[13:03:09] <elukey>	 <3
[13:10:50] <brouberol>	 I have to perform a bit of maintenance in that ns, I'll ping you when I get to deploying the patch
[13:11:08] <elukey>	 yes please but if you have time, I didn't mean to brutally nerd snipe you :D
[13:11:31] <brouberol>	 np!
[13:20:23] <brouberol>	 hmm somehow, I'm only seeing a diff related to the chart version
[13:22:29] <elukey>	 when it happens to me, I just run puppet (that forces some gz creation of new chart's versions etc..)
[13:22:46] <elukey>	 in this case, we didn't merge so maybe it doesn't work
[13:22:57] <elukey>	 it should be ok to merge and then test directly in my opinion
[13:24:43] <brouberol>	 I tweaked the helmfile so that it would use a locally checked out version of the chart with your changes in it, so that should work
[13:25:01] <brouberol>	 but sure, if you want to merge and deploy, I'm not seeing any change atm, so I'm ok with that!
[13:49:00] <elukey>	 merged! 
[13:52:53] <brouberol>	 alright, I'll deploy
[13:53:41] <brouberol>	 ok, I'm indeed seeing a config change this time
[13:54:44] <brouberol>	 aaand it's deployed
[13:56:34] <elukey>	 nice!
[13:56:54] <elukey>	 what namespace? I'll check the envoy metrics after some meetings
[14:40:24] <brouberol>	 airflow-test-k8s
[14:43:12] <elukey>	 yep just tested on dse-worker 1009, it seems to be working! I'll ask Filippo to confirm
[14:44:56] <brouberol>	 nice!
[14:47:23] <elukey>	 Filippo confirms, it is safe to be deployed in other airflows! Thanks a lot
[14:49:31] <brouberol>	 anytime :)