[06:19:40] good morning
[06:19:59] I cannot reach the mgmt console of mc2028, the host is down
[06:20:43] impact is very limited, the mcrouters in codfw are showing tkos for the failed shard
[06:20:46] https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=codfw%20prometheus%2Fops&var-cluster=All&var-instance=All&var-memcached_server=All&from=now-6h&to=now
[06:20:57] (so basically what gets replicated to codfw)
[06:21:42] but we are currently using the codfw gutter
[06:21:43] https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=memcached_gutter&var-instance=All&from=now-6h&to=now
[06:21:51] (I think it is the first time)
[06:23:46] ah and also in theory redis on mc1028 is not replicated anymore to codfw due to the host being down
[06:25:13] <_joe_> elukey: yes, that's a problem we probably want to solve before the dc switchover
[06:25:22] <_joe_> rzl / effie ^^
[06:26:12] * volans double checking it's not a dns issue due to recent automations
[06:27:39] all looks good on that side (record matches the old manual one)
[06:31:32] according to icinga both went down at the same time more or less
[06:31:32] (host and mgmt)
[06:35:31] volans: yep I see the port down on the switch
[06:40:22] opening a task
[06:41:29] thx
[06:44:39] https://phabricator.wikimedia.org/T260224
[06:45:16] in theory we can wait in this state for Papaul to check what happened, and remove the host from the mcrouter config only if it is a permanent failure
[06:53:01] <_joe_> yes
[06:53:13] <_joe_> also if it's permanent we can go with an async replication for redis
[06:53:28] <_joe_> puppet supports having multiple shards on a single server, in case of need
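(For the TKO check discussed above: a minimal sketch of querying the Prometheus API directly instead of reading it off the Grafana dashboards. The Prometheus URL and the `mcrouter_servers`/`state` metric and label names are assumptions for illustration only, not verified production values.)

```python
#!/usr/bin/env python3
"""Rough check for mcrouter TKOs via the Prometheus HTTP API.

Assumptions (not verified against production): the endpoint URL and the
metric/label names below are placeholders.
"""
import requests

# Hypothetical codfw Prometheus "ops" instance; adjust to the real endpoint.
PROMETHEUS_URL = "http://prometheus.example.org/ops/api/v1/query"
# Hypothetical metric: per-backend server state as exported by mcrouter.
QUERY = 'sum by (memcached_server) (mcrouter_servers{state="tko"})'


def tko_servers():
    """Return the memcached shards currently reported as TKO, if any."""
    resp = requests.get(PROMETHEUS_URL, params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return {
        r["metric"].get("memcached_server", "unknown"): float(r["value"][1])
        for r in results
        if float(r["value"][1]) > 0
    }


if __name__ == "__main__":
    for server, count in tko_servers().items():
        print(f"TKO: {server} ({count:g} proxies reporting)")
```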
[09:06:39] godog: found a new weirdness in pontoon - it looks like maybe rsyslog is broken
[09:07:50] kormat: oh! broken badly ?
[09:08:08] godog: only in that it stopped logging anything 3-4 days ago
[09:10:01] i tried restarting it on one machine; on the third restart, it finally managed:
[09:10:06] Aug 9 02:23:41 zarcillo0 diamond[25494]: Queue full, check handlers for delays
[09:10:06] Aug 12 08:58:32 zarcillo0 systemd[1]: Stopping System Logging Service...
[09:10:12] and that was it, nothing further
[09:12:49] :( ok LMK if you find sth obvious, I haven't run into that yet, will be able to take a look later
[09:24:53] godog: the issue is caused by /etc/rsyslog.d/30-remote-syslog.conf
[09:25:00] if i delete that file and restart rsyslog, then it functions again
[10:52:07] godog: the victorops app just stopped working for me saying that my SSO credentials have expired, however the same user/pass does work on portal.victorops.com
[10:55:49] This is weird, it worked again without me doing anything :-/
[10:57:51] something similar happened to me, I closed the app and relogged in for it to work (for some reason it wanted to use SSO?)
[11:00:36] yeah, the second time it didn't ask me for user/pass, it just logged in
[12:01:39] marostegui: gah :( so it "recovered" without doing anything
[12:02:28] yep
[12:11:23] godog: fuh. getting cumin to work in pontoon is a royal clusterfuck (pun intended)
[12:14:40] godog: if there was some way of having a per-pontoon-stack 'private' repo, or being able to override parts of the existing labs/private one, that would remove a huge headache
[12:14:48] (but i'm kinda guessing the answer is 'lolno')
[12:23:57] kormat: mmhh we could arrange that yeah, what's the issue(s) ATM ?
[12:24:53] godog: labs/private contains a bogus cumin_master ssh key. that gets installed on the cumin pontoon host, and cumin (the tool) tries to use it
[12:25:21] and from there everything gets terrible
[12:25:47] if i overwrite the key on the cumin host, puppet nukes it next run
[12:26:02] so i need to have puppet itself honour the key
[12:26:45] mmhh yeah, I wonder if having a valid keypair in there would make the problem better or ideally go away ?
[12:26:54] as opposed to SNAKEOIL
[12:27:19] soo, it would 'fix' the problem, but it would also publish a key that has root access to a bunch of VMs
[12:27:57] ok so that's a non-starter obviously
[12:28:34] yeah :)
[12:28:58] oh, huh
[12:29:55] a 'solution' comes to mind: manage /var/lib/git/labs/private/ on the pontoon puppetmaster in the same way we manage the puppet repo
[12:30:10] that way we can put secrets in there that will never leave the pontoon project
[12:31:35] yeah that might work, at least it is a start
[12:31:55] i'll give it a shot. thanks for being my 🦆
[12:32:58] for sure!
[12:35:00] FWIW the pie in the sky in my mind for the "private repo" story is to have a public description/manifest of the private material we want, and then we can (re)generate it ad-hoc anytime we want
[12:48:48] oh thank $deity. it works \o/
[12:54:55] neat kormat!
[12:55:25] godog: i spent 4h on this before having that 💡 moment. this is a big relief
[12:56:06] kormat: easy to believe! ducks seldom disappoint
[12:56:21] :D
[13:02:55] godog: re: pie in the sky, that's pretty much what i had done for $lastjob. the fake private data was generated locally for that specific test env
[13:04:23] aye, that'd be The Way™ to do it
[13:05:17] https://i.redd.it/mrvjjbpi2p541.jpg
[13:05:25] haha
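(Along the lines of the "public manifest, regenerate the private material ad-hoc" idea above: a minimal sketch of generating throwaway secrets and an SSH keypair for a test-only private repo. The manifest format, output paths and key names are invented for illustration and do not reflect the actual labs/private layout.)

```python
#!/usr/bin/env python3
"""Generate throwaway 'private' material for a test puppetmaster.

Everything here is illustrative: the manifest format, output paths and
key names are made up, not the real labs/private layout.
"""
import secrets
import subprocess
from pathlib import Path

# Hypothetical manifest: what private material the test env needs.
MANIFEST = {
    "passwords": ["mysql_root", "redis_main"],
    "ssh_keypairs": ["cumin_master"],
}

OUT_DIR = Path("/srv/fake-private")  # hypothetical destination


def generate(manifest: dict, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    # Random, non-production passwords.
    for name in manifest["passwords"]:
        (out_dir / f"{name}.secret").write_text(secrets.token_hex(32) + "\n")
    # Real (but throwaway) keypairs, so tools like cumin can actually use them.
    for name in manifest["ssh_keypairs"]:
        key_path = out_dir / name
        if not key_path.exists():
            subprocess.run(
                ["ssh-keygen", "-t", "ed25519", "-N", "", "-C", f"{name}@test",
                 "-f", str(key_path)],
                check=True,
            )


if __name__ == "__main__":
    generate(MANIFEST, OUT_DIR)
```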
[20:46:58] I'm working on moving some logic from the `Cookbooks` repo to `spicerack`. The method I'm moving from cookbooks makes a prometheus query with `spicerack.prometheus()`, how can I access the same object within the spicerack repo?
[20:47:32] From https://doc.wikimedia.org/spicerack/master/introduction.html#spicerack-automation-framework-for-the-wmf-production-infrastructure it looks like Cookbooks define a `run(args, spicerack)` function that gets called, but I'm not sure what originally creates/passes in the `spicerack` object
[20:48:58] hey ryankemper
[20:49:20] hey
[20:49:47] I wonder if I can just do something like `from spicerack.prometheus import prometheus`
[20:50:01] so, from the Cookbooks PoV everything is accessible via the spicerack object, which is an instance of Spicerack()
[20:50:28] set up by cookbook.py before calling the specific cookbook
[20:50:42] as described in https://doc.wikimedia.org/spicerack/master/introduction.html
[20:50:42] right
[20:51:12] now, you can totally import other spicerack modules from within spicerack
[20:51:37] and in this case the Prometheus class doesn't have any __init__ that requires specific parameters
[20:51:48] looks like there's a https://doc.wikimedia.org/spicerack/master/api/spicerack.prometheus.html, so probably something like `from spicerack.prometheus import Prometheus`
[20:51:53] so you can totally just from spicerack.prometheus import Prometheus
[20:52:06] and then use Prometheus.query()
[20:52:18] perfect, thanks for explaining that
[20:52:23] *but*, if what you're doing
[20:52:39] can be easily generalized, maybe it's worth adding to the prometheus module itself
[20:52:43] not sure what you want to do
[20:52:51] ofc if it's specific to ES stuff
[20:52:54] Yeah, in this case it's a very specific elasticsearch use case where we need to make a certain query
[20:52:55] it's ok to have it there
[20:53:02] k
[21:01:20] ryankemper: for development you can run 'tox' locally, which runs all the checks of CI (as long as you have at least one python version of the ones supported)
[21:01:34] CI will run them for all versions, to be clear
[21:01:50] if you need to re-run a single one: tox -e py37-unit # for example
[21:02:31] and to run only specific tests: tox -e py37-unit -- -k test_elasticsearch_cluster
[21:02:34] for example
[21:05:01] lmk if you need a hand for those or the type hint stuff
[21:23:19] volans: thanks I was just looking into running it locally
[21:23:25] that will help a lot
[21:23:53] tox -av # to list all envs is also helpful
[21:29:26] you can also cheat, ryankemper, and use the tox venv as a venv to do other local testing of your changes :)
[21:43:46] I always do that, I have tox manage my venvs, I just . .tox/py37-unit/bin/activate
[21:43:50] and do stuff :D
[21:43:55] yeah it's great
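(Putting the advice above together: a minimal sketch of what the moved code could look like inside the spicerack elasticsearch module. The helper name, the PromQL expression and the exact query() call signature are assumptions for illustration; check the spicerack.prometheus API docs linked above for the real interface.)

```python
"""Sketch of using the Prometheus class from within spicerack itself."""
from spicerack.prometheus import Prometheus


def old_jvm_instances_count(datacenter: str) -> int:
    """Hypothetical helper: count elasticsearch instances matching a query."""
    prometheus = Prometheus()  # no required __init__ parameters, per the chat/docs
    # Hypothetical metric and labels; assuming query() takes a PromQL string
    # plus the target datacenter/site (verify against the API docs).
    results = prometheus.query(
        'elasticsearch_jvm_uptime_seconds{cluster="production-search"}',
        datacenter,
    )
    return len(results)
```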