[10:20:52] jbond42: is it possible to call puppet functions using puppet apply?
[10:21:34] kormat: not sure, it's something i haven't tried, but you should be able to
[10:22:04] `puppet apply -e "profile::mariadb::multiinstance_mariadb_port('s1')"` says "unknown function", but that function call works perfectly fine from within the puppet codebase
[10:25:13] kormat: you probably need to add the modules in the puppet repo to your module path, e.g. `puppet apply --modulepath=/home/jbond/git/puppet/modules -e "profile::mariadb::multiinstance_mariadb_port('s1')"`
[10:26:48] ah hah. if i do that on the puppet master it works (with --modulepath=/var/lib/git/operations/puppet/modules)
[10:27:25] it doesn't work from the client node though
[10:27:40] (i assumed it would be executed remotely on the master)
[10:28:31] ahh no, puppet apply just runs locally, the master is not in play when using puppet apply
[10:28:47] ahh. right!
[10:28:49] and the agents don't get the full modules sent to them, only facts, providers and types
[10:28:58] you should be able to run it from your laptop though as well
[10:29:29] but if you need hiera you will also need to use
[10:29:40] --hiera_config=/my/hiera/config.yaml
[10:29:50] i see there's a utils/localrun script
[10:30:03] which does some stuff
[10:30:30] my hiera yaml looks like this https://phabricator.wikimedia.org/P11560
[10:30:37] i haven't looked at that script tbh
[10:30:53] and also i still have it in my backlog to write up a general wiki page on this stuff :)
[10:31:07] hehe, great
[10:32:45] a quick look at the localrun script suggests you would at least need to link hieradata from the puppet repo to /etc/puppet/hieradata
[10:32:57] * kormat nods
[10:32:58] and the private repo to /etc/puppet/private/hieradata
[10:33:14] the latter is also true for my hiera file
[10:33:34] jbond42: i don't think running puppet apply on my laptop is all that interesting in my case; the hostname won't match the node i care about
[10:35:53] kormat: not sure if it still works but you used to be able to do `FACTER_fqdn=roo.example.com puppet apply ...`
[10:36:15] ohgod
[10:38:16] it looks like that might work
[10:38:33] at least `FACTER_fqdn=blah.thing puppet apply -e 'notify{ "${fqdn}": }'` outputs `blah.thing`
[10:41:52] yes i think that should work for any of the top-level facts, complex facts may be a bit harder
[10:42:27] you can also drop name=value pairs into /etc/facter/facts.d/somefile.txt
[10:46:25] the latter only works if run with sudo. there is probably an equivalent place under ~/.puppet but i can't find it right now
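A minimal sketch pulling the pieces above together: evaluating puppet code locally against a repo checkout with a faked fqdn, plus the external-facts alternative. The paths, hostname, and filename below are illustrative assumptions, not exact production values.

```bash
# Evaluate a manifest snippet locally against a checkout of the puppet repo,
# faking the fqdn fact so host-specific logic/hiera resolves for another node.
# Hostname and paths are placeholders for illustration.
FACTER_fqdn=db1001.example.net puppet apply \
  --modulepath=/home/jbond/git/puppet/modules \
  --hiera_config=/home/jbond/git/puppet/hiera.yaml \
  -e 'notice(profile::mariadb::multiinstance_mariadb_port("s1"))'

# Alternatively, drop extra facts as name=value pairs into an external facts
# file (needs root, as noted above).  Whether core facts like fqdn can be
# overridden this way depends on the facter version, so treat this as a sketch.
echo 'my_override_fact=some_value' | sudo tee /etc/facter/facts.d/local-overrides.txt
```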
[10:47:56] :puppet: https://phabricator.wikimedia.org/P11561
[10:50:27] been a while since i looked at this, i know that instead i have set the codepath to /home/kormat/gerrit/operations/puppet.git/
[10:51:03] from there puppet will use a modulepath of $codepath/modules and a manifest path of $codepath/manifests
[10:51:20] let's see if setting that is "legal"
[10:51:58] well, it's not complaining, so that's something
[10:52:09] i think you can set basemodulepath in puppet.conf
[10:52:45] jbond42: what i don't get is why `puppet config set` would set an illegal value
[10:52:49] that's crazy
[10:53:23] yes sure, give it a year and such things will seem normal with puppet ;)
[10:53:32] * kormat sobs
[10:53:48] all i can say is it's so much better than it used to be :)
[10:54:05] stockholm syndrome is strong in the puppet world i see :)
[10:54:12] yep :D
[11:00:08] (config management needs strong typing just like programming does)
[11:02:01] liw: amen to that
[11:04:59] kormat: test-in-prod meme mandatory here :D
[11:29:43] volans: any idea if a reimage script or something could have accidentally created this https://cas-puppetboard.wikimedia.org//node/puppet. looks like something submitted an empty facts set with the fqdn "puppet"
[11:36:06] jbond42: weird, the reimage script just runs puppet on a host where it can ssh, so usually the hostname is properly set
[11:36:12] but we can verify with the logs
[11:36:19] if they match the datetime
[11:36:53] jbond42: uptime 49 days in the facts
[11:36:58] so I would exclude a reimage :D
[11:37:05] this is the submission in puppetdb, just trying to see if i can see it on the puppetmaster
[11:37:07] https://phabricator.wikimedia.org/P11564
[11:37:27] oh where did you get the facts from?
[11:37:37] directly in puppetdb, i couldn't see them in puppetboard
[11:37:55] jbond42: it's puppetmaster1001 according to the serialtag
[11:38:03] jbond42: from https://puppetboard.wikimedia.org//node/puppet
[11:38:08] the facts on the right
[11:38:39] maybe some test of running puppet apply or similar?
[11:38:56] ahh, something else which is not working correctly on the cas- domain :(
[11:40:06] yes could be, ok i think it's best to just leave it for puppetdb to purge it and ignore this hostname in the netbox reports, just wanted to double check
[11:40:14] sorry I cheated but at lunch so cas- was not doable :D
[11:40:49] :) thanks for the help
[11:41:02] np
[11:49:48] jbond42: unrelated, we have a bunch of logs on the puppetmaster saying
[11:50:00] Error encoding a 'replace catalog' command for host '$HOSTNAME' ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB, see debug logging for more info
[11:50:03] for various $HOSTNAME
[11:58:36] volans: this looks like when the catalogue has binary data and it fails to convert it to json. it normally has a message straight after saying it is falling back to pson
[12:00:08] yeah I wondered the same, just not sure if we're missing anything in puppetdb or if it's able to send the data to it anyway
[12:00:22] ack, i'll look more in depth
[12:02:08] very low-prio
[12:02:10] thanks!
[12:02:29] ack :)
[12:43:18] I've changed the default datasource for 'host overview' to Thanos, meaning there's no need to select the per-site datasource anymore, let me know what you think! https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1
[12:57:39] looks good, thanks godog
[13:01:13] np jbond42 ! I'll follow up on sre@ as well with more examples
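For the basemodulepath/puppet.conf idea discussed above, a sketch of what setting it once per user could look like, rather than passing --modulepath on every run. The repo path is just an illustrative placeholder, and where the per-user config file lives varies by puppet version.

```bash
# Point puppet (for your user only) at a repo checkout so `puppet apply`
# finds modules without extra flags.  Path below is a placeholder.
puppet config set basemodulepath /home/kormat/gerrit/operations/puppet.git/modules --section user

# Depending on the puppet version, the per-user config that this writes is
# either ~/.puppet/puppet.conf (puppet 3) or ~/.puppetlabs/etc/puppet/puppet.conf
# (puppet 4+); inspect whichever exists to see what was actually set.
cat ~/.puppet/puppet.conf ~/.puppetlabs/etc/puppet/puppet.conf 2>/dev/null
```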
[13:02:24] great, thanks :)
[13:53:34] godog: that's pretty cool <3 thanks!
[14:00:14] connectivity issues on the B3 mgmt network?
[14:00:26] (codfw)
[14:00:46] jynus: I checked that mgmt and it looks up, so maybe it was just a one-off thing
[14:00:50] like very brief even
[14:01:39] I remember being told some switches were old on some mgmt networks, but not sure if it was there
[14:02:32] there was another alert on D7 mgmt, but not sure if related
[14:02:33] D7: Testing: DO not merge - https://phabricator.wikimedia.org/D7
[14:02:57] jynus: yes, there was a blip, a bad config change that was auto-rolled-back, should be fine now
[14:03:03] ok
[14:13:43] vgutierrez: yw, glad it helps!
[22:20:10] I am aware folks are moving from varnish to ats, but I have a question about the varnish-be setup. How is the list of varnish backends exactly generated/adjusted when using confd/etcd?
[22:27:25] we have a cluster of cache proxies running Varnish, but taking down backends using varnishadm backend.set_health commands is not persistent (and needs to be done through dsh/salt/cumin) and deploying puppet changes is a rather tedious process - we'd like to automate backend changes (not only in Varnish, but more services) as much as possible and make them persistent. Wikitech's Confd page is a good start, but I'm looking for a more detailed description of the puppet-confd integration and the production templates for varnish (and possibly more?) (:
[22:30:14] isn't everything basically in the puppet repo? :P
[22:31:49] SPF|Cloud: as of a while ago (last November?), varnish-be is no more, we use ATS in its place instead
[22:31:59] I'm doing a quick look to see what now-deleted files I can point you to
[22:32:37] Reedy: I assume it was, but with such a large setup, it can be hard to find what you're looking for :)
[22:34:53] SPF|Cloud: https://gerrit.wikimedia.org/r/c/operations/puppet/+/217818 is the creation of the varnish/confd integration as it used to exist long ago; it's the best single 'starting point' I've found
[22:35:37] right, so only varnish-fe is left until that's fully decommissioned? my sort-of volunteer involvement in SRE tasks must have been a long time ago then
[22:36:02] yeah, a ton of our routing and rewrite and ratelimiting logic is still implemented in VCL, so varnish-fe lives on for now
[22:37:04] cdanis: which immediately reminded me of https://bash.toolforge.org/quip/AU7VTzhg6snAnmqnK_pc
[22:37:36] but I have faith that ATS will kill off the VCL hacks eventually :)
[22:37:50] got that, we started using varnish 4 in 2015 (free wiki farm, so was able to steal some config from your puppet repo), and backend TLS in ATS is one major reason to consider it
[22:37:55] bd808: at my old employer there was a saying: "there's two ways to do anything! the deprecated way, and the way that's not ready yet"
[22:38:21] different but similar from here :)
[22:38:50] we usually have at least 2 deprecated ways ;)
[22:39:07] with simple VPSs you never benefit from 'internal networks', so all traffic has to be encrypted in some way, but that's no fun with varnish backends
[22:39:27] nothing is fun with varnish
[22:39:29] SPF|Cloud: yeah we use both ATS and Envoy for internal TLS termination
[22:39:35] and probably still use nginx for some of it
[22:39:41] hmm, envoy?
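For reference, this is roughly what the non-persistent depool that SPF|Cloud describes above looks like when driven through cumin; the host alias and backend name are made-up placeholders, not real production identifiers.

```bash
# Mark one varnish backend sick on every cache proxy matched by a cumin alias.
# This only lives in varnishd's memory: a restart or fresh VCL load forgets it,
# which is exactly the persistence problem being discussed.
# 'A:cache-text' and 'be_example_backend' are placeholders.
sudo cumin 'A:cache-text' 'varnishadm backend.set_health be_example_backend sick'

# And to hand health decisions back to the probes (repool):
sudo cumin 'A:cache-text' 'varnishadm backend.set_health be_example_backend auto'
```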
[22:39:49] but most things have moved over to Envoy instead of nginx internally
[22:39:52] yeah, https://www.envoyproxy.io/
[22:40:26] ah, that sounds cool :)
[22:40:51] I don't love their marketing copy, but it is good at what it does :)
[22:41:55] where do you put such a thing in the stack? between ATS and the Apache daemon on the backend?
[22:42:03] yeah exactly
[22:42:36] each edge proxy has an "ATS sandwich": ATS-TLS on the edge terminating user connections, which talks to varnish-fe, which talks to ATS-BE
[22:42:48] then the ATS-BEs talk to Envoys via LVS
[22:43:06] careful cdanis, you will end up being an admin at miraheze if you keep being useful to SPF|Cloud :)
[22:43:16] sst bd808 :)
[22:43:19] ahah
[22:43:29] I'm actually surprised you knew that...
[22:43:51] those Envoys are actually talking to MediaWiki in two ways, IIRC: one, the obvious one, terminating the connections from ATS-BE, and the other way is some code paths in MediaWiki are using Envoy to also make outbound HTTPS API calls to other services
[22:44:13] because having PHP do TLS calls at high volume is, uh, fraught
[22:44:14] but we share(d) a lot of the same technical challenges, so I'm trying to acquire new ideas here
[22:45:21] is that so problematic in php? architectural issue?
[22:45:36] you don't get to reuse connections, so you're doing a full TLS negotiation every time
[22:45:50] it's just that tls is expensive and at scale that adds up
[22:46:11] as more things are split from core into remote services it becomes a bigger problem
[22:46:15] and if you're talking to something faraway the rtts add up too
[22:46:25] yes, that's true there
[22:47:22] how about connections to databases (we do one-way TLS in MariaDB)? same issue?
[22:47:28] somebody should actually do a blog post for techblog about the magic graph dive that getting envoy everywhere enabled
[22:47:35] bd808: seriously, those graphs were amazing
[22:47:48] cpu and network traffic and latency
[22:49:07] I'm actually not sure if we do TLS talking to MariaDB, I think maybe we don't? AIUI the MediaWiki<->MariaDB traffic all stays within one datacenter so there's not the same concern as with edge PoPs
[22:49:51] that's correct, unfortunately we don't have (and never had) that advantage
[22:49:55] I think it has been talked about but not done, and because of the TLS handshake costs if I remember correctly
[22:50:25] looks like tls for replication, but not for "all clients"
[22:50:25] https://phabricator.wikimedia.org/T157702
[22:50:51] I am not sure what the impact looks like in MariaDB, only that Redis is a pain with regards to support
[22:51:37] we don't do persistent connections from php to maria, so it would be pretty costly
[22:52:06] and if we turned on persistent connections that has its own issues
[22:52:24] php's shared-nothing is a blessing and a curse
[22:53:01] heh I didn't realize PHP supported persistent db connections
[22:53:10] "sort of"
[22:53:22] yeah, it keeps a freelist of idle ones and reuses when it can?
[22:53:32] they get pooled at the runtime container, so apache or fcgi
[22:53:37] ofc
[22:54:19] so yeah, it's a pool you check out from to use, basically. It's a built-in feature of the connector code on the PHP side
[22:55:41] pool leaks happen, checkout contention, and every once in a while an upstream bug that hands out a connection to multiple clients. That last one is maybe mostly "fixed" upstream. I haven't used php with pooled connections for ~10 years
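A quick way to see the per-connection handshake cost being discussed (a shared-nothing PHP worker pays a full negotiation on every outbound HTTPS call): openssl's s_time benchmark compares fresh handshakes against resumed sessions. The hostname is just a reachable example endpoint; substitute any TLS service you control.

```bash
# Connections per second doing a full TLS handshake every time...
openssl s_time -connect en.wikipedia.org:443 -new -time 10

# ...versus reusing the session across connections; the gap is the cost
# that connection pooling / a local Envoy sidecar amortises away.
openssl s_time -connect en.wikipedia.org:443 -reuse -time 10
```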
[22:56:49] and in our deploy there would likely be issues with r/o slice routing if the pool duration was set too high
[22:58:21] slice? section? I never remember which "s" word is preferred, since what we do is certainly not sharding
[23:01:23] did I derail everything SPF|Cloud or did you get the next set of breadcrumbs you needed?
[23:02:04] oh, I definitely got what I needed :) and even learned some new things about envoy and persistent connections in php
[23:03:30] at $DAYJOB-1 we used php + oracle (*shudder*) and I spent a lot of hours with gdb and the oracle driver layer tracking down edge case issues
[23:03:38] bd808: you gave me a bunch of fun thoughts involving nasty kludges with passing around file descriptors over unix sockets combined with the new in-kernel TLS support (which can transparently encrypt FDs)
[23:04:07] would be a fun way to add "native" persistent TLS connections to PHP on Linux ;)
[23:05:53] cdanis: do NOT tell _j.oe_ I led you down that path :)
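On the in-kernel TLS idea at the end, a small sketch of checking whether a given box even has kTLS available before attempting anything like it; this assumes a kernel built with CONFIG_TLS and, for the stats file, roughly 5.3 or newer.

```bash
# Load the kernel TLS module if it's built as a module; harmless if already loaded.
sudo modprobe tls

# On kernels that expose them (assumption: ~5.3+), kTLS counters appear here
# once sockets actually offload records to the kernel.
cat /proc/net/tls_stat 2>/dev/null || echo "no kTLS stats file on this kernel"
```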