[10:20:52] jbond42: is it possible to call puppet functions using puppet apply?
[10:21:34] kormat: not sure, it's something i haven't tried, but you should be able to
[10:22:04] `puppet apply -e "profile::mariadb::multiinstance_mariadb_port('s1')"` says "unknown function", but that function call works perfectly fine from within the puppet codebase
[10:25:13] kormat: you probably need to add the modules in the puppet repo to your module path, e.g. `puppet apply --modulepath=/home/jbond/git/puppet/modules -e "profile::mariadb::multiinstance_mariadb_port('s1')"`
[10:26:48] ah hah. if i do that on the puppet master it works (with --modulepath=/var/lib/git/operations/puppet/modules)
[10:27:25] it doesn't work from the client node though
[10:27:40] (i assumed it would be executed remotely on the master)
[10:28:31] ahh no, puppet apply just runs locally, the master is not in play when using puppet apply
[10:28:47] ahh. right!
[10:28:49] and the agents don't get the full modules sent to them, only facts, providers and types
[10:28:58] you should be able to run it from your laptop though as well
[10:29:29] but if you need hiera you will also need to use
[10:29:40] --hiera_config=/my/hiera/config.yaml
[10:29:50] i see there's a utils/localrun script
[10:30:03] which does some stuff
[10:30:30] my hiera yaml looks like this https://phabricator.wikimedia.org/P11560
[10:30:37] i haven't looked at that script tbh
[10:30:53] and also i still have it in my backlog to write up a general wiki page on this stuff :)
[10:31:07] hehe, great
[10:32:45] a quick look at the localrun script suggests you would at least need to link hieradata from the puppet repo to /etc/puppet/hieradata
[10:32:57] * kormat nods
[10:32:58] and the private repo to /etc/puppet/private/hieradata
[10:33:14] the latter is also true for my hiera file
[10:33:34] jbond42: i don't think running puppet apply on my laptop is all that interesting in my case; the hostname won't match the node i care about
[10:35:53] kormat: not sure if it still works but you used to be able to do `FACTER_fqdn=roo.example.com puppet apply ...`
[10:36:15] ohgod
[10:38:16] it looks like that might work
[10:38:33] at least `FACTER_fqdn=blah.thing puppet apply -e 'notify{ "${fqdn}": }'` outputs `blah.thing`
[10:41:52] yes i think that should work for any of the top-level facts, complex facts may be a bit harder
[10:42:27] you can also drop name=value pairs into /etc/facter/facts.d/somefile.txt
[10:46:25] the latter only works if run with sudo. there is probably an equivalent place under ~/.puppet but i can't find it right now
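A minimal sketch pulling the pieces above together: evaluating puppet code locally against a repo checkout with a faked fqdn, plus the external-facts alternative. The paths, hostname, and filename below are illustrative assumptions, not exact production values.

```bash
# Evaluate a manifest snippet locally against a checkout of the puppet repo,
# faking the fqdn fact so host-specific logic/hiera resolves for another node.
# Hostname and paths are placeholders for illustration.
FACTER_fqdn=db1001.example.net puppet apply \
  --modulepath=/home/jbond/git/puppet/modules \
  --hiera_config=/home/jbond/git/puppet/hiera.yaml \
  -e 'notice(profile::mariadb::multiinstance_mariadb_port("s1"))'

# Alternatively, drop extra facts as name=value pairs into an external facts
# file (needs root, as noted above).  Whether core facts like fqdn can be
# overridden this way depends on the facter version, so treat this as a sketch.
echo 'my_override_fact=some_value' | sudo tee /etc/facter/facts.d/local-overrides.txt
```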
[10:47:56] :puppet: https://phabricator.wikimedia.org/P11561
[10:50:27] been a while since i looked at this, i know that instead i have set the codepath to /home/kormat/gerrit/operations/puppet.git/
[10:51:03] from there puppet will use a modulepath of $codepath/modules and a manifest path of $codepath/manifests
[10:51:20] let's see if setting that is "legal"
[10:51:58] well, it's not complaining, so that's something
[10:52:09] i think you can set basemodulepath in puppet.conf
[10:52:45] jbond42: what i don't get is why `puppet config set` would set an illegal value
[10:52:49] that's crazy
[10:53:23] yes sure, give it a year and such things will seem normal with puppet ;)
[10:53:32] * kormat sobs
[10:53:48] all i can say is it's so much better than it used to be :)
[10:54:05] stockholm syndrome is strong in the puppet world i see :)
[10:54:12] yep :D
[11:00:08] (config management needs strong typing just like programming does)
[11:02:01] liw: amen to that
[11:04:59] kormat: test-in-prod meme mandatory here :D
[11:29:43] volans: any idea if a reimage script or something could have accidentally created this https://cas-puppetboard.wikimedia.org//node/puppet. looks like something submitted an empty facts set with the fqdn "puppet"
[11:36:06] jbond42: weird, the reimage script just runs puppet on a host where it can ssh, so usually the hostname is properly set
[11:36:12] but we can verify with the logs
[11:36:19] if they match the datetime
[11:36:53] jbond42: uptime 49 days in the facts
[11:36:58] so I would exclude a reimage :D
[11:37:05] this is the submission in puppetdb, just trying to see if i can see it on the puppetmaster
[11:37:07] https://phabricator.wikimedia.org/P11564
[11:37:27] oh where did you get the facts from?
[11:37:37] directly in puppetdb, i couldn't see them in puppetboard
[11:37:55] jbond42: it's puppetmaster1001 according to the serialtag
[11:38:03] jbond42: from https://puppetboard.wikimedia.org//node/puppet
[11:38:08] the facts on the right
[11:38:39] maybe some test of running puppet apply or similar?
[11:38:56] ahh, something else which is not working correctly on the cas- domain :(
[11:40:06] yes could be, ok i think it's best to just leave it for puppetdb to purge it and ignore this hostname in the netbox reports, just wanted to double check
[11:40:14] sorry I cheated but at lunch so cas- was not doable :D
[11:40:49] :) thanks for the help
[11:41:02] np
[11:49:48] jbond42: unrelated, we have a bunch of logs on the puppetmaster saying
[11:50:00] Error encoding a 'replace catalog' command for host '$HOSTNAME' ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB, see debug logging for more info
[11:50:03] for various $HOSTNAME
[11:58:36] volans: this looks like when the catalogue has binary data and it fails to convert it to json. it normally has a message straight after saying it is falling back to pson
[12:00:08] yeah I wondered the same, just not sure if we're missing anything in puppetdb or if it's able to send the data to it anyway
[12:00:22] ack, i'll look more in depth
[12:02:08] very low-prio
[12:02:10] thanks!
[12:02:29] ack :)
[12:43:18] I've changed the default datasource for 'host overview' to Thanos, meaning there's no need to select the per-site datasource anymore, let me know what you think! https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1
[12:57:39] looks good, thanks godog
[13:01:13] np jbond42 ! I'll follow up on sre@ as well with more examples
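For the basemodulepath/puppet.conf idea discussed above, a sketch of what setting it once per user could look like, rather than passing --modulepath on every run. The repo path is just an illustrative placeholder, and where the per-user config file lives varies by puppet version.

```bash
# Point puppet (for your user only) at a repo checkout so `puppet apply`
# finds modules without extra flags.  Path below is a placeholder.
puppet config set basemodulepath /home/kormat/gerrit/operations/puppet.git/modules --section user

# Depending on the puppet version, the per-user config that this writes is
# either ~/.puppet/puppet.conf (puppet 3) or ~/.puppetlabs/etc/puppet/puppet.conf
# (puppet 4+); inspect whichever exists to see what was actually set.
cat ~/.puppet/puppet.conf ~/.puppetlabs/etc/puppet/puppet.conf 2>/dev/null
```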
[13:02:24] great, thanks :)
[13:53:34] godog: that's pretty cool <3 thanks!
[14:00:14] connectivity issues on the B3 mgmt network?
[14:00:26] (codfw)
[14:00:46] jynus: I checked that mgmt and it looks up, so maybe it was just a one-off thing
[14:00:50] like very brief even
[14:01:39] I remember being told some switches were old on some mgmt networks, but not sure if it was there
[14:02:32] there was another alert on D7 mgmt, but not sure if related
[14:02:33] D7: Testing: DO not merge - https://phabricator.wikimedia.org/D7
[14:02:57] jynus: yes, there was a blip, a bad config change that was auto-rolled-back, should be fine now
[14:03:03] ok
[14:13:43] vgutierrez: yw, glad it helps!
[22:20:10] I am aware folks are moving from varnish to ats, but I have a question about the varnish-be setup. How is the list of varnish backends exactly generated/adjusted when using confd/etcd?
[22:27:25] we have a cluster of cache proxies running Varnish, but taking down backends using varnishadm backend.set_health commands is not persistent (and needs to be done through dsh/salt/cumin) and deploying puppet changes is a rather tedious process - we'd like to automate backend changes (not only in Varnish, but more services) as much as possible and make them persistent. Wikitech's Confd page is a good start, but I'm looking for a more detailed description of the puppet-confd integration and the production templates for varnish (and possibly more?) (:
[22:30:14] isn't everything basically in the puppet repo? :P
[22:31:49] SPF|Cloud: as of a while ago (last November?), varnish-be is no more, we use ATS in its place instead
[22:31:59] I'm doing a quick look to see what now-deleted files I can point you to
[22:32:37] Reedy: I assume it was, but with such a large setup, it can be hard to find what you're looking for :)
[22:34:53] SPF|Cloud: https://gerrit.wikimedia.org/r/c/operations/puppet/+/217818 is the creation of the varnish/confd integration as it used to exist long ago; it's the best single 'starting point' I've found
[22:35:37] right, so only varnish-fe is left until that's fully decommissioned? my sort-of volunteer involvement in SRE tasks must have been a long time ago then
[22:36:02] yeah, a ton of our routing and rewrite and ratelimiting logic is still implemented in VCL, so varnish-fe lives on for now
[22:37:04] cdanis: which immediately reminded me of https://bash.toolforge.org/quip/AU7VTzhg6snAnmqnK_pc
[22:37:36] but I have faith that ATS will kill off the VCL hacks eventually :)
[22:37:50] got that, we started using varnish 4 in 2015 (free wiki farm, so was able to steal some config from your puppet repo), and backend TLS in ATS is one major reason to consider it
[22:37:55] bd808: at my old employer there was a saying: "there's two ways to do anything! the deprecated way, and the way that's not ready yet"
[22:38:21] different but similar from here :)
[22:38:50] we usually have at least 2 deprecated ways ;)
[22:39:07] with simple VPSs you never benefit from 'internal networks', so all traffic has to be encrypted in some way, but that's no fun with varnish backends
[22:39:27] nothing is fun with varnish
[22:39:29] SPF|Cloud: yeah we use both ATS and Envoy for internal TLS termination
[22:39:35] and probably still use nginx for some of it
[22:39:41] hmm, envoy?
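For reference, this is roughly what the non-persistent depool that SPF|Cloud describes above looks like when driven through cumin; the host alias and backend name are made-up placeholders, not real production identifiers.

```bash
# Mark one varnish backend sick on every cache proxy matched by a cumin alias.
# This only lives in varnishd's memory: a restart or fresh VCL load forgets it,
# which is exactly the persistence problem being discussed.
# 'A:cache-text' and 'be_example_backend' are placeholders.
sudo cumin 'A:cache-text' 'varnishadm backend.set_health be_example_backend sick'

# And to hand health decisions back to the probes (repool):
sudo cumin 'A:cache-text' 'varnishadm backend.set_health be_example_backend auto'
```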
[22:39:49] but most things have moved over to Envoy instead of nginx internally
[22:39:52] yeah, https://www.envoyproxy.io/
[22:40:26] ah, that sounds cool :)
[22:40:51] I don't love their marketing copy, but it is good at what it does :)
[22:41:55] where do you put such a thing in the stack? between ATS and the Apache daemon on the backend?
[22:42:03] yeah exactly
[22:42:36] each edge proxy has an "ATS sandwich": ATS-TLS on the edge terminating user connections, which talks to varnish-fe, which talks to ATS-BE
[22:42:48] then the ATS-BEs talk to Envoys via LVS
[22:43:06] careful cdanis, you will end up being an admin at miraheze if you keep being useful to SPF|Cloud :)
[22:43:16] sst bd808 :)
[22:43:19] ahah
[22:43:29] I'm actually surprised you knew that...
[22:43:51] those Envoys are actually talking to MediaWiki in two ways, IIRC: one, the obvious one, terminating the connections from ATS-BE, and the other way is some code paths in MediaWiki are using Envoy to also make outbound HTTPS API calls to other services
[22:44:13] because having PHP do TLS calls at high volume is, uh, fraught
[22:44:14] but we share(d) a lot of the same technical challenges, so I'm trying to acquire new ideas here
[22:45:21] is that so problematic in php? architectural issue?
[22:45:36] you don't get to reuse connections, so you're doing a full TLS negotiation every time
[22:45:50] it's just that tls is expensive and at scale that adds up
[22:46:11] as more things are split from core into remote services it becomes a bigger problem
[22:46:15] and if you're talking to something faraway the rtts add up too
[22:46:25] yes, that's true there
[22:47:22] how about connections to databases (we do one-way TLS in MariaDB)? same issue?
[22:47:28] somebody should actually do a blog post for techblog about the magic graph dive that getting envoy everywhere enabled
[22:47:35] bd808: seriously, those graphs were amazing
[22:47:48] cpu and network traffic and latency
[22:49:07] I'm actually not sure if we do TLS talking to MariaDB, I think maybe we don't? AIUI the MediaWiki<->MariaDB traffic all stays within one datacenter so there's not the same concern as with edge PoPs
[22:49:51] that's correct, unfortunately we don't have (and never had) that advantage
[22:49:55] I think it has been talked about but not done, and because of the TLS handshake costs if I remember correctly
[22:50:25] looks like tls for replication, but not for "all clients"
[22:50:25] https://phabricator.wikimedia.org/T157702
[22:50:51] I am not sure what the impact looks like in MariaDB, only that Redis is a pain with regards to support
[22:51:37] we don't do persistent connections from php to maria, so it would be pretty costly
[22:52:06] and if we turned on persistent connections that has its own issues
[22:52:24] php's shared-nothing is a blessing and a curse
[22:53:01] heh I didn't realize PHP supported persistent db connections
[22:53:10] "sort of"
[22:53:22] yeah, it keeps a freelist of idle ones and reuses when it can?
[22:53:32] they get pooled at the runtime container, so apache or fcgi
[22:53:37] ofc
[22:54:19] so yeah, it's a pool you check out from to use, basically. It's a built-in feature of the connector code on the PHP side
[22:55:41] pool leaks happen, checkout contention, and every once in a while an upstream bug that hands out a connection to multiple clients. That last one is maybe mostly "fixed" upstream. I haven't used php with pooled connections for ~10 years
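A quick way to see the per-connection handshake cost being discussed (a shared-nothing PHP worker pays a full negotiation on every outbound HTTPS call): openssl's s_time benchmark compares fresh handshakes against resumed sessions. The hostname is just a reachable example endpoint; substitute any TLS service you control.

```bash
# Connections per second doing a full TLS handshake every time...
openssl s_time -connect en.wikipedia.org:443 -new -time 10

# ...versus reusing the session across connections; the gap is the cost
# that connection pooling / a local Envoy sidecar amortises away.
openssl s_time -connect en.wikipedia.org:443 -reuse -time 10
```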
[22:56:49] and in our deploy there would likely be issues with r/o slice routing if the pool duration was set too high
[22:58:21] slice? section? I never remember which "s" word is preferred, since what we do is certainly not sharding
[23:01:23] did I derail everything SPF|Cloud or did you get the next set of breadcrumbs you needed?
[23:02:04] oh, I definitely got what I needed :) and even learned some new things about envoy and persistent connections in php
[23:03:30] at $DAYJOB-1 we used php + oracle (*shudder*) and I spent a lot of hours with gdb and the oracle driver layer tracking down edge case issues
[23:03:38] bd808: you gave me a bunch of fun thoughts involving nasty kludges with passing around file descriptors over unix sockets combined with the new in-kernel TLS support (which can transparently encrypt FDs)
[23:04:07] would be a fun way to add "native" persistent TLS connections to PHP on Linux ;)
[23:05:53] cdanis: do NOT tell _j.oe_ I led you down that path :)
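On the in-kernel TLS idea at the end, a small sketch of checking whether a given box even has kTLS available before attempting anything like it; this assumes a kernel built with CONFIG_TLS and, for the stats file, roughly 5.3 or newer.

```bash
# Load the kernel TLS module if it's built as a module; harmless if already loaded.
sudo modprobe tls

# On kernels that expose them (assumption: ~5.3+), kTLS counters appear here
# once sockets actually offload records to the kernel.
cat /proc/net/tls_stat 2>/dev/null || echo "no kTLS stats file on this kernel"
```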