[09:11:30] elukey, godog: puppet is broken on graphite1004, fails in profile::graphite::alerts, probably caused by the decommission of the Kafka/analytics cluster yesterday
[09:14:42] ah lovely
[09:14:43] checking
[09:15:24] $kafka_config = kafka_config('analytics')
[09:15:25] lol
[09:15:28] moritzm: fixing it!
[09:15:32] thx
[09:20:58] done!
[09:24:06] thanks elukey moritzm!
[14:02:52] godog: when you are around I would like to request a quick opinion about prometheus configuration handling
[14:03:31] jynus: sure, shoot
[14:03:50] remember when you wanted to do a prometheus mysql exporter script?
[14:04:11] and I said using mediawiki was not ideal
[14:04:19] I figured out a proper way
[14:04:30] but want to ask what is the best way to proceed
[14:04:57] as in, how to run it from puppet/overwrite configuration logic
[14:05:22] to avoid bad config states
[14:07:05] I have https://gerrit.wikimedia.org/r/c/operations/puppet/+/519203 so far
[14:07:35] but blindly overwriting it on every puppet run seems inefficient and error prone
[14:09:06] I agree, not ideal
[14:09:12] any suggestion?
[14:09:38] generating a new one and comparing? reading the old one in the script?
[14:11:07] I am guessing you had a plan, another thing is whether you remember it :-)
[14:12:36] heheh one might be to reuse the script as a basis but let puppet handle the file writing, IOW use generate() in puppet
[14:12:48] oh, I see
[14:12:52] so that should DTRT and not rewrite on every puppet run
[14:13:27] does that need to run on the puppetmaster?
[14:14:35] I think yes- that is not a blocker, but that would mean a bit of contamination on the puppet master (mysql connection, python libraries, etc.)
[14:17:14] yeah it'll run on the puppetmaster, I think that's fine tho
[14:19:58] I quickly read the backscroll but it seems that doing that we'll have zarcillo as a dependency for puppet to compile catalogs for the DBs, is that so?
[14:21:15] no
[14:21:20] well, maybe
[14:21:31] depending on your definition of compiling a catalog
[14:21:41] I wanted to avoid that dependency
[14:21:50] by being a local thing that just executed things locally
[14:21:51] to be able to have the puppetmaster compile the catalog for any DB
[14:22:11] an exec that does the content generation
[14:22:17] but locally
[14:22:34] I am not convinced of putting it on the puppet master
[14:22:55] and would like to avoid it
[14:23:28] my 2 cents is that whatever the final mechanism is, it should have as a requirement to keep working with stale data even if zarcillo is down
[14:23:29] also a blind generation would not catch errors
[14:23:33] that might require a local cache
[14:23:43] yeah, even simpler
[14:23:53] implement that as part of the logic of the script
[14:24:04] and bail early on any strange state
[14:24:16] (e.g. if 0 hosts are returned, don't touch the original file)
[14:24:40] what would you think of that?
[14:24:57] that way there is no hard dependency and it will not contaminate the puppetmaster
[14:26:09] I believe a similar question came up when we wanted to use etcd to generate alarms
[14:26:22] I don't have enough context on the current limitations and issues of the prometheus-mysqld-exporter config and why that can't be solved within puppet alone
[14:26:27] the generate() is run on prometheus hosts, not dbs, btw, to generate config, so puppet catalogs for dbs are not affected I believe
[14:26:30] volans:
[14:26:32] between a system like this and an exported resource I dunno
[14:26:55] godog: by the agent?
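[editor's note: the "bail early on any strange state, don't touch the original file" logic discussed above could look roughly like the following sketch. All names (`write_if_changed`, `regenerate`, the target path, the tuple layout of the inventory rows) are hypothetical; the real script would fetch its instance list from zarcillo.]

```python
import os
import tempfile


def write_if_changed(path, new_content):
    """Atomically replace `path` only when the content actually differs."""
    try:
        with open(path) as f:
            if f.read() == new_content:
                return False  # unchanged: leave the file alone
    except FileNotFoundError:
        pass  # first run: the file does not exist yet
    # write to a temp file in the same directory, then atomically rename,
    # so a crash mid-write can never leave a truncated config behind
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(new_content)
    os.replace(tmp, path)
    return True


def regenerate(path, instances):
    """Bail early on any strange state, e.g. zero hosts returned."""
    if not instances:
        return False  # keep the stale-but-working original file
    lines = ["%s:%d %s" % (host, port, shard)
             for (host, port, shard) in sorted(instances)]
    return write_if_changed(path, "\n".join(lines) + "\n")
```

Run from cron, this gives the property both sides wanted: if zarcillo is down or returns nothing, the previous config keeps working, and unchanged data never rewrites the file on every run.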
[14:28:04] let me clarify, "on prometheus hosts" meaning catalogs with generate() are compiled for prometheus hosts
[14:28:13] but no, generate() always runs on the puppet master afaik
[14:28:14] volans, I don't want to maintain this by hand every single time: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/role/files/prometheus/mysql-core_codfw.yaml
[14:28:20] ah ok, so still on puppetmasters, but it will break prometheus puppet, not db puppet
[14:28:23] got it
[14:28:26] when I already have a canonical place where that info is
[14:29:00] but why is it by hand in the first place, if I may ask? :)
[14:29:12] because that information is not in puppet
[14:29:23] it is dynamic state
[14:29:27] but it's on the hosts
[14:29:31] no
[14:29:39] a host doesn't know which instances it has?
[14:29:51] no, not at that level of detail
[14:30:33] I am taking that away from puppet, not adding it back
[14:30:57] your script does exactly add it back though :)
[14:31:06] via the window :-P
[14:31:11] sure
[14:31:24] but I don't want puppet as the canonical place
[14:31:50] godog: can the labels not be defined by the exporter? must the prometheus server know them in advance before polling an exporter?
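[editor's note: the mysql-core_codfw.yaml file linked above is a hand-maintained Prometheus file_sd target list. A generator would build the same structure from inventory data; a minimal sketch follows. Prometheus file_sd accepts JSON as well as YAML, so JSON is used here to avoid extra dependencies; the instance names and the shard/role label values are illustrative, not taken from the real file.]

```python
import json


def build_file_sd(instances):
    """Group db instances into Prometheus file_sd entries.

    `instances` is a list of (host, port, shard, role) tuples, e.g. as
    fetched from an inventory database (zarcillo in this discussion).
    Emits one entry per (shard, role) pair, mirroring the hand-written
    target file's layout.
    """
    groups = {}
    for host, port, shard, role in instances:
        groups.setdefault((shard, role), []).append("%s:%d" % (host, port))
    return [
        {"targets": sorted(targets), "labels": {"shard": shard, "role": role}}
        for (shard, role), targets in sorted(groups.items())
    ]


def render(instances):
    """Serialize to a file_sd-compatible JSON document."""
    return json.dumps(build_file_sd(instances), indent=2)
```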
[14:33:24] volans: the exporter can define labels as it wishes, yeah, although the "service discovery" part is generally recommended to be kept in the prometheus config
[14:33:25] I would be happy to skip puppet
[14:33:41] and speak with prometheus directly
[14:34:15] jynus: in that case, if you have logic to update the target files only when changed, I think the script in cron will be fine
[14:35:11] just to be clear, that was in response to volans' suggestion, not sure about that
[14:36:13] basically I was wondering if we could have a standard exporter per db host and keep the logic of multi-instance and labels (shard, role) on the exporter side, and potentially expose all metrics from a single meta-exporter
[14:36:31] but that's babbling basically given my low knowledge of prometheus constraints, so I'll shut up ;)
[14:36:41] all metrics, like, all exporters?
[14:37:06] all db exporters
[14:37:11] if there are multiple instances
[14:37:38] and by exporter, you mean puppet facts?
[14:37:57] no, I mean prometheus exporters
[14:38:09] but dunno if it's doable
[14:39:01] I think you want automatic discovery
[14:39:19] which I don't disagree would be nice, but it is out of my scope
[14:39:54] semi-automatic, yeah, all dbs export on port NNNN, but behind that there could be multiple things aggregated, for multiple mariadb instances
[14:41:01] so my plan is slightly different- centralize that on a single service- which, while being a SPOF, wouldn't be for critical stuff
[14:41:20] e.g.
for us, metrics monitoring is important, but not Tier 1 important
[14:41:48] but we need consolidation because we have mediawiki, haproxy, misc services
[14:41:54] each with its own language
[14:42:05] for pure inventory purposes
[14:42:16] think netbox for mysql instances
[14:42:17] sure, I get the issue, ofc it shouldn't be manual
[14:42:36] and then all non-critical services will query its cache
[14:43:19] once it is dynamic, we can do checks like the ones you have on netbox + puppet
[14:43:58] the idea of moving away from puppet is that puppet should configure mysql, but shouldn't need to understand its state
[14:44:22] for puppet, s1 and s2 are the same static config
[14:44:38] state control should be somewhere else, that is my take
[14:45:09] that may give you more context on the why
[14:46:13] so ok to try a local-only setup?
[14:47:46] I am worried about putting the tendril db as a hard dependency of puppet
[14:48:14] as I don't have a better proposal right now, I'm just advocating to avoid a coupled dependency; if it works when zarcillo is down, it's ok for me
[14:49:01] it should, if done properly
[14:49:36] e.g. by following the etcd model- if things are down, no changes happen, but nothing goes down
[14:50:09] *AND we probably have better backups of tendril than any other service :-D
[14:52:04] will ask for a review when I have a concrete proposal
[14:52:21] concrete == code for review
[14:54:09] sounds good to me
[19:52:30] gehel (if still working)… I'm on the verge of using base::expose_puppet_certs but my use case (libvirt) wants to point to ca.pem. Has that come up in any of your uses?
[19:52:40] I could just add a switch to expose it but I'm surprised it's not already there :)
[19:55:50] andrewbogott: my last use of that was ages ago. I don't think I'm going to be of much help
[19:57:04] gehel: ok, no worries! I asked because you touched it first and also last
[19:57:22] he, he, he...
I think we have better ways to expose certs nowadays
[19:57:43] e.g.?
[19:58:02] (in this case I can't use acme because it's for .wmnet things)
[19:59:54] Here's an unrelated question for anyone: I need to generate an array of all fqdns of hosts applying a given puppet class. I'm pretty sure that's a thing we already do a fair bit; can someone point me to a good/modern example?
[20:00:53] andrewbogott: offhand, that sounds like a cumin / spicerack thing ... what's the larger context?
[20:01:09] cdanis: it's going in a config file
[20:01:26] ohh so you need to do this from puppet itself?
[20:01:29] I want each cloudvirt to talk to every other cloudvirt
[20:01:41] Oh, yes! Sorry, I need the list in puppet
[20:01:46] so that puppet can dump it into an acl
[20:02:19] query_nodes()
[20:02:31] example:
[20:02:31] $hosts = unique(concat(query_nodes('Class[Role::Debmonitor::Server]'), [$::fqdn]))
[20:02:36] sweet!
[20:02:38] Thank you :)
[20:02:54] the concat is to make sure the host where you're compiling is part of the set from the first puppet run
[20:02:59] before it's added to puppetdb
[20:03:02] this is for prod ofc
[20:03:08] prod is good
[20:03:19] I assume it won't work on cloud because no exported resources/no puppetdb
[20:05:59] yep
[20:06:06] that's why I was specifying prod
[20:06:16] didn't read the whole backlog so dunno where you need it
[20:08:28] I'm trying it now — I think that snippet is exactly what I need though