[13:02:40] While editing a grafana dashboard, I see a menu called "Data Source" that includes things like "codfw prometheus/ops" and "eqiad prometheus/labs". Where is the mapping between sets of prometheus exports (or hosts?) and those entries? I want to add and/or find "codfw prometheus/cloud-dev"
[13:04:53] andrewbogott: the dropdown is hosts running prometheus known to grafana. is there a cloud-dev prometheus?
[13:06:06] cdanis: Ah, so it's not a mapping, it's literally a prometheus host and all the collectors that are pointed to that host?
[13:06:40] (Is it fair to use 'aggregator' in this context? Or is it just called 'prometheus'?)
[13:06:41] yep! (but the other way around, Prometheus servers get a list of targets to scrape and poll them)
[13:07:05] ok. So then I have a new question, which is… who is scraping the particular collector I'm interested in
[13:07:43] which is, for starters, 'prometheus-pdns-rec-exporter' running on cloudservices2002-dev.codfw
[13:08:07] and I'm guessing that one possible answer is 'no one is polling that collector at all'
[13:09:38] modules/profile/manifests/wmcs/prometheus.pp makes me think that the 'labs' prometheus might be scraping that
[13:09:49] although I haven't dug through the class_name to see if that is true
[13:10:43] hmm OTOH that looks like explicitly just eqiad stuff?
[13:10:54] So you'd expect there to be a 'codfw prometheus/labs' option in grafana? (Because there isn't)
[13:11:24] * andrewbogott looks to see where that profile is applied
[13:11:25] the other prometheis generally don't do cross-cluster scrapes, yeah
[13:12:17] ok! So it looks like the answer is that we have labmon hosts running in eqiad and no equivalent in codfw
[13:12:27] so I would need to make that before I get my metrics.
[13:12:35] * andrewbogott has a sad
[13:12:58] yeah, and then once the prometheus exists, there's a yaml file somewhere else that tells grafana "datasource named X is backed by prometheus hostname Y"
[13:13:36] ok. Thank you! I will… try to decide what to do next
[13:19:13] cdanis: would it be bad manners (and/or cause namespace collisions) for me to just use codfw prometheus/ops to watch these things?
[13:20:45] how many metrics is it?
[13:21:25] At the moment, all I want is this: https://grafana.wikimedia.org/d/000000240/labs-dns-dashboard?orgId=1
[13:21:31] but that's for 2 hosts and in codfw I only have one
[13:21:38] so, like, six
[13:22:09] But I don't want to intermingle those metrics with prod pdns-recursor things which I'm sure are already monitored in codfw
[13:23:02] so, your dashboard as-is doesn't look at them, but prometheus always attaches an 'instance' label with the host:port that it scraped the metric from
[13:23:29] we'd also have to check if there are any aggregation rules in prom's config that slurp up all pdns_rec metrics, but I suspect there aren't
[13:24:02] but assuming the other pdns dashboards are written properly to just care about certain hostnames it would in theory be fine
[13:24:31] ok, let's see if I can figure out how to add those to prometheus2003/prometheus2004
[13:26:20] the other somewhat-complicated thing is that if you wanted to have metrics from both in your dashboard you'd have to make the datasource a conditional
[13:27:36] a separate dashboard is fine
[13:28:06] (I actually think in an ideal world it would be easy to not need a separate dashboard, and also dashboards would be code that lived in git, but)
[13:29:24] that is a beautiful dream
[13:44:53] cdanis: this is largely copypasta but does it look reasonable to you? https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/533210/1/modules/profile/manifests/prometheus/ops.pp
[13:46:41] mostly yes -- you still need to add references to that to the prometheus::server resource, and also, do so in such a way where it only gets scraped in codfw if that's the plan
[13:49:43] ok...
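Editor's note: the 13:23:02 point about the 'instance' label is what keeps the cloud-dev metrics separable from the prod pdns-recursor ones, even on a shared Prometheus. A minimal Python sketch of that idea, assuming made-up samples: the metric name, the port 9199, the values, and the host "prod-recursor-example" are all hypothetical and not taken from the real config.

```python
# Illustrative samples only: metric name, port, values, and the
# "prod-recursor-example" host are hypothetical.
SAMPLES = [
    {"metric": "pdns_rec_questions",
     "instance": "cloudservices2002-dev:9199", "value": 12.0},
    {"metric": "pdns_rec_questions",
     "instance": "prod-recursor-example:9199", "value": 3400.0},
]


def host_of(instance):
    """Return the host part of a 'host:port' instance label."""
    return instance.rsplit(":", 1)[0]


def samples_for_hosts(samples, hosts):
    """Keep only samples whose instance label names one of the given hosts."""
    wanted = set(hosts)
    return [s for s in samples if host_of(s["instance"]) in wanted]


cloud_dev_only = samples_for_hosts(SAMPLES, ["cloudservices2002-dev"])
```

A dashboard that selects on specific hostnames in this way would not pick up samples from the other hosts, which is the "in theory fine" case described at 13:24:02.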
[13:49:44] * andrewbogott digs more
[13:50:16] should be enough to condition on $::site I think
[14:00:55] cdanis: ok, I think I have the 'add references' part. Given that that role isn't applied in eqiad I'm thinking it's harmless to leave out the $::site switch… does that sound right to you?
[14:01:24] hmm, yeah seems right, PCC will tell us for sure
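Editor's note: the model discussed in this log (each Prometheus server scrapes a list of targets, each grafana datasource name is backed by one Prometheus, and a target list can be gated per site) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names, the port 9199, and the dictionary layout are hypothetical, and the real configuration lives in Puppet and a grafana YAML file that are not shown in the log.

```python
def pdns_rec_targets(site):
    """Hypothetical target list, returned only for codfw.

    Mirrors the idea of conditioning the scrape job on $::site.
    """
    if site == "codfw":
        return ["cloudservices2002-dev.codfw:9199"]  # port is made up
    return []


# Datasource name -> site of the Prometheus server backing it.
# The names come from the log; the mapping structure is an assumption.
DATASOURCES = {
    "codfw prometheus/ops": "codfw",
    "eqiad prometheus/labs": "eqiad",
}


def datasources_scraping(target):
    """Answer 'who is scraping this collector?' in terms of datasource names."""
    return sorted(
        name
        for name, site in DATASOURCES.items()
        if target in pdns_rec_targets(site)
    )


found = datasources_scraping("cloudservices2002-dev.codfw:9199")
```

If the role that defines the job is only applied in codfw, dropping the site check would give the same result, which matches the reasoning at 14:00:55.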