[07:23:24] the victorops config seems broken, per the sheet I'm on call, but the IRC bot still shows Kamila and I've also not gotten the Splunk notification about the shift start either)
[07:26:26] moritzm: I'll take a look
[07:27:05] cheers
[13:12:02] <_joe_> moritzm: I think your override began at 9:30
[13:12:14] <_joe_> I wanted to share with you all this moment of zen
[13:12:16] <_joe_> https://phabricator.wikimedia.org/T374887
[13:12:36] <_joe_> maaaybe we should stop querying graphite from that extension
[13:12:49] <_joe_> I'm not finding the courage to go look at the code
[13:17:44] Cowards die many times before their deaths;
[13:17:44] The valiant never taste of death but once.
[13:17:44] Of all the wonders that I yet have heard,
[13:17:44] It seems to me most strange that men should fear,
[13:17:44] Seeing that death, a necessary end,
[13:17:47] Will come when it will come.
[13:34:08] <_joe_> lol
[13:34:31] <_joe_> so this extension queries graphite's /render endpoint go fetch stats
[13:34:47] <_joe_> https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/ExtensionDistributor/+/refs/heads/master/includes/Stats/ExtDistGraphiteStats.php#40
[13:56:32] In addition, the query it runs consistently times out.
[13:59:37] <_joe_> cwhite: I've prepared a couple patches that "fix" the firewalling issue, but at this point I'd direct them to use prom instead?
[14:04:00] I can't find extdist metrics in prometheus yet - they're probably not yet migrated.
[14:18:40] probably-stupid question: where do the "Origin cluster" values come from on the Envoy Telemetry dashboard? ( https://grafana.wikimedia.org/goto/kP-z5bgNg?orgId=1 ) I think the variable is coming from envoy_cluster_upstream_rq , but I'm not sure where that is ultimately coming from. I'm asking because my apus nodes (moss-fe*) are running envoy but aren't usefully appearing as a cluster (though the individual nodes are there if I select
[14:18:40] "All" from Origin cluster)...
[14:23:57] I presume this means I need to set Something Somewhere (TM), but I've no idea where to start, and it has always Just Worked for the Swift clusters
[14:27:00] <_joe_> Emperor: you just have tls termination there right?
[14:27:16] Emperor: Origin Cluster = `label_values(envoy_cluster_upstream_rq, cluster)`
[14:28:45] _joe_: indeed so
[14:28:54] <_joe_> Emperor: it comes from 'cluster' in hiera
[14:29:18] <_joe_> if you use the same cluster for all your nodes, the metrics there are aggregated
[14:29:34] <_joe_> you can go with explore and add a regex on the instance name
[14:34:22] So I probably in fact want to arrange to set the "cluster" hiera key for my nodes to something sensible (probably pulled out of the existing cephadm_clusters hiera structure)
[14:36:02] probably in hieradata/regex.yaml where I'm already assigning nodes to clusters
[14:41:54] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075027 look reasonable (and thus worth a +1) ? :)
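
Note on the `label_values(envoy_cluster_upstream_rq, cluster)` variable mentioned at 14:27:16: a minimal sketch, in Python, of roughly the same lookup done directly against the Prometheus HTTP API (`/api/v1/label/<label>/values` with a `match[]` selector). The base URL, the helper names and the sample response are assumptions for illustration only; Grafana may resolve the variable through a different endpoint depending on its version.

```python
from urllib.parse import urlencode


def label_values_url(base_url, label, metric):
    """Build the Prometheus API URL listing the values of `label`
    across series selected by `metric` (hypothetical helper)."""
    query = urlencode({"match[]": metric})
    return f"{base_url.rstrip('/')}/api/v1/label/{label}/values?{query}"


def parse_label_values(payload):
    """Return the sorted label values from a decoded JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return sorted(payload["data"])


# Hypothetical response body; the cluster names are placeholders, not real clusters.
sample_response = {"status": "success", "data": ["cluster-b", "cluster-a"]}

# Assumed base URL; substitute the address of a real Prometheus instance.
url = label_values_url(
    "http://prometheus.example", "cluster", "envoy_cluster_upstream_rq"
)
clusters = parse_label_values(sample_response)
print(url)
print(clusters)
```

Each distinct value of the `cluster` label shows up as one entry in the dropdown, which matches the point made at 14:29:18: hosts that share one `cluster` value are aggregated under that single entry.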