[09:17:00] Hello! Is it possible for a Prometheus alert to notify WMCS and netops? For example by passing a list to https://gerrit.wikimedia.org/r/c/operations/alerts/+/1126030/8/team-netops/bgp.yaml#32 ?
[09:23:26] XioNoX: not a list per se, though for example collab did set up a new team in modules/alertmanager/templates/alertmanager.yml.erb (team: 'collaboration-services-releng') which could work depending on the case, an alternative is to keep one team and then route alerts to a team1+team2 receiver in alertmanager
[09:24:03] or actually a receiver: for team2 + continue
[09:24:37] I tend to prefer solution #2
[09:33:33] godog: noted, thx, is there doc on how to do it?
[09:39:22] XioNoX: not yet no, something similar to this though https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/alertmanager/templates/alertmanager.yml.erb#216
[09:39:54] except that you'd be matching on sth else, the general structure/idea is the same
[09:41:04] cool, thx!
[09:44:14] XioNoX: np, FWIW for cases like these and similar what we could do is have sth like a 'scope' label in the alert, e.g. scope: cloud
[09:45:00] godog: I was also wondering about a scope "network"
[09:45:28] but then I worry it becomes too complex with some stuff being borderline between different scopes
[09:45:36] and the possible confusion between team netops and scope network
[09:46:34] yeah fair enough
[10:13:54] godog: I might need your help for the alert that we just received: "problem = prometheus "ops" at http://127.0.0.1:9900/ops has "gnmi_bgp_neighbor_session_state" metric with "instance" label but there are no series matching {instance=~"cloudsw.*"} in the last 1w"
[10:20:48] XioNoX: ah yes, the problem is that on pops there's no cloudsw, the easiest solution is to split the cloud alerts into a separate file then use # deploy-site: eqiad, codfw in addition to # deploy-tag: ops
[10:29:43] godog: https://gerrit.wikimedia.org/r/c/operations/alerts/+/1126944
[10:31:37] XioNoX: yes exactly, LGTM
[10:50:40] FIRING: [2x] LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[13:51:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[14:50:40] FIRING: [2x] LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
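
The "receiver for team2 + continue" approach godog prefers at 09:24 could look roughly like the sketch below. This is a minimal, hypothetical Alertmanager route, not the actual production alertmanager.yml.erb; the receiver names (wmcs, netops) and the matched labels are assumptions for illustration only.

```yaml
# Hypothetical sketch of routing one alert to two teams with `continue`.
route:
  routes:
    # Deliver matching alerts to a WMCS receiver first...
    - match:
        team: netops
        scope: cloud          # assumed label, see the 'scope' idea below
      receiver: wmcs          # assumed receiver name
      continue: true          # ...then keep evaluating further routes
    # ...so the regular netops route still fires for the same alert.
    - match:
        team: netops
      receiver: netops        # assumed receiver name
```

With `continue: true` on the first route, a single alert is delivered to both receivers instead of stopping at the first match, which is what makes the single-team alert reach a second audience.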
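
The 'scope' label idea from 09:44 would live on the alert rule itself. A hedged sketch of a Prometheus rule carrying both a team and a scope label is below; the alert name, expression, and threshold are assumptions, only the metric name and labels come from the conversation.

```yaml
# Hypothetical sketch: one team label plus a scope label for routing.
groups:
  - name: bgp
    rules:
      - alert: BGPSessionDown                         # assumed alert name
        expr: gnmi_bgp_neighbor_session_state != 6    # assumed expression (6 = Established)
        for: 5m
        labels:
          team: netops
          scope: cloud        # Alertmanager could route on this in addition to team
          severity: critical
        annotations:
          summary: "BGP session down on {{ $labels.instance }}"
```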
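
For the 10:20 fix (splitting the cloudsw checks so they only deploy at sites that actually have cloudsw series), the separate file might start like this. This is a hypothetical sketch, not the content of change 1126944; the file name, alert name, and expression are assumptions, while the deploy-tag/deploy-site comments are the ones named in the conversation.

```yaml
# Hypothetical team-netops/bgp_cloud.yaml (assumed file name)
# deploy-tag: ops
# deploy-site: eqiad, codfw
groups:
  - name: cloud_bgp
    rules:
      - alert: CloudBGPSessionDown                    # assumed alert name
        expr: gnmi_bgp_neighbor_session_state{instance=~"cloudsw.*"} != 6   # assumed expression
        for: 5m
        labels:
          team: netops
          severity: critical
```

Restricting the file to eqiad and codfw avoids the "no series matching {instance=~"cloudsw.*"}" check failing on the pops, where those devices do not exist.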