[17:12:21] PROBLEM - cache_text: Varnishkafka webrequest Delivery Errors per second -esams- on icinga1001 is CRITICAL: 1207 ge 5 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=esams+prometheus/ops&var-source=webrequest&var-cp_cluster=cache_text&var-instance=All
[17:15:50] this is due to an ongoing outage, see #operations :( --^
[17:30:27] RECOVERY - cache_text: Varnishkafka webrequest Delivery Errors per second -esams- on icinga1001 is OK: (C)5 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=esams+prometheus/ops&var-source=webrequest&var-cp_cluster=cache_text&var-instance=All
[17:32:31] !log restart varnishkafka on cp3056/cp3064 due to network issues on the hosts
[17:32:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:58:02] !log re-run failed refine job for MobileWebUIActionsTracking 2020-01-26T12
[17:58:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:58:19] all right all alerts should be ok now
[17:58:33] I'll be travelling today to SFO, ping me on the phone if needed