[00:53:14] a-team anyone around? quite urgent :)
[00:54:08] chelsyx: you around? :)
[00:54:18] yes
[06:40:06] a-team: I had to restart eventlogging due to a huge kafka consumer lag
[06:40:21] https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-12h&to=now&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=jumbo-eqiad&var-topic=All&var-consumer_group=All
[06:40:41] I think that the kafka python version that we have rolled back to might have some issues
[06:49:57] Analytics, Analytics-Kanban, Patch-For-Review: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (elukey) {F29017194} There was a big lag accumulated during the EU nightime, after the EL restart it seems working fi...
[07:26:46] ok just added an alarm for these use cases
[07:26:57] it should alert in here if a huge lag is registered
[07:27:02] (shouldn't be spammy)
[17:47:27] Quarry, observability: Develop the monitoring of Quarry - https://phabricator.wikimedia.org/T205150 (Framawiki) @Dzahn @bd808 your opinion would be interesting here. Thanks!
[17:51:05] Quarry, observability: Develop the monitoring of Quarry - https://phabricator.wikimedia.org/T205150 (Krenair) There's shinken but I don't know if we want to move anything new there going forward. I've been thinking of setting up icinga for deployment-prep.
[17:51:55] Quarry: Add query runs count on user page - https://phabricator.wikimedia.org/T223009 (Framawiki) p:Triage→Low
[17:52:59] Quarry, Patch-For-Review: Create api health point for monitoring - https://phabricator.wikimedia.org/T205151 (Framawiki) https://quarry.wmflabs.org/.health/summary/v1/60 is happily working, I think we can close this task.
Thanks @Rafidaslam :)
[17:53:05] Quarry, observability: Develop the monitoring of Quarry - https://phabricator.wikimedia.org/T205150 (Framawiki)
[17:53:09] Quarry, Patch-For-Review: Create api health point for monitoring - https://phabricator.wikimedia.org/T205151 (Framawiki) Open→Resolved
[18:16:16] Quarry, observability: Develop the monitoring of Quarry - https://phabricator.wikimedia.org/T205150 (Dzahn) There is already Icinga 2 in cloud VPS used to monitor some things. Ask @Paladox
[18:50:00] (PS1) Framawiki: templates: add target="_blank" to external links [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509598
[19:30:28] Quarry: 'NoneType' object has no attribute 'status_message' when qrun doesn't exist - https://phabricator.wikimedia.org/T223013 (Framawiki) p:Triage→Low
[20:25:56] Quarry: Create custom 502 Bad Gateway error page - https://phabricator.wikimedia.org/T223018 (Framawiki) p:Triage→Lowest
[21:42:35] Quarry: Create a beta host - https://phabricator.wikimedia.org/T209119 (Framawiki) Open→Resolved a:Framawiki Created manually https://quarry-dev.wmflabs.org/ without puppet.
[21:43:47] Quarry: Create a script to import queries from an instance to another - https://phabricator.wikimedia.org/T223022 (Framawiki) p:Triage→Lowest
[22:52:42] (PS1) Framawiki: Create custom 50x error pages [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018)
[23:02:58] (CR) Zhuyifei1999: Create custom 50x error pages (3 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[23:03:39] Quarry: Create a beta host - https://phabricator.wikimedia.org/T209119 (zhuyifei1999) Awesome thanks!
[23:38:07] (CR) Zhuyifei1999: [C: +1] templates: add target="_blank" to external links [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509598 (owner: Framawiki)