[03:44:53] Analytics, Core Platform Team Backlog, EventBus, Core Platform Team (Modern Event Platform (TEC2)), Performance: > 2% of API wall time spent generating UUIDs - https://phabricator.wikimedia.org/T222966 (Pchelolo) We have been using UUID v1 in #eventbus events for a while now with no problem,...
[09:09:23] PROBLEM - Kafka Consumer lag of the EventLogging processors on icinga1001 is CRITICAL: 7.26e+05 ge 5e+05 https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?orgId=1&prometheus=ops&var-cluster=jumbo-eqiad&var-topic=All&var-consumer_group=eventlogging_processor_client_side_00
[12:14:30] !log restart eventlogging on eventlog1002
[12:14:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:14:43] got also a py-bt from gdb, and added to T222941
[12:14:44] T222941: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941
[12:21:04] going afk now (on the move), will check later!
[12:21:18] (the lag seems decreasing but it will take a bit)
[12:49:24] (CR) Framawiki: [C: +2] templates: add target="_blank" to external links [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509598 (owner: Framawiki)
[12:49:48] (Merged) jenkins-bot: templates: add target="_blank" to external links [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509598 (owner: Framawiki)
[13:10:37] (PS2) Framawiki: Create custom 50x error pages [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018)
[13:10:42] (CR) Framawiki: "Thanks."
(3 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[13:10:49] RECOVERY - Kafka Consumer lag of the EventLogging processors on icinga1001 is OK: (C)5e+05 ge (W)2.5e+05 ge 0 https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?orgId=1&prometheus=ops&var-cluster=jumbo-eqiad&var-topic=All&var-consumer_group=eventlogging_processor_client_side_00
[13:37:00] (PS1) Framawiki: app: handle bad qrun_id in run_status() [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509632 (https://phabricator.wikimedia.org/T223013)
[13:37:20] Quarry, Patch-For-Review: 'NoneType' object has no attribute 'status_message' when qrun doesn't exist - https://phabricator.wikimedia.org/T223013 (Framawiki) a:Framawiki
[15:30:28] Analytics, Analytics-Kanban, Patch-For-Review: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (elukey) @Ottomata sorry but 1.4.3 was not the right version to rollback to :( ` Start-Date: 2019-04-30 13:24:25 Com...
[15:33:57] !log rollback python-kafka one eventlog1002 to 1.4.1-1~stretch1
[15:33:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:39:49] ok I don't see anymore disconnections, I hope that 1.4.1 will get back to full stability
[15:39:52] checking later on!
[16:24:25] (CR) Zhuyifei1999: [C: +1] app: handle bad qrun_id in run_status() [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509632 (https://phabricator.wikimedia.org/T223013) (owner: Framawiki)
[16:25:40] (CR) Zhuyifei1999: [C: +1] Create custom 50x error pages [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[17:52:17] (CR) Framawiki: [C: +2] app: handle bad qrun_id in run_status() [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509632 (https://phabricator.wikimedia.org/T223013) (owner: Framawiki)
[17:52:28] (CR) Framawiki: [C: +2] Create custom 50x error pages [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[17:52:40] (Merged) jenkins-bot: app: handle bad qrun_id in run_status() [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509632 (https://phabricator.wikimedia.org/T223013) (owner: Framawiki)
[17:52:49] (Merged) jenkins-bot: Create custom 50x error pages [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509606 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[18:01:53] Quarry, Patch-For-Review: 'NoneType' object has no attribute 'status_message' when qrun doesn't exist - https://phabricator.wikimedia.org/T223013 (Framawiki) Open→Resolved
[18:59:10] (PS1) Framawiki: user: add query runs count [analytics/quarry/web] - https://gerrit.wikimedia.org/r/509638 (https://phabricator.wikimedia.org/T223009)
[19:02:38] Quarry, Patch-For-Review: Add query runs count on user page - https://phabricator.wikimedia.org/T223009 (Framawiki) a:Framawiki