[05:31:02] PROBLEM - Check if active EventStreams endpoint is delivering messages. on icinga1001 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[05:45:59] (CR) Phuedx: [C: +1] eventlogging: Purge prefupdate after 90 days [analytics/refinery] - https://gerrit.wikimedia.org/r/588106 (https://phabricator.wikimedia.org/T249894) (owner: Krinkle)
[06:01:44] RECOVERY - Check if active EventStreams endpoint is delivering messages. on icinga1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[06:14:59] Analytics, Analytics-Kanban, Product-Analytics: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (elukey) >>! In T249923#6049325, @lexnasser wrote: > @elukey I don't see the "Unknown error" message anymore. Nothing in the JS console either. Thanks a...
[08:34:34] ottomata, elukey, stat1007 is clogged by a python job..
[08:39:32] (CR) Phuedx: "Actually, I think Iab17401152a33c7cc57d7510f6cfa9de15c36796 is preferable." [analytics/refinery] - https://gerrit.wikimedia.org/r/588106 (https://phabricator.wikimedia.org/T249894) (owner: Krinkle)
[09:59:29] mforns: hey! what do you mean with "clogged" ? Were you unable to ssh or was it super slow?
[10:00:36] I see that there are a lot of python jobs running, but plenty of memory and cpu usage is around 75/80% (that is due to the limits we have in place via systemd)
[10:01:09] see https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=stat1007&var-datasource=eqiad%20prometheus%2Fops&var-cluster=analytics
[13:57:44] elukey: I was able to ssh, but it was very slow. when I executed top, there were around 20 processes of the same program running with 100% CPU each.
[15:43:59] mforns: yep yep it is a downside of the current rules, all processes for all users can use 90% of all cpu resources available. This means that when a host is full, all users logged in are seeing it :(
[15:44:09] if too much memory is used, then the process is killed
[15:44:12] but not for CPU
[15:48:40] (note: overall the usage limit for CPU is 90% of all cpus, so you might see some of them pegged to 100%)
[16:05:27] Analytics, Analytics-Kanban, Product-Analytics: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (elukey) Ok I think I got to the bottom of this. I created a test user in superset, and created a ssh tunnel to bypass httpd (hence LDAP auth) manually se...
[16:18:40] Analytics, Analytics-Kanban, Product-Analytics: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (elukey) @dr0ptp4kt I added the extra sqllab role to your Superset user, can you check if the "Unknown error" is fixed whenever you have time? https://su...
[16:35:32] (Abandoned) Krinkle: eventlogging: Purge prefupdate after 90 days [analytics/refinery] - https://gerrit.wikimedia.org/r/588106 (https://phabricator.wikimedia.org/T249894) (owner: Krinkle)
[21:29:30] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (lexnasser) Thanks again for all your feedback! I implemented all 3 tests in Java, and ran them each, individually, against all the __VALID__ pageview...
[22:46:55] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (awight) >>! In T245468#6050611, @lexnasser wrote: >> (2) is really interesting, but I would only go with that option if you're willing to make the ca...