[10:01:14] chrisalbon: I actually do (sorry I missed your ping somehow)
[10:01:36] We should reduce the number of requests a uwsgi worker takes before restarting
[10:01:44] it's currently 200
[10:03:13] I'll prepare a patch
[10:29:51] https://gerrit.wikimedia.org/r/c/operations/puppet/+/638467
[10:51:12] MediaWiki-extensions-ORES, Machine Learning Platform: Ores gives 500 error - https://phabricator.wikimedia.org/T267118 (Hawkeye7)
[11:05:14] MediaWiki-extensions-ORES, Machine Learning Platform, User-Ladsgroup: Ores gives 500 error - https://phabricator.wikimedia.org/T267118 (Ladsgroup) Open→Resolved a:Ladsgroup I restarted ores and it should be back to normal. The actual long term issue is {T263910}
[14:56:01] Awesome thank you Amir!
[19:40:45] Alright, I’ve jerry-rigged a solution for today so I can sleep in peace tonight. If ORES is critical (~1k busy workers) it’ll restart itself. I’ll be manually monitoring throughout the day.
[19:45:05] I'm going to set the restart threshold at 500 workers, which based on past data is clearly when a tipping point has been reached and ORES needs a restart.
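
For readers of this log: the watchdog described in the last two messages is not shown here, so the following is only a minimal sketch of the idea, not the script that was actually deployed. It assumes the busy-worker count comes from a stats document shaped like the JSON that the uWSGI stats server emits (a `workers` list whose entries carry a `status` field). The names `count_busy_workers`, `should_restart`, and `RESTART_THRESHOLD` are hypothetical, and the restart action itself is deliberately left out.

```python
import json
from typing import Any

# Threshold taken from the discussion above (500 busy workers).
RESTART_THRESHOLD = 500


def count_busy_workers(stats: dict[str, Any]) -> int:
    """Count workers reported as 'busy' in a uWSGI-stats-style dict.

    Assumption: `stats` has a "workers" list and each entry has a
    "status" string, as in uWSGI stats server output.
    """
    return sum(1 for w in stats.get("workers", []) if w.get("status") == "busy")


def should_restart(stats: dict[str, Any], threshold: int = RESTART_THRESHOLD) -> bool:
    """Return True once the busy-worker count reaches the threshold."""
    return count_busy_workers(stats) >= threshold


if __name__ == "__main__":
    # Tiny hand-written sample, not real ORES data.
    sample = json.loads(
        '{"workers": [{"id": 1, "status": "busy"}, {"id": 2, "status": "idle"}]}'
    )
    print(count_busy_workers(sample))
    print(should_restart(sample, threshold=1))
```

Keeping the decision in a pure function means the threshold can be changed (for example from ~1k down to 500) and tested without touching whatever performs the restart.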