[00:23:24] Hi, we are seeing a lot of ORES failures: https://ores.wikimedia.org/v3/scores/enwiki/?models=damaging&revids=964118401&format=json
[00:23:57] There seems to be an internal Redis failure. It started about 10 hours ago.
[00:24:27] Only a small number of our requests were failing, but after restarting our servers the failure rate increased to 100%. I suspect only new connections are failing.
[00:29:33] Yeah, I have been seeing it. I am still trying to figure it out.
[00:30:05] Ok good to know, thanks!
[00:51:56] Digging through the error logs, it looks like a Redis connection problem: "Connection to Redis lost: Retry (11/20) in 1.00 second."
[00:52:57] https://www.irccloud.com/pastebin/R1tn0ylq/
[01:05:09] hey, if any help on the ORES issue is needed from SRE/Traffic, feel free to ping me
[01:09:09] (I'm guessing it's chrisalbon looking at it?)
[08:34:37] ORES, Machine Learning Platform, Operations, serviceops: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (jijiki)
[12:51:57] ORES, Machine Learning Platform, Operations, serviceops: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (CDanis) "busy web workers" graph, which correlates quite well with the slowdown: https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?viewPanel=13&orgId=1&f...
[18:06:39] ORES, Machine Learning Platform, Operations, serviceops: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (calbon) every time it starts at 16:00
[18:10:35] ORES, Machine Learning Platform, Operations, serviceops: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (calbon) {F32364643}
[19:41:57] ORES, Machine Learning Platform, Operations, serviceops: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (calbon) It started happening again. I went into each Ores200X box and manually ran `sudo service uwsgi-ores restart` to restart it. {F32364681}