[10:34:40] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:39:33] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 706 bytes in 5.600 second response time
[12:03:22] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES: Introduce ORES rvprop - https://phabricator.wikimedia.org/T143614#2665385 (Ladsgroup) >>! In T143614#2664820, @Anomie wrote: > Although I note FetchScoreJob only does one revision at a time. We can work on it and make it accept more than...
[13:09:44] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:11:30] What is going on here?
[13:12:09] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 707 bytes in 0.570 second response time
[13:13:28] https://grafana.wikimedia.org/dashboard/db/ores
[13:13:34] Time out errors are going up
[13:13:57] also same for response time
[13:37:48] (CR) Ladsgroup: [C: 2] Only make hidenondamaging available if damaging is enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/310475 (owner: Catrope)
[13:38:38] (Merged) jenkins-bot: Only make hidenondamaging available if damaging is enabled [extensions/ORES] - https://gerrit.wikimedia.org/r/310475 (owner: Catrope)
[13:38:58] (CR) Ladsgroup: [C: -1] "You deleted most of conditions there. Lots of them were vital to keep the system working. Can you explain why?" [extensions/ORES] - https://gerrit.wikimedia.org/r/311652 (owner: Catrope)
[14:13:59] Revision-Scoring-As-A-Service, ORES, User-Ladsgroup: celery log level is INFO causing disruption on ORES service - https://phabricator.wikimedia.org/T146581#2665475 (Ladsgroup)
[15:06:02] Revision-Scoring-As-A-Service, ORES, User-Ladsgroup: celery log level is INFO causing disruption on ORES service - https://phabricator.wikimedia.org/T146581#2665544 (Ladsgroup)
[15:20:43] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:22:04] NO
[15:22:07] What?
[15:24:26] Revision-Scoring-As-A-Service, ORES: Investigate memory leak in precached - https://phabricator.wikimedia.org/T146500#2665556 (Halfak) Happened again just a moment ago. It looks like the same story with precached.
[15:25:24] OK. Looks like 03 is back online.
[15:25:34] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.628 second response time
[15:25:45] And that we have a memory leak in the precached utility
[15:25:50] Why didn't this show up before?
[15:27:23] Revision-Scoring-As-A-Service, ORES, Patch-For-Review, User-Ladsgroup: celery log level is INFO causing disruption on ORES service - https://phabricator.wikimedia.org/T146581#2665561 (Ladsgroup) I'm guessing that's the reason behind the flapping in labs too. I get that fixed asap
[15:29:06] Revision-Scoring-As-A-Service, ORES, Patch-For-Review, User-Ladsgroup: celery log level is INFO causing disruption on ORES service - https://phabricator.wikimedia.org/T146581#2665563 (Ladsgroup) Some stuff: {P4114}
[15:35:56] halfak
[15:36:00] o
[15:36:02] \
[15:48:35] Revision-Scoring-As-A-Service, ORES, Patch-For-Review, User-Ladsgroup: celery log level is INFO causing disruption on ORES service - https://phabricator.wikimedia.org/T146581#2665567 (Ladsgroup) Also in labs: https://github.com/wiki-ai/ores-wmflabs-deploy/pull/69
[16:39:59] Revision-Scoring-As-A-Service, ORES, Patch-For-Review, User-Ladsgroup: celery log level is INFO causing disruption on ORES service - https://phabricator.wikimedia.org/T146581#2665606 (Ladsgroup) Failure ratio in the last six hours: {F4523175}
[20:18:57] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, User-Ladsgroup: Embed machine readable ores scores as data on pages where ORES scores things - https://phabricator.wikimedia.org/T143611#2665760 (Ladsgroup) Okay, I checked and it's super easy to add it but it will conflicts with Roan patches...
[23:53:50] Revision-Scoring-As-A-Service, Edit-Review-Improvements, rsaas-editquality, Collab-Team-Q1-July-Sep-2016: Research how to present ORES scores to users in a way that is understandable and meets their reviewing goals - https://phabricator.wikimedia.org/T146333#2665906 (jmatazzoni) **Use design to e...