[09:32:38] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES: Dashboard or pane for ORES failed jobs on beta - https://phabricator.wikimedia.org/T142119#2530286 (Ladsgroup) I made this: https://grafana-labs-admin.wikimedia.org/dashboard/db/ores-extension-beta-cluster It seem...
[09:34:19] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES: Dashboard or pane for ORES failed jobs on beta - https://phabricator.wikimedia.org/T142119#2530302 (Ladsgroup) (The exact same query doesn't work in grafana.wikimedia..org even though I selected labs-graphite as so...
[09:44:20] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES: Dashboard or pane for ORES service in beta - https://phabricator.wikimedia.org/T142294#2530311 (Ladsgroup)
[12:32:32] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES: Dashboard or pane for ORES failed jobs on beta - https://phabricator.wikimedia.org/T142119#2530399 (Ladsgroup) Okay, ORES beta is up and working. This puppet change (which got cherry-picked) solved this issue. http...
[12:49:10] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, ORES, WMF-deploy-2016-07-26_(1.28.0-wmf.12), and 2 others: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368#2530405 (Ladsgroup) Steps to fix this: - Increase number of workers. We increased to 48 (fr...
[13:19:06] Revision-Scoring-As-A-Service, ORES, Operations, Puppet: Clean up puppet & configs for ORES - https://phabricator.wikimedia.org/T142002#2519474 (Ladsgroup) [[https://wikitech.wikimedia.org/w/index.php?title=Hiera:Ores&diff=816527&oldid=816487 | This edit]] and the similar one for ores-staging wil...
[13:56:04] (PS2) Ladsgroup: Jobs fail instead of throwing error when score is not right [extensions/ORES] - https://gerrit.wikimedia.org/r/302703 (https://phabricator.wikimedia.org/T141978)
[14:02:56] (PS3) Ladsgroup: Jobs fail instead of throwing error when score is not right [extensions/ORES] - https://gerrit.wikimedia.org/r/302703 (https://phabricator.wikimedia.org/T141978)
[14:13:58] (CR) Ladsgroup: [C: 1] "Tested it in mw-revscoring.wmflabs.org. Works like charm" [extensions/ORES] - https://gerrit.wikimedia.org/r/302703 (https://phabricator.wikimedia.org/T141978) (owner: Ladsgroup)
[18:01:46] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES: Dashboard or pane for ORES failed jobs on beta - https://phabricator.wikimedia.org/T142119#2530528 (Ladsgroup)
[19:15:07] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 3388 bytes in 6.094 second response time
[19:15:27] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:16:58] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:36:23] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 384 bytes in 1.101 second response time
[19:36:43] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 384 bytes in 0.632 second response time
[19:41:03] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 369 bytes in 0.584 second response time
[19:42:05] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES: Dashboard or pane for ORES failed jobs on beta - https://phabricator.wikimedia.org/T142119#2530681 (Ladsgroup) Thanks to {T141891} getting solved, we have this in https://grafana.wikimedia.org/dashboard/db/ores-ext...
[20:09:05] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, ORES: Dashboard or pane for ORES service in beta - https://phabricator.wikimedia.org/T142294#2530687 (Ladsgroup) https://grafana.wikimedia.org/dashboard/db/ores-beta-cluster You might notice that some of them are empty. It's because the set...
[20:15:12] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:17:12] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 370 bytes in 0.653 second response time
[21:54:33] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:56:23] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 384 bytes in 0.660 second response time