[04:15:53] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:19:27] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:24:14] ARG
[04:24:16] WHY
[04:24:18] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.628 second response time
[04:24:58] Hmm... looks like I can't even get the homepage to load
[04:25:27] https://ores.wmflabs.org/node/ores-web-05/
[04:25:29] Is up
[04:25:34] but https://ores.wmflabs.org/node/ores-web-03/ is down
[04:25:44] I can ssh to ores-web-03
[04:26:07] On ores-web-03, there's one big python process.
[04:26:28] Top line: 3338 www-data 20 0 4020768 3.123g 5600 S 7.3 80.8 1296:19 python
[04:26:32] 80% of memory!
[04:26:49] It hovers around 4-8% cpu
[04:28:05] o/ Amir1
[04:28:15] halfak: hey
[04:28:26] it's morning here, why are you awake? :D
[04:28:34] Been looking into the icinga notification.
[04:28:39] Will get you a paste of my notes shortly.
[04:28:50] TL;DR: ores-web-03 got into a weird state
[04:29:49] Service restart did nothing. Executed without error.
[04:30:11] okay, we should look into why our instances suddenly go crazy
[04:30:24] This one is really weird.
[04:30:38] uwsgi seems to have died and been replaced with a python process.
[04:30:43] Usually the "command" is uwsgi
[04:30:57] https://ores.wmflabs.org/node/ores-web-03/ is back online
[04:31:11] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.860 second response time
[04:31:36] And we're back!
[04:31:38] You're faster than icinga
[04:32:04] http://pastebin.ca/3714973
[04:32:08] Amir1, ^ FYI
[04:32:15] looking into it
[04:33:06] halfak: have you tried restarting it?
[04:33:50] I didn't restart the machine.
[04:34:10] restarting the daemon
[04:34:11] Executing "sudo service uwsgi-ores restart" executed without error and did nothing.
[04:34:21] okay
[04:34:41] Per my email, it seems we've got a worker down too
[04:35:03] Yeah. Eventually, I got it by "sudo kill " for the single big python process and then "sudo service uwsgi-ores restart"
[04:35:15] Yikes
[04:36:01] Workers look OK to me
[04:36:02] https://grafana-labs-admin.wikimedia.org/dashboard/db/ores-labs
[04:36:14] Rather https://grafana-labs.wikimedia.org/dashboard/db/ores-labs
[04:36:16] Maybe we have an uninvited guest
[04:37:09] (my bad, icinga told me that ores worker is down but it was actually the web not letting the worker test to be done)
[04:37:37] Oh yeah. We should rename that test :)
[04:37:51] OK. I'm going to bed now.
[04:37:51] Revision-Scoring-As-A-Service, Data-release, Research-and-Data, rsaas-articlequality, Research-and-Data-2017-Q1: Publish article quality score dataset - https://phabricator.wikimedia.org/T145332#2627364 (Ladsgroup) Oh, Thanks :)
[04:37:58] I filed a task to look into it.
[04:38:06] Revision-Scoring-As-A-Service: Investigate show period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145352#2627365 (Halfak)
[04:38:08] Revision-Scoring-As-A-Service: Investigate show period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145353#2627378 (Halfak)
[04:38:18] Revision-Scoring-As-A-Service: Investigate short period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145353#2627391 (Halfak)
[04:38:33] Awesome, incident report?
[04:38:38] I guess so
[04:40:42] Yeah. Probably. It's worth looking into.
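The 04:26 top line is what gave the problem away: the process holding 80.8% of memory was named python, not uwsgi. A minimal Python sketch of that check, assuming top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND). The helper names and the 50% memory threshold are hypothetical illustrations, not part of any ORES tooling.

```python
from dataclasses import dataclass


@dataclass
class TopRow:
    pid: int
    user: str
    cpu_pct: float
    mem_pct: float
    command: str


def parse_top_row(line: str) -> TopRow:
    """Parse one process row of default `top` output.

    Assumes the default columns:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    fields = line.split()
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    return TopRow(
        pid=int(fields[0]),
        user=fields[1],
        cpu_pct=float(fields[8]),
        mem_pct=float(fields[9]),
        command=" ".join(fields[11:]),  # COMMAND is the last column
    )


def looks_stuck(row: TopRow, expected_command: str = "uwsgi",
                mem_threshold: float = 50.0) -> bool:
    """Flag a row that uses a lot of memory under an unexpected command name.

    The 50% threshold is a hypothetical default, not an ORES setting.
    """
    return row.mem_pct >= mem_threshold and row.command != expected_command


row = parse_top_row("3338 www-data 20 0 4020768 3.123g 5600 S 7.3 80.8 1296:19 python")
print(row.pid, row.mem_pct, row.command, looks_stuck(row))
```

The recovery that actually worked in the log was manual: `sudo kill` on that PID, then `sudo service uwsgi-ores restart`.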
[04:40:43] o/
[04:41:50] o/ halfak
[04:47:20] Revision-Scoring-As-A-Service: Investigate show period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145352#2627404 (Ladsgroup) Open>Invalid
[04:47:44] Revision-Scoring-As-A-Service: Investigate show period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145352#2627365 (Ladsgroup) See {T145353}
[04:48:05] Revision-Scoring-As-A-Service, ORES: Investigate short period of ores-web-03 insanity - https://phabricator.wikimedia.org/T145353#2627378 (Ladsgroup)
[05:30:25] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Review schema changes for T143962 - https://phabricator.wikimedia.org/T145356#2627448 (Ladsgroup)
[07:08:20] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Review schema changes for T143962 - https://phabricator.wikimedia.org/T145356#2627576 (jcrespo) Amir, I am not sure what you want #DBAs to do here, as I do not have a patch to apply or review. In any case, this doesn't seem like a #blocke...
[07:12:34] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Review schema changes for T143962 - https://phabricator.wikimedia.org/T145356#2627581 (Ladsgroup) I'm trying to follow https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change. This is the article 3: "Once the soluti...
[08:28:27] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Review schema changes for T143962 - https://phabricator.wikimedia.org/T145356#2627682 (jcrespo) Ladsgroup, this is reverted https://gerrit.wikimedia.org/r/#/c/309818/ . Even if you did that, it is not "merged to HEAD", as far as I can see...
[08:29:19] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Check ORES data corruption on Beta do not affect production - https://phabricator.wikimedia.org/T145356#2627683 (jcrespo)
[08:41:30] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Check ORES data corruption on Beta do not affect production - https://phabricator.wikimedia.org/T145356#2627701 (Ladsgroup) And that revert got reverted later on: https://gerrit.wikimedia.org/r/#/c/309837/ So it's in HEAD now. We should w...
[08:47:55] Revision-Scoring-As-A-Service, DBA, MediaWiki-extensions-ORES: Check ORES data corruption on Beta do not affect production - https://phabricator.wikimedia.org/T145356#2627707 (jcrespo) a:jcrespo @Ladsgroup, allow me to do that for you properly, so we do not affect the performance of the other que...
[13:04:07] (CR) Thiemo Mättig (WMDE): [C: 1] Drop unique part from oresm_model index [extensions/ORES] - https://gerrit.wikimedia.org/r/309825 (https://phabricator.wikimedia.org/T144432) (owner: Ladsgroup)
[13:09:05] (CR) Thiemo Mättig (WMDE): [C: 1] Move storeScores stuff into another method (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/309824 (owner: Ladsgroup)
[14:01:11] Revision-Scoring-As-A-Service: Enable ORES at es.wikibooks - https://phabricator.wikimedia.org/T145394#2628616 (MarcoAurelio)
[14:03:07] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy ORES review tool in es.wikibooks - https://phabricator.wikimedia.org/T145394#2628634 (Halfak)
[14:03:33] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2628638 (Halfak)
[14:06:31] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2628659 (Halfak)
[14:06:33] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy ORES review tool in es.wikibooks - https://phabricator.wikimedia.org/T145394#2628658 (Halfak)
[14:08:50] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2628661 (Halfak)
[15:06:43] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2628862 (Halfak)
[15:29:00] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2628977 (Halfak) The campaign is live, but you can't reach it from https://es.wikibooks.org/wiki/Wikilibros:Etiquetando because of some CORS issues.
[15:30:06] Revision-Scoring-As-A-Service-Backlog, Wikilabels, rsaas-editquality: Fix CORS for wikibooks - https://phabricator.wikimedia.org/T145406#2628980 (Halfak)
[15:30:20] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality: Fix CORS for wikibooks - https://phabricator.wikimedia.org/T145406#2628980 (Halfak) a:Halfak
[15:39:32] Amir1, if you can take a look at https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/27
[15:39:40] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality: Fix CORS for wikibooks - https://phabricator.wikimedia.org/T145406#2629032 (Halfak) https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/27
[15:40:07] halfak: {{merged}}
[15:40:08] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2629036 (Halfak)
[15:40:14] Awesome
[16:00:29] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2629117 (Halfak) Just finished a deploy and it all now works.
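The 15:29 report says the campaign could not be reached from es.wikibooks.org "because of some CORS issues". The log does not show what PR #27 changed, so the following is only a hypothetical Python sketch of one common shape for such a fix: the server echoes the request's Origin header back only when it matches an allowlist, and a list that covered wikipedia.org but not wikibooks.org would produce exactly this failure. The pattern and function name are assumptions, not the Wikilabels configuration.

```python
import re
from typing import Dict, Optional

# Hypothetical allowlist; the real Wikilabels deploy config may differ.
ALLOWED_ORIGIN = re.compile(r"https://[a-z0-9-]+\.(?:wikipedia|wikibooks)\.org")


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers if `origin` is allowed, else an empty dict."""
    # fullmatch anchors both ends, so "https://es.wikibooks.org.evil.example" is rejected.
    if origin is not None and ALLOWED_ORIGIN.fullmatch(origin):
        return {
            "Access-Control-Allow-Origin": origin,  # echo the specific allowed origin
            "Vary": "Origin",  # response depends on Origin, so caches must key on it
        }
    return {}


print(cors_headers("https://es.wikibooks.org"))
print(cors_headers("https://es.wikibooks.org.evil.example"))
```

Echoing the matched origin rather than sending `*` keeps the allowlist meaningful; the `Vary: Origin` header stops a shared cache from serving one wiki's CORS response to another.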
[16:02:57] Revision-Scoring-As-A-Service, Wikilabels, rsaas-editquality: Edit quality campaign for eswikibooks - https://phabricator.wikimedia.org/T145395#2628638 (Halfak) a:Halfak
[16:03:29] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy ORES review tool in es.wikibooks - https://phabricator.wikimedia.org/T145394#2629125 (Halfak) a:Halfak
[16:03:37] Revision-Scoring-As-A-Service, rsaas-editquality: Deploy ORES review tool in es.wikibooks - https://phabricator.wikimedia.org/T145394#2628616 (Halfak) a:Halfak>None
[16:04:56] Revision-Scoring-As-A-Service, rsaas-editquality: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#2629130 (Halfak)
[16:05:40] Revision-Scoring-As-A-Service, rsaas-editquality: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#2629130 (Halfak) @MarcoAurelio, can you announce the labeling campaign to eswikibooks so that we can get people working on it?
[16:42:37] Revision-Scoring-As-A-Service-Backlog, Data-release, Research-and-Data, rsaas-articlequality, Research-and-Data-2017-Q1: Publish article quality score dataset - https://phabricator.wikimedia.org/T145332#2629409 (Halfak)
[16:42:50] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#2629410 (Halfak)
[16:55:48] halfak: checking our done column?
[16:55:48] :D
[16:56:09] we were way over time, but I'm okay.
[16:56:54] I've confirmed that there are no more cards for me to pull over right now.
[16:57:14] okay
[16:59:27] I need to go
[16:59:29] o/
[17:54:09] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test reverted model for Spanish Wikibooks - https://phabricator.wikimedia.org/T145428#2629749 (Halfak)
[17:55:07] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test reverted model for Spanish Wikibooks - https://phabricator.wikimedia.org/T145428#2629768 (Halfak) https://github.com/wiki-ai/editquality/pull/48
[17:58:36] Revision-Scoring-As-A-Service, Wikilabels: Fix CORS for wikibooks - https://phabricator.wikimedia.org/T145406#2629773 (Halfak)
[21:07:34] Amir1: What's involved with making goodfaith scores available in Extension:ORES?
[21:09:21] RoanKattouw: Two parts. 1- The db population
[21:09:27] 2- the interface
[21:09:31] (I'm talking about https://phabricator.wikimedia.org/T137966 )
[21:09:36] the former is super easy
[21:09:47] yeah
[21:10:01] I'm trying to discover why the DB isn't being populated already
[21:10:07] the latter requires some design (a mock would be fine for me)
[21:10:16] RoanKattouw: it's in the settings
[21:10:23] let me find it
[21:10:26] Oh in prod?
[21:10:34] yup
[21:11:05] $wgOresModelClasses['goodfaith'] is set to false
[21:11:24] Changing it only requires a SWAT window :D
[21:11:33] and a patch in wmf-config
[21:12:18] aha
[21:13:10] OK, so if we do that then we could at least do filtering by goodfaith score, even if we don't have a UI for indicating whether something is goodfaith
[21:13:52] that needs to be built too, the damaging model is hard-coded there
[21:14:02] but not a big deal I guess
[21:15:05] https://github.com/wikimedia/mediawiki-extensions-ORES/blob/master/includes/Hooks.php#L84
[21:21:22] Yeah I noticed that hidenondamaging was special-cased
[21:21:30] s/special-cased/hard-coded/
[21:21:47] But yeah for now I can just hard-code goodfaith too; we're gonna do a UI overhaul there soon anyway
[21:22:14] OK this sounds pretty straightforward then
[21:22:21] Thanks Amir1
[21:22:34] unless there are pitfalls / weird things I should know about?
[21:22:54] RoanKattouw: Not right now
[21:23:04] I will let you know if I see something
[21:23:09] Cool, then I'll take a stab at it today or tomorrow
[21:23:12] please add me as reviewer to the patches
[21:23:16] Yes, will do
[21:23:33] thanks
[21:25:59] Revision-Scoring-As-A-Service-Backlog, Edit-Review-Improvements, MediaWiki-extensions-ORES, Collab-Team-Q1-July-Sep-2016: Include goodfaith model information in ORES review tool - https://phabricator.wikimedia.org/T137966#2630830 (Catrope) a:Catrope
[23:45:46] Revision-Scoring-As-A-Service, MediaWiki-API, MediaWiki-extensions-ORES, Epic: [Epic] Implement ORES service proxy in api.php - https://phabricator.wikimedia.org/T143895#2631380 (Tgr) Getting back to the topic of backfilling, did you investigate the possibility of populating `ores_classification`...
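At 21:13 to 21:21 the plan is to filter by goodfaith score the way hidenondamaging filters by damaging score, which is currently hard-coded to the damaging model. The following is a hypothetical Python sketch of the difference between a hard-coded model and a parameterized one. It is not the ORES extension's PHP code. The data shape, function name, and thresholds are illustrative assumptions.

```python
from typing import List, Mapping

# Hypothetical stand-in for stored ORES scores:
# rev_id -> {model name -> probability of that model's "true" class}
Scores = Mapping[int, Mapping[str, float]]


def filter_by_model(scores: Scores, model: str, threshold: float,
                    keep_above: bool = True) -> List[int]:
    """Return sorted rev_ids whose `model` score lies on the requested side of `threshold`.

    Revisions with no stored score for `model` are skipped. That is what a model
    looks like while its scores are not being populated (e.g. goodfaith while it
    is switched off in config): the filter simply finds nothing.
    """
    kept = []
    for rev_id, by_model in scores.items():
        p = by_model.get(model)
        if p is None:
            continue  # no stored score for this model
        if (p >= threshold) if keep_above else (p < threshold):
            kept.append(rev_id)
    return sorted(kept)


scores = {
    1: {"damaging": 0.9, "goodfaith": 0.2},
    2: {"damaging": 0.1, "goodfaith": 0.95},
    3: {"damaging": 0.6},  # no goodfaith score stored
}
print(filter_by_model(scores, "damaging", 0.5))
print(filter_by_model(scores, "goodfaith", 0.5, keep_above=False))
```

The direction flag matters because the two models point opposite ways: a high damaging probability is the interesting case, while for goodfaith the interesting edits are the ones with a low probability.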