[01:37:11] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:41:11] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 338 bytes in 9.751 second response time
[01:49:11] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:53:02] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 339 bytes in 1.546 second response time
[11:48:52] Revision-Scoring-As-A-Service, DBA, Operations, Blocked-on-schema-change, and 3 others: Remove oresc_rev index - https://phabricator.wikimedia.org/T140803#2501457 (jcrespo) Open>Resolved a:jcrespo The schema change seems to have been successful: ``` $ mysql -h s5-master wikidatawiki -e...
[14:01:53] o/
[14:02:12] * halfak looks into issues with web-05 while he waits for others to join backlog grooming
[14:04:25] Hmm... Available memory got low, but not terribly.
[14:05:43] No real spikes in CPU until well after the minor downtime events took place.
[14:06:47] Woops. I forgot to account for UTC offset
[14:07:01] OK. Now it looks like there was a burst in requests.
[14:07:47] Most of the requests were cache hits
[14:08:36] Both -03 and -05 had roughly the same request rate
[14:08:54] schana, backlog grooming?
[14:08:58] Amir1?
[14:09:06] be right htere
[14:09:08] *there
[14:12:05] Revision-Scoring-As-A-Service-Backlog: Investigate web-05 downtime - https://phabricator.wikimedia.org/T141523#2501811 (Halfak)
[14:13:31] Revision-Scoring-As-A-Service-Backlog: Investigate web-05 downtime - https://phabricator.wikimedia.org/T141523#2501811 (Halfak) p:Triage>High
[14:13:41] Revision-Scoring-As-A-Service-Backlog, ORES: Investigate web-05 downtime - https://phabricator.wikimedia.org/T141523#2501811 (Halfak)
[14:15:58] Revision-Scoring-As-A-Service-Backlog, revscoring: Greek language utilities - https://phabricator.wikimedia.org/T122727#2501830 (Halfak) p:Triage>Lowest
[14:20:29] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Revision not found - Wikilabels - https://phabricator.wikimedia.org/T139587#2436811 (Halfak) This is expected behavior. It means that the revision to be labeled has been deleted (or is somehow otherwise not available). However, the `$1` should include...
[14:21:25] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Revision not found error unformatted and not localized - https://phabricator.wikimedia.org/T139587#2501837 (Halfak)
[14:22:04] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Revision not found error unformatted and not localized - https://phabricator.wikimedia.org/T139587#2436811 (Halfak) p:Triage>Normal
[14:23:03] Revision-Scoring-As-A-Service-Backlog, ORES, revscoring, Documentation: Add MacOS instructions for installation to README - https://phabricator.wikimedia.org/T139355#2501855 (Halfak) p:Triage>Normal
[14:23:30] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Make static asset wheels for Wikilabels - https://phabricator.wikimedia.org/T139959#2501857 (Halfak) p:Triage>Low
[14:25:06] Revision-Scoring-As-A-Service-Backlog, MediaWiki-API, MediaWiki-Database, Wikimedia-log-errors: mwapi.errors.APIError: internal_api_error_DBQueryError - https://phabricator.wikimedia.org/T121333#2501865 (Halfak) Open>Invalid
[14:25:50] Revision-Scoring-As-A-Service, MediaWiki-API, MediaWiki-Database, Wikimedia-log-errors: mwapi.errors.APIError: internal_api_error_DBQueryError - https://phabricator.wikimedia.org/T121333#1876009 (Halfak) Invalid>Open
[14:27:11] Revision-Scoring-As-A-Service, MediaWiki-API, MediaWiki-Database, Wikimedia-log-errors: mwapi.errors.APIError: internal_api_error_DBQueryError - https://phabricator.wikimedia.org/T121333#2501871 (Halfak) I just moved this out of the #revision-scoring-as-a-service backlog because it's not really a...
[14:28:42] Revision-Scoring-As-A-Service-Backlog, PAWS: Install revscoring inside PAWS - https://phabricator.wikimedia.org/T120317#2501872 (Halfak) We don't need to do this because we can use magic commands to install things through pip inside of a PAWS instance.
[14:28:49] Revision-Scoring-As-A-Service-Backlog, PAWS: Install revscoring inside PAWS - https://phabricator.wikimedia.org/T120317#2501873 (Halfak) Open>declined
[14:29:18] Revision-Scoring-As-A-Service, ORES, Wikilabels, Documentation: Document maintenance tasks (restart something, deploy new versions, revert, etc...) - https://phabricator.wikimedia.org/T106271#2501875 (Halfak)
[14:30:01] Revision-Scoring-As-A-Service, Wikilabels: Add wiki labels detail to deployment docs on wikitech - https://phabricator.wikimedia.org/T131768#2501878 (Halfak)
[14:30:03] Revision-Scoring-As-A-Service, ORES, Wikilabels, Documentation: Document maintenance tasks (restart something, deploy new versions, revert, etc...) - https://phabricator.wikimedia.org/T106271#1463432 (Halfak)
[14:30:55] Revision-Scoring-As-A-Service, ORES: Document ORES scap3 deploy process on wikitech - https://phabricator.wikimedia.org/T137570#2501883 (Halfak)
[14:30:57] Revision-Scoring-As-A-Service, ORES, Wikilabels, Documentation: Document maintenance tasks (restart something, deploy new versions, revert, etc...) - https://phabricator.wikimedia.org/T106271#1463432 (Halfak)
[14:32:44] Revision-Scoring-As-A-Service-Backlog, ORES, Wikilabels: ORES/Wikilabels deploy checklist - https://phabricator.wikimedia.org/T111828#2501903 (Halfak) @Ladsgroup, do we still need this since we did T106271 ?
[14:38:00] Revision-Scoring-As-A-Service-Backlog, ORES: Implement twemproxy for ORES in production - https://phabricator.wikimedia.org/T122676#2501912 (Halfak) @akosiaris, do you think we should still pursue something like this? We haven't had big stability problems with redis (that I'm aware of), but it could be...
[14:41:31] Revision-Scoring-As-A-Service-Backlog, ORES: Implement twemproxy for ORES in production - https://phabricator.wikimedia.org/T122676#2501914 (Halfak) p:Triage>Low
[14:42:31] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Implement single column diff option (from mobile app) - https://phabricator.wikimedia.org/T104072#2501917 (Halfak)
[14:42:33] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Make Wiki Labels mobile compatible - https://phabricator.wikimedia.org/T105518#2501916 (Halfak)
[14:43:34] Revision-Scoring-As-A-Service-Backlog, Wikilabels: Make Wiki Labels mobile compatible - https://phabricator.wikimedia.org/T105518#1445605 (Halfak) p:Triage>Lowest
[14:43:56] Revision-Scoring-As-A-Service-Backlog, Wikilabels: [Mock] Wiki labels support on a mobile device. - https://phabricator.wikimedia.org/T123339#2501919 (Halfak) p:Triage>Lowest
[14:45:02] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES: Integrate with mw-core ChangeTags - https://phabricator.wikimedia.org/T123871#2501921 (Halfak) Open>declined
[14:45:54] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES: Integrate with mw-core ChangeTags - https://phabricator.wikimedia.org/T123871#1939946 (Halfak) Declining this because it isn't well defined and it seems like we're likely not going to go in this direction since we now have the `orescores`...
[14:49:46] Revision-Scoring-As-A-Service-Backlog, revscoring: Add scaling, centering and balancing to `tune` utility - https://phabricator.wikimedia.org/T129252#2501941 (Halfak) Looks like we have implemented scaling of features. See https://github.com/wiki-ai/revscoring/blob/master/revscoring/utilities/tune.py#L1...
[14:50:44] Revision-Scoring-As-A-Service-Backlog, revscoring: Add scaling, centering and balancing to `tune` utility - https://phabricator.wikimedia.org/T129252#2501942 (Halfak) p:Triage>Low
[17:01:29] Revision-Scoring-As-A-Service-Backlog, ORES: Enable basic stats for uwsgi - https://phabricator.wikimedia.org/T141543#2502403 (Halfak)
[17:03:18] Revision-Scoring-As-A-Service-Backlog, ORES: Enable statsd uwsgi settings - https://phabricator.wikimedia.org/T141543#2502419 (Halfak)
[17:12:00] Revision-Scoring-As-A-Service, ORES: Investigate web-05 downtime - https://phabricator.wikimedia.org/T141523#2502439 (Halfak) a:Halfak
[17:12:37] Revision-Scoring-As-A-Service, ORES: Investigate web-05 downtime - https://phabricator.wikimedia.org/T141523#2501811 (Halfak) It looks like there was a spike in system CPU usage and a sudden drop in precache requests right before the downtime. See: https://graphite-labs.wikimedia.org/render/?width=586&...
[17:12:52] halfak: hey
[17:12:58] Hey Amir1
[17:13:05] I just woke up and realized I missed the backlog grooming
[17:13:17] No worries. The hours are weird for you.
[17:13:19] won't happen again, I forgot it's Thrusday
[17:13:26] *Thursday
[17:13:45] You can see what we did in IRC logs. I think I pinged you in a couple places.
[17:13:49] Or at least touched your cards.
[17:14:36] yeah, I'm about to answer those
[17:14:41] kk
[17:17:34] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, ORES, WMF-deploy-2016-07-26_(1.28.0-wmf.12), and 2 others: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368#2502501 (Ladsgroup)
[17:18:16] Revision-Scoring-As-A-Service, ORES: Update wmflabs deploy repo for new version of ORES - https://phabricator.wikimedia.org/T141377#2502505 (Halfak) This is now in staging. I'm running a test with precached.
[17:21:52] Amir1, FYI, I'm going to need to run Isla to the vet. She's sick this morning. I'll be running away to do that in ~2 hours.
[17:21:57] I should be able to come back afterwards.
[17:22:12] okay
[17:22:22] it's okay
[17:22:45] halfak: https://phabricator.wikimedia.org/T123871 regarding this. I think Adam wanted change tags to be added
[17:22:55] based on scores
[17:23:58] It looks like that isn't really relevant though, right?
[17:24:15] Now that we have the review tool and associated tables
[17:26:30] it can make getting queries in API easier
[17:26:42] we can do in the extension
[17:31:03] Revision-Scoring-As-A-Service-Backlog, ORES, Wikilabels: ORES/Wikilabels deploy checklist - https://phabricator.wikimedia.org/T111828#2502555 (Ladsgroup) Agreed
[17:31:05] Revision-Scoring-As-A-Service-Backlog, ORES: More robust/automated deployment of ORES - https://phabricator.wikimedia.org/T130410#2502557 (Ladsgroup)
[17:31:07] Revision-Scoring-As-A-Service-Backlog, ORES, Wikilabels: ORES/Wikilabels deploy checklist - https://phabricator.wikimedia.org/T111828#2502556 (Ladsgroup) Open>Invalid
[17:39:01] Amir1, do you know where the "99-main.yaml" file comes from on our wmflabs servers?
[17:39:18] probably puppet
[17:39:22] let me find it
[17:39:36] Thanks. I think we need to squash it. It doesn't seem to be serving a purpose.
[17:40:36] halfak: it's not there. probably it was and then got deprecated
[17:41:13] OK cool. I'll delete.
[17:42:36] Oh! Looks like it was only on web-05
[17:42:44] Rather, web-03
[17:43:05] At least now I can be sure it is irrelevant :)
[17:48:27] Hmm... Can't seem to log into ores-worker-07
[17:50:38] Yup. 07 is locked
[17:52:22] Rebooting
[18:08:46] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Include specific user groups in the trwiki edit quality model - https://phabricator.wikimedia.org/T140474#2502762 (Superyetkin) Is there anyone who can rebuild the model for trwiki?
[19:08:05] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: Include specific user groups in the trwiki edit quality model - https://phabricator.wikimedia.org/T140474#2502896 (Halfak) Yup. We're currently bogged down a bit by some operations work. But we have assigned this task and we should be able to pi...
[19:26:37] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, ORES, WMF-deploy-2016-07-26_(1.28.0-wmf.12), and 2 others: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368#2502968 (Ladsgroup) We usually have around 300-500 cases but today it was around 1K. I still...
[20:12:01] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES: Set up grafana dashboard for ORES extension - https://phabricator.wikimedia.org/T141169#2503102 (Ladsgroup) Hey @Krinkle, Thanks for the tips. One question. Why not using `sum` instead of `rate` and scaling it up?
[20:22:53] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES: Set up grafana dashboard for ORES extension - https://phabricator.wikimedia.org/T141169#2503117 (Krinkle) >>! In T141169#2503102, @Ladsgroup wrote: > Hey @Krinkle, > Thanks for the tips. One question. Why not using `sum` instead of `rate` and scal...
[20:45:13] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, ORES, WMF-deploy-2016-07-26_(1.28.0-wmf.12), and 2 others: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368#2503299 (Ladsgroup) Okay, By checking the logs it seems we had that spike because API didn't...
[20:49:32] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES: Set up grafana dashboard for ORES extension - https://phabricator.wikimedia.org/T141169#2503319 (Ladsgroup) Okay, Thank you! it was super helpful.
[21:40:54] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, ORES, WMF-deploy-2016-07-26_(1.28.0-wmf.12), and 2 others: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368#2503504 (Ladsgroup) Here's some facts: * ORES celery queue is 100 revisions once it gets mo...
[22:06:15] Revision-Scoring-As-A-Service, ORES, Puppet: Puppet config changes for ORES refactor - https://phabricator.wikimedia.org/T141575#2503579 (Ladsgroup)
[22:14:54] Revision-Scoring-As-A-Service, ORES, Puppet: Puppet config changes for ORES refactor - https://phabricator.wikimedia.org/T141575#2503579 (Pchelolo) > Change method of precaching to several models at a time. Please do this in the backwards compatible way and ping me when it's done, so that I could upd...
[22:24:42] Revision-Scoring-As-A-Service, ORES, Puppet: Puppet config changes for ORES refactor - https://phabricator.wikimedia.org/T141575#2503730 (Ladsgroup) >>! In T141575#2503622, @Pchelolo wrote: >> Change method of precaching to several models at a time. > Please do this in the backwards compatible way an...
[23:08:54] Back!
[23:22:53] halfak: I made some progress about this: https://phabricator.wikimedia.org/T141368
[23:27:51] Nice.
[23:28:06] I still haven't finished the deploy of new code to wmflabs.
[23:28:11] I'd like to do that tonight though.
[23:28:36] nice. I will monitor it