[06:01:55] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.273 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:06:39] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 512 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/ORES
[08:54:48] MediaWiki-extensions-ORES, ORES, Scoring-platform-team, Documentation: Elaborate documentation on how to deploy ORES to a new wiki - https://phabricator.wikimedia.org/T182054 (kostajh) The documentation that @Catrope wrote was for RCFilters configuration, and can be found here: https://www.mediaw...
[13:01:48] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 977 bytes in 0.442 second response time https://wikitech.wikimedia.org/wiki/ORES
[13:05:58] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 512 bytes in 0.029 second response time https://wikitech.wikimedia.org/wiki/ORES
[13:32:48] o/
[13:33:14] forgot about a follow-up review for a journal paper that I'm on the hook for. Diving into that right away this morning and then I'll be back to working on ores-worker-04
[14:32:17] ...and done.
[14:32:38] Now onto figuring out why ores-worker-04 is running out of memory while the other workers are not.
[14:39:13] Scoring-platform-team, VPS-project-icinga2, User-Zppix: Have ORES-web[01,02] been removed? - https://phabricator.wikimedia.org/T232921 (Halfak) Yes. Thank you.
[14:54:17] Scoring-platform-team, editquality-modeling, User-Tgr, artificial-intelligence: Create ORES dataset for huwiki edits in the last two years or so - https://phabricator.wikimedia.org/T223900 (Halfak) Oh good question. You'd need historical scores for it too, right? I wonder if @Groceryheist alrea...
[14:57:20] halfak: I'll add the other ores web sometime today (hopefully)
[14:59:49] Scoring-platform-team, editquality-modeling, User-Tgr, artificial-intelligence: Create ORES dataset for huwiki edits in the last two years or so - https://phabricator.wikimedia.org/T223900 (Tgr) >>! In T223900#5496096, @Halfak wrote: > You'd need historical scores for it too, right? In general t...
[15:02:48] Scoring-platform-team, editquality-modeling, User-Tgr, artificial-intelligence: Create ORES dataset for huwiki edits in the last two years or so - https://phabricator.wikimedia.org/T223900 (Halfak) Got it. I'll let @Groceryheist respond because he's probably already done this digging, but if he...
[15:54:01] thanks Zppix :)
[15:56:48] ORES, Scoring-platform-team (Current), Patch-For-Review, Puppet: Require git-lfs in ORES hosts - https://phabricator.wikimedia.org/T232494 (Halfak)
[15:57:08] ORES, Scoring-platform-team (Current), Patch-For-Review, Puppet: Require git-lfs in ORES hosts - https://phabricator.wikimedia.org/T232494 (Halfak) I just pushed another patchset that adds a new role for "ores::misc" (e.g. ores-misc-01 where we build models) that includes git::lfs. I also added...
[16:38:07] akosiaris, can you take a look at https://phabricator.wikimedia.org/T230917 ? I can't seem to get a response from any other ops folks to figure out if my proposal is crazy or not. :)
[16:46:02] Jade, Scoring-platform-team (Current): Design Jade entity UI - https://phabricator.wikimedia.org/T212370 (Halfak) a:Halfak
[16:46:15] ORES, Scoring-platform-team (Current): Monitor ORES celery worker status - https://phabricator.wikimedia.org/T230931 (Halfak) a:Halfak
[16:47:05] Scoring-platform-team (Current): Build tool to guess what tool was used to make reverts on Wikimedia wikis - https://phabricator.wikimedia.org/T226426 (Halfak) Open→Resolved
[18:01:28] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 3.473 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:05:28] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 512 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:42:02] Ha. Now all of our celery workers have stopped processing jobs on labs for some reason. They are
[18:42:06] still alive
[18:42:10] just not doing anything
[18:44:24] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 8.938 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:48:09] Yeah. So, celery runs fine on all workers except worker-04. On that machine, celery just slowly eats up memory until it OOMs
[18:48:27] unrelated to uwsgi or redis problems. A new out-of-left-field problem appears.
[18:49:50] I wonder if it is starting too many workers
[18:49:54] For some weird reason
[18:51:32] Aha! I looked at it sideways, restarted the service again and now it's not running out of memory.
[18:51:38] It has over 30% remaining!
[18:51:46] WHAT
[19:17:26] Scoring-platform-team, revscoring, artificial-intelligence: Implement interpret-based scoring model - https://phabricator.wikimedia.org/T233045 (Halfak)
[19:43:44] Scoring-platform-team, revscoring, artificial-intelligence: Implement interpret-based scoring model - https://phabricator.wikimedia.org/T233045 (Halfak) There's a general additive model in there that we should start experimenting with. Some MSFT folks have already taken a pass at this.
[20:44:22] So. I don't want to jinx it, but I think ORES in labs is looking pretty good.
[20:44:32] * halfak waits for the icinga hammer to hit
[20:50:07] wikimedia/revscoring#1716 (dump_cache_fields - 9562df6 : Aaron Halfaker): The build passed. https://travis-ci.org/wikimedia/revscoring/builds/585774312
[21:13:38] Jade, Scoring-platform-team (Current), Documentation, Patch-For-Review: Implement a MW API module for interacting with Jade entities - https://phabricator.wikimedia.org/T199834 (Halfak) I updated P8830 to include warnings for when the action is a no-op.
[21:13:56] accraze, ^ note that I updated that old paste I made of the API actions.
[21:14:10] It now includes the warnings for noops
[21:16:11] halfak: icinga2 should now be monitoring all the ores-web-*.wmflabs hosts
[21:16:42] thanks halfak, was actually just working on warnings and was wondering about noop :)
[21:16:50] \o/
[21:16:52] nice timing
[21:16:54] halfak: I'm going to leave the test open for a couple more hours in case anything happens
[21:17:03] s/test/task
[21:17:18] Let me know if this is weird. I made the "code" consistent, but the message different depending on the task. Some have placeholders for data/names.
[21:17:28] Zppix, thank you for getting this together
[21:17:37] Icinga has been mysteriously silent for a little while
[21:17:40] * halfak knocks on wood.
[21:17:41] halfak: no problem always willing to make life easier
[21:18:02] halfak: I'm on the icinga2 dashboard rn everything looks at least somewhat not on fire
[21:18:02] * halfak is actually seeing his to-do list go down for the first time in a while.
[21:18:09] :D