[03:23:52] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, Hindi-Sites, artificial-intelligence: Add language support for Hindi - https://phabricator.wikimedia.org/T173122#3536645 (awight) Admins are interested in opening the discussion, and would like to see a demo of what ORES c...
[03:45:28] Scoring-platform-team, Gerrit, ORES, Operations, and 2 others: Simplify git-fat support for pulling from both production and labs - https://phabricator.wikimedia.org/T171758#3536646 (awight) /me likes @demon's post. Awesome, let's stay in coordination about how we might be able to help with this...
[04:17:29] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, Hindi-Sites, artificial-intelligence: Add language support for Hindi - https://phabricator.wikimedia.org/T173122#3519025 (YmKavishwar) >>! In T173122#3536645, @awight wrote: > Admins are interested in opening the discussio...
[04:42:00] Scoring-platform-team-Backlog, Bad-Words-Detection-System, revscoring, Hindi-Sites, artificial-intelligence: Add language support for Hindi - https://phabricator.wikimedia.org/T173122#3536661 (hindustanilanguage) Discussion moved to Community Village pump: https://hi.wikipedia.org/wiki/विकिपी...
[08:44:38] Amir1: minor typo in commit message for https://gerrit.wikimedia.org/r/#/c/369915/. "defualt"
[08:45:14] akosiaris: I will never learn how to spell it :D
[08:45:20] fixing it right now, thanks
[08:45:20] :-)
[08:59:41] akosiaris: done
[09:04:59] Scoring-platform-team, ORES, Operations, Patch-For-Review, User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3536827 (akosiaris) Changed above merged, I did a puppet test ran on both production (a noop as expected) and a stresstest node (120 CELERYD_...
[13:57:27] o/ akosiaris
[13:57:41] Will you be around at 1500 UTC to do a quick stress test with me?
[14:03:41] halfak: add 30 mins and I will be
[14:03:53] I got a meeting at 15:00 UTC
[14:04:17] Hmm. That could work.
[14:06:29] Hey guys, has anybody tried running neural network prediction inside a hosted tool?
[14:06:41] Amir1, ^
[14:07:12] brij: I did :)
[14:07:18] Yay :)
[14:07:34] So I hosted a Flask service here
[14:07:43] https://tools.wmflabs.org/proneval-gsoc17/
[14:08:04] which accepts features as a POST request and runs the prediction code
[14:08:15] models are trained using Keras
[14:08:25] Scoring-platform-team, Wikilabels, translatewiki.net, I18n, and 3 others: Wiki-ai-wikilabels-form-dagf-damaging-label and Wiki-ai-wikilabels-form-dagf-goodfaith-label appear as empty in translatewiki.net - https://phabricator.wikimedia.org/T172180#3537868 (Nikerabbit) I'm hoping to get +1 on http...
[14:08:49] But the code takes too long without any response
[14:09:06] @Amir1: Can you please help
[14:10:01] I didn't use Keras or TF in labs and I don't recommend it, as it's resource-consuming without a proper optimizer installed (jvm optimizer)
[14:10:56] OK, did your predictions take a long time too?
[14:12:40] brij: hmm, yeah
[14:13:33] what I did was build the model first and then make a service that people would query, and it just answered with predictions based on the trained model
[14:14:04] the training took so long (for me it was 24 hours) but predicting based on the trained model took less than a second
[14:15:48] ok
[14:16:13] you used Python for prediction?
[14:16:30] I am doing the same
[14:16:43] But I am using Keras and Python for prediction
[14:17:18] Not sure why it takes so long
[14:19:55] brij: what's the time it takes locally?
[14:20:46] max 2 seconds
[14:21:10] when called from a different web service
[14:21:36] brij: diff web service as in?
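Amir1's approach at 14:13 and 14:14 (train once offline, then have the service only run predictions against the already-trained model) can be sketched as "load once per process, predict per request". This is a minimal stdlib-only sketch, not brij's or Amir1's code: `get_model`, its made-up weights, and `handle_request` are hypothetical stand-ins for loading a real Keras model and for a Flask POST handler.

```python
import functools
import json
import math


@functools.lru_cache(maxsize=None)
def get_model():
    """Hypothetical stand-in for the expensive step (e.g. loading a trained
    Keras model from disk). lru_cache ensures it runs once per process."""
    # Made-up logistic-regression weights, for illustration only.
    return {"weights": [0.5, -0.25, 1.0], "bias": -0.1}


# Warm the cache at import time, so the cost is paid when the worker
# starts rather than on the first request.
get_model()


def predict(feats):
    """Score one feature vector with the cached model (sigmoid of w.x + b)."""
    model = get_model()  # cache hit: no reload per request
    weights = model["weights"]
    if len(feats) != len(weights):
        raise ValueError(f"expected {len(weights)} features, got {len(feats)}")
    z = model["bias"] + sum(w * x for w, x in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body):
    """Body of a hypothetical POST handler: JSON in, JSON out."""
    payload = json.loads(body)
    score = predict(payload["feats"])
    return json.dumps({"word": payload.get("word"), "score": score})
```

If the model were instead loaded inside the handler, every request would pay the load cost, which is one plausible way a prediction that takes milliseconds locally turns into a request that never answers in time.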
[14:22:22] the page which fires the AJAX request is hosted on a different server
[14:22:50] so it POSTs the request to the server where the Keras model predicts
[14:22:59] in total it completes within 2 sec
[14:25:06] ok, I tested it again locally, it takes 71ms
[14:26:45] brij: tried manually posting to the keras backend server?
[14:27:28] i'm assuming this is for your gsoc work product, right?
[14:27:34] Yes, I am using Chrome's Postman addon to post the request
[14:27:38] Yes
[14:28:15] can you post a sample request here that the keras server is failing to serve?
[14:28:30] Yes
[14:28:51] URL: https://tools.wmflabs.org/proneval-gsoc17/pronserv
[14:29:07] BODY
[14:29:08] {
[14:29:08] "feats": [0.04615384712815285, 0.15078403055667877, 0.9285714030265808, 0.9750000238418579, 0.0615384615957737, 0.1498793512582779, 0.1666666716337204, 0.40625, 0.5384615659713745, 0.1136680319905281, 0.8571428656578064, 0.33125001192092896, 0.04615384712815285, 0.16024932265281677, 0.738095223903656, 0.42500001192092896, 0.07692307978868484, 0.14431020617485046, 0.976190447807312, 0.7562500238418579, 0.71875],
[14:29:08] "word": "because"
[14:29:08] }
[14:29:27] HEADERS:
[14:29:35] Content-Type: application/json;charset=UTF-8
[14:29:40] brij: better use a paste :P
[14:30:28] Ohh sorry. Do you mean a gist?
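The sample request brij posted at 14:28 and 14:29 is a JSON POST with `Content-Type: application/json;charset=UTF-8`. A small stdlib sketch that rebuilds the same request can replay it from a shell instead of Postman, with an explicit timeout so a hung backend fails fast. The URL, body and header come from the log; the function names and the 30-second default timeout are illustrative assumptions, and the response is returned as raw text because the log does not say what format the service answers in.

```python
import json
import urllib.request

URL = "https://tools.wmflabs.org/proneval-gsoc17/pronserv"

# Feature vector from the 14:29 sample request.
SAMPLE_FEATS = [
    0.04615384712815285, 0.15078403055667877, 0.9285714030265808,
    0.9750000238418579, 0.0615384615957737, 0.1498793512582779,
    0.1666666716337204, 0.40625, 0.5384615659713745,
    0.1136680319905281, 0.8571428656578064, 0.33125001192092896,
    0.04615384712815285, 0.16024932265281677, 0.738095223903656,
    0.42500001192092896, 0.07692307978868484, 0.14431020617485046,
    0.976190447807312, 0.7562500238418579, 0.71875,
]


def build_request(feats, word, url=URL):
    """Build the same JSON POST that was sent from Postman."""
    body = json.dumps({"feats": feats, "word": word}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )


def send(req, timeout=30.0):
    """Send the request and return the raw response text.

    The timeout makes a hung backend raise an error (e.g. a socket timeout
    or urllib.error.URLError) instead of waiting indefinitely.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

From a machine with network access, `send(build_request(SAMPLE_FEATS, "because"))` replays the failing request outside the browser and gives up after `timeout` seconds if the tool never answers.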
[14:31:22] the webservice is starting right now
[14:32:01] It even takes > 5 min to start
[14:33:27] brij: i can suggest some basic steps: 1) make sure it's serving basic requests like returning simple text, 2) make sure your model is getting the request by enabling some kind of logging
[14:33:53] 3) make sure the model isn't timing out. In any case, enabling log messages in your server script would be helpful
[14:34:48] It returns a dummy value
[14:34:54] see https://fangpenlin.com/posts/2012/08/26/good-logging-practice-in-python/ for a logging how-to
[14:35:20] I am also checking logs in the uwsgi.log file
[15:23:35] does this look like a sane query for getting the number of page creations per month in a specific window? - https://quarry.wmflabs.org/query/21007
[15:24:35] not too sure, probably this only returns the page creations logged by the pagetriage tool
[15:25:04] Yeah. Not sure how pagetriage_page works.
[15:25:13] Might be that it doesn't pick up autopatrolled pages.
[15:25:16] Not sure.
[15:27:26] halfak: I am around and I'll be available for 30-35 mins (another meeting at 16:00 UTC). Feel like doing that stress test?
[15:27:36] Yeah!
[15:27:38] I'll get set up.
[15:29:19] Starting stress test at 600 requests per minute. I expect this won't put us at all close to capacity.
[15:29:27] ok
[15:30:18] something is weird. Looking into it.
[15:30:22] I think I got the command wrong.
[15:36:30] akosiaris, this looks good. Bumping up to 6000 requests per minute.
[15:38:14] here we go!
[15:40:54] well that went bad. Trying 2000 instead.
[15:43:46] OK going!
[15:43:51] We pushed up to 8% CPU
[15:43:54] lol
[15:43:59] We need more ram. More workers!
[15:47:36] A little bit of overload but we recovered quickly.
[15:49:16] Looks like we're not pooling ores1001 -- that's probably because we have redis running there.
[15:49:35] But it looks like the machine is totally under-utilized. I'll run 3000 per minute with ores1001 re-pooled.
[15:49:43] ok
[15:51:22] Here we go!
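The three checks suggested at 14:33 (the service answers a trivial request, the request actually reaches the model, and the model is not what times out) can be covered with Python's standard `logging` module plus a timer around the prediction call. This is a sketch under assumptions: `ping`, `predict`, `timed_predict`, the logger name and the one-second threshold are all hypothetical, and the claim that stderr lands in `uwsgi.log` is only the typical setup for a uWSGI-hosted tool.

```python
import logging
import time

# In a uWSGI-hosted tool, stderr typically ends up in the uwsgi.log file.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("pronserv")  # hypothetical logger name
log.setLevel(logging.INFO)  # don't depend on how the root logger was configured


def ping():
    """Step 1: a route that does no model work, to check the service answers."""
    return "ok"


def predict(feats):
    """Hypothetical placeholder for the real model call."""
    return sum(feats) / len(feats)


def timed_predict(feats, slow_threshold=1.0):
    """Steps 2 and 3: log that the request arrived and how long prediction took."""
    log.info("request received: %d features", len(feats))
    start = time.perf_counter()
    try:
        result = predict(feats)
    except Exception:
        log.exception("prediction failed")  # logs the traceback too
        raise
    elapsed = time.perf_counter() - start
    level = logging.WARNING if elapsed > slow_threshold else logging.INFO
    log.log(level, "prediction took %.3f s", elapsed)
    return result, elapsed
```

If "request received" never shows up, the problem is in front of the model (routing, startup, the proxy); if it shows up but "prediction took" does not, the model call itself is where the time goes.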
[15:51:35] (I have timestamps on these runs in my notes)
[15:51:43] So we'll be able to compare to the logs.
[15:51:58] nice
[15:52:12] well.. I don't see ores1002 breaking a sweat
[15:53:01] Overloading now.
[15:53:30] so the bottleneck is not the boxes
[15:53:45] Maybe we need to set a higher queue size in the celery workers.
[15:54:05] memory is barely at 8.5GB used and CPU usage at barely 6%
[15:54:38] Right. We should bump up both workers and the max queue size.
[15:54:42] * halfak looks for settings.
[15:55:12] Hmm.. First, I'll get my notes in place.
[15:55:47] akosiaris, what do you think about doubling the number of workers again -- given the situation with memory usage?
[15:56:15] quadruple it?
[15:56:18] :-D
[15:57:31] Scoring-platform-team, ORES, Operations, Patch-For-Review, User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3538225 (Halfak) Ran another test. Looks like we can handle 2000 requests per minute without much trouble. But we barf at 3000 requests per...
[15:58:48] Scoring-platform-team, ORES, Operations, Patch-For-Review, User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3538255 (Halfak) So my thought is that we can certainly bump up the number or workers. I also think we should increase the max size for the...
[15:58:52] Yeah. Let's do that. :)
[15:59:25] So I think I want to see the number of workers quadruple and the size of the queue quadruple.
[15:59:54] hah, we only have a queuesize of 100?
[15:59:59] yeah let's increase that ...
[16:00:01] right
[16:00:59] akosiaris, we'll get to work on patches.
[16:01:04] Thanks for helping us monitor.
[16:01:16] ok, I have a meeting but will be around and can merge stuff
[16:02:14] Just about to start our main sync meeting. But Amir1 will pick up patches later today.
[16:02:24] I think we'll be ready for your review tomorrow. :)
[16:03:38] ok
[16:27:29] Halfak: hi, Aaron!
I won't be able to join our calls today (both 9 am and 10 am) as I am almost off the grid this morning. I'll contact you later
[16:42:35] Scoring-platform-team, ORES, Operations, Patch-For-Review, and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3538561 (Halfak) a:Halfak>Ladsgroup
[16:43:57] Scoring-platform-team, ORES, articlequality-modeling, draftquality-modeling, and 2 others: Update all of the models for revscoring 2.0 - https://phabricator.wikimedia.org/T173202#3538562 (Halfak) @Ladsgroup please review https://github.com/wiki-ai/wikiclass/pull/47
[16:45:22] Scoring-platform-team, ORES, revscoring, artificial-intelligence: Include label-specific schemas with model_info - https://phabricator.wikimedia.org/T172566#3538577 (Halfak) https://github.com/wiki-ai/revscoring/pull/350
[16:45:48] Scoring-platform-team, revscoring, artificial-intelligence: ThresholdOptimizations fail when a stat is null-ed - https://phabricator.wikimedia.org/T173268#3522646 (Halfak) https://github.com/wiki-ai/revscoring/pull/349
[16:46:29] Scoring-platform-team-Backlog, ORES, Operations: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3538590 (Halfak)
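The 15:53 to 16:00 exchange concluded that the ores* boxes were not the bottleneck and that both the Celery worker count and the queue size (currently 100) should be quadrupled. The toy model below illustrates why both knobs matter. It is not ORES's or Celery's code: it assumes evenly spaced arrivals, a fixed per-request service time and FIFO scheduling, and every parameter value used with it is made up.

```python
import heapq
from collections import deque


def simulate(requests_per_minute, workers, queue_maxsize,
             service_seconds, minutes=1):
    """Toy FIFO model of a worker pool in front of a bounded waiting queue.

    Requests arrive evenly spaced at `requests_per_minute`. Each one occupies
    a single worker for `service_seconds`. A request that arrives while
    `queue_maxsize` accepted requests are still waiting for a worker is
    rejected. Returns (accepted, rejected).
    """
    interval = 60.0 / requests_per_minute   # seconds between arrivals
    free_at = [0.0] * workers               # min-heap: when each worker is next idle
    waiting = deque()                       # start times of queued requests (sorted,
                                            # since start times never decrease)
    accepted = rejected = 0
    for i in range(requests_per_minute * minutes):
        t = i * interval
        # Requests whose start time has passed are no longer waiting.
        while waiting and waiting[0] <= t:
            waiting.popleft()
        if len(waiting) >= queue_maxsize:
            rejected += 1
            continue
        # FIFO: take the earliest-free worker; start when it frees up.
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + service_seconds)
        if start > t:
            waiting.append(start)
        accepted += 1
    return accepted, rejected
```

With one worker, a one-second service time and 120 requests per minute (twice the capacity), `simulate(120, 1, 2, 1.0)` returns `(62, 58)` and `simulate(120, 1, 8, 1.0)` returns `(68, 52)`: the larger queue only postpones the first rejection (from 2.5 s to 8.5 s), after which requests are rejected at the same rate. Under sustained overload, only more workers raise throughput, while a bigger queue absorbs bursts, which is consistent with raising both.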