[13:27:25] o/
[15:46:02] Revision-Scoring-As-A-Service, ORES: [Spike] Implement & test dependent tasks in Celery - https://phabricator.wikimedia.org/T136875#2357747 (Halfak)
[15:46:06] Revision-Scoring-As-A-Service-Backlog, ORES, revscoring: Score multiple models with the same cached dependencies - https://phabricator.wikimedia.org/T134606#2357746 (Halfak)
[15:49:24] o/ Amir1
[15:49:33] halfak: hey
[15:49:59] akosiaris: hey, are we ready for the public DNS?
[15:50:21] halfak: I have some patches you need to look at
[15:50:32] I saw one in Wikilabels and left you some notes.
[15:50:47] I had to do some digging and homework to evaluate that complex query :)
[15:52:04] awesome, why didn't I get the email
[15:52:11] anyway, I have one in editquality
[15:52:14] Hmm.. Not sure
[16:01:35] halfak: do you want to join?
[16:02:41] Revision-Scoring-As-A-Service, ORES: [Spike] Implement & test dependent tasks in Celery - https://phabricator.wikimedia.org/T136875#2357806 (schana) >>! In T136875#2353418, @Halfak wrote: > I don't want to cache features between requests. Why not? > Instead, I want to re-use the feature-extraction-cache...
[16:11:16] Amir1: yes we are. Seems like it is going to happen today
[16:11:23] * akosiaris in a meeting right now
[16:11:27] sure
[16:11:50] nice :)
[17:05:02] Afk for ten min
[17:24:13] back
[17:41:39] * halfak hacks on phab board.
[17:41:51] Then I'm going to try to quickly kick off the Russian article quality stuff.
[17:42:06] I'll then come back to the load testing of ores.wikimedia.org
[17:45:58] Revision-Scoring-As-A-Service, ORES: [Spike] Implement & test dependent tasks in Celery - https://phabricator.wikimedia.org/T136875#2358210 (Halfak) > Why not? Because it would be very complicated to apply to the dependency injection system used to score revisions. We'd need to cache whole trees -- not...
[17:46:13] Revision-Scoring-As-A-Service, rsaas-articlequality: Article quality models for Russian Wikipedia - https://phabricator.wikimedia.org/T131635#2358211 (Halfak) a:Halfak
[17:48:22] Revision-Scoring-As-A-Service, ORES: Plan refactor to improve resource usage / cache feature extraction - https://phabricator.wikimedia.org/T137125#2358215 (schana)
[17:50:09] halfak: ^ phab task created
[17:51:18] schana, I generally don't agree that we should cache extracted features.
[17:51:47] E.g. let's say we cache all of the features for a scoring except for one: revision.diff.words_added
[17:51:59] We'd need to re-process *everything* again in order to get that feature.
[17:52:15] So then, we might cache the entire extraction tree, right?
[17:52:24] Well, that is a lot of data!
[17:52:35] well, you're caching them in the 'super task' now, so it's just a matter of defining where the bounds are and how long-lived the cache is
[17:53:56] I think containing all the feature extraction to a single computational task was the goal to eliminate wasteful processing, right?
[17:54:31] Sure is. Don't you think it's a stretch to refer to simply keeping the data in memory until done as a "cache"?
[17:54:50] It seemed to me that you wanted independent tasks to be able to request this cache.
[17:55:02] yes, individual scoring tasks
[17:55:08] That would mean we need to transfer the big pile of data N times -- where N is the count of models.
[17:55:27] is the list of features that big?
[17:55:44] No. As I said, the dependency tree is big. Otherwise it does not make sense to cache features.
[17:56:41] so, say a request comes in for multiple models... something can look at that, decide the sum of features that are needed, and then send a request off to a feature extractor
[17:57:04] the feature extractor does the work (keeping the same dependency tree), and then shoves the results to the feature-cache
[17:57:12] schana, OK.
Then we'd have one supertask for extracting features and sub-tasks for doing the scoring.
[17:57:22] *not sub tasks
[17:57:25] independent taskss
[17:57:28] *tasks
[17:57:35] I don't see the difference
[17:57:52] one task wouldn't spawn others
[17:57:53] In the case of my super-task and sub-tasks, they are independent
[17:58:12] See executions 8 and 9 here for a view into the dependency tree: So how does that help us more than what we are already doing?
[17:58:15] Woops
[17:58:17] or rather, the super-task would spawn the feature extractor and scorers independently
[17:58:18] Wrong paste
[17:58:23] https://github.com/wiki-ai/revscoring/blob/master/ipython/feature_engineering.ipynb
[17:59:01] schana, it sounds like you are proposing a much larger and more complicated dependency structure
[17:59:32] Also, the supertask (uwsgi) spawns the tasks independently in my code too.
[17:59:59] maybe - I agree it's not necessary at this point, but moving towards a system that has more single-minded tasks is my intention
[18:00:10] Spawn of larger task: https://github.com/halfak/dependent_task_testing/blob/master/generate_score_requests.py#L61
[18:00:22] Spawn of independent sub-tasks: https://github.com/halfak/dependent_task_testing/blob/master/generate_score_requests.py#L64
[18:00:54] IMO the uwsgi task should just deal with handling the web requests; there should be no task-management logic in it
[18:01:05] Well, there won't be.
[18:01:16] The ScoreProcessor (now ScoringSystem) will handle all that
[18:01:21] Hence the modularity
[18:01:56] ScoringSystem seems like a god-class
[18:02:11] And the ScoringContext implements all the details of generating scores that depend on context, but not where the processing actually takes place.
[18:02:16] (from my limited perspective)
[18:02:27] schana, I don't think that's fair. It's not a massive pile of code and it has limited scope
[18:02:36] But it does sit at the top in a sense.
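A rough sketch of the idea being argued over here: solve the dependency tree once per request, keep the intermediate values in memory only for that request, and let every requested model read its features from that shared cache. Everything in it is an assumption made for illustration (the toy dependency graph, the feature names, the stand-in "models", and the helpers `solve` and `score_models`); it is not revscoring's or ORES's actual API.

```python
# Toy dependency graph: name -> (names it depends on, function of those values).
# All names here are hypothetical and are not revscoring's API.
FETCH_LOG = []


def fetch_text():
    # Stand-in for the expensive step (e.g. fetching revision content).
    FETCH_LOG.append("revision.text")
    return "some added words here"


DEPENDENTS = {
    "revision.text": ([], fetch_text),
    "revision.words": (["revision.text"], lambda text: text.split()),
    "feature.word_count": (["revision.words"], len),
    "feature.char_count": (["revision.text"], len),
}

# Stand-in "models": (features they need, function of the feature values).
MODELS = {
    "damaging": (["feature.word_count", "feature.char_count"],
                 lambda values: sum(values) > 20),
    "goodfaith": (["feature.word_count"],
                  lambda values: values[0] < 100),
}


def solve(name, cache):
    """Resolve one dependent, reusing anything already in the cache."""
    if name not in cache:
        dependencies, func = DEPENDENTS[name]
        cache[name] = func(*[solve(d, cache) for d in dependencies])
    return cache[name]


def score_models(model_names):
    """Extract the union of needed features once, then score each model."""
    cache = {}  # lives only as long as this request
    needed = {f for name in model_names for f in MODELS[name][0]}
    for feature in sorted(needed):
        solve(feature, cache)
    return {name: MODELS[name][1]([cache[f] for f in MODELS[name][0]])
            for name in model_names}
```

Calling `score_models(["damaging", "goodfaith"])` runs `fetch_text` a single time even though both models depend on it. The cache never leaves the task, which is the point made above about not shipping the whole tree N times.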
[18:03:23] Have you seen my notes starting on line 68 in this etherpad? https://etherpad.wikimedia.org/p/ores_refactor
[18:07:02] just out of curiosity, why is ScoringSystem a dict?
[18:07:44] Because it behaves like a behavior of ScoringContext.
[18:07:52] It doesn't need to be. Not married to that decision.
[18:08:05] *behaves like a container of ScoringContext
[18:08:23] ScoringContext behaves like a container of models.
[18:19:58] * halfak kicks off label extractor for Russian Wikipedia
[18:22:31] Revision-Scoring-As-A-Service: Build load testing scripts for ores.wikimedia.org - https://phabricator.wikimedia.org/T137131#2358353 (Halfak)
[18:22:39] Revision-Scoring-As-A-Service: Build load testing scripts for ores.wikimedia.org - https://phabricator.wikimedia.org/T137131#2358366 (Halfak) a:Halfak
[18:25:35] akosiaris: https://ores.wikimedia.org/
[18:25:43] under maintenance \o/
[18:26:15] halfak: me and you should have access to scb right now
[18:30:12] halfak: https://github.com/wiki-ai/ores-wikimedia-config/pull/61
[18:30:29] (the patch in puppet is merged now)
[18:30:33] afk for ten minutes
[18:42:15] halfak around ?
[18:42:35] Amir1: ^ ?
[18:55:26] afk for an hour or so
[18:59:46] Hey! In meeting
[19:07:09] akosiaris: now, I'm here. Tell me when you're back
[20:26:45] halfak: around?
[20:27:11] Yeah. 50% present
[20:27:24] https://ores.wikimedia.org/
[20:27:30] halfak: ^ have you checked this
[20:27:43] the worker is not up, it seems: https://ores.wikimedia.org/v2/scores/fawiki/damaging/4567/
[20:35:26] \o/ technically online
[20:35:37] Looks like it's more than the workers not being online
[20:35:45] It shouldn't 500 -- it should timeout
[20:36:09] I think they are online but redis settings or stuff like that is not set correctly
[20:36:15] maybe I can login
[20:39:01] nope, I can't login
[20:51:21] akosiaris: around? ^
[20:51:59] halfak: Is there anything wrong with this patch?
https://github.com/wiki-ai/ores-wikimedia-config/pull/61
[20:53:10] nice
[20:53:12] thanks :)
[21:04:20] I was trying to rebase the gerrit patch but it couldn't, turns out it was moved to another file in https://github.com/wikimedia/mediawiki/commit/f51b1cd9ec1d70fa070a90fa6412ff774251959c
[21:12:43] rebased finally
[21:13:11] Also I need a +2er to review this: https://gerrit.wikimedia.org/r/279925
[21:38:31] Revision-Scoring-As-A-Service, Operations, Research-and-Data-Backlog, Research-management, and 3 others: [Epic] Deploy Revscoring/ORES service in Prod - https://phabricator.wikimedia.org/T106867#2359253 (akosiaris)
[21:38:33] Revision-Scoring-As-A-Service, ORES, Patch-For-Review: Setup varnish endpoint for ORES - https://phabricator.wikimedia.org/T124203#2359250 (akosiaris) Open>Resolved a:akosiaris Done. Resolving
[21:39:03] \o/
[21:41:05] akosiaris: hey, sorry to over ping, getting scores gives me internal server error: https://ores.wikimedia.org/v1/scores/wikidatawiki/damaging/12345678/
[21:41:30] I want to login to scb1002 but it seems I still don't have access
[21:41:48] (just to check logs, nothing to do)
[21:44:47] Amir1: halfak: I am around again
[21:44:57] sorry to over ping
[21:44:58] so, redis password setting for ores_redis does not seem to work
[21:44:59] :)
[21:45:29] redis.exceptions.ResponseError: invalid password
[21:45:29] [pid: 24766] 10.64.32.103 (-) {50 vars in 1205 bytes} [Mon Jun 6 18:39:49 2016] GET /v2/scores/enwiki/?models=damaging&revids=724030089
[21:46:01] * Amir1 is checking
[21:47:13] I've checked 99-main.yaml and the password is indeed there
[21:47:38] the specific part of the config file is
[21:47:42] score_caches:
[21:47:42]   ores_redis:
[21:47:42]     host: oresrdb1001.eqiad.wmnet
[21:47:42]     password: REMOVED
[21:47:42]     port: 6380
[21:48:10] where I've just replaced the actual password (which I checked that it is correct) with REMOVED
[21:48:24] for the purpose of copypasting it on IRC
[21:48:45] obviously
[21:49:10] so, why does ores not honor that setting ?
[21:49:36] I'm checking the source code atm
[21:52:07] akosiaris: per what I realized it invokes https://redis-py.readthedocs.io/en/latest/#redis.StrictRedis
[21:52:19] which means password should work just fine
[21:53:05] the source code to load the config: https://github.com/wiki-ai/ores/blob/master/ores/score_caches/redis.py
[21:55:35] We should first confirm that the config is being read appropriately.
[21:55:51] Also, 99-main.yaml seems like the wrong name. Shouldn't we be calling this 99-redis.yaml?
[21:57:31] well, it must be read the correct way cause the score_processors: database does have data in it
[21:57:40] at least keys * does say so
[21:58:32] the score_caches on the other hand, no
[21:58:40] Have you seen any errors in the logs?
[21:59:01] or rather, what machine can I access to look at logs?
[21:59:06] redis.exceptions.ResponseError: invalid password
[21:59:10] I'm running the uwsgi in scb1002 right now
[21:59:16] tell me to stop if you think so
[21:59:16] want the entire stacktrace ?
[21:59:36] per what I see, it can ready 00-main.yaml properly
[21:59:40] *read
[21:59:48] 2016-06-06 21:58:50,442 INFO:ores.scoring_contexts.scoring_context -- Loading ScoringContext 'huwiki' from config.
[22:00:38] akosiaris, so it is trying a password.
[22:00:59] Where do I review the logs?
[22:01:31] scb1001, scb1002 : /srv/log/ores/main.log
[22:01:58] I couldn't do it
[22:02:42] https://www.irccloud.com/pastebin/Lrxcswxc/
[22:02:50] akosiaris: halfak ^ here's the full backtrace
[22:02:57] I ran it within scb1002
[22:03:19] Aha!
[22:03:28] This isn't the cache!
[22:03:34] It's inside of the celery processor.
[22:04:36] This is how we re-read the URL passed in for celery and convert it into structured data for use in constructing a new Redis connector.
[22:04:36] https://github.com/wiki-ai/ores/blob/master/ores/score_processors/celery.py#L270
[22:05:13] We end up getting this from "self.application.conf.BROKER_URL"
[22:05:36] So if celery has the password, we should have it here.
[22:05:39] I'm pretty sure we haven't defined the password for the redis broker in the config files
[22:05:41] But we must be reading it wrong.
[22:05:45] Oh.
[22:05:49] Well there's a problem
[22:06:12] looking at them right now, but I might be wrong
[22:10:13] you mean BROKER_URL: redis://:password@hostname:port/db_number ?
[22:10:31] cause definitely got that in 99-main.yaml
[22:10:47] yeah
[22:10:50] we have it
[22:10:52] sorry
[22:11:13] minus the db_number, but I see that is missing in the labs setup as well, without causing any issues
[22:11:27] BROKER_URL: "redis://:REMOVED@oresrdb1001.eqiad.wmnet:6379"
[22:11:27] CELERY_RESULT_BACKEND: "redis://:REMOVED@oresrdb1001.eqiad.wmnet:6379"
[22:11:53] the redis in labs doesn't have pass IIRC
[22:11:57] !
[22:12:06] We don't handle ":@"
[22:12:12] We handle "@"
[22:12:18] Is that colon standard?
[22:12:22] yup
[22:12:29] according to celery docs at least
[22:12:44] the normal format is username:password but redis has no users
[22:12:50] Gotcha.
[22:12:50] hence the username part is skipped
[22:13:04] Amir1, can you update https://github.com/wiki-ai/ores/blob/master/ores/score_processors/celery.py#L260 ?
[22:13:12] halfak: yeah, sure
[22:13:24] one thing. Do you want me to deploy it too?
[22:13:39] http://docs.celeryproject.org/en/latest/configuration.html#redis-backend-settings and http://docs.celeryproject.org/en/latest/getting-started/brokers/redis.html clearly says that at least
[22:13:43] (we need to update the gerrit submodule, I think)
[22:14:07] akosiaris: are the submodules in Differential or do I need to update gerrit manually?
[22:14:35] Amir1: for now in gerrit.
I've talked with releng and it seems possible we can get them moved into diffusion soon though
[22:15:01] okay, thanks :)
[22:33:08] halfak: https://github.com/wiki-ai/ores/pull/145
[22:33:53] "?:\:"?
[22:34:08] I spent some time to support username too but it seems it doesn't support it :D
[22:34:24] halfak: "?:" means non-capturing group
[22:34:27] Why not just add "\:" to the beginning of the first group?
[22:34:36] :P
[22:34:47] Premature optimization is the root of all evil
[22:35:02] no, not optimization
[22:35:19] we run it only once, I just think it's not concpetully correct
[22:35:26] *conceptually
[22:35:39] thanks :)
[22:35:57] halfak: shall I deploy?
[22:36:43] halfak: tldr, with the ?: reg.match(str).group(0) would return the password too. it's not right
[22:54:24] fixing the submodule was a huge PITA
[23:01:15] Amir1, we use named groups in that regex.
[23:01:18] But yeah, I hear you
[23:01:30] Wait. Submodules should be easy.
[23:01:37] It's the wheels that are hard
[23:01:50] no, the submodules in gerrit
[23:01:59] not in our github repo
[23:49:13] halfak: https://ores.wikimedia.org/v2/scores/enwiki/?models=damaging&revids=724030089
[23:49:29] so, we are finally up and running
[23:52:02] yes
[23:52:03] yes
[23:52:08] \o/
[23:52:10] and I am off to bed
[23:52:15] congrats guys
[23:52:17] :-)
[23:52:21] thank you akosiaris
[23:52:27] it was awesome
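For anyone reading this log later, here is a minimal sketch of the kind of fix discussed above: accepting both "redis://password@host" and the colon-prefixed "redis://:password@host" form, using a non-capturing group for the optional colon and named groups for the parts. The pattern, the `parse_redis_url` helper, and the fallback port and db values are assumptions made for illustration; this is not the regex from ores/score_processors/celery.py or from the pull request.

```python
import re

# Hypothetical pattern for illustration only; not the actual ORES regex.
REDIS_URL_RE = re.compile(
    r"^redis://"
    r"(?:\:?(?P<password>[^@/]+)@)?"  # optional "password@" or ":password@"
    r"(?P<host>[^:/@]+)"              # hostname
    r"(?:\:(?P<port>\d+))?"           # optional ":port"
    r"(?:/(?P<db>\d+))?$"             # optional "/db_number"
)


def parse_redis_url(url):
    """Split a redis:// URL into keyword-style connection settings."""
    match = REDIS_URL_RE.match(url)
    if match is None:
        raise ValueError("Not a recognizable redis URL: {0!r}".format(url))
    parts = match.groupdict()
    return {
        "host": parts["host"],
        # Assumed fallbacks when the URL omits them: port 6379, db 0.
        "port": int(parts["port"] or 6379),
        "db": int(parts["db"] or 0),
        "password": parts["password"],  # None when the URL has no password
    }
```

With this sketch, `parse_redis_url("redis://:REMOVED@oresrdb1001.eqiad.wmnet:6379")` yields the password `REMOVED` without the leading colon, and a URL with no password part yields `None`. It does not handle usernames. The standard library's `urllib.parse.urlsplit` also exposes `hostname`, `port` and `password` for URLs of this shape and may be a simpler route than a hand-written regex.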