[14:28:16] o/
[14:33:53] Oh yeah! Today is a holiday!
[14:34:03] Hey halfak
[14:34:13] Is it? I thought it was next week?
[14:34:28] * guillom still isn't used to US holidays.
[14:34:36] Oh... Might be.
[14:34:38] * halfak looks
[14:34:51] (other than "there are fewer than French holidays")
[14:34:52] Yup. I was wrong.
[14:40:53] o/ diyi
[14:41:00] And welcome.
[14:41:15] :)
[14:41:27] Hi Aaron, nice to meet you here
[14:41:44] :) So you should get an audio "ping" when someone says your name.
[14:41:47] diyi: ^
[14:41:57] This is a good way to pull people into a conversation.
[14:42:41] thanks!
[14:44:15] Ironholds: rsyncing CirrusSearchRequests.logs to stat1002 now :)
[16:05:40] morning J-Mo, DarTar
[16:05:42] ottomata, yay!
[16:05:55] hello
[16:05:59] howdy Ironholds
[16:57:08] ottomata, we have libmaxminddb installed on 1002, right?
[16:57:53] think so
[16:57:56] hrn
[16:58:01] * Ironholds is trying to work out why his code won't compile.
[16:58:33] oh. Do we have libmaxminddb-dev? ;p
[16:58:40] probably not?
[16:58:44] Ironholds: you can always check
[16:58:47] * halfak can now extract features for a revision in 0.3 seconds including API lookup time.
[16:59:05] apt-cache show libmaxminddb-dev
[16:59:23] oh, that doesn't show state
[16:59:24] nope, we have it, apparently
[16:59:26] then instead:
[16:59:27] oh
[16:59:33] aptitude show libmaxminddb-dev
[16:59:36] State: installed
[16:59:37] yes
[16:59:39] so it is installed
[16:59:54] halfak: via restbase?
[17:00:09] ottomata, no. Restbase isn't terribly useful for this.
[17:00:13] Regretfully.
[17:00:22] Restbase is no replacement for the API.
[17:00:31] Turns out I'm CPU bound now though :)
[17:08:32] ottomata, hmmn. Thanks!
[18:16:59] o/ yuvipanda
[18:17:03] hey halfak
[18:17:06] Good news!
[18:17:09] go on!
[18:17:16] We're now CPU bound for feature extraction :)
[18:17:25] So I'm hoping to hack on celery on the plane later today.
[18:17:31] w00t!
[18:17:34] wait, celery?
[18:17:39] for what?
[18:17:59] For doing the CPU-intensive feature extraction work.
[18:18:25] I get 50 rev_ids, I pre-cache the data they'll need and then send them off to workers.
[18:19:02] hmm
[18:19:02] Wrong tool?
[18:19:19] am wondering.
[18:20:20] halfak: do you not want them to be realtime?
[18:20:39] halfak: so if you get in a request for 50 rev_ids, don't you want to synchronously return the response immediately?
[18:20:56] yuvipanda, sure, but I want to farm out the computation in parallel
[18:21:20] Preferably to multiple machines in a flexible way.
[18:21:20] hmm, so you'll put things out on celery but block the response until they complete?
[18:21:24] Yup
[18:21:33] I'm assuming low overhead.
[18:24:58] halfak: hmm, that sounds a bit like the wrong tool for the job
[18:25:05] because of the blocking
[18:25:38] you usually return an id and then have your clients poll
[18:27:22] apply_sync()?
[18:27:59] What tool would you use, yuvipanda ?
[18:29:18] halfak: I don't find apply_sync in the celery docs
[18:30:13] halfak: I'm thinking of reasons not to do it, give me a few minutes (also in a meeting)
[18:30:49] kk
[18:31:06] Looks like delay() gives you an AsyncResult
[18:31:22] When you're ready to block on the result, you just call .get()
[18:33:24] halfak: so will you have one job that does all 50 revids or split them into 50 jobs?
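
A rough sketch of the fan-out pattern halfak describes above, assuming Celery with a Redis broker. The helper names (compute_features, fetch_revision_data) and the broker/backend URLs are placeholders, not the actual code under discussion:

    from celery import Celery

    # Placeholder broker/backend URLs; a real deployment would pull these from config.
    app = Celery('feature_extraction',
                 broker='redis://localhost:6379/0',
                 backend='redis://localhost:6379/0')

    @app.task
    def extract_features(rev_id, revision_data):
        # The CPU-bound scoring work runs in a worker process.
        return compute_features(rev_id, revision_data)  # hypothetical helper

    def score_batch(rev_ids):
        # Gather everything the workers will need in one batched API request,
        # then split the CPU work into one job per revision.
        revision_data = fetch_revision_data(rev_ids)  # hypothetical batch lookup
        async_results = [extract_features.delay(rev_id, revision_data[rev_id])
                         for rev_id in rev_ids]
        # delay() returns an AsyncResult; .get() blocks until the worker finishes,
        # so the caller still sees a synchronous response for the whole batch.
        return [result.get() for result in async_results]
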
[18:34:33] halfak: ya, but you have to account for queuing delays and failovers - you have to model a full queuing system when you want to reason about the system :)
[18:34:55] it might make sense to do so, but I guess we should think of that and convince ourselves before doing it...
[18:41:11] yuvipanda, I just want to split the processing work over cores.
[18:41:14] Simple is good.
[18:41:17] Do you know a better way.
[18:41:20] ?
[18:41:35] halfak: so will you have one job that does all 50 revids or split them into 50 jobs?
[18:41:56] Split them into n jobs.
[18:42:13] They certainly can be split into one rev per job
[18:42:22] right, but we can batch them
[18:42:23] I don't think one job that does 50 revisions gets us anything
[18:42:35] right
[18:42:37] The batch data gathering has already occurred before this point
[18:44:35] halfak: so you do one batch request for 50 of them, and then split the data out for processing?
[18:44:48] yuvipanda, yes
[19:21:57] halfak: sorry, IRC troubles
[19:22:15] * yuvipanda-wmf looks at logs
[19:24:24] halfak: so celery sounds ok, mostly because I can't think of any alternatives that don't have the words 'threading + ctypes' in them :)
[19:25:20] halfak: although, the block-until-response makes me feel icky - it means that we have to be careful with timeouts, or one dead runner can bring down the web cluster too
[20:13:54] yuvipanda-wmf, makes sense. There should be some facilities to allow us to say: "No tasks shall take longer than 2 seconds.
[20:14:08] If they do, kill 'em and we'll deal with the traceback."
[20:14:18] I know we can timeout on get().
[20:14:29] But that's not exactly what we want.
[20:15:32] http://celery.readthedocs.org/en/latest/userguide/workers.html#time-limits
[20:15:57] yay!
[20:16:10] Looks like this shouldn't be a problem.
[21:11:02] halfak: yeah, fair enough
[21:11:20] OK. I'll experiment. I'll simulate before I implement.
[21:15:04] halfak: looking back, the reason I was hesitant is that having a web request block on celery makes a hard dependency between the web servers and the task runners - one full web worker process can be tied up by any one of X different task runners not responding properly
[21:15:30] yuvipanda-wmf, +1. But isn't that what timeouts are for?
[21:15:41] halfak: yeah, you just need to be aware of them and design them in :)
[21:15:49] +1 :)
[21:16:06] My plan for right now is just to tell the user.
[21:16:08] halfak: I guess in general I prefer solutions that require less thinking :D
[21:16:17] "Scoring this revision timed out. You're welcome to try again."
[21:16:34] yeah
[21:16:43] * halfak works on implementing a cache.
[21:16:47] :D
[21:17:15] Would you put a key prefix in the config for your redis use? E.g. "ores:"?
[21:17:30] ores == the system serving scores.
[21:23:18] halfak: make it configurable
[21:23:25] halfak: and yes, always put a key prefix :)
[21:23:30] that allows multitenancy
[21:23:39] Makes sense. Just worrying about best practices. :)
[21:23:46] I've never used redis amongst others.
[21:25:16] halfak: yeah :)
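
A minimal sketch of the two safeguards discussed above, continuing the placeholder setup from the earlier snippet: per-task time limits on the Celery side (so a stuck worker is killed and the web request can report a timeout to the user) and a configurable Redis key prefix such as "ores:" for the score cache. The names extract_features_limited, score_cache_key, KEY_PREFIX, and the limit values are assumptions for illustration, not the actual ORES configuration:

    from celery.exceptions import SoftTimeLimitExceeded, TimeoutError

    TIMEOUT_MESSAGE = "Scoring this revision timed out. You're welcome to try again."

    # The soft limit raises SoftTimeLimitExceeded inside the task; the hard
    # limit kills the worker process outright if the soft limit is ignored.
    @app.task(soft_time_limit=2, time_limit=3)
    def extract_features_limited(rev_id, revision_data):
        try:
            return compute_features(rev_id, revision_data)  # hypothetical helper
        except SoftTimeLimitExceeded:
            return {'error': TIMEOUT_MESSAGE}

    def get_score_blocking(rev_id, revision_data, timeout_seconds=2):
        result = extract_features_limited.delay(rev_id, revision_data)
        try:
            # .get(timeout=...) also protects the web worker if no task
            # runner ever picks the job up.
            return result.get(timeout=timeout_seconds)
        except TimeoutError:
            return {'error': TIMEOUT_MESSAGE}

    # A configurable key prefix keeps this application's keys separate from
    # any other tenant sharing the same Redis instance.
    KEY_PREFIX = 'ores:'  # would come from the service's config in practice

    def score_cache_key(wiki, model, rev_id):
        return '{0}{1}:{2}:{3}'.format(KEY_PREFIX, wiki, model, rev_id)
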