[13:52:58] o/
[14:57:21] halfak, can you get onto altiscale?
[15:21:35] halfak, never mind. The University is blocking them... again
[15:21:43] \o_
[15:36:21] o/ halfak
[15:36:53] o/
[15:36:56] * halfak is in meetings
[15:37:01] Will respond soon
[15:37:14] ok. i am writing the p.o.c for HV. will have something to show in 1 or 2 days.
[15:54:33] Great! Will be happy to talk to you about how you're thinking about diffs.
[16:20:44] Hey! Back
[16:21:01] kjschiroo, looks like you got what you needed.
[16:22:04] yeah, still working on getting it fixed though.
[16:23:30] sabya, I look forward to hearing back on what you work with diffs.
[16:24:03] It looks like the sparse matrix representation of features that HashingVectorizer produces will work with simple mathematical operations. http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.html
[16:50:53] halfak: https://github.com/wiki-ai/revscoring/blob/poc_hashing_vector/poc/hv/poc_hashing_vectorizer.py
[16:51:20] this is not much now, just was seeing how the things connect together.
[16:52:16] do you think i understood things correctly?
[16:53:55] In the second "session.get", I think that rvstartid=revid-1
[16:54:00] Since it searches inclusively.
[16:54:11] Oh! Wait! I see that one is getting both.
[16:54:13] Cool.
[16:54:51] rvdir "older" is the default. It looks like your ordering is right, but you might want to make that explicit.
[16:55:46] ok
[16:55:58] Otherwise, the gist looks good.
[16:56:18] We'll want to do some tweaking -- e.g. telling the hasher to work with ngrams and skip grams.
[16:58:16] ok. also, don't you think we should get the training set from human reverts?
[16:59:05] the tsv is a prediction from the damaging model, isn't it?
[16:59:55] The TSV contains real human labels
[17:00:24] So you can train and test against those :)
[17:00:34] oh i see :)
[18:48:46] Hey folks. I'm taking the afternoon to run errands. Will be back in a few hours.
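
The 16:24:03 remark, that the sparse matrices HashingVectorizer produces work with simple mathematical operations, can be illustrated with a minimal sketch. It assumes scikit-learn and SciPy are installed. The two example strings, `n_features=2**10`, and the `(1, 2)` n-gram range are illustrative choices, not values taken from the linked proof of concept. Skip grams are left out because `HashingVectorizer` has no built-in option for them.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical parent and revised text of an edit (not real wiki data).
parent_text = "the quick brown fox"
revision_text = "the quick brown fox jumps"

# alternate_sign=False and norm=None keep raw non-negative counts,
# so the subtraction below reads as "n-grams added minus n-grams removed".
vectorizer = HashingVectorizer(
    n_features=2**10,      # small hash space, chosen only for the example
    ngram_range=(1, 2),    # unigrams and bigrams
    alternate_sign=False,
    norm=None,
)

# transform() is stateless (no fit needed) and returns a SciPy sparse matrix
# with one row per input document.
features = vectorizer.transform([parent_text, revision_text])

# Sparse rows support ordinary arithmetic, so a diff feature vector is
# just the revision row minus the parent row.
delta = features[1] - features[0]

print(delta.shape)   # one row, n_features columns
print(delta.sum())   # the revision has 9 n-grams, the parent 7
```

Because the hasher is stateless, the same vectorizer settings can be reused for training and scoring without storing a vocabulary, which is presumably part of its appeal for the diff features discussed here.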