[19:10:11] halfak: R sigclust appears to be giving the same behavior on the data as the python sigclust.
[19:10:47] I'm going to continue to experiment with different parameters and different submatrices. Will have more info soon.
[19:20:42] Thanks for the update.
[19:25:25] aetilley, what about the soft-thresholding strategy?
[19:29:23] Also 1.
[19:30:25] The R sigclust that is. I didn't implement soft thresholding in python yet, but I think it would be straightforward.
[19:32:40] Interesting. Have you tried some simulations?
[19:33:14] E.g. randomly draw values from a few normals and see if sigclust can tell you how many normals you started with?
[20:02:22] halfak: hey, around?
[20:05:20] halfak: Will do that.
[20:09:44] o/ Amir1|afk
[20:09:49] Sorry to miss you.
[20:09:54] I'm waist deep anyway :\
[20:14:26] halfak: hey :)
[20:14:41] I wanted to know about wb-vandalism
[20:14:48] what do you think?
[20:14:57] Do you think it will be ready soon?
[20:15:01] Amir1, been out sick.
[20:15:14] On top of that, I've had all my work side-tracked to make a revscoring blog post.
[20:15:16] oh sorry to hear that
[20:15:26] sorry
[20:15:31] :(
[20:15:37] No worries. I think I can still get a model together tomorrow.
[20:15:37] I hope you feel better
[20:15:44] Thanks :)
[20:15:52] Much better than yesterday.
[20:16:44] If I can do some job, do tell me so you get more rest
[20:16:49] and be back soonr
[20:16:59] *sooner
[20:25:46] Check out how I set up the Makefile for wiki-ai/editquality
[20:25:48] Amir1, ^
[20:25:59] The last step in there is to generate the model from the feature set.
[20:26:06] It uses revscoring's train_test utility.
[20:26:33] okay :)
[20:26:44] what do you want by tomorrow?
[20:27:18] halfak: did you hear anything about the security review?
[20:27:55] YuviPanda, negative. Nothing yet
[20:28:08] halfak: hmm wondering if I should poke today or poke later
[20:28:52] halfak: awight I think what we'll do for the debs is just do it in one fell swoop - do it in production and then move it back to labs. That way I can lean on aksorias as well and there'll be no 'ah but you did this wrong!' from elsewhere when moving to production
[20:29:05] hopefully in a week or two once the security stuff completes
[20:50:52] halfak: is there a chance that we would have the classifier in ORES soon
[20:50:54] ?
[20:51:16] Amir1, if we get solid results, we could push the new model up tomorrow.
[20:51:34] ORES has been stable for a while, so pushing up a new model shouldn't be a problem.
[20:51:44] great
[20:51:46] :)
[20:52:08] * Amir1 gets some coffee, it's going to be a looong night
[20:53:47] Amir1, how many rows in the final dataset you sent?
[20:54:09] about 1.2M
[21:36:41] (CR) He7d3r: "@Awight: re the redirect: If I try this in the console, on mediawiki.org, I get a 301, and a redirect to the http version:" [extensions/ORES] - https://gerrit.wikimedia.org/r/247034 (https://phabricator.wikimedia.org/T112856) (owner: Awight)
[22:05:33] Amir1, 1.2M might be too many. You can use `shuf` to bring down the count.
[22:06:07] e.g. `shuf -n 20000 data_file.tsv` will randomly sample 20k rows.
[22:06:15] I'd start with that or the model will take too long to build.
[22:15:55] halfak: okay
[22:16:10] I'm trying to reorder columns using sed
[22:16:29] (obviously it's better to sample first)
[22:25:06] 1066004 lines
[22:34:27] halfak: https://tools.wmflabs.org/dexbot/thanks_aaron.tsv (exactly 20K rows)
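
Note on the sampling step: `shuf -n 20000 data_file.tsv` draws rows uniformly at random without replacement. For anyone who wants the same thing inside a Python pipeline, here is a minimal sketch using reservoir sampling. It assumes the file has no header row; the helper name `sample_lines` and its `seed` parameter are made up for illustration and are not part of revscoring or any other tool mentioned in the log.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random, without replacement.

    Hypothetical helper (not from any library). Uses reservoir sampling
    (Algorithm R): the input is read once and only k lines stay in memory.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = line
    # Shuffle so the output order is random too, as with shuf.
    rng.shuffle(reservoir)
    return reservoir
```

Something like `sample_lines(open("data_file.tsv"), 20000)` would then give a 20k-row sample of the kind discussed above; with a header row, read and set aside the first line before sampling.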