[12:46:11] halfak: around?
[12:46:13] Hey
[12:46:54] tell me when you are
[12:47:02] I have lots of things to discuss with you
[13:00:54] i/ Amir1
[13:01:05] hey
[13:01:08] Just starting a meeting. Will be done in 1 hour :/
[13:01:19] hmm
[13:01:20] ok
[14:15:48] o/ Amir1
[14:16:12] hey halfak :)
[14:16:25] Have you got my email regarding clustring
[14:19:10] halfak: ^
[14:19:30] Yes. I just looked it up to find not much in it :P
[14:20:52] Oh okay
[14:21:22] I got the result for English, 1- 1980 edits were too much so I sampled only 1000 edits
[14:21:32] !
[14:21:38] 1980 edits are too many?
[14:21:43] Sorry
[14:21:49] 19800
[14:21:52] about 20K
[14:22:12] I send cost function based on number of clusters
[14:22:20] and I can see a cutoff in 9
[14:22:59] I just sent it to you
[14:23:40] I see the plot.
[14:23:42] and the thing is we can run some statistical methods and get best number of cluster for certain regions
[14:24:13] Indeed. Looking for a sudden drop in what I figure the Y axis represents -- some separation metric?
[14:24:41] Amir1, BTW, are you looking at all edits or just the reverted ones?
[14:25:09] let's say if we want to separate our edits into five-10 clusters, what would be the optimum number
[14:25:18] halfak: you sent me all edits
[14:25:29] I don't know how to take out only reverted ones
[14:25:41] Amir1, indeed, but the last column specifies the reverted status
[14:25:54] oh
[14:25:57] :)
[14:26:01] I got it as a feature :D
[14:26:12] I guess it is in a way ;)
[14:26:14] I wanted to work on reverted edits too
[14:27:14] I can give you result of reverted ones too
[14:27:17] very fast
[14:27:24] since everything is there
[14:28:49] halfak: just want to be sure, True means reverted?
[14:30:46] it seems so, only 1480 edits in en.wp
[14:30:50] running it...
[14:33:58] halfak: I just sent result for reverted ones in en.wp
[14:34:43] +1 True means reverted.
[14:34:45] Gotcha.
[14:34:49] So no clear cliff
[14:35:10] yeah but 2, 6, and 9 sounds good
[14:35:15] *sound
[14:35:40] or maybe I'm wrong
[14:36:03] I run some statistics modules to see what's best
[14:36:52] 2,6,9?
[14:37:19] they are cliff but not very obvious cliffs
[14:37:56] specially two
[14:38:17] I think is best choice for us
[14:38:28] (also I need to run this model for other wikis)
[14:38:59] Can you give us a plot of the mean/median values for features corresponding to the first two clusters?
[14:39:27] * halfak wonders if it will be that easy to discover good-faith/badfaith reverted edits.
[14:40:37] I do feature scaling on them so it is a little bit hard to get original data back
[14:40:40] but not impossible
[14:41:41] running on eswiki
[14:41:49] 784 edits
[14:43:17] clear cliff on two
[14:43:49] and five
[14:44:34] just sent it
[14:44:35] No worries. We don't need original data -- just an indication of how the features effect the clusters.
[14:45:10] hmm okay
[14:45:24] I will sent it to you in three or four hours
[14:45:30] is it okay?
[14:46:23] it's super exciting to me
[14:46:24] Totally
[14:46:26] :)
[14:46:36] Amir1, you should consider doing a work-log
[14:46:42] and uploading the figures to commons :)
[14:46:53] yup
[14:46:56] of course
[14:46:59] I will do
[14:47:02] :D
[14:47:17] This will be good fodder for future papers :)
[14:47:39] It's so exciting to me that I want to get final results as fast as I can
[14:48:04] that's the reason I don't do work log but obviously I should do
[14:48:41] hail to "our lady of perpetual exemptions"
[14:48:48] I got to go
[14:48:52] see you soon
[14:49:46] o/
[14:49:47] :)
[19:17:33] halfak: https://phabricator.wikimedia.org/T110072 ticket for security review of ORES!
[19:17:34] and friends
[19:18:19] COol! Thanks for pressing forward.
[19:18:32] BTW, when do you want to take a trip through the revscoring code?
[19:18:44] Maybe before the security review?
[19:18:47] YuviPanda, ^
[19:19:01] halfak: probably won't have the time :'(
[19:19:13] OK. No worries.
[19:19:46] halfak: but I'll try!
[19:21:54] YuviPanda, how soon do you think we can get a security review?
[19:22:08] halfak: not sure. probably not this week, hopefully next?
[19:23:11] we also need to finish up https://gerrit.wikimedia.org/r/#/c/229423/
[19:23:16] need to clone legoktm
[19:24:03] Yeah. We've been sitting on one of my last performance improvements for ORES for two weeks two.
[19:24:14] https://github.com/wiki-ai/ores/pull/78
[19:24:56] *too
[19:25:03] hi
[19:25:11] o/ legoktm
[19:25:11] I can probably work on that tonight or something
[19:25:16] \o/
[19:25:25] I'll hang around and do some statsd work with you
[19:25:54] I've got some cleanup to do if we're going to handle the backwards incompatible changes we just merged into revscoring.
[19:26:01] So I might do that first
[19:26:28] YuviPanda, revscoring can now curse in multiple languages -- at the same time.
[19:26:43] revscoring was always multilingual, but it was limited to cursing in one language at a time.
[19:26:48] No good for multilingual wikis
[19:27:33] shit mierda caca caga!
[20:09:34] halfak: what about words that are cusses in one language but not another?
[20:11:38] Different and separate features.
[20:27:30] what is wikipedia-ai?
[20:27:36] is it a deep learning nlp algorithm?
[20:27:40] halfak: http://paste.debian.net/304617/ has gotta be the cleanest string parsing code I've ever written
[20:28:51] Cool! BNF-like!
[20:29:07] o/ jenelizabeth -- not just one algorithm
[20:29:23] More a working group for studying and deploying useful AI in Wikimedia projects
[20:29:32] See /topic
[20:29:48] halfak: yeah, and very readable! Most simple to use and extend BNF type thing I've seen ever. 'grammar' is like 'class', so you can inherit from it, etc.
[20:30:20] Cool! Yeah. I like it. It took me a minute to figure out the syntax though :S
[20:31:01] 「wat」
[20:31:54] halfak: that's just an output format
[20:32:10] Gotcha. Hence why it is in a comment
[20:33:55] YuviPanda: very pretty--Perl 6, though??
[20:34:31] ah, cool--cos grammar is a builtin
[20:34:56] awight: yeah, and lots of other things too
[20:35:17] awight: contracts, very powerful multi dispatch, pattern matching (ML inspired)
[20:35:24] a lot of it feels like writing haskell with a rubyish syntax
[20:36:14] I think I'll just appreciate how awesome that code is, without worrying about deployment :)
[20:43:46] awight: that's my thing too :)
[20:43:47] awight: although I suspect it'll be available on toollabs not too far in the future
[21:00:14] something exciting is being discussed here <3
[21:15:11] * halfak deploys first versions of 'mwtypes' and 'mwxml'
[21:15:25] Now to write the first applications of them. :)
[22:47:21] * halfak watches scipy compile
[22:48:02] * halfak watches scipy compile
[22:48:05] AGAIN
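
Editor's note on the clustering thread earlier in this log (cost versus number of clusters, looking for a "cliff", and reading cluster means back after feature scaling): the log does not say which clustering algorithm, library, or feature set was used, so the following is only a minimal sketch under assumptions. It assumes k-means with the within-cluster sum of squares as the cost, uses synthetic two-feature toy data in place of the real edit features, and every function name in it (`standardize`, `unscale`, `kmeans`) is hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale every column to zero mean and unit variance.

    Returns the scaled rows and the per-column (mean, std) pairs needed to undo it.
    """
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def unscale(point, stats):
    """Map a point (for example a cluster center) back to original feature units."""
    return [v * s + m for v, (m, s) in zip(point, stats)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (centers, cost); cost is the within-cluster sum of squares."""
    centers = random.Random(seed).sample(rows, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda i: sq_dist(row, centers[i]))
            groups[nearest].append(row)
        # An empty cluster keeps its previous center.
        updated = [
            [mean(col) for col in zip(*group)] if group else centers[i]
            for i, group in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    cost = sum(min(sq_dist(row, c) for c in centers) for row in rows)
    return centers, cost


# Synthetic stand-in data (NOT the edit data from the log): two separated blobs.
rng = random.Random(42)
toy = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
toy += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]

scaled, stats = standardize(toy)

# Cost as a function of the number of clusters; a "cliff" is a large drop.
costs = {k: kmeans(scaled, k)[1] for k in range(1, 7)}
for k in range(2, 7):
    print(k, round(costs[k], 2), "drop:", round(costs[k - 1] - costs[k], 2))

# Cluster centers for k=2, reported in the original (unscaled) units.
centers_k2, _ = kmeans(scaled, 2)
centers_original = sorted(unscale(c, stats) for c in centers_k2)
print(centers_original)
```

Because standardization is a per-column linear map, keeping the `(mean, std)` pairs is enough to report cluster centers in original units, which is one way to get "an indication of how the features affect the clusters" without recovering the raw rows. Lloyd's algorithm depends on its random start, so on real data the cost curve is usually averaged over several seeds before reading off a cliff.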