[00:08:38] halfak, I tried out this sknn, and it seems it works (by my standards). The trained NN didn't perform at classification as well as some other NNs in Octave, on which I've been learning ML. But maybe I should try to train a bit more with different params.
[00:09:59] Anyway, I am not sure I could peruse the kit ATM, but I will try and investigate further. Cheers, gtg
[10:10:34] (CR) Ladsgroup: Add onOldChangesListRecentChangesLine hook (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/263184 (https://phabricator.wikimedia.org/T122535) (owner: Ladsgroup)
[14:59:59] Hello halfak and YuviPanda
[15:01:05] Seems that last time I posted under my alternative nick: Marija_. Sorry for the confusion.
[15:06:44] Hello Amir1
[15:07:24] hey :)
[15:12:17] :)
[15:14:09] I was trying this scikit-neuralnetwork and I haven't found some parameters I thought it should've to train a NN.
[15:14:40] *it should have had
[15:17:11] I don't know about scikit-neuralnetwork
[15:18:07] Regularization, for example. Why did you decide to build the core for Kian yourself? I mean, I suppose you weren't satisfied with what was already there, but I would like to know what the flaws of the other kits were.
[15:19:31] sknn is actually the one halfak told me about; he said it looked nice
[15:20:37] And I was experimenting with it for the past few days to get into Python, ML, and perhaps help - if I could
[15:53:09] halfak: o/
[16:00:00] o/
[16:00:12] pipivoj & Amir1
[16:00:30] Sorry that I'm in late today. It's technically a holiday. :)
[16:00:46] hey, Adam Wight is working on the gadget
[16:00:49] it's awesome
[16:01:04] I hope he has time to work some more
[16:01:14] "the gadget"?
[16:02:05] oh sorry
[16:02:16] I'm working on the scored revisions gadget right now
[16:02:21] my mind is kinda stuck there
[16:02:51] We are trying to make the scored revisions gadget something people can enable in their preferences
[16:03:01] Oh! Gotcha. :)
[16:03:05] it seems it has some minor issues
[16:06:30] halfak: but it will find lots of users
[16:06:47] +1 I think this is a great idea.
[16:06:57] that's one more step towards usability
[16:06:58] It's something Helder wanted to do once we got more users.
[16:06:58] anyway
[16:07:22] If it's possible, please move this GUI to the ores server
[16:07:26] I added a CDN
[16:07:36] btw, is it a holiday there?
[16:07:56] Amir1, can you make a task for me to do that?
[16:08:02] Yes. It's Martin Luther King Day :)
[16:08:11] Amir1, BTW, I've been doing some testing with the performance of full-text scans (e.g. with regexes of badwords/informals/etc.)
[16:08:12] oh
[16:08:18] Happy Martin Luther King
[16:08:20] Day
[16:08:22] :D!
[16:08:23] one sec
[16:08:37] awesome
[16:08:44] what did you get?
[16:09:56] halfak: https://www.youtube.com/watch?v=YWis_ijN23U watch this :)
[16:11:19] "The uploader has not made this video available in your country."
[16:11:24] Something I don't see that often.
[16:12:20] it's SNL
[16:12:27] Found an alternative
[16:13:36] there are lots of add-ons (or extensions in Chrome) that can work around this issue
[16:15:57] https://phabricator.wikimedia.org/T123933
[16:16:11] "I'm just going to go on Wikipedia and cut'n'paste the whole thing"
[16:20:08] Thanks for the phab task.
[16:21:19] :)
[16:21:23] So my plan for today is (1) complete an analysis of historical productivity in English Wikipedia, (2) extend this analysis to a few other wikis, and (3) pull these UI assets into ORES and *maybe* do a deploy. :)
[16:21:37] \o/
[16:21:53] brb gotta do some morning things.
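[Editor's note: a hedged sketch of the regularization knobs pipivoj was looking for. scikit-neuralnetwork exposes regularization via keyword arguments on its estimators; the parameter names `regularize` and `weight_decay` below are assumptions based on the sknn docs of the time and should be checked against your installed version. The data is synthetic, purely to make the example runnable.]

```python
# A minimal sketch of training an sknn classifier with L2 regularization.
# `regularize` and `weight_decay` are assumed parameter names; verify them
# against the scikit-neuralnetwork docs for your version.
import numpy as np
from sknn.mlp import Classifier, Layer

# Synthetic two-class data, just to make the example self-contained.
X = np.random.uniform(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

nn = Classifier(
    layers=[
        Layer("Rectifier", units=50),  # hidden layer
        Layer("Softmax"),              # output layer
    ],
    learning_rate=0.02,
    n_iter=25,
    regularize="L2",      # penalize large weights...
    weight_decay=0.0001,  # ...with this strength
)
nn.fit(X, y)
print(nn.score(X, y))  # sklearn-style mean accuracy on the training data
```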
[16:40:03] o/ Marija_
[16:44:20] (back btw)
[16:44:30] OK. On to enwiki!
[17:10:31] halfak: did you get my email about JCPOA?
[17:11:05] Yes. Not sure what we'll be able to do in the short term though.
[17:11:15] But \o/
[17:12:49] \o/
[17:20:54] halfak: One of the users on fa.wp is suggesting that we use a more intuitive number as the percent of vandalism
[17:21:09] The probability?
[17:21:24] That's a whole big ball of math. It's non-trivial to solve AFAICT.
[17:21:33] yeah
[17:21:39] The numbers are really good for our consumption, but users without an AI background can't understand them
[17:21:48] Really, I think that what they want are precision/recall settings.
[17:22:02] I told him exactly this
[17:22:03] Amir1, we could include that with the probability estimate.
[17:22:20] E.g. when probability is at X%, precision is at Y% and recall is at Z%
[17:22:32] Then a user can set their threshold based on those instead.
[17:22:44] We'd need to learn the precision and recall from the test data.
[17:22:50] we can give several f-scores
[17:22:54] And we'd need to do some interpolation.
[17:23:01] How would one use the f-score?
[17:23:35] the f-score gives you an optimal threshold
[17:23:47] let's say you want to run a bot
[17:24:12] that reverts every edit with more than a certain percent
[17:24:31] you should use the f-0.25 or f-0.125 score
[17:24:40] because precision is way more important
[17:25:14] Amir1, why not just use the precision measure directly?
[17:25:22] but when you want to check recent changes, the best threshold is the f-2 score
[17:25:52] Seems to me that the f-score is good for model optimization, but it doesn't give the user meaningful information
[17:26:06] because precision can be misleading with skewed classes (such as vandalism)
[17:27:00] let me find some material for you
[17:27:11] Amir1, yes. This is a problem for model optimization.
[17:27:43] E.g. we might use the f-score to choose hyperparameters so that we don't accidentally make a model that predicts false all of the time (high precision, low recall)
[17:27:54] But I don't see why I'd rather use that as a user.
[17:28:15] I.e. what makes the f-score threshold intuitive?
[17:31:50] because it gives you a threshold that is optimal for a certain kind of application
[17:32:50] e.g. if you want to get the best recall when precision is high, it's better to use the f-0.5 score (or 0.25, depending on how high you want the precision)
[17:33:17] Using this I was able to get really accurate data with high recall for Kian
[17:33:20] Amir1, the definition of optimal for an f-score assumes equal weight applied to false positives and false negatives. These weights are not even, and they're context specific.
[17:33:31] and add tons of statements
[17:33:51] that's the definition of the f-1 score
[17:33:57] Yes.
[17:34:18] when we want more precision, we use f-0.5 or f-0.25 or f-0.125
[17:34:27] Amir1, but what makes those intuitive?
[17:34:38] Why not just make your own decisions about precision and recall?
[17:36:13] because it goes from 0.125 up to 3, but our classifier says 90% probability of vandalism, while in the Wikidata sample there was no vandalism under 93%
[17:36:44] and it's efficient in terms of choosing precision
[17:37:19] if we want high precision, we can get the best recall possible for that accuracy
[17:37:59] and I was saying f-0.5 gives more weight to precision
[17:38:56] let me correct myself: "if we want high precision, we can get the best recall possible for that range of precision"
[17:39:10] "Range of precision"? We don't really have that.
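[Editor's note: the F-beta scores being debated here have a simple closed form: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. beta < 1 weights precision more heavily; beta > 1 weights recall. A small sketch with made-up numbers for illustration:]

```python
# F-beta: beta < 1 favors precision (anti-vandal bot), beta > 1 favors
# recall (recent-changes patrolling). The numbers are illustrative only.
def f_beta(precision, recall, beta):
    if precision == 0 or recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# The same high-precision, low-recall operating point scores very
# differently depending on the weighting:
print(f_beta(0.95, 0.30, 0.25))  # ~0.84 -- looks great for a revert bot
print(f_beta(0.95, 0.30, 2.0))   # ~0.35 -- poor for patrolling; too much missed damage
```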
[17:39:15] We have a precision threshold.
[17:39:25] You can't make more than n% false positives.
[17:40:17] Same thing with recall. You want to catch at least x% of damage.
[17:40:37] These are the anti-vandal bot and patroller thresholds.
[17:44:10] Amir1, let's say that we report the f-0.5 value of a classifier; do we leave it to our user to figure out which threshold optimizes this statistic?
[17:44:31] Or would we publish the thresholds that optimize at f-1, f-0.5, ...?
[17:45:50] we need to publish those thresholds
[17:47:44] OK. I'm down with that. We should make generating those thresholds part of testing the model.
[17:47:55] We'll need to be careful that we don't test on balanced data.
[17:48:04] It'll need to be a true random sample.
[17:48:05] yeah
[17:55:00] halfak: How do you want to implement that?
[17:55:05] in editquality?
[17:55:44] I'm thinking we can simply do it for the reverted models, but for damaging it's harder
[17:55:57] except if we implement it in ORES
[17:56:01] *revscoring
[17:56:40] Amir1, revscoring, in all classifiers.
[17:57:00] hmm
[17:57:02] kk
[18:52:44] halfak: I have a proposal; let's add an output, something like: ores.wmflabs.org/scores/enwiki/damaging/test
[18:53:25] and it contains something like this: {"true": [0.1, 0.2, ...], "false": [0.3, 0.4, 0.5]}
[18:53:45] I can build tons and tons of analysis tools based on this output
[18:54:39] I can draw ROC curves, get f-scores and their related thresholds, precision at a certain recall and vice versa
[18:56:38] Amir1, +1
[18:56:47] We actually already have this stuff stored in the model.
[18:57:15] awesome
[18:57:30] so a PR should be on its way soon
[18:57:36] let me see what I can do
[19:17:45] OK. I've ~doubled the speed of regex processing.
[19:17:55] Still not as fast as I want, but it's not too bad.
[19:18:26] Turns out wildcard prefixes on regexes are the devil
[19:18:38] Backtracking is bad.
[19:34:44] \o/
[19:42:12] * halfak works on other filters.
[19:48:31] OK. I think we're OK. I just tested all this on a moderately sized page and I'm feeling much better about the speed that we can get.
[19:48:41] There are still some optimizations that can be done.
[19:49:04] E.g. when filtering, one can generate both the included stuff and the excluded stuff in one pass.
[19:49:13] No need to take two passes.
[19:51:11] Amir1, I would love to bounce some ideas off of you re. docs.
[19:51:24] So, pythonhosted.org is designed to host a single version of the documentation.
[19:51:58] But we're about to have 0.7 and 1.0 available in parallel while people move code over to 1.0
[19:52:07] people == me and a few others
[19:52:30] We could just upload the 1.0 docs and call it good.
[19:52:47] Or we could look into something like readthedocs -- that supports multiple versions of documentation.
[19:54:38] I'm okay with having both if possible
[19:55:06] but if only one is possible, readthedocs sounds much better
[19:55:15] Amir1, so, I've looked into readthedocs. It works like travis and wants to build our env in a virtual machine.
[19:55:15] (sorry, I was afk for dinner)
[19:55:44] Still working on dinner? I can wait.
[20:03:38] When you get back... the concern I have with readthedocs is that it seems to require a completely different strategy for addressing missing dependencies on the build machine.
[20:04:08] They expect you to remove packages that are missing from your requirements.txt and create mocks of them in your doc/conf.py.
[20:04:25] This is ridiculous.
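[Editor's note: for context, the conf.py mocking halfak is complaining about looks roughly like the sketch below. This is a generic illustration of the approach RTD suggested at the time, not revscoring's actual conf.py; the module names listed are hypothetical examples.]

```python
# doc/conf.py -- a hedged sketch of mocking unbuildable C dependencies so
# Sphinx can import the package on a machine where they won't compile.
# The module names here are hypothetical, not revscoring's real list.
import sys
from unittest import mock

MOCK_MODULES = ["numpy", "scipy", "sklearn"]  # e.g. C-backed deps
for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = mock.MagicMock()
```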
It seems there's no way to create a configuration specific to readthedocs.
[20:04:35] And I can't find an option to build and upload manually.
[20:07:15] a noob question: isn't rtd hosted on github?
[20:07:45] https://github.com/rtfd/readthedocs.org ?
[20:09:02] no, I meant when one "uploads" his docs, can you access them via github?
[20:09:13] Oh. That I don't know about.
[20:10:20] on https://scikit-neuralnetwork.readthedocs.org/en/latest/module_mlp.html you have an "Edit on Github" link in the upper right, which leads to https://github.com/aigamedev/scikit-neuralnetwork/blob/master/docs/module_mlp.rst
[20:10:55] pipivoj, oh! Yeah. So you can configure your sphinx to reference github.
[20:11:03] That rst is actually part of the project code.
[20:11:06] We have those too.
[20:11:21] ah, ok
[20:11:21] E.g. https://github.com/wiki-ai/revscoring/blob/master/doc/index.rst
[20:12:20] * halfak waits for ggplot2 to build
[20:12:33] This is what I get for installing the latest R
[20:13:43] But then you should be able to edit this rst and have the rtd docs for your project, right?
[20:14:34] pipivoj, yeah. So RTD integrates with github: as soon as you push a change, RTD will build a new set of docs.
[20:15:38] * halfak just sneezed himself a sore neck
[20:15:41] grumble grumble
[20:16:16] But then you could push a change for 0.7 and have your "new" version for the old version, or am I missing something?
[20:16:59] ^ I don't understand what you are saying.
[20:17:08] Like a fork, IIUC this hullabaloo
[20:17:54] You make a new fork of the 1.0 version, which would be the 0.7 version, and you'd have these rtd docs already
[20:18:09] I mean for the 0.7 version.
[20:18:42] I'm not sure I understand the problem: you want to have the docs for two versions, 0.7 and 1.0?
[20:18:53] Yes. Here. Let me get an example.
[20:19:05] back for real now
[20:19:10] reading up
[20:19:31] See how there are multiple versions available at http://scikit-learn.org/stable/documentation.html?
[20:19:51] I'd like to have such a list for our docs.
[20:20:09] I'd like to keep "0.7" and "1.0" now
[20:20:20] Later, we'll likely add "1.1", "1.2", etc.
[20:20:32] So that, if you decided to stick with an old version, you can still get docs for it.
[20:20:48] This is probably more than we need right now.
[20:20:57] We're the primary (nearly only) users of revscoring right now.
[20:21:19] So maybe let's just use pythonhosted, update to 1.0, and leave moving to RTD as a task.
[20:21:24] But are those docs generated automatically from source, or do you have the ability to edit them yourself?
[20:21:43] pipivoj, they are generated from a bunch of different documentation points in the source code.
[20:21:52] reading
[20:21:54] I'm sure you are familiar with javadoc, right?
[20:21:58] pipivoj, ^
[20:22:07] sphinx ~= javadoc for python
[20:22:10] Not entirely, but I understand the concept
[20:22:29] Or I believe I do. :(
[20:23:05] You make comments in your code, and the docs are actually your comments - sth like that?
[20:23:14] Yeah. Exactly.
[20:23:32] There are a couple of places where those comments aren't discovered correctly, so I had to manually copy them over.
[20:23:45] This was a substantial amount of my complaining about sphinx over the last week.
[20:24:04] But mostly, it works as expected.
[20:24:08] Ah, that's the thing. Now I understand the problem
[20:24:22] * halfak is a sphinx liability because he knows too much about python modules.
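[Editor's note: to make the javadoc analogy concrete: with Sphinx's autodoc, a plain Python docstring like the one below becomes the rendered documentation, pulled in by a directive such as `.. automodule::` in one of the hand-written rst files. The function here is a hypothetical example, not actual revscoring code.]

```python
# A hedged sketch of docstring-driven docs: Sphinx's autodoc imports the
# module and renders these docstrings as HTML. Hypothetical example code.
def score(rev_id):
    """
    Scores a single revision.

    :Parameters:
        rev_id : `int`
            The ID of the revision to score
    :Returns:
        A `dict` mapping model names to score documents
    """
    raise NotImplementedError()
```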
[20:24:37] let me do some research on rtd
[20:24:38] You are then worried about that "not mostly" part?
[20:24:51] There are a few interesting things that we do in revscoring that make sense for python, but they are uncommon and sphinx goes crazy on them.
[20:25:12] pipivoj, mostly, I'm worried about getting the damn thing to build in RTD's environment.
[20:25:21] sorry, I went away without notice
[20:25:40] It's a sneaky, hard problem because we depend on several C libs that we can't compile on the build server
[20:25:44] no worries Amir1 :)
[20:25:57] pipivoj is helping me rubber duck this
[20:26:03] [[:en:Rubber duck debugging]]
[20:26:03] [4] https://meta.wikimedia.org/wiki/:en:Rubber_duck_debugging
[20:26:59] numpy uses the C libraries
[20:27:02] I actually have a rubber duck on my desk BTW
[20:27:09] * pipivoj is faster than Asimov :)
[20:27:29] :D
[20:27:37] Hey AsimovBot. {{done}}
[20:27:43] How efficient, halfak!
[20:27:44] {{done}}
[20:27:46] Here
[20:27:48] *there
[20:27:54] How efficient, halfak!
[20:27:55] :)
[20:30:20] haha
[20:30:26] that's amazing
[20:31:24] I remove the rubber duck for all of the photos I take for the media stuff. Maybe I should leave it there.
[20:31:42] Didn't get the smalltalk between you and the bot, but that's a side issue: I'm thinking, if the rtd docs are made automatically from comments in code, then you could intervene by just editing your comments and get the docs for the old and/or new version?
[20:32:53] pipivoj, the code structure changed quite dramatically, regretfully. So doing the multi-version thing manually would be intractable.
[20:33:00] It's the automatic building of the docs that is not working, right?
[20:33:11] The nice thing that RTD does is that it will generate the docs for a particular time window.
[20:33:15] Yeah.
[20:33:46] How about editing the rst files?
[20:34:14] Ah, no, they are made automatically, yeah.
[20:34:26] The RST files aren't generated. I write those myself.
[20:35:13] And they influence the docs on rtd?
[20:36:01] Yes
[20:36:52] So changing the rst of the older version is not possible? The 0.7?
[20:37:44] When I say influence, do they contain much of the documentation?
[20:38:31] it's strange that rtd needs to build them in order to make html files
[20:38:56] Amir1, for pythonhosted, we build locally.
[20:39:08] So, same as travis for CI
[20:39:16] RTD is like continuous documentation
[20:39:30] travis ci runs the code
[20:39:42] but we just get the documentation
[20:41:46] afk
[20:42:03] Amir1, RTD will run the code too.
[20:42:09] Sort of
[22:02:51] Done with the measurement study (at least the stuff for today). See https://meta.wikimedia.org/wiki/Research_talk:Measuring_edit_productivity/Work_log/2016-01-18
[22:39:24] * halfak runs tests on Persian Wikipedia feature extraction
[23:19:26] OK. New model tuning is running on ores-compute. I should have reports for all of the models by tomorrow.
[23:20:19] Here's the WIP for editquality: https://github.com/wiki-ai/editquality/pull/6
[23:20:39] I'll still need to remove a bunch of lines from the Makefile, so I tagged it as (WIP)
[23:20:42] o/
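[Editor's note: circling back to Amir1's proposed ores.wmflabs.org/scores/enwiki/damaging/test output: given a payload like {"true": [...], "false": [...]} of test-set probabilities, deriving the threshold that optimizes a given F-beta is only a few lines. A hedged sketch; the endpoint and payload shape are the proposal from this log, not a shipped API, and the scores below are made up.]

```python
# A sketch of deriving an optimal F-beta threshold from test-set scores
# shaped like the proposed {"true": [...], "false": [...]} output.
def best_threshold(scores, beta):
    """Return (threshold, f_beta) maximizing F-beta over observed scores."""
    true_scores = sorted(scores["true"])    # scores of actually-damaging edits
    false_scores = sorted(scores["false"])  # scores of good edits
    best = (0.0, 0.0)
    for threshold in sorted(true_scores + false_scores):
        tp = sum(1 for s in true_scores if s >= threshold)
        fp = sum(1 for s in false_scores if s >= threshold)
        fn = len(true_scores) - tp
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        b2 = beta ** 2
        f = (1 + b2) * precision * recall / (b2 * precision + recall)
        if f > best[1]:
            best = (threshold, f)
    return best

# Made-up test scores, just to exercise the function:
scores = {"true": [0.7, 0.9, 0.95, 0.99], "false": [0.1, 0.2, 0.6, 0.8]}
print(best_threshold(scores, beta=0.5))  # (0.9, 0.9375): a precision-leaning threshold
```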