[00:29:02] I'm off. Have a good one!
[00:29:03] o/
[02:37:57] halfak: heya
[02:40:11] So I'm looking at line 78 of scorer_model.py
[02:40:50] should this be "return [feature.validate(value) for feature, value in zip(self.features, feature_values)]" ?
[03:33:49] halfak: sorry had to change location
[03:34:26] aetilley: I think he's gone for the night
[03:34:47] oh
[03:35:14] humor me, does a + by someone's name typically mean they are present?
[03:35:31] or logged on anyway?
[03:36:16] I've no idea
[03:36:19] ok
[03:36:20] I think it just means 'voiced'
[03:36:22] and it varies...
[03:36:26] depending on channel
[03:36:38] aetilley: there isn't really a way to see if someone is present or not via IRC unfortunately
[03:37:41] I see.
[03:37:44] Well thank you.
[07:49:32] (CR) Awight: "ping" [extensions/ORES] - https://gerrit.wikimedia.org/r/247185 (https://phabricator.wikimedia.org/T112856) (owner: Awight)
[07:49:39] (PS8) Awight: Flag reverted risk rows using the recentChangesFlag [extensions/ORES] - https://gerrit.wikimedia.org/r/247185 (https://phabricator.wikimedia.org/T112856)
[08:30:54] (PS9) Awight: Flag reverted risk rows using the recentChangesFlag [extensions/ORES] - https://gerrit.wikimedia.org/r/247185 (https://phabricator.wikimedia.org/T112856)
[08:47:18] (PS10) Awight: Flag reverted risk rows using the recentChangesFlag [extensions/ORES] - https://gerrit.wikimedia.org/r/247185 (https://phabricator.wikimedia.org/T112856)
[08:47:21] (PS1) Awight: Actually implement the ORES RC filter [extensions/ORES] - https://gerrit.wikimedia.org/r/256641 (https://phabricator.wikimedia.org/T112856)
[08:57:31] (PS2) Awight: Actually implement the ORES RC filter [extensions/ORES] - https://gerrit.wikimedia.org/r/256641 (https://phabricator.wikimedia.org/T112856)
[12:42:16] hey halfak !
[12:42:18] what is the current place for reporting bugs on revscoring? phabricator or github?
[12:44:37] https://ores.wmflabs.org/scores/ptwiki/?models=goodfaith&revids=30256434 predicts "false", but has 54% probability for "true"
[12:44:48] isn't the threshold = 50%?
[13:51:09] Hey Helder. That's a characteristic of Platt scaling.
[13:55:12] Named after Bob Platt, the founding Vice President of Wikimedia DC.
[13:56:56] Really?
[13:57:19] "The method was invented by John Platt in the context of support vector machines,[1] replacing an earlier method by Vapnik, but can be applied to other classification models."
[13:57:20] See http://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities
[13:57:24] Helder, ^
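A minimal illustration of the Platt-scaling behavior referenced above: scikit-learn's SVC bases predict() on the sign of the decision function, while predict_proba() comes from a separately fitted sigmoid (Platt scaling), so the two can disagree near the boundary. That is how a goodfaith model can report P(true) = 0.54 while still predicting "false". This sketch uses synthetic data and illustrative settings, not the actual ORES model:

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic two-class data; purely illustrative.
    rng = np.random.RandomState(0)
    X = rng.randn(300, 2)
    y = (X[:, 0] + 0.5 * rng.randn(300)) > 0

    # probability=True fits Platt's sigmoid on top of the SVM's decision values.
    clf = SVC(kernel="rbf", probability=True).fit(X, y)
    p_true = clf.predict_proba(X)[:, 1]   # calibrated P(class == True)
    pred = clf.predict(X)                 # thresholds the decision function

    # Cases like the ptwiki revision above: P(true) > 0.5 yet predicted False.
    print(np.sum((p_true > 0.5) & ~pred), "near-boundary disagreements")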
[14:19:27] halfak, how much effort has been made to classify edits as "improvement," "major contribution," etc.?
[14:19:44] harej, none that I know of.
[14:19:49] BUT
[14:20:05] We're working on an edit type campaign right now that could easily include this type of question.
[14:20:16] https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_types
[14:20:21] See the talk page for updates.
[14:20:27] We're just finishing the classification form.
[14:22:06] As part of making the WikiProject Directory more useful, I'd like to discriminate between types of edits.
[14:22:33] WikiProject Directory is fed a lot of noise. AWB-edit 100,000 pages and suddenly you're a Super Active Guy in 30 different subjects. Makes no sense except to a mathematical formula.
[14:22:48] Also, pie in the sky, but I'd like to be able to identify new sections posted to talk pages.
[14:23:15] Right now we rely on the edit summary saying "new section" (an edit summary generated by the software), but people who don't want us to have nice things will instead edit an existing section and append their post to the bottom.
[14:25:29] harej, this sounds tractable.
[14:25:38] I'm working with an edit_productivity dataset right now.
[14:25:40] :)
[14:26:05] See https://meta.wikimedia.org/wiki/Research:Measuring_edit_productivity
[14:26:17] Some recent analysis https://meta.wikimedia.org/wiki/Research_talk:Measuring_edit_productivity/Work_log/2015-12-02
[14:26:39] And how effective do you suppose it will be at differentiating between autowikiheads and Ms. Actually Interested in the Subject?
[14:26:43] Contribute about 20% of the content that *sticks* in Wikipedia.
[14:26:48] ^ Anons
[14:27:02] harej, not sure. Would be fun to check.
[14:29:16] anyways, to the extent you are Design™ing stuff for WikiProjects, Isarra should be able to help.
[14:29:35] Yeah. Did you see the task fly by?
[14:29:54] Ahh yes. I see your comment :)
[17:37:09] (CR) He7d3r: Actually implement the ORES RC filter (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/256641 (https://phabricator.wikimedia.org/T112856) (owner: Awight)
[22:30:19] halfak: You around?
[22:32:11] Hey! Yeah.
[22:32:14] What's up?
[22:34:55] Trying to do basic imports of revscoring models in ipython and I'm having a hard time.
[22:35:05] For instance....
[22:35:12] oooo, another ipython user :)
[22:35:16] * yuvipanda files info away for future reference
[22:35:18] :D
[22:35:39] Well I want to be able to play with revscoring functions inside an interpreter
[22:35:46] and I'm having a hell of a time.
[22:35:56] halfak: I could install revscoring into paws
[22:36:04] aetilley: have I shown you paws? https://tools.wmflabs.org/paws
[22:36:23] nope. lemme check it out.
[22:36:48] aetilley, if you show me what error, then I should be able to help.
[22:37:17] I just want to import feature.diff
[22:37:23] halfak: if I do 'pip install revscoring' - is that useful by itself without needing model files?
[22:37:40] yuvipanda, yes
[22:37:50] So first let me try this from the inner revscoring directory
[22:38:08] ok!
[22:38:09] * yuvipanda does
[22:38:10] aetilley, you should be in the main directory that contains "revscoring", and "doc"
[22:38:29] ok
[22:38:32] yuvipanda, will be helpful if we have some enchant dictionaries installed.
[22:38:42] halfak: hmm true
[22:39:00] import revscoring.feature
[22:39:03] results in
[22:39:07] oh dear, I forgot pip install revscoring does numpy stuff.
[22:39:27] no module by that name
[22:39:54] "revscoring" is not a package
[22:40:34] This is from the main directory that you mentioned.
[22:40:53] Can you give me a pastebin
[22:40:54] ?
[22:41:06] oh! "revscoring.features"
[22:41:08] plural
[22:43:01] interesting
[22:43:13] But for some reason it wasn't autocompleting
[22:43:28] (either plural or singular)
[22:43:40] Thank you.
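For the record, the working import from the exchange above. Module layout is as described in this conversation and may differ between revscoring versions:

    # The stumbling block above: the package name is plural.
    import revscoring.features            # works: "features", plural
    from revscoring.features import diff  # the module aetilley was after
    # import revscoring.feature           # fails: no module by that name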
[22:44:08] This week has been painful.
[22:44:59] halfak: Did you get my irc message from last night?
[22:45:00] :( Why? What happened?
[22:45:31] I didn't. But I see it in the scrollback now.
[22:45:57] ok, maybe a bit of an exaggeration. I'm trying to give you new features while simultaneously trying to understand how the revscoring code works.
[22:46:18] I have some pseudocode for adding features, but I don't know how it's going to be implemented.
[22:46:42] Basically you currently have 46 features and if we did it my way we would have 46 + 50,000 features
[22:47:03] * halfak whistles.
[22:47:10] So yeah, it looks like you found a bug!
[22:47:11] Unless we can somehow encapsulate all that into a handful of features.
[22:47:20] I guess we aren't running that line of code anywhere.
[22:47:24] Well at least I'm not totally useless.
[22:47:40] aetilley, a neural network should be good for compressing features.
[22:47:51] I'm not sure how to apply it.
[22:47:56] We might also consider TFiDF
[22:48:53] Well even with TFiDF isn't the idea to have a feature for each word in the corpus?
[22:48:59] in *a* corpus
[22:49:24] aetilley, we'd use TFiDF to learn weights and then drop every word below a certain weight
[22:49:57] That would remove all words with low information values.
[22:51:43] So what would the features be in that case?
[22:52:11] Do we have some global corpus?
[22:53:04] (Some large body of words that we focus on?)
[22:53:31] aetilley, I suppose that if we can get a large set of labeled data (e.g. in the 'revert' case) then we can TFiDF like Amir has been doing.
[22:56:14] You mean in his Kian (which is for Wikidata I believe)
[22:57:38] Nope. In the BWDS system that generates the proposed badwords lists.
[22:57:40] * halfak gets example
[22:57:55] https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/it
[22:58:28] "Generated list" contains the words that are most common to reverted edits, but not non-reverted edits.
[22:59:05] * yuvipanda created https://phabricator.wikimedia.org/T120317?workflow=create for adding revscoring to PAWS
[23:03:15] I didn't know this was already being worked on.
[23:05:30] Wouldn't this fall under the scope of my phab card?
[23:05:39] No matter I guess.
[23:06:36] Probably yes. We've been manually bag-of-wordsing like this for a while.
[23:06:47] But all the words get lumped together into a single feature.
[23:07:10] So we might be able to take advantage of a similar strategy to do better.
[23:08:19] Ok, so let me see if I understand:
[23:08:43] You want an approach that is more fine-grained than simply "number/ratio-bw-added"
[23:08:57] but less fine-grained than a full bag of words.
[23:09:25] You want to find a way of encoding the information in a full bag of words into just a handful of features.
[23:09:50] aetilley, not totally sure. Was just thinking of a way to keep the feature list manageable. We might not *need* to keep the feature list manageable.
[23:10:39] Currently I have something like the following
[23:12:47] http://pastebin.com/Qn5Lts88
[23:13:41] I wrote "revision.text" because I don't know how to actually access the raw text of a revision
[23:14:28] The method in feature.diff of being able to refer to the current revision by "revision" caught me off guard, but I'm sure I'll understand it with a little more time.
[23:15:14] you probably want revscoring.languages.english.revision.words
[23:15:49] or revscoring.languages.english.revision.content_words
[23:16:38] Looks solid to me otherwise.
[23:17:06] I'd like to count word frequencies for other reasons anyway :)
[23:17:46] See https://github.com/wiki-ai/revscoring/issues/213
[23:20:39] We can use word frequencies to get basic information theoretic measures together too.
[23:20:52] E.g. did this edit add words that don't exist elsewhere in the article.
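TF-IDF, discussed above, is one such frequency-based information measure. A hedged sketch of the weight-then-prune idea ("learn weights and then drop every word below a certain weight"), in the spirit of the generated badwords lists. The corpus, labels, cutoff, and list size are illustrative placeholders, not the BWDS implementation:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Placeholder corpus: the text of each edit, with a reverted/not label.
    docs = ["text of a reverted edit", "text of a kept edit"]
    reverted = np.array([True, False])

    vec = TfidfVectorizer()
    weights = vec.fit_transform(docs).toarray()
    terms = np.array(vec.get_feature_names_out())  # get_feature_names() in older sklearn

    # Terms weighted high in reverted edits but low in non-reverted ones.
    gap = weights[reverted].mean(axis=0) - weights[~reverted].mean(axis=0)
    candidates = terms[np.argsort(gap)[::-1][:20]]  # top-20 candidate badwords
    print(candidates)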
[23:21:38] So... I have a problem with the model tuning script.
[23:22:16] Sometimes a model training takes forever -- maybe even will never finish. Yet, because sklearn farms this out to a C lib, we can't kill it with a timeout signal.
[23:22:24] Arg!
[23:23:06] Works great for Random forest, but not SVC.
[23:23:10] SVC is where we have the problem.
[23:23:14] Hm
[23:24:28] Where do you suspect the problem lies?
[23:24:56] Could be the C lib. I'm digging through search results, but it seems that most of the time when people complain about this, they are working with 500k obs.
[23:25:02] We only have ~20k
[23:26:17] Have you tried running sklearn svm on the data outside of the revscoring context?
[23:27:04] Yeah. That's where the model tuner operates.
[23:27:06] to determine whether the alg is just slow on the data or if the problem comes from our code?
[23:27:14] Just easier to not have to wrap up the estimator in a scorer model.
[23:27:54] Right now, we're on the 46th hour of training :/
[23:28:06] Looks like "linear" SVCs suffer the most.
[23:28:10] that's very surprising.
[23:28:16] I don't really have an intuition for the time taken by such things.
[23:28:42] Oh! Looks like it could be a convergence problem.
[23:28:47] So you've tried other kernels?
[23:28:48] We can set a max iteration param.
[23:28:54] Yeah. Mostly just RBF.
[23:29:04] h
[23:29:05] ah
[23:29:08] I haven't gotten the sense that sigmoid is worthwhile.
[23:29:15] But we could try it.
[23:29:50] Looks like the linear model may work better when we scale.
[23:29:55] Boo.
[23:30:35] So what does sigmoid mean in the context of SVM? I only know sigmoids to be used in regression/NNs
[23:30:51] Is there a sigmoid-like kernel?
[23:31:15] https://gfycat.com/UnitedImpureAlaskajingle
[23:31:25] aetilley, apparently.
[23:31:40] See http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
[23:32:19] Apparently indeed.
[23:33:51] Correct me if I'm wrong, but long training times aren't quite so bad, as long as the trained classifier can score quickly?
[23:34:10] I mean, obviously an infinite training time is bad...
[23:34:38] aetilley, well, they are bad when we are trying to optimize our hyperparameters because then the whole param test is hung on a couple models.
[23:34:48] And 2 days is way too much time.
[23:35:21] The funny thing is that this is happening everywhere I turn and it's happening with the "linear" kernel -- which the internet thinks is the best for *not* taking forever.
[23:36:06] Does the internet give an estimate for LinearSVM on ~20k * 46 datasets?
[23:36:31] "Quickest" is relative after all
[23:37:02] aetilley, sure, but I'm running the other kernels on the same data for comparison.
[23:42:11] So just to go back to NLP for a second....
[23:42:38] Would it be crazy to take the vector at the end of the pseudocode I just pasted you.....
[23:42:58] Plug it into its own classifier.....
[23:43:13] and use the output of that as a feature_value to the main classifier?
[23:43:58] We might lose a lot of information. Hm.
[23:44:49] Maybe I'm grasping at straws. Brb.
[23:46:59] Not totally crazy.
[23:47:01] :)
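The "plug it into its own classifier" idea above is essentially two-stage stacking. A minimal sketch with random placeholder data standing in for the real feature matrices and revert labels; in practice the first-stage scores should come from cross-validation (e.g. cross_val_predict) so the second stage doesn't train on leaked labels:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.RandomState(0)
    X_bow = rng.poisson(0.1, size=(200, 1000))  # bag-of-words counts (placeholder)
    X_main = rng.randn(200, 46)                 # the ~46 existing features
    y = rng.randint(0, 2, 200)                  # reverted / not (placeholder)

    # Stage 1: compress thousands of word counts into one probability.
    bow_clf = LogisticRegression(max_iter=1000).fit(X_bow, y)
    bow_score = bow_clf.predict_proba(X_bow)[:, [1]]

    # Stage 2: feed that score to the main classifier as one extra feature.
    main_clf = RandomForestClassifier().fit(np.hstack([X_main, bow_score]), y)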
[23:48:55] So to what extent is Amir's TFiDF thing that you sent me currently being incorporated into revscoring?
[23:49:30] Or is it currently just a list of indicative words?
[23:49:32] aetilley, no integration. He does independent runs to get the high-weight words and then we clean up the lists using human curation.
[23:49:47] I then take those words and incorporate them into our language utilities.
[23:51:00] Is this where the Badwords regex comes from?
[23:51:06] Yes
[23:51:10] AH
[23:51:46] Ok, so the situation isn't quite as crude as I assumed.
[23:52:11] I thought those were just hand-picked by speakers of the language
[23:53:22] So...
[23:53:31] What would you say our highest priority is right now?
[23:54:13] See my last post here: https://phabricator.wikimedia.org/T120138
[23:54:27] TL;DR: "explore new strategies for improving our signal in other ways."
[23:54:44] "other" than user.is_anon and user.age
[23:57:05] So, I think that looking at bag-of-words is a good idea.
[23:57:17] We could tap ellery too as I think he'll have some insights that could direct us.
[23:58:21] Hyperparameter optimization is another way we can get some more fitness.
[23:58:32] So that's what I have been digging into over the last few days.
[23:58:41] Well... I suppose it's been a week. :/
[23:58:58] If I could get this SVC model to *finish* or kill itself if it runs too long, that would be great :)
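One possible answer to that closing wish, sketched under the constraint noted earlier (the fit happens inside a C library, so a Python timeout signal can't interrupt it): run the fit in a child process and abandon it after a deadline. The estimator and timeout are illustrative; capping iterations with SVC(max_iter=...) (the default -1 means unlimited) is the lighter-weight "max iteration param" alternative mentioned above:

    import multiprocessing as mp
    import queue
    from sklearn.svm import SVC

    def _fit(X, y, out):
        # Runs in the child process; blocks for as long as libsvm takes.
        out.put(SVC(kernel="linear").fit(X, y))

    def fit_with_timeout(X, y, seconds=3600):
        out = mp.Queue()
        proc = mp.Process(target=_fit, args=(X, y, out))
        proc.start()
        try:
            model = out.get(timeout=seconds)  # waits for the fit to finish
        except queue.Empty:
            model = None                      # timed out: let the tuner move on
        finally:
            if proc.is_alive():
                proc.terminate()              # hard-kill the hung C-level fit
            proc.join()
        return model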