[13:13:45] hey halfak! I've been analyzing a few "last edits" of articles, sorted by their scores, and marking them as true/false positives:
[13:13:48] https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Projetos/AntiVandalismo/Edi%C3%A7%C3%B5es_possivelmente_prejudiciais
[13:13:51] just mentioning in case that is useful for something else
[13:34:34] o/ halfak
[13:34:41] I just sent you the results
[14:43:27] o/ ToAruShiroiNeko
[14:43:32] working on the progress reports?
[14:44:01] This link will be handy for reporting last week's progress: https://www.wikidata.org/wiki/Wikidata:Third_Birthday/Presents/ORES
[14:44:43] hello
[14:44:45] yes
[14:45:12] Also, I'm just updating the Done list for the language stuff I did on Saturday. :)
[14:45:14] I have added a calendar event to hammer it into my head not to forget them
[14:45:41] the sad thing is I keep reminding myself of it the ENTIRE week. :p
[14:46:03] ha!
[14:46:11] I shall utilize that link nicely :)
[14:46:14] Alarms and human psychology are a weird combination
[14:46:20] yeah
[14:46:33] am I the master of Google Calendar or is it the master of me?
[14:50:39] Amir1, you around?
[14:50:52] hey, hey
[14:50:54] yeah
[14:50:56] :)
[14:51:12] So, I worked with aetilley a bit on Saturday.
[14:51:20] I have to go in a few minutes though
[14:51:29] great
[14:51:33] He finally tried scaling the input data and was able to get some reasonable cluster sizes :D
[14:51:43] So, I think we're good to go re. clustering now.
[14:51:46] :)
[14:52:15] great
[14:52:20] I did feature scaling too
[14:52:28] that's why I had reasonable sizes
[14:52:28] +1 Seems essential.
[14:52:55] Anyway, I won't keep you. Thanks for getting us the code. Hopefully we'll learn some fun bits from the clusters soon.
[14:53:18] I also want to try some manual experimentation.
[14:53:27] great
[14:53:31] E.g. what happens to AUC when we remove user.is_anon
[14:53:43] Tell me wherever or whenever you think I can help
[14:53:48] And what is the AUC of our current classifier when tested *only* on anon edits.
[14:53:52] Will do.
[14:54:02] this project sounds super exciting to me
[14:54:06] I think we'll want to talk through a testing automation strategy
[14:54:25] It'll take too long to do manually, so we'll want to set up a routine and let it run overnight.
[14:55:22] hmm
[14:55:28] let me think about it
[14:55:42] I think a card in Phabricator would be great
[14:56:06] I'll make one.
[15:00:39] https://phabricator.wikimedia.org/T117425
[15:06:47] halfak, when you have the time, do you think you can briefly walk me through the random forest implementation?
[15:07:27] ToAruShiroiNeko, not sure what you are looking for. We have a pretty minimal wrapper around sklearn's implementation.
[15:07:44] But sure, of course :)
[15:07:51] https://github.com/wiki-ai/revscoring/blob/master/revscoring/scorer_models/rf.py
[15:09:26] yes
[15:09:28] http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
[15:11:00] I remember you mentioning it was difficult to gather training data for article assessment
[15:11:22] Oh yes. Let me show you how we do that.
[15:11:28] I want to add getting ECOC to do what random forest does to my treadmill of things to do
[15:11:41] unless I push myself, it is clear to me I will never achieve that
[15:12:05] Well, we have training sets you can play with.
[15:12:23] That's probably the best place to start if you want to explore ECOC on real data.
[15:12:33] yeah I suppose
[15:13:31] See ores-compute.labs:/home/halfak/projects/wikiclass/datasets/enwiki.features_wp10.30k.tsv
[15:13:50] That's a balanced dataset of 5k observations per class
[15:13:56] The features are already extracted.
[15:14:16] See https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/feature_lists/enwiki.py
[15:14:19] For the feature list.
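For anyone following along, here is a minimal sketch of what experimenting with that pre-extracted dataset might look like using sklearn's RandomForestClassifier directly, rather than the revscoring wrapper linked above. The column layout (feature values followed by the wp10 label in the last column) is an assumption, as is running it with a local copy of the file.

    # Sketch only: assumes a TSV whose columns are extracted feature values,
    # with the class label (wp10 rating) in the last column.
    import csv

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split


    def parse(value):
        # Boolean features may be serialized as "True"/"False" in the TSV.
        if value == "True":
            return 1.0
        if value == "False":
            return 0.0
        return float(value)


    features, labels = [], []
    with open("enwiki.features_wp10.30k.tsv") as f:
        for row in csv.reader(f, delimiter="\t"):
            features.append([parse(value) for value in row[:-1]])  # feature values
            labels.append(row[-1])  # wp10 class (e.g. "Stub", "FA")

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))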
[15:27:58] trying to figure out git again
[15:28:45] really tired of deleting my work
[15:30:01] all I want is a sustainable fork that just works
[15:33:45] ToAruShiroiNeko, there are some good guides that take you through common issues. I'd recommend digging into those.
[15:34:01] I suppose that's what you are doing with "figure out git"
[15:34:51] it's always unworkable, broken beyond repair, despite me merely wishing to take whatever the server has -_-
[15:35:01] I already deleted the fork I have for, I think, the 5th time
[15:39:45] right, random forest
[15:41:59] ToAruShiroiNeko, gotta learn about the states you can get into with git and how to get out of 'em.
[15:43:00] ores-compute.labs:/home/halfak/projects/wikiclass/datasets/enwiki.features_wp10.30k.tsv gives me a permissions error
[15:44:19] You can't execute it
[15:44:25] Access it with `less` or something
[15:44:41] I just tried copying it
[15:44:46] Don't tell me "some permission error"
[15:44:59] Tell me the command and the actual error.
[15:45:23] eva@ores-compute:~/github/data$ cp /home/halfak/projects/wikiclass/datasets/enwiki.features_wp10.30k.tsv .
[15:45:23] cp: cannot stat ‘/home/halfak/projects/wikiclass/datasets/enwiki.features_wp10.30k.tsv’: Permission denied
[15:45:58] Try again
[15:46:13] wow
[15:46:15] magic :)
[15:46:16] Looks like there was a folder in the path without read access.
[16:00:25] okay
[16:00:44] now I need to figure out the syntax for random forest; to the documentation!
[16:31:03] Helder, just got a chance to look at your work-lists. Are the unlabeled edits all determined to be damaging?
[16:49:42] is there a place to report false positives with the scoring? https://en.wikipedia.org/w/index.php?title=User_talk:Harej&curid=1213293&diff=688706917&oldid=688519670 is 97% reverted
[16:52:59] halfak, yep! I used Quarry to get the most recent edits of 60000 articles, then I filtered them, removing any edit with a damaging score smaller than 70%
[16:53:43] Thanks Helder.
[16:53:51] legoktm, right now, we don't have an official place.
[16:54:11] We can't re-train the algorithm with false positives, but we can learn from them qualitatively.
[16:54:25] If someone were to set up a place to record false positives, that would be great.
[16:54:26] legoktm, I don't think so, although another user from ptwiki was reporting a few at
[16:54:27] https://meta.wikimedia.org/wiki/Research_talk:Revision_scoring_as_a_service#Misclassifications
[16:54:27] * halfak winks
[16:54:34] Looks good to me :)
[16:55:04] heh
[16:55:09] ok, I'll drop a link there
[16:56:26] legoktm, and in the process of reviewing the edits I mentioned to halfak, I already found ~60 false positives
[16:57:08] o.O
[16:57:52] *I --> we
[17:02:34] Helder, am I to conclude that in 60k observations, you only found 60 false positives?
[17:02:59] nope... we only reviewed a few (I would say something < 200)
[17:03:08] Oh yeah! Of course.
[17:03:15] Most would not be predicted "positive"
[17:03:41] each edit in the list was predicted as positive
[17:05:01] we even have a few whose damaging score is 100%
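A rough sketch of the kind of filtering Helder describes: scoring a batch of revision IDs against the damaging model and keeping anything at or above the 70% cutoff. The ORES scores endpoint and response shape used here are assumptions based on the public API, not the exact Quarry workflow he used, and the revision IDs are just placeholders.

    # Sketch: score revisions with the ORES damaging model and keep likely-damaging ones.
    import requests

    ORES_URL = "https://ores.wikimedia.org/v3/scores/ptwiki/"
    rev_ids = [42330892, 36323174, 36174773]  # placeholder revision IDs

    response = requests.get(
        ORES_URL,
        params={"models": "damaging", "revids": "|".join(map(str, rev_ids))},
        headers={"User-Agent": "worklist-filter-sketch"},
    )
    scores = response.json()["ptwiki"]["scores"]

    work_list = []
    for rev_id, models in scores.items():
        p_damaging = models["damaging"]["score"]["probability"]["true"]
        if p_damaging >= 0.70:  # the 70% threshold from the discussion
            work_list.append((rev_id, p_damaging))

    for rev_id, p in sorted(work_list, key=lambda item: item[1], reverse=True):
        print(rev_id, round(p, 3))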
[17:05:53] Helder, do you notice any trends?
[17:05:59] E.g. big content dumps that get flagged
[17:06:04] Or maybe anons/newcomers?
[17:07:57] one of them is a bot edit, another one is a sysop edit
[17:08:03] https://pt.wikipedia.org/w/index.php?diff=42330892
[17:08:07] https://pt.wikipedia.org/w/index.php?diff=36323174
[17:08:10] removing vandalism
[17:08:34] Interesting. So we catch vandalism removal as damaging. I wonder why that is.
[17:08:48] We might have a feature that behaves weirdly.
[17:09:21] the size of the edit delta for both edits is high
[17:09:25] (and negative)
[17:09:44] -48165 for one, and -32165 for the other edit
[17:12:28] That could be it.
[17:12:34] That's not a bad mistake to make then.
[17:12:50] Even reverts should be reviewed if they remove a massive amount of content.
[17:13:00] We could build some features for the edit comment though.
[17:13:16] E.g. matching "rvv", "vandalism", "Rollback" or "Undo".
[17:13:46] this one is odd: there are other subtle edits
[17:14:00] https://pt.wikipedia.org/w/index.php?diff=36174773
[17:14:27] and this one is obviously reverting vandalism: https://pt.wikipedia.org/w/index.php?diff=36345167
[17:14:43] hackerhackerhackerhackerhackerhackerhackerhackerhackerhackerhac...
[17:16:49] We should measure badwords in removed content.
[17:16:53] We need features for that.
[17:17:17] Hmm... We do have a feature for that.
[17:17:26] Oh! We don't have one for longest_token_removed.
[17:17:31] Just longest_token_added.
[17:17:51] We should really have a feature for "looks like a revert".
[17:17:58] yep
[17:21:41] halfak, why not just check if it is indeed a revert?
[17:21:51] yeah, does it match the hash of some older revision?
[17:22:20] Helder, we could, but it would be very time intensive to check for reverted status.
[17:22:20] and use that as a boolean feature
[17:22:29] It would double the feature extraction time at least.
[17:22:53] maybe just check for the simplest case of revert (which is a revert to the previous revision)?
[17:23:06] instead of searching the whole history of the page?
[17:27:18] Helder, could do, but we'll have to make an additional request. Could look into the performance considerations.
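To make the idea concrete, here is a rough sketch of both signals in the discussion above: a regex over the edit comment, and the simplest-case hash check, where the new revision's SHA1 matches its grandparent's (i.e. the edit undoes its parent). This is illustrative only; neither is an existing revscoring feature, and the extra MediaWiki API round-trips are exactly the performance cost being weighed here. The example revision ID is taken from one of the diffs linked above.

    # Sketch of "looks like a revert" signals, using the MediaWiki API via requests.
    import re

    import requests

    API_URL = "https://pt.wikipedia.org/w/api.php"
    COMMENT_RE = re.compile(r"\b(rvv?|revert|rollback|undo|undid|vandal)", re.IGNORECASE)


    def rev_info(rev_id):
        """Fetch the parent ID, SHA1 and comment of a revision."""
        response = requests.get(API_URL, params={
            "action": "query", "format": "json", "formatversion": "2",
            "prop": "revisions", "revids": rev_id,
            "rvprop": "ids|sha1|comment",
        })
        return response.json()["query"]["pages"][0]["revisions"][0]


    def looks_like_revert(rev_id):
        rev = rev_info(rev_id)
        comment_matches = bool(COMMENT_RE.search(rev.get("comment", "")))

        # Simplest case of a revert: the edit restores the revision before its
        # parent, so its SHA1 equals the grandparent's SHA1.
        identity_revert = False
        parent_id = rev.get("parentid")
        if parent_id:
            grandparent_id = rev_info(parent_id).get("parentid")
            if grandparent_id:
                identity_revert = rev["sha1"] == rev_info(grandparent_id)["sha1"]

        return comment_matches, identity_revert


    print(looks_like_revert(36345167))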
[19:46:11] halfak, BTW: this is not a dupe: https://meta.wikimedia.org/wiki/User:EpochFail/global.js?diff=14360633
[19:46:40] Whoops! You're right.
[19:47:18] Was showing someone how to install the gadget and made the mistake while I was distracted. :S
[19:47:18] I was planning to merge ScoredCategories into ScoredRevisions but I never got the time to do that... =/
[19:47:44] Speaking of getting time for things, have you considered attending the MW Dev. Summit?
[19:47:46] and I'm not sure if always showing a table on top of every category the user accesses is the best approach...
[19:48:13] I didn't think about that. When is it?
[19:48:22] (this year?)
[19:48:26] Early January
[19:48:30] So just barely next year
[19:48:44] what happens there?
[19:55:08] A lot of discussion around architectural issues.
[19:55:37] E.g. I want to talk to people about how they want to use ORES and what kind of tools they could build.
[19:55:45] But in part it's also a hackathon.
[19:55:58] Don't tell Quim I said that ;)
[19:58:32] halfak: before I go...
[19:58:37] halfak: did you hear back from the security people?
[19:58:52] Yes! :) I plan to go through their notes today.
[19:58:55] Nothing major.
[19:59:09] The primary thing they want is a way to specify your set of certs for mwapi
[19:59:16] And thus our APIExtractor that uses mwapi.
[19:59:34] They don't want to just rely on the defaults for `requests`
[19:59:35] ah, like, SSL?
[19:59:37] ah
[19:59:37] Yeah
[19:59:39] ok
[19:59:41] that's fair
[19:59:49] awesome :)
[20:00:03] * halfak suspects that we won points for clean and well-documented code.
[20:00:17] halfak: yeah I wouldn't be surprised!
[20:00:30] They left that as a comment on all the libs.
[20:00:36] :D
[20:00:37] nice!
[20:00:41] where is this? can I see it?
[20:00:51] It's on the task.
[20:00:57] * halfak has too many windows
[20:01:09] aaah ok
[20:01:58] https://phabricator.wikimedia.org/T110072
[20:02:48] "Why does yamlconf have the import_module function?"
[20:03:00] That's a good question. And I have a "meh" answer.
[20:03:05] :D
[20:04:29] * Helder recommends https://addons.mozilla.org/en-US/firefox/addon/max-tabs/ to halfak
[20:06:15] Oh no. I do not have a too-many-tabs problem.
[20:06:26] halfak: nice! (in general)
[20:06:30] I'll take a closer look later
[20:06:32] * YuviPanda disappears
[20:06:32] Quite the opposite. I just refuse to open a new tab until I close an old one.
[20:08:26] so, is it just with windows?
[20:10:24] I guess what I meant to say was that my mental stack was full and if I were to add something else to it, I'd forget what I was in the middle of.
[21:40:47] o/ TarLocesilion
[21:41:58] Any chance I can get you to help us build a badword/informal word list for plwiki?
[21:42:00] See https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/pl
[21:42:04] hi, halfak
[21:42:14] yep, I'm just on wikibreak.
[21:42:14] We've got an auto-generated list and we need a native speaker to filter it.
[21:42:28] Oh no worries then. Enjoy your break! :)
[21:43:35] it ends soon and I'll definitely write to you ;)
[21:43:40] thx
[21:43:44] OK cool. :)
[22:31:53] Hi Amir
[22:32:23] I noticed you got two clusters that don't sum to the total size of the group.
[22:33:39] You count 87 1s and 715 0s but there are 19863 samples in data2.tsv
[22:38:46] Indeed. I assumed it was a subset
[22:38:58] * halfak waits for Amir1 to comment
[23:55:41] * halfak tries desperately to make celery use its queue for testing
[23:57:54] FFS, the default queue is called "celery"
[23:57:56] ARG!
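On the Celery point at the end: the built-in default queue really is named "celery", so an isolated test run typically declares its own queue and starts a worker against it. A minimal sketch follows, with made-up module, queue, and broker names; the config key also differs by Celery version, as noted in the comment.

    # Sketch: route tasks to a dedicated queue instead of the default "celery" queue.
    from celery import Celery

    app = Celery("ores_testing", broker="redis://localhost:6379/0")

    # Celery 4+ setting; on Celery 3.x the equivalent is CELERY_DEFAULT_QUEUE.
    app.conf.task_default_queue = "ores_testing"


    @app.task
    def score_revision(rev_id):
        return {"rev_id": rev_id}


    if __name__ == "__main__":
        # Tasks now land on "ores_testing"; a worker started with
        #   celery -A this_module worker -Q ores_testing
        # consumes them without touching the default "celery" queue.
        score_revision.delay(12345)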