[16:18:40] research showcase starting in 12 minutes!
[16:30:54] showcase has started and the stream is live!
[17:20:04] isaacj, what does it mean to say that only 44% of vandals can be caught before first reversion if the model has 78% accuracy on the first edit? Also, what are the precision and recall?
[17:20:29] thanks halfak: will pass along
[17:20:32] <3
[17:30:13] isaacj, if there's time: how did you formalize the page protection problem? Are you predicting whether or not a page will be protected using its entire history? If so, this would include activity that occurred after the page protection event. Wouldn't that confound your outcome measure?
[17:32:14] Uh... that didn't answer the question at all.
[17:32:22] Thanks though, isaacj
[17:32:42] (also, it's cool if you say the question is from me)
[17:33:06] some more answers potentially in the paper: https://www.cs.umd.edu/~vs/pubs/KDD15-VEWS-Wikipedia-vandals.pdf
[17:33:36] it has TPR etc. in Table 2
[17:38:51] "Will this page ever need protection" is not a very useful model.
[17:39:00] "Does this page need protection right now" would be more useful.
[17:40:04] yeah, I think I still have some confusion around this but am hoping to find the paper, because it could potentially be a useful system
[17:40:25] I'm also really bummed about the use of "accuracy".
[17:40:45] I'm honestly surprised a paper passed review where accuracy was used as the key measure of a classifier.
[17:41:02] I can make a 95% accurate classifier by just predicting that nothing is vandalism/spam/etc.
[17:41:41] * halfak digs for an old blog post
[17:42:14] yeah, in this case it looks like their TPR and TNR are in line with each other, but I'd agree it's misleading. I think they created a balanced dataset too, so that presumably made accuracy more legitimate
[17:42:18] I'm curious from your perspective, halfak: what features can a scalable model use for page protection? Given that page protection detection would ideally have to be pretty quick and couldn't constantly be querying user/page history
[17:42:53] I think the temporal features would be useful.
[17:43:17] If we could dump ORES scores back in as a feature, you could rate an article by how hot it is getting -- using the damage detection model.
[17:43:25] It would be nice if we had reverts in the event stream.
[17:43:31] Those should all be doable.
[17:46:03] yeah, and presumably you only ever need to look at/cache 5-10 edits back to get a sense of whether a page might require protection
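To make halfak's 95%-accuracy point above concrete, here is a minimal sketch: on a class-imbalanced sample, a classifier that never predicts the positive class can score high accuracy while having zero precision and zero recall. The 5% vandalism rate is an assumption for illustration, not a figure from the discussion or the paper.

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Assume 5% of edits are vandalism (label 1) and a degenerate
    # classifier predicts "not vandalism" (0) for everything.
    y_true = [1] * 5 + [0] * 95
    y_pred = [0] * 100

    print(accuracy_score(y_true, y_pred))                    # 0.95
    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 -- never flags anything
    print(recall_score(y_true, y_pred))                      # 0.0 -- catches no vandalism

This is the failure mode that accuracy hides and that precision/recall (or TPR/FPR) expose directly, which is why a balanced evaluation set only partially rescues accuracy as a metric.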
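One way to picture the "hotness" idea halfak describes above: cache only the last few ORES damage probabilities per page (the 5-10 edit window just mentioned) and compute a recency-weighted score. This is a hypothetical sketch; the class, the decay constant, and the window size are illustrative assumptions, not ORES's actual API or a tuned design.

    from collections import deque

    class PageHotness:
        """Hypothetical tracker: exponentially decayed sum of recent
        damage-model scores per page. All parameters are assumptions."""

        def __init__(self, window=10, decay=0.7):
            self.window = window          # how many recent edits to cache per page
            self.decay = decay            # weight multiplier per step back in history
            self.scores = {}              # page_id -> deque of recent damage probabilities

        def observe(self, page_id, damage_prob):
            # Record the damage probability of a new edit (e.g. a
            # 'damaging' model score arriving via the event stream).
            q = self.scores.setdefault(page_id, deque(maxlen=self.window))
            q.append(damage_prob)

        def hotness(self, page_id):
            # Newer edits count more; unseen pages score 0.0.
            q = self.scores.get(page_id)
            if not q:
                return 0.0
            return sum(p * self.decay ** i for i, p in enumerate(reversed(q)))

    # Usage: flag a page for protection review when hotness crosses a threshold.
    tracker = PageHotness()
    for prob in [0.1, 0.8, 0.9, 0.85]:
        tracker.observe("Example_page", prob)
    print(tracker.hotness("Example_page"))  # 0.85 + 0.9*0.7 + 0.8*0.49 + 0.1*0.343

Keeping only a bounded deque per page is what makes this cheap enough to run against a stream without re-querying page history on every edit.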
[17:54:30] Right now, I'm extending ORES so that it can make predictions based on multiple edits
[18:51:41] oh, that's exciting! do you happen to have any pointers to papers or data around why focusing on a single edit/revision can be misleading? I like to gather evidence for taking these broader approaches to classification challenges (especially w/r/t needing to look not just at a single wiki -- but encouraging people to look at multiple edits is a start)
[19:12:18] http://socio-technologist.blogspot.com/2016/01/notes-on-writing-wikipedia-vandalism.html
[19:12:25] Finally found that old blog post.
[19:13:05] I don't have data on why focusing on a single edit is misleading. But there's certainly more signal to be had.
[19:22:41] isaacj, ^
[19:37:28] thanks for sharing -- that's a really useful review. I'd love to see more of this guidance around which metrics are actually useful to improve upon for classification tasks like vandalism detection. The grand vision would be something like a meta page for each type of classification task involving Wikimedia data, along with references to data, code, past work, and useful metrics
[19:55:15] dsaez: is the arxiv lit review already up? Comms is asking for it and I just checked https://arxiv.org/search/?searchtype=author&query=S%C3%A1ez-Trumper%2C+D with no success
[19:55:17] I wonder if we could get a publication out about patrolling practices and useful metrics for patrolling efficiency, based on the work y'all have been doing reviewing the lit.
[19:55:20] isaacj, ^
[19:58:50] yeah, J-Mo has some thoughts about that, but I think at a minimum we could certainly combine your work and the more recent interviews/lit review to arrive at a set of best practices for anyone doing this work. it's from Jonathan's work that I've been thinking more about these cross-wiki classification challenges
[20:00:59] +1
[20:44:49] leila, nope, for some reason it was put on hold, don't know why.
[21:21:55] dsaez: ok. no worries.