[02:53:02] <travis-ci>	 wiki-ai/revscoring#504 (master - aaeb421 : Amir Sarabadani): The build was broken. https://travis-ci.org/wiki-ai/revscoring/builds/105321444
[04:07:18] <halfak>	 o/
[04:07:35] <halfak>	 Just hopping online to check things out before I call it a night,. 
[04:07:54] <halfak>	 Hopefully, I have new models with more reasonable accuracy scores. 
[04:08:19] <halfak>	 Well, it looks like the process finished. 
[04:09:04] <halfak>	 wikidatawiki.reverted gets 89% with a balanced test set.  Not bad.  We really need to test this with a representative test set. 
[04:09:07] <halfak>	 ^ Amir1 
[04:09:16] <halfak>	 Re. my note about putting together a test set. 
[04:09:22] <halfak>	 For wikidata. 
[04:09:53] <halfak>	 I'm hoping that by filtering out bot edits and client edits, we can get a much larger proportion of damaging edits to try to catch. 
[04:10:08] <halfak>	 I wonder if there is a better way of catching client edits than comment regexes. 
[04:10:16] <halfak>	 Maybe a change tag?
[04:11:54] <halfak>	 PR-AUC is a finicky statistic.  I'm not sure it is telling us what we need to know -- or sklearn is computing it weirdly. 
[04:15:17] <halfak>	 Looks like it behaves weird with our goodfaith models too since the "true" class is common. 
[04:15:26] <halfak>	 We often get PR-AUC of nearly 1.0 
[04:16:52] <halfak>	 yeah... filter rate is weird too when we have "false" as the interesting class.  
[04:17:02] <halfak>	 We may want to flip that one and call it "badfaith"
[04:19:30] <halfak>	 OK.  These accuracies look good to me, so I'm going to push this up. 
[04:32:26] <Amir1>	 I just came back, I took a shower
[04:33:51] <Amir1>	 halfak: Can you send me list of bad scored edits so I put it somewhere and check it
[04:34:27] <Amir1>	 I highly doubt catching client would be possible another way
[04:34:46] <Amir1>	 I can ask wikidata team but I really really doubt that
[04:35:52] <Amir1>	 maybe we need to read some university text books or review articles regarding AI in skewed classes 
[04:36:27] <Amir1>	 I do some lit. review 
[04:45:20] <halfak>	 Amir1, can you look into how we could catch all client edits via comment matching? 
[04:45:31] <halfak>	 I want to see if we can get this golden test set together soon. 
[04:45:39] <halfak>	 I think it will form the basis of our stats for the paper. 
[04:45:54] <halfak>	 I'd like to load it into Wikilabels. 
[04:46:40] <halfak>	 I'm hoping to look into generating this dataset tomorrow to see if we get a reasonable revert-rate. 
[04:51:40] <halfak>	 If we get within an order of magnitude for enwiki, I think that would be good for running through wikilabels. 
[04:51:54] <halfak>	 Especially since I suspect that a lot of wikidata damage might not show up as a reverted edit. 
[04:55:36] <halfak>	 It looks like we already have a couple of the regexes spec'd out here: https://github.com/wiki-ai/editquality/blob/master/editquality/feature_lists/wikidatawiki.py#L33
[04:56:13] <halfak>	 If you don't have time, I could probably run a few queries tomorrow to try to get a sense for what variations exist. 
[04:59:46] <halfak>	 Actually, I think I can do this pretty easily.  
[05:00:08] <halfak>	 Assuming that all client comments start with "/* client", we should be able to run this query quickly. 
[05:00:16] * halfak hacks on that. 
[05:00:33] <halfak>	 http://quarry.wmflabs.org/query/7098
[05:09:10] <halfak>	 Huh.  looks like there's only two client actions.  sitelink-update (aka pagemove) and sitelink-delete (aka pagedelete). 
[05:09:17] <halfak>	 Cool.  That will make it easier. 
[05:09:45] <halfak>	 Amir1, do you think we should include mergeinto and mergefrom?  
[05:10:06] <Amir1>	 sure
[05:10:11] <halfak>	 they aren't client actions, but it seems like they should be considered different from regular edits. 
[05:10:25] <Amir1>	 regarding comment matching
[05:11:23] <Amir1>	 halfak: you we should include mergefrom and mergeinto
[05:11:34] <Amir1>	 these are pretty predictive features 
[05:12:00] <halfak>	 Could someone do mergefrom and mergeinto as vandalism?
[05:12:07] <halfak>	 And would we have any hope of catching that?
[05:12:28] <halfak>	 It seems like this is sort of like catching a item-page-deletion as vandalism. 
[05:12:40] <halfak>	 It could be, but it's not really an edit,. 
[05:13:02] <halfak>	 I'm thinking about what set of edits we want to use to evaluate our classifier. 
[05:13:06] <Amir1>	 hmm
[05:13:10] <Amir1>	 valid point
[05:13:14] <halfak>	 it seems like these might need to be excluded. 
[05:13:25] <halfak>	 It's much more clear to me that client edits should be excluded. 
[05:13:31] <halfak>	 merging, I'm not so sure. 
[05:14:22] <Amir1>	 in order to evaluate the classifier I think mergefrom and mergeinto should be excluded  
[05:15:00] <halfak>	 OK.  Cool.  So if there is downtime tomorrow, I'm going to write a database query to gather a random sample of edits that are (1) not bots, (2) not client edits and (3) not merges. 
[05:15:09] <halfak>	 And then run those through the revert detector. 
[05:15:13] <halfak>	 And see what we get. 
[05:15:32] <halfak>	 If the prop is not 0.1% or less, we can draw our test set from it. 
[05:16:16] <halfak>	 When it comes to learning our filter rate, we should keep in mind all of the "edits" that show up on the recent changes page that we are excluding. 
[05:16:28] <halfak>	 but there's no reason that we should spend the time to *label* them. 
[05:16:40] <Amir1>	 one thing: set aside every merges
[05:16:51] <Amir1>	 let's see if we can find any vandalism in it
[05:16:53] <halfak>	 Want to assess them separately?
[05:16:55] <halfak>	 Sure!
[05:17:24] <Amir1>	 sorry, at first I wanted to say "they should not be excluded"
[05:17:34] <halfak>	 Maybe I'll draw a complete sample and add some fields to flag the nature of each edit. 
[05:17:48] <halfak>	 That way, we can explore subsets ad-hoc before loading them into wikilabels. 
[05:26:11] <Amir1>	 halfak: please see this comment: https://phabricator.wikimedia.org/T123795#1970561
[05:26:24] <halfak>	 Oh yeah.  
[05:26:40] <halfak>	 So varchar, unlike char, uses a variable length storage format
[05:26:43] <Amir1>	 Right now I'm trying to estimate db size 
[05:26:47] <halfak>	 hence the "var"
[05:27:11] <Amir1>	 I see
[05:27:15] <halfak>	 There's a table here that gives you storage sizes: http://dev.mysql.com/doc/refman/5.7/en/char.html
[05:27:42] <halfak>	 It takes 1 byte + the number of chars you want to store
[05:27:51] <halfak>	 so 'a' takes two bytes
[05:28:00] <halfak>	 'ab' takes three. 
[05:28:28] <halfak>	 So, if I were to back-of-the-envelope this for English Wikipedia, I'd start with the number of rows in recentchanges. 
[05:28:55] <halfak>	 Which is currently ~8.1m
[05:28:57] <Amir1>	 'true' takes 5 bytes
[05:29:03] <halfak>	 Meh.  more like 8.2
[05:29:04] <halfak>	 Yeah. 
[05:29:16] <halfak>	 'true' takes a substantial amount of bytes. 
[05:29:29] <Amir1>	 I dig much deep and get result soon
[05:29:35] <Amir1>	 don't worry about that
[05:29:43] <halfak>	 But I have a proposal.  Let's have a per-model configuration where we specify what class probability we are interested in. 
[05:29:52] <halfak>	 'true' for damaging and 'false' for goodfaith
[05:29:54] <Amir1>	 but regarding cutting rows with "false" in them
[05:29:55] <Amir1>	 what do you think? Do you agree?
[05:29:58] <halfak>	 And then drop the class column. 
[05:30:19] <halfak>	 That will allow us to cut the "false" rows. 
[05:30:59] <Amir1>	 I don't think cutting ores_class column would be a good diea
[05:31:02] <Amir1>	 *idea
[05:31:14] <halfak>	 Why is that?
[05:31:22] <Amir1>	 because we will add wp10 model (and non-binary models) later
[05:31:25] <halfak>	 Essentially, we'd have that column in the config. 
[05:31:40] <Amir1>	 and it would be impossible to work without the class 
[05:31:46] <halfak>	 +1
[05:31:52] <Amir1>	 and changing the database schema at that point 
[05:31:54] <halfak>	 Then again, i'm not sure we should worry about that right now. 
[05:31:55] <Amir1>	 is much harder than now
[05:32:08] <halfak>	 Maybe we'll want a different table structure (or memcached or something) for wp10
[05:32:43] <Amir1>	 we can have another table for non-binary ones
[05:33:02] <halfak>	 Yeah.  And with wp10, we don't want that table to match the recentchanges table anyway :) 
[05:33:16] <Amir1>	 but biggest problem right now is that if we don't make a flexible database for wp10 now, it'll come to us later 
[05:33:55] <halfak>	 One bit of good news is that we need to build in full-table loading scripts anyway. 
[05:33:59] <Amir1>	 I don't know but maybe our database will grow and store data for more than one month
[05:34:08] <halfak>	 Since every update to ORES will require that we re-generate the ORES tables. 
[05:34:29] <halfak>	 So if we ever want to do a schema change, that won't be more expensive than updating a model. 
[05:34:31] <Amir1>	 (specially since user contribs should be supported and that comes from revs table)
[05:34:57] <halfak>	 yeah.  that's a tough one.  but on the other hang, we can rely on ORES' cache to a large extent. 
[05:35:06] <Amir1>	 no, the database is designed to store values for different version of models too
[05:35:42] <halfak>	 "ores_model"?
[05:36:06] <Amir1>	 for example you can have for rows for each of revision one is damaging 1.0.1 and other one damaging 1.0.2
[05:36:18] <halfak>	 We might have a bit of trouble sorting on that column. 
[05:36:35] <halfak>	 e.g. with ascii sort, 1.10.0 comes before 1.2.0
[05:36:38] <Amir1>	 yes, oresc_model is a foreign key to oresm_id 
[05:37:04] <Amir1>	 we do some thing else
[05:37:09] <halfak>	 can you link me to the schema SQL quick?
[05:37:20] * halfak should do a better job of reviewing this. 
[05:37:27] <Amir1>	 we set damaging 1.0.1 the current version
[05:37:31] <Amir1>	 yeah sure
[05:37:46] <halfak>	 I need to get better at working with gerrit. 
[05:37:53] <halfak>	 Or push people to use phab's review system 
[05:38:02] <Amir1>	 https://github.com/wikimedia/mediawiki-extensions-ORES/blob/master/sql/ores_model.sql
[05:38:30] <Amir1>	 note the "oresm_is_current" 
[05:38:36] <halfak>	 I see.  We'll use "is_current"
[05:38:56] <halfak>	 So, when we run the update script we flip that flag. 
[05:39:13] <Amir1>	 then we use it in our queries
[05:39:26] <Amir1>	 (using left join)
[05:39:28] <halfak>	 So we need to join against the ores_model table. 
[05:39:36] <halfak>	 That's going to be somewhat expensive. 
[05:39:48] <Amir1>	 it'll save us lots of database storage 
[05:39:48] <halfak>	 But we can probably work the indexes in a nice way. 
[05:39:51] <halfak>	 +1
[05:39:54] <Amir1>	 no, I asked
[05:40:09] * halfak is amazed that *storage* space is a serious concern. 
[05:40:18] <Amir1>	 Hoo told me that joining primary keys is neglible 
[05:40:19] <Amir1>	 *negligible 
[05:40:38] <Amir1>	 he actually suggested these changes 
[05:40:47] <halfak>	 Yeah....  Try joinin the recentchanges table to the user table ;) 
[05:41:02] <Amir1>	 one sec to give you a link
[05:41:06] <Amir1>	 https://phabricator.wikimedia.org/T124443
[05:41:06] <halfak>	 But really, i think that if we have a filter on time and model ID, that will work pretty well. 
[05:41:29] <halfak>	 Yeah.  It's not wrong.  
[05:41:34] <halfak>	 It'll probably work in practice. 
[05:42:18] * halfak tries to imagine the likely query plan and hits the sleep wall. 
[05:42:24] <Amir1>	 the discussion is here: https://gerrit.wikimedia.org/r/#/c/265944/
[05:42:24] <halfak>	 I should call it a night pretty soon. 
[05:43:29] <halfak>	 OK.  To summarize: we need to trim the number of rows and the size of columns in the ores_classification table and then come up with an estimate. 
[05:44:13] <Amir1>	 yes
[05:44:15] <halfak>	 We should be able to drop the "false" rows by setting a config variable about which "classes" we want to store and use. 
[05:44:15] <Amir1>	 you go and sleep, I'll give results by when you wake up
[05:44:39] <halfak>	 We're not sure if we want to change the class name varchar to a tinyint
[05:44:45] <halfak>	 OK
[05:44:48] * halfak go to sleep. 
[05:45:10] <halfak>	 Have a good day!
[05:45:35] <Amir1>	 I give you number for each of improvement separately, so we can judge which one is feasible   
[05:45:39] <Amir1>	 you too 
[05:45:41] <Amir1>	 o/
[05:45:49] <halfak>	 Sounds great. 
[05:45:50] <halfak>	 o/
[17:50:00] <Amir1>	 halfak: https://www.wikidata.org/wiki/Wikidata:ORES/List_of_features
[17:50:11] <Amir1>	 I just wrote that as part of the paper