[22:22:28] hello halfak.
[22:22:39] o/ leila
[22:23:11] question about article quality: do you have a model we can feed articles created in frwiki to and get article quality as an output?
[22:23:24] and if so, do you know the precision and recall in French?
[22:23:52] halfak, ^
[22:24:16] Regretfully, I don't have one in French right now, but I am ~1 step away from training one.
[22:24:28] how long will that take, halfak?
[22:24:33] I might just need to run a couple of commands and review the fitness measures.
[22:24:36] I'll kick it off right now.
[22:24:59] It'll take a few hours to extract features and 15 minutes to build the model.
[22:25:06] :D
[22:25:13] so you have it, halfak. ;-)
[22:25:21] * halfak goes to try the commands.
[22:25:34] do you know what precision and recall we should expect?
[22:28:34] so, if you have the model, we can use it to get a sense of how many quality articles were created in frwiki as a result of the experiment, and compare them with articles created outside of the test, halfak.
[22:29:38] leila, precision and recall are weird in multiclass.
[22:29:58] Do you have a nice way to compute that?
[22:30:05] what is the measure of accuracy, halfak?
[22:30:13] I usually use one-vs-rest AUC and overall accuracy.
[22:30:32] With enwiki we get 80-95 AUC depending on the class and ~60% accuracy.
[22:31:07] got it. that should suffice; we just need to report what numbers the reader should expect, so they get a sense of how good the models and our claims are.
[22:31:07] :D
[22:33:50] BTW, stats of all the models can be requested from ORES if you just omit a rev_id from the path. See http://ores.wmflabs.org/scores/enwiki/wp10/
[22:34:02] That was just deployed last week :)
[22:34:14] nice, thanks! :-)
[22:34:14] Well... Sunday
[22:37:32] BTW leila, that accuracy measure is taken after balancing classes.
[22:37:47] So it's much better on a random sample.
[22:37:53] Also, I'm looking at frwiki right now.
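For reference, here is one way to compute the "one-vs-rest AUC and overall accuracy" mentioned above. This is a minimal Python sketch, not necessarily how the ORES models are evaluated. It assumes each scored article comes back as a dict mapping class name to probability, and the class names and scores are hypothetical toy values. Each class's AUC treats "is this the true class?" as a binary problem and is computed as the probability that a random positive outscores a random negative.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). O(n_pos * n_neg),
    which is fine for a sketch but slow for large test sets.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    probas: List[Dict[str, float]], labels: Sequence[str], classes: Sequence[str]
) -> Dict[str, float]:
    """One AUC per class: that class's probability vs. 'is this the true class?'."""
    return {
        c: binary_auc([p[c] for p in probas], [y == c for y in labels])
        for c in classes
    }


def accuracy(probas: List[Dict[str, float]], labels: Sequence[str]) -> float:
    """Fraction of examples whose most probable class equals the true label."""
    preds = [max(p, key=p.get) for p in probas]
    return sum(pred == y for pred, y in zip(preds, labels)) / len(labels)


# Toy, hypothetical predictions for four articles (class names are made up).
probas = [
    {"stub": 0.70, "b": 0.20, "adq": 0.10},
    {"stub": 0.20, "b": 0.60, "adq": 0.20},
    {"stub": 0.10, "b": 0.30, "adq": 0.60},
    {"stub": 0.25, "b": 0.65, "adq": 0.10},  # a stub misclassified as "b"
]
labels = ["stub", "b", "adq", "stub"]

print(accuracy(probas, labels))  # 0.75
aucs = one_vs_rest_auc(probas, labels, ["stub", "b", "adq"])
print({c: round(a, 3) for c, a in aucs.items()})
# {'stub': 1.0, 'b': 0.667, 'adq': 1.0}
```

Note that the misclassified article lowers accuracy but only dents the "b" AUC, because AUC measures ranking quality per class rather than whether the top-scoring class is correct.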
[22:38:03] It looks like we won't be able to balance as easily.
[22:38:19] e.g. there are only 1.5k observations of the adq class.
[22:38:27] I'd like to have 5k obs. per class.
[22:39:00] I'll generate the model with <= 5k obs. per class now and give you the eval.
[22:39:19] We'll want to talk about what the best way to present the results is.
[22:39:36] E.g. it won't be fair to compare this model's stats to enwiki.
[22:39:53] unless I lower each class to 1500 obs. :S
[22:42:18] halfak: sure. let's talk after you have the model. I can explain more, but we won't be comparing it with enwiki. The goal is to say something along these lines, if it's true: articles created in group a_1 are on average x% more likely to be in quality class y according to the quality class predictor (and then reference your meta page). We can also mention that the accuracy and AUC for this specific model based on a 1.5K tes
[22:42:47] Gotcha.
[22:43:02] halfak: I'm going to try really hard to make it to a coffee shop and continue working from there. I think I'm officially going crazy working alone at home. :D will come back online in an hour at the latest. :D
[22:43:21] see you.
[22:51:07] Oh no. I should have said that I need to leave soon too, so this is more of a tomorrow thing.
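The sampling plan discussed above (at most 5k observations per class when the adq class has only 1.5k) can be sketched as a per-class cap. This is a minimal Python illustration under assumptions, not the actual training pipeline: observations are taken to be (item, label) pairs, and the revision IDs and the size of the "b" class are made up. It also shows why the cap alone does not balance the classes. Only lowering the cap to the rarest class's size (1,500 here) does.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def cap_per_class(
    observations: Sequence[Tuple[Hashable, str]], max_per_class: int, seed: int = 0
) -> List[Tuple[Hashable, str]]:
    """Keep at most `max_per_class` (item, label) pairs per label.

    Classes with fewer observations (e.g. a rare class) are kept whole, so the
    result is only fully balanced if max_per_class <= the smallest class size.
    """
    by_class: Dict[str, List[Tuple[Hashable, str]]] = defaultdict(list)
    for obs in observations:
        by_class[obs[1]].append(obs)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample: List[Tuple[Hashable, str]] = []
    for label in sorted(by_class):
        items = by_class[label]
        if len(items) > max_per_class:
            items = rng.sample(items, max_per_class)  # sample without replacement
        sample.extend(items)
    return sample


# Hypothetical label counts: a rare "adq" class and a common "b" class.
observations = [(f"rev{i}", "adq") for i in range(1500)] + [
    (f"rev{i}", "b") for i in range(1500, 9500)
]

capped = cap_per_class(observations, max_per_class=5000)
print(Counter(label for _, label in capped))  # counts: 1500 adq, 5000 b (unbalanced)

smallest = min(Counter(label for _, label in observations).values())
balanced = cap_per_class(observations, max_per_class=smallest)
print(Counter(label for _, label in balanced))  # counts: 1500 adq, 1500 b
```

The trade-off is the one raised in the conversation: the capped set keeps more training data, while the fully balanced set makes accuracy figures comparable to a model trained with 1,500 observations per class.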