[18:02:20] ToAruShiroiNeko: are you there?
[18:04:41] yes
[18:10:40] Oscar_ how can I help?
[18:16:24] ToAruShiroiNeko: Yes! wanted to see how you classified articles within wikiprojects
[18:16:30] we're interested in w:es, especially because most of the wikiprojects are roughly inactive
[18:16:46] ah
[18:17:00] so in a nutshell we look at the revisions when articles are promoted
[18:17:16] using that we train machine learning classifiers
[18:17:23] which we use to create a model
[18:17:38] that can then be loaded to ores to check other articles as needed
[18:20:46] Interesting...and what do you need to run a test? I can give a hand with translations
[18:22:09] we have collaborated with a wikiproject, WP:Medicine IIRC
[18:22:34] for this one translation isn't that vital. Hard part is parsing the talk pages to discover when the templates were added
[18:23:07] we can run a wiki labels campaign to see if our predictions are correct or to train classifiers using community feedback
[18:49:38] Sorry, had a call
[18:50:01] ToAruShiroiNeko: Well, let me know what I'm good at (I'm sysop, part of a chapter, iberocoop, sure we can deal with the community feedback)
[18:52:14] indeed
[18:52:17] we are here to help
[18:52:36] first task is to get wiki labels campaign for edit quality going since that takes quite a bit of time
[18:56:49] o/ hey guys
[18:56:59] Re. standing up new article quality prediction models
[18:57:07] Oscar_ & ToAruShiroiNeko
[18:57:28] In w:en, we have been using the templates that WikiProjects add to talk pages with quality assessments
[18:57:42] stub < start < c < b < ga < fa
[18:57:47] yes
[18:58:08] If there's something like that for eswiki and it is relatively consistently applied, we could probably train a model right away.
[18:58:30] I have a framework that only requires that I put forth an extraction strategy that will work for that wiki.
[18:58:39] Parsing templates and category links is no problem.
[18:58:51] But I need to know what templates and category links to look for.
[18:59:42] halfak: I know. In theory we use the same system, but I doubt any wikiproject has the classification done. I follow two wikiprojects, economics and Venezuela; neither uses the system (which is available within the template), so it will probably be more like starting from scratch
[19:02:23] Oscar_, we don't need that much. Really, we want at least 5000 observations of each class we want to predict.
[19:03:17] Oscar_, could you link me to an example of an article that has a quality rating?
[19:08:40] perhaps https://es.wikipedia.org/wiki/Julio_C%C3%A9sar ?
[19:09:18] Julio Cesar has the highest rank, "AD" (FA in en)
[19:09:46] Is there a template that is added to the article or talk page when it achieves this status?
[19:09:52] And are there lesser statuses?
[19:11:12] I see a "versión=19235300" is referenced in a template on the talk page. Does that point to the version of the article that was assessed?
[19:13:16] halfak: yes
[19:13:33] Interesting. I can likely make use of that.
[19:13:56] So, we could stand up a model that predicts whether or not the article should be considered for AD
[19:14:10] If the probability is high, it is likely it will pass AD.
[19:14:20] If the probability is low, it's probably not very high quality.
[19:14:24] What do you think of that?
[19:15:08] halfak: AB (good articles) are the next rank, like this one; https://es.wikipedia.org/wiki/Teor%C3%ADa_de_Olduvai
[19:16:19] do you delete stubs?
[19:16:24] halfak: sounds great!
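The "versión=19235300" parameter discussed above could be pulled out of talk-page wikitext with something like the following. This is only a minimal sketch, not the extraction strategy actually used for ORES: the function name, the regular expression, and the sample template name are all hypothetical, and only the parameter name and value come from the conversation.

```python
import re

# Matches a "versión=<digits>" template parameter. The accent is treated as
# optional so that "version=" is caught too (an assumption about spelling
# variants, not something confirmed for eswiki).
VERSION_PARAM = re.compile(r"\|\s*versi[oó]n\s*=\s*(\d+)", re.IGNORECASE)


def assessed_revision_ids(talk_wikitext):
    """Return the revision IDs referenced by versión= parameters."""
    return [int(rev_id) for rev_id in VERSION_PARAM.findall(talk_wikitext)]


# Hypothetical talk-page snippet; the template name is a placeholder.
sample = "{{PlantillaDeEjemplo|fecha=2008-08-10|versión=19235300}}"
print(assessed_revision_ids(sample))  # [19235300]
```

Each revision ID found this way would identify the version of the article that was assessed, which is what a labeled training observation needs.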
[19:17:45] Oscar_ we can predict with limited training
[19:17:55] it wouldn't be perfect but it would be advice for a willing wikiproject
[19:18:01] ToAruShiroiNeko: no, but we don't have a "stub template/category" per se, it was deleted a few years ago
[19:18:04] who would assess maybe 1000 pages
[19:18:05] we can then train on that
[19:18:16] Oscar_ that's fine
[19:18:26] there is a distinction between stub and not somehow
[19:18:40] we can look at historic definitions too :)
[19:18:47] stubs are kind of easy to predict
[19:21:14] I think the length of the article is the best predictor
[19:22:22] some (minus) kbs is a stub
[19:22:25] it is always bad to rely on a single feature
[19:22:36] it could very well be not a stub and instead vandalism :)
[19:23:01] if it doesn't look like a stub...
[19:24:07] ToAruShiroiNeko: well, we sure delete something between 5-30 kbs no matter how important the topic is
[19:24:13] say a very short new page in Chinese...
[19:24:40] we just let AI decide what features are relevant
[19:24:45] That's why you won't find a stub with one paragraph in es:w
[19:24:46] AI decides which one fits best
[19:25:20] intuitively for stubs, length makes sense, but also the number of links probably is a good indicator
[19:30:44] also at least one source/reference
[19:31:17] imho a good example of a stub in es:w: https://es.wikipedia.org/wiki/Microcr%C3%A9dito
[19:37:22] I may have dropped out for a moment.
[19:37:34] Anything I missed?
[19:40:49] halfak: not much, we were talking about how to predict stubs
[19:41:05] [14:45:13] halfak: AB (good articles) are the next rank, like this one; https://es.wikipedia.org/wiki/Teor%C3%ADa_de_Olduvai
[19:41:12] [15:01:22] imho a good example of a stub in es:w: https://es.wikipedia.org/wiki/Microcr%C3%A9dito
[19:41:51] Looks like there's no label on the stub.
[19:41:59] We could do the whole wikilabels campaign thing for that.
[19:47:17] halfak: no, it was deleted a few years back by community consensus
[19:48:32] Gotcha.
[19:48:49] But we use historic stuff, so that might still work for us.
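The stub signals mentioned in the discussion (article length, number of links, at least one reference) could be turned into features roughly like this. It is a hedged sketch under stated assumptions: the function name, the feature names, and the regular expressions are hypothetical, the sample text is invented, and a real model would be trained on many more features instead of relying on thresholds over these three.

```python
import re

# Simple [[wikilink]] pattern; nested links (e.g. inside file captions) are
# not handled in this sketch.
WIKILINK = re.compile(r"\[\[[^\[\]]+\]\]")
# Opening <ref> tags, including <ref name="..."> and self-closing <ref ... />.
# Closing </ref> tags and <references/> do not match.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def stub_features(wikitext):
    """Compute a few candidate features for a stub / not-stub classifier."""
    return {
        "bytes": len(wikitext.encode("utf-8")),
        "wikilinks": len(WIKILINK.findall(wikitext)),
        "refs": len(REF_TAG.findall(wikitext)),
    }


# Invented sample text, for illustration only.
sample = "El [[microcrédito]] es un préstamo pequeño.<ref>Fuente de ejemplo</ref>"
print(stub_features(sample))
```

Passing several features to the classifier, and letting training decide which ones matter, follows the point made above that relying on length alone is risky.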