[00:00:13] halfak: neilpquinn put me up to this
[00:00:21] * bearloga takes a step away from the bus
[00:01:53] I deny everything! I don't even know who bearloga is.
[00:01:58] * neilpquinn says from under the bus
[00:05:37] Oh-rez is just me pronouncing the word "ores" with my North Carolina accent. That's my story and I'm sticking to it.
[15:27:29] hey guillom. did I manage to delete the Research group meeting for everyone?
[15:28:11] I'm not sure if /I/ did it, but somehow the event has disappeared, and I remember that two days ago I was confused by two parallel events both called Research group meeting in my calendar, and I deleted one of them, guillom. ;)
[15:28:51] leila, FWIW, I see no more RG meeting this week or next
[15:29:00] hahaha, thanks halfak
[15:29:08] not sure if I have done it, and no idea how I can figure it out. :D
[15:29:22] let me email the internal list about it at least, halfak. thanks for letting me know. ;)
[15:34:22] leila: I got a notification from Calendar that Tilman had deleted it two days ago.
[15:35:03] I don't see it any more in any of the future weeks.
[15:35:12] ow okay. then it wasn't me, guillom. phew! I'll respond on the list then.
[15:36:37] Not sure if he deleted just this one, or accidentally deleted them all; there's been some confusion since we tried to move it to Fridays.
[15:36:56] HaeB: fyi ^ :)
[15:38:08] yeah, guillom.
[15:40:00] i deleted the one that was on the calendar for yesterday morning at 9am (i'm certain it was that time, i was surprised too ;)
[15:41:34] ah, i see there is an email thread, will follow up there
[15:52:04] HaeB: I confirm that I saw the weird one you're referring to as well
[15:52:12] :P
[18:49:39] halfak: https://wikiedu.org/blog/2016/09/16/visualizing-article-history-with-structural-completeness/
[18:49:54] \o/ COOOL
[18:51:08] I need to fight my way through some Vega config and javascript hackery to add mouseover details.
[18:51:20] and links to diffs.
[18:51:21] "I’m calling this data “structural completeness”, because the scores are based on how well an article matches the typical structural features of a mature Wikipedia article. [...]" Very well said :)
[18:52:32] thanks!
[19:39:49] guillom et al.: is there somewhere I can get some statistics about stubs on Wikipedia (all languages or one specific language)?
[19:41:01] leila, https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Statistics ?
[19:41:43] Related: https://phabricator.wikimedia.org/T135684#2636857
[19:41:52] ow halfak. how many times have you saved me from spending hours more on something?! ;)
[19:41:54] thank you!
[19:42:01] :D! Rock on :)
[19:42:36] so, we're thinking of doing some extension of the recommendation work with stubs.
[19:42:51] basically stub recommendation with hand-holding. ;)
[19:43:34] now I'm looking at some basic statistics to get a better sense of how serious the stub problem is. (btw, pointers are welcome, everyone)
[19:44:00] Oooh. that second link is to a discussion about a new dataset of ORES-predicted Stubs
[19:44:16] (And, well, other article quality levels too)
[19:44:19] Might be handy.
[19:44:25] yeah, actually I knew about that one, and we are planning to use it if we get there.
[19:44:28] We're working with Discovery on keeping the dataset up to date.
[19:44:31] Gotcha.
[19:44:43] Hopefully, by the time you do, we'll have it integrated.
[19:44:44] yeah, that's a valuable dataset for this kind of work
[19:46:38] halfak: how do you predict what's a stub? what are the features?
[19:48:28] https://ores.wmflabs.org/v2/scores/enwiki/wp10/720133421/?features
[19:48:52] We actually use a combination of the basic features listed there (scaling & division)
[19:49:43] See https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/feature_lists/enwiki.py#L42 and https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/feature_lists/wikipedia.py for the real feature list that goes into the model.
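The remark that the model combines the basic features with "scaling & division" can be illustrated with a minimal sketch. The feature names below (`chars`, `ref_tags`, `headings`, `wikilinks`) and the specific transforms (log scaling, per-character ratios) are assumptions chosen for illustration; the actual list lives in the wikiclass `feature_lists` modules linked above.

```python
import math
from typing import Dict


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (e.g. an empty page)."""
    return numerator / denominator if denominator else 0.0


def derive_features(basic: Dict[str, float]) -> Dict[str, float]:
    """Turn raw counts into scaled and length-normalized features.

    Hypothetical feature names; this is NOT the actual wikiclass feature list.
    """
    chars = basic["chars"]
    derived = {
        # Log scaling compresses the very wide range of article lengths.
        "log_chars": math.log(chars + 1),
        "log_ref_tags": math.log(basic["ref_tags"] + 1),
    }
    # Division controls for length: e.g. references per character
    # instead of the raw reference count.
    derived["ref_tags_per_char"] = safe_ratio(basic["ref_tags"], chars)
    derived["headings_per_char"] = safe_ratio(basic["headings"], chars)
    derived["wikilinks_per_char"] = safe_ratio(basic["wikilinks"], chars)
    return derived


features = derive_features({"chars": 1000, "ref_tags": 5, "headings": 2, "wikilinks": 40})
```

The zero-denominator guard is one possible design choice; the log does not say how the real features handle empty pages.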
[20:01:20] halfak, thanks. /me is looking into features.
[20:01:46] J-Mo: I must say I laughed out loud when I saw your last email about the research group meeting. :P
[20:02:10] I think the first thing we should do is change the name of these meetings. Call one of them X, the other Y.
[20:05:30] halfak: do you know what the top 3 predictors are in these feature sets?
[20:05:46] * halfak checks on the feature importance vector
[20:06:05] "Research staff meeting" and "Lab meeting"?
[20:06:18] +1
[20:06:25] to guillom's proposal
[20:06:49] halfak: Of course you'd +1, it's basically your proposal :p
[20:06:53] leila, it's kind of funny (in a sad way) that we have to do all this coordination because the Google Calendar UI is confusing and/or buggy.
[20:06:55] :P
[20:07:18] +1 to guillom's articulation of halfak's proposal
[20:07:22] J-Mo: I think the problem here is that everyone can edit, and we are making mistakes at different levels
[20:07:46] I have never had a bug like this with any of my other meetings, J-Mo
[20:07:46] yep. just goes to show: never let everyone edit anything. Nothing good could possibly come from that!!!
[20:07:54] hahaha!
[20:08:07] I did NOT mean that, J-Mo! but you need revision history. :P
[20:08:07] if only Google Calendar had a history and revert button.
[20:08:13] yeah!
[20:08:34] and me as an admin, with my revertin' finger constantly hovering over the button.
[20:08:43] mwuahaha
[20:08:47] :P
[20:08:56] Snuggle for Google Calendar.
[20:09:22] * guillom thinks this discussion shows that we all need the weekend to arrive.
[20:09:54] * leila nods
[20:10:28] guillom: if you were on the east coast, it would be the weekend already!
[20:10:58] Emufarmers: I am. how come nobody told me about that? ;)
[20:11:05] leila, https://gist.github.com/halfak/d7193c72f4a1e41782bb9c805bcdf56e
[20:11:26] * leila checks halfak's link
[20:11:46] Fun story. I see some bugs in here!
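Answering "what are the top 3 predictors" from a feature importance vector amounts to pairing each importance with its feature name and sorting. A minimal sketch, assuming the model is a tree ensemble that exposes one importance per feature in feature-list order (as scikit-learn's `feature_importances_` attribute does); whether the wikiclass model is inspected this way is an assumption, and the names and values in the usage line are made up for illustration, not the model's actual importances.

```python
from typing import List, Sequence, Tuple


def top_k_features(
    names: Sequence[str], importances: Sequence[float], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (name, importance) pairs with the largest importance, highest first."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative (made-up) values only.
top3 = top_k_features(["log_chars", "ref_tags_per_char", "headings", "wikilinks"],
                      [0.40, 0.25, 0.10, 0.25])
```

Because `sorted` is stable, features with equal importance keep their original feature-list order.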
[20:11:49] Emufarmers: As a middle ground, I'm going to go wait for the weekend by reading work-related books on my patio, then.
[20:11:58] Features are well defined, but sometimes, not what was intended!
[20:12:19] so, halfak, "chars" is the number of characters?
[20:13:18] That's right.
[20:14:15] leila, http://pythonhosted.org/revscoring/revscoring.features.wikitext.html#module-revscoring.features.wikitext
[20:16:33] cool cool. thanks halfak. and one last comment hopefully: do you know if there may be fundamental biases in the training sets for this model? What I'm thinking is this: it is easy to go with the general rule of thumb that if something is long, it's not a stub. but when I read guidelines on how to assess what is a stub or not, there are discussions around how length is not an important factor, and that a short article about som
[20:17:03] leila, what do you mean "bias"
[20:17:09] It sounds like you are talking about "error"
[20:18:45] sorry, what I mean is this: editors may be more likely to use the stub template when the article is short, independent of the context and subject of the article. If this ends up being true, some of the labels will not be correct.
[20:19:27] leila, ahh. yes. I suspect the models will perpetuate the patterns of the labeling that editors do.
[20:19:37] In effect, the model is trying to do what editors do
[20:19:49] So if we see their labeling as biased, the model most certainly is as well.
[20:20:01] got you. thanks, halfak.
[20:26:43] * halfak fixes issues with features and starts rebuilding wikiclass models :)
[20:27:08] We might get a tiny bit more fitness assuming those length-controlling divisions are important :)
[20:39:15] guillom: is it correct based on https://fr.wikipedia.org/wiki/Projet:%C3%89valuation/Statistiques that 80% of the articles in frwiki are stubs? (seems too high to me)
[22:41:50] leila: Yes, that's what the page says.
[22:43:03] Maybe the criteria are more stringent?
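One rough way to look for the labeling pattern leila describes (editors tagging articles as stubs mainly because they are short) is to check how closely the Stub label tracks article length in the labeled data. A minimal sketch, assuming a hypothetical input of (character count, assessment label) pairs and an arbitrary bin size; a sharp drop in stub rate at some length threshold would be consistent with, though not proof of, length-driven labeling, and a model trained on those labels would tend to reproduce it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def stub_rate_by_length(
    articles: Iterable[Tuple[int, str]], bin_size: int = 1000
) -> Dict[int, float]:
    """Fraction of articles labeled 'Stub' within each length bin.

    `articles` yields hypothetical (char_count, label) pairs; the result is
    keyed by each bin's lower bound, in ascending order.
    """
    totals: Dict[int, int] = defaultdict(int)
    stubs: Dict[int, int] = defaultdict(int)
    for chars, label in articles:
        bin_start = (chars // bin_size) * bin_size
        totals[bin_start] += 1
        if label == "Stub":
            stubs[bin_start] += 1
    # Bins with no stubs get a rate of 0.0 (defaultdict supplies the zero count).
    return {b: stubs[b] / totals[b] for b in sorted(totals)}
```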