[00:00:07] so I generated new normalized ranks for articles viewed in the last 6 months and wikidata dumps from October 1.
[00:00:26] bmansurov: the same guy with the fawiki article, I can test in es: https://es.wikipedia.org/api/rest_v1/data/recommendation/article/morelike/translation/Mahmud_Dowlatabad%C3%AD
[00:00:43] bmansurov: yup. I'm looking into results.
[00:00:52] oh perfect, both es and fa take results from enwiki now
[00:00:58] which languages do you have the model in right now?
[00:01:06] fa, es, and uz
[00:01:19] fa and es take from en, uz takes from ru
[00:01:31] niiiiice! very impressed by that choice of language.
[00:01:59] it's because diego, you, and I can test ;)
[00:02:25] yeah. what about en? :D
[00:03:09] en is the biggest wiki, it should get the lowest priority imo
[00:03:44] i just didn't train models for en, and doing it all over manually is a no-no, I'd rather work on creating a production pipeline instead.
[00:04:05] yup
[00:04:33] I pushed this in order to hit the Q2 goals ;)
[00:04:33] bmansurov: though it's nice in the sense of balancing content there. cuz you'll be looking at other languages.
[00:04:48] sure
[00:04:50] * lzia is amazed at the power of goals and continues the quality checks
[00:05:01] ;))
[00:06:40] bmansurov: have you documented the logic somewhere I can read?
[00:06:59] lzia: model generation logic?
[00:07:05] bmansurov: the results of https://es.wikipedia.org/api/rest_v1/data/recommendation/article/morelike/translation/Mahmud_Dowlatabad%C3%AD are very intriguing and I think it's important to get the logic when looking at them.
[00:07:09] bmansurov: yup
[00:07:23] yeah, it's in our 1:1 etherpad
[00:07:31] we talked about it in one of our meetings
[00:07:51] just FYI: the results are: the city he was born in and one of his books (which makes sense), an Iranian philosopher from way before him (which may make sense), and a killer (which I have no idea how it's related to him)
[00:08:02] bmansurov: yup. pulling that up.
[00:12:50] lzia: blame the morelike API ;)
[00:13:05] bmansurov: actually I can't find it. ;)
[00:13:06] we're getting similar articles from it, but using the algorithm to rank them
[00:13:11] let me look for it
[00:13:18] bmansurov: will you be putting the documentation somewhere? if yes, i can read it when you have it up.
[00:13:57] lzia: yeah, I'll add documentation
[00:14:22] bmansurov: ok. then I'll wait for that. no need to dig now. add the range to that, too. ;)
[00:14:31] lzia: ok
[00:14:32] I'll play mildly with it until doc is in.
[00:14:48] lzia: ok, have fun
[00:14:55] i'll talk to you tomorrow
[00:14:56] uhu. I will
[00:15:02] sounds good. have a good night.
[00:15:18] thanks, you too
[00:15:22] thanks.
[06:54:30] lzia: re: data persistence - DBAs :)
[17:30:31] lzia: o/ - today I have updated the task with more stuff to check, but that should be all, so please don't hate me :) If you want we can set up a meeting in the next few days to go through all that mess together, I am available if you want
[18:29:31] elukey: aa! got you re data persistence.
[18:29:52] elukey: don't mention it. There is a loooooong way before we start hating you. ;)
[18:30:07] elukey: it's on our team's radar and we plan to get back to you by Tuesday.
[21:23:09] lzia: o/ I've added some documentation to https://github.com/wikimedia/research-article-recommender/blob/master/doc/algorithm.org Take a look while I work on documenting the morelike API.
[22:11:55] * lzia reads the link
[22:27:05] bmansurov: re normalized rank: this should be the rank of the article divided by the total article count in a language. In the description you have pageviews of the article instead of rank of the article. Can you double check this?
[22:28:01] lzia: ok
[22:29:24] lzia: I have it as pageviews
[22:29:40] bmansurov: isn't this Eq. 1 in the paper?
[22:30:37] lzia: you're right, i'll fix it. the resulting number will be different, but the order of results will be the same, no?
[22:31:08] bmansurov: This needs a fix. dividing total pageviews for the article by total article count is not normalizing the pageview feature (if you divide by the total pageviews to that language, it will), and it's also not the feature described in Section 2.2 under features.
[22:31:21] * lzia thinks
[22:31:59] bmansurov: the resulting number will be different, and your predictions may actually end up being different, because now this feature may be picked up more or less by the algorithm.
[22:33:41] lzia: OK, makes sense. And, logarithm stays the same, right?
[22:33:48] log of page views?
[22:34:09] bmansurov: yup. that one is the log of pageviews, which you already have right.
[22:34:17] ok
[22:38:15] bmansurov: actually, wait. I think I'm confused about that normalized pageview rank. let me think.
[22:38:31] ok, i'll be back soon
[22:44:53] back
[23:39:57] bmansurov: I think we have a ton of patches already for Monday deploy?
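
Note on the normalized rank discussion (22:27 to 22:38): a minimal sketch of the quantities being compared, not the recommender's actual code. The function name, the dictionary keys, and the sample counts are all hypothetical. It assumes rank 1 is the most-viewed article and that every article has at least one view, neither of which is confirmed in the log. Eq. 1 of the paper is not reproduced here, and lzia's own doubt at 22:38 is left unresolved, so treat this only as an illustration of why the three divisions give different numbers.

```python
import math


def pageview_features(pageviews):
    """Compute candidate pageview features for one language.

    pageviews: dict mapping article title -> pageview count (assumed >= 1).
    """
    article_count = len(pageviews)
    total_views = sum(pageviews.values())

    # Assumption: rank 1 is the most-viewed article; ties are broken arbitrarily.
    ordered = sorted(pageviews, key=pageviews.get, reverse=True)
    rank = {title: position + 1 for position, title in enumerate(ordered)}

    features = {}
    for title, views in pageviews.items():
        features[title] = {
            # "rank of the article divided by the total article count"
            "normalized_rank": rank[title] / article_count,
            # What the documentation reportedly had: pageviews / article count
            "views_per_article": views / article_count,
            # Pageviews normalized by total pageviews in the language
            "view_share": views / total_views,
            # The separate log-of-pageviews feature
            "log_views": math.log(views),
        }
    return features


# Hypothetical counts, for illustration only.
features = pageview_features({"A": 1000, "B": 100, "C": 10, "D": 1})
for title, values in features.items():
    print(title, values)
```

With these made-up counts the articles sort in the same order under every feature, but the values are spaced very differently (normalized_rank is evenly spaced, views_per_article is not), which is consistent with the point at 22:31:59 that a model may weight the feature differently.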