[14:53:53] <leila>	 hey dsaez. did you see tizianop's email? it seems the results are not improved by the better dataset (which is surprising to me).
[14:54:50] <dsaez>	 leila: yep, I'm not sure why. I also don't think that AUC is a good metric
[14:55:20] <dsaez>	 I would like to see some examples
[14:55:35] <leila>	 examples of recommendations, dsaez?
[14:55:42] <dsaez>	 yep
[14:55:55] <leila>	 yeah. Tiziano is working on that one.
[14:57:18] <leila>	 dsaez: in the meantime, something we discussed yesterday that would be good to look into at some point: let's represent each article as a vector of sections. Find other articles that are close to a given article (with a measure such as cosine similarity). If we cannot find such articles, the data is basically too sparse for factorization.
[14:58:05] <dsaez>	 ok, I'm now uploading the notebook of my previous experiments. I didn't want to push in that direction, but the results look better ...
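A minimal sketch of the check leila describes, assuming each article is a binary vector over sections (1 if the article has the section, else missing). The article titles, section names, and the helpers `cosine` and `nearest` are hypothetical and only illustrate the idea; for binary vectors stored as sets, cosine similarity reduces to the overlap divided by the square root of the product of the set sizes.

```python
from math import sqrt

# Hypothetical toy data: article title -> set of section headings it contains.
articles = {
    "Article A": {"History", "Geography", "Economy"},
    "Article B": {"History", "Geography", "Demographics"},
    "Article C": {"Career", "Discography"},
}

def cosine(a, b):
    """Cosine similarity between two binary section vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def nearest(title, data, k=5):
    """Return up to k (other_title, similarity) pairs, most similar first."""
    target = data[title]
    scored = [
        (other, cosine(target, sections))
        for other, sections in data.items()
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

neighbours = nearest("Article A", articles)
```

If most articles have no neighbour with a similarity clearly above zero, that would support the concern that the data is too sparse for factorization.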
[14:58:24] <leila>	 dsaez: better than what?
[14:58:37] <dsaez>	 than these results
[14:58:44] <leila>	 with what measure?
[15:00:31] * leila steps to a meeting with miriam
[15:00:33] <dsaez>	 prediction accuracy, on the task that I've defined, which is different.
[15:00:41] <leila>	 yeah
[15:01:09] <dsaez>	 but, at least, if you look at some examples, the recommendations are reasonable
[15:01:24] <leila>	 let's review them briefly in standup?
[15:02:44] <dsaez>	 sure
[15:11:48] <tizianop>	 dsaez, I did not have time to explore the results in detail. The first thing I noticed is that the system tends to recommend a small set of sections everywhere :/
[15:12:01] <dsaez>	 i see
[15:12:32] <dsaez>	 I'm still not sure that the task is correctly defined
[15:13:15] <dsaez>	 I know that I've asked before, but can you remind me exactly what is in the matrix, and how we evaluate the results?
[15:13:53] <dsaez>	 (I'm still thinking that we are learning and evaluating on the wrong sets)
[15:16:09] <dsaez>	 tizianop, so, the matrix is categories vs sections, true?
[15:16:25] <tizianop>	 no, articles vs sections
[15:17:24] <tizianop>	 for categories vs. section we have to define how to represent the "rating"
[15:17:49] <tizianop>	 raw count, probability or something mixed
[15:18:04] <dsaez>	 ok  ok...
[15:18:14] <dsaez>	 I see
[15:19:05] <dsaez>	 so, the task will be given an article, see which are similar in terms of sections and try to predict what ... ?
[15:19:10] <tizianop>	 this is why I mentioned that maybe the problem is in using ALS where the rating is only 1 (or missing)
[15:20:36] <tizianop>	 given one article return the list of recommended sections
[15:22:11] <tizianop>	 the format of the dataset I shared is: article_title: [sorted top10 sections #1, #2...]
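One rough way to quantify the "small set of sections everywhere" observation, assuming the shared dataset has been loaded into a dict mapping each article title to its ranked list of recommended sections. The toy data and the helper `section_coverage` are hypothetical; the real file would need its own parsing step.

```python
from collections import Counter

# Hypothetical recommendations: article title -> ranked list of sections.
recommendations = {
    "Article A": ["History", "References", "Geography"],
    "Article B": ["History", "References", "Career"],
    "Article C": ["History", "References", "Economy"],
}

def section_coverage(recs):
    """Fraction of articles whose recommendation list contains each section."""
    counts = Counter()
    for sections in recs.values():
        counts.update(set(sections))  # count each section once per article
    n = len(recs)
    return {section: count / n for section, count in counts.items()}

coverage = section_coverage(recommendations)

# Sections recommended for every article carry no article-specific signal.
ubiquitous = sorted(s for s, frac in coverage.items() if frac == 1.0)
```

A handful of sections with coverage near 1.0, out of thousands of candidates, would suggest the model is mostly returning globally popular sections rather than article-specific ones.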
[15:26:40] <dsaez>	 ok, so AUC here will be ...?
[16:07:34] <dsaez>	 tizianop2, tizianop, my mistake:  7466 sections
[16:07:38] <dsaez>	 is that right?
[16:08:42] <tizianop2>	 yes, 7466 sections and 34562 articles
[16:09:08] <dsaez>	 good, I was checking another dataset
[18:21:35] <leila>	 dsaez: did you and Tiziano figure out the source of the discrepancy between the number of columns in your data and his?
[18:22:06] <dsaez>	 yep
[18:22:42] <leila>	 can you add one line after line 30 at https://etherpad.wikimedia.org/p/stubsExpansion saying what the problem was?
[18:24:41] <leila>	 thanks dsaez
[18:25:37] <dsaez>	 np
[23:35:31] <dsaez>	 anyone here working with the mwdb python library on the stats machines?