[14:00:22] hall1467, to-do for me is to try to get you a dataset from hive that contains the ([project, page_id] --> view_rate)
[14:00:31] I'm going to try to get that started today.
[14:01:21] Okay, anything I can do to facilitate the process?
[14:02:18] halfak: I'm working in a semi-related area right now in needing an entity->pageviews mapping over a ~6 month period
[14:02:44] * halfak is in next meeting.
[14:02:54] hall1467, can you find out how much overlap we have with schana?
[14:02:58] I'm currently just using the last month from the dumps
[14:02:59] https://github.com/schana/wikimedia-utils/blob/master/get_sitelink_pageviews.py
[14:03:28] but I'm trying to integrate it into a spark job
[14:03:37] https://github.com/schana/recommendation-translation
[14:11:20] schana: halfak and I are also interested in entity-pageviews. Perhaps we're doing something similar? Maybe we could touch base?
[14:13:05] sure, hall1467. I think the difference is I'm interested in the entity views per sitelink, instead of a full aggregation
[14:29:30] schana: Okay, are you looking at how often a given entity's sitelinks are viewed on client wikis? Is that another way to put it?
[15:42:32] yes, hall1467
[15:43:22] specifically, section 2.2 of https://arxiv.org/pdf/1604.03235v1.pdf
[15:44:00] as described in the "Page views" feature
[16:45:46] schana: Okay, thanks for clarifying!
[16:56:45] Quarry: Explain command forces Quarry to keep running endlessly - https://phabricator.wikimedia.org/T155808#3324705 (yuvipanda) a:yuvipanda>None
[18:59:12] Quarry: Explain command forces Quarry to keep running endlessly - https://phabricator.wikimedia.org/T155808#3326586 (Soni) Just wanted to point out that 4 months later, the queries are still running.
[19:26:06] hmmm… why did I assume that the pageview API counted redirect views?
[19:33:42] Nettrom, because that would be totally useful ;)
[19:34:01] The hive tables apparently have the right page_id
[19:34:07] Oh I should check that :/
[19:53:05] Nettrom, halfak: see also https://phabricator.wikimedia.org/T121912
[20:01:34] Quarry: Explain command forces Quarry to keep running endlessly - https://phabricator.wikimedia.org/T155808#3330162 (Nemo_bis) I think what the interface says isn't necessarily true.
[20:33:51] HaeB: thanks, came across that task today while working on our view rate task (https://phabricator.wikimedia.org/T162933 btw)
[20:38:19] Nettrom: cool - i can see other uses for that too, in particular i started thinking a while ago about building a little gadget that shows view rates next to each page in the recent changes and watchlist pages
[20:38:49] because as a patroller i would often like to be able to focus attention on high impact diffs
[20:40:35] i talked to amir at the wikimania hackathon last year about possibly reusing the ORES coloring code for those two pages, but didn't get to implement it further (and indeed the lack of a good endpoint was part of the impediments too)
[20:41:08] Nettrom: do you have plans to make importance scores available to patrollers in this way?
[20:49:18] HaeB: that use case was not on my list, but I can always add it
[20:50:00] Nettrom: it would definitely scratch an itch of mine ;) (as patroller)
[20:50:15] started https://meta.wikimedia.org/wiki/Research:Automated_classification_of_article_importance/Importance_API_draft today, so I’ll add it there (pinging halfak too, since he wants in on this)
[20:54:20] Nice. I'd like to start proposing an API
[20:54:30] *schema of request and response
[20:54:33] I'll get to that soon.
[20:54:45] Nettrom, can you make a task for completing this proposal?
[20:54:55] sure, I’ll create that
[21:54:51] 2015-me is in stunned disbelief
[21:56:27] harej, por que
[21:56:33] also where did you get the time machine?