[14:04:56] mgerlach: regarding the tf-idf for link anchor text challenge, I was told that the cirrus dumps are occasionally incorporated into Hive (example data in these dumps: https://en.wikipedia.org/wiki/Analytics?action=cirrusdump ) but it doesn't look to have the tokenized text like I thought it did. i was also pointed towards the CloudElastic endpoints as a potential way of gathering this data (# of articles where a given token appears),
[14:04:57] but couldn't really figure out how to work with them: https://wikitech.wikimedia.org/wiki/CloudElastic
[14:12:38] isaacj: thanks for following up (just following up on the thread in analytics-channel). I might go to search-office hours with this question
[14:13:27] :thumbs up: i haven't done their office hours but have been using the product analytics office hours and it's great for this sort of thing