[16:45:00] Hi Hi! who knows about clickstream things? :D
[16:45:06] halfak: ? :D
[16:45:23] Hey! What do you need to know?
[16:46:05] Well, I was just poking it a bit with pandas data frame and what not, but then didnt get any data and now I wonder if it is simply because the clickstreams data only exists for main namespace pages?
[16:47:14] I was essentially doing df_botpages = df[df['prev'] == 'Wikipedia:Bots']
[16:47:31] But then I quickly did "cat clickstream-enwiki-2018-05.tsv |grep Wikipedia:Bots" and got nothing.... :P
[16:49:40] Interesting. I'm not sure if non-main pages were excluded. But I know that pages that get low amounts of traffic are excluded for privacy reasons.
[16:50:06] aaaaah
[16:50:11] https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Data_Preparation
[16:50:14] maybe I should just go make this query in hadoop directly then
[16:50:15] Aha! Mainspace only
[16:50:27] "an article in the main namespace -> the article title"
[16:50:31] Aaah yes, I failed to spot that so far
[16:50:33] thankkkks! :)
[16:50:37] no problem!
[16:53:47] miriam: whenever you get a chance and in case you haven't done it, yet, please update the EN:VP thread that you have turned off the data collection for CitationUsage. :)
[16:54:35] leila: sure
[16:54:42] thanks.
[17:09:43] leila: the discussion on village pump was archived, I updated the meta page instead
[21:51:29] miriam: it's also fine, I think, to update the archived one, for archive happiness.
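
Note on the clickstream question above (16:46 to 16:50): a minimal sketch of the two checks discussed there, the exact match on `prev` and the grep-style search for a namespace prefix. It uses only Python's standard library rather than pandas. The four-column layout (`prev`, `curr`, `type`, `n`) is an assumption about the TSV dump, the sample rows are invented for illustration, and the helper names `rows_with_prev` and `has_namespace_prefix` are hypothetical.

```python
import csv
import io

# Hypothetical sample in the assumed four-column layout (prev, curr, type, n).
# These rows are invented for illustration; they are not real clickstream data.
SAMPLE_TSV = (
    "other-search\tInternet_bot\texternal\t120\n"
    "Internet_bot\tWeb_crawler\tlink\t45\n"
    "Main_Page\tInternet_bot\tlink\t12\n"
)


def rows_with_prev(tsv_file, title):
    """Return all rows whose first field (assumed to be 'prev') equals title."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row and row[0] == title]


def has_namespace_prefix(tsv_file, prefix):
    """Return True if any prev or curr title starts with the given prefix."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return any(
        row[0].startswith(prefix) or row[1].startswith(prefix)
        for row in reader
        if len(row) >= 2
    )


if __name__ == "__main__":
    # Equivalent of df[df['prev'] == 'Wikipedia:Bots'] on the sample rows.
    print(rows_with_prev(io.StringIO(SAMPLE_TSV), "Wikipedia:Bots"))
    # Equivalent of grepping for the namespace prefix in either title column.
    print(has_namespace_prefix(io.StringIO(SAMPLE_TSV), "Wikipedia:"))
```

To run this against a real dump, the `io.StringIO(SAMPLE_TSV)` argument could be replaced by a file opened with `open(path, newline="", encoding="utf-8")`. If the prefix check finds nothing for `Wikipedia:`, that fits the "Mainspace only" reading of the Data Preparation section linked above, though low-traffic filtering could also remove individual pairs.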