[07:50:20] leila: looking at where papers on that topic that I've found interesting in recent years have been presented, I see NSDI (the most frequent one by far), OSDI, CoNEXT, SOSP, ATC. A lot of the industry ones (e.g. from Google or Akamai) aren't presented and are only submitted to journals
[14:58:57] Hi there, we just launched https://wiki-atlas.org , check the instructions here: https://youtu.be/UlqxOvZ_LMk
[15:17:22] dsaez: could we have OAuth sign-in via Wikimedia?
[15:20:14] dsaez: no valid SSL cert, and using a Gmail account for alerts?
[15:35:36] Hi RhinosF1, thanks for your constructive feedback. About the SSL cert: it's an EFF certbot certificate, and it seems to be valid on at least the machines we have tried; can you tell me which user agent is saying that the certificate is not valid? About Gmail alerts: yes, we will try to change to self-hosted email, but we had some stability issues, so for the launch we moved to Gmail.
[16:07:42] dsaez: great. I was on Chrome for iOS.
[17:27:20] ppl, I'm going for lunch, I'll join for the second half of the office hours
[17:28:27] Yes, office hours will be starting in 30 mins
[18:01:07] Hey everyone, welcome to office hours for the next hour.
[18:01:17] o/
[18:01:18] If you would like to ask a question or discuss a particular topic, simply type in the chat. We will do our best to answer as quickly as possible (I will try to relay questions to specific members of our teams), but everyone else please feel free to chime in as you see fit.
[18:01:30] One brief request before we start: if you are planning to answer a particular question, we encourage you to signal this by using the “/me” statement, e.g. by typing
[18:01:30] “/me is working on a response to XYZ”, which will appear as
[18:01:36] * mgerlach is working on a response to XYZ
[18:01:52] Sometimes crafting a response takes a while, so this is helpful not only for the person asking (their question is not being ignored) but also for others, to avoid duplicate responses
[18:02:06] So, let's go...
[18:02:26] o/ I'll be in and out. If someone has a question that I can answer but won't be on IRC later, feel free to just email me at isaac@wikimedia.org too
[18:04:28] Hi Analytics/Research friends! I have a question about the Pageviews data. How is it determined which sites to collect pageview data for? Is there a process for adding a site?
[18:06:11] milimetric: maybe knows best about this?
[18:07:17] or maybe joal? ^
[18:09:04] apaskulin: reading the documentation at https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews, which pages are not included in the pageviews data?
[18:10:34] is there a section on that? I didn't see one
[18:11:31] I couldn't find anything either. Do you have an example?
[18:11:38] https://tools.wmflabs.org/siteviews/
[18:11:56] you have siteviews and also pageviews
[18:13:28] djellel: can you clarify? What's the difference? (is it related to redirects?)
[18:13:38] Mostly I was wondering if there was an established process for requesting that a new site be added to the pageview data. I can follow up with the people that were suggested :) thanks!
[18:14:01] nope, pageviews is per page, per project.
[18:14:16] siteviews is an aggregation for a given project.
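The per-page vs. per-project distinction above corresponds to two endpoints of the Wikimedia Analytics Query Service (AQS) REST API documented at the wikitech page linked earlier. A minimal sketch in Python, assuming the requests library; the project, page title, date ranges, and User-Agent string here are illustrative placeholders, not values from the discussion:

    import requests

    # Base path for pageview metrics in the AQS REST API
    API = "https://wikimedia.org/api/rest_v1/metrics/pageviews"
    # AQS asks clients to identify themselves; this UA string is a placeholder
    HEADERS = {"User-Agent": "pageviews-example/0.1 (contact: example@example.org)"}

    # Pageviews: daily view counts for a single page on a single project
    per_article = requests.get(
        f"{API}/per-article/en.wikipedia/all-access/all-agents/Dog/daily/20200401/20200430",
        headers=HEADERS,
    ).json()

    # Siteviews: daily view counts aggregated over the whole project
    # (note the aggregate endpoint takes hourly YYYYMMDDHH timestamps)
    aggregate = requests.get(
        f"{API}/aggregate/en.wikipedia/all-access/all-agents/daily/2020040100/2020043000",
        headers=HEADERS,
    ).json()

    for item in per_article["items"]:
        print(item["timestamp"], item["views"])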
[18:17:15] apaskulin: you're not talking about the social media traffic report, are you?
[18:18:15] hello
[18:18:28] no, I was thinking of https://tools.wmflabs.org/pageviews
[18:18:44] ah, got it
[18:19:20] dsaez: do you know if any pages are excluded from pageviews data?
[18:19:51] not really, do you have examples?
[18:20:07] apaskulin, can you give an example?
[18:21:46] my question was more about the process for adding a new site to the pageviews data than about individual pages. So for example, if there was a dogs.wikimedia.org site, could I request that it be added to the pageviews data? (assuming it was Very Important Dog-related Data)
[18:23:25] one could argue that all dog-related data is very important ;)
[18:23:33] apaskulin, I think all of them are added by default...
[18:23:34] (it's a good question)
[18:23:54] ok, awesome! thanks!
[18:25:07] yes, my understanding was that as long as the page is part of one of the wiki projects, it will automatically be in the pageviews
[18:25:13] sorry, here
[18:25:27] o/
[18:25:30] apaskulin / rest: they're not added by default; here's roughly how it works
[18:25:47] there's a whitelist: https://github.com/wikimedia/analytics-refinery/blob/master/static_data/pageview/whitelist/whitelist.tsv
[18:27:13] we get an alarm whenever we identify a pageview not on the list (according to the pageview definition at https://meta.wikimedia.org/wiki/Research:Page_view)
[18:27:33] that's when we usually add it to the whitelist, or leave it off and make an exception if it's not a domain we want to track
[18:29:09] you can always request that we add a wiki, and I think adding new wikis to the whitelist is documented somewhere on mediawiki.org but I forget where
[18:29:58] apaskulin: but in short, just ping us on #wikimedia-analytics if you're not sure
[18:30:13] perfect! thanks, milimetric!
[18:30:26] hope that helps; ping dsaez and mgerlach just FYI
[18:30:51] thanks milimetric, didn't know that
[18:30:52] thanks milimetric!
[18:31:17] np, that's why I'm here (or, you know, always absent chasing crazy children around, but here in spirit :))
[18:32:31] cool milimetric
[18:33:19] thanks for the question apaskulin (sorry it took me so long to understand)
[18:33:40] I'm experimenting with KDE, sometimes I lose the window that I'm looking for :)
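The alarm flow milimetric describes can be pictured with a small sketch. This is not the refinery's actual implementation (that lives in the analytics-refinery codebase linked above), and the exact column layout of whitelist.tsv is assumed here, so treat the parsing as approximate; it only illustrates the "pageview seen for a project not on the whitelist" trigger:

    import csv

    def load_whitelist(path="whitelist.tsv"):
        """Collect every value appearing in the whitelist.
        Assumes a tab-separated file; the real whitelist.tsv's
        column layout may differ, so this parsing is approximate."""
        values = set()
        with open(path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                values.update(cell.strip() for cell in row if cell.strip())
        return values

    def check_pageviews(observed_projects, whitelist):
        """Flag projects seen in pageview traffic but missing from the
        whitelist -- the condition that raises the alarm described above."""
        unknown = set(observed_projects) - whitelist
        for project in sorted(unknown):
            print(f"ALARM: pageviews seen for unlisted project: {project}")
        return unknown

    # Hypothetical usage: dogs.wikimedia is not on the list, so it is flagged
    wl = load_whitelist()
    check_pageviews(["en.wikipedia", "dogs.wikimedia"], wl)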
[18:44:38] 15 min left for any open topics
[18:46:00] * iflorez is working on editor activity bins for GLOW editors that received grants during the GLOW contest. Where is a good place to look at previous editor activity bins work?
[18:48:13] in October Isaac brought a similar question to the research group meeting ("How would I verify whether a distribution of editors is representative?"). Any insights/feedback/links on verifying distributions or delineating the various bins would be helpful.
[18:49:09] if J-Mo_ is still around (on his last day), he might know?
[18:49:34] or halAFK ^
[18:49:56] use the one, five, and 100 edits-per-month thresholds?
[18:51:27] assuming you're looking for representativeness in terms of activity level, rather than some other criterion (geolocation, gender, registration date, etc.)
[18:51:50] yes, representativeness in terms of activity level, and in this case monthly is also helpful
[18:52:34] though, for India, during a period of high activity as was seen during GLOW, those thresholds are low...
[18:55:50] there are also more sophisticated techniques for setting a control group, like https://en.wikipedia.org/wiki/Propensity_score_matching , but others are better equipped than I am to determine whether that's the best method for your use case
[18:57:18] yeah, when I was asking the question, I think the challenge was an efficient way to build a dataset of # of editors binned by activity level for a given month
[18:58:52] isaacj: is it simply that no one has done this yet, or is there a specific challenge?
[18:58:55] ty J-Mo.
[18:58:55] The link is helpful for thinking about GLOW year 2, and the one, five, and 100 edits-per-month thresholds rec is helpful context.
[18:59:43] iflorez: I think there should be some document on meta:Research about definitions and metrics for editors, but I can't find it now
[19:00:09] will follow up if I find something
[19:00:12] ty!
[19:02:34] we are over time, but does anyone still want to add something?
[19:02:42] mgerlach: not sure. Because the editor data is there, I know it's possible to build this dataset. Not sure if there's an easy way though
[19:03:59] iflorez: I think this is what I had in mind: https://meta.wikimedia.org/wiki/Research:Metrics
[19:05:20] thanks everyone for coming by today
[19:05:23] ty!
[19:06:54] next office hours will be a month from now (probably 2020-05-27); hope to see you then
[19:19:49] iflorez: this was for a different analysis, but this code might actually help with computing the metrics you're interested in. It'll create a dictionary of editors + edit count for any given month based on the history dumps. You'd just have to adjust it to apply the bins you're interested in. https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover
[19:21:36] thank you, Isaac
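Following up on the thread above: once you have a per-month dictionary of editors and edit counts (as produced by the editor-turnover code isaacj links), applying the one/five/100 edits-per-month thresholds J-Mo suggested is a small extra step. A minimal sketch; the bin labels, exact boundaries, and sample data are illustrative, not an established Wikimedia standard:

    from collections import Counter

    # J-Mo's suggested thresholds: 1, 5, and 100 edits per month.
    # Boundaries and labels here are illustrative assumptions.
    BINS = [
        (1, 4, "1-4 edits"),
        (5, 99, "5-99 edits"),
        (100, float("inf"), "100+ edits"),
    ]

    def bin_editors(edit_counts):
        """Map {editor: edits_this_month} to counts of editors per
        activity bin, e.g. applied to the per-month dict produced by
        the editor-turnover scripts linked above."""
        binned = Counter()
        for editor, n_edits in edit_counts.items():
            for low, high, label in BINS:
                if low <= n_edits <= high:
                    binned[label] += 1
                    break
        return binned

    # Hypothetical month of data
    print(bin_editors({"editor_a": 2, "editor_b": 37, "editor_c": 512}))
    # Counter({'1-4 edits': 1, '5-99 edits': 1, '100+ edits': 1})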