[00:50:10] kevinator, do you have extra powers in analytics-internal?
[00:50:14] ;p
[01:10:42] Ironholds, are you still around?
[01:48:57] leila, I am now
[01:51:25] what's up?
[02:14:13] evening DarTar
[03:41:03] evening halfak!
[03:41:15] and evening tnegrin :)
[03:52:23] o/
[03:52:42] ^ Ironholds
[03:53:21] how goes?
[03:54:35] halfak, say, do you have any spare brainpower at the moment?
[03:54:57] Sure. What's up?
[03:55:56] So, I'm trying to think of decent ways to visualise, through box plots, the distribution of session[lengths/counts/pages-in-sessions] with fingerprinting versus eventlogging IDs.
[03:56:22] The problem is the outliers, so my solution has been to log the values (we know these sorts of things follow log-normal distributions) - but it makes the scale pretty hard to understand.
[03:56:37] you wouldn't happen to have any voodoo magic for this kind of situation, would you? Generating axes that make sense to humans, with underlying logged values.
[03:57:09] ggplot?
[03:57:36] indeedy
[03:57:48] some magic with breaks or waivers?
[03:57:54] wait, hangon. Isn't there a dedicated log10 scale...
[03:58:49] there is!
[03:58:53] halfak, thank you for the rubber ducking :D
[03:59:01] https://github.com/halfak/Activity-sessions-research/blob/master/R/clustering.R#L83-L97
[03:59:03] relatedly the results for EL versus fingerprinting are draaaastically off.
[03:59:12] (which, you know, we knew from the paper-work. But I've got it visualised now)
[03:59:41] Are you using the MS dataset?
[04:00:09] yup!
[04:00:19] Toby asked me to generate some kind of baseline for desktop and the mobile web
[04:00:27] For both fingerprint and token?
[04:00:29] I decided to take the opportunity to demonstrate the necessity of a UUID
[04:00:46] for the token, but I can generate fingerprints from the same data, and thus go "and here's what happens if we don't have UUIDs"
[04:33:13] Deskana|Away, ping when you return
[14:07:42] Hey hey science people. :)
[15:24:39] morning
[15:24:44] o/
[15:28:45] halfak, you may be able to help with this question
[15:28:51] Wussup?
[15:29:00] the norwegian wiki is complaining that according to Special:Statistics, 5% of their articles just..vanished, in a week in December.
[15:29:16] So, I've checked the delete logs. Nothing weird.
[15:29:21] I've checked the move logs. Nothing weird.
[15:29:25] Where else would I check?
[15:29:41] Huh. Does anyone have an example of a missing article?
[15:29:48] Or is it a stat that changed?
[15:30:16] the latter. Special:Statistics {{NUMBEROFARTICLES}}
[15:30:24] this is kind of my point, really
[15:30:48] Gotcha. A change in the stats code seems most likely.
[15:30:54] if 5% of articles vanished and nobody /noticed/...
[15:30:55] * Ironholds nods
[15:31:01] ooh, I'll check the deploy logs. Good idea!
[15:31:04] :)
[15:31:21] * halfak is just bouncing your ideas back at you.
[15:31:37] heh
[15:33:42] morning tnegrin :)
[15:46:25] HEY
[15:46:27] hey
[15:47:42] heh
[15:47:45] halfak, re your email: yep!
[15:48:03] I'm registering different memory addresses for test <- list("foo"); names(test) <- "bar" and names(test) <- "baz"
[15:48:19] so it isn't even a "this didn't have a names attribute before so we need memory for that, which means expanding"
[15:48:28] even if it has a names attribute, of the same length as the new value: copies.
[15:48:56] WTF R
[15:48:58] WTF
[15:49:05] still not the silliest thing.
[15:49:10] Agreed.
[15:49:15] tnegrin, on that note, I almost have your desktop-versus-mobile session analysis done.
[15:49:21] whee
[15:49:25] and I generated fingerprints and worked out the distribution for the fingerprints, too
[15:49:38] because that way we can see how inaccurate fingerprints are, and whether we can use them or not (and if so, what metrics we can use them for)
[15:49:41] will you have it done by 11?
[15:49:45] :)
[15:49:47] absolutely
[15:49:49] really?
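Editor's sketch, not part of the log: the 03:56 to 03:58 exchange asks for axes that read in original units while the plotted values are logged. In ggplot2 that is what the dedicated log10 scales (scale_x_log10() / scale_y_log10()) provide. The Python below only illustrates the underlying arithmetic; the function name and the sample values are hypothetical.

```python
import math

def log10_breaks(values):
    """Return (position, label) pairs for a log10 axis.

    Positions are on the logged scale (where the boxes are drawn);
    labels are the same points expressed in the original units.
    """
    lo = math.floor(math.log10(min(values)))
    hi = math.ceil(math.log10(max(values)))
    return [(p, f"{10 ** p:g}") for p in range(lo, hi + 1)]

# Made-up session lengths in seconds, spanning several orders of magnitude.
session_lengths = [3, 12, 45, 180, 950, 7200]
breaks = log10_breaks(session_lengths)

for position, label in breaks:
    print(f"tick at log10 = {position}  ->  label {label} s")
```

The boxes are still computed on the logged values, so the outliers stay compressed; only the tick labels change.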
[15:49:53] I mean, I actually have the pretty visualisations done now
[15:50:05] I just need to generate the table of values.
[15:50:08] can I see them? maybe to use in metrics meeting
[15:50:13] totally! Let me grab them.
[15:51:15] tnegrin, YGM. number-of-sessions sent, just as a demonstration.
[15:51:20] both EventLogging UUIDs and fingerprints
[15:52:27] no comprende — I’d like to see mobile v desktop sessions by method
[15:52:36] box plots or table?
[15:52:46] table
[15:52:53] yep, that's what I'm working on :)
[15:54:18] excellent
[15:54:46] Ironholds, I think we need to dig into what's going on with the data before we make strong statements.
[15:54:53] yeah — totally
[15:54:54] re. fingerprints vs. uuid
[15:55:17] but I’m really interested in getting a read on the session distribution comparison bt desktop and mobile
[15:55:35] it looks like there is a visual difference
[15:56:28] I'll generate pretty density plots of those at some point too.
[15:56:57] I was working on this until midnight, though, so may be a bit slow off the bat today.
[15:58:56] tnegrin, with caveats about data sanity:
[15:59:08] looks like mobile has about 20% more sessions
[15:59:15] (using the eventlogging UUIDs)
[15:59:34] but fewer pages/session?
[16:01:57] not got the raw counts generated there yet. Will let you know.
[16:02:06] This is excluding one-event users but not excluding one-event sessions. I've still got to implement that (well, send out an email to the thread. And then implement it.)
[16:03:30] ok — cool
[16:12:14] ooh, this is fascinating
[16:12:21] tnegrin, got it. Writing to TSV and will stick in google drive.
[16:12:24] halfak, will link to!
[16:12:28] interesting?
[16:12:44] (i’ll wait for the data :)
[16:15:26] :)
[16:16:31] tnegrin, YGM
[16:16:48] I’m feeling old — what does that mean?
[16:17:02] you've got mail!
[16:17:04] to summarise:
[16:17:09] *Mobile has around 20% more sessions
[16:17:20] *Desktop has more pageviews per session, and longer sessions, but;
[16:17:22] haha — I think that was a move
[16:17:25] movie
[16:17:27] *Mobile has a tighter fit of session lengths.
[16:17:37] it was! It was TERRIBLE. I remember seeing it when I was 9 or 10 at boarding school.
[16:18:36] for me, a crucial lesson is that fingerprinting overrepresents session length and pages by ~20% - presumably, grouping multiple users together.
[16:18:45] So a UUID is definitely a superior way of doing it
[16:18:46] it would not be my choice for 10 year old boys for sure
[16:19:05] also, halfak, I generated some really interesting density curves for you
[16:19:10] look at https://github.com/Ironholds/SessionDelta/blob/master/Output/desktop_session_count_density.png
[16:20:27] so…if this data is correct, every user that moves to mobile means fewer page views
[16:21:38] fewer pageviews per session
[16:21:38] but more sessions
[16:21:40] hmn. I could generate the distributions of "overall pageviews per user"
[16:21:45] if that would be helpful?
[16:21:49] for mobile versus desktop
[16:22:11] Ironholds, (1) we expect session counts to be different and (2) this plot is not taking advantage of the kernel at all.
[16:22:35] taking advantage of the kernel?
[16:22:50] and agreed, I just found the rise/fall/smaller rise/fall/even-smaller-rise pattern aesthetically pleasing
[16:23:09] In a density plot, a kernel is the strategy for applying the uncertainty of an observation appropriately.
[16:23:24] In this case, geom_density() uses a normal kernel by default.
[16:23:52] Here, we can see clear normal distributions around 1 session, 2 sessions, 3 sessions, etc.
[16:24:03] I don’t see the increase in number of sessions
[16:24:10] While the normals should be overlapping.
[16:24:27] argh. even though it's using log values /me headdesks
[16:25:00] halfak, hmn. What kernel would you use?
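Editor's sketch, not part of the log: the 16:18:36 message guesses that fingerprinting inflates per-user figures because it groups several users together. The toy example below shows that mechanism only. It assumes a fingerprint is the (IP, user agent) pair, which the log does not state, and every value in it is made up.

```python
from collections import Counter

# Hypothetical pageview events. Users "a" and "b" share an IP and user agent,
# so a fingerprint cannot tell them apart; a UUID can.
events = [
    {"uuid": "a", "ip": "10.0.0.1", "ua": "Firefox"},
    {"uuid": "a", "ip": "10.0.0.1", "ua": "Firefox"},
    {"uuid": "b", "ip": "10.0.0.1", "ua": "Firefox"},
    {"uuid": "b", "ip": "10.0.0.1", "ua": "Firefox"},
    {"uuid": "c", "ip": "10.0.0.2", "ua": "Chrome"},
]

def pages_per_id(events, key):
    """Count pageviews per identifier, where `key` extracts the identifier."""
    return Counter(key(e) for e in events)

def mean_pages(counts):
    """Average number of pageviews per distinct identifier."""
    return sum(counts.values()) / len(counts)

by_uuid = pages_per_id(events, lambda e: e["uuid"])
by_fingerprint = pages_per_id(events, lambda e: (e["ip"], e["ua"]))

print(len(by_uuid), "ids by UUID, mean pages", round(mean_pages(by_uuid), 2))
print(len(by_fingerprint), "ids by fingerprint, mean pages", mean_pages(by_fingerprint))
```

Merging users lowers the number of distinct identifiers and raises the pages attributed to each, which is the direction of the bias described in the log. The size of the effect here is arbitrary.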
[16:25:05] tnegrin, see the box I've highlighted
[16:25:29] it's not a tremendous rise, but it is ~15-20%.
[16:25:41] I think the normal kernel is fine -- it's just that the discrete nature of the observations is causing a problem.
[16:26:00] You could try adjusting the width, but I don't think that is the right strategy.
[16:26:13] ok but it doesn’t balance out
[16:27:30] not nicely, no :(
[16:27:52] which means we need to dig into the edit-attempt/fundraising-banner experiment as a next step on this, I think.
[16:29:29] halfak, hmn. We should definitely build a standalone toolkit for this as part of the session definition work :(. Darnit, 12 December. Show up so I have free time already.
[16:49:02] +1 Ironholds
[16:49:48] and stabbed my finger with an exacto-knife.
[16:50:05] that and two hours of lucid nightmares and I think I can call today "done"
[17:06:29] Ironholds, wat
[17:06:33] Are you crafting things?
[17:06:49] yeah, christmas presents, to be delivered in January
[17:07:01] (there's one for you!)
[17:08:25] \o/
[17:08:43] Say, when are we going to assign secret santas?
[17:08:53] * halfak also has Ironholds' present.
[17:08:55] :D
[17:08:57] when nuria__ and Erik have indicated if they want to do it
[17:09:07] Is it a UMN t-shirt? :D
[17:09:16] Nope. But I could get you one of those.
[17:09:31] I'm good, I was just imagining it with "hint hint" written on it, or something
[17:09:32] Ironholds: i am skipping secret santa
[17:09:49] thank you!
[17:09:52] (that is, for telling me)
[17:10:00] I will assume Erik is as well given his failure to respond to the thread
[17:10:38] I think that's a fair assumption at this point.
[17:11:17] done.
[17:11:31] and now to try and get this logic for session counting to work :?
[17:11:33] *:/
[17:11:43] if we want to exclude one-page sessions things get awkward in C++-land.
[17:12:25] I'm still not clear about why we would want to exclude them.
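Editor's sketch, not part of the log: the 16:22 to 16:26 exchange says a normal kernel draws a separate bump around each integer session count when the observations are discrete. The pure-Python estimator below reproduces that effect on made-up counts. It uses raw counts and hypothetical bandwidths, whereas the plot in the log used logged values. The wider bandwidth is shown only to make the overlap visible; the log doubts that adjusting the width is the right strategy.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a density function: the average of normal curves of width
    `bandwidth`, one centred on each observation."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density

# Made-up sessions-per-user counts: integers only, most users at 1.
counts = [1] * 50 + [2] * 25 + [3] * 12 + [4] * 6

narrow = gaussian_kde(counts, 0.1)  # kernels barely overlap: one bump per integer
wide = gaussian_kde(counts, 0.6)    # kernels overlap: a single smooth decline

for x in (1.0, 1.5, 2.0):
    print(f"x={x}: narrow={narrow(x):.4f}  wide={wide(x):.4f}")
```

With the narrow bandwidth the density collapses between 1 and 2 and rises again at 2, the rise/fall pattern described in the log. With the wider one it falls steadily.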
[17:12:57] We don't seem to be able to accurately model them; it's an ask for the mobile apps implementation.
[17:13:18] I think we should look into this in more detail for the standard implementation.
[19:46:17] Ironholds: What's up?
[19:46:36] Deskana, now resolved!
[19:46:45] it was about the session approach. We've worked it out now, so all is well.
[22:04:07] oh yes, dario is out today
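Editor's sketch, not part of the log: the 17:11 to 17:13 messages discuss counting sessions with one-page sessions optionally excluded. A minimal version of that logic is below. The 1800-second inactivity cutoff, the function names and the timestamps are all assumptions made for illustration; the log does not say which threshold or data layout the real implementation uses.

```python
def sessionize(timestamps, gap=1800):
    """Split one user's event timestamps (in seconds) into sessions.

    A new session starts whenever the time since the previous event
    exceeds `gap` (a hypothetical cutoff, not taken from the log).
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

def count_sessions(sessions, min_pages=1):
    """Count sessions with at least `min_pages` events.

    min_pages=2 drops one-page sessions.
    """
    return sum(1 for s in sessions if len(s) >= min_pages)

# Made-up timestamps for a single user.
events = [0, 60, 300, 5000, 9000, 9100]
sessions = sessionize(events)

print(sessions)
print("all sessions:", count_sessions(sessions))
print("excluding one-page sessions:", count_sessions(sessions, min_pages=2))
```

Keeping the filter as a parameter of the counting step, rather than of the splitting step, lets both figures come from the same sessionization, which makes the effect of the exclusion easy to compare.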