[00:55:14] tnegrin, JFYI; session calculation now takes 200 nanoseconds for 1.2 million users.
[00:55:30] apps are gonna be happy :D
[00:55:32] whee -- I thought for a sec that was the average session
[00:58:08] hahah
[00:58:12] okay, that would NOT make apps happy.
[00:58:21] "so, good news, we can find out we're in trouble really really fast. Uh. Bad news..."
[04:09:58] ack; gotta go to bed. Later all!
[16:49:57] Ironholds, I want to upvote your blog posts.
[16:50:05] halfak, what did I write?
[16:50:14] I was amused about your von Neumann post.
[16:50:23] Has he really been dead for 70 years!?
[16:50:47] close to! 1957
[16:50:55] and guess what I'm seriously coming to halloween as next year?
[16:51:01] zombie in a lab coat with a pocket of cards with RNs on them
[16:51:27] heh. You've got to work in cellular automata in there somehow.
[16:51:48] The "non-von Neumann architecture" that von Neumann invented.
[16:51:58] Just -- not the first one.
[16:53:40] that'd be fun!
[16:54:00] alright, I'm off shooting
[16:54:16] but before I go: halfak, I couldn't sleep and spent some time splitting off the session reconstruction stuff into a standardised library for CRAN
[16:54:42] Good call. This is generally useful stuff.
[16:54:53] I'm pretty sure we could get a Journal of Statistical Software article out of "this is what session reconstruction is, these are the approaches, reference to the paper we already wrote, here are some metrics you compute after you have 'sessions', here is a toolkit"
[16:55:07] you interested? Happy to draft myself and then throw to you for second-author review :)
[16:55:13] +1
[16:55:19] I appreciate that time availability is variable for different people
[16:55:24] We should talk API. Cranky as I am about R's limitations, I think I'd be a good sounding board for design choices.
[16:55:37] (course, if we wanted to be really cool, we'd also write one in Python and release a JSS article that covers both)
[16:55:38] gotcha!
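The log mentions "metrics you compute after you have 'sessions'" without naming any. The sketch below is a minimal illustration, assuming sessions have already been reconstructed as non-empty, sorted lists of event timestamps in seconds. The function name `session_metrics` and the three metrics it reports are hypothetical choices for illustration, not the contents of the library being discussed.

```python
from statistics import mean
from typing import Sequence


def session_metrics(sessions: Sequence[Sequence[float]]) -> dict:
    """Summarise already-reconstructed sessions.

    Hypothetical helper: each session is assumed to be a non-empty,
    sorted list of event timestamps in seconds.
    """
    if not sessions:
        # Avoid calling mean() on an empty input, which would raise.
        return {"session_count": 0, "mean_events": 0.0, "mean_duration": 0.0}
    # Naive duration: last event minus first event. A single-event
    # session therefore counts as 0 seconds, a known limitation.
    durations = [s[-1] - s[0] for s in sessions]
    return {
        "session_count": len(sessions),
        "mean_events": mean(len(s) for s in sessions),
        "mean_duration": mean(durations),
    }
```

For example, `session_metrics([[0, 10, 20], [100], [200, 230]])` reports 3 sessions averaging 2 events each. The zero-length single-event session is one reason real session-length metrics usually need some extra assumption about time spent on the final event.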
[16:55:46] +1 for covering both.
[16:56:02] I'd like to see if a unified API could work.
[16:56:15] We can certainly have tests that both libraries run against.
[16:56:22] absolutely. Yay travis!
[16:57:22] what would a unified API look like?
[16:57:58] Not sure, but it would mean that one can seamlessly move between R and Python and expect similar call structure.
[16:58:26] I think we can manage to do this both the R way and the Python way too.
[16:58:39] yup
[16:58:41] e.g. Python is more likely to have a class with a function call from a ".method()"
[16:58:51] for example, we can't implement generalised streaming in R
[16:58:59] .. R doesn't really do that. But that's OK. We won't be that strict
[16:59:08] but we can totally expose the per-user sessioniser that the underlying "throw in a list of vectors" function uses
[16:59:16] +1
[16:59:22] and that way people can build generator-like frameworks around it
[16:59:34] ("I want to use this but automatically discard any session that meets X standard")
[16:59:47] Next I want to talk about general C and setting Rcpp and Cython stubs that import a general library.
[17:00:00] *stub --> shim
[17:00:01] * Ironholds thinks
[17:00:05] that should actually be really easy.
[17:00:07] :)
[17:00:29] So, fastread is a good example. Instead of having a big Rcpp library in their /src/ folder, they keep it in /inst/
[17:00:44] and then have a very limited pile of code in /src/ that just includes the inst contents.
[17:01:01] We could do a similar thing, which would allow us to distinguish Rcpp-specific/Cython-specific C from generalised C.
[17:01:11] +1
[17:01:12] the C would live in different repos, but it'd be identical.
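The layering proposed earlier in this exchange (a per-user sessioniser, a "throw in a list of vectors" function built on it, and generator-like wrappers that discard unwanted sessions) can be sketched in Python as below. This is a minimal sketch, not the library's code: it assumes the common inactivity-threshold approach, where a gap longer than `threshold` seconds starts a new session, and the one-hour default plus the names `sessionise_user`, `sessionise_all` and `filtered_sessions` are hypothetical, chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Session = List[float]


def sessionise_user(timestamps: Sequence[float], threshold: float = 3600.0) -> List[Session]:
    """Split one user's timestamps (seconds) into sessions.

    A new session starts whenever the gap to the previous event is
    strictly greater than `threshold` (hypothetical one-hour default).
    """
    sessions: List[Session] = []
    for ts in sorted(timestamps):
        if not sessions or ts - sessions[-1][-1] > threshold:
            sessions.append([ts])
        else:
            sessions[-1].append(ts)
    return sessions


def sessionise_all(users: Iterable[Sequence[float]], threshold: float = 3600.0) -> List[List[Session]]:
    """The 'list of vectors' entry point: one list of sessions per user."""
    return [sessionise_user(ts, threshold) for ts in users]


def filtered_sessions(
    users: Iterable[Sequence[float]],
    discard: Callable[[Session], bool],
    threshold: float = 3600.0,
) -> Iterator[Session]:
    """Generator wrapper: lazily yield sessions, skipping any where discard(session) is true."""
    for ts in users:
        for session in sessionise_user(ts, threshold):
            if not discard(session):
                yield session
```

For instance, `list(filtered_sessions([[0, 10, 5000], [1]], discard=lambda s: len(s) < 2))` keeps only `[0, 10]`, dropping the two single-event sessions. An R binding could expose `sessionise_user` and `sessionise_all` directly and leave out the lazy wrapper, matching the point above that R won't do generalised streaming.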
[17:01:21] I've actually deliberately structured the repo to allow this
[17:01:37] example: https://github.com/Ironholds/sessionreconstruct/blob/master/src/session_metrics.h is "all the session metrics"
[17:01:46] but you're actually calling https://github.com/Ironholds/sessionreconstruct/blob/master/src/sessionreconstruct.cpp
[17:01:51] I feel like we should probably touch rings together now or something... "With our powers united, I am captain SCIENCE"
[17:01:59] which consists of all the non-generic conversions and documentation, but nothing else.
[17:02:03] just shims around generalised C++
[17:02:35] I have a curve ball for you though.
[17:02:38] oh?
[17:03:06] So, my session clustering function in python will take an arbitrary value to be associated with any session event.
[17:03:21] yep, thought o'that
[17:03:25] It could be HUGE, it could be an int, it could be None.
[17:03:42] I'm not sure if we can replicate that in R. I mean, maybe a list of lists, instead of a list of vectors, where the first sub-list is always the numeric values?
[17:04:14] I think that an arbitrary vector would be fine.
[17:04:19] we could try. We'd probably need to look into some kind of converter to make it simple to turn, say, df-rows-with-timestamp into timestamps-with-associated-values
[17:04:53] +1 It will look a bit like ddply.
[17:04:55] hmmn. I'll think on it! Gonna go archery shooting today. Will give me some brain time :)
[17:05:12] Hokay! Have fun :)
[17:05:52] will do!
[17:06:02] I'll be back around 4pm minnesota-time if you wanna chat this stuff through :)
[17:09:00] I have to go play Wheelchair Basketball then.
[17:09:13] If anything cool occurs to me, I'll drop you a note.
[17:09:30] * halfak should probably read up on some cython module stuff before then.
[23:01:21] halfak, okie-dokes! I should probably take a look at that as well.
[23:01:39] I think I worked out how to associate metadata with individual sessions, but it ain't gonna be pretty.
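The "curve ball" above is that each event may carry an arbitrary value (huge, an int, or None), plus a converter from timestamped data-frame rows to timestamps-with-associated-values. The following is a minimal Python sketch under stated assumptions: events are `(timestamp, value)` pairs, rows are dict-like records, and sessions split on an inactivity threshold. The names `sessionise_events`, `rows_to_events`, `user_key` and `time_key`, and the one-hour default, are all hypothetical. It is not the clustering function mentioned in the log.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Mapping, Tuple

# (timestamp in seconds, arbitrary payload: int, dict, None, ...)
Event = Tuple[float, Any]


def sessionise_events(events: Iterable[Event], threshold: float = 3600.0) -> List[List[Event]]:
    """Split one user's events into sessions; payloads ride along untouched."""
    sessions: List[List[Event]] = []
    # Sort on the timestamp only, so payloads never need to be comparable
    # (two dicts with equal timestamps would otherwise raise TypeError).
    for ts, value in sorted(events, key=lambda e: e[0]):
        if not sessions or ts - sessions[-1][-1][0] > threshold:
            sessions.append([(ts, value)])
        else:
            sessions[-1].append((ts, value))
    return sessions


def rows_to_events(
    rows: Iterable[Mapping[str, Any]], user_key: str, time_key: str
) -> Dict[Hashable, List[Event]]:
    """Group dict-like rows by user into (timestamp, whole-row) events.

    Roughly the split step of a ddply-style split-apply-combine.
    """
    grouped: Dict[Hashable, List[Event]] = defaultdict(list)
    for row in rows:
        grouped[row[user_key]].append((row[time_key], row))
    return dict(grouped)
```

Each user's event list from `rows_to_events` can be passed straight to `sessionise_events`. In R, the closest analogue would be a list rather than an atomic vector for the payloads, since only a list can hold mixed types and NULLs.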