[00:00:24] pubsez: good question. I think for proof-of-concept there wasn't any attempt to filter Haiti specifically from French (all French views were simply treated as Haitian). So, my follow-up question is whether any of the correlations might just be driven by media attention to diseases / disease outbreaks rather than people with symptoms themselves
[00:00:56] computermacgyver, got it
[00:02:29] computermacgyver, it looks like Reid may have answered that by pointing out the error in the cholera model.
[00:02:38] Happy to ask anyway. He might have more to say.
[00:03:08] halfak: Yeah, he's answered it; no need to ask. thanks.
[00:03:16] kk
[00:03:51] It's surprising how difficult it is to predict the past.
[00:05:02] Hard to even know the offline present with Internet-only. That was the major criticism of Google Flu Trends in the Science article, and in fairness to Google they are reworking it to use CDC data as an input as well now (http://googleresearch.blogspot.jp/2014/10/google-flu-trends-gets-brand-new-engine.html)
[00:05:33] *with Internet-only data
[00:08:30] https://github.com/reidpr/quac
[00:08:45] ^ source code for the system
[00:09:33] Could you go into more detail on how you are evaluating predictive accuracy of your models?
[00:10:05] yeah, that's my question too ewulczyn
[00:10:08] training test
[00:10:26] I only saw reports of goodness of fit.
[00:10:37] yep
[00:10:56] yes, we should ask this. If Aaron can't squeeze it in now, after office asks questions
[00:11:18] leila, you should jump in for the next question :)
[00:11:35] Also, picking the top 10 correlated articles before splitting into train and test sets can lead to very misleading estimates of predictive power
[00:11:36] okay, halfak! it's hard when I don't see the office. ;-)
[00:12:01] yes, ewulczyn. will bring it up.
[00:13:43] dengue is mosquito-borne.....was big news in Japan when there was an outbreak in a Tokyo park :-)
[00:15:39] thanks!
[00:16:06] thanks
[00:16:24] thanks, everyone!
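A minimal sketch of the 00:11:35 concern about choosing the top correlated articles before the train/test split. Everything here is an assumption made for illustration: the toy numbers, the feature names `steady` and `late`, and the helper names `pearson` and `top_k_features`. It is not the method used in the system discussed above. It only shows that ranking features on all rows can pick a feature whose apparent strength comes from the held-out rows, while ranking on training rows alone does not.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def top_k_features(features, target, rows, k):
    """Names of the k features with the largest absolute correlation to the
    target, computed only over the given row indices."""
    scores = {}
    for name, values in features.items():
        scores[name] = abs(pearson([values[i] for i in rows],
                                   [target[i] for i in rows]))
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical toy data: rows 0-5 are training weeks, rows 6-7 are held out.
target = [1, 2, 3, 4, 5, 6, 7, 8]
features = {
    "steady": [1, 2, 3, 4, 5, 6, 0, 0],   # tracks the target in training weeks only
    "late": [2, 1, 2, 1, 2, 1, 9, 10],    # correlates mainly through held-out weeks
}
train_rows = list(range(6))
all_rows = list(range(8))

leaky_pick = top_k_features(features, target, all_rows, k=1)    # selection sees the test rows
clean_pick = top_k_features(features, target, train_rows, k=1)  # selection sees training rows only
```

The two picks differ on this toy data, which is the mechanism behind the "misleading estimates" worry: any accuracy measured on rows 6-7 after the leaky selection has already been informed by those rows.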
[00:16:48] If you want to contact Reid, you can find his email address here: http://cnls.lanl.gov/External/people/Reid_Priedhorsky.php
[10:30:44] quiddity, I have EIGHTY FOUR UNREAD MESSAGES in my inbox because of your midnight bout of "make infinitesimally minor modifications to all of the things"
[10:31:00] if you're going to go through Phabricator unsubscribing people from everything Flow related, please add me to the list next time.
[12:03:07] halfak, ping when you wake
[14:17:07] o/ Ironholds
[14:17:30] heyoo
[14:17:48] so, I was thinking of working on the session reconstruction stuff today (currently waiting on people for the pageviews UDFs). Thoughts?
[14:21:04] Hmm... Not sure what thoughts I might have.
[14:25:18] Ironholds, ^
[14:25:33] I might not be understanding what you are referring to.
[14:25:43] reconstructr stuff?
[14:27:37] ah, no, sorry for the vagueness
[14:27:39] the documentation on meta
[14:27:50] at the moment it's sort of sharded; activity sessions is the most complete thing but it's fairly technical
[14:28:43] +1 I don't have substantial time to contribute to writing, but I'll outline with you. :)
[14:28:49] I'd like to work out how we reconcile all of the docs and structure the tree so that there's "here is a human-friendly introduction to why we are measuring this, what we measure, and how we do it. Here are pointers to particular use cases/backstories/etc"
[14:28:53] awesome!
[14:28:59] I'll open an etherpad and we can briefly hack
[14:29:16] https://etherpad.wikimedia.org/p/bangarang
[14:31:42] * halfak feels inspired to draw some pictures
[14:36:33] yay!
[14:38:43] Ironholds, in the nav-oriented heuristics, do they use an inactivity threshold?
[14:38:57] * halfak wonders about page loads that span substantial time, but have a link.
[14:40:21] they don't!
[14:40:23] * Ironholds jazz hands
[14:40:59] or at least, if they do, none of their papers do. But honestly I find Spiliopoulou's summary of those heuristics to be divergent from the papers she's citing
[14:41:08] this is not a critique of Spiliopoulou, this is a critique of her sources' ability to actually write.
[14:41:23] Digging into her sources was one of the least pleasant hours of my recent life.
[14:44:46] ewww.
[14:46:54] Seriously. WTF is up with nav. oriented?
[14:47:11] Any decent website will have links back to the homepage from any other page.
[14:47:22] And most people will *start* a session from the home page.
[14:47:47] That means that most intuitive inter-session gaps will be covered by the link to the home page.
[14:47:49] AHHH
[14:51:27] halfak, yup
[14:51:30] it's impressive, in a way
[14:51:39] halfak, you know the really stupid thing?
[14:51:51] the complex navigation-oriented strategy goes: wait, people backtrack.
[14:52:19] so it's within the same session not only if it's got a referer chain, but also if any of the pages in the referer chain link to the page.
[14:52:31] it's kind of an achievement to come up with a strategy in the late 90s that doesn't understand webrings
[14:58:37] nice diagram, dude!
[14:58:39] * halfak sighs
[14:58:41] woot:)
[15:04:45] https://commons.wikimedia.org/wiki/File:Time_vs._Navigation_orientation_(Session_reconstruction).svg
[15:06:34] Ironholds, ^
[15:06:45] yay!
[15:06:47] I've got to move on to other stuff, but I'll come back to this when I can.
[15:07:12] Feel free to boldly clean up the graphic.
[15:07:22] e.g. I'm worried that the legend is too big and bold.
[15:07:29] Not sure what to do about that yet.
[15:10:23] eh, it'll be fine :)
[15:20:58] I had a little bit of muse, so I set it free here: https://etherpad.wikimedia.org/p/bangarang
[15:21:35] neat!
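A rough sketch of the contrast raised at 14:38-14:47 between time-oriented and navigation-oriented session reconstruction. The function names, the `(timestamp, page, referer)` event shape, and the one-hour threshold are all assumptions chosen for illustration, and the navigation rule is a simplified reading of the heuristic (an event joins the current session when its referer is a page already in that session), not the exact rule from the cited papers.

```python
def sessions_by_time(timestamps, threshold=3600):
    """Time-oriented: a gap longer than `threshold` seconds starts a new session.
    The 3600-second default is an illustrative assumption, not a recommended value."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def sessions_by_referer(events):
    """Navigation-oriented (simplified): events are (timestamp, page, referer)
    tuples sorted by time. An event stays in the current session when its
    referer is a page already seen in that session, however long the gap."""
    sessions = []
    for ts, page, referer in events:
        if sessions and referer in {p for _, p, _ in sessions[-1]}:
            sessions[-1].append((ts, page, referer))
        else:
            sessions.append([(ts, page, referer)])
    return sessions


# Hypothetical visit: two quick page loads, then a return via the home page a day later.
events = [
    (0, "Home", None),
    (60, "A", "Home"),
    (90000, "B", "Home"),
]

by_time = sessions_by_time([ts for ts, _, _ in events])
by_nav = sessions_by_referer(events)
```

On this toy input the time-oriented split yields two sessions while the navigation rule yields one, because the day-later load still arrives through a link from the home page. That is the failure mode complained about at 14:47.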
[15:21:50] hmn
[15:22:07] so, do you envision a single "activity sessions" page, with a human-readable summary followed by a deep dive for each section
[15:22:22] or "activity sessions" containing human-readable summaries of each section, and then pointers to distinct pages for the technical commentary?
[15:22:45] The former
[15:23:00] I think that technical commentary about our implementation should be elsewhere.
[15:23:20] But we can still get into the details of time-oriented heuristics on the activity sessions page.
[15:23:34] hmn. Okay. I worry about it making it difficult to get through.
[15:23:50] Why would you be reading this page?
[15:24:10] I'm Lila and Toby has given me this page as an example of the awesome work the team has been doing
[15:24:19] or, I'm Howie and I'm trying to grasp session reconstruction approaches
[15:24:35] just speaking for myself, if I'm given a single page I'm likely to read linearly, so I'll have to confront the deep-dive into sectionA before hitting the summary for sectionB
[15:24:44] What I'm proposing should be good for the latter. The former, I don't think Lila should see any page like this.
[15:24:58] Summary for section B is in the lead.
[15:25:04] ohhh, gotcha
[15:25:09] so, big-ass lead of summaries, and then deep-dives
[15:25:35] that makes sense; I was imagining a structure that looked like SectionA{HumanReadableSummary,DetailedStuff},SectionB{HumanReadableSummary,DetailedStuff}... and getting worried
[15:25:37] Well, you get 1-2 sentences per big section to give you a gist.
[15:25:40] yup
[15:26:03] in that case I'll start with the lede!
[15:26:10] Though I don't think we should have a big section of definitions right away.
[15:26:14] So order might not match.
[15:26:51] Let me try to restate that. Lead should start with our best definition. The section digging into different definitions should appear 3rd at best.
[15:27:47] totally
[15:28:09] Cool :)
[15:28:21] * halfak runs off to ha- all the doops.
[15:28:39] haha
[15:44:28] halfak, when you've got some time, sent you a link to a derivative image I created as a top-line illustration of dividing activity into sessions
[15:44:43] intent is to use it in the new lede
[15:45:05] Looks solid.
[15:45:16] Maybe crop it a bit
[15:45:19] May I?
[15:45:40] sure!
[15:46:12] {{done}}
[15:46:14] I like it.
[15:46:25] Good call on changing "page view" to "action"
[15:49:25] np!
[15:49:32] * halfak moves a few hundred GB of enwiki data to HDFS
[15:49:35] I want to generalise it to edits too; the examples will also be generalised, where possible.
[15:50:39] +1
[15:53:33] also, halfak, the example I brought up in my presentation about sessions and temporal analysis? I totally want to do that. In fact, I don't even need anything special to do so.
[15:53:43] Do we have any datasets of edits, handcoded by quality?
[15:53:52] preferably by IPs?
[15:54:01] We do, but it is small and only for newly registered editors.
[15:54:10] However, I have some bomb-ass metrics for edit quality.
[15:54:13] :)
[15:54:45] I can process ~20k revisions overnight with the API and some clever querying.
[15:54:54] oohhhrelly
[15:54:54] Soon I'll have the whole thing processed.
[15:54:59] halfak, I have an idea.
[15:55:01] But in the meantime, it's easy to do a random sample.
[15:55:22] What if we were to take {user_id,ip,rev_id} tuples from the checkuser table
[15:55:28] run the revisions through your classifier
[15:55:31] localise the timestamps
[15:55:39] and look at how edit quality varies depending on when it's made? :)
[15:55:58] there was a fascinating study that found judges are more lenient immediately after lunch
[15:56:08] I want to find out if users screw up more at 10pm, or 12pm, or whatever.
[15:56:15] We won't know the timezone of the judges.
[15:56:21] why not?
[15:56:31] They won't necessarily file their judgements
[15:56:37] e.g. revert or not revert
[15:56:51] oh, sure. But we can make a quality assessment of the edit.
[15:56:57] Indeed.
[15:57:09] the judges example was more a "time of day has an impact on how well you're thinking"
[15:57:15] although it'd be fascinating to look at rollback/delete actions, too
[15:57:17] I suppose we could approximate the central timezone for patrollers too.
[15:57:42] I bet most patrollers are geographically similar.
[15:57:44] well, if we're talking patrol actions, they're in the cu_changes table so we don't actually have to approximate
[15:57:50] log actions go in there as well as edits
[15:57:56] obviously we'd only have 3 months' worth of data, though
[15:59:22] Should be plenty
[15:59:31] Results will reflect recent WP stuffs
[16:00:36] * Ironholds cries
[16:00:43] all I want is 3 months off to write ALL OF THE PAPERS
[16:00:48] Is that too much to ask the universe for?
[16:02:37] heh. One day... IMO you get the most value out of researchers when they are chasing the things they find the most interesting.
[16:03:00] So, it would make sense to give you 3 months *on* to write ALL OF THE PAPERS
[16:03:23] * YuviPanda is off on a goose-ish chase today, bringing HHVM to toollabs because I felt like it.
[16:03:36] need one of these once every few weeks to prevent burning out or boring out
[16:03:42] YuviPanda, makes sense if people are going to do MW dev work there.
[16:04:02] Oh wait. "toollabs"
[16:04:02] halfak: no, but wiki(m|p)edia depends a lot on tools hosted there.
[16:04:04] I still want y'all to hit up Boston in April-ish
[16:04:13] and speeding them up would definitely help
[16:04:20] I figure we rent a space, bring in Adam and Nate, spend a day throwing ideas for papers around
[16:04:21] +1 YuviPanda
[16:04:24] four days hacking, done
[16:04:29] also it’s a fun problem and would let me clean up some architecture there
[16:05:07] Ironholds, sounds like a fine idea to me.
[16:05:16] YuviPanda, I'm curious how messy things are.
[16:05:51] halfak: less messy than before (web tools used to be one big apache for everyone (people kept killing each other), then lighty + apache (with NFS for routing, eugh))
[16:06:04] Is it gum-and-tape or more robust-but-hard-to-work-with?
[16:06:18] was gum and tape, now more like industrial grade glue plus a little bit of tape
[16:06:39] but enough tape that we hardly have outages anymore in toollabs itself
[16:06:43] which wasn’t the case a year ago
[16:07:24] halfak, how did you change the size of the google drawing?
[16:07:30] building one to describe bounce rate
[16:07:48] Ironholds, there's some diagonal lines in the lower right when you put your mouse there.
[16:07:52] aha
[16:07:53] Grab & grad.
[16:08:05] *drag
[16:08:08] yikes
[16:08:52] that sounds almost like the punchline to a joke
[16:08:59] or an idiom
[16:09:02] YuviPanda, gotcha. Thanks. :)
[16:09:03] that's an idiom I want to introduce
[16:09:16] to describe people who got a PhD based on work they'd already done in the private sector
[16:09:18] "smash and grad"
[16:09:25] instead of "smash and grab"
[16:55:22] * Ironholds appears in a flash of smoke
[16:55:36] I AM THE GOD OF HELLFIRE, AND I BRING YOU...TRANSPARENT CONVERSATIONS ABOUT EXPLANATORY DIAGRAMS!
[16:55:38] So, the arrow is supposed to represent the "exit"
[16:55:44] :P
[16:55:47] I just got to make a Dead Ringers reference /and/ an Arthur Brown reference
[16:55:49] this is a good day
[16:56:08] that makes sense. In that case we should add an arrow to the non-bouncing example too
[16:56:18] I mean, they must leave at some point
[16:56:20] +1
[16:56:28] Maybe we could use a time line?
[16:56:47] Meh. Not sure. It'd get cluttery.
[16:56:54] I'll leave it to you :)
[16:58:01] I AM THE GOD OF HELLFIRE <---- I have that on vinyl, for some reason. (from the 90s)
[16:58:31] haha
[16:58:59] o/ quiddity
[16:59:01] :)
[16:59:09] hallo :)
[16:59:23] You still working on Flow stuffs?
[16:59:35] always!
[16:59:40] til the end of time!
[17:00:03] I just started digging into the data behind Flow :) I'm working out some methodology stuff for the Mentoring IEG project.
[17:00:22] gmorning lzia
[17:00:22] ah, good stuff.
[17:00:33] good mornin' halfak. :-)
[17:00:36] halfak, see current version; thoughts?
[17:00:41] quiddity, there's a page curation bug on enwiki for you
[17:00:51] hey Ironholds.
[17:00:55] hey leila :)
[17:01:05] Ironholds, errr, wha? ain't nothing to do with me!
[17:01:12] Ironholds, I like the attachment of the arrow to the action
[17:01:24] But I think we're not representing order well.
[17:01:43] quiddity, whose team has legacy support for page curation?
[17:01:45] :D
[17:01:59] * quiddity denies everything
[17:02:00] halfak, hrm. Timeline-based approach, then?
[17:02:11] quiddity, nice try.
[17:02:25] Maybe just taking advantage of top-to-bottom or left to right.
[17:02:36] I wonder if we can get away without a line.
[17:02:43] I think we can
[17:02:45] we just need to..hangon
[17:04:12] halfak, see the doc?
[17:04:19] Yeah.
[17:04:22] I think that works.
[17:04:29] yay!
[17:06:17] I'm conflicted about the arrows I added between actions.
[17:06:25] I like em!
[17:06:26] If you don't like 'em plz delete
[17:06:27] kk
[17:06:40] alrighty, we has another diagram!
[17:06:44] tnegrin, you'll like this.
[17:06:58] I'm spending the morning working on the session methodology documentation
[17:07:15] and halfak and I have come up with a standardised design language for visually representing the different approaches and the calculated metrics
[17:29:51] come on people!
[17:29:53] * Ironholds sings
[17:30:19] it's time to STAND-UP, THEY'RE PLAYING MY SONG, THE BUTTERFLIES FLY AWAY
[18:14:48] halfak, do you recall if one of the new machines was intended to have a public IP/ability to host some kind of service?
[18:22:40] also, sent you a new diagram!
[19:02:06] Ironholds, I think that's a good idea.
[19:02:36] I don't think I have been CC'd on the new machines discussion though.
[19:02:40] aww
[19:32:20] halfak, https://trello.com/c/ATHjMADJ
[19:37:09] halfak, we finished new ideas
[19:37:16] let's move to the next one?
[19:37:25] +1
[19:37:28] and shall we remove all the members from new ideas?
[19:37:40] I did it for the top ones, I can do for the remainder if you agree
[19:37:42] Sure.
[19:37:44] okay
[19:37:52] then choose a new column and let me know
[19:37:54] I'll take a second pass to subscribe.
[19:38:16] halfak, https://trello.com/c/MxMizxkL ?
[19:38:46] Ironholds, that one is done!
[19:38:48] Woops./ I'
[19:38:50] ll move it
[19:39:55] kk
[19:43:32] halfak, https://trello.com/c/Fr9qmdz7
[19:48:28] halfak, https://trello.com/c/sVtBzchz
[19:52:29] halfak there is one in the same swimming lane
[19:52:34] Back-fill page tracking
[21:15:33] whee, bow mounted
[21:22:31] https://fbcdn-sphotos-h-a.akamaihd.net/hphotos-ak-xpa1/v/t34.0-12/10863687_10152956824586255_1835122814_n.jpg?oh=a7f7a4020dd68b25ab3d96c68c2d203b&oe=54974EB5&__gda__=1419177048_95b96102608e269ba183de7330092395
[21:22:40] and all I needed to do was use a guitar mount
[21:22:45] I am THE GREAT ADAPTOR. Or something.
[21:29:20] wb halfak :)
[21:31:00] Hey dude.
[21:31:03] Just got to cafe.
[21:31:12] Hanging out with subbu.
[21:31:40] hey subbu!
[21:31:52] i can corroborate that.
[21:31:54] hey
[21:33:13] say, halfak, I had another idea for session applications
[21:33:32] take a pool of long-term editors and measure changes to session-based metrics over time.
[21:34:03] what happens as newbs become more experienced? Do they get more likely to bunch edits, or to distribute them, or just to edit overall?
[21:41:16] +1 Ironholds. That was my original vision for the sessions paper.
[21:41:33] It just turned out that looking into the regularity was more interesting once we got to it.
[21:41:41] * halfak digs for hypotheses.
[21:41:51] heh
[21:41:56] halfak, I have an additional hypothesis
[21:42:17] https://docs.google.com/document/d/1eZ1i91obf4CDNsEIZlzV1FDKEBSwkkmc12vLpBg_4-Y/edit
[21:42:25] Those are super old
[21:42:25] people stop editing in a session because something takes them away or because they run out of energy. IOW, because editing is taxing and not-fun.
[21:42:43] if we can make editing less-taxing and speedier, which we can benchmark by looking at time-on-event, we can increase edit numbers
[21:43:04] ooh, interesting
[21:43:10] this'd be super-easy to grab, too.
[21:43:28] Like, I'd need...about two hours, to generate session-lengths-over-time-editing
[21:58:27] follow up from today's standup on outcome of mtg re: events that fail validation
[21:59:11] 1. we are going to get researchers access to the right servers so that they can look at logs & quickly see that things are working as expected (events that fail validation)
[21:59:41] oh, while we're here: ggellerman_ I can't find the thread ottomata was on, but he's here
[21:59:53] hi ottomata! Any idea what's happening with the new stat100* machines?
[21:59:58] ah
[22:00:19] no, i was waiting until the vlan move before I pushed it, but i will say that analytics doesn't really have any budget for new nodes atm.
[22:00:19] that doesn't sound like a good noise ;p
[22:00:32] mark and toby would have to work that out
[22:00:34] aha
[22:00:38] hey tnegrin, can we have some money for new machines? Cut my pay or something.
[22:00:48] I happen to know you saved 10% of my salary when I moved.
[22:00:50] i think toby is for paying for them, but i haven't pushed it
[22:01:01] gotcha
[22:01:05] will do so after the holidays
[22:01:35] there were some issues around what machines
[22:01:46] oh yeah. I wanted a gaming machine, y'all wanted a server. hmph.
[22:01:58] eh, we can work it out over the holidays
[22:02:08] 2. longer term we will pipe events that fail validation to some as yet undetermined machines
[22:02:16] I promise not to irrevocably break everything while you're away
[22:02:38] Ironholds: i can't say I fully understand the need......>>>>...>.>.>....
[22:02:52] ottomata, if we want sampled logs, or dumps, or hive, we use stat1002
[22:02:57] stat1002 also hosts all of Erik's perl.
[22:03:03] if we break stat1002, the perl goes away.
[22:03:05] i know you are not going to like me saying this
[22:03:10] but there are 22 nodes at your disposal! :D
[22:03:13] last month we almost broke stat1002 twice. And by we I mean Ellery.
[22:03:17] ottomata, that's a great idea!
[22:03:23] I'll just ssh into one of those and run my code there!
[22:03:26] thanks for the suggestion :)
[22:03:39] * Ironholds goes to check we have R on analytics 1027. That's always been a robust and stable testbed.
[22:03:47] hah
[22:04:26] seriously, though, we should look into distributing tasks like that, but I'm not sure how to do so. I mean, I can just go "plz be installing this package on all the nodes" and then test stuff..
[22:05:06] ottomata: you thought we might be able to move more processing to hadoop
[22:06:40] yes, but I do not want to say that in this chat room, for fear of pitchforks
[22:06:52] naw, I'm fine with that
[22:06:53] this is me!
[22:07:05] all you need to do is get toby to go "Oliver, you need to do X by last Thursday"
[22:07:18] I'll vanish for a week and resurface unshaven with an R package that allows you to do X in three functions.
[22:07:35] If yer volunteering, https://github.com/RevolutionAnalytics/RHadoop/wiki - boop. Get these installed so I can start experimenting?
[22:30:43] whee, this has been a really productive day
[22:31:02] hey DarTar, can I do a showcase presentation just on session reconstruction?
[22:31:12] demonstrating and defining the metrics we can extract from it?
[22:31:58] I’ll buy that
[22:32:17] we have felipe signed up for January
[22:33:49] DarTar, grand!
[22:34:05] Ironholds: how’s the enwp article coming together?
[22:34:08] DarTar, and how much trouble am I in if I also use it to scientifically prove that automatically-generated padding is accurate?
[22:34:23] you’ll have to respond to history
[22:34:24] pretty well! I just need to expand the use cases. There's no actual scholarly literature on the use cases so I might end up writing one
[22:34:36] but mostly I'm focusing on *drumrolllll*
[22:34:36] https://meta.wikimedia.org/wiki/Research:Activity_session
[22:34:46] human-readable diagrams with a consistent design language!
[22:34:49] high-level summaries!
[22:34:55] w00t
[22:34:56] example use cases!
[22:35:02] DarTar, my thinking is this, right?
[22:35:07] I have nothing to do over the holidays.
[22:35:31] Readership's quarterly goals were (1) PV def in hadoop (2) session methodology, (3) mobile support, (4) UUID approach
[22:35:35] did you manage to hear from anyone at GA, Mixpanel or what not?
[22:35:38] we have (1) and (3), (4) is no longer our problem
[22:35:48] if I can do (2) we get to green-bar that thing at the quarterly review. Boom.
[22:36:01] naw, not reached out yet. Whups :/
[22:36:11] ha ha, yes, I think we did a pretty good job with sessions in the quarter
[22:36:24] what do you mean re: (4)
[22:36:43] related -> https://trello.com/c/NCpPIPol/173-5-how-are-our-users-searching
[22:44:08] that is, the UUID is currently blocked on Kevin/AnalyticsEngineering having capacity to deal with it, no?
[22:45:27] Ironholds: yes
[22:47:02] then: not our problem! Great success.
[22:50:39] DarTar, also, we've decided to call this the Perdurantist Pageviews Definition
[22:50:45] jfyi
[22:52:10] ha ha
[22:52:33] * DarTar picturing 4d wormholes materializing on our request logs
[22:53:04] https://en.wikipedia.org/wiki/Perdurantism#Worm_theorists_and_stage_theorists
[22:53:27] that's not what it means!
[22:53:34] grr. I should've made that joke at Emma.
[22:53:36] She would've got it.
[22:54:09] WAT
[22:54:29] it's a reference to the Ship of Theseus paradox; is something the same thing if, over time, you replace every element of it?
[22:54:58] the Perdurantist answer is that things are best identified as existing both temporally and spatially, and so yes, the ship is still "the ship of theseus" all along, even if you replace it entirely, piece by piece.
[22:55:03] yeah, I know Theseus from my undergrad days, the link I gave you is the one you want to read up on
[22:55:12] aka, the real shit
[22:55:15] I know what a worm theorist is!
[22:55:19] whew
[22:55:45] so are you a worm theorist or a stage theorist?
[22:56:03] stage; it's why I get my tattoos.
[22:56:17] boo, I believe in worms
[22:56:22] Because I die, constantly, in my entirety, and the tattoos are reminders of the person I was and that there were things that mattered to them.
[22:56:34] (this is genuinely my rationale, and my basis. Ask halfak, I drunkenly explained it to him early last year)
[22:56:44] I’ll ask for a full report
[22:56:48] totally
[22:56:49] ;p
[22:56:52] :D
[22:57:30] mind you, you can’t say "the person I was" if you’re a stage theorist
[22:58:27] this has seriously interesting implications for session analysis
[22:58:33] (not joking)
[22:58:41] agreed
[22:59:02] you looked up the wrong literature (log analysis)
[22:59:03] but I don't mention it when discussing session analysis because, and correct me if I'm incorrect here
[22:59:27] I suspect that if listing caveats leads to "whatever, just DO IT ALREADY"
[22:59:51] launching into a discussion of whether a UUID is even valid between events due to the changes in the holder is likely to lead to me being bodily hurled through a 3rd-floor window ;p
[22:59:52] I like the idea of testing tnegrin’s resistance to perdurantism
[23:00:27] you joke, but I would absolutely do a presentation on the impact the definition of identity has on session reconstruction just to troll the shit out of management.
[23:00:35] we could make some money! Have a book open on when various people would walk out.
[23:00:43] And who'd be left at the end.
[23:00:46] * DarTar nods
[23:01:26] mako: I need a couple of mins
[23:04:54] our article on perdurantism is TERRIBLE
[23:04:57] * Ironholds adds to to-do
[23:14:06] DarTar, I just had a horrifying thought.
[23:14:17] mobile reduces pageview counts, inherently. This we know.
[23:14:27] to the tune of 1 PV per unique client per month.
[23:14:45] how many projects have a decline rate because of just..THIS?
[23:29:08] Ironholds, sorry to just log out earlier. I realized I needed to get across town quickly.
[23:29:32] Ironholds, I like the direction you are going with these thoughts re: session behavior.
[23:30:24] The way I think of "activity sessions" is at odds with the run-out-of-energy hypothesis.
[23:31:20] Which is awesome because we can test that.
[23:31:27] yay!
[23:31:38] and no problem :)
[23:33:01] halfak, so, for my January presentation, I'm thinking of it mostly being descriptive, but with one bit of Real Research (tm)
[23:33:09] which I'd like your thoughts on
[23:33:28] I want to show that we can accurately predict padding for the last event in a session, when calculating session length.
[23:34:04] How would we know if we are doing a good job?
[23:34:08] well
[23:34:32] we could artificially truncate sessions to remove the last event, calculate an appropriate threshold from the remaining data points, re-add the last event, calculate THOSE intertimes and see how accurately the calculated value aligns with the Actual Values.
[23:35:17] the only caveat is: it assumes that users go through the same cycle on final_event_of_session as on other events within the session
[23:35:24] but I can't think of a way of justifying padding that doesn't make that assumption
[23:35:54] Gotcha. Yeah. I had the same thought.
[23:36:05] think it'd be viable? I mean, I can grab a dataset and find out!
[23:36:16] Really, we can just use the last intertime in our estimation of the padding.
[23:36:18] I'd also be interested to see which mean produces the more accurate value ;p
[23:36:23] hmn
[23:36:23] That would make sure we estimate it optimally.
[23:36:25] I can try both approaches
[23:36:31] all-intertimes, and last-but-one intertime.
[23:36:59] end up with a 2 by 2 grid of values used and means used.
[23:37:06] or 3 by 2. We could look at per-user thresholds too.
[23:37:19] ...this is sounding like something I should, when I'm done, keep around for That Session Reconstruction Metrics paper.
[23:37:36] Agreed.
[23:37:37] So...
[23:37:55] Would we hypothesize that the last intertime is different than the other intertimes?
[23:37:58] And if so, why?
[23:38:16] I can think of a few scenarios
[23:39:09] so, if we assume an energy-based model (don't worry, I have scenarios for other models), the scenario is "user saw page1, page2, page3, ran out of steam somewhere before page4 (hence its absence). they probably didn't run out of steam at the end of their time consuming page3"
[23:40:07] if we assume a consumption-based model instead, where users persist in browsing until they find the thing they want: presumably on each page the tree goes "open page, read until I encounter [factoid], reach end and continue seeking [factoid], or see a link that looks more relevant to finding [factoid]"
[23:40:47] I'd expect the time values for those to differ; that is, a user is unlikely to find [factoid] at the end, or find the link that looks most relevant at the end
[23:40:52] this is currently groundless hypothesising, mind.
[23:41:21] I like the latter. I can't say I understand the former.
[23:41:24] But under either model, I can see a plausible argument for there being a difference between the last page intertime and the previous ones. Because the last one represents something fundamentally different: finding the thing you wanted or giving up.
[23:41:36] Are you thinking that people will get tired of browsing and browse slower?
[23:41:37] all previous ones are "hunting mode", I guess
[23:41:50] no, more they'll get tired of trying to find the thing and give up entirely
[23:42:06] and that the giving-up is unlikely to happen at the same point as their natural switch to a distinct page.
[23:42:10] Oh I see. So, in that case, we'd expect all intertimes to be similar.
[23:42:13] I can visualise it but not word it ;p
[23:42:22] Yeah. I gotcha.
[23:42:26] aha, all intertimes prior to giving up? Fair point.
[23:42:30] That's a good way of disproving that model.
[23:42:37] So, I want the average time between events for each intertime index in a session.
[23:42:37] and now I'm seeing TWO papers. Bah :p
[23:42:56] And I want to try it starting at the first intertime
[23:43:02] As well as starting at the last.
[23:43:11] If we see a trend, we know something's up.
[23:43:14] yup
[23:43:21] so for January, what I'm imagining is a grid of...
[23:43:22] And we should be able to model better because of it.
[23:44:03] for (per-user threshold, all-intertimes), (per-user threshold, last-intertime), (global-threshold, all-intertimes), (global-threshold, last-intertime)
[23:44:13] the disparity between the resulting value, and the actual intertime value we see when we re-add the last event.
[23:44:45] bah. I could spend a year just studying sessions
[23:44:48] maybe THIS is my PhD.
[23:45:13] "How People Give Up, by Oliver Keyes, PhD.inst();"
[23:49:33] You could extend this work to try and address the engagement/retention pattern.
[23:49:51] I've been playing around with Cox regressions and hazard models in the context of sessions.
[23:49:59] I haven't found much traction though.
[23:50:17] * halfak gets link
[23:50:39] See results here: https://meta.wikimedia.org/wiki/Research:Newcomer_survival_models
[23:52:39] hmmn
[23:53:05] But if we can understand what gets a user to cease or continue a "session"
[23:53:22] We might be able to also understand what gets a user to cease or continue an "engagement"
[23:53:42] * Ironholds strokes stubble
[23:54:03] I mean, that looks like an interesting model. It looks like the Cox model over sessions produces more predictive value than edits.
[23:54:19] I think we have a set of hypotheses about edit session termination, particularly, that we can test interestingly.
[23:54:31] I'll do some thinking this evening about what hypotheses I can think of and how we might go about testing em?
[23:54:44] +1 We could use a cog. psych person. (DarTar?)
[23:55:16] howdy
[23:55:18] DarTar would be awesome to have in on this. In his absence we probably know somebody else.
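One way the held-out-last-event check floated at 23:34-23:44 might be sketched, purely as an illustration. The names `intertimes` and `padding_errors`, the `use` switch, and the toy session are assumptions; the sketch covers only the "all intertimes vs. last remaining intertime" and "which mean" axes of the grid, not the per-user vs. global axis, and it shares the caveat stated at 23:35:17 that the final gap is assumed to behave like the earlier ones.

```python
from statistics import mean, geometric_mean


def intertimes(session):
    """Gaps between consecutive event timestamps in one session."""
    return [b - a for a, b in zip(session, session[1:])]


def padding_errors(sessions, use="all", average=mean):
    """Hold out the final event of each session, predict the final gap from the
    gaps that remain, and return (predicted - actual) per usable session.

    use="all"  -> predict with `average` over every remaining gap
    use="last" -> predict with the last remaining gap only
    Sessions with fewer than three events are skipped: they leave no gap to learn from.
    """
    errors = []
    for session in sessions:
        gaps = intertimes(sorted(session))
        if len(gaps) < 2:
            continue
        known, actual = gaps[:-1], gaps[-1]
        predicted = average(known) if use == "all" else known[-1]
        errors.append(predicted - actual)
    return errors


# Hypothetical sessions (timestamps in seconds); the second is too short to use.
toy_sessions = [[0, 10, 30, 60], [0, 5]]

err_all_mean = padding_errors(toy_sessions, use="all", average=mean)
err_all_geo = padding_errors(toy_sessions, use="all", average=geometric_mean)
err_last = padding_errors(toy_sessions, use="last")
```

Comparing the spread of these error lists across real sessions would show which cell of the grid predicts the held-out gap best. Note that `geometric_mean` needs strictly positive gaps, so events sharing a timestamp would have to be handled first.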
[23:55:20] snap
[23:55:34] halfak, oh, that reminds me, I found something weird I should send you that this hazard graph just reminded me of.
[23:55:43] what’s the tl;dr
[23:55:56] kevinator: will Q2 review be a 30 min meeting?
[23:56:04] Or is that just for the devs
[23:56:14] DarTar, We can test some hypotheses about human behavior by exploring sessions -- and we'll measure things more accurately too.
[23:56:24] leila: 30 minutes for both teams.
[23:56:31] mmm, is that enough?
[23:56:34] is that a follow up from the WWW work
[23:56:36] and we get a better idea of what we should target interventions at to perpetuate sessions, to boot.
[23:56:46] DarTar, mostly it's halfak and I riffing on IRC about my January presentation
[23:56:50] DarTar: 30 min QR for both teams (devs and research) is the deal?
[23:56:51] got it
[23:56:56] leila: it’s the direction coming from the top
[23:56:56] specifically, motivations for terminating a session
[23:57:05] Erik M sent out an email about it on Nov 20
[23:57:17] which email list kevinator?
[23:57:40] kevinator: wow
[23:58:15] sure, that will make our job easy: list of goals [x] done, round of applause, any questions
[23:58:37] Ironholds: interesting
[23:58:47] halfak: did you follow up with Wayne’s contacts?
[23:59:11] Wayne?
[23:59:32] kevinator: which email are you referring to?
[23:59:34] leila: search for “from:erik quarterly review”
[23:59:47] wow, so it’s 30 mins *in total*
[23:59:56] including Toby’s intro