[15:45:35] morning [15:46:18] o/ Ironholds [15:46:35] Presenting on session stuff today! [15:46:45] I'm just polishing the slide deck. [15:46:52] sweet! For WWW? [15:47:02] Yup :) [15:47:11] But today for the showcase :) [15:47:16] Yay! [15:48:02] Soo, latest update on WWW, btw; I will not be making it :/. The only flights available involved spending about 30 hours flying or prepping to fly, due to how it timed (going to paris, then waiting overnight, then flight) and 30 hours on my meds is... not a great idea for kidney functionality. Or my brain not having addictions. [15:48:48] Bummer. Yeah, I'm going to be braving those awful flights. [15:49:10] gooood luck [15:49:14] We could try getting you on VOIP for Q&A [15:49:19] I'm down! [15:49:20] but that rarely goes well. [15:49:27] I'm still happy to try. [15:49:37] I might try to get to Chicago the weekend before/after, because that's driving [15:49:44] but I will make myself available whatever :) [15:50:05] Ironholds, https://docs.google.com/presentation/d/1fXlyIRqEYL0riXKGCmxONyXjIHgD1ee8iioVqVXdTXw/edit#slide=id.g90103b519_0_860 [15:51:17] hahah [15:51:23] (the thanks page) [15:51:33] you use that bucket _everywhere_ :P [15:51:42] :) [15:51:49] I like the bucket. [15:52:07] The bucket of Science has lots of methodologies and ways of thinking inside of it. [15:52:21] And I like to dump it on things -- like actions on a timeline. [15:52:25] Or the editor decline. [15:52:50] yup! [15:53:03] MEGA-STUDY [15:53:45] "You realize you *hadn't died* in the first place." [15:53:56] Hidden wiki-gems [15:54:48] yup! [15:54:59] unrelated, my Harvard admissions officer is called Hillary Officer [15:55:09] Lol [15:55:31] http://www.careercast.com/career-news/could-your-name-predict-your-profession [15:55:37] hah! [15:55:45] by that standard I should break locks fulltime instead of parttime [15:56:30] I should own a plot of land. 
[15:56:56] it's actually doubly-amusing from my end, of course [15:57:00] because Hillary is an academic term [15:57:24] Oh? [15:57:33] https://en.wikipedia.org/wiki/Hilary_term [15:59:38] Greetings, Ironholds, halfak & other humanoids. [15:59:51] hey guillom :). In in a sec :D [16:00:02] Ironholds: I was actually going to ask: do we need to meet today? [16:00:12] ehh [16:00:13] * Ironholds thinks [16:00:27] Ironholds: Maybe just sync on the Wikidata thingy? [16:00:27] my update: VE work is done, pageview work is almost done, not had the time to get to anything else yet, which I am sadface about [16:00:32] yeah, that works [16:00:40] I'll be up in a sec (need to go get caffeine. And pants.) [16:00:40] IRC or hangout? [16:00:44] hangout WFM [16:00:47] ok, moving to the room now [16:22:08] o/ Hi ananthrk. [16:28:25] Hey YuviPanda, what would it take to get a dump of public data from quarry? [16:28:33] Hi Andrew [16:28:39] E.g. who ran a query, when and what was the SQL? [16:28:40] halfak: which data are you thinking of? [16:28:43] halfak: trivial. [16:28:50] Hi Aaron [16:28:53] I can setup a cron too if you want [16:29:04] :) I have a researcher who would really like to dig into it. [16:29:12] Cron would be great. [16:29:12] hi ananthrk. I’m leaving chennai tomorrow :( been running around and prepping to leave, so didn’t get time to meet, sorry [16:29:25] halfak: sweet. file a bug? can I get to it next week or should I do a by-hand-dump now? [16:29:41] YuviPanda, will file bug. No rush. :) [16:29:45] halfak: I can just make an sqldump, there’s no private data there, and the researcher can read it by just importing it to mysql locally [16:30:03] YuviPanda: Yup. Saw your updates and figured you will be busy :) Have a nice trip! [16:30:06] YuviPanda, SQL dump == :( [16:30:09] ananthrk: will do [16:30:17] halfak: well, I can write a simple thing that dumps it in some other format too. nbd. 
[16:30:22] :) [16:30:35] hmm, now that I think about it, I don’t really know what format to put these in. [16:30:42] JSON or TSV/CSV seems to be dominant formats for datasets. [16:30:57] well, it’s a fairly nested dataset [16:31:10] queries, query revisions, and query runs [16:31:23] so TSV sounds complicated [16:31:24] My opinion is to look at all like "events" and include identifiers. [16:31:32] Agreed that TSV won't work. [16:31:54] o/ ananthrk. Got a few minutes to chat about generating a re-curring metric. [16:32:51] hissss CSV [16:34:25] halfak: give me 10 mins. need to finish up something. [16:34:32] OK. Perfect. [16:34:37] * halfak writes YuviPanda a bug [16:34:58] halfak: yeah, I could do JSON but then it probably gets awfully nested. [16:35:13] halfak: also would this involve dumping the *output* as well? we can do that too easily but it might be a bit big :) [16:36:20] Na. I don't think so. Maybe the runtime or row count or something like that. [16:36:38] JSON would not need nesting :) [16:36:48] halfak: yeah, although we don’t capture runtime [16:36:51] we do capture row count [16:36:55] Cool [16:36:58] Good enough. [16:38:24] Quarry: Database dump for analysis - https://phabricator.wikimedia.org/T93907#1149417 (Halfak) NEW [16:40:07] Quarry: Database dump for analysis - https://phabricator.wikimedia.org/T93907#1149427 (yuvipanda) Hmm, so *this* would be somewhat hard to actually produce. The internal DB format is in https://github.com/wikimedia/analytics-quarry-web/blob/master/tables.sql, and I am not sure off the top of my head how to... [16:43:05] halfak: Oh good grief, you mean someone is going to look at all my horribly broken SQL queries in Quarry? [16:43:23] guillom, mwahahaha. [16:43:42] But seriously, "behavioral" researchers rarely look at examples. [16:43:47] * YuviPanda feels really happy with how quarry turned out :) [16:43:50] pfewww [16:43:53] They use code to "look" at your stuffs. [16:44:11] Now, the ethnographers... 
they are another story. [16:44:24] They might choose one of your queries to write their thesis on. [16:44:32] *shivers* [16:44:36] Or a book [16:44:38] ;) [16:44:41] ($I)%! [16:44:45] four days [16:44:48] four days tracking down a bug [16:45:05] and it's an off-by-one error [16:46:03] *fixes, recompiles, tests again* [16:46:06] and now it segfaults! [16:46:08] much improved [16:47:31] YuviPanda: this looks terrific? is this the alternative to me just logging in to tool labs and doing it myself? :P [16:47:41] Ironholds, \o/ new errors [16:47:45] harej: quarry? for SQL queries? totally :) [16:47:45] I love new errors. [16:47:46] s/?/!/ [16:47:55] halfak, yep! best thing [16:47:58] it’s an amazing tool, I’m told [16:48:24] harej: it kills queries after 10minutes though, although I am going to raise that to 20mins soon [16:49:17] halfak: I am back [16:50:12] ananthrk, I'm ahalfaker@wikimedia.org [16:50:16] Want to get on a quick call? [16:51:06] sure..hangout? [16:51:11] Yup [16:51:36] okay..let me send you an invite [16:51:46] halfak: have you put any thoughts into how a "trending topic" metric might work? a thought I had this morning is that i will want to see how a group of articles surges in editing relative to its regular edit volume [16:52:16] I think that Michael Gilbert has a bot for that. [16:52:32] Michael Gilbert needs to reply to his darn email [16:52:43] Woops. Looks like it is Kaldari's bot. https://en.wikipedia.org/wiki/User:HotArticlesBot [16:52:50] oh, halfak, you'll like this: I'm providing technical support to get an instance of Michael Gilbert's hand-coding tool up on labs [16:52:54] (if he's willing to open-source it) [16:53:05] so we'll have a generalised hand-coding tool! \0/ [16:53:15] Hmm... I'm literally making one of those right now. [16:53:25] right, I am familiar with Hot Articles Bot. but that's for trends *within* wikiprojects. what about trends *between* wikiprojects? 
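[editor's note] The "trending topic" idea above (how a group of articles surges in editing relative to its regular edit volume, compared between wikiprojects) could be sketched roughly as follows. This is only an illustration, not HotArticlesBot's method: the function names, the 28-day baseline window, and the add-one smoothing are all assumptions made for the example.

```python
from statistics import mean


def surge_ratio(daily_edits, window=28):
    """Latest day's edit count relative to the mean of the preceding days.

    daily_edits: edit counts for one group of articles, oldest first,
    with the current day last. `window` (hypothetical default) limits
    how many preceding days form the baseline.
    """
    if len(daily_edits) < 2:
        raise ValueError("need at least one baseline day plus the current day")
    *history, today = daily_edits
    baseline = mean(history[-window:])
    # Add-one smoothing (an assumption) keeps very quiet groups from
    # producing undefined or wildly inflated ratios.
    return (today + 1) / (baseline + 1)


def rank_projects(edits_by_project, window=28):
    """Order wikiprojects by how unusual today's activity is for each."""
    scores = {name: surge_ratio(counts, window)
              for name, counts in edits_by_project.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because each project is scored against its own baseline, a small project with a sudden burst can outrank a large project that is merely busy as usual, which is the "between wikiprojects" comparison asked about.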
[16:53:38] Ironholds, https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Coder [16:53:45] "this wikiproject is unusually galvinzed today" [16:53:51] yay [16:54:45] * halfak wonders what Michael's hand-coder looks like [16:55:09] Ironholds, I'm basically generalizing the strategy we used for AFT and bringing the UI within the wiki [16:57:13] cool! [16:57:26] Get J-Mo to send you Michael's stuff; it's optimised for wikis and already built and I am excited :) [16:57:36] * Ironholds sings the "less work for us" song [16:57:43] BTW halfak I've done a few handcoding sessions with the basic JS tool we discussed a few weeks ago: https://en.wikipedia.org/wiki/Wikipedia:VisualEditor/Feedback/Diffs [16:58:16] halfak Ironholds: as soon as Michael responds, I'll let y'all know if it's feasible to port Indicoder to labs. [16:58:45] yay! [16:58:58] it is sunny outside and my sweater is hella-comfy [16:59:03] I proclaim today only mostly terrible [17:15:14] Ironholds: Quick question: In page views referrers, does "unknown" cover both direct visits (without a referrer) and HTTPS traffic (with an unknown referrer)? Or can we distinguish between the two? [17:16:18] ask me on Monday when we have updated data [17:16:53] Ironholds: I was asking in theory, not specifically for our current data :) [17:17:09] gotcha [18:03:10] mornin' halfak et al. [18:03:27] we're waiting for office IT to join us and set up the call for testing, halfak. [18:05:19] Hi lzia. I'll be joining in a moment. [18:05:40] Actually there's nothing to join! [18:05:45] :) Ready when you guys are. [18:05:51] yeah, halfak. waiting for IT [18:06:01] there /has/ to be an easier way for this. [18:06:03] :-\ [18:06:54] and now I need Dario's password. ;-) [18:07:01] lol of course. [18:07:13] Try "DarioIsAWESOMESAUCE" [18:07:29] or MYFRienddaRio [18:07:33] :D [18:07:34] ! [18:07:35] :D [18:07:39] didn't work [18:08:02] so, halfak, while we're waiting: you're going first? 
[18:08:03] For the lurkers, see "My Friend Dario": https://www.youtube.com/watch?v=NOtmWxqN1fY [18:08:10] I'm OK with going first, yes. [18:08:16] okay. cool! [18:08:23] I'm going to go grab food quick. Back in a couple minutes. [18:09:05] k, halfak. [18:13:51] Back and ready when IT is [18:14:40] halfak: IT is in. we will ping you shortly [18:14:45] kk' [18:24:37] Research Showcase Starting in 5 [18:24:39] we don't hear you halfak. :D [18:24:41] hang in there. [18:25:06] Today we will have two presentations: [18:25:06] 1. User Session Identification by Aaron Halfaker [18:25:06] 2. Mining Missing Hyperlinks in Wikipedia by Bob West. [18:25:40] You can follow the talk on Youtube: https://www.youtube.com/watch?v=CgkwLXbALQg&feature=youtu.be [18:26:21] Now we can hear halfak but we've lost the projector :) [18:32:30] weee [18:32:31] Still getting set up folks [18:32:37] Should start any minute [18:34:17] it is very quiet! [18:34:39] This showcase brought to you by another demensiosnonsonsonsss [18:35:30] woooooo [18:35:37] leila is very quiet [18:35:44] hard to hear, for me at least [18:35:49] Are you hearing the reverb? [18:35:52] a lot of feedback still [18:35:53] yes [18:35:55] a while ago [18:35:57] i'm on youtube [18:35:58] OIT working on it [18:36:04] Maybe we could just get started [18:36:04] yup reverb [18:36:04] haha [18:36:11] -.o.- [18:36:20] cool effects pedals dude [18:37:02] halfak: we can't hear you at the moment [18:37:05] leila, just pantomime your research results [18:38:17] whoa [18:39:04] Looks like everything is back to normal [18:39:12] yay halfak [18:39:15] sounds good for me too [18:40:00] wow. This is weird [18:40:15] is good, but now i'm looking only at ashwin? [18:40:25] yes! 
[18:40:29] that sounds good leila [18:40:33] awesome [18:41:48] uhhh [18:41:55] no longer awesome [18:42:16] Someone has a mic on and playing the youtube video [18:42:23] it is good [18:42:29] so we just keep hearing stuff with the youtube live delay [18:42:36] oh yeah, that is what is happening [18:42:37] who is that! [18:42:38] Alright. If SF can work this out, I'm going to just start my presentation :P [18:42:43] I say this is just Dario playing a joke on us [18:42:46] i'm on youtube! but i'm plugged in and only have one window open [18:42:50] i hear the delay too [18:42:54] halfak: chip making a new hangout [18:42:57] haha [18:42:58] and sending the link [18:43:02] FYI: We'll have a new youtube link shortly. [18:43:07] ok [18:43:26] Horray for technology [18:43:47] So, who's ready for some SCIENCE!? [18:44:24] me halfak ;-) [18:44:32] Only if it's dangerous. [18:44:32] :D [18:44:40] but to be honest, I need to stay focused to help here if I can [18:44:42] :-\ [18:45:07] here's the new streaming link: http://youtu.be/PHQqicVoVx4 [18:45:26] thanks halfak [18:45:33] lzia, waiting on invite to the new hangout [18:45:40] yup. on the way [18:45:42] o/ AndyRussG [18:46:02] halfak: hi again [18:46:17] Is the youtube link changing too? [18:46:37] AndyRussG: yes, see topic [18:46:37] The one in the /topic is right [18:46:42] it changed a couple minutes ago [18:46:52] ah cool [18:48:49] here we go [18:49:14] sound is not very clear [18:50:07] *your* sound is clear though, aaron [18:50:46] I'm very sorry everyone for the long delay. we don't know what was going on but there was a serious technical problem on our end. [18:51:08] harej: which sound is not clear? the office's sound? [18:51:22] the sound coming from the speaker's microphone in the office [18:51:45] thanks, harej. communicating to IT. 
[18:54:36] Harej: itjhink you need to mute [18:54:50] I'm not actually on the hangout; just watching on youtube [18:55:15] harej: sorry, i think anyone on hangout not talking needs to mute [18:56:55] "You have been reincarnated.3 [18:57:13] " [18:58:53] hello research channel! I'm helping run AV for the showcase, everyone can hear Aaron well, right? [18:58:57] cndiv: harej mentioned my sound was not clear when I was speaking. [18:58:57] youtube viewers in particular [18:59:06] Aaron has mentioned this, too, cndiv [18:59:08] i hear aaron crystal clear [18:59:31] harej: OK, I'm sticking around to ensure that folks can hear mic users clearly. So sorry about all of this, I delegated. [18:59:49] cndiv: can hear aaron crystal. [18:59:56] YuviPanda: good. [19:00:37] In theory this is the last event like this in this space. It's far, far more complex than it needs to be - as the space was designed pre-hangouts. [19:06:52] halfak: cool, really neat stuff! [19:06:55] Aaron, nice work! [19:07:05] nice! that's great [19:07:06] I liked it very much! [19:07:26] I hope we recorded halfak as i missed the very beginning [19:07:30] if you have questions for Aaron, please send them here. I'll pass it to him. [19:07:36] I would like to run these metrics on arbitrary cohorts of users. Is that possible? [19:07:37] yes, it's on youtube nuria. [19:07:42] audio comments? [19:07:42] +1, as always halfak's presentation had both great content and great delivery [19:07:43] fine now? [19:07:44] omg it suddenly got much better [19:07:45] should be. [19:07:50] YuviPanda: Right, now it's fine? [19:07:54] cndiv: yeah [19:07:56] Do you think that RC patroling using Twinkle/Snuggle (or whatever is used these days) would look more like an operation instead as an action? [19:08:13] All of this crap gets a lot simpler on the 5th floor. There's no second user in the hangout to deal with. [19:08:23] Giovanni__: is this a question for Aaron? [19:08:33] Yes Leila! [19:08:43] got it. 
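[editor's note] For readers following along without the slides, inactivity-based session identification (the subject of Aaron's talk) can be sketched in a few lines. This is not the mediawiki-utilities implementation; the function name is hypothetical, and the one-hour cutoff is taken as an assumption from the threshold mentioned in the discussion.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, cutoff=timedelta(hours=1)):
    """Group one user's action timestamps into sessions.

    A new session starts whenever the gap since the previous action
    exceeds `cutoff` (assumed here to be one hour).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions


actions = [
    datetime(2015, 3, 25, 10, 0),
    datetime(2015, 3, 25, 10, 20),
    datetime(2015, 3, 25, 12, 0),
    datetime(2015, 3, 25, 12, 30),
]
example_sessions = sessionize(actions)
```

Running the same function over each user in a cohort would give per-cohort session counts and lengths, which is the kind of metric asked about for arbitrary groups of users.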
[19:10:23] Also I should mention that I did research on the overall activity lifetime of user accounts on Wikipedia and I found very similar patterns: two strong clusters separated more or less at the same 1-hour threshold [19:12:51] thanks, halfak. :-) [19:12:57] Thanks folks! [19:13:09] leila: halfak: do you the kind of actions (operations like edits in articles or edits in discussions) could influence session time? for instance, sessions with more edits in user spaces could be shorter (i hypothesize). [19:13:30] halfak: <3 [19:13:42] halfak: I would like to run these metrics on arbitrary cohorts of users. Is that possible? [19:13:42] marcmiquel, seems like something we could check. I have some datasets that would help you if you wanted to dig into that. [19:13:45] <3 YuviPanda [19:14:12] harej, I don't have a service for that yet, but it is relatively trivial to generate and I have python code. [19:14:13] Thanks Aaron! Yes I wish I had time to work on that ... :) :/ [19:14:28] harej, https://pythonhosted.org/mediawiki-utilities/lib/sessions.html#mw-lib-sessions [19:14:32] yeh. i'd like to work on this direction, you set a very interesting ground. thanks. [19:14:33] I'll take your Python. I like metrics. Surfacing metrics is one of my product priorities. [19:15:13] leila, I think that the wrong user is selected in the hangout [19:15:32] mm, why halfak? [19:15:46] Oh wait. I was mistaken [19:15:49] Sorry leila [19:15:54] phew! :-) [19:17:17] halfak: I have a few possible questions brewing but I'll have a peek at the paper first :) Is it the 2013 "Using edit sessions to measure participation..." mentioned in the slides? [19:17:55] More recent: http://arxiv.org/abs/1411.2878 [19:18:01] AndyRussG, ^ [19:18:26] brdlbrmpft? Easy! But who the hell is Walter Röblimann? [19:18:41] is this the last presenter, does anyone know? [19:18:46] Yes [19:18:48] Last presenter [19:18:52] cndiv, ^ [19:21:08] Woo! 
Both Bob and I are going to be presenting this work at WWW'15 in Florence! [19:21:37] I'm telling you, halfak. WWW and ICWSM being back to back helps. ;-) [19:21:41] * guillom misses attending scientific conferences. [19:21:56] I'm considering going to ICWSM since I'm going to be in Europe at the time. [19:22:02] guillom: submit a paper for ICWSM wikipedia workshop. [19:22:21] ^ this [19:22:31] Sadly I left my field of research when I joined the WMF :) [19:22:32] makes sense halfak. [19:22:59] guillom: let's talk about it. :-) http://www.icwsm.org/2015/program/workshop/ [19:25:27] "Orange" is an arbitrary destination. Do we know how to do link suggestions for real scenarios where we know people are looking for links? [19:26:23] halfak: cool thanks! [19:26:27] harej: that's the second part Bob will talk about now. that will be based on what people really do without us introducing a target [19:27:33] gotta run for now (child pick-up time)... sounds really cool Bob, I'll listen to the rest later :) [19:27:44] o/ AndyRussG|school [19:30:30] yes, after duplicates [19:30:41] ? [19:30:51] nm [19:31:46] Woo! Triples! I want to release that dataset! [19:32:03] halfak: an even better dataset will be released [19:32:06] that's in the pipeline [19:32:13] :D [19:32:14] Oh? [19:32:20] Better how? [19:33:11] we have talked about the details of releasing all traces, removing any editor data, with certain thresholds, etc. Once the work is completed, we will release the corresponding dataset, with the publication and all the updates. this should happen in 3-4 months. [19:33:49] Oh! So, more than triples. Long traces would categorically be truncated. [19:34:07] Because the longer the trace, the more unusual [19:34:10] yeah. [19:34:55] yes, I'll present it in the research meeting tomorrow or next week so we can gather feedback. with the current thresholds we have put, it seems safe to us to release longer traces. 
[19:35:02] halfak, ^ [19:35:12] Cool :) [19:35:37] if we can manage to send longer traces out without causing privacy issues, that'd be awesome. [19:37:36] need to leave but definitely checking our recording stream [19:37:36] If you have questions for Bob, please share them here. [19:37:46] thanks nuria. [19:39:36] Cool work bob! [19:41:00] +leila: a Q for Bob. If people use heuristic like clicking on the first link, then perhaps not all missing links would warrant a real semantic relatedness. So do you think the method may end up adding too many links? [19:41:56] related to Giovanni__'s question: I wonder if there was any sort of crawler detection / filtering [19:42:34] noted Giovanni__. guillom, you want to ask it in-room? [19:42:50] Giovanni__: Also, I wonder how many were people playing the Philosophy game :) https://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy [19:43:24] leila: sure [19:43:43] Leila: Link following patterns could be ephemeral. E.g. going to "London" and then "Olympics" might only make sense temporarily. [19:43:56] Oh.. Forgot the Q. "What do you think about that?" [19:43:57] ha [19:44:42] Guillom, yes :) I think wikispeedia and thewikigame are exactly inspired on that. [19:44:56] tnx Leila! [19:45:02] halfak: you want to ask after guillom? [19:45:05] Sure [19:45:12] np Giovanni__. [19:45:27] bye yalls [19:45:31] nice presentation! [19:45:39] o/ ottomata [19:45:43] Thanks for coming :) [19:45:45] bye ottomata. [19:46:16] halfak: I think Bob is answering your question now [19:46:34] the things that are temporary, we should look at longer period of time, and see if it's sustained at the very least. [19:46:35] Indeed [19:46:41] +1 [19:46:53] Or we might recommend pruning after the pattern wanes. [19:47:19] like removing the link halfak? [19:47:22] Indeed. [19:47:58] If links are *for* navigation patterns and navigation patterns can change, then presumably we could recommend the removal of links that no one uses. [19:48:02] I see. 
we should think about that. it's good to keep it in mind. We also should figure out how much of this we can eventually do. [19:48:21] yeah, I see what you say. [19:48:26] +1 The work is interesting regardless of the practicalities of taking advantage of the results. [19:49:10] it's a neat work, it's good to see some version of it live some time not long from now. [19:49:22] +1
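[editor's note] The closing exchange (check whether a navigation pattern is sustained over a longer period before adding a link, and consider pruning links that no one uses once a pattern wanes) can be illustrated with a small sketch. It is not Bob's method: the function names, the per-period click counts, and both thresholds are hypothetical choices made for the example.

```python
def is_sustained(period_counts, min_clicks=10, min_periods=4):
    """True if each of the last `min_periods` periods saw at least `min_clicks`.

    period_counts: how often the (source, target) path was followed in
    each period (say, each week), oldest first.
    """
    recent = period_counts[-min_periods:]
    return len(recent) == min_periods and all(c >= min_clicks for c in recent)


def is_prunable(period_counts, idle_periods=4):
    """True if an existing link had no clicks in the last `idle_periods` periods."""
    recent = period_counts[-idle_periods:]
    return len(recent) == idle_periods and all(c == 0 for c in recent)
```

A one-off spike of the "London" then "Olympics" kind, for example weekly counts of `[500, 0, 0, 0]`, would fail `is_sustained`, while a link whose recent periods are all zero would be flagged by `is_prunable` for a human to review.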