[15:16:45] Hey science people!
[15:21:23] Gmorning Ironholds
[15:21:33] morning halfak!
[15:21:35] how goes?
[15:22:02] Not bad. The coffee must flow.
[17:23:26] ottomata: does the VLAN mean you can finally give virtualenv to the researchers? :)
[17:35:06] yeah
[17:35:16] heh
[17:35:21] halfak, so I just encountered the following term
[17:35:24] "yak trace - n. Explanation of the chain of unlikely obstacles that led from your actual goal to the ridiculous thing you're doing right now."
[17:35:35] I feel like a tremendous amount of my time is spent either yak tracing or causing yak traces.
[17:35:49] Ooh. Useful term
[17:35:58] "What are you doing?" "comparing emplace_back to push_back"
[17:36:14] "why?" "A product manager asked me how long people spent reading articles"
[17:39:17] (you should always use push_back, is my conclusion. emplace_back is too compiler-dependent a call.)
[17:41:00] Visual depiction: https://i.imgur.com/t0XHtgJ.gif
[17:41:21] PERFECT.
[17:41:27] (Also, I love that GIF.)
[17:41:35] One day I want to move to a universe where everything is expressed via GIFs
[17:42:13] Ironholds: I've always heard of it as 'yak shave'
[17:42:32] https://en.wiktionary.org/wiki/yak_shaving
[17:42:59] http://www.hanselman.com/blog/YakShavingDefinedIllGetThatDoneAsSoonAsIShaveThisYak.aspx
[17:43:11] also I saw some Yaks in a friend's picture. OMG SO CUTE
[17:44:31] YuviPanda, oh, totally!
[17:44:43] yak shaving == ridiculous thing to solve unlikely obstacles
[17:44:59] yak trace == explanation of the path to yak shaving
[17:44:59] as in, stack trace
[17:45:38] ah, heh. so trace is just... like... a stacktrace
[17:45:40] but... for your yak
[17:46:24] How you got to the yak, anyway.
[17:47:39] halfyak
[17:50:42] I feel like there needs to be a "yak trace" button on phabricator
[17:50:50] it takes the bug you're on and traces the trail back to the epic.
[17:52:12] :P Emufarmers
[17:54:12] alright, all session code working again!
[17:54:12] and I worked out how to rewrite the intertime generation so that it works nicely with data.table
[17:54:13] and you can just dt[,j = list(inter = intertimes(timestamps)), by = "uuid"]
[17:57:02] halfak, can I rubber duck with you for a second?
[17:57:12] meeting ~15 mins
[17:57:18] in or until?
[17:57:28] can talk in 15
[18:13:15] halfak, kk :)
[18:34:04] hi, I'm doing research on references in ptwiki articles using the history dump, and I'm not finding the ref tag before January 2006. Does anyone know whether the reference tag had other notations, or whether it didn't exist before 2006?
[18:36:16] danilo_: halfak might be able to help with wiki archeology
[18:43:47] danilo_, the ref tag wasn't initially part of MediaWiki, iirc, it was part of Extension:Cite, which is from late 2005/early 2006, yep
[18:44:26] Prior to that there weren't really any tags
[18:44:28] see https://en.wikipedia.org/w/index.php?title=Adolf_Hitler&oldid=23190602 for example
[18:45:16] Oh, wait
[18:45:23] danilo_, looks like there was Template:ref from 2005-ish
[18:45:32] so you can look for the {{ref| format
[18:51:34] ok, I was looking for a bug in my script, but it wasn't a bug. Thanks!
[19:18:38] Ironholds, so if I open an enwiki page, and that page has pictures in it, will I see different logs for each of those pictures?
[19:20:14] yup!
[19:20:14] because god hates us all
[19:20:19] but MIME type filtering can exclude that particular example very easily.
[19:20:40] got it. thanks.
[19:20:45] np :)
[19:20:57] what are you looking for/at?
[19:24:00] leila, ^
[19:24:41] looking at webrequest logs for https://meta.wikimedia.org/wiki/Research:Improving_link_coverage
[19:24:50] trying to understand the data with Bob
[19:25:08] ahh
[19:25:21] oh, that's gonna be fun on a bun for you :D
[19:25:28] that reminds me of a really useful function I could write
[19:25:33] A URL decoder
[19:25:37] R has one. It's broken.
[19:25:58] ah! I don't mean to add more work for you, Ironholds. ;-)
[19:26:07] * Ironholds waves away
[19:26:10] it's independently useful
[19:26:14] I've been wanting one for MONTHS
[19:26:32] plus, the R version is not only broken but, more offensively, not vectorised.
[19:28:52] oh sweet, glib has some
[19:34:13] aw, but glib is not standard. Bah.
[19:36:31] Ironholds, do you have the code you use for App pageviews somewhere I can look at?
[19:37:04] https://github.com/Ironholds/WMUtils/blob/master/R/log_sieve.R
[19:37:04] see app_handler
[19:37:40] thanks!
[19:37:51] update:
[19:38:06] after extensive research I have concluded that while R URL decoding is broken, this is at least in part because URL schemes are broken
[19:39:56] But that R's suckage is independent of this.
[19:47:53] I finished my research on references using the history dumps; I put the graph on Commons: https://commons.wikimedia.org/wiki/File:Ptwiki_references_in_articles.png
[19:48:28] danilo_, that's so cool!
[19:48:47] is that references by article, or proportionate to page size, or..?
[19:49:15] percentage of articles that have references
[21:49:03] Ironholds, does WMUtils have code for chunking sessions?
[21:49:16] (you really should have kept the workshop ;-) )
[21:50:25] leila, dividing things into sessions, or calculating information about sessions?
[21:50:36] dividing into sessions
[21:50:49] at the moment, no, but I want to build it.
[21:50:49] I'm going to call it sessionizer just to piss off halfak
[21:51:07] haha. okay, in that case, Bob and I may work on it and pass it to you
[21:51:29] will keep you posted
[21:51:56] Cool! Let me know how you resolve the 3600 problem :D
[21:52:55] 1 hour?
[21:53:28] yup. Specifically, you have a sequence of intertimes to divide into sessions
[21:53:32] values are 3, 9, 3600, 3600.
[21:53:39] 30min
[21:53:41] :D
[21:53:43] okay, 1800
[21:53:51] the point is sequential values that are > threshold
[21:53:56] yeah
[21:54:02] I'll let you know for your next paper
[21:54:02] if you have 3, 9, 3600, 5 that is easy. Two sessions of 3, 9 and 5.
[21:54:03] ;-)
[21:54:21] 3, 9, 3600, easy. Two sessions of 3, 9 and -1 or whatever your stand-in is for "can't work this out".
[21:54:36] it gets more painful with sequences of >thresholds, though.
[21:54:46] I was hoping to rubber-duck with halfak about just this problem so I could build something, but he vanished.
[21:55:07] will let you know what we come up with
[21:55:30] okie :)
[21:55:36] what are you trying to turn into sessions?
[21:55:41] and, what is the data you want out of the end?
[21:55:56] separate question, in terms of dealing with the data for testing: I made a table in the leila database. Can I use Hive to run queries on it?
[22:05:34] leila, that is, a leila database in Hadoop? yep
[22:05:40] just leila.table_name
[22:05:51] but it gets fun with joins - http://blog.ironholds.org/joining-between-databases-in-hive/
[22:07:57] yeah, thanks Ironholds
[23:26:37] Ironholds: do you know if the sequence column is related to compressed storage?
[23:28:24] in hive?
[23:28:34] I think sequence is just "what request number was it on that varnish"
[23:28:57] got it. thanks Ironholds.
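A hypothetical Python sketch of the intertime and sessionization logic discussed in this log. It is not the WMUtils R code and not whatever leila and Bob ended up building. It assumes events arrive as (uuid, timestamp-in-seconds) pairs, groups them per uuid the way the data.table `by = "uuid"` call does, and starts a new session at any intertime strictly greater than the 1800-second (30-minute) threshold agreed on above. The function names (`intertimes`, `intertimes_by_user`, `sessionize`, `session_lengths`) are illustrative. Representing single-event sessions as empty lists, with -1 as their length stand-in, follows the "-1 or whatever your stand-in is" remark but is otherwise an assumption. Measuring session length as the sum of intertimes also ignores time spent on the last page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed cutoff: the 30-minute (1800 s) threshold settled on in the log.
SESSION_THRESHOLD = 1800


def intertimes(timestamps: Iterable[int]) -> List[int]:
    """Hypothetical counterpart of the R intertimes(): gaps between sorted events."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def intertimes_by_user(events: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group (uuid, timestamp) pairs by uuid, like by = "uuid" in data.table."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for uuid, ts in events:
        grouped[uuid].append(ts)
    return {uuid: intertimes(stamps) for uuid, stamps in grouped.items()}


def sessionize(gaps: List[int], threshold: int = SESSION_THRESHOLD) -> List[List[int]]:
    """Split one user's intertimes into sessions.

    Any gap strictly greater than the threshold closes the current session
    and opens a new one. A session holding a single event has no intertimes,
    so it comes back as an empty list. Consecutive over-threshold gaps
    therefore produce consecutive empty sessions.
    """
    sessions: List[List[int]] = [[]]
    for gap in gaps:
        if gap > threshold:
            sessions.append([])
        else:
            sessions[-1].append(gap)
    return sessions


def session_lengths(sessions: List[List[int]], standin: int = -1) -> List[int]:
    """Sum each session's intertimes; single-event sessions get the stand-in."""
    return [sum(gaps) if gaps else standin for gaps in sessions]


if __name__ == "__main__":
    # The three intertime sequences used as examples in the log.
    for gaps in ([3, 9, 3600, 5], [3, 9, 3600], [3, 9, 3600, 3600]):
        sessions = sessionize(gaps)
        print(gaps, "->", sessions, session_lengths(sessions))
```

With the log's examples, 3, 9, 3600, 5 gives `[[3, 9], [5]]`. The sequence 3, 9, 3600, 3600 gives `[[3, 9], [], []]`: the run of over-threshold values yields two single-event sessions rather than being merged or dropped, which is one possible answer to the "sequences of >thresholds" pain point.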