[00:10:08] leila: go for it!
[00:10:15] k, thanks!
[00:19:53] halfak, do you have access to http://dl.acm.org/citation.cfm?id=1378891&dl=ACM&coll=DL&CFID=445638422&CFTOKEN=21986693
[00:27:19] leila, looking
[00:30:56] {{done}}
[00:31:03] thanks, halfak
[00:31:07] * halfak --> evening
[00:31:10] night folks
[00:31:16] g'night
[00:31:21] g'night
[00:31:28] ciao
[00:31:58] leila: I have an ACM DL subscription in case you need papers that are paywalled
[00:32:11] aa! good to know. thanks!
[00:32:26] thought to not bug you now that you got a chance to focus for 15 min
[00:32:27] ;-)
[00:32:54] god, I ended up just filling in interview notes
[00:33:11] yeah!
[00:33:33] I’m going to talk to the recruiters, we badly need these cover letters
[00:33:47] I don’t care how awesome they think the candidates are
[00:33:59] I see. that really sucks. I assumed it wasn't required because you guys said you didn't need it
[16:39:37] halfak: switching to the channel ;)
[16:39:51] kk
[16:40:04] 1) per-product priorities might be relatively clear, but cross-team or cross-product aren’t always explicit
[16:40:33] 2) executives and management should give us guidance on what’s the most important thing we should be focusing on (per leila’s comment)
[16:41:41] One thing I need from Execs is some involvement in the strategy process. I don't know what the heck is going on and people keep talking about it as though I should.
[16:42:24] 3) interruptions, context switch, lack of importance when communicating a new request really affect our ability to make decisions
[16:43:35] 4) individual researchers need to have agency and space for intellectual creativity
[16:44:05] Yay for 4. I wish it was the first thing in the list.
[16:44:28] Because if we don't have creative freedom, we're running at less than half capacity.
[16:44:30] re: strategy process, you’re right, I don’t know why the process is not being communicated more clearly
[16:44:49] It should be open. Are we not wiki people? Why hide it?
[16:45:16] ask the question if you’re on the hangout
[16:45:37] Hmm. in the stream. I'll move to the hangout.
[16:46:18] I think the reason is that the execs want to make sure the discussion doesn’t go in every possible direction, but you should ask them directly ;)
[18:40:05] Hey Ironholds. Can I ask you for a hive query that I bet you'll find super easy (or may have actually completed)?
[18:40:27] I'd like to get a dataset that has for the last month.
[18:40:44] Oh wait. you did already make this.
[18:41:16] Oh wait. No. I thought that the pageviews table had a page identifier.
[18:43:13] halfak:
[18:43:14] maybe
[18:43:16] This query wouldn't have to be "perfect", but I'd appreciate if it benefits from filtering the corner cases you are familiar with.
[18:43:17] regexp_extract(w.uri_path, '^.*/(.+)$', 1)
[18:43:20] simple crappy one
[18:43:25] to get page title from uri_path
[18:44:10] Yeah. I just figured that Ironholds might just have something in his back pocket :)
[18:44:45] Basically, I want a "viewrate" table.
[18:45:10] That would be immensely valuable. :) And I'm trying to generate it in lame ways now :(
[18:46:07] halfak, what do I need?
[18:46:09] aha
[18:46:12] * Ironholds thinks
[18:46:32] yeah, otto's approach is probably the most solid one right now
[18:46:34] The deep appreciation of halfak
[18:46:40] a warning that I'm currently running two long-ass queries
[18:46:51] Bummer. That means I'm just going to process the view log dumps.
[18:46:54] (or long ass-queries, XKCD-style. Either works.)
[19:13:36] oh hey halfak, I added a really basic WHERE year=.... generator to WMUtils
[19:13:56] https://github.com/Ironholds/WMUtils/blob/master/R/hive_range.R not perfect, but it gets the job done
[19:14:13] Is that to do ranges?
[19:15:19] indeedy
[19:31:43] Pageviews are goddamn crazy pants.
[19:32:13] Article: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[canister]]]]]]]]]]]]]]]]]]]]]]]]]]]]] gets 1 view
[19:32:17] WHY?
[19:34:30] * Ironholds blinks
[19:34:32] that's weird. When?
[19:34:48] Oh, there's a ton of page views that look like that.
[19:35:03] and this is why we need pageid=int in x_sane_analytics?
[19:35:09] YES
[19:35:12] So much YES
[19:35:55] * halfak stands on his head and tries to say the incantation that turns pageview nonsense into page_ids.
[19:36:49] "python, oh python. Let us sequence the nonsense and identify the identifiable. Let us account for redirects and never speak of this again."
[19:42:19] You really find some weird stuff here. We have a ton of redirects in enwiki that look like this: https://en.wikipedia.org/w/index.php?title=%22Pat_Keely%22&redirect=no
[19:43:54] We have more redirects than non-redirects.
[19:46:32] huh!
[19:46:39] halfak, pull me a recent example and I'll check zee logs?
[19:46:41] Hey Nettrom, the second page of this PDF doesn't load up for me: http://opensym.org/wsos2013/proceedings/p0202-warncke.pdf
[19:47:08] Ironholds, example of which type of thing?
[19:47:31] redirects that look like ...
[19:49:12] http://quarry.wmflabs.org/query/827
[19:50:14] ohhh!
[19:50:20] I thought you meant we had /unexpected/ pageviews
[19:51:35] Oh yeah! That too. I have some good examples. One sec.
[19:52:07] hey Ironholds, just finished an interview, I’ll be a few minutes late
[19:52:18] sure
[19:54:00] Ironholds, pulling lines from hourly pageview files:
[19:54:00] commons.m en.wikipedia.org/wiki/User:Slambo 1 20
[19:54:09] That should not be a url. it should be a page title.
[19:54:41] I'm confused; what do you mean?
[19:55:18] Oh. I'm working from the old hourly pageview files.
[19:55:31] And it contains some titles that are crazy and/or impossible.
[19:56:35] This is old definition problems, for sure.
[19:56:58] The only reason I'm working from these files is because they implement a somewhat reasonable pageview definition.
[19:57:12] But you can see the crazy squeaking through the seams.
[19:58:59] hehe
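
Editor's note: a minimal Python sketch of the kind of cleanup discussed above, not anyone's actual script from the channel. It assumes the hourly pageview lines are space-separated as "project title count bytes" (inferred only from the sample line quoted in the log), reuses the regular expression from the regexp_extract call, and flags titles that look like URLs or contain characters MediaWiki does not allow in page titles. The function names and the URL heuristic are hypothetical.

```python
import re
from urllib.parse import unquote

# Characters MediaWiki does not allow in page titles.
ILLEGAL_TITLE_CHARS = re.compile(r"[#<>\[\]{}|]")
# Same pattern as the regexp_extract call quoted in the chat.
LAST_SEGMENT = re.compile(r"^.*/(.+)$")
# Assumption: a "title" that starts with host.tld/wiki/ or host.tld/w/
# is a leaked URL rather than a page title.
URL_LIKE = re.compile(r"^[\w.-]+\.\w+/(wiki|w)/")


def title_from_uri_path(uri_path):
    """Return the percent-decoded last path segment, or None if there is none."""
    match = LAST_SEGMENT.match(uri_path)
    return unquote(match.group(1)) if match else None


def parse_hourly_line(line):
    """Split an assumed 'project title count bytes' line into typed fields.

    Raises ValueError if the line does not have exactly four fields.
    """
    project, title, count, size = line.rstrip("\n").split(" ")
    return project, unquote(title), int(count), int(size)


def title_problem(title):
    """Return a short label for a suspicious title, or None if it looks fine."""
    if URL_LIKE.match(title):
        return "url"
    if ILLEGAL_TITLE_CHARS.search(title):
        return "illegal-chars"
    return None


if __name__ == "__main__":
    sample = "commons.m en.wikipedia.org/wiki/User:Slambo 1 20"
    project, title, views, _ = parse_hourly_line(sample)
    print(project, title, views, title_problem(title))
```

Titles that pass these checks would still need to be resolved against page and redirect tables to get page_ids; that step is not shown here.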