[13:15:45] halfak, you want uuid-timestamp or uuid-timestamp_converted_to_numeric_value_representing_seconds?
[13:16:00] (also; the mobile web query completed, the app/desktop queries are well on their way. Wheee!)
[13:16:06] Any timestamp format is cool with me.
[13:16:14] Woot
[13:16:18] seconds it is!
[13:16:25] I mean, if you're turning them into intertimes anyway it makes life easier
[13:16:34] changes the problem to ts[i] - ts[i-1]
[13:16:41] * halfak converts them to an internal format when streaming.
[13:16:55] then I shall do it not for you, but for the reusers!
[13:17:57] :P If you want to make it easy to reuse, then I'm going to need to convert the other datasets too.
[13:18:12] Oh wait... I think we should only host intertimes.
[13:18:20] hmn. For safety, and science?
[13:18:23] That's the safes anonymization I could think of.
[13:18:25] and user protection?
[13:18:29] yup
[13:18:38] gotcha. If only I had a vectorised piece of C++ I could immediately access to do that..
[13:18:41] oh, wait. I do!
[13:19:05] halfak, problem with intertimes, though, is it means immediately discounting one-event users
[13:19:05] Up to you. I'll generate intertimes if you don't.
[13:19:12] Yes it does.
[13:19:24] hmn
[13:19:25] * Ironholds thinks
[13:19:43] I'm not worried. If users who perform two events (who dominate the dataset) look a certain way...
[13:19:48] I'm mostly thinking of who-does-what in terms of making the dataset most usefully accessible
[13:20:08] I think I'll generate the intertimes so we retain the ability to easily pair [dataset] with [code used to extract dataset]
[13:20:40] * Ironholds tweaks code
[13:21:53] I don't think that is a good idea Ironholds.
[13:22:05] 1. no one is going to use your hive queries
[13:22:14] 2. All other intertimes will be generated from my code.
[13:22:30] okay! I'll just dump the datasets out to TSVs and throw them over
[13:22:40] this also gives me an excuse to create a readme for your repo ;p
[13:22:50] :)
[13:23:45] in the absence of a settled name I'm going to call it [title needed]
[13:30:14] Ironholds, how many rows will you output?
[13:33:15] halfak, goood question. Let's see!
[13:33:36] for the mobile web, 100k unique IPs translates to 1,092,601 rows
[13:33:42] I imagine it'll be more for desktop and more for apps
[13:34:03] That's well within the range I have been working with. :)
[13:34:10] Movielens is 19million rows.
[13:34:25] yep
[13:34:28] AOL was 28 million
[13:34:32] edits is going to be a pretty substantial dataset, I hope
[13:34:45] * Ironholds runs that now
[13:35:26] Ironholds, if you have any trouble pulling edits then don't sweat it. I still have my 2013 dataset.
[13:37:13] halfak, oh no, it's fine. I spent a big part of yesterday running it working out how big I could make the dataset before something broke
[13:37:22] we're going to have all edits from 100k randomly-selected enwiki users
[13:37:30] and pray to god that Koavf isn't one of those 100k
[13:37:35] heh.
[13:45:46] halfak, edits is 2,973,354
[13:45:56] Awesome. Easy peasy.
[13:45:57] (I can grab more if you'd like more. Query is oddly cheap!
[13:46:04] *)
[13:46:06] It does an index scan :)
[13:46:18] Oh say, can you make sure the query result is sorted?
[13:46:28] That should also be an index scan :)
[13:46:47] by timestamp?
[13:47:02] or, {user_id,timestamp}
[13:47:41] user_id, timestamp plz.
[13:47:46] totally!
[13:47:53] Not that it's a big deal if you don't.
[13:48:00] unix "sort" is pretty kickass.
[13:48:04] haha.
[13:48:11] you're just looking for opportunities to use it now you've found ti ;p
[13:48:12] *it
[13:48:41] heh. I'm unix utilitying everything recently.
[13:49:01] you're turning into Dario!
[13:49:08] next you'll be writing love poems about grep
[13:49:13] heh.
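The intertime idea discussed above (ts[i] - ts[i-1] within each user's events, with rows sorted by user_id, timestamp) can be sketched in Python. This is a minimal illustration, not halfak's actual streaming code. It assumes a hypothetical two-column TSV layout (user_id, timestamp) with timestamps written as "YYYY-MM-DD HH:MM:SS" in UTC; the function names `to_seconds` and `intertimes` are made up for the example. Because `itertools.groupby` only merges adjacent rows, it relies on the input already being sorted by user_id, which is why the sort order requested above matters.

```python
import csv
import io
from datetime import datetime, timezone
from itertools import groupby
from operator import itemgetter

# Hypothetical timestamp format (an assumption, not taken from the real dumps).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_seconds(ts):
    """Convert a UTC timestamp string to Unix epoch seconds."""
    dt = datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def intertimes(rows):
    """Yield (user_id, seconds_between_events) from rows sorted by (user_id, timestamp).

    groupby only groups *adjacent* rows with the same user_id, so unsorted
    input would silently split one user into several groups. A user with a
    single event yields nothing: there is no ts[i-1] to subtract.
    """
    for user_id, events in groupby(rows, key=itemgetter(0)):
        times = [to_seconds(row[1]) for row in events]
        for prev, curr in zip(times, times[1:]):
            yield user_id, curr - prev


# Usage on a tiny in-memory TSV (hypothetical data).
sample = io.StringIO(
    "u1\t2014-10-01 12:00:00\n"
    "u1\t2014-10-01 12:00:30\n"
    "u1\t2014-10-01 13:00:30\n"
    "u2\t2014-10-02 08:00:00\n"
)
print(list(intertimes(csv.reader(sample, delimiter="\t"))))
# [('u1', 30), ('u1', 3600)]
```

Note that u2, with only one event, contributes no intertimes, which is the one-event-user loss raised in the discussion.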
[13:52:32] alright, back in 5, running to the paint store
[13:53:56] You damn hooligans and your graffiti.
[14:04:07] halfak, halloween costume!
[14:04:21] okay, mobile web and edit datasets generated, desktop, mobile app and search queries hovering around 70% and moving fast-ish
[14:04:28] should have everything for you by EOD :d
[14:07:48] Woot!
[14:50:57] Hmmm.. Yeah. There's some weird stuff in the movielens dataset.
[14:52:34] Oh! There's an auto-logout!
[14:52:40] THAT"S NOT A USER ACTION
[14:52:42] ARG!
[15:05:25] goddammit
[15:05:28] xen went down.
[15:05:31] * halfak filters
[15:05:37] What's xen?
[15:05:47] VPS chain I use for my ZNC instance
[15:12:01] Oh!
[15:12:11] The internet sucks at telling me what ZNC is.
[15:12:38] Wikipedia beats the pants off the ZNC site.
[15:12:39] https://en.wikipedia.org/wiki/ZNC
[15:23:37] haha
[15:44:56] Ironholds, did we talk at some point about having python 3 on stat2 and 3?
[15:45:05] leila, we did and we do!
[15:45:05] or this is my imagination.
[15:45:08] doh
[15:45:13] just type python3
[15:45:19] and WMUtils' "rpy" has python3 support!
[15:45:24] * Ironholds jazz hands
[15:45:27] owwww. how would I know this?
[15:45:39] I'd just type python, and would end up in 2.7.6.
[15:46:22] thanks!
[15:49:04] np!
[16:03:55] morning leila
[16:04:07] morning DarTar
[16:04:18] just got in and saw your messages
[16:04:43] I am waiting to hear from Maryana because we have some urgent stuff to close this morning
[16:04:47] uhum. no rush. I'm not sure what's the best thing for me to do today.
[16:04:52] I can spend time on ground truth
[16:05:12] make sure that we’re good for the check-in on missing claims, that’s about it ;)
[16:05:23] I see. okay. take your time. cancel the meeting request if you need to. I've put a hold on property distributions until we talk
[16:05:38] ow okay. from her email, I understood that we don't need those
[16:06:03] we can have them ready, but it seemed like we will have to do other things
[16:24:00] DarTar, is breakfast with lila happening?
[16:24:18] I don't see it on the staff calendar.
[16:26:11] halfak: nope, not that I know of
[16:26:20] yay!
[16:31:15] DarTar, gonna be a couple of minutes late, need to restart
[16:37:04] leila: yt?
[16:37:05] standup
[16:37:22] yes, DarTar. have updated the calendar. can't make it to the standup
[16:37:28] ah ok, np
[17:28:44] hey leila, is the check-in with Legal happening?
[17:29:26] neither Luis nor Manprit accepted
[17:32:24] I hope DarTar
[17:33:00] yes, starting the hangout in a sec
[19:39:53] who cares about user agents in webrequest logs?
[19:39:56] ne1?: )
[19:39:58] :)
[19:59:26] DarTar, could you add a video to the meeting
[19:59:37] (I don't have permission)
[19:59:38] ach, sure
[19:59:59] ottomata, mee
[20:00:30] leila: I can’t, boo
[20:00:34] asking Maryana
[20:00:40] np. no rush on my end
[20:18:37] DarTar, ping
[20:19:23] leila, ping
[20:19:49] (this isn't a coup by the remotees or anything, just random probability)
[20:21:33] hey – in a meeting with mobile
[20:21:40] both of us
[20:22:19] :)
[20:22:54] aww
[20:41:18] halfak, you just want fingerprinting_elements/timestamps for opensearch?
[20:41:50] Yup.
[20:41:56] <3
[20:42:33] I'll get these plots uploaded so that you can see what I'm seeing.
[20:42:53] sure!
[20:44:34] halfak, query launched!
[20:45:01] Thanks dude.
[20:45:02] the others have now all finished the first step. apps are 53% through the final, desktop 9%, search 4$
[20:45:04] *%
[20:45:19] We should probably stop adding more things if we want them to run, of course ;p
[20:45:43] Oh yes. Good point. You should cancel the ajaxy one, but keep it handy.
[20:45:56] naw, we're fine
[20:46:01] OK cool.
[20:46:12] I'm also trying to dig into it from Movielens.
[20:46:16] Worst-case scenario is we lose ~3 hours off the start of the 30 day period, which isn't a big deal imo
[20:46:38] mostly I'm just sad apps doesn't have its own set of varnishes. If they did we could partition much more nicely!
[20:49:13] unrelated: apparently we have at least one wikipedia editor contributing on a phone at work at 10pm
[20:49:22] ...the fact that I can pin this down makes me feel rather creepy
[20:57:12] how do you determine "at work"?
[21:01:21] Emufarmers, their IP address comes from a corporate network
[21:05:03] wb DarTar
[21:09:08] halfak, I think the Japanese have a stronger work ethic than enwiki contributors
[21:09:21] they've got a far lower proportion of edits during work hours, with a far stronger peak at lunch and after work
[21:09:27] American Exceptionalism, defeated by data!
[21:09:41] also, the French work until 6pm
[21:10:29] That sounds like an excellent blog post :)
[21:10:45] it would be REALLY fun
[21:11:24] we need WMtrends. OKtrends for Wikimedia!
[21:18:33] speaking of American exceptionalism, Lydia is sitting in front of me trying to make sense of a giant box of jellybeans
[21:24:34] DarTar, have you heard the story about the Beatles and jelly beans?
[21:25:10] After Beatlemania broke out, George Harrison commented that he really liked jelly babies. So people would chuck them at the band.
[21:25:27] except in the US, there are no jelly babies. There are only jelly beans. Which have the crucial difference of having a rigid shell
[21:25:34] apparently it was like being very half-heartedly pebbledashed
[21:26:53] anyway. DarTar, who do I send the technical task to?
[21:36:23] leila, can you physically poke DarTar if you can see him? ;p
[21:36:37] unfortunately not. wfh.
[21:36:41] she can’t because she’s miles away from the office
[21:36:45] he is in a meeting, that much I know
[21:36:58] awww :(
[21:37:03] thanks anyhoo both!
[21:37:22] DarTar, so what you're saying is we need to spend my chunk of the conference travel budget on a boot on a reaaaaaally long stick
[21:37:30] that way we can poke each other from far away
[21:37:41] Ironholds: bbl, talking to Lydia
[21:37:45] and also come closer to my goal of entirely Looney Tunesing the office
[21:45:30] Ironholds: technical task -> me and toby?
[21:45:56] kk
[21:46:55] DarTar, I'm feeling really good about our work in the last month
[21:47:09] October has 25 completed cards - September has 35, but almost all the October ones are big pieces o'work
[21:47:24] yup
[21:47:34] which reminds me I need to put together that freaking report
[21:48:09] halfak, is there any meta documentation on your editor session analysis paper?
[21:48:47] Hmm.. meta documentation?
[21:49:11] Oh! I see what you mean.
[21:49:15] like: something I can bluelink to instead of-yep
[21:49:20] (I mean, I'll link the paper too)
[21:49:42] The meta documentation is R:Activity session (formerly R:Edit session)
[21:49:50] See also R:Session duration
[21:50:06] We didn't report the overall labor hours on Meta though.
[21:52:58] ta!
[21:54:14] hmn
[21:54:31] I may actually wait on addiing to this documentation until we've had an opportunity to sit down and work out where we want to put things
[21:57:06] leila: I was thinking that the current missing claim dataset per se may not be particularly sexy to visualize, but either the “child of a superclass” breakdown or the cross-language comparison would be stellar, we should talk about this next week
[22:13:37] Ironholds, what docs were you considering adding?
[23:34:43] leila, did you get that wikidata query worked out?
[23:35:03] yess
[23:35:08] Awesome!
[23:35:19] wanted to message you but you weren't here (until now?)
[23:35:20] :D
[23:35:40] Heh. We've been missing each other all day.
[23:35:57] I pinged when I got to the coffee shop, but you and DarTar were in a meeting.
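The activity-session work referenced above (R:Activity session, R:Session duration) groups a user's actions into sessions based on the gaps, i.e. intertimes, between them. The sketch below is a minimal illustration assuming a single fixed inactivity cutoff; the one-hour default and the function names `split_sessions` and `session_durations` are hypothetical choices for the example, not taken from the Meta documentation or the paper.

```python
def split_sessions(times, cutoff=3600):
    """Split one user's sorted event times (epoch seconds) into sessions.

    A new session starts whenever the gap since the previous event is
    larger than `cutoff` seconds. The one-hour default is an assumption.
    """
    sessions = []
    for t in times:
        if sessions and t - sessions[-1][-1] <= cutoff:
            sessions[-1].append(t)  # gap within cutoff: same session
        else:
            sessions.append([t])    # first event, or gap too long: new session
    return sessions


def session_durations(sessions):
    """Naive duration: last event minus first event in each session."""
    return [s[-1] - s[0] for s in sessions]


# Usage with hypothetical event times for a single user.
sessions = split_sessions([0, 30, 3630, 10000])
print(sessions)                     # [[0, 30, 3630], [10000]]
print(session_durations(sessions))  # [3630, 0]
```

A single-event session gets a duration of 0 here; how to credit time for the last action in a session is a separate modeling choice that this sketch does not settle.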
[23:36:06] I just got home from the coffee shop now :)
[23:36:08] yeah, it was the WikiGrok meeting