[14:04:02] moooorning
[14:04:49] qchris in daaaa hoouuse?
[14:06:04] milimetric in escape mood?
[14:06:16] :)
[14:06:19] escape mood?
[14:08:01] yo ottomatta
[14:08:04] ottmata
[14:08:46] tnegrin, IRC tab-completes nicknames
[14:08:58] it took me like 6 months to figure that out but it was very nice :)
[14:09:11] you're very sweet -- I'm such a n00b
[14:09:22] ottomata: are we testing the cluster?
[14:09:31] it worked!
[14:10:26] tnegrin: i only learned that tab complete thing like a few months ago
[14:10:29] and tnegrin, yes!
[14:10:45] milimetric: yeah html escaping :D
[14:10:51] oh lol
[14:10:55] i'll pull and check it out
[14:10:58] aight
[14:11:04] cool -- at least I've moved passed typing my password into the room
[14:11:48] hehe
[14:13:52] ottomata: hangout?
[14:14:23] cool, am in batcave, am trying to push a commit to the kafka debian branch atm
[14:16:04] kk
[14:36:29] Hi *, I don't know if this the right place to talk about that :) for my research I developed java code to convert the wikipedia dblp dump from xml to json https://github.com/diegoceccarelli/json-wikipedia
[14:39:19] now, in my json dump each line represents an article with several fields, and I've a type field (normal article, category, redirect, disambiguation etc etc). I just observed that some pages, e.g. http://en.wikipedia.org/wiki/Luca_Ceccarelli, are disambiguation, but in the dump this seems not codified http://pastebin.com/sUfr0AXM
[14:39:45] do you have an explanation for that? am i in the wrong place? :)
[14:40:40] hi diegolo, this is a good place to ask but you might wanna poke apergos in the operations channel that's the person responsible for creating the xml dumps
[14:41:23] thanks drdee
[14:42:29] computer crash :(
[14:42:49] my WMF computer is dead for good
[14:43:03] made it a whole 2 weeks
[14:43:31] that's pretty good milimetric
[14:43:36] Hehe :-)
[14:43:54] yeah, I'm impressed
[14:51:42] o f**k solved, {{hndis}} is human disambiguation ;)
[14:52:00] happy to hear that!
[14:53:43] hndis stands for human disambiguation and requires another human being to disambiguate, oh oh the irony
[14:53:59] lol
[14:54:05] i really love self explanatory keys
[14:54:11]
[15:25:57] Does mingle allow to show a diff of the card's description between two versions of the same card?
[15:39:26] not a real diff
[15:39:36] you can go to the history of a card
[15:39:53] and see what has changed for the properties
[15:40:13] the diff for the description is basically doing a manual inspection between the two versions
[15:41:01] :-((
[15:41:12] All hail Mingle! :-D
[15:41:41] we could write dingle
[15:41:55] :P
[15:44:14] :-D
[15:52:32] http://www.youtube.com/watch?v=WktHXOOCw6U
[15:52:36] meet steve harris ^^
[15:54:13] Iron Maiden videos in #analytics? Cool.
[15:54:48] qchris: it's not iron maiden, it's steve harris
[15:55:15] qchris: it's a different steve harris :)
[15:55:54] Hehe. I thought so :-D
[15:58:08] qchris: "mingle can help you streamline your process delivery" http://www.youtube.com/watch?v=CxpiULv0j6A
[15:58:56] "if you can use excel and a web browser, you can use mingle; it's that simple"
[15:59:43] I cannot use mingle... Hence I have to fail at using excel, or any webbrowser.
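Context for the disambiguation question raised above (14:36-14:53): English Wikipedia marks disambiguation pages by transcluding templates, and {{hndis}} is the human-name variant, which is why http://en.wikipedia.org/wiki/Luca_Ceccarelli did not look like a disambiguation page in the JSON dump at first. The sketch below is not taken from json-wikipedia; it is a minimal, hypothetical Python illustration of template-based detection, and the template list and sample wikitext are invented and deliberately incomplete.

```python
import re

# Illustrative, incomplete set of template names that mark disambiguation
# pages on English Wikipedia. {{hndis}} (human-name disambiguation) is the
# one that caused the confusion in the log above.
DISAMBIGUATION_TEMPLATES = {
    "disambiguation", "disambig", "disamb", "dab",
    "hndis",   # human-name disambiguation
    "geodis",  # geographic disambiguation
}

# Captures the template name between "{{" and the first "|" or "}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+)")

def is_disambiguation(wikitext: str) -> bool:
    """Return True if the raw wikitext transcludes a known disambiguation template."""
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).strip().lower()
        if name in DISAMBIGUATION_TEMPLATES:
            return True
    return False

# Made-up wikitext snippet shaped like a human-name disambiguation page.
sample = "'''Luca Ceccarelli''' may refer to:\n* ...\n{{hndis|Ceccarelli, Luca}}"
print(is_disambiguation(sample))  # True
```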
[15:59:57] Sounds like a task for coq to check if that's correct :-)
[16:00:03] :)))
[16:00:44] p -> q <=> not(q) -> not(p)
[16:00:52] average average that you start spamming us with mingle promo's
[16:01:23] drdee: I thought it was informative so that's why I put the link here :)
[16:17:31] when used correctly, mingle can double your synergy
[17:02:34] SUPER WILD!!!!
[17:39:38] milimetric: offtopic -> reading e-mail from Dario about PagesCreated metric in Wikimetrics
[17:41:50] woot, running a 20TB teragen right now, gotta run to the bank for a sec, back in just ab it
[17:42:53] average_: sent a quick follow up
[17:43:15] DarTar: will read that also
[17:43:25] thx :)
[18:26:12] milimetric: in my last reply "native SQL behavior", I mean for example DATE(timestamp),,,,, GROUP BY 1 which uses the left edge for labeling
[18:26:27] ok now actually going to the bank, geeez
[18:28:21] DarTar: so to clarify. The time slice, if i was to start today, end 14 days from now, and slice by 7 days, I would get two values: Sept. 5th and Sept. 12th, right?
[18:28:47] or August 29th and Sept. 5th?
[18:31:14] hmm the left edge label is more intuitive for natural bins like hours, days etc, in this case I don't have a strong feeling which one is better, probably the former
[18:31:15] that's a good point
[18:35:52] so user input: 2013-08-29, 2013-09-12, that translates to a query [2013-08-29 00:00:00, 2013-09-13 00:00:00[ and you would expect right labeling for all data up to that point
[18:42:40] milimetric: there's some good discussion on intervals closure/labeling options in the Downsampling section of Python for Data Analysis
[18:43:33] sure but we should pick something and just make sure it's well known
[18:43:40] I leave it up to you to decide :)
[18:44:22] "The choice of closed='right', label='right' as the default might seem a bit odd to some users. In practice the choice is somewhat arbitrary; for some target frequencies, closed='left' is preferable, while for others closed='right' makes more sense. The important thing is that you keep in mind exactly how you are segmenting the data."
[18:45:47] to me, right-labeling makes more sense
[18:45:53] since that's the last day of data considered
[18:46:02] but again, I'm no analyst
[18:46:12] in this case it does indeed
[18:46:26] but consider binning by day
[18:47:07] you would rather label with the date at the beginning of the interval
[18:49:13] I guess the bottom line is that (1) we always use half-open intervals (2) we make it clear what the default label choice is (3) if people become really picky about the default choice for labels we make it a configurable option :)
[18:56:10] cool, that works DarTar, thanks
[20:46:25] wooot, drdee, or anybody, want a varnishkafka failover demo?
[20:46:40] yes in 20 minutes?
[20:46:50] mayyybe!
[20:51:15] me!
[20:51:27] yO!
[20:52:53] YO
[20:52:58] does that mean you want it now!?!?
[20:53:12] whenever you guys are ready!
[20:54:36] 6 minutes!
[20:59:07] Snaps_: i forget, is compression available yet?
[20:59:18] nopes
[20:59:20] wip
[20:59:23] aye ok
[21:00:25] i ma in the hangout
[21:00:37] damn I didn't plan this enough
[21:00:39] Snaps_: https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[21:00:46] have to plan it better next time
[21:00:46] average_ next time better :)
[21:00:46] qchris: if you wanna too
[21:00:52] good learning experience
[21:01:09] ottomata: Sure. Booting google machine.
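The exchange at 15:59-16:00 jokes about having Coq check the quoted law, p -> q <=> not(q) -> not(p). As an aside, here is a minimal sketch of the same equivalence in Lean 4 (not Coq); the forward direction is constructive, while the converse needs classical reasoning.

```lean
-- Contraposition, as quoted in the log: (p → q) is equivalent to (¬q → ¬p).

theorem contrapose (p q : Prop) : (p → q) → (¬q → ¬p) :=
  fun hpq hnq hp => hnq (hpq hp)

-- The converse direction uses proof by contradiction (classical logic).
theorem contrapose_rev (p q : Prop) : (¬q → ¬p) → (p → q) :=
  fun h hp => Classical.byContradiction fun hnq => h hnq hp
```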
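The binning discussion between milimetric and DarTar (18:26-18:56) maps directly onto pandas resampling, which is what the quoted Python for Data Analysis passage describes: bins are half-open, and the closed/label options control which edge is included in each bin and which edge names it. Below is a minimal sketch, assuming pandas and an invented daily series over the 2013-08-29 to 2013-09-12 range mentioned above; this is illustrative only and is not Wikimetrics code.

```python
import pandas as pd

# Invented data: one count per day over the range discussed in the log.
days = pd.date_range("2013-08-29", "2013-09-12", freq="D")
daily = pd.Series(1, index=days)

# Left-closed, left-labeled 7-day bins:
# [2013-08-29, 2013-09-05), [2013-09-05, 2013-09-12), [2013-09-12, 2013-09-19)
# Each bin is named after its first day, like DATE(timestamp) ... GROUP BY 1 in SQL.
left_labeled = daily.resample("7D", closed="left", label="left").sum()

# Right-closed, right-labeled bins: each bin includes its right edge and is
# named after it, i.e. the last timestamp of data it covers.
right_labeled = daily.resample("7D", closed="right", label="right").sum()

print(left_labeled)
print(right_labeled)
```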
[21:01:44] off to a ping pong party everyone, have a nice night
[21:01:51] laterz milimetric
[21:01:58] milimetric: Have fun!
[21:02:05] laters!
[21:02:08] punish that gingle for me drdee, I think it's almost read
[21:02:11] *ready
[21:02:16] aight
[21:02:36] milimetric, drdee: sorry they shut down the internetz:(
[21:02:45] np
[21:02:47] anyway, yes we should have that conversation and yes this is a priority for enwiki too
[21:03:25] thanks for the demo, great work