[14:48:48] morning guys
[14:49:13] i am back under the living 9.9
[14:49:52] moorning
[14:49:54] hiyoooo
[14:57:53] YuviPanda|away: fyi you should have hadoop access now, ping me or milimetric later if you want some help figuring out how to use it :)
[14:57:58] i'm running to a cafe, back in a bit
[15:00:02] drdee: hi !!
[15:00:16] guys - don't we have standup now?
[15:00:28] hi ottomata
[15:00:28] hi milimetric
[15:00:32] wha standup?
[15:00:40] didn't we change it to 10:00 EST?
[15:00:41] was the time changed?
[15:00:50] in yesterday's meeting
[15:00:57] poop - I think I needed to do that
[15:00:59] ugh, sry!
[15:01:35] drdee: I don't have access to the standup
[15:01:48] yeah, we agreed on 15:00 UTC
[15:01:52] 1 sec
[15:11:54] average: standup?
[15:12:01] sorry I screwed up the timings a bit
[16:24:39] ok actually heading to cafe now, back in a bit
[17:23:31] oh hey! btw
[17:23:39] if anyone here is looking for a fun hadoopy task to do
[17:23:54] we need something to remove duplicate entries from raw webrequest json data in hdfs
[17:24:13] just sayin!
[17:43:56] I LOVE THAT STANDUP IS EARLIER
[17:43:58] this is so great
[17:44:02] I can eat lunch normally
[17:44:10] i won't go through the midday jittery and starving
[17:44:11] woooo
[18:02:00] ottomata: I think we need an index on sequence numbers
[18:03:02] once we have that, we can pull data from raw webrequest into some "clean webrequest" without duplicates and knowing about any missing sequence numbers
[18:03:19] then we only have to keep raw webrequest stuff around for a day or so
[18:03:46] i've gotta go to the doctor's for pre-op checkup
[18:04:08] having a surgery next week - not sure how it'll impact my working (doctor says it won't)
[18:04:43] ohhh surgery uh ohhhh
[18:07:57] surgery's no big deal, nothing serious
[18:10:02] an index on seqs?
[18:10:41] cool yeah that makes sense ok
[18:10:52] hmmm
[18:11:13] haven't worked with indexes in hive yet
[18:18:58] * Ironholds starts hive for another day of debugging
[18:19:58] i am here for you!
[18:20:08] except I have a short meeting that starts in 10 mins
[18:20:11] aside from that I am here for you!
[18:24:30] I've gotta run to the doctor Ironholds, but let me know if you have trouble and I'll get back to you
[18:25:12] milimetric, have...fun? ;p
[21:06:40] drdee et al, pls post something re 93006
[21:06:53] we need to get ops on board :)
[21:18:26] qchris is your man
[21:21:56] the hadoop pagecounts table; does it have/is it meant to have any enwiki data in it?
[21:28:35] it should have everything that pagecounts has on dumps.wm.org
[21:28:38] so ja think so
[21:44:23] IH - yes, for the hours imported, it has the full content from dumps
[21:44:57] (dumps.wikimedia.org/other/pagecounts-raw that is, not xml dumps)
[21:45:02] IH|lunch: ^^
[21:45:14] aha. ta
[21:45:18] interesting
[22:03:41] DarTar: ping
[22:08:27] drdee: is he online? is his nick qchris ?
[22:08:58] i am online
[22:09:05] yurik ^^
[22:09:22] qchris is christian, you need to get him onboard with your plans
[22:09:40] drdee, you are not qchris, right?
[22:09:47] no i am not
[22:09:54] drdee: thought so :)
[22:10:04] he said he is ok with that patch
[22:10:08] just need his +1
[22:10:11] and yours :)
[22:10:24] just his :) he is engineering, i am not
[22:10:31] https://gerrit.wikimedia.org/r/#/c/93006/
[23:33:04] (PS1) Milimetric: November ComScore [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/94066
[23:33:12] (CR) Milimetric: [C: 2 V: 2] November ComScore [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/94066 (owner: Milimetric)
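
Note on the 17:23 and 18:02 messages: the task described there (drop duplicate entries from raw webrequest JSON and learn which sequence numbers are missing) could be prototyped roughly as below. This is only a single-machine sketch in plain Python, not the Hive/Hadoop job the channel had in mind. The field names `hostname` and `sequence`, and the idea that sequence numbers are counted per emitting host, are assumptions made for illustration; the log does not state the record schema.

```python
import json
from collections import defaultdict


def dedupe_and_find_gaps(lines, host_key="hostname", seq_key="sequence"):
    """Drop duplicate records and report missing sequence numbers per host.

    `host_key` and `seq_key` are hypothetical field names; adjust them to
    whatever the real webrequest records use.
    """
    seqs_by_host = defaultdict(set)  # sequence numbers already seen, per host
    clean = []                       # first occurrence of each record, in input order

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        host, seq = record[host_key], record[seq_key]
        if seq in seqs_by_host[host]:
            continue  # duplicate (host, sequence) pair: keep only the first
        seqs_by_host[host].add(seq)
        clean.append(record)

    # A gap is any integer between the smallest and largest sequence number
    # seen for a host that never showed up in the input.
    missing = {}
    for host, seqs in seqs_by_host.items():
        gaps = sorted(set(range(min(seqs), max(seqs) + 1)) - seqs)
        if gaps:
            missing[host] = gaps

    return clean, missing


if __name__ == "__main__":
    sample = [
        '{"hostname": "cp1", "sequence": 1, "uri": "/a"}',
        '{"hostname": "cp1", "sequence": 2, "uri": "/b"}',
        '{"hostname": "cp1", "sequence": 2, "uri": "/b"}',
        '{"hostname": "cp1", "sequence": 5, "uri": "/c"}',
    ]
    clean, missing = dedupe_and_find_gaps(sample)
    print(len(clean), missing)
```

Gaps are only detected between the lowest and highest sequence number seen in the batch, so records lost at the very start or end of a window would need a comparison against the neighbouring batch.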