[02:02:39] New patchset: Milimetric; "/jobs/create demo working for tomorrow" [analytics/wikimetrics] (master) - https://gerrit.wikimedia.org/r/70579
[02:02:51] Change merged: Milimetric; [analytics/wikimetrics] (master) - https://gerrit.wikimedia.org/r/70579
[03:03:53] Change merged: Tim Starling; [analytics/log2udp2] (master) - https://gerrit.wikimedia.org/r/58449
[13:51:47] mooooorning
[13:55:46] morning!
[14:00:35] ahhh drdee hi :)
[14:00:46] yoooo whaazzup?
[14:00:48] I am going to move your Kraken jenkins job https://integration.wikimedia.org/ci/view/Java/job/Kraken/
[14:00:55] i noticed the bug report
[14:00:59] it is currently running on master, I have set up a slave box and will make it run there :)
[14:01:03] cool
[14:01:04] ty
[14:01:06] doing so right now
[14:01:24] you know we could make it build on patchset submitted to Gerrit?
[14:01:56] building it
[14:02:29] sure i know, but it's not yet hosted on gerrit
[14:02:33] ahhh
[14:02:40] yeah I noticed the gh-pages branch :-]
[14:04:59] * hashar looks at maven downloading the whole internet
[14:06:35] retriggering a build
[14:07:18] https://integration.wikimedia.org/ci/view/Java/job/Kraken/6/ :(
[14:07:21] does not build, sniff
[14:07:34] java.lang.UnsatisfiedLinkError: Can't load library: /usr/lib/libdclassjni.so
[14:08:28] yup
[14:08:38] we are working on debianizing dclass
[14:08:48] then that dependency will be fixed
[14:09:03] see https://mingle.corp.wikimedia.org/projects/analytics/cards/716
[14:09:17] and [WARNING] The POM for storm:storm-kafka:jar:0.8.0-wip4 is missing, no dependency information available
[14:09:17] :D
[14:09:40] so I guess the migration to the slave is working
[14:09:44] yup
[14:10:00] there are some issues that we are aware of and we are fixing them
[14:10:35] good :-)
[14:33:57] ottomata: how goes the backfilling?
[14:35:04] hiya
[14:35:21] hey -- how goes the backfilling?
[14:36:22] it goes well!
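The Jenkins failure above boils down to a native shared library that isn't installed on the build machine (`UnsatisfiedLinkError` on `/usr/lib/libdclassjni.so`). A minimal preflight check, sketched in Python as a stand-in for the JVM's `System.loadLibrary` lookup, would just verify the file exists on the slave before the job runs; the path comes from the log, everything else here is illustrative:

```python
# Sketch: the Kraken build fails because the JVM cannot load
# /usr/lib/libdclassjni.so. This is a Python stand-in for that check:
# it only verifies the shared library file exists on the build slave,
# which is the condition the debianized dclass package will satisfy.
import os

def native_lib_present(path="/usr/lib/libdclassjni.so"):
    """Return True if the native library the build depends on is installed."""
    return os.path.isfile(path)
```

Running this on the slave before triggering the build would distinguish "library missing" from other link errors.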
[14:36:24] let's see
[14:36:58] most of the backfill finished for carrier
[14:37:15] but there are 10 days throughout the backfill that had TIMEDOUT, not sure why
[14:37:18] i submitted them for rerunning
[14:37:24] country is almost all done
[14:37:34] awesome!
[14:37:53] it hadn't finished, it's running the last few days of mid May now
[14:38:06] but, it TIMEDOUT on the same nominal days as carrier
[14:38:12] which is strange
[14:38:20] and indicative that there might be something wrong with those days
[14:38:30] what is TIMEDOUT? is that our error?
[14:38:35] not sure, i'm looking at the data for one of those days now, to see if I can see anything
[14:38:42] that is what oozie reports for that day
[14:38:43] ummm
[14:38:44] lemme see
[14:38:47] hmm
[14:38:58] well, i have submitted them for rerunning
[14:38:58] so now they are just WAITING
[14:38:59] they should run when there are free job slots
[14:39:07] if they time out again I'll see if I can find out more
[14:39:41] 9 days:
[14:39:42] 2013-02-14
[14:39:42] 2013-02-27
[14:39:43] 2013-03-08
[14:39:43] 2013-03-09
[14:39:43] 2013-03-12
[14:39:43] 2013-03-13
[14:39:43] 2013-03-14
[14:39:44] 2013-04-19
[14:39:44] 2013-04-26
[14:40:10] we should find some time for you to get me access to the cluster -- job tracker, name node, etc
[14:40:27] today is too busy, maybe tomorrow
[14:42:33] who wrote those jobs btw
[14:42:34] totally
[14:42:45] ummmm, i think david may have written the pig
[14:42:54] and the original oozieness
[14:42:58] i have worked on the oozieness twice now
[14:43:05] and tweaked the pig stuff yesterday
[14:43:10] drdee believe wrote the UDFs
[14:43:14] I believe*
[14:43:34] yup that's correct
[14:43:43] i spent a day on making backfilling this stuff much easier a couple of weeks ago
[14:43:58] which was super beneficial yesterday
[14:44:14] that's why I said I think I could get this started last minute
[14:44:26] there are a few more oozie abstractions I'd like to spend time on to make jobs like this
even easier to run
[14:49:45] oops -- sorry -- short attention span
[14:50:21] I think that's a good idea -- I want to nail down the hadoop platform stuff
[14:56:26] ottomata: when will the graphs be updated?
[14:57:29] erosen has to do that
[14:59:03] it sucks though that the backfill has some days not yet complete in every month
[14:59:11] so, if he makes the graphs, we won't have the data for that yet
[14:59:21] fingers crossed that it was just a fluke and they will rerun fine
[14:59:39] and there is erosen :)
[15:00:00] hello there
[15:00:13] i updated the dashboards last night
[15:00:21] around 1am PDT
[15:00:28] i'll rerun them now and see what we get
[15:00:58] hi erosen -- thanks!
[15:01:19] erosen, just in case stats.wm.org/kraken-public isn't up to date (it rsyncs)
[15:01:25] you might want to get your .tsvs out of hdfs
[15:01:27] I think we can live with some gaps.
[15:01:46] ottomata: it is only like a 15 minute lag at most, right?
[15:02:03] ah, they all timed out
[15:02:05] those 9 days
[15:02:08] erosen no
[15:02:16] ummm
[15:02:21] ottomata: I see
[15:02:32] I can always download the files from hdfs too
[15:02:36] oh
[15:02:37] yes
[15:02:37] sorry
[15:02:42] it rsyncs every 15 minutes
[15:02:47] sorry i thought it was longer than that
[15:02:50] ok yea
[15:03:04] k, just updated
[15:04:05] ottomata: looks like the country has backfilled, but not the carrier, is that consistent with expectations?
[15:05:22] hm
[15:05:46] no
[15:05:52] i see 02-01 dates in carrier
[15:06:01] carrier was lagging
[15:06:04] and the backfill job doesn't coalesce
[15:06:24] so it's possible that some of the days are not coalesced in the .tsv
[15:06:32] where are you missing backfill data?
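The coalescing step discussed above (the backfill job writes per-interval output that has to be merged into one daily .tsv) can be sketched roughly as follows. This is an illustration of the idea, not the actual script mentioned later in the log; the row shape of `(key, count)` pairs is an assumption:

```python
# Sketch: coalesce several per-interval TSV fragments into one daily
# result, summing counts per key. Illustrative only -- not the real
# coalescing script; real rows would have more columns than (key, count).
from collections import defaultdict

def coalesce(fragments):
    """fragments: iterable of lists of (key, count) rows; returns merged, sorted rows."""
    totals = defaultdict(int)
    for rows in fragments:
        for key, count in rows:
            totals[key] += count
    return sorted(totals.items())

daily = coalesce([
    [("orange-ci", 10), ("orange-ke", 3)],  # e.g. 15-minute interval 1
    [("orange-ci", 7)],                     # e.g. 15-minute interval 2
])
```

A day whose intervals were never merged this way would show up in the .tsv as several partial rows instead of one total, which matches the "some of the days are not coalesced" symptom.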
[15:06:54] i'm mostly just looking at the end result graphs
[15:07:04] ottomata but let me check the tsv
[15:07:20] there are 9 days missing from both carrier and country
[15:08:29] everything else looks done
[15:08:37] lemme manually coalesce the carrier data, uhhhh
[15:08:41] now how do I do that… :p
[15:09:37] oh! look at that! david wrote a little python script to do it
[15:14:17] ottomata: heading to the train station, I'll be back online in 10
[15:14:43] k
[15:21:27] erosen
[15:21:32] oh he's not here
[15:33:34] ottomata: did you manage to coalesce things?
[15:34:16] yes!
[15:34:16] and there def was more data after I did that
[15:34:19] i put it at the usual location in hdfs
[15:34:25] if you want to grab it from there and check it
[15:36:03] ottomata: sounds good, grabbing from hdfs now
[15:41:59] ok, tnegrin, erosen, the reason that those 9 days in the backfill failed
[15:42:11] is because there is an occasional missing data import in those days
[15:42:24] e.g., 2013-02-13
[15:42:24] is missing
[15:42:31] 2013-02-13_19.00.00
[15:42:42] it has
[15:42:43] 2013-02-13_19.15.00
[15:42:43] 2013-02-13_19.30.00
[15:42:43] 2013-02-13_19.45.00
[15:42:50] just not the 19.00.00 import
[15:43:05] so.
[15:43:07] hm.
[15:43:16] ottomata: what do you mean by "data import"?
[15:43:24] you mean the input stream is missing?
[15:43:40] i mean there is a 15 minute interval that doesn't have any data in hdfs for that day
[15:43:41] that's all i know
[15:43:41] the directory exists
[15:43:41] it is just empty
[15:44:04] gotcha
[15:44:14] so 2013-02-13_19.00.00 should contain data for 19:00-19:15
[15:44:18] but it is empty
[15:45:37] ottomata: one more concern...
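The failure mode just described — a 15-minute import directory that exists in HDFS but is empty, like `2013-02-13_19.00.00` — can be detected with a quick scan over the day's expected intervals. The interval naming scheme below is taken from the log; the file-count mapping is a stand-in for an HDFS directory listing:

```python
# Sketch: given a day and a mapping of interval name -> number of files
# present in HDFS, report the 15-minute intervals that hold zero files
# (the condition that made those 9 backfill days TIMEDOUT). The dict
# input is a hypothetical stand-in for a real HDFS listing.
def empty_intervals(day, files_per_interval):
    """Return interval names for `day` whose directory holds no files."""
    expected = [
        "%s_%02d.%02d.00" % (day, hour, minute)
        for hour in range(24)
        for minute in (0, 15, 30, 45)
    ]
    return [iv for iv in expected if files_per_interval.get(iv, 0) == 0]

# Demo reproducing the case from the log: every interval has data
# except the 19.00.00 import.
counts = {
    "%s_%02d.%02d.00" % ("2013-02-13", h, m): 1
    for h in range(24)
    for m in (0, 15, 30, 45)
}
counts["2013-02-13_19.00.00"] = 0
missing = empty_intervals("2013-02-13", counts)
```

Running a scan like this over the 9 failed days would confirm whether each one has the same single-empty-interval pattern.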
[15:45:40] check out: http://gp.wmflabs.org/dashboards/orange-ivory-coast
[15:46:10] the spike near Feb 26 looks concerning
[15:48:35] i'm looking at filesizes for feb 26 now
[15:48:41] i see two anomalies
[15:49:06] 2013-02-26_01.00.00 is 1/7th of what it should be
[15:49:08] and
[15:49:18] 2013-02-26_03.00.00
[15:49:27] is about 2x what it should be
[15:49:38] aside from that, import filesizes all look normalish
[15:50:34] interesting
[15:50:42] could be a varnish ip thing
[15:51:47] erosen: i believe i always dropped those anomalies
[15:51:59] see also the spreadsheet that I sent to amit two weeks ago
[15:52:08] gotcha
[15:55:03] erosen: what about the fraction charts, do you have time to add those today as well?
[15:55:15] they should be up to date
[15:55:18] drdee: ^^
[15:56:01] erosen, could be a varnish IP thing, but I don't trust our data flow path very much yet, so I wouldn't be so quick to blame varnish :)
[15:56:14] erosen
[15:56:15] i only see two charts on the dashboard, not three
[15:56:24] erosen, for the 9 missing days
[15:56:33] drdee: which dashboard are you looking at?
[15:56:39] they are only missing a 15 min interval or two
[15:56:40] orange-ivory-coast
[15:56:48] should I force the jobs to run without that data for those days?
[15:56:49] aren't there three tabs?
[15:57:11] ottomata: I think it would make the charts look a little prettier
[15:57:18] but it is your call
[15:57:30] erosen: sorry; i didn't see the tabs
[15:57:34] np
[15:57:42] naw, not my call! it means the daily aggregates will be off because of missing import intervals
[15:57:44] not sure which is better
[15:57:55] slight inaccuracy or missing days
[15:58:01] ottomata: tough
[15:58:03] let's ask amit?
[15:58:07] be all agile, and stuff?
[15:58:09] for the charts with only monthly data points we should change the x-axis to month and not to days, IMHO
[15:58:32] drdee: is that possible in limn?
[15:58:41] i hope so :D
[15:58:49] it's not?
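The eyeball filesize check earlier in this exchange (one interval at roughly 1/7th the usual size, another at roughly 2x) can be automated by comparing each interval's size against the day's median. The deviation factor of 1.5 and the byte counts below are arbitrary assumptions for illustration:

```python
# Sketch: flag import intervals whose size deviates from the day's
# median by more than `factor` in either direction -- this catches both
# the undersized (~1/7th) and oversized (~2x) intervals seen on
# 2013-02-26. The factor and the sample sizes are made up.
from statistics import median

def anomalous_intervals(sizes, factor=1.5):
    """sizes: dict of interval name -> byte count; returns sorted outlier names."""
    typical = median(sizes.values())
    return sorted(
        name for name, size in sizes.items()
        if size > typical * factor or size < typical / factor
    )

sizes = {
    "2013-02-26_00.00.00": 700,
    "2013-02-26_01.00.00": 100,   # ~1/7th of normal
    "2013-02-26_02.00.00": 690,
    "2013-02-26_03.00.00": 1400,  # ~2x normal
    "2013-02-26_04.00.00": 710,
}
```

A check like this across all import days would surface the kind of anomaly that produced the Feb 26 spike without manually eyeballing filesizes.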
[15:59:07] not sure, I'm checking on reportcard
[15:59:24] drdee: would appear to be possible
[15:59:41] could you fix that?
[16:02:19] yeah, working on it
[16:06:03] thx!
[16:07:13] drdee: ottomata: walking from the train station, be back in 20
[16:07:17] aight
[16:07:25] k, i'm losing battery
[16:07:28] internet at this cafe is crappy
[16:07:32] i'm going to get lunch and move locations
[16:07:41] k
[16:20:59] erosen -- are we ready to show Amit the graphs?
[16:23:36] tnegrin: erosen is on his way to the office, should arrive shortly
[16:23:43] k
[16:49:49] yoyo
[17:02:36] heya, standup?
[17:02:40] drdee?
[17:03:01] ottomata: don't we skip on these days?
[17:03:58] oh i guess so, it's in the cal though, no?
[17:04:47] yeah
[17:19:55] we skip on sprint demo days
[17:20:10] erosen: what's the status of the dashboards?
[17:20:24] drdee: been trying to get limn to use a monthly scale for a while...
[17:20:30] probably should stop working on that
[17:20:55] otherwise, there is that discrepancy issue
[17:21:18] where a few dates like 2/25 throw everything off
[17:21:35] can you drop those anomalies?
[17:21:39] i can hard code things to ignore those days, yeah
[17:23:20] i would do that for now
[17:23:38] that's consistent with the spreadsheet that I gave to amit
[17:24:16] k
[18:04:29] https://plus.google.com/hangouts/_/943c24ad200653796cbd1b3dae872c1c186910c5
[18:04:32] erosen ^
[18:04:44] https://plus.google.com/hangouts/_/943c24ad200653796cbd1b3dae872c1c186910c5
[20:03:52] ottomata, ping
[20:04:05] hiya!
[20:06:24] ottomata, can you spare a little time to introduce me to the analytics servers and such? i read through the pages on mediawiki.org, and i have a basic understanding, btw.
was hoping we could do a google hangout
[20:06:49] ottomata, was wondering if now was a good time :)
[20:11:06] hmmmmm, i am in the middle of figuring out some pig stuff at the moment, and only have about an hour left to work today…
[20:11:10] tnegrin also wants an intro
[20:11:17] maybe we can schedule one with the both of you at the same time?
[20:11:37] yes!
[20:12:15] tnegrin, i betcha your calendar is pretty full
[20:12:20] want to do after standup tomorrow? 10:30 your time?
[20:12:25] dr0ptp4kt?
[20:12:52] i have 30 mins then -- good for a start
[20:13:01] ottomata, sorry man, went to another window. but the poke alert brought me back! hang on a sec, will check calendar
[20:13:16] ok cool, yeah that should be fine
[20:14:31] ottomata, tnegrin: 10:30 am pacific time works for me, too.
[20:14:37] ok cool
[20:17:32] ottomata, want me to set up the hangout, or you got it covered?
[20:17:49] oh, sure
[20:18:52] dr0ptp4kt
[20:19:03] you have a shell account on the analytics machines, right?
[20:19:08] have you been able to log into those?
[20:19:13] tnegrin: do you have a shell account yet?
[20:19:27] doubt it
[20:19:40] ok, rats, the ops policy is that we have to wait 3 days to get you one
[20:19:45] i'll put in an RT ticket now
[20:19:50] thxz
[20:19:55] uhh, you are the manager, so you have to sign off on it i guess :p
[20:20:09] ottomata, yeah, i shelled into analytics1001. i have a mac with mac os 10.8, so i think i will need to figure out a way to work around /etc/hosts (i see people talking about setting up their own local dns servers…insanity)
[20:20:19] ?
[20:21:00] tnegrin: do you have an office.wikimedia.org account?
[20:21:12] yes
[20:21:28] ottomata, mac os polls authoritative dns instead of trusting the /etc/hosts entries first. so a number of hosts don't resolve properly. but i was thinking i may be able to get some tips and tricks in our session!
[20:22:01] ok, not quiiite sure what you are needing that for but ok!
[20:22:30] ottomata, mediawiki.org suggests setting up aliases locally for some servers. i think after our discussion things will be clearer for me :)
[20:22:32] ok, tnegrin, can you go ahead and create a page under your username on the office wiki and add a public SSH key?
[20:22:38] that you'd like to use to sign into servers
[20:22:54] ahhh, ok, some of that might be outdated
[20:23:12] ok -- just do the standard ssh-keygen on my mac?
[20:23:35] yeah that's fine
[20:23:40] it'd be best if you used a pw protected key
[20:32:55] sure
[20:33:12] tnegrin: did you finalize a time to meet with amit?
[20:33:22] or a time to go over current dashboard status?
[20:33:24] oops no -- thanks for the reminder
[20:33:27] np
[20:33:29] one sec
[20:33:45] i'm just trying to find a chance to eat some lunch, but I wanted to leave some time to chat with diederik and you
[20:37:31] understood - I have a 2:30 and a 3. Will we have enough time to talk if we meet amit at 2?
[20:41:35] yeah, i think so
[20:41:52] tnegrin: shall we hang out now?
[20:42:15] drdee: are you around?
[20:42:19] always
[20:42:34] great, I'm headed to the batcave
[20:44:41] going to the cave
[22:54:07] drdee: I realized that the file you created won't necessarily work, because you combine m. and zero.
[23:10:02] arrgh
[23:10:35] I can safely assume everything is m.
[23:10:44] but it could be problematic for amit
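The closing problem — a file that combines m. and zero. traffic, when the two need to be kept apart — comes down to classifying request hostnames by subdomain. A hypothetical sketch of that classification (not the actual filter used in the pipeline; real hostnames may have more variation than this handles):

```python
# Sketch: classify a request hostname as Wikipedia Zero ("zero."),
# mobile ("m."), or other. Combining m. and zero. counts in one file
# is what makes it unreliable for Amit; splitting them requires a
# check like this. Hypothetical helper, not the production filter.
def classify_host(host):
    """Return 'zero', 'm', or 'other' based on the hostname's labels."""
    labels = host.split(".")
    if "zero" in labels:
        return "zero"
    if "m" in labels:
        return "m"
    return "other"
```

Applied per request line, this would let the file report m. and zero. totals separately instead of assuming everything is m.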