[09:11:28] good morning [09:11:31] milimetric1: hi [09:11:41] dschoon: hi [13:41:42] morning [13:41:56] morning! [13:41:59] hi average_drifter [13:42:06] and otto :) [13:42:11] average_drifter: ping me earlier? [13:48:11] morning all [13:48:15] milimetric time for hangout? [13:48:45] ottomata: udp2log and ssl? send it to a new instance of udp2log instead of disabling it? [13:49:00] sure drdee, I got time [13:49:08] awesome [13:49:39] https://plus.google.com/hangouts/_/5d4b2c4a70f39f50d0cc468cb4d2c421a265343d [14:56:06] milimetric let me know when you are ready to talk about metrics api feature cards [14:56:15] yep, will do [14:56:19] you mentioned there was a meeting today [14:57:14] when is that drdee? [14:57:38] i have a meeting with kraig about this :) and i need to be prepped for it [14:58:19] ohh and what's the status of the skin and active editor correlation bug ? (https://bugzilla.wikimedia.org/show_bug.cgi?id=45179) [15:03:41] i have a script for 45179, but haven't had time to vet it [15:04:22] maybe attach it to the bug and ask if someone else can push it forward? [15:04:23] drdee, when is that meeting with kraig? [15:04:47] um, I'd like to take another shot at it, maybe I'll finish it tonight off-hours [15:05:37] the reason is, I know T-SQL very well but MySQL seems to work a bit different [15:05:42] k [15:05:50] and I'd like to get to know that a bit more [15:05:54] meeting with kraig is 2pm est [15:06:03] k, cool, just trying to time-budget, thx [15:06:12] aight [15:35:08] ok, drdee, i've got gadolinium mostly set up for things that aren't dependent on Jeff and for the university forwarded streams [15:35:21] awesome! [15:36:08] I can move the university streams over now, but I'd have to disable them on locke at the same time [15:36:14] but that would make gadolinium officially in production [15:36:20] and i'm not sure how much benchmarking we wanted to do on it [15:36:31] kraig and dschoon had made a card or something, right? [15:36:43] i'm not excited about benchmarking it, i'm not really sure how much it is worth it to do so [15:36:54] but we need to make that decision before I go any farther with it [15:38:01] let's talk about it during today's scru [15:38:01] m [15:38:23] k [16:10:25] milimetric talk about metrics api in 5? [16:10:33] I've been drafting a memo :) [16:14:14] so yes drdee, I can talk [16:16:00] cool [16:16:12] https://plus.google.com/hangouts/_/67ba1f26fd1d124e1794ae467669f6f5cdc2241b [16:25:54] mornin [16:25:56] who am I [16:25:56] ? [16:25:57] AGGG [16:26:00] not who I want to be! [16:26:04] morning! [16:27:31] heh [16:30:04] morning [16:47:23] gm all [16:56:16] hi kraigparkinson [17:12:19] ottomata: with the new SSL logs coming to running on gado, right? are we running the PacketLossLogtailer.py against them? 
[17:12:34] if so, that means ganglia numbers for gado are crap :( [17:13:05] that's true, i'll turn it off for that instance [17:13:17] good good [17:13:59] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=packet_loss_90th&vl=%25&x=&n=&hreg%5B%5D=(locke%7Cemery%7Coxygen%7Cgadolinium)&mreg%5B%5D=packet_loss_90th>ype=line&glegend=show&aggregate=1 [17:14:06] but yeah: [17:14:06] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=packet_loss_average&vl=%25&x=&n=&hreg%5B%5D=(locke%7Cemery%7Coxygen)&mreg%5B%5D=packet_loss_average>ype=line&glegend=show&aggregate=1 [17:14:10] looks way better [18:03:16] since ottomata is at lunch, i'm going to grab a quick bite a little early [18:03:17] bbs [18:04:01] (ottomata: ping me when you're back) [18:04:41] i'm here! [18:19:07] drdee, do you know how the webstats stuff gets to dumps.wikimedia.org? [18:19:09] i don't remember [18:19:30] yeah it was a bit weird…… [18:19:38] we should put it on a wiki after discovering it [18:19:41] uhmmmmm [18:19:49] i think it is not puppetized at all [18:19:49] wasn't it a cronjob running on dataset2 [18:19:58] could be [18:20:20] don't see it [18:21:29] back [18:21:53] reading mail, then stuff [18:22:08] ottomata: you want to migrate the scheduler in a few? [18:22:21] i want to do it while we're both watchign [18:22:35] yeah [18:22:44] i'm ready, i'm just poking at locke and gadolinium [18:22:45] let's do it [18:23:39] give me a moment to get set up [18:24:45] ottomata: okay. [18:25:07] i'm going to suspend the running jobs [18:25:30] ottomata, reduced scope of card #155 and moved server maintenance to https://mingle.corp.wikimedia.org/projects/analytics/cards/460 [18:26:06] danke [18:26:19] ottomata: jobs suspended [18:27:23] now we need to update yarn-site.xml and create a fair-scheduler.xml [18:27:39] hmm, we probably shoulda prepped that first [18:27:39] milimetric, have a sec? [18:27:44] thought maybe you had done that already [18:27:49] i'll do it now. [18:28:00] ok, if this is going to take us more than 30 minutes we should unsuspend the jobs [18:28:09] it'll be fine. [18:28:10] oh you mean the oozie jobs? [18:28:11] that's fine [18:28:17] kraigparkinson: I'm eating a very late lunch due to two hours of meetings [18:28:17] kafka hadoop will still run though, ja? [18:28:23] the point of suspending was to avoid anything finishing with errors [18:28:27] mk [18:28:29] I'll ping you as soon as the last bite is chewed though, kraigparkinson :) [18:28:29] which would make it a pain to rerun them [18:28:32] aye k [18:28:33] thanks :) [18:28:43] ottomata: http://etherpad.wmflabs.org/pad/p/fair-scheduler [18:34:36] i wish it said what the defaults for those things were [18:34:42] like the minimum mb stuff [18:34:48] that should just default to whatever yarn is set at, right? [18:35:17] milimetric, erik beat me to the punch. see his email and let's talk re: point 4. [18:37:55] i added the defaults [18:38:03] ottomata: ^^ [18:38:08] thoughts on yarn.scheduler.fair.minimum-allocation-mb, yarn.scheduler.fair.maximum-allocation-mb [18:38:13] leave them null? [18:38:22] yeah those are the ones I want the defaults for [18:38:26] yeeah. [18:38:44] well, maybe let's set the min at 2G, and the max at 32? [18:38:52] we may as well bump the per-node max [18:39:02] since everything has been fine over the weekend [18:39:05] can we set this to false? [18:39:06] user-as-default-queue [18:39:14] i did. 
[18:39:15] ah ok [18:39:16] cool [18:39:16] look at the pad [18:39:23] (sorry just reading docs right now :p ) [18:39:24] i filled in most values [18:40:29] cool, just for readability, can we remove the ones where we keep the defaults? [18:40:35] sure [18:40:50] kraigparkinson: read the email, ready to chat [18:41:05] go ahead and do that while i'm doing the allocation file, ottomata [18:41:17] k [18:41:33] gonna see if I can figure out what minimum-allocation-mb defaults to [18:41:41] cool, whipping up a hangout [18:41:49] kraigparkinson: standup hangout is fine: https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc [18:44:09] ok, from the source, it looks like fairscheduler takes the default from yarn [18:44:15] public Resource getMaximumMemoryAllocation() { [18:44:15] int mem = getInt( [18:44:15] YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, [18:44:15] YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB); [18:44:15] return Resources.createResource(mem); [18:44:15] [18:44:42] So let's try it without t hat too [18:46:31] we should talk about queues. [18:46:37] okay. [18:46:58] ottomata, drdee: so, queues [18:47:04] by default there's just "default" [18:47:19] i was thinking we might want to have a few standard queues: [18:47:21] - standard, for regularly recurring jobs [18:47:38] - core, for core jobs (that we've yet to write, but will, soon) [18:47:51] - adhoc, for one-time jobs like my X-CS [18:47:59] what's the difference between core and regularly recurring? [18:48:17] core jobs walk over all incoming data [18:48:24] and need to essentially be niced really hard [18:48:38] otherwise they'll likely be very greedy [18:48:53] ok, well, hm, let's define that queue when we get there [18:48:59] *shrug* [18:49:04] no harm in creating them now [18:49:05] i agree, i think two is fine for now [18:49:09] everything goes into default at first. [18:49:12] alright. [18:49:39] does default always exist? [18:49:45] can we make just one for the regular jobs? [18:49:47] i think so? [18:49:52] i would like to do that. [18:50:07] and then one off jobs would go into default? [18:50:31] i think it's a good idea to have an explicit "adhoc" as well as default [18:50:42] so it can be limited to 1 job at a time. [18:51:11] i updated the etherpad [18:51:14] pls look, drdee ottomata [18:52:08] i assume maxRunningApps is the max. # of concurrently running apps? [18:52:29] if that's the case, i think 50 is a bit high [18:52:34] http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration [18:52:37] scroll to the bottom [18:52:56] resource limits will prevent apps from launching no matter what [18:53:10] basically, if they can't launch with min-resources, they're always queued [18:53:14] right [18:53:22] so that number just means, "don't launch more than X even when you have resources" [18:53:37] it should never be a constraint in our cluster, i think [18:53:46] k [18:55:09] that look good, ottomata? [18:55:26] brb coffee TIME! [18:55:31] i htink we can abstract the defaults we want into the default queue [18:55:35] and make the others sub queues [18:55:45] hm. 
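
A minimal sketch of the yarn-site.xml side of the change being discussed above (the property names are the standard YARN fair-scheduler ones; the file path and values are illustrative, not the cluster's actual config):

    <!-- enable the fair scheduler on the resourcemanager -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <!-- where the queue/allocation definitions live -->
    <property>
      <name>yarn.scheduler.fair.allocation.file</name>
      <value>/etc/hadoop/conf/fair-scheduler.xml</value>
    </property>
    <!-- jobs without an explicit queue go to "default", not to a per-user queue -->
    <property>
      <name>yarn.scheduler.fair.user-as-default-queue</name>
      <value>false</value>
    </property>

The yarn.scheduler.fair.minimum/maximum-allocation-mb properties debated above are simply left out of this sketch, which makes them fall back to the scheduler defaults.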
[18:55:53] 4 spaces, yo :) [18:56:42] hmm [18:56:43] now we need another leaf queue [18:56:48] which is the default for default :P [18:57:07] otherwise everything must be standard or adhoc [18:57:42] this is partially why i don't think that change is all that helpful [18:58:20] yeah maybe, all it did was abstract out the 50 [18:58:22] and fair [18:58:28] userMaxAppsDefault [18:58:31] but that's global anyway [18:59:13] wait, why do we have 3 queues? [18:59:18] we just want two, right? [18:59:24] one for stats users recurring jobs [18:59:30] the other for adhoc stuff by us [18:59:31] right? [18:59:50] no, we don't actually want the user queue [18:59:55] adhoc? [18:59:59] we want it to be the "standard" queue [19:00:07] it just so happens that only stats should be submitting to it [19:00:25] that's fine [19:00:29] so 'standard' and 'adhoc' [19:00:30] right? [19:00:37] er, we need default, yes? [19:00:42] dunno [19:00:45] i think yes. [19:00:45] i guess so [19:00:53] the other queues are just ways to be explicit [19:00:58] ok [19:01:07] also [19:01:09] we don't need the user name =stats do we? [19:01:18] no, probably not. [19:01:20] the maxrunningapps is already set in the standard queue [19:01:43] oh, wait [19:01:46] yes, we need that, right? [19:01:49] because the default is 2 [19:01:50] why? [19:01:55]         50 [19:01:55] we need to say stats can run a ton at once [19:01:58] oh and that is for the queue [19:01:59] hm [19:02:06] hmm [19:02:09] yeah hm,well [19:02:10] i vote we err on the side of verbosity [19:02:12] why set the global to 2? [19:02:16] since we don't really know how these combine [19:02:17] the queue will limit people [19:02:22] okay. [19:02:28] viola [19:02:42] note i also double-weighted the standard queue [19:02:59] and i think Kafka.Consumer runs as hdfs [19:03:00] mmmmk, ja like it [19:03:01] yeah [19:03:05] oh it does [19:03:09] good [19:03:09] we need to figure the [19:03:13] property [19:03:18] to control what queue a job goes into [19:03:28] yeah [19:03:39] mapred.queue.name ? [19:04:03] i also see mapred.job.queue.name [19:04:44] i guess i will try them both. [19:05:07] mapreduce.job.queuename [19:05:22] lol [19:05:27] great. [19:05:37] okay, let's wait until 12:15 [19:05:42] and the next Kafka run goes [19:05:45] i think? [19:05:46] then we'll have several minutes [19:05:53] before we interfere with it [19:10:27] this probably only needs to be changed on namenode/resourcemanager to take affect, right? [19:10:43] probably, but i would still ensure the files are consistent everywhere. [19:10:59] probably rsync down a clean copy first [19:11:01] yeah [19:11:03] then rsync back up all over [19:11:12] ps. it should be on our list to fix the deploy process [19:11:14] this is ass. [19:11:49] shoudl I edit files and rsync? [19:11:53] yeah [19:12:01] k [19:17:11] brb a moment. [19:17:15] (don't restart yet!) [19:25:53] ok, dschoon, config deployed, ready for restart [19:26:35] back [19:26:46] fire the restart at 12:31 :) [19:26:50] i guess 3:31 for you? [19:27:50] hokay! [19:30:49] do you want to check the current files into /etc in kraken? [19:30:59] that way any conf changes are all deployed off the same set? 
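
A sketch of what a fair-scheduler.xml along the lines settled on above could look like. The queue names (standard, adhoc), the 50-app queue limit, the double weight on standard, and the per-user default of 2 come from the conversation; element names follow the CDH4 FairScheduler allocation-file format linked above, and the actual etherpad contents may differ:

    <?xml version="1.0"?>
    <allocations>
      <!-- regularly recurring jobs (e.g. Kafka.Consumer), weighted up so imports win -->
      <queue name="standard">
        <weight>2.0</weight>
        <maxRunningApps>50</maxRunningApps>
      </queue>
      <!-- one-off jobs, kept to one at a time so they can't crowd out imports -->
      <queue name="adhoc">
        <maxRunningApps>1</maxRunningApps>
      </queue>
      <!-- per-user cap for anything not overridden above; "default" stays implicit -->
      <userMaxAppsDefault>2</userMaxAppsDefault>
    </allocations>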
[19:31:09] not a bad idea [19:31:20] it's not puppet [19:31:20] but we've gotta do something [19:31:24] otherwise we'll override each other [19:31:25] !log restarting hadoop with fair scheduler [19:32:19] i guess it doesn't like us any more [19:33:59] i am watching http://localhost:8888/jobbrowser/jobs/?state=all&text=&user=hdfs [19:34:11] we will see the consumer run in 10m [19:34:17] that better succeed [19:34:26] once that happens, we should be able to see what queue it was it [19:34:39] if it fails, i expect we'll need to add a property for the default queue [19:34:52] first priority is getting import running again [19:34:54] yes? [19:35:00] ^^ ottomata drdee [19:36:00] 1 sec [19:36:15] fair scheduler card: https://mingle.corp.wikimedia.org/projects/analytics/cards/461 [19:36:57] i thought that was the point of the problem card? [19:37:02] to track progress? [19:37:59] dschoon, its possible that jobs without queuename specified submit to a queue named after the user running the job [19:38:04] http://localhost:8088/cluster/scheduler [19:38:19] looks good so far. [19:39:13] dschoon: but problem cards are not part of the sprint and so this way we can keep track of actually fixing the problem [19:39:29] hm. [19:39:34] seems like a duplication [19:39:44] shouldn't we just make it so that problem cards can be scheduled and assigned? [19:39:58] i'm going to run a grunt shell and see what happens [19:40:03] wait [19:40:09] ottomata: wait for the first kafka consumer to run [19:40:16] k [19:40:18] that's our highest priority, so let's not fuck with it [19:40:21] one note [19:40:37] each queue appears to have 16G associated [19:40:58] (provided that number is MB) [19:42:11] ottomata: what do you make of those resource numbers? [19:42:36] hm that's from yarn, but i thought it would be per node, hmmmm [19:42:51] 16G per node is what we set on friday [19:43:03] but it's saying 16G per queue [19:43:11] yeah [19:43:19] so maybe we need to set those explicitly [19:43:35] i'm not exactly sure what it means, though [19:43:59] because that's only a single node's worth [19:44:14] and i know that number is the max for the sum of all running apps in that queue [19:44:29] if an app's total is the sum of all usage on all nodes [19:44:41] it means an app can only use a single node's worth of memory [19:44:46] which is... bad [19:44:57] ^^ ottomata drdee [19:45:14] that is, unless [19:45:25] yeah [19:45:34] that number is yarn.scheduler.fair.maximum-allocation-mb [19:45:36] which says: [19:45:36] oop, stuff is happening [19:45:43] The largest container the scheduler can allocate, in MB of memory [19:46:03] welp [19:46:05] it succeeded [19:46:21] yup! [19:46:25] i also saw this kinda stuff in logs: [19:46:31] an10: 2013-03-25 19:45:36,157 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Node offered to app: application_1364239892421_0001 reserved: false [19:46:48] an10: 2013-03-25 19:45:37,677 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Released container container_1364239892421_0001_01_000001 of capacity memory: 2048 on host analytics1012:8041, which currently has 0 containers, memory: 0 used and memory: 16384 available, release resources=true [19:47:11] looks good I think ? [19:47:54] can we figure out what queue it ran in? 
[19:48:14] an10: 2013-03-25 19:46:43,910 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1364239892421_0001_000001 [19:48:21] hm [19:48:26] oh that might be from you clicking around [19:48:26] hm [19:48:28] not sure [19:48:28] hm [19:48:30] no [19:48:41] ran in default queue [19:48:41] an10: 2013-03-25 19:46:08,593 INFO org.apache.hadoop.mapreduce.jobhistory.JobSummary: jobId=job_1364239892421_0001,submitTime=1364240706479,launchTime=1364240712441,firstMapTaskLaunchTime=1364240716966,firstReduceTaskLaunchTime=0,finishTime=1364240731595,resourcesPerMap=2048,resourcesPerReduce=0,numMaps=2,numReduces=0,user=hdfs,queue=default,status=SUCCEEDED,mapSlotSeconds=56,reduceSlotSeconds=0,jobName=Kafka.Consumer [19:49:04] ah [19:49:05] great. [19:49:13] okay, now let's try to change that [19:49:26] btw, the queue is the third row here [19:49:26] http://analytics1010.eqiad.wmnet:19888/jobhistory/job/job_1364239892421_0001/jobhistory/job/job_1364239892421_0001 [19:49:54] oh ya nice [19:50:42] yeah, and mapreduce.job.queuename seems right [19:50:47] from http://analytics1010.eqiad.wmnet:19888/jobhistory/conf/job_1364239892421_0001 [19:50:49] ok cool, so ummmmm, hm [19:50:50] search for queue [19:51:03] yeah! [19:51:04] cool [19:51:11] so, how do we figure out the queue size mb thinh [19:51:12] hm [19:51:14] just up it? [19:51:23] well [19:51:23] first [19:51:31] let's modify whatever conf file there is for https://github.com/wikimedia-incubator/kafka-hadoop-consumer [19:51:40] to make it submit that job to the standard queue [19:52:30] we really need to be able to set job.xml properties [19:52:43] unless you're okay with it running in default for now... [19:52:49] that kinda worries me, though [19:53:09] its basically the same in default, no? [19:53:13] as it would be in standard? [19:53:30] oh, i' sure we can set that with a -D [19:53:38] ah, good idea. [19:53:40] let's try it. [19:53:56] and no, not the same. because every test job and whatever goes into default [19:54:07] we need to, at least, ensure all the critical jobs are not in that queue [19:54:13] i'll fix the ones in oozie [19:54:18] when i resubmit them [19:54:19] ok [19:54:25] ergh, i built a .deb for this hm [19:54:44] this is why i said this task would take 4 hours :) [20:00:02] i think the pom in this project is way wrong [20:01:00] ? [20:01:12] oh i'm sure o fit [20:01:59] dschoon, before I modify this, can we do a test to see if -Dmapreduce.job.queuename works? [20:02:06] i'll also point out that it depends on a bunch of cdh 4.1.2 stuff [20:02:26] i could fire off a pig job [20:02:28] sure [20:02:30] doing it [20:02:31] k [20:02:57] i'm also going to fix the consumer pom [20:03:29] oook, i actually was hoping not to rebuild this....>>...... [20:03:31] i guess we can [20:03:46] it is really just a wrapper script that needs modified, and we are only running this on one machine [20:03:49] i can edit the script there [20:03:57] i know, the cluster is hacky right now! [20:04:09] ah, excellent [20:04:10] but since this works as is, i'd touch as little as possible [20:04:17] i'd commit my change [20:04:25] SET mapreduce.job.queuename adhoc [20:04:25] should do what i want [20:04:27] oh in pig [20:04:27] ha [20:04:31] yes [20:04:31] but what about -D via cli? [20:04:36] should do the same [20:04:38] ok [20:04:40] will try it [20:06:18] about to launch... [20:06:20] ok? [20:07:22] here we go... 
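
The two ways of routing a Pig job to a queue that get tested here, as a sketch (queue names as chosen above):

    -- inside a pig script or grunt session:
    SET mapreduce.job.queuename 'adhoc';
    -- ... rest of the script (LOAD / GROUP / STORE) is unchanged ...

    -- or on the command line, with -D placed before other pig options:
    -- pig -Dmapreduce.job.queuename=adhoc myscript.pig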
[20:09:03] looks great [20:09:06] it says it ran in adhoc [20:09:54] consumer running again in default... [20:09:54] :( [20:15:58] hmmmm [20:15:59] http://localhost:19888/jobhistory/job/job_1364239892421_0007/jobhistory/job/job_1364239892421_0007 [20:16:04] this is with [20:16:10] -Dmapreduce.job.queuename=standard [20:16:23] java -Dmapreduce.job.queuename=standard -cp /usr/share/java/kafka-hadoop-consumer.jar:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//* kafka.consumer.HadoopConsumer -z analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqi [20:16:35] ... [20:16:37] truncated [20:16:38] oh [20:16:40] at the front [20:16:41] yeah [20:16:42] hmm [20:16:50] phh [20:16:52] ohh [20:16:58] ottomata, i know why it doesn't work [20:17:00] oh? [20:17:02] it's pretty obvious once you think abou it [20:17:06] the jar itself isn't the job [20:17:12] it preps and submits a job via the tracker [20:17:16] hm, [20:17:17] k [20:17:35] so setting that prop in the JVM doesn't end up getting captured by the job that's submitted [20:17:49] i think we're going to have to recompile [20:17:53] i'll commit my pom fixes [20:18:07] ooook.... [20:18:31] i'm going to make some tea [20:19:39] i've pushed [20:26:17] fascinating. [20:30:23] sup? [20:31:06] so, the conf is set in HadoopConsumer starting around line 63 [20:31:25] oo ja [20:34:10] so i'm thinking i'll add a CLI param to point it to a config file [20:34:29] and it'll load it if true, overriding the defaults [20:34:52] config file? [20:35:31] really? there's no way to set java parameters except by coding? [20:35:39] properties* [20:36:04] no, because the conf file that's passed to the job is created in code [20:36:10] i should say: no way i know of [20:39:01] hmph. [20:39:14] ottomata: the conf should be overridden by CLI params, right? [20:40:27] let's just keep it simple [20:40:31] add a cli param [20:40:38] if not set, then don't specify [20:40:53] don't worry about a config file [20:42:58] ottomata: what version of pig are we currently using? I'm trying to match locally [20:43:22] currently / planned I guess, I could of course get the actual version myself [20:43:26] Apache Pig version 0.10.0-cdh4.2.0 (rexported) [20:43:32] pig -version [20:43:33] :) [20:43:49] right, i just heard you guys talking about upgrading but i guess that was just CDH4 [20:43:56] yeah [20:44:01] cool, thank you [20:44:01] it actually was upgraded [20:44:04] when we changed to 4.2.0 [20:44:08] (i think) [20:44:13] dschoon, I will change the wrapper script to pass the queue name as a cli param [20:44:17] instead of -D [20:44:26] k [20:44:48] --queue -q [20:45:46] yeah [20:48:18] also adding --job-conf -C [20:51:19] oh will this work for any mapreduce job param? [20:54:19] that's the idea. [20:54:23] i don't want to have to do this again. [20:54:25] it'd be stupid. [20:54:59] ahhhhh, ok i thought you were creating a conf option for just the few cli options already there [20:54:59] that's cool [20:55:08] tsk [20:55:19] give me a LITTLE credit [21:11:15] oh [21:11:25] didn't we want to be able to set the job name as well, ottomata? [21:11:27] i remember that from ages ago [21:21:50] yeah [21:23:30] welp [21:23:34] the project has no tests [21:23:40] but uh [21:23:42] i think i'm done? [21:24:13] not really sure how to check if it works... 
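
The gist of why the -D flag on the wrapper JVM doesn't help, as a hedged sketch rather than the actual HadoopConsumer code: a plain JVM system property isn't copied into the job conf automatically, so the queue name has to be set on the Configuration the client builds before submitting.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class QueueNameSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // e.g. taken from a --queue CLI flag or a local job-conf XML file
            conf.set("mapreduce.job.queuename", "standard");

            Job job = Job.getInstance(conf, "Kafka.Consumer");
            // ... input/output formats, mapper class, etc. would be set here ...
            job.submit();  // the queue name now travels with the submitted job
        }
    }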
[21:24:18] so i think i shall just push :) [21:24:25] sounds good, i gotta rebuild, eh? [21:24:29] did you change the cdh4 deps? [21:25:23] i did [21:26:00] mvn install should Just Work [21:26:26] no fiddling with kafka versions, as I changed the dep to be the org.apache.kafka artifact [21:27:56] ok [21:28:09] everything's pushed, otto [21:28:11] ottomata: [21:28:28] install? [21:28:34] how does it know where to install? [21:28:44] `mvn install` installs the artifact to your maven repo [21:28:56] which is ~/.m2/repository by default [21:29:06] but i need to rebuild the .deb, right? [21:29:19] i'm saying that will build the jar with the shaded crap [21:29:24] it'll all be in target as usual [21:29:28] you can do whatever you want after that [21:29:33] (such as run your shell scripts) [21:29:37] hmm, ok [21:29:43] my shell script does this right now [21:29:43] mvn install:install-file -DgroupId=kafka -DartifactId=kafka -Dversion=0.7.2 -Dpackaging=jar -Dfile=/usr/lib/kafka/core/target/scala_2.8.0/kafka-0.7.2.jar -DgeneratePom=true [21:29:47] i shoudl remove all the crap? [21:29:51] yes [21:29:53] or at least [21:29:56] comment it out [21:29:56] just mvn install [21:29:58] k [21:30:02] will try [21:30:11] i didn't realize that was still there [21:30:43] brb a moment [21:45:02] back [21:45:04] how'd it go, ottomata? [21:45:42] got it, had to tweak some things [21:45:47] bout to install the .deb manually on an02 [21:46:06] ok [21:49:10] yay! [21:49:10] http://localhost:19888/jobhistory/job/job_1364239892421_0022/jobhistory/job/job_1364239892421_0022 [21:49:19] woo [21:49:27] notice i changed the default name :) [21:49:32] nice! [21:49:43] did you specify --queue? [21:49:47] or use the job properties? [21:49:51] specified [21:50:02] there are actually two wrapper scripts, one is kraken specific [21:50:04] can you try another using a conf file? [21:50:06] in kraken bin/... [21:50:13] i just made it use standard as default [21:50:19] uhh, ok [21:50:29] what should the config file look like? is a properties file ok? [21:50:35] it uses hadoop's configuration xml format -- try setting mapreduce.job.queuename in there [21:50:39] ok [21:51:01] also try running it with --name [21:51:05] and setting the jobname [21:51:08] just to test :) [21:52:12] HMmmm [21:52:17] does the conf fiel have to be in hdfs? [21:52:24] no [21:52:28] in fact [21:52:31] it needs to be local [21:52:37] it's loaded at the time the executable is run [21:52:42] (that's why the -D props didn't work) [21:53:45] yeah! [21:53:46] it works [21:53:46] http://localhost:19888/jobhistory/job/job_1364239892421_0023/mapreduce/job/job_1364239892421_0023 [21:53:56] hot! [21:54:05] okay, great to know. [21:54:16] hacking on this is pretty easy, btw [21:54:44] so if we have a list of crap we'd like to change, i can give a meaningful LoE [21:55:09] LoE? [21:55:14] level-of-effort [21:55:16] aye [21:55:25] for kafka hadoop consuemr you mean? [21:56:17] yes [21:56:33] just saying it's not a black box any more in terms of source operation [21:56:45] aye [22:00:28] so um, yayyyy we did it dschoon! [22:00:38] now maybe you can run your job in adhoc without breaking things? [22:00:48] yep, will check that next/ [22:02:52] ottomata: you wanna update the readme? [22:03:01] i just noticed it's wrong, and you might want to update the .deb stuff [22:04:49] done [22:04:55] sweet [22:05:04] did you commit our conf files to kraken/etc? 
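
A sketch of what such a local job-conf override file might contain: plain Hadoop configuration XML, read on the submitting host at launch time. The values here are illustrative; --queue, --job-conf and --name are the wrapper flags mentioned above.

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>mapreduce.job.queuename</name>
        <value>standard</value>
      </property>
      <property>
        <name>mapreduce.job.name</name>
        <value>Kafka.Consumer</value>
      </property>
    </configuration>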
[22:07:36] no [22:07:57] can do, i think maybe on an10 we can symlink /etc/hadoop/conf to /opt/kraken/hadoop/conf or something [22:08:06] and use that as the master [22:08:09] and then rsync out from there [22:08:11] okay [22:08:15] that seems reasonable [22:08:27] maybe write a script? [22:08:33] yeah i wrote one, but [22:08:39] you have to have root to be able to use it :/ [22:08:40] to ensure we have a canonical way to pull as well as push? [22:08:43] unless we change perms of files [22:08:52] because i want to make sure i can update my local copy [22:08:58] (hence thinking it should be in the repo) [22:09:02] aye [22:09:09] /opt/kraken [22:09:11] is the kraken repo [22:09:16] on the nodes [22:09:29] hm [22:09:44] i think it should live in kraken/etc [22:09:49] because there are really four or five dirs [22:09:52] yeah [22:09:55] that's fine [22:10:01] kraken/etc/hadoop/conf [22:10:01] hadoop hbase hive oozie pig sqoop zookeeper [22:10:04] ja [22:10:11] but on the machines [22:10:21] /etc/hadoop/conf is a symlink [22:10:26] right [22:10:28] to /etc/hadoop/conf.empty [22:10:31] or conf.dist [22:10:36] i will make it point at to /opt/kraken/etc/hadoop/conf [22:10:43] on an10 [22:11:34] i'm going to remove the other crap in /etc [22:11:35] ok? [22:11:37] kraken/etc [22:11:39] sounds good [22:11:41] heh [22:13:52] ottomata: what's in your dsh group "k"? [22:16:53] uhhh, i think [22:16:53] an10 [22:16:54] an11 [22:16:54] an12 [22:16:54] an13 [22:16:54] an14 [22:16:54] an15 [22:16:55] an16 [22:16:55] an17 [22:16:56] an18 [22:16:56] an19 [22:16:57] an20 [22:19:09] 21 and 22 aren't datanodes? [22:21:32] no [22:21:42] kafka brokers [22:21:42] ah [22:26:23] ottomata: do you get an error if you try to kill the coordinator job 0002299-130220153347486-oozie-oozi-C in the gui? [22:26:26] http://localhost:8888/oozie/list_oozie_coordinators/ [22:27:19] yes [22:27:22] bleh. [22:27:22] i'd just use the cli [22:27:25] i will. [22:27:29] just annoyed. [22:35:55] ottomata: so i see jobs for all.100 and event running again? [22:36:14] ah we can remove those [22:36:15] one sec [22:36:20] well [22:36:22] hold on [22:36:25] are they *actually* running? [22:36:26] oh? [22:36:32] they are launched by crons [22:36:34] but there's no data to import [22:36:41] ah. [22:36:51] what's wrong with the event import? [22:37:26] no event stream in kafka [22:37:28] it was running on an01 [22:37:35] hasn't been re set up since we reinstalled [22:37:37] what stops us from fixing that? [22:37:45] not touching an01 unless we do it in puppet [22:37:51] since it has been reinstalled [22:38:12] couldn't it use the same stream from oxygen? [22:38:31] no its not on oxygen [22:38:34] its a separate stream [22:38:37] its sent to vandadium [22:38:39] and to analytics1001 [22:38:41] oh. [22:38:53] by the bits varnish boxes? [22:38:55] yup [22:39:00] and they're all in eqiad? [22:39:18] i think so [22:39:39] so we could theoretically fix this by telling it to send the stream to a different box? [22:44:32] yes [23:09:20] anyone got time for a nooby pig question? [23:09:25] drdee, dschoon, ottomata? 
[23:09:55] i'm kinda here [23:09:56] whaaasssup [23:10:07] well if I do this: [23:10:10] log_fields = LOAD_WEBREQUEST('hdfs:///wmf/raw/webrequest/webrequest-wikipedia-mobile/2013-03-25_22.30*'); [23:10:14] and then go: [23:10:44] LOGS_GROUP= GROUP log_fields ALL; [23:10:44] LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(log_fields); [23:10:52] then illustrate LOG_COUNT tells me 2 [23:12:11] which might make sense 'cause there's two files [23:12:29] oh so I need FLATTEN in here don't I? [23:13:16] hmm, i like to actually dump various values to see where I go wrong [23:13:20] DUMP LOGS_GROUP; [23:13:23] does that look ok? [23:16:12] job failed... [23:16:13] hm [23:16:14] Failed to read data from "hdfs:///wmf/raw/webrequest/webrequest-wikipedia-mobile/2013-03-25_22.30*" [23:16:15] yeah [23:16:22] it didn't before [23:16:29] i was dumping and stuff too [23:16:43] trying again :) [23:17:03] it was reading this set of input files a while ago [23:17:36] hm, it gets up to 49% and fails [23:18:38] grunt is a very good name... [23:20:00] haha [23:20:15] reduces failed! [23:20:25] mk... dump log_files works, though I had to kill my terminal [23:21:06] ok, perhaps this is best picked up in the morning. But this seems like an unscalable way to develop. My local pig / hadooop installations refuse to work [23:21:07] hm [23:21:13] oh that is [23:21:14] that should work [23:21:15] there's some bug about how they read $JAVA_HOME (which is set right) [23:21:28] lets' figure out your hadoop local [23:21:30] tomorrow or something [23:21:32] yea [23:21:37] oh you can run hadoop on the cluster in local mode too [23:21:44] ok, that's better [23:21:50] i'll do that [23:21:56] thanks otto, have a good night [23:21:56] i debug doing that [23:22:03] just put a sample file somewhere locally [23:22:12] pig -x local [23:22:12] etc. [23:22:12] yup you too! [23:29:43] ahh dschoon [23:29:51] we need to fix the zero_carrier_country hierarchy [23:29:59] the limnify stuff I had going is broken cause the stuff changed [23:36:22] milimetric:i don't think we should invest time to get your local install working; we really get you on kraken so you can develop there [23:36:54] naw, local install is good for testing [23:37:00] to just run pig scripts? [23:37:10] if he's runing udfs, i dunno, then kraken fo sho [23:37:14] i never got udfs to work locally [23:38:52] well it also means you need private data at your local box and that's against policy [23:43:29] that's my fault [23:43:49] i started that two days ago, and packetloss + fair scheduler intervened [23:43:57] i'll fix that now, milimetric and ping you [23:52:08] thanks dschoon, I'm running in local mode now on some sample data [23:52:19] awesome [23:53:19] drdee, I wouldn't use any private data if I did it locally. I would create my own tab-delimited mock sample file [23:53:46] I'm on kraken, in local mode because running on the full datasets doesn't let me iterate nearly fast enough to figure out what in the world I'm doing [23:53:57] It takes FOREVER to run on even 5 minutes of data [23:56:05] i'll brb in 20. gonna head home before finishing up the last hour or two of the day.
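
For reference, a minimal local-mode version of the count pattern above (file name and sample data invented for illustration). As a general Pig note rather than a diagnosis of the run above: ILLUSTRATE works on a tiny generated sample, so the number it shows for COUNT is not the real total — DUMP or STORE the relation to get the actual count, and no FLATTEN is needed for a simple record count.

    -- run with: pig -x local count_sketch.pig
    log_fields = LOAD 'sample_requests.tsv' USING PigStorage('\t');
    logs_group = GROUP log_fields ALL;            -- a single group holding every record
    log_count  = FOREACH logs_group GENERATE COUNT(log_fields);
    DUMP log_count;                               -- total number of input records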