[09:11:28] good morning [09:11:31] milimetric1: hi [09:11:41] dschoon: hi [13:41:42] morning [13:41:56] morning! [13:41:59] hi average_drifter [13:42:06] and otto :) [13:42:11] average_drifter: ping me earlier? [13:48:11] morning all [13:48:15] milimetric time for hangout? [13:48:45] ottomata: udp2log and ssl? send it to a new instance of udp2log instead of disabling it? [13:49:00] sure drdee, I got time [13:49:08] awesome [13:49:39] https://plus.google.com/hangouts/_/5d4b2c4a70f39f50d0cc468cb4d2c421a265343d [14:56:06] milimetric let me know when you are ready to talk about metrics api feature cards [14:56:15] yep, will do [14:56:19] you mentioned there was a meeting today [14:57:14] when is that drdee? [14:57:38] i have a meeting with kraig about this :) and i need to be prepped for it [14:58:19] ohh and what's the status of the skin and active editor correlation bug ? (https://bugzilla.wikimedia.org/show_bug.cgi?id=45179) [15:03:41] i have a script for 45179, but haven't had time to vet it [15:04:22] maybe attach it to the bug and ask if someone else can push it forward? [15:04:23] drdee, when is that meeting with kraig? [15:04:47] um, I'd like to take another shot at it, maybe I'll finish it tonight off-hours [15:05:37] the reason is, I know T-SQL very well but MySQL seems to work a bit different [15:05:42] k [15:05:50] and I'd like to get to know that a bit more [15:05:54] meeting with kraig is 2pm est [15:06:03] k, cool, just trying to time-budget, thx [15:06:12] aight [15:35:08] ok, drdee, i've got gadolinium mostly set up for things that aren't dependent on Jeff and for the university forwarded streams [15:35:21] awesome! [15:36:08] I can move the university streams over now, but I'd have to disable them on locke at the same time [15:36:14] but that would make gadolinium officially in production [15:36:20] and i'm not sure how much benchmarking we wanted to do on it [15:36:31] kraig and dschoon had made a card or something, right? [15:36:43] i'm not excited about benchmarking it, i'm not really sure how much it is worth it to do so [15:36:54] but we need to make that decision before I go any farther with it [15:38:01] let's talk about it during today's scru [15:38:01] m [15:38:23] k [16:10:25] milimetric talk about metrics api in 5? [16:10:33] I've been drafting a memo :) [16:14:14] so yes drdee, I can talk [16:16:00] cool [16:16:12] https://plus.google.com/hangouts/_/67ba1f26fd1d124e1794ae467669f6f5cdc2241b [16:25:54] mornin [16:25:56] who am I [16:25:56] ? [16:25:57] AGGG [16:26:00] not who I want to be! [16:26:04] morning! [16:27:31] heh [16:30:04] morning [16:47:23] gm all [16:56:16] hi kraigparkinson [17:12:19] ottomata: with the new SSL logs coming to running on gado, right? are we running the PacketLossLogtailer.py against them? 
[17:12:34] if so, that means ganglia numbers for gado are crap :( [17:13:05] that's true, i'll turn it off for that instance [17:13:17] good good [17:13:59] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=packet_loss_90th&vl=%25&x=&n=&hreg%5B%5D=(locke%7Cemery%7Coxygen%7Cgadolinium)&mreg%5B%5D=packet_loss_90th>ype=line&glegend=show&aggregate=1 [17:14:06] but yeah: [17:14:06] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=packet_loss_average&vl=%25&x=&n=&hreg%5B%5D=(locke%7Cemery%7Coxygen)&mreg%5B%5D=packet_loss_average>ype=line&glegend=show&aggregate=1 [17:14:10] looks way better [18:03:16] since ottomata is at lunch, i'm going to grab a quick bite a little early [18:03:17] bbs [18:04:01] (ottomata: ping me when you're back) [18:04:41] i'm here! [18:19:07] drdee, do you know how the webstats stuff gets to dumps.wikimedia.org? [18:19:09] i don't remember [18:19:30] yeah it was a bit weird…… [18:19:38] we should put it on a wiki after discovering it [18:19:41] uhmmmmm [18:19:49] i think it is not puppetized at all [18:19:49] wasn't it a cronjob running on dataset2 [18:19:58] could be [18:20:20] don't see it [18:21:29] back [18:21:53] reading mail, then stuff [18:22:08] ottomata: you want to migrate the scheduler in a few? [18:22:21] i want to do it while we're both watchign [18:22:35] yeah [18:22:44] i'm ready, i'm just poking at locke and gadolinium [18:22:45] let's do it [18:23:39] give me a moment to get set up [18:24:45] ottomata: okay. [18:25:07] i'm going to suspend the running jobs [18:25:30] ottomata, reduced scope of card #155 and moved server maintenance to https://mingle.corp.wikimedia.org/projects/analytics/cards/460 [18:26:06] danke [18:26:19] ottomata: jobs suspended [18:27:23] now we need to update yarn-site.xml and create a fair-scheduler.xml [18:27:39] hmm, we probably shoulda prepped that first [18:27:39] milimetric, have a sec? [18:27:44] thought maybe you had done that already [18:27:49] i'll do it now. [18:28:00] ok, if this is going to take us more than 30 minutes we should unsuspend the jobs [18:28:09] it'll be fine. [18:28:10] oh you mean the oozie jobs? [18:28:11] that's fine [18:28:17] kraigparkinson: I'm eating a very late lunch due to two hours of meetings [18:28:17] kafka hadoop will still run though, ja? [18:28:23] the point of suspending was to avoid anything finishing with errors [18:28:27] mk [18:28:29] I'll ping you as soon as the last bite is chewed though, kraigparkinson :) [18:28:29] which would make it a pain to rerun them [18:28:32] aye k [18:28:33] thanks :) [18:28:43] ottomata: http://etherpad.wmflabs.org/pad/p/fair-scheduler [18:34:36] i wish it said what the defaults for those things were [18:34:42] like the minimum mb stuff [18:34:48] that should just default to whatever yarn is set at, right? [18:35:17] milimetric, erik beat me to the punch. see his email and let's talk re: point 4. [18:37:55] i added the defaults [18:38:03] ottomata: ^^ [18:38:08] thoughts on yarn.scheduler.fair.minimum-allocation-mb, yarn.scheduler.fair.maximum-allocation-mb [18:38:13] leave them null? [18:38:22] yeah those are the ones I want the defaults for [18:38:26] yeeah. [18:38:44] well, maybe let's set the min at 2G, and the max at 32? [18:38:52] we may as well bump the per-node max [18:39:02] since everything has been fine over the weekend [18:39:05] can we set this to false? [18:39:06] user-as-default-queue [18:39:14] i did. 
[18:39:15] ah ok [18:39:16] cool [18:39:16] look at the pad [18:39:23] (sorry just reading docs right now :p ) [18:39:24] i filled in most values [18:40:29] cool, just for readability, can we remove the ones where we keep the defaults? [18:40:35] sure [18:40:50] kraigparkinson: read the email, ready to chat [18:41:05] go ahead and do that while i'm doing the allocation file, ottomata [18:41:17] k [18:41:33] gonna see if I can figure out what minimum-allocation-mb defaults to [18:41:41] cool, whipping up a hangout [18:41:49] kraigparkinson: standup hangout is fine: https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc [18:44:09] ok, from the source, it looks like fairscheduler takes the default from yarn [18:44:15] public Resource getMaximumMemoryAllocation() { [18:44:15] int mem = getInt( [18:44:15] YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, [18:44:15] YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB); [18:44:15] return Resources.createResource(mem); [18:44:15] [18:44:42] So let's try it without t hat too [18:46:31] we should talk about queues. [18:46:37] okay. [18:46:58] ottomata, drdee: so, queues [18:47:04] by default there's just "default" [18:47:19] i was thinking we might want to have a few standard queues: [18:47:21] - standard, for regularly recurring jobs [18:47:38] - core, for core jobs (that we've yet to write, but will, soon) [18:47:51] - adhoc, for one-time jobs like my X-CS [18:47:59] what's the difference between core and regularly recurring? [18:48:17] core jobs walk over all incoming data [18:48:24] and need to essentially be niced really hard [18:48:38] otherwise they'll likely be very greedy [18:48:53] ok, well, hm, let's define that queue when we get there [18:48:59] *shrug* [18:49:04] no harm in creating them now [18:49:05] i agree, i think two is fine for now [18:49:09] everything goes into default at first. [18:49:12] alright. [18:49:39] does default always exist? [18:49:45] can we make just one for the regular jobs? [18:49:47] i think so? [18:49:52] i would like to do that. [18:50:07] and then one off jobs would go into default? [18:50:31] i think it's a good idea to have an explicit "adhoc" as well as default [18:50:42] so it can be limited to 1 job at a time. [18:51:11] i updated the etherpad [18:51:14] pls look, drdee ottomata [18:52:08] i assume maxRunningApps is the max. # of concurrently running apps? [18:52:29] if that's the case, i think 50 is a bit high [18:52:34] http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration [18:52:37] scroll to the bottom [18:52:56] resource limits will prevent apps from launching no matter what [18:53:10] basically, if they can't launch with min-resources, they're always queued [18:53:14] right [18:53:22] so that number just means, "don't launch more than X even when you have resources" [18:53:37] it should never be a constraint in our cluster, i think [18:53:46] k [18:55:09] that look good, ottomata? [18:55:26] brb coffee TIME! [18:55:31] i htink we can abstract the defaults we want into the default queue [18:55:35] and make the others sub queues [18:55:45] hm. 
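
A minimal sketch of the yarn-site.xml side of the change being discussed above (the property names are the standard YARN fair-scheduler ones; the file path and values are illustrative, not the cluster's actual config):

    <!-- enable the fair scheduler on the resourcemanager -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <!-- where the queue/allocation definitions live -->
    <property>
      <name>yarn.scheduler.fair.allocation.file</name>
      <value>/etc/hadoop/conf/fair-scheduler.xml</value>
    </property>
    <!-- jobs without an explicit queue go to "default", not to a per-user queue -->
    <property>
      <name>yarn.scheduler.fair.user-as-default-queue</name>
      <value>false</value>
    </property>

The yarn.scheduler.fair.minimum/maximum-allocation-mb properties debated above are simply left out of this sketch, which makes them fall back to the scheduler defaults.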
[18:55:53] 4 spaces, yo :) [18:56:42] hmm [18:56:43] now we need another leaf queue [18:56:48] which is the default for default :P [18:57:07] otherwise everything must be standard or adhoc [18:57:42] this is partially why i don't think that change is all that helpful [18:58:20] yeah maybe, all it did was abstract out the 50 [18:58:22] and fair [18:58:28] userMaxAppsDefault [18:58:31] but that's global anyway [18:59:13] wait, why do we have 3 queues? [18:59:18] we just want two, right? [18:59:24] one for stats users recurring jobs [18:59:30] the other for adhoc stuff by us [18:59:31] right? [18:59:50] no, we don't actually want the user queue [18:59:55] adhoc? [18:59:59] we want it to be the "standard" queue [19:00:07] it just so happens that only stats should be submitting to it [19:00:25] that's fine [19:00:29] so 'standard' and 'adhoc' [19:00:30] right? [19:00:37] er, we need default, yes? [19:00:42] dunno [19:00:45] i think yes. [19:00:45] i guess so [19:00:53] the other queues are just ways to be explicit [19:00:58] ok [19:01:07] also [19:01:09] we don't need the user name =stats do we? [19:01:18] no, probably not. [19:01:20] the maxrunningapps is already set in the standard queue [19:01:43] oh, wait [19:01:46] yes, we need that, right? [19:01:49] because the default is 2 [19:01:50] why? [19:01:55]         50 [19:01:55] we need to say stats can run a ton at once [19:01:58] oh and that is for the queue [19:01:59] hm [19:02:06] hmm [19:02:09] yeah hm,well [19:02:10] i vote we err on the side of verbosity [19:02:12] why set the global to 2? [19:02:16] since we don't really know how these combine [19:02:17] the queue will limit people [19:02:22] okay. [19:02:28] viola [19:02:42] note i also double-weighted the standard queue [19:02:59] and i think Kafka.Consumer runs as hdfs [19:03:00] mmmmk, ja like it [19:03:01] yeah [19:03:05] oh it does [19:03:09] good [19:03:09] we need to figure the [19:03:13] property [19:03:18] to control what queue a job goes into [19:03:28] yeah [19:03:39] mapred.queue.name ? [19:04:03] i also see mapred.job.queue.name [19:04:44] i guess i will try them both. [19:05:07] mapreduce.job.queuename [19:05:22] lol [19:05:27] great. [19:05:37] okay, let's wait until 12:15 [19:05:42] and the next Kafka run goes [19:05:45] i think? [19:05:46] then we'll have several minutes [19:05:53] before we interfere with it [19:10:27] this probably only needs to be changed on namenode/resourcemanager to take affect, right? [19:10:43] probably, but i would still ensure the files are consistent everywhere. [19:10:59] probably rsync down a clean copy first [19:11:01] yeah [19:11:03] then rsync back up all over [19:11:12] ps. it should be on our list to fix the deploy process [19:11:14] this is ass. [19:11:49] shoudl I edit files and rsync? [19:11:53] yeah [19:12:01] k [19:17:11] brb a moment. [19:17:15] (don't restart yet!) [19:25:53] ok, dschoon, config deployed, ready for restart [19:26:35] back [19:26:46] fire the restart at 12:31 :) [19:26:50] i guess 3:31 for you? [19:27:50] hokay! [19:30:49] do you want to check the current files into /etc in kraken? [19:30:59] that way any conf changes are all deployed off the same set? 
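
A sketch of what a fair-scheduler.xml along the lines settled on above could look like. The queue names (standard, adhoc), the 50-app queue limit, the double weight on standard, and the per-user default of 2 come from the conversation; element names follow the CDH4 FairScheduler allocation-file format linked above, and the actual etherpad contents may differ:

    <?xml version="1.0"?>
    <allocations>
      <!-- regularly recurring jobs (e.g. Kafka.Consumer), weighted up so imports win -->
      <queue name="standard">
        <weight>2.0</weight>
        <maxRunningApps>50</maxRunningApps>
      </queue>
      <!-- one-off jobs, kept to one at a time so they can't crowd out imports -->
      <queue name="adhoc">
        <maxRunningApps>1</maxRunningApps>
      </queue>
      <!-- per-user cap for anything not overridden above; "default" stays implicit -->
      <userMaxAppsDefault>2</userMaxAppsDefault>
    </allocations>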
[19:31:09] not a bad idea [19:31:20] it's not puppet [19:31:20] but we've gotta do something [19:31:24] otherwise we'll override each other [19:31:25] !log restarting hadoop with fair scheduler [19:32:19] i guess it doesn't like us any more [19:33:59] i am watching http://localhost:8888/jobbrowser/jobs/?state=all&text=&user=hdfs [19:34:11] we will see the consumer run in 10m [19:34:17] that better succeed [19:34:26] once that happens, we should be able to see what queue it was it [19:34:39] if it fails, i expect we'll need to add a property for the default queue [19:34:52] first priority is getting import running again [19:34:54] yes? [19:35:00] ^^ ottomata drdee [19:36:00] 1 sec [19:36:15] fair scheduler card: https://mingle.corp.wikimedia.org/projects/analytics/cards/461 [19:36:57] i thought that was the point of the problem card? [19:37:02] to track progress? [19:37:59] dschoon, its possible that jobs without queuename specified submit to a queue named after the user running the job [19:38:04] http://localhost:8088/cluster/scheduler [19:38:19] looks good so far. [19:39:13] dschoon: but problem cards are not part of the sprint and so this way we can keep track of actually fixing the problem [19:39:29] hm. [19:39:34] seems like a duplication [19:39:44] shouldn't we just make it so that problem cards can be scheduled and assigned? [19:39:58] i'm going to run a grunt shell and see what happens [19:40:03] wait [19:40:09] ottomata: wait for the first kafka consumer to run [19:40:16] k [19:40:18] that's our highest priority, so let's not fuck with it [19:40:21] one note [19:40:37] each queue appears to have 16G associated [19:40:58] (provided that number is MB) [19:42:11] ottomata: what do you make of those resource numbers? [19:42:36] hm that's from yarn, but i thought it would be per node, hmmmm [19:42:51] 16G per node is what we set on friday [19:43:03] but it's saying 16G per queue [19:43:11] yeah [19:43:19] so maybe we need to set those explicitly [19:43:35] i'm not exactly sure what it means, though [19:43:59] because that's only a single node's worth [19:44:14] and i know that number is the max for the sum of all running apps in that queue [19:44:29] if an app's total is the sum of all usage on all nodes [19:44:41] it means an app can only use a single node's worth of memory [19:44:46] which is... bad [19:44:57] ^^ ottomata drdee [19:45:14] that is, unless [19:45:25] yeah [19:45:34] that number is yarn.scheduler.fair.maximum-allocation-mb [19:45:36] which says: [19:45:36] oop, stuff is happening [19:45:43] The largest container the scheduler can allocate, in MB of memory [19:46:03] welp [19:46:05] it succeeded [19:46:21] yup! [19:46:25] i also saw this kinda stuff in logs: [19:46:31] an10: 2013-03-25 19:45:36,157 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Node offered to app: application_1364239892421_0001 reserved: false [19:46:48] an10: 2013-03-25 19:45:37,677 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Released container container_1364239892421_0001_01_000001 of capacity memory: 2048 on host analytics1012:8041, which currently has 0 containers, memory: 0 used and memory: 16384 available, release resources=true [19:47:11] looks good I think ? [19:47:54] can we figure out what queue it ran in? 
[19:48:14] an10: 2013-03-25 19:46:43,910 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1364239892421_0001_000001 [19:48:21] hm [19:48:26] oh that might be from you clicking around [19:48:26] hm [19:48:28] not sure [19:48:28] hm [19:48:30] no [19:48:41] ran in default queue [19:48:41] an10: 2013-03-25 19:46:08,593 INFO org.apache.hadoop.mapreduce.jobhistory.JobSummary: jobId=job_1364239892421_0001,submitTime=1364240706479,launchTime=1364240712441,firstMapTaskLaunchTime=1364240716966,firstReduceTaskLaunchTime=0,finishTime=1364240731595,resourcesPerMap=2048,resourcesPerReduce=0,numMaps=2,numReduces=0,user=hdfs,queue=default,status=SUCCEEDED,mapSlotSeconds=56,reduceSlotSeconds=0,jobName=Kafka.Consumer [19:49:04] ah [19:49:05] great. [19:49:13] okay, now let's try to change that [19:49:26] btw, the queue is the third row here [19:49:26] http://analytics1010.eqiad.wmnet:19888/jobhistory/job/job_1364239892421_0001/jobhistory/job/job_1364239892421_0001 [19:49:54] oh ya nice [19:50:42] yeah, and mapreduce.job.queuename seems right [19:50:47] from http://analytics1010.eqiad.wmnet:19888/jobhistory/conf/job_1364239892421_0001 [19:50:49] ok cool, so ummmmm, hm [19:50:50] search for queue [19:51:03] yeah! [19:51:04] cool [19:51:11] so, how do we figure out the queue size mb thinh [19:51:12] hm [19:51:14] just up it? [19:51:23] well [19:51:23] first [19:51:31] let's modify whatever conf file there is for https://github.com/wikimedia-incubator/kafka-hadoop-consumer [19:51:40] to make it submit that job to the standard queue [19:52:30] we really need to be able to set job.xml properties [19:52:43] unless you're okay with it running in default for now... [19:52:49] that kinda worries me, though [19:53:09] its basically the same in default, no? [19:53:13] as it would be in standard? [19:53:30] oh, i' sure we can set that with a -D [19:53:38] ah, good idea. [19:53:40] let's try it. [19:53:56] and no, not the same. because every test job and whatever goes into default [19:54:07] we need to, at least, ensure all the critical jobs are not in that queue [19:54:13] i'll fix the ones in oozie [19:54:18] when i resubmit them [19:54:19] ok [19:54:25] ergh, i built a .deb for this hm [19:54:44] this is why i said this task would take 4 hours :) [20:00:02] i think the pom in this project is way wrong [20:01:00] ? [20:01:12] oh i'm sure o fit [20:01:59] dschoon, before I modify this, can we do a test to see if -Dmapreduce.job.queuename works? [20:02:06] i'll also point out that it depends on a bunch of cdh 4.1.2 stuff [20:02:26] i could fire off a pig job [20:02:28] sure [20:02:30] doing it [20:02:31] k [20:02:57] i'm also going to fix the consumer pom [20:03:29] oook, i actually was hoping not to rebuild this....>>...... [20:03:31] i guess we can [20:03:46] it is really just a wrapper script that needs modified, and we are only running this on one machine [20:03:49] i can edit the script there [20:03:57] i know, the cluster is hacky right now! [20:04:09] ah, excellent [20:04:10] but since this works as is, i'd touch as little as possible [20:04:17] i'd commit my change [20:04:25] SET mapreduce.job.queuename adhoc [20:04:25] should do what i want [20:04:27] oh in pig [20:04:27] ha [20:04:31] yes [20:04:31] but what about -D via cli? [20:04:36] should do the same [20:04:38] ok [20:04:40] will try it [20:06:18] about to launch... [20:06:20] ok? [20:07:22] here we go... 
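
The two ways of routing a Pig job to a queue that get tested here, as a sketch (queue names as chosen above):

    -- inside a pig script or grunt session:
    SET mapreduce.job.queuename 'adhoc';
    -- ... rest of the script (LOAD / GROUP / STORE) is unchanged ...

    -- or on the command line, with -D placed before other pig options:
    -- pig -Dmapreduce.job.queuename=adhoc myscript.pig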
[20:09:03] looks great [20:09:06] it says it ran in adhoc [20:09:54] consumer running again in default... [20:09:54] :( [20:15:58] hmmmm [20:15:59] http://localhost:19888/jobhistory/job/job_1364239892421_0007/jobhistory/job/job_1364239892421_0007 [20:16:04] this is with [20:16:10] -Dmapreduce.job.queuename=standard [20:16:23] java -Dmapreduce.job.queuename=standard -cp /usr/share/java/kafka-hadoop-consumer.jar:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//* kafka.consumer.HadoopConsumer -z analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqi [20:16:35] ... [20:16:37] truncated [20:16:38] oh [20:16:40] at the front [20:16:41] yeah [20:16:42] hmm [20:16:50] phh [20:16:52] ohh [20:16:58] ottomata, i know why it doesn't work [20:17:00] oh? [20:17:02] it's pretty obvious once you think abou it [20:17:06] the jar itself isn't the job [20:17:12] it preps and submits a job via the tracker [20:17:16] hm, [20:17:17] k [20:17:35] so setting that prop in the JVM doesn't end up getting captured by the job that's submitted [20:17:49] i think we're going to have to recompile [20:17:53] i'll commit my pom fixes [20:18:07] ooook.... [20:18:31] i'm going to make some tea [20:19:39] i've pushed [20:26:17] fascinating. [20:30:23] sup? [20:31:06] so, the conf is set in HadoopConsumer starting around line 63 [20:31:25] oo ja [20:34:10] so i'm thinking i'll add a CLI param to point it to a config file [20:34:29] and it'll load it if true, overriding the defaults [20:34:52] config file? [20:35:31] really? there's no way to set java parameters except by coding? [20:35:39] properties* [20:36:04] no, because the conf file that's passed to the job is created in code [20:36:10] i should say: no way i know of [20:39:01] hmph. [20:39:14] ottomata: the conf should be overridden by CLI params, right? [20:40:27] let's just keep it simple [20:40:31] add a cli param [20:40:38] if not set, then don't specify [20:40:53] don't worry about a config file [20:42:58] ottomata: what version of pig are we currently using? I'm trying to match locally [20:43:22] currently / planned I guess, I could of course get the actual version myself [20:43:26] Apache Pig version 0.10.0-cdh4.2.0 (rexported) [20:43:32] pig -version [20:43:33] :) [20:43:49] right, i just heard you guys talking about upgrading but i guess that was just CDH4 [20:43:56] yeah [20:44:01] cool, thank you [20:44:01] it actually was upgraded [20:44:04] when we changed to 4.2.0 [20:44:08] (i think) [20:44:13] dschoon, I will change the wrapper script to pass the queue name as a cli param [20:44:17] instead of -D [20:44:26] k [20:44:48] --queue -q [20:45:46] yeah [20:48:18] also adding --job-conf -C [20:51:19] oh will this work for any mapreduce job param? [20:54:19] that's the idea. [20:54:23] i don't want to have to do this again. [20:54:25] it'd be stupid. [20:54:59] ahhhhh, ok i thought you were creating a conf option for just the few cli options already there [20:54:59] that's cool [20:55:08] tsk [20:55:19] give me a LITTLE credit [21:11:15] oh [21:11:25] didn't we want to be able to set the job name as well, ottomata? [21:11:27] i remember that from ages ago [21:21:50] yeah [21:23:30] welp [21:23:34] the project has no tests [21:23:40] but uh [21:23:42] i think i'm done? [21:24:13] not really sure how to check if it works... 
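
The gist of why the -D flag on the wrapper JVM doesn't help, as a hedged sketch rather than the actual HadoopConsumer code: a plain JVM system property isn't copied into the job conf automatically, so the queue name has to be set on the Configuration the client builds before submitting.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class QueueNameSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // e.g. taken from a --queue CLI flag or a local job-conf XML file
            conf.set("mapreduce.job.queuename", "standard");

            Job job = Job.getInstance(conf, "Kafka.Consumer");
            // ... input/output formats, mapper class, etc. would be set here ...
            job.submit();  // the queue name now travels with the submitted job
        }
    }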
[21:24:18] so i think i shall just push :) [21:24:25] sounds good, i gotta rebuild, eh? [21:24:29] did you change the cdh4 deps? [21:25:23] i did [21:26:00] mvn install should Just Work [21:26:26] no fiddling with kafka versions, as I changed the dep to be the org.apache.kafka artifact [21:27:56] ok [21:28:09] everything's pushed, otto [21:28:11] ottomata: [21:28:28] install? [21:28:34] how does it know where to install? [21:28:44] `mvn install` installs the artifact to your maven repo [21:28:56] which is ~/.m2/repository by default [21:29:06] but i need to rebuild the .deb, right? [21:29:19] i'm saying that will build the jar with the shaded crap [21:29:24] it'll all be in target as usual [21:29:28] you can do whatever you want after that [21:29:33] (such as run your shell scripts) [21:29:37] hmm, ok [21:29:43] my shell script does this right now [21:29:43] mvn install:install-file -DgroupId=kafka -DartifactId=kafka -Dversion=0.7.2 -Dpackaging=jar -Dfile=/usr/lib/kafka/core/target/scala_2.8.0/kafka-0.7.2.jar -DgeneratePom=true [21:29:47] i shoudl remove all the crap? [21:29:51] yes [21:29:53] or at least [21:29:56] comment it out [21:29:56] just mvn install [21:29:58] k [21:30:02] will try [21:30:11] i didn't realize that was still there [21:30:43] brb a moment [21:45:02] back [21:45:04] how'd it go, ottomata? [21:45:42] got it, had to tweak some things [21:45:47] bout to install the .deb manually on an02 [21:46:06] ok [21:49:10] yay! [21:49:10] http://localhost:19888/jobhistory/job/job_1364239892421_0022/jobhistory/job/job_1364239892421_0022 [21:49:19] woo [21:49:27] notice i changed the default name :) [21:49:32] nice! [21:49:43] did you specify --queue? [21:49:47] or use the job properties? [21:49:51] specified [21:50:02] there are actually two wrapper scripts, one is kraken specific [21:50:04] can you try another using a conf file? [21:50:06] in kraken bin/... [21:50:13] i just made it use standard as default [21:50:19] uhh, ok [21:50:29] what should the config file look like? is a properties file ok? [21:50:35] it uses hadoop's configuration xml format -- try setting mapreduce.job.queuename in there [21:50:39] ok [21:51:01] also try running it with --name [21:51:05] and setting the jobname [21:51:08] just to test :) [21:52:12] HMmmm [21:52:17] does the conf fiel have to be in hdfs? [21:52:24] no [21:52:28] in fact [21:52:31] it needs to be local [21:52:37] it's loaded at the time the executable is run [21:52:42] (that's why the -D props didn't work) [21:53:45] yeah! [21:53:46] it works [21:53:46] http://localhost:19888/jobhistory/job/job_1364239892421_0023/mapreduce/job/job_1364239892421_0023 [21:53:56] hot! [21:54:05] okay, great to know. [21:54:16] hacking on this is pretty easy, btw [21:54:44] so if we have a list of crap we'd like to change, i can give a meaningful LoE [21:55:09] LoE? [21:55:14] level-of-effort [21:55:16] aye [21:55:25] for kafka hadoop consuemr you mean? [21:56:17] yes [21:56:33] just saying it's not a black box any more in terms of source operation [21:56:45] aye [22:00:28] so um, yayyyy we did it dschoon! [22:00:38] now maybe you can run your job in adhoc without breaking things? [22:00:48] yep, will check that next/ [22:02:52] ottomata: you wanna update the readme? [22:03:01] i just noticed it's wrong, and you might want to update the .deb stuff [22:04:49] done [22:04:55] sweet [22:05:04] did you commit our conf files to kraken/etc? 
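
A sketch of what such a local job-conf override file might contain: plain Hadoop configuration XML, read on the submitting host at launch time. The values here are illustrative; --queue, --job-conf and --name are the wrapper flags mentioned above.

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>mapreduce.job.queuename</name>
        <value>standard</value>
      </property>
      <property>
        <name>mapreduce.job.name</name>
        <value>Kafka.Consumer</value>
      </property>
    </configuration>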
[22:07:36] no [22:07:57] can do, i think maybe on an10 we can symlink /etc/hadoop/conf to /opt/kraken/hadoop/conf or something [22:08:06] and use that as the master [22:08:09] and then rsync out from there [22:08:11] okay [22:08:15] that seems reasonable [22:08:27] maybe write a script? [22:08:33] yeah i wrote one, but [22:08:39] you have to have root to be able to use it :/ [22:08:40] to ensure we have a canonical way to pull as well as push? [22:08:43] unless we change perms of files [22:08:52] because i want to make sure i can update my local copy [22:08:58] (hence thinking it should be in the repo) [22:09:02] aye [22:09:09] /opt/kraken [22:09:11] is the kraken repo [22:09:16] on the nodes [22:09:29] hm [22:09:44] i think it should live in kraken/etc [22:09:49] because there are really four or five dirs [22:09:52] yeah [22:09:55] that's fine [22:10:01] kraken/etc/hadoop/conf [22:10:01] hadoop hbase hive oozie pig sqoop zookeeper [22:10:04] ja [22:10:11] but on the machines [22:10:21] /etc/hadoop/conf is a symlink [22:10:26] right [22:10:28] to /etc/hadoop/conf.empty [22:10:31] or conf.dist [22:10:36] i will make it point at to /opt/kraken/etc/hadoop/conf [22:10:43] on an10 [22:11:34] i'm going to remove the other crap in /etc [22:11:35] ok? [22:11:37] kraken/etc [22:11:39] sounds good [22:11:41] heh [22:13:52] ottomata: what's in your dsh group "k"? [22:16:53] uhhh, i think [22:16:53] an10 [22:16:54] an11 [22:16:54] an12 [22:16:54] an13 [22:16:54] an14 [22:16:54] an15 [22:16:55] an16 [22:16:55] an17 [22:16:56] an18 [22:16:56] an19 [22:16:57] an20 [22:19:09] 21 and 22 aren't datanodes? [22:21:32] no [22:21:42] kafka brokers [22:21:42] ah [22:26:23] ottomata: do you get an error if you try to kill the coordinator job 0002299-130220153347486-oozie-oozi-C in the gui? [22:26:26] http://localhost:8888/oozie/list_oozie_coordinators/ [22:27:19] yes [22:27:22] bleh. [22:27:22] i'd just use the cli [22:27:25] i will. [22:27:29] just annoyed. [22:35:55] ottomata: so i see jobs for all.100 and event running again? [22:36:14] ah we can remove those [22:36:15] one sec [22:36:20] well [22:36:22] hold on [22:36:25] are they *actually* running? [22:36:26] oh? [22:36:32] they are launched by crons [22:36:34] but there's no data to import [22:36:41] ah. [22:36:51] what's wrong with the event import? [22:37:26] no event stream in kafka [22:37:28] it was running on an01 [22:37:35] hasn't been re set up since we reinstalled [22:37:37] what stops us from fixing that? [22:37:45] not touching an01 unless we do it in puppet [22:37:51] since it has been reinstalled [22:38:12] couldn't it use the same stream from oxygen? [22:38:31] no its not on oxygen [22:38:34] its a separate stream [22:38:37] its sent to vandadium [22:38:39] and to analytics1001 [22:38:41] oh. [22:38:53] by the bits varnish boxes? [22:38:55] yup [22:39:00] and they're all in eqiad? [22:39:18] i think so [22:39:39] so we could theoretically fix this by telling it to send the stream to a different box? [22:44:32] yes [23:09:20] anyone got time for a nooby pig question? [23:09:25] drdee, dschoon, ottomata? 
[23:09:55] i'm kinda here [23:09:56] whaaasssup [23:10:07] well if I do this: [23:10:10] log_fields = LOAD_WEBREQUEST('hdfs:///wmf/raw/webrequest/webrequest-wikipedia-mobile/2013-03-25_22.30*'); [23:10:14] and then go: [23:10:44] LOGS_GROUP= GROUP log_fields ALL; [23:10:44] LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(log_fields); [23:10:52] then illustrate LOG_COUNT tells me 2 [23:12:11] which might make sense 'cause there's two files [23:12:29] oh so I need FLATTEN in here don't I? [23:13:16] hmm, i like to actually dump various values to see where I go wrong [23:13:20] DUMP LOGS_GROUP; [23:13:23] does that look ok? [23:16:12] job failed... [23:16:13] hm [23:16:14] Failed to read data from "hdfs:///wmf/raw/webrequest/webrequest-wikipedia-mobile/2013-03-25_22.30*" [23:16:15] yeah [23:16:22] it didn't before [23:16:29] i was dumping and stuff too [23:16:43] trying again :) [23:17:03] it was reading this set of input files a while ago [23:17:36] hm, it gets up to 49% and fails [23:18:38] grunt is a very good name... [23:20:00] haha [23:20:15] reduces failed! [23:20:25] mk... dump log_files works, though I had to kill my terminal [23:21:06] ok, perhaps this is best picked up in the morning. But this seems like an unscalable way to develop. My local pig / hadooop installations refuse to work [23:21:07] hm [23:21:13] oh that is [23:21:14] that should work [23:21:15] there's some bug about how they read $JAVA_HOME (which is set right) [23:21:28] lets' figure out your hadoop local [23:21:30] tomorrow or something [23:21:32] yea [23:21:37] oh you can run hadoop on the cluster in local mode too [23:21:44] ok, that's better [23:21:50] i'll do that [23:21:56] thanks otto, have a good night [23:21:56] i debug doing that [23:22:03] just put a sample file somewhere locally [23:22:12] pig -x local [23:22:12] etc. [23:22:12] yup you too! [23:29:43] ahh dschoon [23:29:51] we need to fix the zero_carrier_country hierarchy [23:29:59] the limnify stuff I had going is broken cause the stuff changed [23:36:22] milimetric:i don't think we should invest time to get your local install working; we really get you on kraken so you can develop there [23:36:54] naw, local install is good for testing [23:37:00] to just run pig scripts? [23:37:10] if he's runing udfs, i dunno, then kraken fo sho [23:37:14] i never got udfs to work locally [23:38:52] well it also means you need private data at your local box and that's against policy [23:43:29] that's my fault [23:43:49] i started that two days ago, and packetloss + fair scheduler intervened [23:43:57] i'll fix that now, milimetric and ping you [23:52:08] thanks dschoon, I'm running in local mode now on some sample data [23:52:19] awesome [23:53:19] drdee, I wouldn't use any private data if I did it locally. I would create my own tab-delimited mock sample file [23:53:46] I'm on kraken, in local mode because running on the full datasets doesn't let me iterate nearly fast enough to figure out what in the world I'm doing [23:53:57] It takes FOREVER to run on even 5 minutes of data [23:56:05] i'll brb in 20. gonna head home before finishing up the last hour or two of the day.
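
For reference, a minimal local-mode version of the count pattern above (file name and sample data invented for illustration). As a general Pig note rather than a diagnosis of the run above: ILLUSTRATE works on a tiny generated sample, so the number it shows for COUNT is not the real total — DUMP or STORE the relation to get the actual count, and no FLATTEN is needed for a simple record count.

    -- run with: pig -x local count_sketch.pig
    log_fields = LOAD 'sample_requests.tsv' USING PigStorage('\t');
    logs_group = GROUP log_fields ALL;            -- a single group holding every record
    log_count  = FOREACH logs_group GENERATE COUNT(log_fields);
    DUMP log_count;                               -- total number of input records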