[00:02:19] ya
[00:02:19] hey
[00:02:40] want to do the pig thing now?
[00:02:43] i can come down in 5
[00:04:01] sounds good
[00:04:22] cool
[00:19:05] night guys
[14:36:41] morning peoples
[14:41:44] morning!
[14:42:13] moorning
[14:42:20] ottomata, quick quick question
[14:42:43] yessuh
[14:42:51] i need to supply the namenode and jobtracker addresses for oozie but i am not sure what they are after the redesign....
[14:43:25] hm, yeah i need to make a big ol' doc about that, eh? now that many of the nodes are permanent
[14:43:30] namenode is analytics1010
[14:43:39] there is no jobtracker though,
[14:43:49] in YARN, no jobtracker
[14:43:55] there is ResourceManager on analytics1010
[14:43:56] right i know but.....
[14:43:59] which is kind of its counterpart
[14:44:09] i'll try that
[14:44:11] k
[14:44:28] ping average_drifter
[14:44:32] thx ottomata
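
(A minimal sketch of what that answer implies for an Oozie job.properties, assuming stock Hadoop 2 ports; the analytics1010.eqiad.wmnet hostname is from this log, but the port numbers and property names are assumptions to check against the cluster config.)

# hypothetical job.properties after the YARN redesign
cat > job.properties <<'EOF'
# HDFS namenode from the chat; port 8020 is an assumed Hadoop 2 default
nameNode=hdfs://analytics1010.eqiad.wmnet:8020
# no JobTracker under YARN -- point jobTracker at the ResourceManager,
# its rough counterpart (port 8032 is an assumed default)
jobTracker=analytics1010.eqiad.wmnet:8032
EOF
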
[14:46:48] hello drdee !
[14:46:57] just woke up
[14:47:14] :D
[14:47:16] me too
[14:47:23] udp-filters didn't crash on editors squid logs, still running
[14:47:33] with the webstatscollector, I set up a run
[14:48:09] excellent!
[14:48:23] I noticed something, it's running ok and receiving data for ~2m, then it stops, in the sense that udp-filter is not saying anything and collector is not saying anything
[14:48:48] I was thinkin of doing an ngrep on the udp port to catch the traffic and see what the network activity is
[14:51:16] brb 5m
[16:25:53] average_drifter....
[16:36:20] I'm here
[16:36:33] trying to figure out a way to check why data's not passing through
[16:41:24] ottomata, if you have some spare cycles, could you help average_drifter?
[16:41:52] alright, running ngrep on the loopback interface locally
[16:42:17] messages passing through. using a 504MB file.. but at some point they stop (at least that's what was happening last time I tried)
[16:42:36] hmm looks like locally everything went through ok
[16:42:48] ottomata: can we make a test run together on stat1 ?
[16:43:10] ottomata: I got everything set up over there but I'd like your opinion on what's happening there because locally the results are ok
[16:43:54] ottomata: but on stat1, at some point during data transmission, udp-filter stops giving any feedback and so does collector, and udp-filter doesn't return, which would mean it's still sending/processing data
[16:45:05] yeah sure, i can help
[16:45:05] i'm working on getting event stream into kafka atm, but can multitask
[16:45:06] average_drifter, how can I help? where can I see what's happening?
[16:46:01] ottomata: let's go on stat1 please
[16:46:32] ok
[16:46:47] now in /home/spetrea/webstats_debugging
[16:46:48] if we don't pipe udp-filter into log2udp/collector
[16:46:54] does udp-filter process the whole file?
[16:46:55] all the binaries needed are there
[16:47:20] where is your source file?
[16:47:26] ottomata: well yes, I think it shouldn't have any reason to not process the whole file
[16:47:54] well, we should make sure it does that first, we should figure out if the problem is udp-filter or collector
[16:48:07] ottomata: let's take one of the .gz /a/squid/archive/sampled/sampled-1000.log-20120320.gz
[16:48:21] ottomata: the problem is collector 99%
[16:48:34] so if you pipe that through udp-filter -o
[16:48:37] it will finish 100%?
[16:48:46] ottomata: collector blocks along the way
[16:48:49] ottomata: it stalls
[16:49:09] yes
[16:49:22] if I pipe through udp-filter -o it goes through all the data fine
[16:50:26] so for a test run
[16:50:34] ottomata: ./collector-static -d -p 5401 -t 30
[16:50:37] in /home/spetrea/webstats_debugging
[16:50:57] and separately
[16:50:58] zcat /a/squid/archive/edits/edits.log-20120407.gz | ./udp-filter-static -o | nc -u 0.0.0.0 5401
[16:51:23] locally I put ngrep on the loopback interface, I see when collector stalls, udp-filter still keeps sending data
[16:51:31] so there's still data moving on UDP 5401
[16:51:36] but .. collector stops
[16:51:46] I mean it doesn't crash, it just stalls
[16:53:51] ok
[16:53:51] i don't really know much about collector, can you put a bunch of debugging statements in collector.c and run it again
[16:53:51] oh, collector only prints out once per hour, right?
[16:53:57] I'm gonna run it in the meantime with GDB locally
[16:54:17] ottomata: yes but with the -t 30 switch, it prints every 30 seconds
[16:54:23] ok
[16:55:08] ok, can I run that ? how long does it take before it stalls?
[16:55:39] ottomata: around 2m
[16:55:57] ok
[16:59:00] ok, I'm putting in some debug statements
[16:59:02] locally
[17:01:07] ok, i'm running it right now
[17:01:08] hmm, i used log2udp
[17:01:08] and i think it worked
[17:01:08] didn't use netcat
[17:01:08] lemme try netcat
[17:03:38] ok
[17:04:00] ottomata: if you see dumps/ with empty files it means it hasn't printed anything to disk
[17:04:08] ottomata: did log2udp prepend that new number ?
[17:04:28] ottomata: you remember last time we had an additional number which was generated by log2udp I think ?
[17:04:49] ottomata: that's why the -d switch now allows it to skip that additional number, so we can run it without log2udp
[17:05:12] but you're trying with netcat, that's cool, that's how I tested it also
[17:05:30] ottomata: please tell me how I can use log2udp to test with that also
[17:09:27] log2udp -h 127.0.0.1 -p 5401
[17:09:47] ottomata: ok, thanks
[17:09:55] I attached to the collector with GDB
[17:12:03] ok this seems to be the message it stopped on
[17:12:04] https://raw.github.com/gist/3279f8aa43e75ba437e0/3cd985551bb18c4c1f27144a810b38248d3acb45/gistfile1.txt
[17:13:11] what's the log line?
[17:13:22] data=0xbfa75120 "Received message: [1 en 1 20 Dark%20Side%20Of%20The%20Moon]\n\n5%BD%A2%E9%9B%BB%E8%BB%8A_(%E5%88%9D%E4%BB%A3)]\nE5%87%BA%E9%87%8F%E3%81%AE%E6%8A%8A%E6%8F%A1%E7%AD%89%E5%8F%8A%E3%81%B3%E7%AE%A1%E7%90%86%E"...,
[17:13:25] ohhh wait
[17:13:27] to_do=60) at fileops.c:530
[17:13:31] I think this means it's another buffer overflow
[17:13:36] i think collector has a max length of the url as well
[17:13:39] yea
[17:13:51] we need to make them uniform across the different source files
[17:14:07] drdee: so a common header included in both ?
[17:14:17] sounds good to me
[17:14:18] but these are different projects with different git repositories
[17:14:26] so they would have to be in the same repo if we want to use a common header
[17:14:26] ohh shoot
[17:14:39] that's too much work for now
[17:14:46] just standardize it
[17:14:48] ok I'll just extend the buffers in collector
[17:14:50] and make a note in both source files
[17:14:53] yes
[17:14:59] that if you change one you also need to change the other one
[17:15:05] yes
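
(The reproduction recipe from this session, collected into one sketch. The collector and zcat/udp-filter/nc commands are verbatim from the chat and assume /home/spetrea/webstats_debugging as the working directory; the exact ngrep and gdb invocations are assumed equivalents of the "ngrep on the loopback" and "attached with GDB" steps.)

# terminal 1: collector printing every 30s; -d skips log2udp's leading number
./collector-static -d -p 5401 -t 30
# terminal 2: replay a day of edit logs through udp-filter into the collector
zcat /a/squid/archive/edits/edits.log-20120407.gz | ./udp-filter-static -o | nc -u 0.0.0.0 5401
# terminal 3 (assumed invocation): watch the UDP traffic on the loopback
sudo ngrep -d lo '' 'udp and port 5401'
# if the collector stalls (~2 minutes in), attach and grab a backtrace
gdb -p "$(pgrep -f collector-static)" -ex bt
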
[17:51:13] will miss scrum today again (difficult to schedule meeting with Frank Schulenberg about Education Program analytics)
[17:55:28] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[18:02:32] scrum
[18:02:41] ottomata, milimetric, dschoon ^^
[18:04:46] oh standup already!
[18:15:43] i wonder if maybe my router is just dropping packets like whoa.
[18:24:24] 5th local run, collector seems to be doing well
[18:24:34] my gnome-terminal sucks, I really need to use something else
[18:24:42] rxvt, xterm, something
[18:28:29] great!
[18:42:18] so I think that I thought that collector was blocked because gnome-terminal is silly
[18:42:24] ok it works fine
[18:42:40] I'm building statics, making the comments in code, and switching to stat1 for a final test
[18:47:38] nice job!
[19:08:42] ok, relocating brb
[20:56:04] ottomata?
[21:33:11] how do I pull tags ?
[21:33:20] weirdly enough I never did that
[21:33:33] because if I push them, there surely must be a way to pull them..
[21:33:38] and I need them to build packages
[21:34:30] drdee: ^^ is it possible to do that ?
[21:35:00] yes
[21:35:01] I tried git fetch --tags
[21:35:04] but to no avail
[21:35:10] google?
[21:35:17] I'll google some more
[21:48:11] aiight. i need lunch.
[21:48:52] ok I didn't push them to build1
[21:48:56] that was the problem
[21:49:03] I need to explicitly push tags to a remote
[21:49:29] i have some errands to run that are toward the office, so i'll head that way and do the interview from there
[21:51:47] brb soon
[22:00:39] building packages
[22:00:43] ottomata: hey :)
[22:00:47] hiii
[22:00:52] ottomata: can you deploy some packages please ?
[22:01:05] ottomata: can we make a test run please for udp-filter and webstatscollector ? :)
[22:01:08] on locke
[22:01:30] no testing on locke :)
[22:01:48] either stat1 or somewhere else
[22:02:20] oh alright
[22:02:21] an01 ?
[22:02:23] https://raw.github.com/gist/10ebf691e412036309cf/6aef16ea8bf4f0890886e6e764d787205cc00879/gistfile1.txt
[22:03:07] that's what I thought, it was locke
[22:03:50] well, what we did before was
[22:04:00] i ran udp2log | log2udp on an26
[22:04:03] and then on stat1
[22:04:04] ran collector
[22:04:16] so ok
[22:04:21] gimme packages!
[22:08:41] finishing them up now
[22:19:14] ottomata: debs ready for precise in /home/spetrea/precise/
[22:19:16] on build1
[22:19:47] ok
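
(The tag trouble above boils down to git not transferring tags on a plain push; a quick sketch, where build1 as a git remote name and the v0.1.0 tag are assumptions for illustration.)

# tags are not sent by a plain 'git push'; push them explicitly
git push build1 --tags
# or push a single tag by name (hypothetical tag)
git push build1 v0.1.0
# on the build host, fetch all tags from the remote
git fetch --tags
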
[22:26:29] average_drifter
[22:26:33] does -d mean I can't use log2udp?
[22:27:26] average_drifter, can you run the collector test on stat1?
[22:27:34] i've got the stream beaming over there right now
[22:27:39] you should be able to listen on port 3815
[22:27:42] ottomata: yes
[22:27:53] ottomata: -d means it will not read the first number that log2udp uses
[22:28:04] ok
[22:28:18] ottomata: is the data flowing on the loopback ?
[22:28:22] no
[22:28:32] then I can't run the collector
[22:28:35] oh that's right, i remember this
[22:28:36] gah
[22:28:37] so dumb
[22:28:40] yes.. :(
[22:29:08] I remember trying to re-route it to the loopback but that didn't work
[22:29:32] ok here
[22:29:33] haha
[22:29:35] i just rerouted it
[22:29:37] with log2udp
[22:29:39] oh poop
[22:29:42] :D
[22:29:42] but that adds another number
[22:29:42] doh
[22:31:15] ok, average_drifter
[22:31:25] netcat works, but you have to be running collector while you run the netcat forwarder
[22:31:28] so
[22:31:30] start collector first
[22:31:31] then run
[22:31:39] start collector on 3816
[22:31:39] then
[22:31:44] netcat -lu stat1.wikimedia.org 3815 | netcat -u 127.0.0.1 3816
[22:32:06] I'm starting the collector now
[22:32:10] ok
[22:34:41] ok I'm running as you wrote above
[22:34:53] so I was missing the -l when I tried to re-route last time
[22:35:07] that's why it wasn't working then
[22:35:30] so "listen on 3815 and pipe data to a new netcat which sends it to 3816"
[22:35:45] on the loopback
[22:36:06] waiting for it to run and produce some data on disk now
[22:36:53] ok data is being written on stat1 to /home/spetrea/webstats_debugging/dumps/
[22:37:03] cool
[22:38:10] ottomata: is this live data ?
[22:40:40] woo, nice
[22:40:45] yup
[22:40:57] very convoluted live data :)
[22:41:43] sources -> oxygen -> socat to multicast -> udp2log listener on an26 -> log2udp to stat1 -> netcat -lu | netcat -u 127.0.0.1 | collector
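
(The working relay from this exchange, in one place and in order; the chat's rule is to start the collector before the netcat forwarder. Hosts and ports are from the log; reusing the -d and -t 30 flags from the earlier local run is an assumption.)

# step 1: start the collector first, listening on the loopback port
./collector-static -d -p 3816 -t 30
# step 2: then bridge the stream: listen on UDP 3815, forward to 127.0.0.1:3816
netcat -lu stat1.wikimedia.org 3815 | netcat -u 127.0.0.1 3816
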
[22:59:58] erosen, i just updated the TIM Brasil zero filters
[23:00:04] and also added it to the kafka/hadoop thing
[23:00:09] nice
[23:00:16] what is the kafka hadoop thing?
[23:00:43] i had this idea earlier for keeping a single canonical set of partner ips
[23:00:51] http://hue.analytics.wikimedia.org/filebrowser/view/wmf/raw/wikipedia-zero?file_filter=any
[23:00:54] not sure if it is useful with the X-carrier change
[23:01:04] hopefully we won't have to deal with IPs
[23:01:09] once that is in place
[23:01:12] yeah
[23:01:13] k
[23:01:21] i see
[23:01:29] you added it to the kafka hadoop thing
[23:01:33] I misread
[23:01:39] well cool
[23:01:41] thanks for the heads up
[23:01:51] yup
[23:01:53] while I have your attention
[23:02:02] any ideas on how to get to the job management page?
[23:02:18] i'm here
[23:02:33] and i need to get to the page on the other end of the Application Master link
[23:02:43] i discovered that lynx on an01 works pretty well
[23:03:31] ha, yeah
[23:03:38] jobs.analytics.wikimedia.org
[23:03:39] right?
[23:04:00] meant to paste this: http://jobs.analytics.wikimedia.org/cluster/apps/RUNNING
[23:04:10] so sort of
[23:04:21] ah that's good, right?
[23:04:24] what are you looking for?
[23:04:30] ah this?
[23:04:31] http://analytics1010.eqiad.wmnet:8088/proxy/application_1353342609923_1164/
[23:04:37] this is what I meant: http://jobs.analytics.wikimedia.org/cluster/app/application_1353342609923_1164/
[23:04:52] http://jobs.analytics.wikimedia.org/proxy/application_1353342609923_1164/mapreduce/job/job_1353342609923_1164
[23:04:53] so the page i just sent you is sort of only a summary of the job
[23:05:16] but if you actually want to see how many mappers and reducers have finished you need to click the link for the ApplicationMaster again
[23:05:22] on the page I just sent
[23:05:49] which just takes you to the same page
[23:06:19] anytime you see a url with http://analytics1010.eqiad.wmnet:8088/
[23:06:21] you can change it to
[23:06:25] jobs.analytics.wikimedia.org
[23:06:28] yeah
[23:06:33] so i've been doing that
[23:06:38] and it takes me to the same page
[23:06:39] ok, so i'm confused then
[23:06:39] ha
[23:06:41] which is sort of confusing
[23:06:56] but if you type this at a terminal on an01
[23:06:59] lynx http://analytics1010.eqiad.wmnet:8088/proxy/application_1353342609923_1164/
[23:07:03] there's another link though, right?
[23:07:10] this is what you are looking for?
[23:07:10] http://jobs.analytics.wikimedia.org/proxy/application_1353342609923_1164/mapreduce/job/job_1353342609923_1164
[23:07:48] ya
[23:07:52] where do you find this?
[23:08:06] the link on this page
[23:08:06] http://jobs.analytics.wikimedia.org/proxy/application_1353342609923_1164/
[23:08:14] which is the one you sent me, but with jobs….
[23:08:38] hmm
[23:08:48] i'm still confused
[23:08:52] let me look at things a sec
[23:08:55] ok
[23:09:24] aah
[23:09:32] i was adding cluster/app/
[23:09:40] to the replacement url
[23:09:55] so I was typing jobs.analytics.wikimedia.org/cluster/app/....
[23:10:07] instead of jobs.analytics.wikimedia.org/proxy/.…
[23:10:23] well thanks for helping figure this out
[23:10:23] ah, ja just replace the domain
[23:10:37] yeah that makes the most sense
[23:12:18] yup
[23:13:10] erosen: check http://pig.apache.org/docs/r0.10.0/func.html#pigstorage you can use the 'tag source' parameter to expose the filename of the raw data in your script, probably you already knew this but anyways....
[23:13:32] niiiiice
[23:13:43] wish this would have turned up on some of my google searches a week ago
[23:13:59] yeah :)
[23:14:29] thanks for the tip
[23:14:31] oh that is nice,
[23:14:41] wouldn't have to prepend the timestamp for dario on those project count files then
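
(A sketch of that tip, wrapped in a shell heredoc so it could be run from a terminal on an01. The path is the wikipedia-zero directory linked above; the tab delimiter and the '-tagsource' option spelling, per the Pig 0.10 docs linked in the chat, are assumptions to verify.)

# write and run a tiny Pig script demonstrating -tagsource
cat > tagsource_demo.pig <<'EOF'
-- with '-tagsource', the source filename is prepended as the first field
raw = LOAD '/wmf/raw/wikipedia-zero' USING PigStorage('\t', '-tagsource');
sample = LIMIT raw 10;
DUMP sample;
EOF
pig tagsource_demo.pig
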
[23:16:21] ottomata, how can i find the sqoop / oozie / hadoop log file that contains "Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]"
[23:17:01] bwerrr
[23:17:32] because the odd thing is this: my job gets killed in oozie
[23:17:47] but the jobs.analytics.wikimedia.org page says the job ran successfully
[23:18:18] run it again?
[23:18:29] otto@analytics1027:~$ tail -f /var/log/{hue,oozie}/*
[23:18:29] ?
[23:18:35] or is that a log file inside of hadoop?
[23:18:39] see http://jobs.analytics.wikimedia.org/cluster/app/application_1353342609923_1150
[23:19:04] i think sqoop has the most useful information but i cannot find the sqoop log files at all
[23:20:23] sqoop is just a cli thing, right? it's not a daemon
[23:21:25] no it's not a daemon
[23:22:15] i did tail when running the job
[23:22:22] http://history.analytics.wikimedia.org/jobhistory/logs/analytics1014:8041/container_1353342609923_1150_01_000001/job_1353342609923_1150/diederik
[23:22:34] it just doesn't give me anything useful except https://gist.github.com/c1895fe4e262efdaf3b5
[23:23:54] yeah hm
[23:23:55] i dunno
[23:24:15] aaaaand unfortunately i have to ruuun
[23:24:27] but I can help fo sho monday morning, ja?
[23:25:31] this doesn't sound good: Job jar is not present. Not adding any jar to the list of resources.
[23:25:39] yeah totally have an awesome weekend!
[23:25:42] cu monday
[23:25:53] ok laataaaas!
[23:38:21] later guys
[23:39:24] adieu
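
(A postscript sketch for the unresolved SqoopMain question: two standard places to look. The application id is the one from the chat; the Oozie server URL and the workflow job id are hypothetical, and 'yarn logs' only returns output if log aggregation is enabled on the cluster.)

# aggregated container logs for the YARN application that wrapped the sqoop action
yarn logs -applicationId application_1353342609923_1150
# ask Oozie itself for the workflow's log (server URL and job id are hypothetical)
oozie job -oozie http://analytics1027.eqiad.wmnet:11000/oozie -log 0000001-121123000000000-oozie-oozi-W
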