[01:50:36] I'm out to get basics (in spite of the weird hour). I'm half-way through SquidReportArchive.sh (adding parameters and command-line switches, preparing to use it on stat1). As a status for wikistats, we have commented out the code which uses /usr/local/bin/geoiplogtag and we've made a couple of runs (Erik and me, while screen-sharing today) and it seemed to run fine. So when I get back I'll finish SquidReportArchive.sh and make a run for a mo
[01:51:41] awesome!
[14:00:55] morning milimetric, ottomata, average_drifter
[14:28:01] ping average_drifter
[15:17:44] morning ottomata
[15:19:03] morning!
[15:21:06] toothpicks :D
[15:24:28] what's the status with the kafka consumers?
[15:26:31] good! I got the kafka hadoop consumer .deb built yesterday
[15:26:33] today: puppetization
[15:26:42] sweeeeeeeeeeeeeeet
[15:26:50] so we might be on track for your Wed deadline
[15:26:57] fpm makes things so easy!
[15:27:12] very cool
[15:27:31] besides tooth picking, did you actually like my proposal?
[15:27:50] yes, i did
[15:27:53] I'm FOR 'em!
[15:28:15] (you are supposed to google quotes and then know what they are if you don't already :p )
[15:28:16] heheh
[15:28:53] i am totally sorry
[15:28:58] * drdee is googling
[15:29:09] * drdee is feeling very old
[15:29:20] * drdee is feeling ashamed
[15:29:34] hahahaha
[15:29:46] its ok, its not a well known quote
[15:29:54] just a funny one
[15:30:03] but humor is subjective!
[15:31:57] * drdee is feeling excited for being taught the ottomata way
[15:32:30] * drdee is a humble apprentice of the ottomata way
[15:33:13] hahah get outta heerrreeeee
[17:48:33] hey dschoon or erosen, python q for you
[17:48:40] aye
[17:48:41] i have a list of strings
[17:48:47] I want to extract the elements that match a regex
[17:48:51] i could iterate over the list
[17:48:56] but there seems to be a cooler way
[17:48:59] hehe
[17:49:09] is there? should I bother looking?
[17:49:14] yeah, I've been dissatisfied with the conciseness of simple string regex matching
[17:49:15] i'd know how to do it in ruby
[17:49:31] i mean a filter with a lambda might be nice
[17:49:41] new_list = list.collect { |element| element.match('/regex/') }
[17:49:47] filter(lambda s : re.match(s, pat), list_of_strs)
[17:50:00] oooo, ok, was looking at that, haven't used filter/lambda
[17:50:03] ok will try
[17:50:05] hmm
[17:50:16] cool
[17:50:46] does pat have to be re.compile?
[17:50:57] nope
[17:51:00] it can take a raw string
[17:51:26] also I might have the order reversed on the pattern / test string
[17:52:57] ahh, its re.match(pat, s)
[17:52:58] :)
[17:53:06] nice
[17:53:14] cool, works great, thank you!
[17:53:17] np
[17:53:44] ommnomnom bagel
[17:53:47] sup ottomata1?
[17:53:57] yes
[17:53:59] no no
[17:54:01] no lambdas!
[17:54:02] nono!?
[17:54:02] ever!
[17:54:04] you no like lambda!
[17:54:05] haha
[17:54:06] ok
[17:54:06] lambdas are evil!
[17:54:11] whaaa, ok?
[17:54:11] because you have *comprehensions*
[17:54:20] which are way better, prettier, and faster
[17:54:30] yes I remember them, but not exactly sure how to use one in this case
[17:54:39] new_list = [ pat.match(x) for x in list ]
[17:54:42] yeah
[17:54:43] er
[17:54:43] that is nicer
[17:54:44] oops
[17:54:48] if
[17:54:53] new_list = [ x for x in list if pat.match(x) ]
[17:54:57] mah bad
[17:56:02] pat is re.compile then?
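To make the comprehension-versus-filter point above concrete, here is a minimal runnable sketch; the file names and pattern are illustrative, not taken from the log:

```python
import re

# Illustrative inputs -- the actual strings aren't shown in the log.
names = ["squid-2012-11.log", "kafka.conf", "squid-2012-12.log"]
pat = re.compile(r"^squid-.*\.log$")

# Comprehension style, as recommended above:
matches = [s for s in names if pat.match(s)]

# filter/lambda style, the first suggestion; note re.match takes
# (pattern, string) in that order, and a plain pattern string also works
# because the re module caches recently compiled patterns internally:
also_matches = list(filter(lambda s: re.match(r"^squid-.*\.log$", s), names))

assert matches == also_matches == ["squid-2012-11.log", "squid-2012-12.log"]
```

Both forms produce the same list; the comprehension simply avoids the lambda, which is the objection raised above.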
[17:56:15] usually how i do it, yeah
[17:56:48] ah or re.match works
[17:56:49] ok cool
[17:56:55] re caches the most recent k patterns actually so you usually don't need to deal with that
[17:57:16] does it now
[17:57:19] that is new to me!
[17:57:21] i think so
[17:57:32] cool. makes sense
[17:57:42] http://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile
[17:57:46] i knew that _sre did some weird stuff behind the scenes
[17:58:54] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90
[17:58:58] i am at my favorite cafe with terrible internet today
[17:59:00] so here we go!
[18:26:31] milimetric can you join on skype?
[18:27:03] drdee on skype
[18:27:11] ty
[19:33:10] drdee, have you seen this before?
[19:33:11] java.io.FileNotFoundException: File file:/var/lib/hadoop/data/g/yarn/local/usercache/otto/appcache/application_1353342609923_1847 does not exist
[19:33:34] its happening when I run jobs from the ciscos, so something must be configured improperly there
[19:33:39] give me 30 minutes
[19:33:46] ok
[19:33:58] this is probably not related
[19:34:37] but I was seeing errors like that when I had an error in a streaming function
[19:34:48] yeah?
[19:35:00] not sure it was exactly the same
[19:35:16] but I definitely got some FileNotFoundExceptions for application files
[19:35:33] i figured it was because the app died but some process didn't realize it yet
[19:35:43] yeah, I wish I knew which host was generating that error; that file def does not exist on the ciscos, but it isn't supposed to
[19:35:47] you know I think you might be right about that
[19:54:18] what type of MR job are you running?
[19:54:44] (pig, hive, sqoop, streaming)
[20:07:23] ottomata ^^
[20:07:27] erosen ^^
[20:07:33] aah
[20:07:43] you mean what kind of job it was that I was running?
[20:07:48] streaming, i think
[20:07:48] yes
[20:08:04] did the job work?
[20:08:13] no
[20:08:26] could be an issue with distributing the geoip database file
[20:08:27] it was a symptom of a job failing
[20:09:07] but it was a log file that was missing, I think
[20:09:17] or a directory
[20:09:19] not sure which
[20:09:33] can you give me the url of the log file?
[20:10:06] this is really ottomata's thing
[20:10:18] i don't even remember when this happened to me
[20:10:26] i was just chiming in with a similar experience
[20:10:45] oh ok
[20:11:17] ottomata, where did you find the java.io.FileNotFoundException error? in which log file?
[20:14:46] brb grabbing coffee
[20:31:51] ottomata, where did you find the java.io.FileNotFoundException error? in which log file?
[20:32:14] its in my stdout when I run the thing
[20:32:29] i think evan is right though, its caused by the job dying for another reason before hadoop thinks it should
[20:33:07] check the logs on hdfs://var/log/hadoop-yarn///
[20:38:22] yeah i looked there
[20:38:26] thanks for that btw
[20:39:48] one of my problems
[20:39:57] is that the hadoop classpath is funky if I run as a different user than myself
[20:40:03] it is using MRv1
[20:40:05] trying to figure out why
[20:41:59] can we run both MRv1 and YARN?
[20:42:05] i thought we could only run YARN
[20:43:43] word
[20:43:47] ottomata, i am talking with ori-l about setting up EventLogging for all edits and sending that data to kraken
[20:44:15] it's already _kinda_ doing it
[20:44:49] what is kinda ?
[20:44:51] :)
[20:44:58] drdee, re MRv1, exactly, its not supposed to be running MRv1
[20:45:07] right :)
[20:45:14] that's what i was thinking
[20:45:19] but it is trying to, the hadoop classpath command is picking up the wrong directories when I run as a diff user
[20:45:24] trying to understand why
[20:45:32] means I have to understand all the hadoop scripts and all their crazy env files
[20:46:03] ori-l and drdee, anything that is going to event.gif is going into kraken right now (i'm trying to make this more robust, but it is happening)
[20:46:21] ori-l does that answer your question?
[20:46:50] i'm not sure it's anything; i think it's just esams traffic
[20:47:06] but the edit events are generated by mediawiki directly, not event.gif
[20:47:26] eqiad /event.gif traffic goes directly to vanadium
[20:48:00] i have to check to be 100% sure
[20:48:04] but hang on
[20:48:33] k
[20:52:05] drdee: packages alert :) They are ready
[20:52:11] woot woot
[20:52:18] let's get them deployed
[20:52:23] ok
[20:52:25] ottomata: hi :)
[20:54:52] ottomata: we have some packages, new versions, can we deploy them please ?
[20:55:01] ottomata: it's the webstatscollector and udp-filters
[20:58:05] so, you can test them out 100% without me, right?
[20:58:07] take the .deb
[20:58:09] put it on stat1
[20:58:20] dpkg-deb -x path/to/new.deb ./
[20:58:30] then you'll have a locally installed version of the package
[20:58:31] drdee: Erik just sent an e-mail containing our current status
[20:58:35] drdee: I am reading it as well
[20:58:37] at ./usr/bin/whatever
[20:58:45] ok
[20:58:58] ottomata: yes, I'm gonna take them to stat1
[20:59:10] maybe I need to update the udp-filter stream on an26?
[20:59:19] ottomata: I'm putting them in /home/spetrea/webstats_debugging/ so we can both see them there
[20:59:22] but hang on, my head is in this hadoop thing, i'm sooo close
[20:59:24] cool
[21:02:39] okay, this will take some tweaking, i'm going to check back in with you guys in a couple of hours
[21:03:13] thanks ori-l!
[21:09:29] ottomata: on stat1 should we use lucid or precise packages ?
[21:09:39] precise
[21:09:55] ok
[21:26:18] ottomata: got them unpacked in /home/spetrea/webstats_debugging https://gist.github.com/80b0a0fee19bd3e53b60
[21:27:54] are there changes to udp-filter that need to be made for your collector stream?
[21:28:01] i'm running whatever version you last had me run on an26
[21:28:49] ottomata: yes
[21:29:02] ottomata: there were some fixes which were applied
[21:29:10] ottomata: to udp-filters
[21:29:11] ok cool, lemme get that up then
[21:29:16] alright
[21:29:41] still udp-filters_0.3.19_amd64.deb
[21:29:42] ?
[21:30:23] average_drifter: you fixed the output-collector file (the blog bug) and that is part of udp-filters, right?
[21:30:39] drdee: yes
[21:31:04] ok, average_drifter, I just unpacked that .deb from your home dir on an26, and restarted the udp-filter -o thing
[21:31:12] so your stream on stat1 should be updated
[21:32:56] ottomata: alright, I'm gonna run the collector, right?
[21:33:23] ja, you might have to do that annoying loopback relay thing
[21:35:32] average_drifter, once this is running, can you send me a gist with blog pageviews?
[21:36:32] drdee: yes
[21:38:04] ottomata: is the data on udp port 3815 ?
[21:38:08] ottomata: like last time ?
[21:39:13] ja, but remember its not 127.0.0.1, its coming from an26 on the main IP
[21:39:32] ok
[21:39:56] oh, was i running the relay before?
[21:39:57] i think I was
[21:40:00] was I?
[21:40:02] i can re-run it
[21:40:17] ottomata: if I do nc -u stat1.wikimedia.org 3815
[21:40:24] netcat -lu
[21:40:24] ottomata: I should be seeing data on my screen right ?
[21:40:36] with -lu, yes
[21:40:45] ok
[21:40:52] this was the relay I was running before:
[21:40:54] netcat -lu stat1.wikimedia.org 3815 | netcat -u 127.0.0.1 3816
[21:42:32] I did this
[21:42:33] spetrea@stat1:~/webstats_debugging$ nc -lu stat1.wikimedia.org 3815 | nc -u 0.0.0.0 3816
[21:42:36] and this
[21:42:45] spetrea@stat1:~/webstats_debugging$ ./usr/bin/collector-static -d -p 3816
[21:42:49] data not comin through
[21:42:56] I tried with 127.0.0.1 too
[21:43:20] hm, you get the data on 3815 though, right?
[21:43:34] 3815 from stat1.wikimedia.org yes
[21:43:58] but the rerouting seems to not happen, last time it worked
[21:44:57] hm
[21:45:45] netcat: Address already in use
[21:45:46] hm
[21:45:53] :~$ netcat -lu 127.0.0.1 3816
[21:45:53] netcat: Address already in use
[21:46:08] oh can you kill your collector for a min?
[21:46:51] ah it works for me
[21:46:53] but
[21:46:55] you have to start the listener first
[21:47:01] so you need to start the collector before you start the relay
[21:48:17] ok I'll try
[21:49:09] I tried again
[21:49:12] same
[21:49:20] ottomata: what did you run ?
[21:49:44] so you were using 3816, so I did:
[21:50:10] for the listener terminal:
[21:50:10] netcat -lu 127.0.0.1 3817
[21:50:15] then the relay in another terminal:
[21:50:19] netcat -lu stat1.wikimedia.org 3815 | netcat -u 127.0.0.1 3817
[21:53:00] ok it works now, I'll run it for a couple of minutes to see what output I get
[21:53:04] ottomata: thanks ! :)
[21:53:11] ottomata: how did you figure out 3816 was busy ?
[21:53:22] I may have some collector left somewhere in a screen or something...
[21:58:21] i tried to run netcat -u 127.0.0.1 3816 and got "Address already in use"
[21:58:31] ah, k
[22:21:32] drdee, woohoo!
[22:21:33] https://github.com/wmf-analytics/operations-puppet/blob/analytics/manifests/role/analytics.pp#L73
[22:21:39] i *think* it is working
[22:21:46] it only runs hourly, so i'll have to check up on it tomorrow
[22:21:57] aight!!!
[22:22:56] i need to puppetize the wikipedia zero producer; thats still running manually on an26
[22:22:58] but soooon!
[22:23:05] i'm running the consumers on an02 right now
[22:23:13] an02 is going to be the storm nimbus server eventually
[22:23:20] seemed like an OK place
[22:23:24] an01 is running the event producer relay
[22:23:29] udp2log -> kafka
[22:23:36] an02 does kafka -> hadoop
[22:32:00] hey drdee
[22:32:13] just double checking, I am going to respond to magnus:
[22:32:48] when we said "total counts" these are per article, right?
[22:32:53] right
[22:33:30] ok, and I guess we could easily maintain separate tables aggregating by project
[22:33:39] less of a priority at the moment
[22:37:15] drdee: only seeing blog.f
[22:37:35] drdee: maybe have to wait a bit more to get blog.w traffic ?
[22:48:12] latas duders!
[23:44:44] dschoon: busy? could you give me some help with a problem on plots with Rickshaw?
[23:44:58] sorry, super-busy at the moment.
[23:45:00] tomorrow?
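The working relay setup above — a listener for the an26 stream piped into a local forwarder — can be summarized in a short Python sketch. This is an illustrative stand-in for the netcat commands, using the ports from the log, not part of the actual tooling:

```python
import socket

LISTEN_PORT = 3815    # port the an26 udp-filter stream arrives on
FORWARD_PORT = 3817   # port the local collector listens on

# Bind the inbound side first; binding raises "Address already in use"
# if another process (e.g. a stale collector in a forgotten screen
# session) already holds the port -- the failure mode hit above on 3816.
inbound = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
inbound.bind(("0.0.0.0", LISTEN_PORT))

outbound = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    datagram, sender = inbound.recvfrom(65535)
    outbound.sendto(datagram, ("127.0.0.1", FORWARD_PORT))
```

As noted in the exchange, the final listener (the collector) should be started before the relay, so forwarded datagrams have somewhere to go instead of being dropped.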
[23:45:15] nope, i'll try tomorrow then
[23:58:01] DarTar: sorry, didn't see your email when i replied
[23:58:39] i've been working on the pageviews for quite some time now… and aggregating them doesn't really take a long time…
[23:59:29] i do it on my NAS, an Atom 330 with 2 GB RAM, in < 30 minutes, also updating the months, years and all-time stats
[23:59:53] I thought Atom CPUs were only in Asus EEE-PCs?
[23:59:57] (it was cheaper for me to get the algorithm to the data than vice versa)
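As a footnote to the aggregation thread: per-article totals stay cheap even on an Atom NAS because the whole job is one streaming pass with a dictionary of counters. A hypothetical sketch of that shape follows; the three-field input layout is an assumption for illustration, not the actual webstatscollector or dump format:

```python
import sys
from collections import defaultdict

# Hypothetical input on stdin: "project article count" per line,
# one line per article per hourly file.
totals = defaultdict(int)

for line in sys.stdin:
    fields = line.split()
    if len(fields) != 3:
        continue  # skip malformed lines
    project, article, count = fields
    try:
        totals[(project, article)] += int(count)
    except ValueError:
        continue  # skip non-numeric counts

# Emit all-time per-article totals; monthly or yearly tables would just
# key the dictionary on (period, project, article) instead.
for (project, article), count in sorted(totals.items()):
    print(project, article, count)
```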