[13:19:24] mornin [13:59:18] foods! [13:59:22] i'll be back in an hour or so. [14:30:23] drdee: reports reached 25 December 2012 [14:30:35] drdee: morning btw :) [14:30:44] great, did you see my comments btw? [14:30:49] on the e-mail ? [14:30:52] good morning to you as well! [14:30:54] pm [14:30:57] oh on IRC [14:31:00] yes I saw the comments [14:37:12] drdee: so Uganda is off [14:37:26] by 50% [14:37:28] or 100% [14:37:41] mmmmmmm [14:38:01] but the grand total is about the same [14:38:34] which means something happened and some pageviews got shifted somewhere else [14:48:36] ottomata, morning [14:48:53] mooorning [14:48:53] were there any changes to the uganda zero filter in november last year? [14:50:11] not in git log [14:53:06] mmmm [14:53:33] ok, stefan and i will dive into it [14:53:35] so about kafka [14:53:39] i built it last night [14:53:45] mornin [14:53:50] trying to figure out how to publish it to our nexus repo [14:53:51] (i still need a coffeeeee) [14:53:56] buuuuut [14:54:05] there is a hadoop consumer included when you build kafka [14:54:06] buuuut [14:54:10] did you know that? [14:54:23] yeah [14:54:30] i recall it had limitations we weren't a fan of [14:54:39] unless this is a new thing [14:54:47] they've been moving pretty fast with what's included [14:54:50] but maybe it doesn't have the packet loss issue :) [14:55:07] which is a good feature to have :D [14:55:24] dschoon, are you familiar with sbt? [14:55:35] it's a scala wrapper around maven [14:55:37] otherwise, not really [14:55:46] i figured that part :) [14:55:46] you have a jar to upload, yes? [14:55:50] plenty [14:55:52] you need to log in as admin [14:55:56] i know [14:55:59] i did [14:56:01] kk [14:56:03] this is the thing [14:56:20] these are snapshot jars and those you can't upload manually [14:56:26] Left side: Repositories [14:56:30] only release jars can be uploaded manually [14:56:51] so trying to publish from within sbt [14:56:52] Then select "3rd Party" and switch to the "Upload Artifact" tab at the bottom [14:57:00] no that doesn't work [14:57:02] i tried [14:57:05] i just explained why not [14:57:29] but i have to figure out how to tell sbt to use our nexus repo [14:57:50] you try renaming the jar and changing the meta in the pom? [14:57:59] well renaming is a bad idea [14:58:00] but hold on [14:58:06] drdee, yes yes [14:58:09] it does not use zookeeper [14:58:10] let me build 0.7.3 instead of 0.8.0 snapshot [14:58:22] i'll do the same. [14:58:39] is 0.7.3 out? [14:58:49] http://kafka.apache.org/downloads.html doesn't have it listed [14:59:12] we have 0.7.2 installed [14:59:29] mk. [15:01:04] but i think it would not be that hard to build zk support into the included kafka hadoop consumer [15:01:07] maybe it would work better [15:04:44] Can one consumer meet our need until we build zk support into the new one or patch the current consumer? [15:05:42] i think we don't know [15:05:54] we don't know why the current one is not importing everything [15:06:09] and i'm still skeptical about this udp2log -> KafkaProducer shell bit for larger data streams [15:06:31] I started collecting mobile logs last night on an01 [15:06:35] i'm going to see how we are doing there [15:06:42] find out if and where logs are being lost [15:07:10] i agree that the shell component is a pretty big concern.
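For context on that udp2log -> KafkaProducer "shell bit": udp2log fans each log line out to pipe destinations listed in its config file, so the Kafka leg is only as reliable as whatever sits on the other end of the pipe. A minimal sketch of such a config line, assuming a Kafka 0.7-style console producer; the path, zookeeper address, and topic name are illustrative, not taken from the channel:

    # hypothetical udp2log config: factor 1 means every line goes to the pipe;
    # the producer command and its flags below are illustrative only
    pipe 1 /opt/kafka/bin/kafka-console-producer.sh --zookeeper an01:2181 --topic webrequest-mobile

If the producer blocks or crashes, lines back up or vanish in the pipe, which is one way the loss being sleuthed here could occur.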
[15:14:19] ok, kafka 0.7.2 is uploaded to nexus in the 3rd party repo [15:14:27] groupid is org.apache.kafka [15:14:36] artifactid is kafka [15:14:40] version is 0.72 [15:14:47] version is 0.7.2 [15:14:47] i also added a thirdparty-snapshots repo [15:14:51] and a group for all the snapshots [15:14:53] yup saw that [15:14:54] thx [15:15:00] i will push 0.8 snapshot there [15:17:27] i'm going to go grab a coffee while scala does its thing [15:20:29] hey guys: [15:20:29] https://www.mediawiki.org/wiki/Analytics/Kraken/Firehose [15:20:40] awesoooome [15:21:06] AWESOME INDEED [15:21:07] there are a few other possible pipelines that I didn't list (mainly ones that have different kafka hadoop consumers) [15:21:17] but, i figured there are so many different ways we can and might try [15:21:21] it'll be good to right it down [15:21:24] write* [15:21:31] mmm coffeeee [15:22:01] coffee indeed [15:23:17] ditto on awesome indeed [15:23:58] well, my awesome comment was meant for kafka in maven [15:24:00] this is better than the diagram I was talking about 'cause it's much easier to edit [15:24:02] oh [15:24:07] but yeah [15:24:18] and also we can add our findings of shortcomings to each approach [15:24:23] well that's great too :) [15:25:06] ok, i'm going to sleuth the mobile data now, let's see what I find.... [15:25:47] drdee I tried changing the pom to point to org.apache.kafka, is there something else I need to do to get the Consumer to build? [15:25:56] it doesn't look like it's recognizing that groupid either [15:26:27] 1 sec [15:30:55] milimetric, so i am able to grab the kafka dependency from nexus [15:31:06] use this [15:31:06] <dependency> [15:31:07] <groupId>org.apache.kafka</groupId> [15:31:08] <artifactId>kafka</artifactId> [15:31:09] <version>0.7.2</version> [15:31:10] </dependency> [15:31:10] BUUUUUT [15:31:28] go to https://github.com/wmf-analytics/kraken/tree/master/maven [15:31:45] look at example.settings.xml and save it as ~/.m2/settings.xml [15:31:57] then it will grab kafka [15:31:58] i don't really know what you guys are doing [15:31:58] but [15:32:06] here are some changes I had to make to pom.xml to get things to work [15:32:07] https://gist.github.com/4608193 [15:32:08] but now it doesn't grab hadoop so let me fix that [15:32:08] cool, thanks drdee, trying [15:32:20] I think we need to grab some of those ZK deps and put them in our repo as well [15:32:35] ZK should already be in nexus [15:32:41] yeahhhhh but which version? [15:32:51] whatever version you need [15:33:06] as specified by the pom [15:33:23] yeah but i think kafka-hadoop-consumer is built with a different zk client or something [15:33:26] https://github.com/sgroschupf/zkclient [15:33:28] http://mvnrepository.com/artifact/com.github.sgroschupf/zkclient/0.1 [15:34:27] btw, milimetric, i pushed a change to the moreLogging branch, just made the LOG.error messages a wee more descriptive [15:34:32] back [15:35:24] ottomata, did you get that into nexus already? [15:35:44] eh? [15:35:51] my change? [15:35:58] (i have done nothing with nexus) [15:36:03] okay. [15:36:07] do you want to learn how? [15:36:09] yes [15:36:30] sweet [15:36:38] very much so [15:36:40] it's gotten a ton better [15:36:41] http://nexus.wmflabs.org/ [15:36:56] i feel we are replicating efforts right now [15:36:57] pretty straight-forward, other than... getting the goddamn thing to bind to the right port [15:37:06] tsk. we are *sharing knowledge* [15:37:10] should I read some stuff about what nexus and sonatype are first?
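A note on the "tell sbt to use our nexus repo" problem above: in an sbt build definition, publishing is redirected by overriding publishTo. A minimal sketch, assuming a recent sbt and the thirdparty-snapshots repo mentioned in the channel; the resolver id and credentials path are illustrative:

    // hypothetical sbt settings: publish artifacts to the WMF Nexus snapshots repo
    publishTo := Some("wmf-thirdparty-snapshots" at
      "http://nexus.wmflabs.org/nexus/content/repositories/thirdparty-snapshots/")
    // keep the Nexus admin login out of the build file
    credentials += Credentials(Path.userHome / ".ivy2" / ".credentials")

Snapshot jars then go through sbt publish rather than the manual "Upload Artifact" tab, which (as noted above) only accepts releases.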
[15:37:11] hehe [15:37:13] nah [15:37:14] k [15:37:38] the thing we're doing right now is uploading missing deps [15:37:49] so when we try to build, maven can find them in our repo [15:37:57] oh you do it through web gui? [15:37:57] yep. [15:37:58] ok cool [15:38:01] .jar files usually? [15:38:03] thankfully, no rsyncing or xml any more. [15:38:10] yes, the jar/war [15:38:12] ok [15:38:21] click on Repositories at the left [15:38:26] and this can also proxy/cache against other repos? [15:38:34] like cdh / kafka or whatever? [15:38:37] yep. [15:38:45] ok repositories... [15:38:48] in the case of kafka, it's not in maven central. [15:38:50] (sadly) [15:39:20] aye ok [15:39:29] so we put it in ours [15:39:30] aye [15:39:38] *nod* [15:40:29] i see kafka/hadoop-consumer_2.8.0 in there [15:40:53] i put it there :) [15:41:11] aye, it compiled? [15:41:19] yes [15:41:28] as long as you save https://raw.github.com/wmf-analytics/kraken/master/maven/example.settings.xml to your ~/.m2/ [15:41:33] as settings.xml [15:41:48] alright. what next? [15:42:00] well building this consumer and putting it back through the wringer [15:42:45] so [15:42:48] (ottomata, I think the logger would've given you that extra context from the throwable it takes in (stack trace, etc.) but doesn't hurt to make it better) [15:42:55] yeah [15:42:56] i know [15:43:15] buuuuuuut i like being able to grep the code for messages I see in logging output [15:43:25] i know stacktraces give lines [15:43:25] but more info the better [15:43:25] yep [15:43:32] so, if you guys got it all to compile [15:43:35] i can easily build deb and reinstall it [15:43:39] on an01 [15:43:44] actually [15:43:46] this is all an02 [15:43:48] so it will affect all imports [15:43:50] but yeah [15:43:55] so, what do I do? [15:43:58] cool, you should just be able to pull, save that settings file, and it'll compile in IntelliJ (don't know about eclipse) [15:44:06] uhhhhhhhhhhh [15:44:08] IntelliJ eh? [15:44:14] it's free now! [15:44:15] hehe [15:44:17] i know i have it [15:44:20] i did not know. <3 [15:44:26] every time I open up those IDEs I lose hours of time though :p [15:44:27] free as in speech [15:44:30] can I just clone on an01 and build? [15:44:30] haha [15:44:32] it's true. [15:44:32] and then build deb? [15:44:35] probably? [15:44:35] sure [15:44:38] give it a shot! [15:44:46] that settings file would still be needed [15:44:52] k [15:45:29] it still goes in ~/.m2/settings.xml [15:45:47] pretty much everything respects it. they all use the same libs. [15:46:06] (though eclipse stores its local repo somewhere other than ~/.m2, barf) [15:46:26] okay, another option [15:47:04] is https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka -> ?
[15:47:32] which eliminates the need for a cron or whatever [15:47:47] i'm looking for a bolt that writes to hdfs [15:47:53] because surely this is the most common thing on the planet [15:48:06] (just fyi, https://github.com/nathanmarz/storm-contrib ) [15:48:10] [ERROR] Failed to execute goal on project hadoop_consumer: Could not resolve dependencies for project kafka.consumer:hadoop_consumer:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-common:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.2: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4 [15:48:29] yeah, that is a possible pipeline too [15:49:39] [WARNING] The POM for org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.2 is missing, no dependency information available [15:50:00] i think drdee was adding those to our repo? [15:50:03] this was with mvn compile right? [15:50:12] tho we're proxying cloudera, sooo... [15:50:19] the issue is with settings.xml [15:50:26] mvn package [15:50:27] but yeah i guess [15:50:29] the mirrorOf value is incorrect [15:50:30] fi [15:50:34] fixing it now [15:50:35] k [15:57:02] ah-ha! [15:57:03] https://github.com/nathanmarz/storm-contrib/tree/master/storm-state [15:57:45] hm, drdee and ottomata - I did a clean clone of that repo, checked out moreLogging and mvn package works ok [15:57:48] unintuitively named, but that's a bolt which provides HDFS-backed collections [15:58:02] I copied my settings.xml directly from that example above [15:58:22] awesome [15:58:30] my mirrorOf is * [15:58:46] milimetric, if ottomata still has problems, try mv ~/.m2/{,_}repository and then build again [15:59:12] ~/.m2/repository is where maven puts the downloaded jars and metadata [15:59:22] deleting or moving it forces it to redownload everything [15:59:22] sounds very similar to the way the flume hdfs sink works [15:59:28] dschoon [15:59:54] in some cases, it can cache resource misses, which is dumb [15:59:56] *nod* [16:00:25] flume probably has better configuration, though :) [16:00:59] should we all join the hangout? [16:01:28] k, renamed repository [16:01:32] it re-downloaded tons of stuff [16:01:32] but i got [16:01:33] [ERROR] Failed to execute goal on project hadoop_consumer: Could not resolve dependencies for project kafka.consumer:hadoop_consumer:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-common:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.2: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4 [16:01:51] you run clean first? [16:01:55] no [16:01:55] ok [16:02:01] it's the same settings issue [16:02:15] what do I need to change? [16:02:15] all requests are redirected to the public repo [16:02:16] that [16:02:17] 's [16:02:19] incorrect [16:02:25] right. [16:02:30] cdh4 requests should go to the cdh4 proxy [16:02:39] i don't know how to do that in settings.xml [16:02:48] that's where it needs to be fixed [16:02:52] http://nexus.wmflabs.org/nexus/content/groups/public/org/apache/hadoop/hadoop-client/ [16:03:00] wait wait [16:03:03] hold up. [16:03:09] nobody change their settings.xml yet [16:03:14] because that is what groups are for! [16:03:54] omg I think it's downloading the internet [16:04:13] that is precisely what maven does.
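For reference, a minimal sketch of the ~/.m2/settings.xml mirror being discussed, with mirrorOf * routing every artifact request through the Nexus public group; the id and name are illustrative. With this setup, the hadoop-client failures above can only be fixed server-side, by making the cdh4 proxy a member of the public group:

    <settings>
      <mirrors>
        <mirror>
          <!-- send all artifact requests through the Nexus "public" group -->
          <id>wmf-nexus-public</id>
          <name>WMF Analytics Nexus public group</name>
          <url>http://nexus.wmflabs.org/nexus/content/groups/public/</url>
          <mirrorOf>*</mirrorOf>
        </mirror>
      </mirrors>
    </settings>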
[16:04:22] :D [16:04:33] $ mvn package && sudo make sandwich [16:04:39] yo [16:04:40] otto [16:04:46] ottomata, try the build again [16:07:06] same deal [16:07:09] do I need to change something? [16:07:31] let me look. [16:09:06] good news for mobile sleuthing: there are only 4 hosts that handle most of the requests [16:10:00] oh, hot. [16:10:37] how's the noise in ops, btw? it seems everything is going well with the dc migration [16:10:57] ok so I tried wiping out .m2 and doing mvn package [16:11:01] it gave me errors [16:11:16] er [16:11:16] I restored my old .m2 and all is ok again [16:11:27] you should just move ~/.m2/repository [16:11:33] that way it keeps your settings.xml file [16:11:40] right, i tried all three ways [16:11:45] reeeeally. [16:11:49] sigh, maven! [16:11:51] remove .m2, remove just .m2/repository [16:12:10] without the settings it doesn't find the kafka jar, obviously [16:12:20] (this is why someone always martyrs himself on the cross of maven, so others need only copy some files into ~ and forget) [16:12:22] with the settings, it fails too [16:12:29] same error as ottomata? [16:12:31] no [16:13:02] sorry, it spit the internet back on my standard output so I can't find the error [16:13:20] it was something about a shaded jar [16:14:02] but what does this mean? That I downloaded the dependency from the funnel or kraken project? [16:14:20] and even though the pom for the consumer is busted, it finds the old download? [16:14:23] what are you trying to build? [16:14:35] and what directory are you in? [16:14:40] https://github.com/wmf-analytics/kafka-hadoop-consumer [16:14:53] git clone that [16:14:56] mvn package [16:15:33] with different .m2 setups (blank; just .m2/settings.xml; my old .m2/*) [16:16:40] drdee: [WARNING] The POM for kafka:kafka:jar:0.7.2 is missing, no dependency information available [16:16:45] did you remove it? [16:16:59] yeah, not your fault, milimetric [16:17:02] no, but now you are not using the settings.xml file [16:17:03] things are misconfigured [16:17:15] so it's not using nexus [16:17:30] I'm not sure who you guys are talking to :) [16:17:33] but I'm using settings.xml [16:17:40] downloaded exactly from here: [16:17:52] https://raw.github.com/wmf-analytics/kraken/master/maven/example.settings.xml [16:18:05] and you can download kafka right? [16:18:25] if I save that settings file into my old .m2, I can clone the consumer and mvn package works [16:18:37] without the settings file, it doesn't find kafka [16:18:42] exactly [16:18:46] without the old .m2 it gives me some weird error about shaded jars [16:18:51] so just give the jar to ottomata [16:18:56] k [16:19:00] the mirroring is not functioning 100% [16:19:08] i need to read the docs more carefully [16:19:15] k [16:19:25] ottomata: how would you like this beautiful jar delivered? [16:20:14] btw, drdee, I get a trillion of these when doing mvn package: "[WARNING] We have a duplicate com/google/inject/servlet/UriPatternType.class in /home/dan/.m2/repository/com/google/inject/extensions/guice-servlet/3.0/guice-servlet-3.0.jar" [16:20:20] milimetric [16:20:25] where are you running? [16:20:29] i need .deb built with .jar [16:20:34] my laptop [16:20:41] there is a script in the kafka-hadoop-consumer to build it [16:20:48] is your laptop ubuntu? [16:20:49] oh cool [16:20:54] milimetric, mmmmmmm [16:21:01] maybe mvn clean first?
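On those duplicate-class warnings: they typically appear when a build assembles a shaded (fat) jar and two dependencies ship the same class, which matches the "something about a shaded jar" error above. A hedged sketch, assuming the project uses maven-shade-plugin (not confirmed in the channel), of a filter that drops one copy of the duplicated guice-servlet classes:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <filters>
          <!-- illustrative: keep only one copy of the duplicated servlet classes,
               assuming another jar on the classpath also provides them -->
          <filter>
            <artifact>com.google.inject.extensions:guice-servlet</artifact>
            <excludes>
              <exclude>com/google/inject/servlet/**</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </plugin>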
[16:21:21] if so, hopefully [16:21:31] hmm [16:21:34] hmm [16:21:57] yes, ubuntu [16:22:01] running the build deb thing now [16:22:28] i'm afraid to run mvn clean [16:22:31] drdee ^ [16:22:46] don't be afraid luke [16:22:47] because maybe it'll wipe out the magic sauce that's letting me build with a misconfigured pom [16:22:52] okay. [16:23:11] i just dialed the cache settings for Central way down [16:23:20] because before it was caching 404s and metadata for 24h [16:23:27] i'm gonna go ahead and disagree Anakin. Because last time you told me to restart reportcard2, I wasn't afraid [16:23:36] milimetric, i think the build script tries to apt-get install stuff [16:23:37] like kafka [16:23:42] which you probably don't need with maven doing it now [16:23:48] yep, it's doing that :) [16:23:54] i'm also going to purge it, so the next builds y'all run which need to download deps will trigger new proxy reqs [16:24:02] it's downloading the internet again :) I now have 3 copies [16:24:12] ok [16:24:18] dschoon, let me know if/when I should try building on an01 again [16:26:59] dschoon && ottomata - the deb script failed because I think it's trying to use a different .m2 directory (not my /home/dan/ one) [16:27:19] oh, that is possible [16:27:19] ah yes [16:27:24] if it's in the pom [16:27:24] "I am such a maven noob" [16:27:26] above this line: [16:27:28] it can be overwritten [16:27:35] # I am such a maven noob: [16:27:35] cd /root/.m2/repository/asm/asm/3.1 && rm asm-3.1.jar && wget http://repo1.maven.org/maven2/asm/asm/3.1/asm-3.1.jar [16:27:35] cd $origdir [16:27:57] Ahhh, I remember that crap [16:28:19] i couldn't get mvn to find that (believe me this is hacky as poooooOOOp) [16:28:23] take it out [16:28:31] ahh. [16:28:47] see, i bet that was a 404 that got cached for 24h [16:28:51] i've dialed all that back now [16:28:53] try building again [16:29:08] ok, but btw, that is only a step in the build deb script [16:29:29] lol [16:29:32] wait, [16:29:37] we only have apache.org snapshots? [16:29:39] not releases? [16:29:59] correct me if I'm wrong, but maybe I should just erase lines 1-15, right? So start here: https://github.com/wmf-analytics/kafka-hadoop-consumer/blob/master/build_deb_package.sh#L16 [16:30:06] we should be using the cdh4 repo [16:30:10] yeah [16:30:15] def [16:30:21] we are. [16:30:27] well, keep origdir [16:30:30] cdh4 wasn't originally in the public group [16:30:32] i have fixed that [16:30:38] ah! [16:30:38] it should now be proxied with the original settings conf [16:30:40] dschoon new error now [16:30:45] exciting! [16:31:06] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hadoop_consumer: Compilation failure [16:31:06] [ERROR] /home/otto/kafka-hadoop-consumer/src/main/java/kafka/consumer/KafkaContext.java:[202,31] cannot find symbol [16:31:06] [ERROR] symbol : method warning(java.lang.String,java.lang.InterruptedException) [16:31:06] [ERROR] location: interface org.slf4j.Logger [16:31:18] oh! [16:31:22] maybe that was one of my changes! [16:31:48] yup I added that [16:31:53] what no warning() method?! [16:32:14] hehe [16:32:18] ah warn() [16:33:02] heh. [16:33:02] yay compiled!!!!! [16:33:06] perhaps i was a bit aggressive. [16:33:13] as everything is incredibly slow atm. [16:33:49] you fuckers, nexus is using 94% cpu on kripke. [16:34:05] ok ottomata [16:34:12] i got a deb [16:34:30] yeehawww, i think i'm about to as well [16:34:37] oh good [16:34:53] i'll push my script change [16:34:59] woo.
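The compile failure above is just the slf4j API: org.slf4j.Logger has warn(String, Throwable) but no warning(...). A minimal sketch of the corrected logging call; the class and message below are illustrative, not the actual KafkaContext code:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class Example {
        private static final Logger LOG = LoggerFactory.getLogger(Example.class);

        void waitABit() {
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                // warn(), not warning(): logs the message plus the full stack trace
                LOG.warn("interrupted while waiting", e);
            }
        }
    }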
[16:35:20] i just pushed a change too :p [16:35:20] fwiw, i think the problem was that we weren't proxying Apache [16:37:41] ok, so you can build too ottomata? [16:37:48] if not I can just get you this deb [16:38:04] yup! [16:38:12] i got it [16:38:34] cool beans. So let's regroup after you get this into testing [16:38:45] and groupthink the next step [16:43:19] okay, i believe i have all the repos in nexus set up right [16:43:29] (with non-degenerate cache settings) [16:45:01] note that you can use nexus to look up the full coordinates of artifacts [16:45:02] http://nexus.wmflabs.org/nexus/index.html#nexus-search;quick~kafka [16:45:23] i'll test out your beliefs by bashing my .m2/repo again [16:46:49] COMMENCE DOWNLOADING THE INTERNET [16:48:37] mk, it must've cached the internet somewhere close by because it was faster that time [16:48:39] and it worked [16:48:44] gj dschoon [16:48:51] woo. [16:48:56] yeah, i upped the cache times [16:49:14] i was ok with maven until you said that exact statement [16:49:19] now i hate it [16:49:44] but i'm glad you're the martyr :) [16:49:51] * milimetric scurries back to limn [16:51:08] oh, drdee - re: duplicate class warnings [16:51:13] I did mvn clean and they still show up [16:51:18] three places: [16:51:29] org/apache/commons/collections/* [16:51:33] org.slf4j.* [16:51:40] org/apache/commons/beanutils/* [16:52:02] javax/servlet/* [16:52:06] and I lied, fourth place: [16:52:17] org/apache/jasper/compiler/Localizer.class [16:53:19] weird. [16:53:31] the servlet api shouldn't have any classes. just interfaces. [16:54:44] heh [16:54:52] nexus is actually a pretty fantastic piece of software [16:54:53] example: [16:54:54] http://nexus.wmflabs.org/nexus/service/local/feeds [16:55:28] that's a registry of data feeds reflecting various changes in the server [16:55:49] new/cached/invalidated artifacts, releases, snapshots [17:36:51] restarting nexus [17:36:53] (done) [18:00:45] interesting! [18:00:47] check this out boys [18:00:48] https://gist.github.com/4611076 [18:01:17] hm, can't get into the standup drdee [18:01:19] i should go and triple check my work [18:01:20] oh its time [18:02:01] yeah its not working for me either i think [18:02:08] i think server problems [18:02:11] skype? [18:02:14] interesting. [18:02:20] ya [18:02:48] ok [18:02:52] same -- problems. [18:02:55] --> skype! [18:04:02] it may be possible to create a new hangout that works [18:04:09] (not tied to a cal entry) [18:04:17] that worked for me this morning [18:04:50] we are trying to get you on skype [18:04:53] robla ^^ [18:23:08] hi everybody [18:24:45] I can't make it to this morning's hangout but I wanted to schedule a chat to review the limn changes you guys presented last week [18:25:08] milimetric, drdee: ok if I book you for an hour tomorrow? [18:25:15] I gotta say this is one siren of a problem. 14.6*% packet loss. What the heck?! [18:25:24] hi DarTar - sure [18:25:52] sure [18:26:10] DarTar: loop me into that as well, if you would. [18:26:18] sure [18:26:29] tyty [18:26:57] btw dschoon, I think you have unpushed reportcard-data changes [18:27:04] possibly! [18:27:06] i will check. [18:27:11] what's the symptom? [18:40:08] ottomata, milimetric, drdee -- i added diff and % comparisons to that gist. https://gist.github.com/4611085 [18:40:30] ottomata: where are the result files? [18:40:32] an02? [18:40:43] sorry to interrupt [18:40:55] but could you please tell me what this packet loss problem is ? [18:40:59] is it the same as with the udp-filter ? [18:41:02] but now on kraken ?
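Background on how the udp2log -> kafka -> hdfs discrepancy can be measured (the "counting seqs" approach mentioned further down): every udp2log line carries a per-host sequence number, so loss between two collection points shows up as gaps. A minimal sketch, assuming a tab-separated log with the hostname in the first column and the sequence number in the second (the field layout is an assumption):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class SeqLoss {
        static final class Counter { long first = -1, last, received; }

        public static void main(String[] args) throws Exception {
            Map<String, Counter> hosts = new HashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("\t");
                    Counter c = hosts.computeIfAbsent(f[0], k -> new Counter());
                    long seq = Long.parseLong(f[1]);
                    if (c.first < 0) c.first = seq;
                    c.last = seq;
                    c.received++;
                }
            }
            for (Map.Entry<String, Counter> e : hosts.entrySet()) {
                Counter c = e.getValue();
                long expected = c.last - c.first + 1; // assumes sequence numbers only increase
                System.out.printf("%s: expected=%d received=%d loss=%.2f%%%n",
                    e.getKey(), expected, c.received,
                    100.0 * (expected - c.received) / expected);
            }
        }
    }

Running the same count over the udp2log output, the kafka import, and the hdfs copy localizes which hop is dropping lines.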
[18:41:14] * average_drifter is just curious [18:41:31] no, probably unrelated. [18:41:38] /a/otto/mobile-sleuth [18:42:17] we're seeing a discrepancy in the number of messages received by: udp2log, then imported by kafka, then pushed into hdfs [18:42:27] you can see the diff in that link i just pasted, average_drifter [18:45:27] heh [18:45:35] i just came across this, ottomata, drdee, milimetric: http://bb10.com/java-clojure-storm/2012-06/msg00129.html [18:46:08] Storm + Kafka is a very effective log processing solution. A number of users of Storm use this combination, including us at Twitter in a few instances. Kafka gives you a high throughput, reliable way to persist/replay log messages, and Storm gives you the ability to process those messages in arbitrarily complex ways. [18:46:08] - nathan marz [18:48:16] yeah so we're definitely missing something obvious [18:48:31] I'm reading the mailing list now. [18:48:31] and to me, the Consumer seems most likely [19:01:56] I'm one step away from fixing dclass [19:02:17] problem at hand : nm is saying dclass_load_file is "U" in the 2nd column [19:02:24] that means that symbol is undefined [19:03:41] -no-undefined seems to not do it [19:04:07] this is JNI stuff, right? i don't know much about it :( [19:04:49] dschoon: yep JNI [19:04:58] dschoon: this is actually a gnu toolchain problem [19:05:02] so not JNI specific [19:05:27] :/ [19:05:33] not my area of expertise, sadly [19:05:37] ok [19:05:56] ottomata, milimetric, drdee [19:06:00] i have a suspicion [19:06:05] don't type my screen name! [19:06:08] i am trying to go eaaaat [19:06:18] we modified kafka-hadoop-consumer to build with CDH4 [19:06:49] but (speaking for myself) we don't have really comprehensive knowledge of what changed in those APIs between major versions [19:06:51] so it builds. [19:06:51] let otto eat, tell us the tantalizing stuff after :) [19:06:55] haha [19:07:05] but it's still possible the semantics of the APIs changed [19:07:13] yeahhhhhHHHhhhhhhh maybe, the APIs are supposed to be backwards compatible though [19:07:15] maybe [19:07:16] things like timeouts, buffer-waits, etc [19:07:44] ok be back in a bit [19:07:48] how the parts use zk, how frequently blocks are flushed to NN... [19:08:12] well FINE [19:08:17] then i'm afk food, TOO [19:32:58] I hate autotools [19:33:00] I hate it [19:33:09] I swear I could make a build system from scratch [19:33:25] but you are not gonna do it :D [19:33:40] yes [19:34:08] I wonder if looking into other build systems like cmake or some other stuff is worth it [19:34:39] I guess it doesn't make sense in terms of time [19:34:43] I mean it all boils down to time [19:35:31] there is an autotools book out there, first chance I get I should read it [19:36:17] my good deed for the day: [19:36:18] https://gerrit.wikimedia.org/r/#/c/45353/ [19:36:21] erosen ^^ [19:36:44] nice [19:37:04] not sure i know what MCC-MNC means [19:37:06] but i'm googling [19:37:29] nice [19:37:31] internationally agreed mobile carrier codes [19:37:36] seems like a good idea [19:37:42] does anyone at wmf use them [19:37:48] we now do [19:37:49] i mean clearly they should [19:37:51] gotcha [19:37:52] this was an ops idea [19:37:56] i just patched it [19:37:58] seems like a good idea to me [19:38:16] now you need a dict mcc-mnc -> full carrier name [19:38:21] and done [19:38:27] ya [19:38:28] check also http://www.mcc-mnc.com/ [19:38:29] not so bad [19:48:30] ok I got a bunch of questions [19:48:39] drdee: do we really need libdclass.so or just libdclassjni.so ?
[19:48:47] so, I solved the problem [19:48:54] but I have a question [19:49:08] whatever works :) [19:49:26] the problem was that I wrestled with autotools to convince it that I want dclass_load_file and 2 other functions to have defined symbols [19:49:29] in libdclassjni.so [19:49:48] but because we were linking dynamically with libdclass.so [19:49:50] it seems that those symbols were undefined [19:50:16] and I think that makes sense since that's what a dynamic library is supposed to be, I mean you link to it, and the symbols are defined in the dynamic library itself, not in the library that links against it [19:50:37] so if A.so has functions A_f and A_g [19:50:45] and you have B.so which uses A_f and A_g [19:50:52] they will appear as undefined in B.so [19:50:57] and JNI doesn't like that [19:51:01] JNI wants them to be defined [19:51:14] so we eliminate A.so and just have B.so which has everything in it [19:51:19] that's my current solution [19:51:52] ok [19:52:00] cool [20:02:30] drdee: done [20:02:34] drdee: please try it out [20:02:42] pushed? [20:02:45] drdee: cd kraken; git pull origin master [20:02:52] drdee: cd libdclass; git pull origin package [20:03:17] drdee: cd libdclass; make clean; rm -rf debian/libdclass-dev .libs Makefile configure debian/libdclass-dev*; autoconf; automake --add-missing; dpkg-buildpackage -us -uc ; sudo dpkg -i ../libdclass-dev_2.0.12_amd64.deb [20:03:40] aight [20:03:58] country reports are on the last day [20:04:03] they should be ready soon [20:29:47] new country reports for December [20:29:48] http://stat1.wikimedia.org/spetrea/wikistats/mobile_countryreports_devices_deploy/r3/2012-12/SquidReportCountryData.htm [20:30:24] great! [20:30:40] and for November (the ones which have been available since a few days ago) http://stat1.wikimedia.org/spetrea/wikistats/mobile_countryreports_devices_deploy/r2/2012-11/SquidReportCountryData.htm [20:35:02] soooooo average_drifter did something pretty cool! he wrote the JNI wrapper for the C library dClass (to identify devices based on user agent string), he created a Debian package and it compiles and runs on both Ubuntu and OSX [20:35:13] very cool! [20:39:16] totally awesome! [20:52:17] brb a few [20:54:50] AH [20:54:54] my script TOTALLY had an error in it [20:57:32] hmmm maybe it did [20:57:33] hmm [20:58:44] nope I take it back [21:00:50] https://gist.github.com/4613161 [21:01:55] dschoon, can I run over what I did here to be extra sure? [21:01:58] oh you are brb [21:02:01] milimetric? [21:03:15] sorry, drdee call [21:06:48] that's only 5% loss during the times I checked [21:09:25] so that's back to the emery/locke levels? [21:11:48] i guess so, i dunno [21:12:22] hi! all yours ottomata [21:12:44] back in one moment [21:13:30] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90?authuser=1 [21:13:50] can't we chop the problem into even smaller components, and rule out alternative explanations? measure packet loss when it enters the an* machines, packet loss when it enters kafka whatever, packet loss when it leaves kafka, packet loss when it's about to go into hdfs, packet loss when it's in hdfs [21:15:03] drdee: that's basically what we're doing by counting seqs [21:15:38] so far we are comparing entire flows against each other [21:24:35] guys do you think it would be possible to publish, along with the clicks, the paths users take across the wikipedia pages?
it would be really interesting [21:25:14] I mean something like "<Leonardo da Vinci> <Michelangelo> 50" [21:25:41] meaning that 50 people clicked on Michelangelo from the Leonardo da Vinci page [22:09:36] no, i'm comparing the collected data at different points in the flow [22:19:38] you just wrote a pig script to use dClass jni stuff [22:20:23] ottomata, if average_drifter gives you the deb, could you install it on an* and then i can see if it actually really works : :D [22:20:44] pig script, i mean pig udf [22:20:55] yup [22:21:00] totally [22:24:51] ok average_drifter will email you the deb [22:28:18] drdee , ottomata .deb available on stat1 in /home/spetrea/releases/libdclass-dev [22:28:33] awesome! [22:34:29] hey folks, do you know if we maintain an archive of pages like: http://stats.wikimedia.org/EN/Sitemap.htm ? [22:35:30] i.e., is there an easy way of getting historical data on a monthly basis for all projects from ez's reports? [22:35:54] give me 1 sec [22:37:56] diegolo_: We don't have that sort of analysis yet, but we're broadly referring to it as funnel analysis. DarTar is doing some work in that and we investigated it a little bit. I personally think it's something that might be more and more requested in the coming months [22:38:49] DarTar: i am afraid not, but check with EZ to be sure [22:39:06] drdee: will do, thx [22:39:07] milimetric, cool, if I can help in some way, plz let me know [22:40:05] sure thing diegolo_, we'll try to keep as open as possible :) [23:03:58] DarTar: i'm certain we don't, sadly. we talked about setting up a blackbox test to scrape the page every few hours to detect when numbers changed too much, but it got deprioritized into oblivion. [23:04:15] ha shame [23:04:39] i initially suggested it'd be easy to build that because, you know, clearly every deploy persists forever, and you just update a symlink, right? [23:04:40] right? [23:04:43] ... [23:04:49] the request came from WSOR's Giovanni and he's going to follow up with EZ [23:05:02] but no. the deploy script explicitly rm -rf's the old content [23:05:03] maybe archive.org can come to the rescue? [23:05:17] for you, possibly! [23:05:27] i needed higher fidelity (on the order of hours, not weeks) [23:05:29] but seriously that's a hack we shouldn't even consider mentioning publicly [23:05:35] i know. [23:05:37] jeez. [23:06:30] meanwhile, speaking of reliability, packetloss is down to -1.5% on some machines! [23:06:59] i am going to start claiming packetloss needs to be expressed using complex numbers [23:07:04] because this is all fucking imaginary. [23:07:20] you can write them in polar form [23:07:30] e^z [23:07:35] then we can all go in a circle while debugging! [23:07:40] :)) [23:07:51] aiight :) i'm done for the day. i've been at this since 6a PST :) [23:08:02] ping me if y'all need anything [23:21:08] drdee: splitting counting requests between /wiki/ and /w/api.php raises new questions [23:21:43] drdee: what will the new delta be now ? will we have two deltas or just one as before ? [23:31:06] just one
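A postscript on diegolo_'s clickstream question: a "<Leonardo da Vinci> <Michelangelo> 50" style count can in principle be derived from the request logs by pairing each pageview's referer with the requested page. A minimal sketch, assuming tab-separated log lines and taking the URL and referer column indices as arguments, since the actual field layout isn't given here:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class ClickPairs {
        // pull an article title out of a /wiki/ URL; null for anything else
        static String title(String url) {
            int i = url.indexOf("/wiki/");
            return i < 0 ? null : url.substring(i + 6);
        }

        public static void main(String[] args) throws Exception {
            int urlCol = Integer.parseInt(args[1]);  // column of the requested URL
            int refCol = Integer.parseInt(args[2]);  // column of the referer
            Map<String, Long> counts = new HashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("\t");
                    if (f.length <= Math.max(urlCol, refCol)) continue;
                    String to = title(f[urlCol]);
                    String from = title(f[refCol]);
                    if (to != null && from != null)
                        counts.merge(from + "\t" + to, 1L, Long::sum);
                }
            }
            // prints lines like: Leonardo_da_Vinci<TAB>Michelangelo<TAB>50
            counts.forEach((pair, n) -> System.out.println(pair + "\t" + n));
        }
    }

Privacy caveats aside (referer paths can be sensitive), this is the shape of the funnel analysis milimetric refers to above.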