[13:19:24] mornin [13:59:18] foods! [13:59:22] i'll be back in an hour or so. [14:30:23] drdee: reports reached 25 December 2012 [14:30:35] drdee: morning btw :) [14:30:44] great, did you see my comments btw? [14:30:49] on the e-mail ? [14:30:52] good morning to you as well! [14:30:54] pm [14:30:57] oh on IRC [14:31:00] yes I saw the comments [14:37:12] drdee: so Uganda is off [14:37:26] by 50% [14:37:28] or 100% [14:37:41] mmmmmmm [14:38:01] but the grand total is about the same [14:38:34] which means something happened and some pageviews got shifted somewhere else [14:48:36] ottomata, morning [14:48:53] mooorning [14:48:53] were there any changes to the uganda zero filter in november last year? [14:50:11] not in git log [14:53:06] mmmm [14:53:33] ok, stefan and i will dive into it [14:53:35] so about kafka [14:53:39] i built it last night [14:53:45] mornin [14:53:50] trying to figure out how to publish it to our nexus repo [14:53:51] (i still need a coffeeeee) [14:53:56] buuuuut [14:54:05] there is a hadoop consumer included when you build kafka [14:54:06] buuuut [14:54:10] did you know that? [14:54:23] yeah [14:54:30] i recall it had limitations we weren't a fan of [14:54:39] unless this is a new thing [14:54:47] they've been moving pretty fast with what's included [14:54:50] but maybe it doesn't have the packet loss issue :) [14:55:07] which is a good feature to have :D [14:55:24] dschoon, are you familiar with sbt? [14:55:35] it's a scala wrapper around maven [14:55:37] otherwise, not really [14:55:46] i figured that part :) [14:55:46] you have a jar to upload, yes? [14:55:50] plenty [14:55:52] you need to log in as admin [14:55:56] i know [14:55:59] i did [14:56:01] kk [14:56:03] this is the thing [14:56:20] these are snapshot jars and those you can't upload manually [14:56:26] Left side: Repositories [14:56:30] only release jars can be uploaded manually [14:56:51] so trying to publish from within sbt [14:56:52] Then select "3rd Party" and switch to the "Upload Artifact" tab at the bottom [14:57:00] no that doesn't work [14:57:02] i tried [14:57:05] i just explained why not [14:57:29] but i have to figure out how to tell sbt to use our nexus repo [14:57:50] you try renaming the jar and changing the meta in the pom? [14:57:59] well renaming is a bad idea [14:58:00] but hold on [14:58:06] drdee, yes yes [14:58:09] it does not use zookeeper [14:58:10] let me build 0.7.3 instead of 0.8.0 snapshot [14:58:22] i'll do the same. [14:58:39] is 0.7.3 out? [14:58:49] http://kafka.apache.org/downloads.html doesn't have it listed [14:59:12] we have 0.7.2 installed [14:59:29] mk. [15:01:04] but i think it would not be that hard to build zk support into the included kafka hadoop consumer [15:01:07] maybe it would work better [15:04:44] Can one consumer meet our need until we build zk support into the new one or patch the current consumer? [15:05:42] i think we don't know [15:05:54] we don't know why the current one is not importing everything [15:06:09] and i'm still skeptical about this udp2log -> KafkaProducer shell bit for larger data streams [15:06:31] I started collecting mobile logs last night on an01 [15:06:35] i'm going to see how we are doing there [15:06:42] find out if and where logs are being lost [15:07:10] i agree that the shell component is a pretty big concern.
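For context on that udp2log -> KafkaProducer "shell bit": udp2log fans each log line out to pipe destinations listed in its config file, so the Kafka leg is only as reliable as whatever sits on the other end of the pipe. A minimal sketch of such a config line, assuming a Kafka 0.7-style console producer; the path, zookeeper address, and topic name are illustrative, not taken from the channel:

    # hypothetical udp2log config: factor 1 means every line goes to the pipe;
    # the producer command and its flags below are illustrative only
    pipe 1 /opt/kafka/bin/kafka-console-producer.sh --zookeeper an01:2181 --topic webrequest-mobile

If the producer blocks or crashes, lines back up or vanish in the pipe, which is one way the loss being sleuthed here could occur.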
[15:14:19] ok, kafka 0.7.2 is uploaded to nexus in the 3rd party repo [15:14:27] groupid is org.apache.kafka [15:14:36] artifactid is kafka [15:14:40] version is 0.72 [15:14:47] version is 0.7.2 [15:14:47] i also added a thirdparty-snapshots repo [15:14:51] and a group for all the snapshots [15:14:53] yup saw that [15:14:54] thx [15:15:00] i will push 0.8 snapshot there [15:17:27] i'm going to go grab a coffee while scala does its thing [15:20:29] hey guys: [15:20:29] https://www.mediawiki.org/wiki/Analytics/Kraken/Firehose [15:20:40] awesoooome [15:21:06] AWESOME INDEED [15:21:07] there are a few other possible pipelines that I didn't list (mainly ones that have different kafka hadoop consumers) [15:21:17] but, i figured there are so many different ways we can and might try [15:21:21] it'll be good to right it down [15:21:24] write* [15:21:31] mmm coffeeee [15:22:01] coffee indeed [15:23:17] ditto on awesome indeed [15:23:58] well, my awesome comment was meant for kafka in maven [15:24:00] this is better than the diagram I was talking about 'cause it's much easier to edit [15:24:02] oh [15:24:07] but yeah [15:24:18] and also we can add our findings of shortcomings to each approach [15:24:23] well that's great too :) [15:25:06] ok, i'm going to sleuth the mobile data now, let's see what I find.... [15:25:47] drdee I tried changing the pom to point to org.apache.kafka, is there something else I need to do to get the Consumer to build? [15:25:56] it doesn't look like it's recognizing that groupid either [15:26:27] 1 sec [15:30:55] milimetric, so i am able to grab the kafka dependency from nexus [15:31:06] use this [15:31:06] <dependency> [15:31:07] <groupId>org.apache.kafka</groupId> [15:31:08] <artifactId>kafka</artifactId> [15:31:09] <version>0.7.2</version> [15:31:10] </dependency> [15:31:10] BUUUUUT [15:31:28] go to https://github.com/wmf-analytics/kraken/tree/master/maven [15:31:45] look at example.settings.xml and save it as ~/.m2/settings.xml [15:31:57] then it will grab kafka [15:31:58] i don't really know what you guys are doing [15:31:58] but [15:32:06] here are some changes I had to make to pom.xml to get things to work [15:32:07] https://gist.github.com/4608193 [15:32:08] but now it doesn't grab hadoop so let me fix that [15:32:08] cool, thanks drdee, trying [15:32:20] I think we need to grab some of those ZK deps and put them in our repo as well [15:32:35] ZK should already be in nexus [15:32:41] yeahhhhh but which version? [15:32:51] whatever version you need [15:33:06] as specified by the pom [15:33:23] yeah but i think kafka-hadoop-consumer is built with a different zk client or something [15:33:26] https://github.com/sgroschupf/zkclient [15:33:28] http://mvnrepository.com/artifact/com.github.sgroschupf/zkclient/0.1 [15:34:27] btw, milimetric, i pushed a change to the moreLogging branch, just made the LOG.error messages a wee more descriptive [15:34:32] back [15:35:24] ottomata, did you get that into nexus already? [15:35:44] eh? [15:35:51] my change? [15:35:58] (i have done nothing with nexus) [15:36:03] okay. [15:36:07] do you want to learn how? [15:36:09] yes [15:36:30] sweet [15:36:38] very much so [15:36:40] it's gotten a ton better [15:36:41] http://nexus.wmflabs.org/ [15:36:56] i feel we are replicating efforts right now [15:36:57] pretty straight-forward, other than... getting the goddamn thing to bind to the right port [15:37:06] tsk. we are *sharing knowledge* [15:37:10] should I read some stuff about what nexus and sonatype are first?
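A note on the "tell sbt to use our nexus repo" problem above: in an sbt build definition, publishing is redirected by overriding publishTo. A minimal sketch, assuming a recent sbt and the thirdparty-snapshots repo mentioned in the channel; the resolver id and credentials path are illustrative:

    // hypothetical sbt settings: publish artifacts to the WMF Nexus snapshots repo
    publishTo := Some("wmf-thirdparty-snapshots" at
      "http://nexus.wmflabs.org/nexus/content/repositories/thirdparty-snapshots/")
    // keep the Nexus admin login out of the build file
    credentials += Credentials(Path.userHome / ".ivy2" / ".credentials")

Snapshot jars then go through sbt publish rather than the manual "Upload Artifact" tab, which (as noted above) only accepts releases.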
[15:37:11] hehe [15:37:13] nah [15:37:14] k [15:37:38] the thing we're doing right now is uploading missing deps [15:37:49] so when we try to build, maven can find them in our repo [15:37:57] oh you do it through web gui? [15:37:57] yep. [15:37:58] ok cool [15:38:01] .jar files usually? [15:38:03] thankfully, no rsyncing or xml any more. [15:38:10] yes, the jar/war [15:38:12] ok [15:38:21] click on Repositories at the left [15:38:26] and this can also proxy/cache against other repos? [15:38:34] like cdh / kafka or whatever? [15:38:37] yep. [15:38:45] ok repositories... [15:38:48] in the case of kafka, it's not in maven central. [15:38:50] (sadly) [15:39:20] aye ok [15:39:29] so we put it in ours [15:39:30] aye [15:39:38] *nod* [15:40:29] i see kafka/hadoop-consumer_2.8.0 in there [15:40:53] i put it there :) [15:41:11] aye, it compiled? [15:41:19] yes [15:41:28] as long as you save https://raw.github.com/wmf-analytics/kraken/master/maven/example.settings.xml to your ~/.m2/ [15:41:33] as settings.xml [15:41:48] alright. what next? [15:42:00] well building this consumer and putting it back through the wringer [15:42:45] so [15:42:48] (ottomata, I think the logger would've given you that extra context from the throwable it takes in (stack trace, etc.) but doesn't hurt to make it better) [15:42:55] yeah [15:42:56] i know [15:43:15] buuuuuuut i like being able to grep the code for messages I see in logging output [15:43:25] i know stacktraces give lines [15:43:25] but more info the better [15:43:25] yep [15:43:32] so, if you guys got it all to compile [15:43:35] i can easily build deb and reinstall it [15:43:39] on an01 [15:43:44] actually [15:43:46] this is all an02 [15:43:48] so it will affect all imports [15:43:50] but yeah [15:43:55] so, what do I do? [15:43:58] cool, you should just be able to pull, save that settings file, and it'll compile in IntelliJ (don't know about eclipse) [15:44:06] uhhhhhhhhhhh [15:44:08] IntelliJ eh? [15:44:14] it's free now! [15:44:15] hehe [15:44:17] i know i have it [15:44:20] i did not know. <3 [15:44:26] every time I open up those IDEs I lose hours of time though :p [15:44:27] free as in speech [15:44:30] can I just clone on an01 and build? [15:44:30] haha [15:44:32] it's true. [15:44:32] and then build deb? [15:44:35] probably? [15:44:35] sure [15:44:38] give it a shot! [15:44:46] that settings file would still be needed [15:44:52] k [15:45:29] it still goes in ~/.m2/settings.xml [15:45:47] pretty much everything respects it. they all use the same libs. [15:46:06] (though eclipse stores its local repo somewhere other than ~/.m2, barf) [15:46:26] okay, another option [15:47:04] is https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka -> ?
[15:47:32] which eliminates the need for a cron or whatever [15:47:47] i'm looking for a bolt that writes to hdfs [15:47:53] because surely this is the most common thing on the planet [15:48:06] (just fyi, https://github.com/nathanmarz/storm-contrib ) [15:48:10] [ERROR] Failed to execute goal on project hadoop_consumer: Could not resolve dependencies for project kafka.consumer:hadoop_consumer:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-common:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.2: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4 [15:48:29] yeah, that is a possible pipeline too [15:49:39] [WARNING] The POM for org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.2 is missing, no dependency information available [15:50:00] i think drdee was adding those to our repo? [15:50:03] this was with mvn compile right? [15:50:12] tho we're proxying cloudera, sooo... [15:50:19] the issue is with settings.xml [15:50:26] mvn package [15:50:27] but yeah i guess [15:50:29] the mirrorOf value is incorrect [15:50:30] fi [15:50:34] fixing it now [15:50:35] k [15:57:02] ah-ha! [15:57:03] https://github.com/nathanmarz/storm-contrib/tree/master/storm-state [15:57:45] hm, drdee and ottomata - I did a clean clone of that repo, checked out moreLogging and mvn package works ok [15:57:48] unintuitively named, but that's a bolt which provides HDFS-backed collections [15:58:02] I copied my settings.xml directly from that example above [15:58:22] awesome [15:58:30] my mirrorOf is * [15:58:46] milimetric, if ottomata still has problems, try mv ~/.m2/{,_}repository and then build again [15:59:12] ~/.m2/repository is where maven puts the downloaded jars and metadata [15:59:22] deleting or moving it forces it to redownload everything [15:59:22] sounds very similar to the way the flume hdfs sink works [15:59:28] dschoon [15:59:54] in some cases, it can cache resource misses, which is dumb [15:59:56] *nod* [16:00:25] flume probably has better configuration, though :) [16:00:59] should we all join the hangout? [16:01:28] k, renamed repository [16:01:32] it re-downloaded tons of stuff [16:01:32] but i got [16:01:33] [ERROR] Failed to execute goal on project hadoop_consumer: Could not resolve dependencies for project kafka.consumer:hadoop_consumer:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-common:jar:2.0.0-cdh4.1.2, org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.2: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4 [16:01:51] you run clean first? [16:01:55] no [16:01:55] ok [16:02:01] it's the same settings issue [16:02:15] what do I need to change? [16:02:15] all requests are redirected to the public repo [16:02:16] that [16:02:17] 's [16:02:19] incorrect [16:02:25] right. [16:02:30] cdh4 requests should go to the cdh4 proxy [16:02:39] i don't know how to do that in settings.xml [16:02:48] that's where it needs to be fixed [16:02:52] http://nexus.wmflabs.org/nexus/content/groups/public/org/apache/hadoop/hadoop-client/ [16:03:00] wait wait [16:03:03] hold up. [16:03:09] nobody change their settings.xml yet [16:03:14] because that is what groups are for! [16:03:54] omg I think it's downloading the internet [16:04:13] that is precisely what maven does.
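For reference, a minimal sketch of the ~/.m2/settings.xml mirror being discussed, with mirrorOf * routing every artifact request through the Nexus public group; the id and name are illustrative. With this setup, the hadoop-client failures above can only be fixed server-side, by making the cdh4 proxy a member of the public group:

    <settings>
      <mirrors>
        <mirror>
          <!-- send all artifact requests through the Nexus "public" group -->
          <id>wmf-nexus-public</id>
          <name>WMF Analytics Nexus public group</name>
          <url>http://nexus.wmflabs.org/nexus/content/groups/public/</url>
          <mirrorOf>*</mirrorOf>
        </mirror>
      </mirrors>
    </settings>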
[16:04:22] :D [16:04:33] $ mvn package && sudo make sandwich [16:04:39] yo [16:04:40] otto [16:04:46] ottomata, try the build again [16:07:06] same deal [16:07:09] do I need to change something? [16:07:31] let me look. [16:09:06] good news for mobile sleuthing: there are only 4 hosts that handle most of the requests [16:10:00] oh, hot. [16:10:37] how's the noise in ops, btw? it seems everything is going well with the dc migration [16:10:57] ok so I tried wiping out .m2 and doing mvn package [16:11:01] it gave me errors [16:11:16] er [16:11:16] I restored my old .m2 and all is ok again [16:11:27] you should just move ~/.m2/repository [16:11:33] that way it keeps your settings.xml file [16:11:40] right, i tried all three ways [16:11:45] reeeeally. [16:11:49] sigh, maven! [16:11:51] remove .m2, remove just .m2/repository [16:12:10] without the settings it doesn't find the kafka jar, obviously [16:12:20] (this is why someone always martyrs himself on the cross of maven, so others need only copy some files into ~ and forget) [16:12:22] with the settings, it fails too [16:12:29] same error as ottomata? [16:12:31] no [16:13:02] sorry, it spit the internet back on my standard output so I can't find the error [16:13:20] it was something about a shaded jar [16:14:02] but what does this mean? That I downloaded the dependency from the funnel or kraken project? [16:14:20] and even though the pom for the consumer is busted, it finds the old download? [16:14:23] what are you trying to build? [16:14:35] and what directory are you in? [16:14:40] https://github.com/wmf-analytics/kafka-hadoop-consumer [16:14:53] git clone that [16:14:56] mvn package [16:15:33] with different .m2 setups (blank; just .m2/settings.xml; my old .m2/*) [16:16:40] drdee: [WARNING] The POM for kafka:kafka:jar:0.7.2 is missing, no dependency information available [16:16:45] did you remove it? [16:16:59] yeah, not your fault, milimetric [16:17:02] no, but now you are not using the settings.xml file [16:17:03] things are misconfigured [16:17:15] so it's not using nexus [16:17:30] I'm not sure who you guys are talking to :) [16:17:33] but I'm using settings.xml [16:17:40] downloaded exactly from here: [16:17:52] https://raw.github.com/wmf-analytics/kraken/master/maven/example.settings.xml [16:18:05] and you can download kafka right? [16:18:25] if I save that settings file into my old .m2, I can clone the consumer and mvn package works [16:18:37] without the settings file, it doesn't find kafka [16:18:42] exactly [16:18:46] without the old .m2 it gives me some weird error about shaded jars [16:18:51] so just give the jar to ottomata [16:18:56] k [16:19:00] the mirroring is not functioning 100% [16:19:08] i need to read the docs more carefully [16:19:15] k [16:19:25] ottomata: how would you like this beautiful jar delivered? [16:20:14] btw, drdee, I get a trillion of these when doing mvn package: "[WARNING] We have a duplicate com/google/inject/servlet/UriPatternType.class in /home/dan/.m2/repository/com/google/inject/extensions/guice-servlet/3.0/guice-servlet-3.0.jar" [16:20:20] milimetric [16:20:25] where are you running? [16:20:29] i need .deb built with .jar [16:20:34] my laptop [16:20:41] there is a script in the kafka-hadoop-consumer to build it [16:20:48] is your laptop ubuntu? [16:20:49] oh cool [16:20:54] milimetric, mmmmmmm [16:21:01] maybe mvn clean first?
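On those duplicate-class warnings: they typically appear when a build assembles a shaded (fat) jar and two dependencies ship the same class, which matches the "something about a shaded jar" error above. A hedged sketch, assuming the project uses maven-shade-plugin (not confirmed in the channel), of a filter that drops one copy of the duplicated guice-servlet classes:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <filters>
          <!-- illustrative: keep only one copy of the duplicated servlet classes,
               assuming another jar on the classpath also provides them -->
          <filter>
            <artifact>com.google.inject.extensions:guice-servlet</artifact>
            <excludes>
              <exclude>com/google/inject/servlet/**</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </plugin>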
[16:21:21] if so, hopefully [16:21:31] hmm [16:21:34] hmm [16:21:57] yes, ubuntu [16:22:01] running the build deb thing now [16:22:28] i'm afraid to run mvn clean [16:22:31] drdee ^ [16:22:46] don't be afraid luke [16:22:47] because maybe it'll wipe out the magic sauce that's letting me build with a misconfigured pom [16:22:52] okay. [16:23:11] i just dialed the cache settings for Central way down [16:23:20] because before it was caching 404s and metadata for 24h [16:23:27] i'm gonna go ahead and disagree Anakin. Because last time you told me to restart reportcard2, I wasn't afraid [16:23:36] milimetric, i think the build script tries to apt-get install stuff [16:23:37] like kafka [16:23:42] which you probably don't need with maven doing it now [16:23:48] yep, it's doing that :) [16:23:54] i'm also going to purge it, so the next builds y'all run which need to download deps will trigger new proxy reqs [16:24:02] it's downloading the internet again :) I now have 3 copies [16:24:12] ok [16:24:18] dschoon, let me know if/when I should try building on an01 again [16:26:59] dschoon && ottomata - the deb script failed because I think it's trying to use a different .m2 directory (not my /home/dan/ one) [16:27:19] oh, that is possible [16:27:19] ah yes [16:27:24] if it's in the pom [16:27:24] "I am such a maven noob" [16:27:26] above this line: [16:27:28] it can be overwritten [16:27:35] # I am such a maven noob: [16:27:35] cd /root/.m2/repository/asm/asm/3.1 && rm asm-3.1.jar && wget http://repo1.maven.org/maven2/asm/asm/3.1/asm-3.1.jar [16:27:35] cd $origdir [16:27:57] Ahhh, I remember that crap [16:28:19] i couldn't get mvn to find that (believe me this is hacky as poooooOOOp) [16:28:23] take it out [16:28:31] ahh. [16:28:47] see, i bet that was a 404 that got cached for 24h [16:28:51] i've dialed all that back now [16:28:53] try building again [16:29:08] ok, but btw, that is only a step in the build deb script [16:29:29] lol [16:29:32] wait, [16:29:37] we only have apache.org snapshots? [16:29:39] not releases? [16:29:59] correct me if I'm wrong, but maybe I should just erase lines 1-15, right? So start here: https://github.com/wmf-analytics/kafka-hadoop-consumer/blob/master/build_deb_package.sh#L16 [16:30:06] we should be using the cdh4 repo [16:30:10] yeah [16:30:15] def [16:30:21] we are. [16:30:27] well, keep origdir [16:30:30] cdh4 wasn't originally in the public group [16:30:32] i have fixed that [16:30:38] ah! [16:30:38] it should now be proxied with the original settings conf [16:30:40] dschoon new error now [16:30:45] exciting! [16:31:06] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hadoop_consumer: Compilation failure [16:31:06] [ERROR] /home/otto/kafka-hadoop-consumer/src/main/java/kafka/consumer/KafkaContext.java:[202,31] cannot find symbol [16:31:06] [ERROR] symbol : method warning(java.lang.String,java.lang.InterruptedException) [16:31:06] [ERROR] location: interface org.slf4j.Logger [16:31:18] oh! [16:31:22] maybe that was one of my changes! [16:31:48] yup I added that [16:31:53] what no warning() method?! [16:32:14] hehe [16:32:18] ah warn() [16:33:02] heh. [16:33:02] yay compiled!!!!! [16:33:06] perhaps i was a bit aggressive. [16:33:13] as everything is incredibly slow atm. [16:33:49] you fuckers, nexus is using 94% cpu on kripke. [16:34:05] ok ottomata [16:34:12] i got a deb [16:34:30] yeehawww, i think i'm about to as well [16:34:37] oh good [16:34:53] i'll push my script change [16:34:59] woo.
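The compile failure above is just the slf4j API: org.slf4j.Logger has warn(String, Throwable) but no warning(...). A minimal sketch of the corrected logging call; the class and message below are illustrative, not the actual KafkaContext code:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class Example {
        private static final Logger LOG = LoggerFactory.getLogger(Example.class);

        void waitABit() {
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                // warn(), not warning(): logs the message plus the full stack trace
                LOG.warn("interrupted while waiting", e);
            }
        }
    }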
[16:35:20] i just pushed a change too :p [16:35:20] fwiw, i think the problem was that we weren't proxying Apache [16:37:41] ok, so you can build too ottomata? [16:37:48] if not I can just get you this deb [16:38:04] yup! [16:38:12] i got it [16:38:34] cool beans. So let's regroup after you get this into testing [16:38:45] and groupthink the next step [16:43:19] okay, i believe i have all the repos in nexus set up right [16:43:29] (with non-degenerate cache settings) [16:45:01] note that you can use nexus to look up the full coordinates of artifacts [16:45:02] http://nexus.wmflabs.org/nexus/index.html#nexus-search;quick~kafka [16:45:23] i'll test out your beliefs by bashing my .m2/repo again [16:46:49] COMMENCE DOWNLOADING THE INTERNET [16:48:37] mk, it must've cached the internet somewhere close by because it was faster that time [16:48:39] and it worked [16:48:44] gj dschoon [16:48:51] woo. [16:48:56] yeah, i upped the cache times [16:49:14] i was ok with maven until you said that exact statement [16:49:19] now i hate it [16:49:44] but i'm glad you're the martyr :) [16:49:51] * milimetric scurries back to limn [16:51:08] oh, drdee - re: duplicate class warnings [16:51:13] I did mvn clean and they still show up [16:51:18] three places: [16:51:29] org/apache/commons/collections/* [16:51:33] org.slf4j.* [16:51:40] org/apache/commons/beanutils/* [16:52:02] javax/servlet/* [16:52:06] and I lied, fourth place: [16:52:17] org/apache/jasper/compiler/Localizer.class [16:53:19] weird. [16:53:31] the servlet api shouldn't have any classes. just interfaces. [16:54:44] heh [16:54:52] nexus is actually a pretty fantastic piece of software [16:54:53] example: [16:54:54] http://nexus.wmflabs.org/nexus/service/local/feeds [16:55:28] that's a registry of data feeds reflecting various changes in the server [16:55:49] new/cached/invalidated artifacts, releases, snapshots [17:36:51] restarting nexus [17:36:53] (done) [18:00:45] interesting! [18:00:47] check this out boys [18:00:48] https://gist.github.com/4611076 [18:01:17] hm, can't get into the standup drdee [18:01:19] i should go and triple check my work [18:01:20] oh its time [18:02:01] yeah its not working for me either i think [18:02:08] i think server problems [18:02:11] skype? [18:02:14] interesting. [18:02:20] ya [18:02:48] ok [18:02:52] same -- problems. [18:02:55] --> skype! [18:04:02] it may be possible to create a new hangout that works [18:04:09] (not tied to a cal entry) [18:04:17] that worked for me this morning [18:04:50] we are trying to get you on skype [18:04:53] robla ^^ [18:23:08] hi everybody [18:24:45] I can't make it to this morning's hangout but I wanted to schedule a chat to review the limn changes you guys presented last week [18:25:08] milimetric, drdee: ok if I book you for an hour tomorrow? [18:25:15] I gotta say this is one siren of a problem. 14.6*% packet loss. What the heck?! [18:25:24] hi DarTar - sure [18:25:52] sure [18:26:10] DarTar: loop me into that as well, if you would. [18:26:18] sure [18:26:29] tyty [18:26:57] btw dschoon, I think you have unpushed reportcard-data changes [18:27:04] possibly! [18:27:06] i will check. [18:27:11] what's the symptom? [18:40:08] ottomata, milimetric, drdee -- i added diff and % comparisons to that gist. https://gist.github.com/4611085 [18:40:30] ottomata: where are the result files? [18:40:32] an02? [18:40:43] sorry to interrupt [18:40:55] but could you please tell me what this packet loss problem is ? [18:40:59] is it the same as with the udp-filter ? [18:41:02] but now on kraken ?
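Background on how the udp2log -> kafka -> hdfs discrepancy can be measured (the "counting seqs" approach mentioned further down): every udp2log line carries a per-host sequence number, so loss between two collection points shows up as gaps. A minimal sketch, assuming a tab-separated log with the hostname in the first column and the sequence number in the second (the field layout is an assumption):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class SeqLoss {
        static final class Counter { long first = -1, last, received; }

        public static void main(String[] args) throws Exception {
            Map<String, Counter> hosts = new HashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("\t");
                    Counter c = hosts.computeIfAbsent(f[0], k -> new Counter());
                    long seq = Long.parseLong(f[1]);
                    if (c.first < 0) c.first = seq;
                    c.last = seq;
                    c.received++;
                }
            }
            for (Map.Entry<String, Counter> e : hosts.entrySet()) {
                Counter c = e.getValue();
                long expected = c.last - c.first + 1; // assumes sequence numbers only increase
                System.out.printf("%s: expected=%d received=%d loss=%.2f%%%n",
                    e.getKey(), expected, c.received,
                    100.0 * (expected - c.received) / expected);
            }
        }
    }

Running the same count over the udp2log output, the kafka import, and the hdfs copy localizes which hop is dropping lines.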
[18:41:14] * average_drifter is just curious [18:41:31] no, probably unrelated. [18:41:38] /a/otto/mobile-sleuth [18:42:17] we're seeing a discrepancy in the number of messages received by: udp2log, then imported by kafka, then pushed into hdfs [18:42:27] you can see the diff in that link i just pasted, average_drifter [18:45:27] heh [18:45:35] i just came across this, ottomata, drdee, milimetric: http://bb10.com/java-clojure-storm/2012-06/msg00129.html [18:46:08] Storm + Kafka is a very effective log processing solution. A number of users of Storm use this combination, including us at Twitter in a few instances. Kafka gives you a high throughput, reliable way to persist/replay log messages, and Storm gives you the ability to process those messages in arbitrarily complex ways. [18:46:08] - nathan marz [18:48:16] yeah so we're definitely missing something obvious [18:48:31] I'm reading the mailing list now. [18:48:31] and to me, the Consumer seems most likely [19:01:56] I'm one step away from fixing dclass [19:02:17] problem at hand : nm is saying dclass_load_file is "U" in the 2nd column [19:02:24] that means that symbol is undefined [19:03:41] -no-undefined seems to not do it [19:04:07] this is JNI stuff, right? i don't know much about it :( [19:04:49] dschoon: yep JNI [19:04:58] dschoon: this is actually a gnu toolchain problem [19:05:02] so not JNI specific [19:05:27] :/ [19:05:33] not my area of expertise, sadly [19:05:37] ok [19:05:56] ottomata, milimetric, drdee [19:06:00] i have a suspicion [19:06:05] don't type my screen name! [19:06:08] i am trying to go eaaaat [19:06:18] we modified kafka-hadoop-consumer to build with CDH4 [19:06:49] but (speaking for myself) we don't have really comprehensive knowledge of what changed in those APIs between major versions [19:06:51] so it builds. [19:06:51] let otto eat, tell us the tantalizing stuff after :) [19:06:55] haha [19:07:05] but it's still possible the semantics of the APIs changed [19:07:13] yeahhhhhHHHhhhhhhh maybe, the APIs are supposed to be backwards compatible though [19:07:15] maybe [19:07:16] things like timeouts, buffer-waits, etc [19:07:44] ok be back in a bit [19:07:48] how the parts use zk, how frequently blocks are flushed to NN... [19:08:12] well FINE [19:08:17] then i'm afk food, TOO [19:32:58] I hate autotools [19:33:00] I hate it [19:33:09] I swear I could make a build system from scratch [19:33:25] but you are not gonna do it :D [19:33:40] yes [19:34:08] I wonder if looking into other build systems like cmake or some other stuff is worth it [19:34:39] I guess it doesn't make sense in terms of time [19:34:43] I mean it all boils down to time [19:35:31] there is an autotools book out there, first chance I get I should read it [19:36:17] my good deed for the day: [19:36:18] https://gerrit.wikimedia.org/r/#/c/45353/ [19:36:21] erosen ^^ [19:36:44] nice [19:37:04] not sure i know what MCC-MNC means [19:37:06] but i'm googling [19:37:29] nice [19:37:31] internationally agreed mobile carrier codes [19:37:36] seems like a good idea [19:37:42] does anyone at wmf use them [19:37:48] we now do [19:37:49] i mean clearly they should [19:37:51] gotcha [19:37:52] this was an ops idea [19:37:56] i just patched it [19:37:58] seems like a good idea to me [19:38:16] now you need a dict mcc-mnc -> full carrier name [19:38:21] and done [19:38:27] ya [19:38:28] check also http://www.mcc-mnc.com/ [19:38:29] not so bad [19:48:30] ok I got a bunch of questions [19:48:39] drdee: do we really need libdclass.so or just libdclassjni.so ?
[19:48:47] so, I solved the problem [19:48:54] but I have a question [19:49:08] whatever works :) [19:49:26] the problem was that I wrestled with autotools to convince it that I want dclass_load_file and 2 other functions to have defined symbols [19:49:29] in libdclassjni.so [19:49:48] but because we were linking dynamically with libdclass.so [19:49:50] it seems that those symbols were undefined [19:50:16] and I think that makes sense since that's what a dynamic library is supposed to be, I mean you link to it, and the symbols are defined in the dynamic library itself, not in the library that links against it [19:50:37] so if A.so has functions A_f and A_g [19:50:45] and you have B.so which uses A_f and A_g [19:50:52] they will appear as undefined in B.so [19:50:57] and JNI doesn't like that [19:51:01] JNI wants them to be defined [19:51:14] so we eliminate A.so and just have B.so which has everything in it [19:51:19] that's my current solution [19:51:52] ok [19:52:00] cool [20:02:30] drdee: done [20:02:34] drdee: please try it out [20:02:42] pushed? [20:02:45] drdee: cd kraken; git pull origin master [20:02:52] drdee: cd libdclass; git pull origin package [20:03:17] drdee: cd libdclass; make clean; rm -rf debian/libdclass-dev .libs Makefile configure debian/libdclass-dev*; autoconf; automake --add-missing; dpkg-buildpackage -us -uc ; sudo dpkg -i ../libdclass-dev_2.0.12_amd64.deb [20:03:40] aight [20:03:58] country reports are on the last day [20:04:03] they should be ready soon [20:29:47] new country reports for December [20:29:48] http://stat1.wikimedia.org/spetrea/wikistats/mobile_countryreports_devices_deploy/r3/2012-12/SquidReportCountryData.htm [20:30:24] great! [20:30:40] and for November (the ones which have been available since a few days ago) http://stat1.wikimedia.org/spetrea/wikistats/mobile_countryreports_devices_deploy/r2/2012-11/SquidReportCountryData.htm [20:35:02] soooooo average_drifter did something pretty cool! he wrote the JNI wrapper for the C library dClass (to identify devices based on user agent string), he created a Debian package and it compiles and runs on both Ubuntu and OSX [20:35:13] very cool! [20:39:16] totally awesome! [20:52:17] brb a few [20:54:50] AH [20:54:54] my script TOTALLY had an error in it [20:57:32] hmmm maybe it did [20:57:33] hmm [20:58:44] nope I take it back [21:00:50] https://gist.github.com/4613161 [21:01:55] dschoon, can I run over what I did here to be extra sure? [21:01:58] oh you are brb [21:02:01] milimetric? [21:03:15] sorry, drdee call [21:06:48] that's only 5% loss during the times I checked [21:09:25] so that's back to the emery/locke levels? [21:11:48] i guess so, i dunno [21:12:22] hi! all yours ottomata [21:12:44] back in one moment [21:13:30] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90?authuser=1 [21:13:50] can't we chop the problem into even smaller components, and rule out alternative explanations? measure packet loss when it enters the an* machines, packet loss when it enters kafka whatever, packet loss when it leaves kafka, packet loss when it's about to go into hdfs, packet loss when it's in hdfs [21:15:03] drdee: that's basically what we're doing by counting seqs [21:15:38] so far we are comparing entire flows against each other [21:24:35] guys do you think it would be possible to publish, along with the clicks, the paths users take across the wikipedia pages?
it would be really interesting [21:25:14] I mean something like "<Leonardo da Vinci> <Michelangelo> 50" [21:25:41] meaning that 50 people clicked on Michelangelo from the Leonardo da Vinci page [22:09:36] no, i'm comparing the collected data at different points in the flow [22:19:38] you just wrote a pig script to use dClass jni stuff [22:20:23] ottomata, if average_drifter gives you the deb, could you install it on an* and then i can see if it actually really works : :D [22:20:44] pig script, i mean pig udf [22:20:55] yup [22:21:00] totally [22:24:51] ok average_drifter will email you the deb [22:28:18] drdee , ottomata .deb available on stat1 in /home/spetrea/releases/libdclass-dev [22:28:33] awesome! [22:34:29] hey folks, do you know if we maintain an archive of pages like: http://stats.wikimedia.org/EN/Sitemap.htm ? [22:35:30] i.e., is there an easy way of getting historical data on a monthly basis for all projects from ez's reports? [22:35:54] give me 1 sec [22:37:56] diegolo_: We don't have that sort of analysis yet, but we're broadly referring to it as funnel analysis. DarTar is doing some work in that and we investigated it a little bit. I personally think it's something that might be more and more requested in the coming months [22:38:49] DarTar: i am afraid not, but check with EZ to be sure [22:39:06] drdee: will do, thx [22:39:07] milimetric, cool, if I can help in some way, plz let me know [22:40:05] sure thing diegolo_, we'll try to keep as open as possible :) [23:03:58] DarTar: i'm certain we don't, sadly. we talked about setting up a blackbox test to scrape the page every few hours to detect when numbers changed too much, but it got deprioritized into oblivion. [23:04:15] ha shame [23:04:39] i initially suggested it'd be easy to build that because, you know, clearly every deploy persists forever, and you just update a symlink, right? [23:04:40] right? [23:04:43] ... [23:04:49] the request came from WSOR's Giovanni and he's going to follow up with EZ [23:05:02] but no. the deploy script explicitly rm -rf's the old content [23:05:03] maybe archive.org can come to the rescue? [23:05:17] for you, possibly! [23:05:27] i needed higher fidelity (on the order of hours, not weeks) [23:05:29] but seriously that's a hack we shouldn't even consider mentioning publicly [23:05:35] i know. [23:05:37] jeez. [23:06:30] meanwhile, speaking of reliability, packetloss is down to -1.5% on some machines! [23:06:59] i am going to start claiming packetloss needs to be expressed using complex numbers [23:07:04] because this is all fucking imaginary. [23:07:20] you can write them in polar form [23:07:30] e^z [23:07:35] then we can all go in a circle while debugging! [23:07:40] :)) [23:07:51] aiight :) i'm done for the day. i've been at this since 6a PST :) [23:08:02] ping me if y'all need anything [23:21:08] drdee: splitting counting requests between /wiki/ and /w/api.php raises new questions [23:21:43] drdee: what will the new delta be now ? will we have two deltas or just one as before ? [23:31:06] just one
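A postscript on diegolo_'s clickstream question: a "<Leonardo da Vinci> <Michelangelo> 50" style count can in principle be derived from the request logs by pairing each pageview's referer with the requested page. A minimal sketch, assuming tab-separated log lines and taking the URL and referer column indices as arguments, since the actual field layout isn't given here:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class ClickPairs {
        // pull an article title out of a /wiki/ URL; null for anything else
        static String title(String url) {
            int i = url.indexOf("/wiki/");
            return i < 0 ? null : url.substring(i + 6);
        }

        public static void main(String[] args) throws Exception {
            int urlCol = Integer.parseInt(args[1]);  // column of the requested URL
            int refCol = Integer.parseInt(args[2]);  // column of the referer
            Map<String, Long> counts = new HashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("\t");
                    if (f.length <= Math.max(urlCol, refCol)) continue;
                    String to = title(f[urlCol]);
                    String from = title(f[refCol]);
                    if (to != null && from != null)
                        counts.merge(from + "\t" + to, 1L, Long::sum);
                }
            }
            // prints lines like: Leonardo_da_Vinci<TAB>Michelangelo<TAB>50
            counts.forEach((pair, n) -> System.out.println(pair + "\t" + n));
        }
    }

Privacy caveats aside (referer paths can be sensitive), this is the shape of the funnel analysis milimetric refers to above.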