[13:44:06] morning fella's [13:47:43] morning drdee [13:50:59] MORNIGN! [13:51:00] so [13:51:07] did you fix the limn link bug? [13:51:24] (i got confused) [13:51:53] qchris…..the bat cave link is broken [13:52:00] this is really really annoying [13:52:10] Hi drdee_ [13:52:16] The one that I sent out today? [13:52:30] yup [13:52:43] (at least for me) [13:53:13] Harrr. That's annoying :-( [13:53:20] So gogle is sabotaging us. [13:54:26] Yes, it now says "Party is over" for me as well. [13:54:49] and soon the party is over for google hangouts in general [13:54:51] Mhmmm. How did you create the previous hangout? [13:55:06] just create it :) [13:55:21] i think at least 1 person needs to stay in the hangout [13:55:29] to make sure it does not expire [13:55:31] ? [13:55:36] :-D [13:55:40] if everybody leaves the link will expire [13:55:46] (hypothesis0 [13:56:12] I tried that before posting the link [13:56:17] I created the hangout. [13:56:23] Went in. [13:56:25] Went out. [13:56:30] Waited for 30 minutes. [13:56:31] Idea for a solution: Using Google Hangout API to automatically, each day create a new url, and have a static URL(for our team) that points to that new dynamic URL. For further information https://developers.google.com/+/hangouts/api/ [13:56:33] Went in again [13:56:46] average: that's not a bad idea [13:56:49] :) [13:56:52] we can do it commanderdata [13:57:02] and ask him to give the link [13:57:09] yes, theoretically it would be possible [13:57:37] milimietric; qchris: ^^ [13:58:16] If a static hangout Url is not possible, that sounds like the next best thing. [14:01:15] is there a python lib for the api? [14:01:35] milimetric: is the limn link bug fixed? [14:01:40] yes [14:01:42] you are really keeping me in suspense [14:01:42] but not deployed [14:01:45] k [14:01:46] because the deployer is dead [14:01:52] file a new bug :D [14:01:56] heh [14:02:04] easy/hard? [14:02:32] no idea [14:02:44] it's some ssh bug [14:02:55] * milimetric knows next to nothing about ssh [14:03:31] guys I'm looking at alternatives but it's crystal clear why Google Hangouts is popular [14:03:32] I haven't had any experience with that before [14:03:44] everything else looks like ... frankly, shit [14:05:24] I feel CommanderData is a cyborg [14:05:35] it's comments are too many times spot on [14:05:59] can ottomata help you? [14:06:04] (with the ssl bug) [14:06:11] i mean ssh [14:06:27] ok i've to pick up a car, should be back in 60 minutes [14:06:42] if somebody finds a google lib for the hangout api please email it to me [14:09:01] actually i don't think we need a lib for this [14:09:02] that hangout api is javascript. You guys are too open source-minded for this problem. It's really just as easy as calling and demanding support [14:09:13] we can just use the requests lib [14:09:17] you do that milmietric :) [14:09:22] see ya in a bit [14:09:26] k, later [14:12:35] and here is an answer. 
this person sets his google calendar to automatically create hangouts for his google calendar events, and then he is able to add attendees to that event [14:12:45] in our case the event is the daily meeting [14:12:53] http://www.riskcompletefailure.com/2012/11/programmatically-scheduling-hangouts.html [14:12:58] https://developers.google.com/google-apps/calendar/v3/reference/events/insert [14:13:27] example section of the last link has Python/Java/PHP/Ruby code [14:14:00] CommanderData can have his own gmail, and he can invite us each day through the Calendar [14:16:01] Stackoverflow says the exact same thing http://stackoverflow.com/questions/18010875/starting-a-hangout-with-a-list-of-users-invited [14:16:19] they consider it a hack because it doesn't specifically belong just the Hangouts API but requires Google Calendar as well [14:16:25] but in the end, it is actually possible to do it [14:18:21] 17:09 < milimetric> that hangout api is javascript. You guys are too open source-minded for this problem. It's really just as easy as calling and demanding support [14:18:41] milimetric: ^^ you were right about that one. but the last two links are a different approach, and they apparently work [14:19:24] i'm confused average, this is definitely a new bug [14:19:35] a bug ? why ? [14:19:35] In the interest of efficiency I only check my email for that on a Friday [14:19:43] because hangouts always worked in the past [14:19:53] and you didn't need to invite people via the calendar to make them work [14:20:04] yes but G always rolls out new versions [14:20:53] this bug of transient hangout urls may actually be a feature in one of their Changelogs [14:21:36] i looked, couldn't find anything [14:21:43] I emailed techsupport, they should get back to us [14:21:53] It seems they indeed updated the Hangout stuff. Today when I created the Bat Cave, some icons were new. And it showed a new "app" as well. [14:23:28] yeah, either we're missing a new feature somewhere, or the google apps administrator has to enable / configure something, or it's a bug [14:23:30] mK},a [14:23:33] :) [14:23:43] qchris has gone full base64 encoded [14:23:47] sorry, wrong window :-) [14:27:59] i'm going to breakfast with a friend, I'll be back later and a little spotty today. If anyone needs me, call my cell (it's on the staff contact page) [14:54:55] milimetric, drdee: Once you come back online, let's try if the hangout at http://goo.gl/1pm5JI works for you. [15:36:56] Snaps_: lemme know if you are around and want to work on that varnishkafka issue [15:37:19] qchris, wanna check out camus/hive stuff? [15:37:29] Sure :-) [15:37:38] great, I *think* we can do this via IRC :p [15:37:50] Ok :-) [15:37:50] lemme just remember what to do [15:37:52] :) [15:38:15] first let's get you up to speed with camus in labs [15:38:26] making varnishkafka requests and consuming from kafka [15:38:32] Ok. [15:39:15] Which machine should I connect to? [15:39:26] kafka-main1 [15:39:34] ottomata: the failover bits? [15:39:37] ya [15:40:05] ottomata: I have that in a working branch, needs some more testing, not going to be able to do anything on it tonight I think... [15:40:24] ok cool [15:40:25] no probs [15:40:35] do you know what was wrong, Snaps_? [15:40:59] qchris: google short url works [15:41:00] ok qchris, let's get you working with kafka there [15:41:11] drdee_: Yippie. 
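The Calendar-based workaround discussed above boils down to one authenticated POST to the Calendar v3 events.insert endpoint; if the Apps domain is set to automatically attach video calls to new events (the trick from the riskcompletefailure.com post), the created event carries that day's hangout link. Below is a minimal sketch using the requests library, as suggested in the chat. The access token, date, and attendee address are placeholders, and obtaining the OAuth2 token for the bot account is not shown.

    import datetime
    import json
    import requests

    ACCESS_TOKEN = "ya29.xxxx"  # placeholder OAuth2 token for the bot's Google account

    start = datetime.datetime(2013, 9, 23, 17, 0)
    event = {
        "summary": "Analytics daily scrum",
        "start": {"dateTime": start.isoformat(), "timeZone": "UTC"},
        "end": {"dateTime": (start + datetime.timedelta(minutes=30)).isoformat(),
                "timeZone": "UTC"},
        "attendees": [{"email": "someone@wikimedia.org"}],  # placeholder attendee
    }

    # events.insert on the bot account's primary calendar
    r = requests.post(
        "https://www.googleapis.com/calendar/v3/calendars/primary/events",
        headers={"Authorization": "Bearer " + ACCESS_TOKEN,
                 "Content-Type": "application/json"},
        data=json.dumps(event),
    )
    r.raise_for_status()
    # hangoutLink is only populated when the domain auto-attaches video calls
    print(r.json().get("hangoutLink"))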
[15:41:14] there are two kafka brokers running: kafka-main1 and kafka-main2 [15:41:20] varnishkafka is running on kafka-main1 [15:41:27] you can curl it and it will produce to those brokers [15:41:38] curl which url? [15:41:41] ottomata: It wont move sent messages (but not acked) from one broker to another. So any messages in-flight during a failover will need the failing broker to come up again before being transmitted. Add retry and timeouts to that and those messages are probably lost. [15:41:42] there's a script in my home dir to do this in a loop: /home/otto/curl_varnish.sh [15:42:05] Ok. I copied that. [15:42:07] iiinteresting [15:42:08] ok cool [15:42:10] yeah so [15:42:18] also, maybe add this to your .bashrc or whatever: [15:42:21] test -f /etc/kafka/server.properties && export ZOOKEEPER_URL=$(grep zookeeper.connect= /etc/kafka/server.properties | cut -d '=' -f 2) [15:42:43] its a shortcut to keep you from having to provide the —zookeeper on kafka commands [15:42:51] if ZOOKEEPER_URL is set [15:42:52] run [15:42:54] Cool. [15:43:09] kafka console-consumer --topic varnish --group qchris0 [15:43:14] group is arbitrary [15:43:26] that's just the id that will be used to save your consumed high water mark [15:43:37] so if you ever want to start from the end of the log, rather than where you last left of [15:43:39] off [15:43:41] just use a new group [15:43:44] so, if you run the consumer [15:43:46] and then curl varnish [15:43:54] you should see output, ja? [15:44:20] yes. [15:44:33] awesome, ok cool [15:44:50] so I usually leave a consumer running while I test things, just so I can make sure the data is at least getting that far [15:44:54] now for camus [15:44:55] uhhhh [15:45:06] Yes, sounds good. [15:45:12] you can run camus from anywhere, but i've been submitting camus jobs to hadoop from the kraken-namenode-standby instance [15:45:36] Ok. I'll prepare ssh setup for that machine. [15:45:40] k [15:46:31] Ok. I am on kraken-namenode-standby. [15:47:07] ok cool, go ahead and clone camus [15:47:09] git@github.com:wikimedia/camus.git [15:47:45] you'll want the wikimedia branch [15:48:53] ah hmm, i need to commit this JsonStringMessageDecoder [15:49:09] I am on branch wikimedia [15:49:24] ok i think this is where I'm going to need help with the proper way to do things, i'm going to go ahead and commit what I have and show you how I do it [15:49:37] Sure. [15:50:17] ok pull again on wikimedia branch [15:50:18] i just pushed [15:50:35] you can see my commit here [15:50:35] https://github.com/wikimedia/camus/commit/cf8e2d9540facced14adbbafefc52660f9271028 [15:50:42] Yes. [15:50:52] Done. [15:50:55] i think i'm doing that wrong, beacuse I've added the gson dependancy in the camus-example.pom [15:51:45] Where do you think it should go? [15:51:57] i'm trying to remember/check if this is true [15:52:08] but the example target is the only one that builds the shaded jar [15:52:15] Yes. It's true :-) You added it to camus-example/pom.xml [15:52:19] so its the only one I could get to actually work [15:52:25] when submitting jobs to hadoop [15:52:29] i'm not sure what is best to do there [15:52:33] Ok. We'll figure that out. [15:52:37] yeah [15:52:38] ottomata: is this bug affecting your tests at this point? [15:52:41] But it allows to build a working jar? [15:52:52] (Last time it did IIRC) [15:52:55] yes……. 
i think so qchris, although it has been about a month since I tried, yeah [15:53:06] Snaps_: ja, i mean [15:53:13] i'm just testing failover [15:53:16] and I see this problem [15:53:27] with a single varnishkafka producer [15:53:35] producing to a 2 broker cluster, [15:53:45] the topic has one partition and 2 replicas [15:53:51] if I shutdown the leader for the toppar [15:54:15] i miss about a seconds worth of data, which on whichever production varnish instance i'm testing on is about 4K messages [15:54:26] mm, ok, gotcha. [15:54:29] is it blocking you? [15:55:04] i mean, right now i'm just testing stuff, so um, no/yes? if you know the problem and are working on it, i will wait for your fix [15:55:13] before I continue testing [15:55:52] okay, cool [15:55:55] I'll let you know [15:55:58] k danke [15:56:36] ottomata: I am running mvn package in the checkout. [15:56:40] ok [15:56:46] ottomata: Or do I need a different directory? [15:56:48] Ok. [15:57:27] naw in the main directory is good [15:57:43] Could not resolve dependencies for project com.linkedin.camus:camus-example:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: com.linkedin.camus:camus-api:jar:0.1.0-SNAPSHOT [15:57:57] Do I need to add wmf nexus? [15:58:04] hm [15:58:14] like in your ~/.m2? [15:58:20] Yes. [15:58:24] yeah i have it [15:58:34] also, I noticed that b uilding in labs is SLLLOWWWWW [15:58:40] i've always built locally and rsynced [15:58:45] but maybe it will work better for you [15:58:51] or maybe if you don't build in your /home dir [15:58:53] because /home is nfs [15:59:23] ok [16:03:16] ok while you're waiting [16:03:23] check out /home/otto/camus.wmf.properties [16:03:35] I just edited it to make it up to dayte [16:03:50] I am having compilation problems locally: [16:04:06] And on kraken-namenode-standby as well. [16:04:10] sup? [16:04:22] Could not resolve dependencies for project com.linkedin.camus:camus-example:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: com.linkedin.camus:camus-api:jar:0.1.0-SNAPSHOT, com.linkedin.camus:camus-schema-registry:jar:0.1.0-SNAPSHOT, com.linkedin.camus:camus-etl-kafka:jar:0.1.0-SNAPSHOT: Could not find artifact com.linkedin.camus:camus-api:jar:0.1.0-SNAPSHOT in nexus (http://nexus.wmflabs.org/nexus/content/groups/public) [16:04:30] ^ that's local [16:04:34] hm [16:04:39] Failed to read artifact descriptor for org.apache.maven.plugins:maven-jar-plugin:jar:2.2: Could not transfer artifact org.apache.maven.plugins:maven-jar-plugin:pom:2.2 from/to nexus (http://nexus.wmflabs.org/nexus/content/groups/public): Connection timed out [16:04:48] ^ thats on kraken-namenode-standby [16:05:05] that first one seems weird, the deps shoudl be compiled as the first part of mvn package [16:05:32] gonna try locally [16:05:35] with a fresh clone [16:06:16] Maybe you already have the required jar in your local maven cache? [16:07:30] i got a differnt compile error, since gson wasn't a camus-etl-kafka dep [16:07:33] hm [16:08:22] hmmm [16:08:35] I have stashed locally a version that worked [16:08:44] I can compare that later to what is upstream. [16:08:52] ok i'm committing a change to poms [16:08:58] i think the gson dep should be in the etl target, not example [16:08:59] Do you have a built jar that we can continue working on for now? [16:09:02] since the Json thing is there [16:09:27] yes. 
hm [16:09:39] ok but you rlocal build error doesn't make any sense [16:09:41] i think we should fix that [16:09:57] it says example doesn't have the dependencies it needs [16:10:05] but those deps should be built by maven as this project [16:10:07] right? [16:10:16] (go ahead and pull again, but I do'nt think this will solve your problem) [16:12:23] ...compiling... [16:13:45] Looks like it will go through locally [16:14:09] Which jar should I upload to kraken-namenode-standby? [16:14:33] camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar ? [16:15:46] ja [16:16:03] ETA: 3minutes ... :-( [16:16:06] haha [16:16:06] really? [16:16:18] yes. [16:18:15] so yeah, check out /home/otto/camus.wmf.properties [16:18:23] (I'm verifying that this actually works…:) ) [16:19:56] Mhmm. Mostly comments :-) [16:20:57] haha yeah [16:21:57] grep to the rescue :-) I briefly read over it. [16:22:09] The classes you set are the relevant parts I guess? [16:22:19] yeah classes, and the kafka settings [16:22:25] particularly the kafka topic [16:22:33] varnishkafka is producing to hte topic varnish [16:22:42] hmm, looks like the timezone setting isn't working, but the import worked [16:22:45] That's the wihtelisted one. [16:22:49] ja exactly [16:22:56] i mean, we could configure camus to just eat everything [16:23:02] but we'll figure out whatever we want eventually [16:23:09] maybe some prefixed topic names or something [16:23:12] if we want them into hadoop [16:23:16] or just hardcoded for each one [16:23:17] whataavaaaa [16:23:20] :-D [16:23:26] hmmm timezone didnt' work, looking at that [16:23:30] let me know when you are ready to run the jar [16:23:37] and we'll do that and then also look at hive [16:23:51] oh also relevant in the .propertiies [16:23:51] Jar is uploaded. [16:23:56] i added the hadoop command that I run [16:23:59] because I always forgot it [16:24:06] it sin a comment there nearish the top [16:24:14] you can copy/paste that and provide the path to your .jar and your .properties file [16:24:44] hmm, oh i htink the time ddint' work beacuse the time format is wrong, since we are now using a different one! [16:26:09] Permission denied: user=qchris, access=WRITE, inode="/wmf/raw/webrequest/test/camus":otto:project-analytics:drwxr-xr-x [16:26:45] oh ha [16:26:59] Should I create separate directories or should I try to get access to those? [16:27:06] ummm, just get access to them [16:27:06] so [16:27:07] in hadoop [16:27:10] hdfs is hte super user [16:27:13] so you can chmod files with hdfs [16:27:26] But I am no superuser :-/ [16:27:30] you are in labs! [16:27:35] Oh! [16:27:37] :-D [16:27:37] sudo -u hdfs hadoop fs -chown... [16:28:17] actually we shoudl jsut chgrp them project-analytics [16:28:19] so [16:29:02] sudo -u hdfs hadoop fs -chown -R hdfs:project-analytics /wmf [16:29:02] sudo -u hdfs hadoop fs -chmod -R g+w /wmf [16:29:41] i'm going to run this again to check that it works ok [16:29:52] I just ran them. [16:30:34] ok great [16:30:38] The CamusJob semms to have passed. [16:30:57] The consumer is spitting out many things now. [16:31:28] yeah i'm curling :) [16:31:35] Spammer :-P [16:31:45] hmm, i'm not sure why it is importing into the wrong hour [16:31:45] The third party API is not responding [16:31:46] bah, anyway [16:31:49] we'll figure that out later [16:32:06] I added one more thing to camus.wmf.properties that I thought would have fixed it [16:32:25] camus.message.timestamp.format=yyyy-MM-ddTHH:mm:ss [16:32:41] I just diffed them. 
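Pulling the pieces of that test loop together: the consumer and curl commands below are quoted verbatim from the session above, while the final command is only a sketch of the standard upstream Camus invocation, since the exact command kept in a comment near the top of camus.wmf.properties is not quoted in the log.

    # on kafka-main1 (assumes the ZOOKEEPER_URL export from ~/.bashrc shown above)
    kafka console-consumer --topic varnish --group qchris0   # leave running in one terminal
    /home/otto/curl_varnish.sh                               # produce test requests in another

    # on kraken-namenode-standby: submit the Camus import job to Hadoop.
    # Standard upstream invocation, a sketch only; adjust jar and properties paths.
    hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
        com.linkedin.camus.etl.kafka.CamusJob -P camus.wmf.properties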
[16:32:43] i've had this working before… hm [16:32:47] anway, let's work on that later [16:32:54] Yes. Totally. [16:33:00] we can deal with incorrect directories for a min [16:33:03] ok so hive [16:33:15] this is where I only half know what i'm doing :) [16:33:30] so, if you just run [16:33:33] hive [16:33:37] you should get a hive prompt [16:33:51] Yes. [16:33:55] show tables; [16:33:56] I can just load the data? [16:34:34] describe webrequest_test0; gave me [16:34:38] here's a couple of useful posts [16:34:39] http://blog.cloudera.com/blog/2013/03/how-to-analyze-twitter-data-with-hue/ [16:34:43] http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/ [16:34:46] FAILED: RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe com.cloudera.hive.serde.JSONSerDe does not exist) [16:35:02] yeah [16:35:03] run this [16:35:15] ADD JAR /home/otto/hive-serdes-1.0-SNAPSHOT.jar [16:35:32] Working. [16:35:35] Thanks. [16:35:42] let's look at webrequest_test1 [16:35:56] do [16:36:00] show create table webrequest_test1; [16:37:31] i think we might want to create a new table for our stuff [16:37:39] Ja. Read through it. [16:37:49] Yes. [16:38:08] so ok, yeah we'll have to do a lot of research and experimenting here [16:38:12] this is where I need the most help [16:38:18] here's what I've got so far [16:38:29] the data we are importing is just the raw json data [16:38:42] we want to be able to query it with hive for convenience [16:38:48] Sure. [16:38:52] but this data will probably not be the main data we run analysis on [16:39:09] we will have some other ETL phase (in hive or pig, who knows), that will create more robust hive tables [16:39:21] it will actually insert this data into a table location elsewhere [16:39:31] but, we'll thikn about the ETL bit later [16:39:43] for now, we will just see if we can figure out how to create what is called a hive 'external' table on this data [16:40:12] an external table is where the hive schema uses existing data, outside of the hive datadir, rather than internally managed hive data [16:40:12] Ok. I'll toy around with that. [16:40:17] the most confusing thing is partitions [16:40:32] partitions are set up so that if you know you only need to look at part of the data [16:40:39] say for a particular date range [16:40:52] hive will only launch MR jobs that work on particulary directories [16:41:08] since this is an external table [16:41:13] partitions have to be manually created [16:41:24] Ja, I read that part of the docs. I liked that part :-) [16:41:34] basically, every time we import a new hour (create a new directory), we'll need to add a partition for that hour [16:41:43] Yes, I think so too. [16:41:44] since our imports are year/month/day/hour [16:41:50] i added partitions for each of those buckets [16:42:01] But that looks like it can easily be scripted. [16:42:11] yeah i think so [16:42:22] either in oozie, manually, who knows maybe even in camus [16:42:48] Mayde even Drake :-) [16:43:04] I'll test a few things. Ok. [16:44:11] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions [16:44:15] yeah maybe drake! [16:44:45] ok, so now I think you know everything that I know :) [16:44:54] i will see if I can fix this timezone/timestamp issue [16:44:55] Hahaha :-) [16:45:04] Thanks for this introduction. [16:45:43] i think if we can get to the point where we can: [16:45:43] 1. run camus [16:45:44] 2. add partition (manually) [16:45:44] 3. 
run hive queries [16:45:53] we will be pretty good for a first step [16:46:00] once we're there, we can think about how to automate it [16:46:09] and then figure out the ETL phase with other table [16:46:22] Ok. Sounds good. [16:47:16] Just to make sure, since this is labs ... [16:47:26] how careful do I have to be about not getting in your way? [16:47:36] We're working on the same cluster, with the same configs? [16:49:29] not at all [16:49:31] just do whatever [16:49:34] i'm actually mostly working in kafka [16:49:37] Ok. Cool. [16:49:46] the worst would be kafk doesn't work :0 [16:50:04] if that's becomes a problem I can set up a seperate kafka cluster for me [16:50:06] Where does the hive-serdes come from. Is this a custom built jar, or some downloaded one? [16:50:11] downloaded [16:50:14] Ok. [16:50:27] No need to setup a separate cluster. Just scream at me :-) [16:51:01] http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar [16:51:12] link found here [16:51:13] https://github.com/romainr/cdh-twitter-example#setting-up-hive [16:51:18] Ok. Thanks. [16:54:46] yeah, these are the types of things i've been talking about when i've been saying i don't know the best way to deploy and run this stuff [16:55:05] i can make it work, but i'm not sure how to best productionize it [16:55:49] * drdee_ SCREAMS AT qchris [16:58:01] * qchris does not listen drdee_ :-) [16:58:01] Whats up drdee_? [16:59:00] just screaming for now apparent reason :D [16:59:17] qchris: rake [16:59:21] drake?@? [16:59:26] YES YES YES [16:59:33] I thought I'd read that. [16:59:34] that's totally the way i believe [16:59:45] I am not sure. [17:00:02] scrum: http://goo.gl/1pm5JI [17:24:11] DarTar: hi [17:24:17] I'm modifying 701 as per our discussions [17:24:19] hey [17:24:26] great [17:25:04] DarTar: have a look please https://mingle.corp.wikimedia.org/projects/analytics/cards/701 [17:26:24] DarTar: I captured the two usecases we talked about [17:26:49] hmm, I think the only one we should support is #3 [17:26:50] DarTar hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/3 [17:27:03] argh sorry CommanderData [17:27:52] DarTar: ok, modifying card now [17:29:23] DarTar: ok modified [17:29:24] case 2 is "threshold" or "milestones" aka #699 [17:29:24] DarTar hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699 [17:30:21] even though that card is very confusing (it currently lists 4 different metrics, there should be just one) [17:31:07] DarTar: wait, we have to separate stuff, #699 is a different thing, I'm working on #701 right now [17:31:07] average hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699 [17:31:07] average hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/701 [17:31:26] average: sure [17:32:35] #699 was originally the card capturing UserMetrics threshold metric, but it looks like now it's something different, we should revisit it before you guys start to work on it [17:32:35] DarTar hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699 [17:35:45] DarTar: sure, as soon as 701 is done, which I'm very confident now will be done today [17:35:51] DarTar: as soon as it's done, we can talk about 699 [17:36:04] sounds good! [17:37:34] average: shouldn't we drop case 1 too? 
I'm not sure what this does [17:39:26] the only optional case that (I think) we initially wanted was: [T+t, today], i.e. make the s parameter optional [17:39:43] but T and t should be required [17:39:50] does that make sense? [17:40:48] I'll go ahead and edit the card to avoid confusion between T and t0 [17:49:16] (CR) QChris: [V: 2] Add 'Active Editor Totals' [analytics/geowiki] - https://gerrit.wikimedia.org/r/84875 (owner: QChris) [17:56:31] DarTar: thank you [17:56:34] DarTar: I just read the card [17:56:37] re-read [17:56:48] average: does it make sense now? The only cases that this doesn't cover are when (1) the reference timestamp T is different from the user registration or (2) someone wants to measure survival past a specified date [17:57:23] neither (1) nor (2) are known use cases (although Program Evaluation might be interested in them) [17:57:56] ok [19:24:51] milimetric: How does limn choose how to color graphs? [19:25:17] On http://gp.wmflabs.org/graphs/active_editors_total [19:25:25] ah, interesting question [19:25:29] the line is dark blue [19:25:34] yea [19:25:41] and if I add more lines, they are all dark blue :-( [19:25:50] http://gp.wmflabs.org/graphs/global_south_editor_fractions [19:25:50] right [19:25:53] Is orange. [19:26:11] Typically colors are different for different lines. [19:26:30] yes, ok, so let's see what the case is here [19:26:49] the short answer is there are palettes which attempt to pick different colors, and they can be overriden by line-specific options [19:27:05] diff graphs/*{active_editors,fraction}* [19:27:21] (in ~/gp/gp-geowiki-data-repository) does not show such differences [19:27:29] (for user limn on limn0) [19:27:48] (same holds true for datasources) [19:28:21] heh, qchris, these graphs don't make sense [19:28:31] they don't specify a palette or a line-specific color [19:28:43] oh! [19:29:01] so the default is to use the labels and look them up in a global palette [19:29:04] using regexes [19:29:16] so it must be finding the labels or pieces of the labels in there [19:29:18] one sec, lemme see [19:29:26] Oh. We have global palettes? [19:29:35] That would explain lots of things :-D [19:32:16] I found some matches in the limn repo. [19:32:22] Thanks for the hint! [19:34:05] sorry qchris, talking to a few people at once is not easy :) [19:34:42] Don't bother. You already brought me to the right place. Thanks! [19:38:17] ok qchris, when you find it, link me if it's easy to explain. Evan kept changing how he set that up so I'm interested what he settled on [19:38:48] ok. [19:45:26] qchris: i fixed the timestamp thing [19:45:33] we were missing my pull request merged in from upstream [19:45:39] ottomata: Awesome. [19:45:44] also, we needed the ability to set the name of the timestamp json field [19:45:46] so I added that too [19:45:49] :-) [19:45:55] Sounds great. [19:45:58] see the diff of my /home/otto/camus.wmf.properties [19:46:33] I knew I'd find an use for this bottle of vodka in my fridge. Meteorologists announce an extremely cold winter [19:48:08] in all europe [19:48:38] average: you shoudl probably keep it warm to combat t he cold [19:48:47] maybe microwave it [19:53:50] I never tried that, but I might need to do that :) [19:54:08] http://iceagenow.info/2013/09/brutal-winter-europe/ [19:54:16] brutal [20:05:41] [travis-ci] develop/4808915 (#142 by milimetric): The build has errored. 
http://travis-ci.org/wikimedia/limn/builds/11608614 [20:07:40] my parents were born in the early 50s average [20:07:51] they say the snows covered houess [20:07:52] *houses [20:08:04] and I never thought I'd live to see that, but last year proved me wrong [20:11:13] milimetric: in philadelphia ? [20:13:13] oh no! in Ploiesti [20:13:38] my dad's from Ploiesti and my mom's from Botosani [20:13:48] and I grew up in Bucuresti / Ploiesti [20:40:12] I've been thinking over this for some time, and cannot really make sense of those two graphs: [20:40:14] http://reportcard.wmflabs.org/ [20:40:22] (The one on the very bottom) [20:40:24] and [20:40:32] http://gp.wmflabs.org/graphs/active_editors_total [20:40:43] Shouldn't they match closely? [20:42:03] However the second graph overshoots the "Total" line of the first graph for the first few months [20:42:33] And for recent months, the second graph seems a tad too low :-( [20:47:50] drdee_ ^ [20:49:53] qchris: 1 sec [20:50:32] hey ottomata [20:50:37] do you think you could package https://github.com/mumrah/kafka-python ? [20:51:35] if so, I'll implement an EL writer plugin that uses it to pump data into Hadoop [20:52:09] qchris: these two numbers are different because how they are calculated [20:52:11] if you'd like me to log an RT or Mingle request, let me know [20:52:47] drdee_: You think they are both sound? [20:52:53] http://reportcard.wmflabs.org/ uses the xml dump files to calculate the active editors [20:53:12] while http://gp.wmflabs.org/graphs/active_editors_total uses the mysql db (recent changes table iirc) [20:53:17] ideally they should map [20:53:22] but of course they don't [20:53:45] I agree that they are computed differently, but they are far off. [20:53:51] how far? [20:54:03] And while the second graph shows a trend, the first one does not. [20:54:04] ori-l: cool i think I can do that [20:54:20] Depending on the month, a few thousand active editors. [20:54:23] wow [20:54:31] which chart reports more? [20:54:41] That depends on the month :-( [20:54:44] aaargh [20:54:59] mmmmmmmm this worries me [20:55:00] Recently, the mysql fed graph is lower than the other one [20:55:05] is there a .deb for python snappy? ori-l? [20:55:35] ottomata: yep. 0.3.2-1 is in our repos [20:57:06] qchris: this is probably the single most important metric we have to report [20:57:12] (PS1) Milimetric: fixed validate configuration function [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/85331 [20:57:20] (CR) Milimetric: [C: 2 V: 2] fixed validate configuration function [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/85331 (owner: Milimetric) [20:57:27] how confident are you in accuracy of the query? [20:57:41] drdee_: Totally /not/ confident. [20:58:05] milimetric: spoke with jwild about csv time series format, will create new mingle card [20:58:10] drdee_: I mean we just take the numbers that fall out of geowiki. [20:58:26] great, danke ori-l [20:58:28] dev package too? [20:58:29] right, i am afraid we need to do some due diligence [20:58:33] orr ig ues sits python [20:58:34] nm [20:58:35] :p [20:58:43] yay [20:59:16] we could eventually decide to get fancy and write python bindings for librdkafka, but that'd be just for fun [20:59:19] drdee_: Do we know how good the numbers of http://reportcard.wmflabs.org/ are? [20:59:31] those are pretty battle-proven numbers [20:59:33] drdee_: Do we know how good the cu tables are? [20:59:52] the cu tables themselves are also battle-proven [21:00:17] Ok. Then I'll try to some geowiki review. 
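For reference on the EventLogging writer plugin idea above: a rough sketch of what pushing events into Kafka with kafka-python could look like. It uses the present-day KafkaProducer interface, not necessarily the SimpleProducer/KeyedProducer API mumrah's library exposed at the time; the broker address, topic name, and sample event are placeholders.

    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="kafka-main1:9092")  # placeholder broker

    def write_event(event):
        """Serialize one EventLogging event and send it to a Kafka topic.

        Keying by schema name (an option floated later in the discussion)
        keeps each schema's events in one partition; drop key= to let the
        producer pick partitions randomly.
        """
        producer.send(
            "eventlogging",                               # placeholder topic
            key=event["schema"].encode("utf-8"),
            value=json.dumps(event).encode("utf-8"),
        )

    write_event({"schema": "ExampleSchema", "event": {"userId": 123}})
    producer.flush()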
[21:00:46] there were cu battles? [21:03:15] ori-l, ottomata: mingle card https://mingle.corp.wikimedia.org/projects/analytics/cards/1165 [21:03:35] drdee_: excellent, thanks! [21:04:43] :) [21:07:06] ottomata: should I use a keyed producer and specify the schema as a key for each event? [21:07:32] that sounds like a nice alignment of the related systems [21:07:57] (i'm asking 'cause i'm messing around with it now and just thinking out loud, not urgent or anything) [21:08:49] up to you, the key is irrelevant to me unless we want to use it for load balancing across brokers, or in case we want to be sure a particular key fails together [21:09:02] the key is useful say if you are logging a user activity stream [21:09:08] if we partition by key [21:09:10] and a single partition fails [21:09:27] you'd only ahve problems with users in that partition, but other users in other parittions would be unaffected [21:09:42] ah, okay; it's not actually useful for separating the data into separate bins for analysis? [21:09:54] just for failover strategies? [21:11:11] it could be, you can get the key when you consume [21:11:22] if you wanted to use them you can [21:11:27] cool [21:11:45] say maybe, you wanted to import the data into hive [21:11:55] but only ever queried for a single user_id at at ime [21:12:13] you could then use the key in kafka to import into hdfs directories based on user_id and use those as hive partitoins [21:12:35] that way hive MR jobs would only have to read data out of a single hdfs directory, instead of scanning the full import data [21:12:57] we're thinking about actually using a webrequest timestamp as a key for our stuff [21:13:12] this would be mostly useless for partitions in kafka, so we'd still jsut use the random partition [21:13:28] but it would allow us to save in date time buckets in hdfs without having to parse the actual message for the timestamp [21:13:42] that requires a bit of extra coding to camus though, so I haven't messed with it yet [21:29:38] DarTar: check this out https://www.facebook.com/groups/programevaluation/ [21:30:21] * DarTar doesn't do FB (and the link is private, sorry) [21:30:41] what's that? [21:30:52] i can show you in a hangout if you want [21:30:59] sure [21:31:03] in 30 min [21:31:09] yeah np [21:31:36] anyways; it's a support group for wikimetrics [21:31:58] on fb? :D [21:32:09] yes! [21:32:15] ha ha [21:32:47] I guess that's a good sign that people are excited about it and not screaming about metrics [21:37:54] drdee_: add a link to the mingle card? https://gerrit.wikimedia.org/r/#/c/85337/ [21:38:09] yup will do [21:38:11] thanks [21:38:24] ty for adding kafka support to EL, very exciting! [21:38:54] for me too [21:39:46] done [21:40:00] does that depend on mumrah's kafka-python? [21:40:06] yep [21:40:30] I just had a chat with him today about joining efforts, building his python client ontop of librdkafka [21:40:50] small world :D [21:40:53] that would be great; it's a very clean API [21:41:08] he might want to keep the pure-python implementation for pypi [21:41:28] but some manner of librdkafka integration would be cool [21:41:39] er, pypy, not pypi [21:42:03] building it on top of librdkafka would minimize dev efforts though. 
Snaps_ is building in a lot of awesome failover stuff [21:42:11] making sure that data isn't lost on broker reconfiguration [21:42:56] if it's substantially more sophisticated / robust, sure, then it might make sense to base it exclusively on librdkafka [21:43:08] i think this would be very cool all in all [21:44:00] we decided I'll get back to him when I finalize the API which should be next week. he had some Python-C guru friend who maybe was willing to help with the bindings aswell [21:45:41] Snaps_: cool; have you done Python bindings before? If not, check out http://cffi.readthedocs.org/en/release-0.7/ & http://cython.org/ [21:46:07] ori-l: nope. thanks, I'll check those out. [21:48:11] cool. librdkafka looks great btw, i've been half-following the development & packaging effort from afar. [21:49:02] thanks, its progressing slowly but well, and its getting time for a real release to let the world know about it [21:49:55] but not tonight, bed time! see you guys. [21:50:13] laterz Snaps_ [21:52:03] laatas! [21:54:31] lataaaaaas [21:54:32] i'm out too, latesr all! [22:01:48] i am out as well, laterz! [22:02:06] laterall, good weekend! [22:02:31] Enjoy the weekend!
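On the cffi suggestion near the end: a tiny, purely illustrative ABI-mode sketch of loading librdkafka from Python with cffi. The declarations are abridged from librdkafka's rdkafka.h, the soname is an assumption, and the snippet only creates and destroys a producer handle rather than producing anything.

    from cffi import FFI

    ffi = FFI()
    # Abridged declarations from librdkafka's rdkafka.h; handle types left opaque
    ffi.cdef("""
        typedef ... rd_kafka_t;
        typedef ... rd_kafka_conf_t;
        typedef enum { RD_KAFKA_PRODUCER, RD_KAFKA_CONSUMER } rd_kafka_type_t;
        rd_kafka_conf_t *rd_kafka_conf_new(void);
        rd_kafka_t *rd_kafka_new(rd_kafka_type_t type, rd_kafka_conf_t *conf,
                                 char *errstr, size_t errstr_size);
        void rd_kafka_destroy(rd_kafka_t *rk);
    """)
    lib = ffi.dlopen("librdkafka.so.1")  # library soname/path is an assumption

    errstr = ffi.new("char[]", 512)
    conf = lib.rd_kafka_conf_new()
    rk = lib.rd_kafka_new(0, conf, errstr, 512)  # 0 == RD_KAFKA_PRODUCER
    if rk == ffi.NULL:
        raise RuntimeError(ffi.string(errstr).decode())
    lib.rd_kafka_destroy(rk)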