[13:44:06] morning fella's [13:47:43] morning drdee [13:50:59] MORNIGN! [13:51:00] so [13:51:07] did you fix the limn link bug? [13:51:24] (i got confused) [13:51:53] qchris…..the bat cave link is broken [13:52:00] this is really really annoying [13:52:10] Hi drdee_ [13:52:16] The one that I sent out today? [13:52:30] yup [13:52:43] (at least for me) [13:53:13] Harrr. That's annoying :-( [13:53:20] So gogle is sabotaging us. [13:54:26] Yes, it now says "Party is over" for me as well. [13:54:49] and soon the party is over for google hangouts in general [13:54:51] Mhmmm. How did you create the previous hangout? [13:55:06] just create it :) [13:55:21] i think at least 1 person needs to stay in the hangout [13:55:29] to make sure it does not expire [13:55:31] ? [13:55:36] :-D [13:55:40] if everybody leaves the link will expire [13:55:46] (hypothesis0 [13:56:12] I tried that before posting the link [13:56:17] I created the hangout. [13:56:23] Went in. [13:56:25] Went out. [13:56:30] Waited for 30 minutes. [13:56:31] Idea for a solution: Using Google Hangout API to automatically, each day create a new url, and have a static URL(for our team) that points to that new dynamic URL. For further information https://developers.google.com/+/hangouts/api/ [13:56:33] Went in again [13:56:46] average: that's not a bad idea [13:56:49] :) [13:56:52] we can do it commanderdata [13:57:02] and ask him to give the link [13:57:09] yes, theoretically it would be possible [13:57:37] milimietric; qchris: ^^ [13:58:16] If a static hangout Url is not possible, that sounds like the next best thing. [14:01:15] is there a python lib for the api? [14:01:35] milimetric: is the limn link bug fixed? [14:01:40] yes [14:01:42] you are really keeping me in suspense [14:01:42] but not deployed [14:01:45] k [14:01:46] because the deployer is dead [14:01:52] file a new bug :D [14:01:56] heh [14:02:04] easy/hard? [14:02:32] no idea [14:02:44] it's some ssh bug [14:02:55] * milimetric knows next to nothing about ssh [14:03:31] guys I'm looking at alternatives but it's crystal clear why Google Hangouts is popular [14:03:32] I haven't had any experience with that before [14:03:44] everything else looks like ... frankly, shit [14:05:24] I feel CommanderData is a cyborg [14:05:35] it's comments are too many times spot on [14:05:59] can ottomata help you? [14:06:04] (with the ssl bug) [14:06:11] i mean ssh [14:06:27] ok i've to pick up a car, should be back in 60 minutes [14:06:42] if somebody finds a google lib for the hangout api please email it to me [14:09:01] actually i don't think we need a lib for this [14:09:02] that hangout api is javascript. You guys are too open source-minded for this problem. It's really just as easy as calling and demanding support [14:09:13] we can just use the requests lib [14:09:17] you do that milmietric :) [14:09:22] see ya in a bit [14:09:26] k, later [14:12:35] and here is an answer. 
this person sets his google calendar to automatically create hangouts for his google calendar events, and then he is able to add attendees to that event [14:12:45] in our case the event is the daily meeting [14:12:53] http://www.riskcompletefailure.com/2012/11/programmatically-scheduling-hangouts.html [14:12:58] https://developers.google.com/google-apps/calendar/v3/reference/events/insert [14:13:27] example section of the last link has Python/Java/PHP/Ruby code [14:14:00] CommanderData can have his own gmail, and he can invite us each day through the Calendar [14:16:01] Stackoverflow says the exact same thing http://stackoverflow.com/questions/18010875/starting-a-hangout-with-a-list-of-users-invited [14:16:19] they consider it a hack because it doesn't specifically belong just the Hangouts API but requires Google Calendar as well [14:16:25] but in the end, it is actually possible to do it [14:18:21] 17:09 < milimetric> that hangout api is javascript. You guys are too open source-minded for this problem. It's really just as easy as calling and demanding support [14:18:41] milimetric: ^^ you were right about that one. but the last two links are a different approach, and they apparently work [14:19:24] i'm confused average, this is definitely a new bug [14:19:35] a bug ? why ? [14:19:35] In the interest of efficiency I only check my email for that on a Friday [14:19:43] because hangouts always worked in the past [14:19:53] and you didn't need to invite people via the calendar to make them work [14:20:04] yes but G always rolls out new versions [14:20:53] this bug of transient hangout urls may actually be a feature in one of their Changelogs [14:21:36] i looked, couldn't find anything [14:21:43] I emailed techsupport, they should get back to us [14:21:53] It seems they indeed updated the Hangout stuff. Today when I created the Bat Cave, some icons were new. And it showed a new "app" as well. [14:23:28] yeah, either we're missing a new feature somewhere, or the google apps administrator has to enable / configure something, or it's a bug [14:23:30] mK},a [14:23:33] :) [14:23:43] qchris has gone full base64 encoded [14:23:47] sorry, wrong window :-) [14:27:59] i'm going to breakfast with a friend, I'll be back later and a little spotty today. If anyone needs me, call my cell (it's on the staff contact page) [14:54:55] milimetric, drdee: Once you come back online, let's try if the hangout at http://goo.gl/1pm5JI works for you. [15:36:56] Snaps_: lemme know if you are around and want to work on that varnishkafka issue [15:37:19] qchris, wanna check out camus/hive stuff? [15:37:29] Sure :-) [15:37:38] great, I *think* we can do this via IRC :p [15:37:50] Ok :-) [15:37:50] lemme just remember what to do [15:37:52] :) [15:38:15] first let's get you up to speed with camus in labs [15:38:26] making varnishkafka requests and consuming from kafka [15:38:32] Ok. [15:39:15] Which machine should I connect to? [15:39:26] kafka-main1 [15:39:34] ottomata: the failover bits? [15:39:37] ya [15:40:05] ottomata: I have that in a working branch, needs some more testing, not going to be able to do anything on it tonight I think... [15:40:24] ok cool [15:40:25] no probs [15:40:35] do you know what was wrong, Snaps_? [15:40:59] qchris: google short url works [15:41:00] ok qchris, let's get you working with kafka there [15:41:11] drdee_: Yippie. 
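The Calendar-based workaround discussed above boils down to one authenticated POST to the Calendar v3 events.insert endpoint; if the Apps domain is set to automatically attach video calls to new events (the trick from the riskcompletefailure.com post), the created event carries that day's hangout link. Below is a minimal sketch using the requests library, as suggested in the chat. The access token, date, and attendee address are placeholders, and obtaining the OAuth2 token for the bot account is not shown.

    import datetime
    import json
    import requests

    ACCESS_TOKEN = "ya29.xxxx"  # placeholder OAuth2 token for the bot's Google account

    start = datetime.datetime(2013, 9, 23, 17, 0)
    event = {
        "summary": "Analytics daily scrum",
        "start": {"dateTime": start.isoformat(), "timeZone": "UTC"},
        "end": {"dateTime": (start + datetime.timedelta(minutes=30)).isoformat(),
                "timeZone": "UTC"},
        "attendees": [{"email": "someone@wikimedia.org"}],  # placeholder attendee
    }

    # events.insert on the bot account's primary calendar
    r = requests.post(
        "https://www.googleapis.com/calendar/v3/calendars/primary/events",
        headers={"Authorization": "Bearer " + ACCESS_TOKEN,
                 "Content-Type": "application/json"},
        data=json.dumps(event),
    )
    r.raise_for_status()
    # hangoutLink is only populated when the domain auto-attaches video calls
    print(r.json().get("hangoutLink"))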
[15:41:14] there are two kafka brokers running: kafka-main1 and kafka-main2 [15:41:20] varnishkafka is running on kafka-main1 [15:41:27] you can curl it and it will produce to those brokers [15:41:38] curl which url? [15:41:41] ottomata: It wont move sent messages (but not acked) from one broker to another. So any messages in-flight during a failover will need the failing broker to come up again before being transmitted. Add retry and timeouts to that and those messages are probably lost. [15:41:42] there's a script in my home dir to do this in a loop: /home/otto/curl_varnish.sh [15:42:05] Ok. I copied that. [15:42:07] iiinteresting [15:42:08] ok cool [15:42:10] yeah so [15:42:18] also, maybe add this to your .bashrc or whatever: [15:42:21] test -f /etc/kafka/server.properties && export ZOOKEEPER_URL=$(grep zookeeper.connect= /etc/kafka/server.properties | cut -d '=' -f 2) [15:42:43] its a shortcut to keep you from having to provide the —zookeeper on kafka commands [15:42:51] if ZOOKEEPER_URL is set [15:42:52] run [15:42:54] Cool. [15:43:09] kafka console-consumer --topic varnish --group qchris0 [15:43:14] group is arbitrary [15:43:26] that's just the id that will be used to save your consumed high water mark [15:43:37] so if you ever want to start from the end of the log, rather than where you last left of [15:43:39] off [15:43:41] just use a new group [15:43:44] so, if you run the consumer [15:43:46] and then curl varnish [15:43:54] you should see output, ja? [15:44:20] yes. [15:44:33] awesome, ok cool [15:44:50] so I usually leave a consumer running while I test things, just so I can make sure the data is at least getting that far [15:44:54] now for camus [15:44:55] uhhhh [15:45:06] Yes, sounds good. [15:45:12] you can run camus from anywhere, but i've been submitting camus jobs to hadoop from the kraken-namenode-standby instance [15:45:36] Ok. I'll prepare ssh setup for that machine. [15:45:40] k [15:46:31] Ok. I am on kraken-namenode-standby. [15:47:07] ok cool, go ahead and clone camus [15:47:09] git@github.com:wikimedia/camus.git [15:47:45] you'll want the wikimedia branch [15:48:53] ah hmm, i need to commit this JsonStringMessageDecoder [15:49:09] I am on branch wikimedia [15:49:24] ok i think this is where I'm going to need help with the proper way to do things, i'm going to go ahead and commit what I have and show you how I do it [15:49:37] Sure. [15:50:17] ok pull again on wikimedia branch [15:50:18] i just pushed [15:50:35] you can see my commit here [15:50:35] https://github.com/wikimedia/camus/commit/cf8e2d9540facced14adbbafefc52660f9271028 [15:50:42] Yes. [15:50:52] Done. [15:50:55] i think i'm doing that wrong, beacuse I've added the gson dependancy in the camus-example.pom [15:51:45] Where do you think it should go? [15:51:57] i'm trying to remember/check if this is true [15:52:08] but the example target is the only one that builds the shaded jar [15:52:15] Yes. It's true :-) You added it to camus-example/pom.xml [15:52:19] so its the only one I could get to actually work [15:52:25] when submitting jobs to hadoop [15:52:29] i'm not sure what is best to do there [15:52:33] Ok. We'll figure that out. [15:52:37] yeah [15:52:38] ottomata: is this bug affecting your tests at this point? [15:52:41] But it allows to build a working jar? [15:52:52] (Last time it did IIRC) [15:52:55] yes……. 
i think so qchris, although it has been about a month since I tried, yeah [15:53:06] Snaps_: ja, i mean [15:53:13] i'm just testing failover [15:53:16] and I see this problem [15:53:27] with a single varnishkafka producer [15:53:35] producing to a 2 broker cluster, [15:53:45] the topic has one partition and 2 replicas [15:53:51] if I shutdown the leader for the toppar [15:54:15] i miss about a seconds worth of data, which on whichever production varnish instance i'm testing on is about 4K messages [15:54:26] mm, ok, gotcha. [15:54:29] is it blocking you? [15:55:04] i mean, right now i'm just testing stuff, so um, no/yes? if you know the problem and are working on it, i will wait for your fix [15:55:13] before I continue testing [15:55:52] okay, cool [15:55:55] I'll let you know [15:55:58] k danke [15:56:36] ottomata: I am running mvn package in the checkout. [15:56:40] ok [15:56:46] ottomata: Or do I need a different directory? [15:56:48] Ok. [15:57:27] naw in the main directory is good [15:57:43] Could not resolve dependencies for project com.linkedin.camus:camus-example:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: com.linkedin.camus:camus-api:jar:0.1.0-SNAPSHOT [15:57:57] Do I need to add wmf nexus? [15:58:04] hm [15:58:14] like in your ~/.m2? [15:58:20] Yes. [15:58:24] yeah i have it [15:58:34] also, I noticed that b uilding in labs is SLLLOWWWWW [15:58:40] i've always built locally and rsynced [15:58:45] but maybe it will work better for you [15:58:51] or maybe if you don't build in your /home dir [15:58:53] because /home is nfs [15:59:23] ok [16:03:16] ok while you're waiting [16:03:23] check out /home/otto/camus.wmf.properties [16:03:35] I just edited it to make it up to dayte [16:03:50] I am having compilation problems locally: [16:04:06] And on kraken-namenode-standby as well. [16:04:10] sup? [16:04:22] Could not resolve dependencies for project com.linkedin.camus:camus-example:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: com.linkedin.camus:camus-api:jar:0.1.0-SNAPSHOT, com.linkedin.camus:camus-schema-registry:jar:0.1.0-SNAPSHOT, com.linkedin.camus:camus-etl-kafka:jar:0.1.0-SNAPSHOT: Could not find artifact com.linkedin.camus:camus-api:jar:0.1.0-SNAPSHOT in nexus (http://nexus.wmflabs.org/nexus/content/groups/public) [16:04:30] ^ that's local [16:04:34] hm [16:04:39] Failed to read artifact descriptor for org.apache.maven.plugins:maven-jar-plugin:jar:2.2: Could not transfer artifact org.apache.maven.plugins:maven-jar-plugin:pom:2.2 from/to nexus (http://nexus.wmflabs.org/nexus/content/groups/public): Connection timed out [16:04:48] ^ thats on kraken-namenode-standby [16:05:05] that first one seems weird, the deps shoudl be compiled as the first part of mvn package [16:05:32] gonna try locally [16:05:35] with a fresh clone [16:06:16] Maybe you already have the required jar in your local maven cache? [16:07:30] i got a differnt compile error, since gson wasn't a camus-etl-kafka dep [16:07:33] hm [16:08:22] hmmm [16:08:35] I have stashed locally a version that worked [16:08:44] I can compare that later to what is upstream. [16:08:52] ok i'm committing a change to poms [16:08:58] i think the gson dep should be in the etl target, not example [16:08:59] Do you have a built jar that we can continue working on for now? [16:09:02] since the Json thing is there [16:09:27] yes. 
hm [16:09:39] ok but you rlocal build error doesn't make any sense [16:09:41] i think we should fix that [16:09:57] it says example doesn't have the dependencies it needs [16:10:05] but those deps should be built by maven as this project [16:10:07] right? [16:10:16] (go ahead and pull again, but I do'nt think this will solve your problem) [16:12:23] ...compiling... [16:13:45] Looks like it will go through locally [16:14:09] Which jar should I upload to kraken-namenode-standby? [16:14:33] camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar ? [16:15:46] ja [16:16:03] ETA: 3minutes ... :-( [16:16:06] haha [16:16:06] really? [16:16:18] yes. [16:18:15] so yeah, check out /home/otto/camus.wmf.properties [16:18:23] (I'm verifying that this actually works…:) ) [16:19:56] Mhmm. Mostly comments :-) [16:20:57] haha yeah [16:21:57] grep to the rescue :-) I briefly read over it. [16:22:09] The classes you set are the relevant parts I guess? [16:22:19] yeah classes, and the kafka settings [16:22:25] particularly the kafka topic [16:22:33] varnishkafka is producing to hte topic varnish [16:22:42] hmm, looks like the timezone setting isn't working, but the import worked [16:22:45] That's the wihtelisted one. [16:22:49] ja exactly [16:22:56] i mean, we could configure camus to just eat everything [16:23:02] but we'll figure out whatever we want eventually [16:23:09] maybe some prefixed topic names or something [16:23:12] if we want them into hadoop [16:23:16] or just hardcoded for each one [16:23:17] whataavaaaa [16:23:20] :-D [16:23:26] hmmm timezone didnt' work, looking at that [16:23:30] let me know when you are ready to run the jar [16:23:37] and we'll do that and then also look at hive [16:23:51] oh also relevant in the .propertiies [16:23:51] Jar is uploaded. [16:23:56] i added the hadoop command that I run [16:23:59] because I always forgot it [16:24:06] it sin a comment there nearish the top [16:24:14] you can copy/paste that and provide the path to your .jar and your .properties file [16:24:44] hmm, oh i htink the time ddint' work beacuse the time format is wrong, since we are now using a different one! [16:26:09] Permission denied: user=qchris, access=WRITE, inode="/wmf/raw/webrequest/test/camus":otto:project-analytics:drwxr-xr-x [16:26:45] oh ha [16:26:59] Should I create separate directories or should I try to get access to those? [16:27:06] ummm, just get access to them [16:27:06] so [16:27:07] in hadoop [16:27:10] hdfs is hte super user [16:27:13] so you can chmod files with hdfs [16:27:26] But I am no superuser :-/ [16:27:30] you are in labs! [16:27:35] Oh! [16:27:37] :-D [16:27:37] sudo -u hdfs hadoop fs -chown... [16:28:17] actually we shoudl jsut chgrp them project-analytics [16:28:19] so [16:29:02] sudo -u hdfs hadoop fs -chown -R hdfs:project-analytics /wmf [16:29:02] sudo -u hdfs hadoop fs -chmod -R g+w /wmf [16:29:41] i'm going to run this again to check that it works ok [16:29:52] I just ran them. [16:30:34] ok great [16:30:38] The CamusJob semms to have passed. [16:30:57] The consumer is spitting out many things now. [16:31:28] yeah i'm curling :) [16:31:35] Spammer :-P [16:31:45] hmm, i'm not sure why it is importing into the wrong hour [16:31:45] The third party API is not responding [16:31:46] bah, anyway [16:31:49] we'll figure that out later [16:32:06] I added one more thing to camus.wmf.properties that I thought would have fixed it [16:32:25] camus.message.timestamp.format=yyyy-MM-ddTHH:mm:ss [16:32:41] I just diffed them. 
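Pulling the pieces of that test loop together: the consumer and curl commands below are quoted verbatim from the session above, while the final command is only a sketch of the standard upstream Camus invocation, since the exact command kept in a comment near the top of camus.wmf.properties is not quoted in the log.

    # on kafka-main1 (assumes the ZOOKEEPER_URL export from ~/.bashrc shown above)
    kafka console-consumer --topic varnish --group qchris0   # leave running in one terminal
    /home/otto/curl_varnish.sh                               # produce test requests in another

    # on kraken-namenode-standby: submit the Camus import job to Hadoop.
    # Standard upstream invocation, a sketch only; adjust jar and properties paths.
    hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
        com.linkedin.camus.etl.kafka.CamusJob -P camus.wmf.properties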
[16:32:43] i've had this working before… hm [16:32:47] anway, let's work on that later [16:32:54] Yes. Totally. [16:33:00] we can deal with incorrect directories for a min [16:33:03] ok so hive [16:33:15] this is where I only half know what i'm doing :) [16:33:30] so, if you just run [16:33:33] hive [16:33:37] you should get a hive prompt [16:33:51] Yes. [16:33:55] show tables; [16:33:56] I can just load the data? [16:34:34] describe webrequest_test0; gave me [16:34:38] here's a couple of useful posts [16:34:39] http://blog.cloudera.com/blog/2013/03/how-to-analyze-twitter-data-with-hue/ [16:34:43] http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/ [16:34:46] FAILED: RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe com.cloudera.hive.serde.JSONSerDe does not exist) [16:35:02] yeah [16:35:03] run this [16:35:15] ADD JAR /home/otto/hive-serdes-1.0-SNAPSHOT.jar [16:35:32] Working. [16:35:35] Thanks. [16:35:42] let's look at webrequest_test1 [16:35:56] do [16:36:00] show create table webrequest_test1; [16:37:31] i think we might want to create a new table for our stuff [16:37:39] Ja. Read through it. [16:37:49] Yes. [16:38:08] so ok, yeah we'll have to do a lot of research and experimenting here [16:38:12] this is where I need the most help [16:38:18] here's what I've got so far [16:38:29] the data we are importing is just the raw json data [16:38:42] we want to be able to query it with hive for convenience [16:38:48] Sure. [16:38:52] but this data will probably not be the main data we run analysis on [16:39:09] we will have some other ETL phase (in hive or pig, who knows), that will create more robust hive tables [16:39:21] it will actually insert this data into a table location elsewhere [16:39:31] but, we'll thikn about the ETL bit later [16:39:43] for now, we will just see if we can figure out how to create what is called a hive 'external' table on this data [16:40:12] an external table is where the hive schema uses existing data, outside of the hive datadir, rather than internally managed hive data [16:40:12] Ok. I'll toy around with that. [16:40:17] the most confusing thing is partitions [16:40:32] partitions are set up so that if you know you only need to look at part of the data [16:40:39] say for a particular date range [16:40:52] hive will only launch MR jobs that work on particulary directories [16:41:08] since this is an external table [16:41:13] partitions have to be manually created [16:41:24] Ja, I read that part of the docs. I liked that part :-) [16:41:34] basically, every time we import a new hour (create a new directory), we'll need to add a partition for that hour [16:41:43] Yes, I think so too. [16:41:44] since our imports are year/month/day/hour [16:41:50] i added partitions for each of those buckets [16:42:01] But that looks like it can easily be scripted. [16:42:11] yeah i think so [16:42:22] either in oozie, manually, who knows maybe even in camus [16:42:48] Mayde even Drake :-) [16:43:04] I'll test a few things. Ok. [16:44:11] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions [16:44:15] yeah maybe drake! [16:44:45] ok, so now I think you know everything that I know :) [16:44:54] i will see if I can fix this timezone/timestamp issue [16:44:55] Hahaha :-) [16:45:04] Thanks for this introduction. [16:45:43] i think if we can get to the point where we can: [16:45:43] 1. run camus [16:45:44] 2. add partition (manually) [16:45:44] 3. 
run hive queries [16:45:53] we will be pretty good for a first step [16:46:00] once we're there, we can think about how to automate it [16:46:09] and then figure out the ETL phase with other table [16:46:22] Ok. Sounds good. [16:47:16] Just to make sure, since this is labs ... [16:47:26] how careful do I have to be about not getting in your way? [16:47:36] We're working on the same cluster, with the same configs? [16:49:29] not at all [16:49:31] just do whatever [16:49:34] i'm actually mostly working in kafka [16:49:37] Ok. Cool. [16:49:46] the worst would be kafk doesn't work :0 [16:50:04] if that's becomes a problem I can set up a seperate kafka cluster for me [16:50:06] Where does the hive-serdes come from. Is this a custom built jar, or some downloaded one? [16:50:11] downloaded [16:50:14] Ok. [16:50:27] No need to setup a separate cluster. Just scream at me :-) [16:51:01] http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar [16:51:12] link found here [16:51:13] https://github.com/romainr/cdh-twitter-example#setting-up-hive [16:51:18] Ok. Thanks. [16:54:46] yeah, these are the types of things i've been talking about when i've been saying i don't know the best way to deploy and run this stuff [16:55:05] i can make it work, but i'm not sure how to best productionize it [16:55:49] * drdee_ SCREAMS AT qchris [16:58:01] * qchris does not listen drdee_ :-) [16:58:01] Whats up drdee_? [16:59:00] just screaming for now apparent reason :D [16:59:17] qchris: rake [16:59:21] drake?@? [16:59:26] YES YES YES [16:59:33] I thought I'd read that. [16:59:34] that's totally the way i believe [16:59:45] I am not sure. [17:00:02] scrum: http://goo.gl/1pm5JI [17:24:11] DarTar: hi [17:24:17] I'm modifying 701 as per our discussions [17:24:19] hey [17:24:26] great [17:25:04] DarTar: have a look please https://mingle.corp.wikimedia.org/projects/analytics/cards/701 [17:26:24] DarTar: I captured the two usecases we talked about [17:26:49] hmm, I think the only one we should support is #3 [17:26:50] DarTar hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/3 [17:27:03] argh sorry CommanderData [17:27:52] DarTar: ok, modifying card now [17:29:23] DarTar: ok modified [17:29:24] case 2 is "threshold" or "milestones" aka #699 [17:29:24] DarTar hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699 [17:30:21] even though that card is very confusing (it currently lists 4 different metrics, there should be just one) [17:31:07] DarTar: wait, we have to separate stuff, #699 is a different thing, I'm working on #701 right now [17:31:07] average hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699 [17:31:07] average hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/701 [17:31:26] average: sure [17:32:35] #699 was originally the card capturing UserMetrics threshold metric, but it looks like now it's something different, we should revisit it before you guys start to work on it [17:32:35] DarTar hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/699 [17:35:45] DarTar: sure, as soon as 701 is done, which I'm very confident now will be done today [17:35:51] DarTar: as soon as it's done, we can talk about 699 [17:36:04] sounds good! [17:37:34] average: shouldn't we drop case 1 too? 
I'm not sure what this does [17:39:26] the only optional case that (I think) we initially wanted was: [T+t, today], i.e. make the s parameter optional [17:39:43] but T and t should be required [17:39:50] does that make sense? [17:40:48] I'll go ahead and edit the card to avoid confusion between T and t0 [17:49:16] (CR) QChris: [V: 2] Add 'Active Editor Totals' [analytics/geowiki] - https://gerrit.wikimedia.org/r/84875 (owner: QChris) [17:56:31] DarTar: thank you [17:56:34] DarTar: I just read the card [17:56:37] re-read [17:56:48] average: does it make sense now? The only cases that this doesn't cover are when (1) the reference timestamp T is different from the user registration or (2) someone wants to measure survival past a specified date [17:57:23] neither (1) nor (2) are known use cases (although Program Evaluation might be interested in them) [17:57:56] ok [19:24:51] milimetric: How does limn choose how to color graphs? [19:25:17] On http://gp.wmflabs.org/graphs/active_editors_total [19:25:25] ah, interesting question [19:25:29] the line is dark blue [19:25:34] yea [19:25:41] and if I add more lines, they are all dark blue :-( [19:25:50] http://gp.wmflabs.org/graphs/global_south_editor_fractions [19:25:50] right [19:25:53] Is orange. [19:26:11] Typically colors are different for different lines. [19:26:30] yes, ok, so let's see what the case is here [19:26:49] the short answer is there are palettes which attempt to pick different colors, and they can be overriden by line-specific options [19:27:05] diff graphs/*{active_editors,fraction}* [19:27:21] (in ~/gp/gp-geowiki-data-repository) does not show such differences [19:27:29] (for user limn on limn0) [19:27:48] (same holds true for datasources) [19:28:21] heh, qchris, these graphs don't make sense [19:28:31] they don't specify a palette or a line-specific color [19:28:43] oh! [19:29:01] so the default is to use the labels and look them up in a global palette [19:29:04] using regexes [19:29:16] so it must be finding the labels or pieces of the labels in there [19:29:18] one sec, lemme see [19:29:26] Oh. We have global palettes? [19:29:35] That would explain lots of things :-D [19:32:16] I found some matches in the limn repo. [19:32:22] Thanks for the hint! [19:34:05] sorry qchris, talking to a few people at once is not easy :) [19:34:42] Don't bother. You already brought me to the right place. Thanks! [19:38:17] ok qchris, when you find it, link me if it's easy to explain. Evan kept changing how he set that up so I'm interested what he settled on [19:38:48] ok. [19:45:26] qchris: i fixed the timestamp thing [19:45:33] we were missing my pull request merged in from upstream [19:45:39] ottomata: Awesome. [19:45:44] also, we needed the ability to set the name of the timestamp json field [19:45:46] so I added that too [19:45:49] :-) [19:45:55] Sounds great. [19:45:58] see the diff of my /home/otto/camus.wmf.properties [19:46:33] I knew I'd find an use for this bottle of vodka in my fridge. Meteorologists announce an extremely cold winter [19:48:08] in all europe [19:48:38] average: you shoudl probably keep it warm to combat t he cold [19:48:47] maybe microwave it [19:53:50] I never tried that, but I might need to do that :) [19:54:08] http://iceagenow.info/2013/09/brutal-winter-europe/ [19:54:16] brutal [20:05:41] [travis-ci] develop/4808915 (#142 by milimetric): The build has errored. 
http://travis-ci.org/wikimedia/limn/builds/11608614 [20:07:40] my parents were born in the early 50s average [20:07:51] they say the snows covered houess [20:07:52] *houses [20:08:04] and I never thought I'd live to see that, but last year proved me wrong [20:11:13] milimetric: in philadelphia ? [20:13:13] oh no! in Ploiesti [20:13:38] my dad's from Ploiesti and my mom's from Botosani [20:13:48] and I grew up in Bucuresti / Ploiesti [20:40:12] I've been thinking over this for some time, and cannot really make sense of those two graphs: [20:40:14] http://reportcard.wmflabs.org/ [20:40:22] (The one on the very bottom) [20:40:24] and [20:40:32] http://gp.wmflabs.org/graphs/active_editors_total [20:40:43] Shouldn't they match closely? [20:42:03] However the second graph overshoots the "Total" line of the first graph for the first few months [20:42:33] And for recent months, the second graph seems a tad too low :-( [20:47:50] drdee_ ^ [20:49:53] qchris: 1 sec [20:50:32] hey ottomata [20:50:37] do you think you could package https://github.com/mumrah/kafka-python ? [20:51:35] if so, I'll implement an EL writer plugin that uses it to pump data into Hadoop [20:52:09] qchris: these two numbers are different because how they are calculated [20:52:11] if you'd like me to log an RT or Mingle request, let me know [20:52:47] drdee_: You think they are both sound? [20:52:53] http://reportcard.wmflabs.org/ uses the xml dump files to calculate the active editors [20:53:12] while http://gp.wmflabs.org/graphs/active_editors_total uses the mysql db (recent changes table iirc) [20:53:17] ideally they should map [20:53:22] but of course they don't [20:53:45] I agree that they are computed differently, but they are far off. [20:53:51] how far? [20:54:03] And while the second graph shows a trend, the first one does not. [20:54:04] ori-l: cool i think I can do that [20:54:20] Depending on the month, a few thousand active editors. [20:54:23] wow [20:54:31] which chart reports more? [20:54:41] That depends on the month :-( [20:54:44] aaargh [20:54:59] mmmmmmmm this worries me [20:55:00] Recently, the mysql fed graph is lower than the other one [20:55:05] is there a .deb for python snappy? ori-l? [20:55:35] ottomata: yep. 0.3.2-1 is in our repos [20:57:06] qchris: this is probably the single most important metric we have to report [20:57:12] (PS1) Milimetric: fixed validate configuration function [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/85331 [20:57:20] (CR) Milimetric: [C: 2 V: 2] fixed validate configuration function [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/85331 (owner: Milimetric) [20:57:27] how confident are you in accuracy of the query? [20:57:41] drdee_: Totally /not/ confident. [20:58:05] milimetric: spoke with jwild about csv time series format, will create new mingle card [20:58:10] drdee_: I mean we just take the numbers that fall out of geowiki. [20:58:26] great, danke ori-l [20:58:28] dev package too? [20:58:29] right, i am afraid we need to do some due diligence [20:58:33] orr ig ues sits python [20:58:34] nm [20:58:35] :p [20:58:43] yay [20:59:16] we could eventually decide to get fancy and write python bindings for librdkafka, but that'd be just for fun [20:59:19] drdee_: Do we know how good the numbers of http://reportcard.wmflabs.org/ are? [20:59:31] those are pretty battle-proven numbers [20:59:33] drdee_: Do we know how good the cu tables are? [20:59:52] the cu tables themselves are also battle-proven [21:00:17] Ok. Then I'll try to some geowiki review. 
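For reference on the EventLogging writer plugin idea above: a rough sketch of what pushing events into Kafka with kafka-python could look like. It uses the present-day KafkaProducer interface, not necessarily the SimpleProducer/KeyedProducer API mumrah's library exposed at the time; the broker address, topic name, and sample event are placeholders.

    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="kafka-main1:9092")  # placeholder broker

    def write_event(event):
        """Serialize one EventLogging event and send it to a Kafka topic.

        Keying by schema name (an option floated later in the discussion)
        keeps each schema's events in one partition; drop key= to let the
        producer pick partitions randomly.
        """
        producer.send(
            "eventlogging",                               # placeholder topic
            key=event["schema"].encode("utf-8"),
            value=json.dumps(event).encode("utf-8"),
        )

    write_event({"schema": "ExampleSchema", "event": {"userId": 123}})
    producer.flush()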
[21:00:46] there were cu battles? [21:03:15] ori-l, ottomata: mingle card https://mingle.corp.wikimedia.org/projects/analytics/cards/1165 [21:03:35] drdee_: excellent, thanks! [21:04:43] :) [21:07:06] ottomata: should I use a keyed producer and specify the schema as a key for each event? [21:07:32] that sounds like a nice alignment of the related systems [21:07:57] (i'm asking 'cause i'm messing around with it now and just thinking out loud, not urgent or anything) [21:08:49] up to you, the key is irrelevant to me unless we want to use it for load balancing across brokers, or in case we want to be sure a particular key fails together [21:09:02] the key is useful say if you are logging a user activity stream [21:09:08] if we partition by key [21:09:10] and a single partition fails [21:09:27] you'd only ahve problems with users in that partition, but other users in other parittions would be unaffected [21:09:42] ah, okay; it's not actually useful for separating the data into separate bins for analysis? [21:09:54] just for failover strategies? [21:11:11] it could be, you can get the key when you consume [21:11:22] if you wanted to use them you can [21:11:27] cool [21:11:45] say maybe, you wanted to import the data into hive [21:11:55] but only ever queried for a single user_id at at ime [21:12:13] you could then use the key in kafka to import into hdfs directories based on user_id and use those as hive partitoins [21:12:35] that way hive MR jobs would only have to read data out of a single hdfs directory, instead of scanning the full import data [21:12:57] we're thinking about actually using a webrequest timestamp as a key for our stuff [21:13:12] this would be mostly useless for partitions in kafka, so we'd still jsut use the random partition [21:13:28] but it would allow us to save in date time buckets in hdfs without having to parse the actual message for the timestamp [21:13:42] that requires a bit of extra coding to camus though, so I haven't messed with it yet [21:29:38] DarTar: check this out https://www.facebook.com/groups/programevaluation/ [21:30:21] * DarTar doesn't do FB (and the link is private, sorry) [21:30:41] what's that? [21:30:52] i can show you in a hangout if you want [21:30:59] sure [21:31:03] in 30 min [21:31:09] yeah np [21:31:36] anyways; it's a support group for wikimetrics [21:31:58] on fb? :D [21:32:09] yes! [21:32:15] ha ha [21:32:47] I guess that's a good sign that people are excited about it and not screaming about metrics [21:37:54] drdee_: add a link to the mingle card? https://gerrit.wikimedia.org/r/#/c/85337/ [21:38:09] yup will do [21:38:11] thanks [21:38:24] ty for adding kafka support to EL, very exciting! [21:38:54] for me too [21:39:46] done [21:40:00] does that depend on mumrah's kafka-python? [21:40:06] yep [21:40:30] I just had a chat with him today about joining efforts, building his python client ontop of librdkafka [21:40:50] small world :D [21:40:53] that would be great; it's a very clean API [21:41:08] he might want to keep the pure-python implementation for pypi [21:41:28] but some manner of librdkafka integration would be cool [21:41:39] er, pypy, not pypi [21:42:03] building it on top of librdkafka would minimize dev efforts though. 
Snaps_ is building in a lot of awesome failover stuff [21:42:11] making sure that data isn't lost on broker reconfiguration [21:42:56] if it's substantially more sophisticated / robust, sure, then it might make sense to base it exclusively on librdkafka [21:43:08] i think this would be very cool all in all [21:44:00] we decided I'll get back to him when I finalize the API which should be next week. he had some Python-C guru friend who maybe was willing to help with the bindings aswell [21:45:41] Snaps_: cool; have you done Python bindings before? If not, check out http://cffi.readthedocs.org/en/release-0.7/ & http://cython.org/ [21:46:07] ori-l: nope. thanks, I'll check those out. [21:48:11] cool. librdkafka looks great btw, i've been half-following the development & packaging effort from afar. [21:49:02] thanks, its progressing slowly but well, and its getting time for a real release to let the world know about it [21:49:55] but not tonight, bed time! see you guys. [21:50:13] laterz Snaps_ [21:52:03] laatas! [21:54:31] lataaaaaas [21:54:32] i'm out too, latesr all! [22:01:48] i am out as well, laterz! [22:02:06] laterall, good weekend! [22:02:31] Enjoy the weekend!
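On the cffi suggestion near the end: a tiny, purely illustrative ABI-mode sketch of loading librdkafka from Python with cffi. The declarations are abridged from librdkafka's rdkafka.h, the soname is an assumption, and the snippet only creates and destroys a producer handle rather than producing anything.

    from cffi import FFI

    ffi = FFI()
    # Abridged declarations from librdkafka's rdkafka.h; handle types left opaque
    ffi.cdef("""
        typedef ... rd_kafka_t;
        typedef ... rd_kafka_conf_t;
        typedef enum { RD_KAFKA_PRODUCER, RD_KAFKA_CONSUMER } rd_kafka_type_t;
        rd_kafka_conf_t *rd_kafka_conf_new(void);
        rd_kafka_t *rd_kafka_new(rd_kafka_type_t type, rd_kafka_conf_t *conf,
                                 char *errstr, size_t errstr_size);
        void rd_kafka_destroy(rd_kafka_t *rk);
    """)
    lib = ffi.dlopen("librdkafka.so.1")  # library soname/path is an assumption

    errstr = ffi.new("char[]", 512)
    conf = lib.rd_kafka_conf_new()
    rk = lib.rd_kafka_new(0, conf, errstr, 512)  # 0 == RD_KAFKA_PRODUCER
    if rk == ffi.NULL:
        raise RuntimeError(ffi.string(errstr).decode())
    lib.rd_kafka_destroy(rk)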