[01:37:34] (CR) Nuria: Add annotations to graphs (3 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591 (owner: Mforns)
[02:56:33] (PS10) Nuria: Mobile apps oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/181017
[03:00:19] (PS1) Nuria: [WIP] How to send job results via email using oozie [analytics/refinery] - https://gerrit.wikimedia.org/r/182350
[03:04:32] (PS2) Nuria: [WIP] How to send job results via email using oozie [analytics/refinery] - https://gerrit.wikimedia.org/r/182350
[03:06:04] (PS3) Nuria: [WIP] How to send job results via email using oozie [analytics/refinery] - https://gerrit.wikimedia.org/r/182350
[13:49:58] (PS4) Mforns: Add annotations to graphs [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591
[13:56:37] (PS5) Mforns: Add annotations to graphs [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591
[13:59:12] (CR) Mforns: Add annotations to graphs (3 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591 (owner: Mforns)
[14:06:47] (CR) Mforns: "I could not manage to find a CSS that adapts the graph box and graph to the available height in both cases (with annotations and without)." [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591 (owner: Mforns)
[15:38:19] mforns: twix is cool!
[15:48:55] (CR) Milimetric: [C: 2 V: 2] "looks good - a few comments left inline as random thoughts, nothing to improve right now." (3 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591 (owner: Mforns)
[17:55:10] okay, this cascading stuff is...way out of my comfort zone :/
[17:55:12] * Ironholds sighs
[17:55:20] I wish I was motivated to do ANYTHING this week.
[18:58:41] Ironholds: .... tumbleweed
[19:16:28] no one here today
[19:16:34] :]
[19:25:11] (CR) Mforns: Add annotations to graphs (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591 (owner: Mforns)
[19:25:49] (CR) Bmansurov: Update scripts in light of recent changes (1 comment) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/181428 (owner: Jdlrobson)
[19:26:50] (CR) Milimetric: Add annotations to graphs (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/181591 (owner: Mforns)
[19:27:22] Ironholds: what cascade? You washin' dishes man?
[19:27:55] it's a mapreduce framework ottomata recommended
[20:03:09] guys, leaving now, happy new year!
[20:03:22] Happy new year to you, too!
[20:14:24] milimetric: Hi. I was wondering if you had a chance to review my script about anomaly detection?
[20:14:36] bmansurov: thanks for the ping!
[20:14:42] I haven't yet
[20:14:58] ok
[20:15:09] I'll ping you next week again ;)
[20:15:10] I'm off in a bit, but I will put it in my calendar to do Monday morning - does that miss the Sprint deadline you were shooting for?
[20:15:27] milimetric: no, it's fine. there is no deadline for that script
[20:16:13] thanks and happy holidays
[20:16:14] ok bmansurov, good deal, if you don't have my review Monday morning at 08:00 PST, please ping me again.
[20:16:19] ok
[20:16:19] Happy New Year! :)
[20:18:14] Anyone interested in looking at a weird hadoop error with me?
[20:18:22] org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
[20:18:26] Should not be possible.
[20:19:35] It's possible that I am writing out a mega-superduper big JSON blob.
[20:19:54] 64 MB is the biggest I have seen.
[20:29:42] halfak: hadoop operations require space not just on HDFS, but on hadoop.tmp.dir, which could be full on one or more individual nodes
[20:29:46] the log directory could be full, too
[20:29:58] Yeah. I've been looking into that.
[20:30:09] I'm not sure if the logs appear in hdfs.
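The point above, that a job can hit "No space left on device" because a node's local scratch or log directory is full even when HDFS has room, can be checked per node with a small sketch like the one below. It is only an illustration: the class name `LocalSpaceCheck` and the helper `hasAtLeast` are hypothetical, and the directories to inspect (for example whatever `hadoop.tmp.dir` resolves to on a given node) are assumptions to be supplied on the command line.

```java
import java.io.File;

// Hypothetical helper: reports usable space on local directories of one node.
final class LocalSpaceCheck {

    // True if the partition holding 'dir' has at least 'requiredBytes' usable.
    // File.getUsableSpace() returns 0 when the path does not name a partition.
    static boolean hasAtLeast(File dir, long requiredBytes) {
        return dir.getUsableSpace() >= requiredBytes;
    }

    public static void main(String[] args) {
        // Directories to check are passed as arguments; the JVM temp dir is
        // used as a stand-in when none are given (an assumption for the demo).
        String[] paths = args.length > 0
                ? args
                : new String[] { System.getProperty("java.io.tmpdir") };
        for (String path : paths) {
            File dir = new File(path);
            long usableMiB = dir.getUsableSpace() / (1024L * 1024L);
            System.out.printf("%s: %d MiB usable%n", dir, usableMiB);
        }
    }
}
```

This only inspects the machine it runs on, so it would have to be run on each worker node separately.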
[20:30:25] A simple dh on the mounted dir suggests there are TBs for that.
[20:30:32] But the local machines could be running low.
[20:30:54] It's hard to tell. As best as I can figure out, the actual blobs that are output right before the crash are ~2-3 MB
[20:34:00] They are vandalism edits for sure. It looks like someone is trying to test the limits of what MediaWiki will let you save.
[20:39:17] Found something else that might be relevant. "Could not find any valid local directory for output/attempt_1415917009743_51745_m_000001_0/file.out"
[20:41:23] OK. I'm thinking this is the temp space used on the machines between map and reduce. Will send my thoughts at ottomata.
[20:43:54] The input files are 11TB!
[20:44:01] That's probably related.
[21:02:52] * halfak trims off some superfluous data
[21:12:58] and so, to avoid building issues
[21:13:06] I've set up this project with Maven
[21:13:14] ...somehow I suspect I'm going to regret this
[21:17:50] I don't think I actually know enough about Hadoop to make this work. Baaah.
[21:18:14] halfak, do you have any of your mapreduce examples up on github so I can thieve?
[21:18:39] Sure. :)
[21:18:53] But it might be easier to direct you to the spot on stat2
[21:19:25] totally!
[21:19:59] Will you be streaming and would you like to run in python?
[21:20:08] * halfak has a few examples.
[21:24:49] Ironholds, ^
[21:25:04] streaming and no, Java!
[21:25:07] (hisssss, I know)
[21:25:13] (but I figure if I have to learn one new tech..)
[21:25:19] Oh. Then you'll probably want to implement your mapper and reducer directly.
[21:25:24] I don't have examples for that.
[21:26:24] oh, you're using Hadoop's inbuilt stuff?
[21:27:51] Nope. I'm using its streaming interface. If you're in java land, you probably want to use hadoop's stuff.
[21:28:13] I haven't written java for hadoop since I was playing with Flume at the google.
[21:28:23] So, I don't know if I'll be much help.
[21:37:09] halfak, okie! Otto has got me working with Cascading
[21:37:22] I'm just setting up for session analysis via map/reduce tasks, because I had a rare moment of brilliance.
[21:37:41] namely, realising that mappers are designed to return key/value tuples, and a UUID and a timestamp is a key/value pair.
[21:37:48] Yes :)
[21:37:50] so we can get the mappers to not only find the value locations but also extract them
[21:38:08] the interesting bit is going to be making sure reducers have THEIR work distributed by the key. hrm.
[21:38:11] And you can use the sort/shuffle that happens between map and reduce to read in a user's events in order
[21:38:16] partitioner
[21:38:19] yup!
[21:38:22] I know how to do this in streaming :)
[21:38:36] for the time being, though, my problem is "turning a varnish dt into a long". I think I have cracked it, but we shall see.
[21:38:47] Say, you've worked with Java. Can I ask a language design question?
[21:38:53] It may be a bit stupid.
[21:41:14] Sure.
[21:42:23] so, I instantiate a class with a method, Foo
[21:42:28] Foo contains a string definition, Bar
[21:42:53] is Bar regenerated every time Foo is called, or is there, even without a call to "static", some consistency between runs?
[21:43:19] the actual problem is I'm trying to work out if I'd take a performance hit from putting something inside a method (where it lives), versus outside a method (where it doesn't live, but would only be defined once per instantiation)
[21:43:21] When you say "string definition" do you mean a member variable or a class variable?
[21:44:01] Is this a constant value you'll be drawing from a lot?
[21:45:20] member...variable versus class variable?
[21:45:23] and yes, but only within one method
[21:45:44] I can show you the code if that'd be more helpful ;p
[21:46:05] I think it would :)
[21:47:06] awesome! One mo while I throw it up on le githubs
[21:51:46] Analytics, Multimedia: Update Multimedia dashboards to use datasets.wikimedia.org instead of stat1001.wikimedia.org - https://phabricator.wikimedia.org/T85250#951319 (Tgr) Open>Resolved Forgot to close this.
[21:55:11] halfak, so https://github.com/Ironholds/sessions/blob/master/src/main/java/org/wikimedia/analytics/adhoc/mapper.java for example
[21:55:18] line 38; the date format.
[21:55:43] There's going to be a public method that retrieves a line, checks it's an app line, extracts the UUID and dt if so, and then turns the dt into a long. All nice and simple.
[21:55:52] But that's in one HELL of a for loop.
[21:56:21] so I'm wondering about Java's approach to variables declared within methods. If I declare varnishSDF in parseDT, is it going to be created anew every time parseDT is called?
[21:56:35] if so, I'm better off defining it as a standalone variable so it's only created when the class is instantiated
[21:56:46] if not, I should throw it into parseDT because it makes more sense living there
[21:57:00] Ironholds, that's right. You should expect all variables to be cleared when their scope is cleared.
[21:57:08] yay, I can computer science!
[21:57:17] so I should keep it where it is right now. Thanks!
[21:57:37] yes.
[21:57:56] The best thing about this is that it will be defined only once for the class.
[21:58:18] So even if multiple instances of the mapper are instantiated, there will be only one varnishSDF.
[21:59:13] BTW, if you are going to get hadoop to sort based on a millisecond timestamp, you'll need to tell it to sort like a number -- as opposed to a string.
[21:59:35] > "90" > "1000"
[21:59:35] [1] TRUE
[22:00:09] milimetric: hey, where are wikimetrics logs saved? /var/log/apache2/error.log doesn't have any info :(
[22:00:12] I'm not sure how to do that in java hadoop land.
[22:00:16] Ironholds, ^
[22:00:41] halfak, awesome! Alternately I can store it as an int by /1000
[22:00:53] which I probably want to do because ints are smaller and I really don't give a flying crap about milliseconds
[22:01:06] varnish dts don't contain them these days, so...
[22:01:29] in fact, I'll do that. Thanks for the mental prompt!
[22:02:18] Ironholds, same problem. But it shouldn't matter for a number of years -- unless you are doing historical processing.
[22:02:30] oh totally, I'll still need to sort
[22:02:45] and yeah, it's what, 2038 before this becomes an issue?
[22:02:46] Looks like Sunday, Nov 20th of 2286 we'll get another digit in our timestamp
[22:02:50] hah
[22:02:52] :)
[22:03:21] If your code is still running then. Well. Good, I guess :)
[22:08:55] if my code is still running then I think everyone in analytics engineering should be fired
[22:08:59] but that's my impostor syndrome talking
[22:19:57] milimetric, what does our Java styleguide say about if/else statements?
[22:20:08] what about them?
[22:20:33] All on the same line if(...){...}else{...}
[22:20:40] :@
[22:21:02] ^ agreed
[22:21:09] semicolons are the devil
[22:21:16] :)
[22:24:09] huh
[22:24:20] and yet we demand {} be split up for function and class defs?
[22:24:30] wait
[22:24:34] oh, you're trolling me. Fuck you ;p
[22:24:44] Whatever, I'll finish this blob of code and y'all can tell me what to change.
[22:36:01] https://github.com/Ironholds/sessions/blob/master/src/main/java/org/wikimedia/analytics/adhoc/mapper.java#L28-L35 critique away!
[22:37:00] Ahh! There's a line break before the curly bracket.
[22:37:08] * halfak has no idea what our style is
[22:38:22] I've been told line breaks go before curly brackets to make the presence of missing brackets obvious
[22:38:32] but that was in method definitions; it seems dumbass to apply that to if-statements
[22:41:39] halfak, do we have a gerrit repo for like...miscellaneous things?
[22:41:59] not that I know of.
[22:42:28] okie-dokes!
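The timestamp points from the exchange above (a date format created once per class rather than once per call, storing seconds instead of milliseconds, and string versus numeric ordering) can be sketched roughly as follows. This is not the code in the linked mapper.java: the class name `TimestampSketch`, the method name `parseDtToEpochSeconds`, and the `yyyy-MM-dd'T'HH:mm:ss` UTC layout of the Varnish dt are all assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.TimeZone;

// Hypothetical sketch; names and the dt layout are assumptions.
final class TimestampSketch {

    // Assumed dt layout, e.g. "2014-12-31T21:55:11", interpreted as UTC.
    private static final String ASSUMED_DT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    // A static field is initialised once per class load, not once per call
    // and not once per instance. SimpleDateFormat is not thread-safe, so the
    // parse method below is synchronized.
    private static final SimpleDateFormat VARNISH_SDF = newFormat();

    private static SimpleDateFormat newFormat() {
        SimpleDateFormat sdf = new SimpleDateFormat(ASSUMED_DT_PATTERN);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        sdf.setLenient(false);
        return sdf;
    }

    // Converts a dt string to whole seconds since the Unix epoch.
    static synchronized long parseDtToEpochSeconds(String dt) {
        try {
            return VARNISH_SDF.parse(dt).getTime() / 1000L;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable dt: " + dt, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDtToEpochSeconds("2014-12-31T21:55:11")); // 1420062911

        // Lexicographic order puts "90" after "1000"; numeric order does not.
        System.out.println("90".compareTo("1000") > 0);   // true
        System.out.println(Long.compare(90L, 1000L) > 0); // false

        // Largest second count that fits in a signed 32-bit int.
        System.out.println(Instant.ofEpochSecond(Integer.MAX_VALUE)); // 2038-01-19T03:14:07Z
        // First second count that needs eleven decimal digits.
        System.out.println(Instant.ofEpochSecond(10_000_000_000L));   // 2286-11-20T17:46:40Z
    }
}
```

Keeping the value in a `long` sidesteps the 2038 limit of a 32-bit `int`, at the cost of four extra bytes per record; whether that trade matters depends on the job.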
[22:45:45] I still have no idea what to do with MapReduce
[22:45:52] but, you know, most of this code is agnostic as to how the data shows up
[22:45:57] so I can leave that for when Otto is back
[23:05:28] bye everyone! Ironholds: I officially don't care about if/else formatting until Jan 5th, 2015. At which point I shall be very picky again.
[23:05:29] <3
[23:05:35] Happy New Year!!!
[23:06:09] kk
[23:25:08] happy new year, milimetric!
[23:56:17] (PS1) Bmansurov: Check ownership before adding tag to cohort [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/182391
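The session-analysis idea raised earlier in the log (emit a UUID as key and a timestamp as value, let the shuffle group and order each user's events, then reduce per user) can be simulated in plain Java without any Hadoop or Cascading dependency. The sketch below is only a stand-in for that flow: the class `SessionSketch`, the tab-separated `uuid<TAB>epochSeconds` input layout, and the 1800-second gap threshold used in the demo are all assumptions, not anything stated in the conversation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of map -> shuffle/sort -> reduce.
final class SessionSketch {

    // Counts sessions per UUID; a new session starts whenever the gap
    // between consecutive events exceeds gapSeconds.
    static Map<String, Integer> countSessions(List<String> lines, long gapSeconds) {
        // "Map" and "shuffle": extract (uuid, timestamp) and group by uuid.
        Map<String, List<Long>> byUuid = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            byUuid.computeIfAbsent(fields[0], k -> new ArrayList<>())
                  .add(Long.parseLong(fields[1]));
        }

        // "Reduce": each key sees its own timestamps in numeric order.
        Map<String, Integer> sessions = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : byUuid.entrySet()) {
            List<Long> timestamps = entry.getValue();
            Collections.sort(timestamps); // numeric sort, not string sort
            int count = 1;
            for (int i = 1; i < timestamps.size(); i++) {
                if (timestamps.get(i) - timestamps.get(i - 1) > gapSeconds) {
                    count++;
                }
            }
            sessions.put(entry.getKey(), count);
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a\t100", "a\t90", "a\t5000", "b\t1000");
        System.out.println(countSessions(lines, 1800L)); // {a=2, b=1}
    }
}
```

In a real job the grouping and ordering would be done by the framework's partitioner and sort comparator rather than by an in-memory map, which is the part the log leaves open.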