[12:35:55] good morning
[13:03:17] moooooorning
[13:08:22] hey ottomata, are we currently importing the stat1 sampled1000 files into kraken?
[13:29:09] hiii morning
[13:29:09] i am not
[13:29:22] man i got a new o ring for my machina coffee maker this morning
[13:29:25] its being real weird!
[13:29:42] evan was importing those, right?
[13:34:12] ottomata, yes evan was doing that but we need to make that more structural, see https://mingle.corp.wikimedia.org/projects/analytics/cards/408 and please add your thoughts :)
[13:40:32] wait, what are my thoughts?
[13:40:33] agh
[13:40:35] "ok"
[13:40:36] are my thoughts
[13:40:44] do I need to think about how to do it right now?
[13:40:44] :)
[14:08:37] ottomata: unfortunately, the sampled1000 files are needed now so we can deliver *some* kind of metrics on mobile apps
[14:09:19] so either importing them or some awful hack to allow my pig to read them
[14:09:45] we can import them
[14:09:52] my thoughts are "ok"
[14:10:02] just gotta think about the best way to do it, but its certainly possible
[14:10:29] :)
[14:11:03] i'd be curious what some of the bad ways are, i'm here if you wanna brain bounce
[14:16:10] well, they are gzipped files that live on stat1
[14:16:19] ideally we'd just install the hadoop client on stat1
[14:16:22] and load them up
[14:16:26] but, we certainly can't do that right now
[14:16:28] so
[14:16:37] we have to sync them to a holding area in the analytics cluster
[14:16:42] and then zcat them one by one into hadoop
[14:16:54] there's no real hadoop rsync (oo, will google for that?)
[14:16:56] so
[14:17:08] we'd have to code in detection to figure out what files need to go in
[14:17:57] why don't we just import sampled data from feb 1 2013 onwards
[14:18:05] that's fine, still the same problems
[14:18:06] that way all data is tab delimited
[14:18:09] if you want it to be regular
[14:18:27] webhdfs?
[14:19:11] i guess you can put with that, welcome to ops review nightmare though
[14:19:34] just cronjob?
[14:19:36] brb
[14:20:02] yeah, probably cronjob on an27 or an10
[14:20:06] rsyncs from stat1
[14:20:08] to a place there
[14:20:17] then, foreach file
[14:20:20] check if in hdfs
[14:20:21] if not
[14:20:24] hadoop fs -put
[14:20:39] so we don't duplicate files, is that the problem we're solving?
[14:21:02] so hadoop fs -rsync would be ideal
[14:21:19] ja, no existy
[14:21:40] what I do for the couple of syncs I have right now
[14:21:51] is delete everything, and then re put the entire directory
[14:21:58] this is fine for jars and pig and oozie files
[14:22:01] not so much for huge logs like that
[14:22:09] also, these files are gzipped
[14:22:20] so we have to zcat them into hadoop fs -put
[14:22:41] http://dl.acm.org/citation.cfm?id=2355549
[14:22:42] :)
[14:23:02] but that looks theoretical more than something we can use
[14:23:16] for file in ./*.gz; do
[14:23:17] if [ -n $(hadoop fs -ls path/in/hdfs/$file) ]; then
[14:23:17] zcat file | hadoop fs -put - path/in/hdfs/$file;
[14:23:18] done
[14:24:00] cool, that works.
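[editor's note] The loop pasted at 14:23 sketches the right idea but has a few rough edges as typed: the `-n` test fires when the file is already in HDFS (the conversation wants the opposite), the `fi` is missing, and `zcat file` should be `zcat "$file"`. Below is a hedged, cleaned-up sketch of the rsync-plus-put cron job discussed above; the host, rsync source, holding area, and HDFS target paths are illustrative assumptions, not the job that was actually deployed.

```bash
#!/bin/bash
# Sketch of the stat1 -> Kraken sampled-1000 import discussed above.
# Paths and the rsync source are placeholders, not the real deployment.
set -e

LOCAL_DIR=/a/sampled-1000                  # assumed holding area on an27/an10
HDFS_DIR=/wmf/data/external/sampled-1000   # assumed target directory in HDFS

# 1. Pull new sampled log files from stat1 into the holding area.
rsync -av stat1.wikimedia.org:/path/to/sampled-1000/ "$LOCAL_DIR/"

# 2. For each gzipped file, put it into HDFS only if it is not already there
#    (this check is what stands in for the missing "hadoop fs -rsync").
for file in "$LOCAL_DIR"/*.gz; do
    base=$(basename "$file")
    # hadoop fs -test -e exits 0 when the path exists, non-zero otherwise
    if ! hadoop fs -test -e "$HDFS_DIR/$base"; then
        # The .gz files can be put as-is, since Pig reads gzip input directly;
        # to store them uncompressed instead, use:
        #   zcat "$file" | hadoop fs -put - "$HDFS_DIR/${base%.gz}"
        hadoop fs -put "$file" "$HDFS_DIR/$base"
    fi
done
```

Run from cron on an27 or an10 as proposed at 14:20, this gives the "permanent cron" that comes up again later in the log; once the unsampled data flows into Kraken directly, the job can simply be removed.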
[14:36:37] drdee, ottomata, dschoon: kraken jar deployment
[14:37:03] currently, if I change something, there are no tests to let me know how much of a difference a deployment would make
[14:37:25] I think we need that going forward, I think it's pretty crucial
[15:14:19] milimetric: there are tests
[15:14:30] run mvn install
[15:14:31] you mean the unit tests?
[15:14:34] yes.
[15:14:42] yeah :)
[15:14:46] not even close
[15:14:48] we *do* need more
[15:14:54] training sets where output is compared
[15:14:55] it's the wrong kind of test
[15:14:57] yep
[15:14:58] that's what PigUnit is for
[15:15:03] yep
[15:15:14] i agree we need to build that capability
[15:15:26] cool
[15:21:44] milimetric: btw, what was the result of the conversation about mobile app traffic?
[15:22:22] we're going to get it from the sampled logs
[15:22:39] andrew's importing those into kraken from stat1
[15:22:43] uh
[15:22:44] via that lovely for loop a page up
[15:22:51] (zcat)
[15:22:55] yea
[15:23:06] so we're going to have a permanent cron?
[15:23:11] the idea is - once the real data's in, we replace it
[15:23:17] hm
[15:23:20] yeah, permanent cron
[15:23:23] ick.
[15:23:24] but yes.
[15:23:26] indeed
[15:23:32] having the data there is good, at least
[15:23:40] it's ... something
[15:23:50] it's another option
[15:25:23] btw, ottomata milimetric you don't have to unzip the files
[15:25:35] erosen says PigStorage() handles gzip transparently
[15:25:36] pig will read into the zips?
[15:25:42] ooh, fancy
[15:25:51] otto: zcat's not necessary then
[15:25:52] yup
[15:26:00] i haven't tried it, but that feels right to me
[15:26:15] would it save processing time though?
[15:26:17] it might...
[15:26:22] *shrug*
[15:26:33] hmmmmm
[15:26:36] i wouldn't worry that much about it
[15:26:40] i thought gzip in hdfs was a problem
[15:26:54] imo, streaming the bytes uncompressed would be bad
[15:26:56] some problem with hdfs block size
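[editor's note] erosen's point at 15:25 is standard Hadoop behaviour: input with a `.gz` extension is decompressed on read, so PigStorage() can load the sampled files as-is. A quick way to confirm this for a file already in HDFS, without writing a Pig job, is sketched below; the path and file name are placeholders.

```bash
# hadoop fs -text decompresses .gz (and other known codecs) on the fly;
# if this prints log lines, PigStorage() will read the same file directly.
# The path below is an illustrative placeholder.
hadoop fs -text '/wmf/data/external/sampled-1000/sampled-1000.log-20130401.gz' | head -5
```

The real caveat behind the "gzip in hdfs was a problem" remark is that gzip is not splittable: each .gz file is read by a single mapper regardless of block size, which is fine for modest sampled logs but matters for very large files.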
[15:27:06] oh, speaking of, ottomata
[15:27:13] we should deploy some conf changes today
[15:27:23] i have a list :P
[15:27:33] i saw you committed more of the conf files
[15:27:42] so i'll make changes, and then we can talk about deployment
[15:27:46] maybe even with dsync!
[15:28:48] also, ottomata, i committed shell scripts for hadoop-streaming and coalesce
[15:29:06] https://github.com/wikimedia/kraken/blob/master/bin/jobs/coalesce
[15:30:09] ok cool!
[15:30:26] last night i ran a bunch of backfills for the device-props data
[15:30:32] https://github.com/wikimedia/kraken/blob/master/bin/jobs/device-props-backfill.sh
[15:30:39] and they all worked, so
[15:30:46] easy to turn lots of files into one file now
[15:31:39] brb, picking up food
[15:31:55] brb, putting on pants
[15:33:38] brb, brushing teeth
[17:04:01] average_drifter scrum?
[17:04:07] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[17:04:30] kraigparkinson:
[17:04:32] coming now
[17:34:11] ottomata: hurry up@
[17:34:19] microwave faster!
[18:42:53] yay, cool! udp2log packet loss is now being monitored by icinga!
[18:43:02] on analytics nodes
[18:43:24] we should get alerts if they have udp2log detectable packet loss
[18:44:29] NICE!
[18:44:55] ottomata I cleaned up https://mingle.corp.wikimedia.org/projects/analytics/cards/134 and moved hadoop applications to https://mingle.corp.wikimedia.org/projects/analytics/cards/506
[18:45:48] nice, like it
[19:05:47] New patchset: JGonera; "Add requirements.txt" [analytics/limn-mobile-data] (master) - https://gerrit.wikimedia.org/r/57129
[19:19:15] ottomata: by "analytics nodes", you mean oxygen/locke/etc or anXX?
[19:21:19] no
[19:21:21] an03-06
[19:21:30] ah, ok
[19:21:37] and that's all our udp2log importers?
[19:21:39] yes
[19:21:54] i'm also delving into the disk alert stuff, most stuff is monitored,
[19:21:58] ganglia only shows aggregate usage
[19:22:07] but icinga should alert if any mounted disk starts to fill up
[19:22:12] trying to make a test case now
[19:22:41] cool
[19:25:07] just a reminder to everyone on the cluster
[19:25:38] if you were to import a bunch of crap into your home directory, kindly move it someplace so we can all use it, or clean it up when you're done
[19:25:47] i'm looking at everyone with 1TB+ in /user
[19:26:17] :)
[19:26:18] that is all
[19:31:33] oh, drdee, if you talk to that cloudera guy
[19:31:36] ask him about who uses oozie
[19:37:16] dschoon: removed spurious server comments from https://mingle.corp.wikimedia.org/projects/analytics/cards/367
[19:53:31] http://www.atlassian.com/jirajr
[19:53:39] heh
[19:53:41] kraigparkinson: that's for you.
[19:53:48] i move we switch from mingle.
[19:55:26] LOL
[19:56:09] did you watch the video?
[19:56:20] yeah, skimmed it
[19:56:28] "who wants to stay up late for some backlog grooming?!?!"
[19:56:31] eerie in familiarity.
[19:59:20] milimetric, average_drifter: i updated https://raw.github.com/wikimedia/metrics/master/pageviews/new_mobile_pageviews_report/pageview_definition.png as per our discussion yesterday
[19:59:30] cool
[20:04:28] kraigparkinson: i am in the hangout and brewing some coffee
[20:04:49] drdee, sorry, didn't mean to stand you up, be right there.
[20:11:34] ottomata: no analytics1026 alert for me
[20:11:51] yeah, hm, the alert def fired in icinga
[20:11:55] just no email
[20:47:00] New patchset: JGonera; "Fix upload errors graph for Web" [analytics/limn-mobile-data] (master) - https://gerrit.wikimedia.org/r/57186
[20:49:33] Change merged: JGonera; [analytics/limn-mobile-data] (master) - https://gerrit.wikimedia.org/r/57186
[20:51:21] New review: Yuvipanda; "lgtm, but remember that if we've to deploy it it needs to go into puppet" [analytics/limn-mobile-data] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/57129
[20:51:22] Change merged: Yuvipanda; [analytics/limn-mobile-data] (master) - https://gerrit.wikimedia.org/r/57129
[20:57:32] so kraigparkinson, drdee, dschoon, I updated the mingle monitoring problem cards
[20:57:44] awesome
[20:57:46] will check it out
[20:58:24] thanks ottomata
[20:58:56] basically: everything is ok.
[21:26:58] YuviPanda: JGonera's patchset above doesn't have anything to do with puppet right?
[21:27:13] milimetric: none at all. it's for local testing
[21:27:23] puppet will just run the scripts as they are in the latest version of the limn-mobile-data repository
[21:27:59] k, cool
[21:30:58] yeah
[21:40:54] back
[21:46:20] ottomata, what's left to be done for 460?
[22:01:33] milimetric: updated https://mingle.corp.wikimedia.org/projects/analytics/cards/378
[22:03:44] erosen: you about?
[22:03:53] in the zero analytics meeting
[22:04:10] k
[22:04:18] *eyebrow*
[23:01:36] i just had a crazy idea
[23:01:40] and it will be AWESOME if it works.
[23:04:22] uh oh
[23:04:48] * milimetric checks nuclear shelter for structural integrity
[23:16:47] [travis-ci] master/87b285e (#116 by milimetric): The build passed. http://travis-ci.org/wikimedia/limn/builds/6001374
[23:32:10] robla: don't we have a meeting?
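[editor's note] For the "1TB+ in /user" reminder at 19:25 above, a one-liner like the following would identify the offenders; the 1 TB threshold is just the figure from the reminder, and the output columns assume the usual `size ... path` layout of `hadoop fs -du`.

```bash
# Aggregate HDFS usage per home directory, largest first, keeping only
# directories over roughly 1 TB ($1 is bytes, $NF is the path).
hadoop fs -du -s '/user/*' | sort -rn | awk '$1 > 1e12 {printf "%.1f GB\t%s\n", $1/1024/1024/1024, $NF}'
```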
[23:53:53] dschoon:
[23:54:02] take a look at the sources for http://test-reportcard.wmflabs.org/graphs/pageviews
[23:54:05] (JS)
[23:54:08] !!!
[23:54:51] i've added the metrics-def stuff into master but it thinks there are multiple files. I'm knee deep in user metrics stuff right now and can't think
[23:55:14] I couldn't recreate locally
[23:56:31] is it something sinister with these files never expiring from cache and therefore the browser remembers and continues to fetch them?
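[editor's note] A quick way to test the stale-cache theory raised at 23:56 would be to inspect the response headers for the JS files the page loads; the URL below is the graph page mentioned above, and the actual script URLs would need to be substituted in.

```bash
# Fetch only the HTTP headers and show the caching directives; a missing
# or far-future Expires/Cache-Control header would support the theory
# that the browser keeps serving the old file list from cache.
curl -sI "http://test-reportcard.wmflabs.org/graphs/pageviews" \
  | grep -iE '^(cache-control|expires|etag|last-modified):'
```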