[07:04:53] New review: Tim Starling; "(4 comments)" [analytics/log2udp2] (master) C: -1; - https://gerrit.wikimedia.org/r/58449
[12:34:29] mooooooorning
[12:41:39] drdee: morning
[13:42:07] yoooo
[13:46:20] yooo
[13:47:18] done
[13:47:19] :)
[13:47:51] done what?
[13:48:14] merged analinterns removal :)
[13:48:18] oh
[13:48:31] thx
[13:48:42] drdee, i'm hanging out with babies too!
[13:48:50] 2 twin 1 year olds
[13:48:51] awesome!
[13:48:56] ooo wow
[13:48:58] this morning we played with blocks
[13:49:05] and it is nap time
[13:49:09] dad is working on that one
[13:49:22] hard to put two down at the same time i think
[13:49:23] definitely!
[13:49:28] herding cats is easier
[13:49:52] gonna brew some coffee now
[13:52:47] mark reverted it
[13:53:15] hm, ok
[13:53:24] 'nostalgia'
[13:57:31] ottomata, brain bounce?
[13:57:49] it's about a bug regarding page view count and SSL
[13:58:18] sure
[13:58:33] in standup hangout?
[14:00:08] yup
[14:29:30] changing locs, back in 20ish
[14:35:23] wrote a script to run the wikistats reports
[14:35:27] it's called basic-count.sh
[14:35:32] started it on stat1
[15:00:06] milimetric, around?
[15:00:13] yep
[15:00:19] good mooorning!
[15:00:23] morning :)
[15:01:02] i'm just finishing breakfast and was gonna look at the pig stuff again, use the sampled data
[15:02:21] did you talk with dschoon about #539?
[15:02:28] that one should go first :)
[15:05:42] he said he'd do it on Friday
[15:05:53] I'll check in with him, but he mentioned he already knew the problem
[15:08:09] k
[16:00:34] drdee: ran it
[16:00:46] ncie
[16:00:49] reports all have wrong numbers
[16:00:54] zero all over
[16:00:55] less nice
[16:00:59] ok
[16:01:01] weird data in most of them
[16:01:09] I don't know when this happened
[16:01:09] that sucks
[16:01:44] ottomata, can you quickly help average with setting up a place on kripke where we can publish beta reports of wikistats?
[16:03:06] you just need a publically hosted thingee?
[16:03:26] yes, plain html files
[16:04:42] http://analytics.wmflabs.org/wikistats/
[16:04:42] /srv/analytics.wmflabs.org/wikistats
[16:04:46] how'zat?
[16:04:54] awesome!
[16:05:00] :)
[16:05:02] average: do you have access to kripke?
[16:05:17] the dir is group writeable by wikidev
[16:05:22] so if you do you should be able to put files there
[16:05:31] what do I ssh into ?
[16:05:46] drdee: probably, not sure
[16:05:47] the same way as build1
[16:05:56] ssh spetrea@kripke.wikimedia.org ?
[16:05:56] but kripke instead
[16:06:01] it's running on labs
[16:06:01] oh
[16:06:01] no
[16:06:15] ssh spetrea@kripke.wmflabs.pmtpa ?
[16:06:16] try :)
[16:06:35] ja
[16:06:57] I am on kripke
[16:07:11] kool
[16:07:13] drdee , ottomata are the big .gz files on kripke ?
[16:07:19] no
[16:07:27] will they be transferred ?
[16:07:31] and you cannot copy them to kripke either
[16:07:32] NO!
[16:07:35] ok
[16:08:27] so I run stuff on stat1 and publish results on kripke
[16:08:30] is that correct
[16:08:31] ?
[16:10:54] drdee: ?
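The workflow being settled here — generate reports on stat1, then push only the rendered HTML to the group-writable docroot on kripke — could be scripted along these lines. Only the hostnames and the /srv/analytics.wmflabs.org/wikistats path come from the conversation; the `publish_cmd` helper, the rsync flags, and the default username are illustrative assumptions.

```python
# Sketch of the stat1 -> kripke publish step discussed above.
# DOCROOT comes from the chat; everything else is a hypothetical helper.
DOCROOT = "/srv/analytics.wmflabs.org/wikistats"


def publish_cmd(report_dir: str, user: str = "spetrea") -> list[str]:
    """Build the rsync invocation that would ship one report directory.

    The trailing slash on the source copies the directory's contents
    rather than nesting the directory itself inside DOCROOT.
    """
    return [
        "rsync", "-av",
        report_dir.rstrip("/") + "/",
        f"{user}@kripke:{DOCROOT}/",
    ]
```

Run via `subprocess.run(publish_cmd("/home/spetrea/reports/r52"), check=True)` from stat1; the big .gz source files themselves stay put, per the emphatic "NO!" above.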
[16:16:10] correct :)
[16:16:24] ok
[16:26:18] I'm going to go out to get some stuff
[16:26:25] I'll be back
[16:39:06] hey ottomata, I'm testing some pig jobs
[16:39:07] oo
[16:39:08] cool
[16:39:09] and they failed and I'm trying to look at: http://localhost:8088/proxy/application_1364239892421_11509/
[16:39:29] but if i put that in my browser, it redirects to: http://analytics1010.eqiad.wmnet:19888/jobhistory/job/job_1364239892421_11509/
[16:39:32] and says it can't find it
[16:39:33] yup
[16:39:38] you have to type in localhost everytime it redirects
[16:39:46] and you need the history tunnel running too
[16:39:47] ktunnel history
[16:39:50] gotcha
[16:39:50] ok
[16:40:05] pretttyyyy annoying, i know
[16:40:06] i must've always had all of them on
[16:40:14] no it's fine, just didn't know
[16:41:35] hm, so for that job, I have some basic error output but nothing shows up in the link
[16:41:41] like, no stack trace
[16:41:58] the basic error output is in my grunt session I mean
[16:42:51] milimetric: sometimes it's useful to look at the mapper-/reducer-specific logs
[16:43:22] which live somewhere on hdfs like /hadoop-yarn/users/milimetric/...
[16:43:37] it's a "Failed to read data from <>" error
[16:43:46] i see
[16:43:55] ok, cool, i'll take a look there
[16:43:55] thx
[16:46:31] hm, uh, erosen, where would hadoop-yarn be?
[16:46:33] it's not in /
[16:48:04] sorry one sec
[16:51:46] milimetric: looking now
[16:52:09] not a prob. I found the error message, it was in a local log file in my home directory in this case
[16:52:28] but it'd be interesting to find out what you mean about the other kinds of logs
[16:52:43] yeah, I've found them to be pretty useful at atimes
[16:53:42] mornin
[16:53:43] milimetric: check out this: http://localhost:8888/filebrowser/view/user/erosen#/var/log/hadoop-yarn/apps/dandreescu
[16:53:48] morning
[16:54:18] milimetric: i've added this to my /etc/hosts file:
[16:54:29] 127.0.0.1 analytics1010.eqiad.wmnet analytics1027.eqiad.wmnet
[16:54:40] oh ok, var/log
[16:54:48] but i don't think i have filebrowser privileges
[16:54:56] really?
[16:55:07] that means i don't have to replace the stupid URL to look at AM logs
[16:55:30] gotcha dschoon, thx
[16:55:49] but yeah erosen, it's because my labs and hue usernames are different
[16:55:54] aah
[16:55:55] i see
[16:56:04] we raised this as an issue a long time ago but nobody's addressed it yet
[16:56:06] well i guess this is another motivation to sort that out somehow
[16:56:15] though at the time everyone seemed very much in agreement it should be
[16:56:39] i guess until then I'll just hadoop fs -ls around like a mole :)
[16:57:54] milimetric: what specifically are you looking for?
[16:58:32] the three links i typically use to find jobs and logs are:
[16:58:34] milimetric: also, not sure how this deals with permissions, but the grunt shell has decent file navigation tools
[16:58:41] http://localhost:8088/cluster/scheduler for currently running jobs
[16:58:43] dschoon:
[16:58:45] I know
[16:58:46] plus http://localhost:8088/cluster/
[16:58:50] I've got the problem
[16:58:55] it's not in any of those
[16:58:57] http://localhost:19888/jobhistory for history
[16:58:59] it's in my home directory
[16:59:02] (finished jobs)
[16:59:09] wha?
[16:59:11] because there was some problem before any of those other logs could be made
[16:59:16] *what* is in your home dir?
[16:59:18] okay.
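The redirect annoyance described above — Hadoop web UIs bouncing the browser from `localhost` to internal `*.eqiad.wmnet` hostnames that are only reachable through an SSH tunnel — is what the /etc/hosts trick and "type in localhost everytime" work around. The same fix can be done programmatically; this is a hypothetical helper (not part of ktunnel), with only the hostnames and ports taken from the log.

```python
from urllib.parse import urlsplit, urlunsplit

# Cluster-internal hosts the web UIs redirect to; this set mirrors the
# hostnames mentioned in the /etc/hosts line above and is an assumption.
CLUSTER_HOSTS = {"analytics1010.eqiad.wmnet", "analytics1027.eqiad.wmnet"}


def localize(url: str) -> str:
    """Rewrite a cluster-internal URL so it goes through the local tunnel.

    Keeps the port (e.g. 19888 for the job history server), so the URL
    hits whichever `ktunnel` forward is listening on that port.
    """
    parts = urlsplit(url)
    if parts.hostname in CLUSTER_HOSTS:
        netloc = f"localhost:{parts.port}" if parts.port else "localhost"
        parts = parts._replace(netloc=netloc)
    return urlunsplit(parts)
```

With this, the redirect target from the log becomes a tunnel-friendly `http://localhost:19888/...` URL instead of one the browser can't resolve.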
[16:59:21] so it stored it in some pig_13...blah
[16:59:25] .log file
[16:59:40] you launched it via the pig CLI on like an02?
[16:59:46] yes, an02
[16:59:48] it also writes the log file to CWD
[16:59:57] it was just failing on the new IP address format
[17:00:07] with IPAddress|CountryCode
[17:00:32] okay. what's the question?
[17:00:56] ottomata…. scrum
[17:00:56] erosen scrum
[17:31:50] DarTar: can you add a google hangout link?
[17:32:15] yep, setting this up in a sec
[17:35:00] https://plus.google.com/hangouts/_/calendar/ZHRhcmFib3JlbGxpQHdpa2ltZWRpYS5vcmc.mumqh32odqvaqed5e8sm88klqo
[17:47:07] cool, job worked that time.
[17:48:10] total time for the job to process one day is about 15m.
[17:48:26] amusingly, it was about 12m for an hour.
[17:48:34] so i was right that framework overhead was the bulk of the time
[17:48:43] the pig job for a day's worth of data is 10m
[17:48:50] ^^ drdee, ottomata
[17:49:08] NICE!
[17:50:14] ha, nice
[17:50:14] yeah
[17:51:26] the backfill job is now running.
[17:51:42] it has 44 days to process, which is about 11h
[17:51:47] so it'll be done by tomorrow.
[17:51:55] (which is a bit better than 8 days :)
[17:52:19] it cool to move #61 to ready for showcase, drdee?
[17:52:34] btw, http://localhost:8888/filebrowser/view/wmf/public/webrequest/mobile/device/props/2013/04
[17:54:28] ottomata: https://mingle.corp.wikimedia.org/projects/analytics/cards/579
[17:55:55] milimetric: lmk if you get blocked by stuff
[17:56:14] will definitely do, I'm in a user metrics meeting now
[17:56:18] kk
[17:57:55] dschoon: sure if you think you can showcase it :)
[17:58:03] it'll be done by tomorrow
[17:58:28] you can track progress by refreshing http://localhost:8888/filebrowser/view/wmf/public/webrequest/mobile/device/props/2013/04
[17:58:42] and
[17:58:49] http://localhost:8888/filebrowser/view/wmf/public/webrequest/mobile/device/props/2013/03 once it exists
[17:59:49] mmk, thanks dan
[18:04:50] hm. i'm concerned the backfills will run a lot slower due to being in the adhoc queue
[18:04:59] it looks like it somehow only got 4G of memory
[18:08:10] i'll monitor it and see how long 3/1 takes
[18:25:40] milimetric: I hear your point about separating the service from the website. The point I was trying to make is that the client currently simply issues http requests and is independent of the api insofar in that it is simply polling the service itself. in the API code base the website is coupled with the service in the views module, and i think that is what could be refactored.
[18:26:32] yeah, it might make it easier and more necessary to focus on making the API have a standard interface that others could consume
[18:26:43] totally agree
[18:26:46] I think we're on the same page :)
[18:26:56] yep :)
[18:27:38] the other really cool thing about that is it can become very flexible
[18:28:16] so we can keep the API on a normal cluster to service the "website" but then deploy it onto a monster cluster and have it service real-time analytics even for wiki projects
[18:28:36] yeah. i like the focus on making the service as light weight as possible, it'll make it more extensible also
[18:29:06] ^^nice idea
[18:29:25] totally, I think that'll be hard and we have to not mess with the magic sauce too :)
[18:29:50] the tricky part I see is that the API so far has grown organically and has a lot of useful but not necessarily homogeneous ways of slicing and dicing data
[18:30:09] so finding the right structure that fits on top of that as a standard interface might be challenging.
[18:33:45] potentially. the core request structure although organically grown i think is fairly well defined, at least enough so that we could work with … i mean if by slicing and dicing you're referring to time series and aggregation stuff i actually believe those are pretty core. the service doesn't do a whole lot of fancy stuff other than those two things, it pulls well defined metrics by user
[18:34:28] but i'm all for having a core normalization layer upon which all other data operations can be built
[18:34:39] which i do think we mostly have
[18:35:55] but i guess some more complex requests were added to service immediate e3 needs at the time… that is def a consequence of the organic growth
[18:57:08] oh I didn't mean anything specific earlier rfaulkner, I was just thinking about a typical REST URI structure and how User Metrics wouldn't immediately fit into that structure.
[18:58:12] but I agree with you that all of the analyses are core to the project, that's what I was trying to say - we have to be careful when we standardize
[19:01:27] milimetric: https://github.com/dsc/limn/commit/7a6cc61956b1eddc6946ee60c4062d431bec6513
[19:01:36] milimetric: sounds good
[19:02:54] deploying dev now
[19:03:07] oh ok, cool dschoon, that makes sense
[19:03:20] brb, getting some food
[19:04:44] [travis-ci] develop/31201f8 (#126 by dsc): The build is still failing. http://travis-ci.org/wikimedia/limn/builds/6361809
[19:05:58] milimetric: the deployer has a bug where it will never restart the server.
[19:07:13] ops meeting over, grabbing coffee, back in abit
[19:18:28] ok, that makes sense too dschoon
[19:18:34] fixing it now.
[19:18:43] ok, thank you
[20:22:17] New patchset: Rfaulk; "Merge branch 'working'" [analytics/E3Analysis] (master) - https://gerrit.wikimedia.org/r/59190
[20:32:03] Change abandoned: Rfaulk; "(no reason)" [analytics/E3Analysis] (master) - https://gerrit.wikimedia.org/r/59190
[20:51:18] New patchset: Rfaulk; "fix. bug in revert_rate. Historics revs should be ordered by descending rev_id." [analytics/E3Analysis] (master) - https://gerrit.wikimedia.org/r/59200
[20:59:45] Change abandoned: Rfaulk; "(no reason)" [analytics/E3Analysis] (master) - https://gerrit.wikimedia.org/r/59200
[21:00:21] New patchset: Rfaulk; "fix. bug in revert_rate. Historics revs should be ordered by descending rev_id." [analytics/E3Analysis] (master) - https://gerrit.wikimedia.org/r/59264
[21:52:26] fyi milimetric, deployer is updated
[22:08:24] bleh.
[22:09:04] drdee: the job for 3/1 hasn't finished yet, so I guess I'm going to have to tweak *something*
[22:09:45] (given it's about 4 hours overdue)
[22:11:58] grumble
[22:12:26] i think it's because of the relative queue resource limits
[22:12:37] is otto out for the day?
[22:26:40] drdee: reports are being copied to kripke as we speak
[22:26:44] I don't know how to expose them
[22:26:56] average: i can help.
[22:27:01] dschoon: please do
[22:27:08] where are they being copied to?
[22:27:27] dschoon: /home/spetrea/reports on kripke (but copying is in progress)
[22:27:31] okay.
[22:27:46] you're spetrea, right?
[22:27:53] yes
[22:27:54] dschoon: could you add a vhost with documentroot that points there ?
[22:28:20] i've added you to the www group
[22:28:37] which will give you access to /srv/analytics.wmflabs.org
[22:28:44] ok
[22:28:47] that folder is served up to http://analytics.wmflabs.org/
[22:28:59] actually
[22:29:03] let's just make a symlink
[22:29:37] spetrea@kripke:/srv/analytics.wmflabs.org$ ln -s /home/spetrea/reports/
[22:29:40] ln: creating symbolic link `./reports': Permission denied
[22:29:54] dschoon: do I need to relogin to get the permissions ?
[22:30:10] ok works
[22:30:30] dschoon: thank you
[22:30:38] drdee: reports are sitting here on kripke http://analytics.wmflabs.org/reports/r52/
[22:30:43] http://analytics.wmflabs.org/wikistats/reports/
[22:30:50] either way. :)
[22:31:53] average: okay; can you copy all revisions for the mobile report to kripke?
[22:32:01] yes
[22:32:11] average: do you also have r52 ready in wikistats format?
[22:32:12] can I copy from stat1 => kripke ?
[22:32:19] try it
[22:32:22] or do I need to stat1 => local and then local => kripke ?
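Returning to the user-metrics API exchange above: the "standard interface that others could consume" could start as nothing more than a predictable URL scheme that a thin client builds and polls over HTTP. The host, endpoint layout, and parameter names below are invented for illustration — they are not the actual E3Analysis/UserMetrics routes, which the discussion notes grew organically.

```python
from urllib.parse import urlencode

# Hypothetical base URL for the metrics service; purely illustrative.
BASE = "http://metrics.example.org"


def metric_url(metric: str, cohort: str, **params: str) -> str:
    """Build a request URL for one metric computed over one cohort.

    A client that only issues GETs against URLs like these stays fully
    decoupled from the service internals (the views-module coupling
    discussed above), polling results just as the current client does.
    """
    qs = urlencode(sorted(params.items()))
    url = f"{BASE}/cohorts/{cohort}/{metric}"
    return f"{url}?{qs}" if qs else url
```

For example, `metric_url("revert_rate", "e3_test", start="2013-03-01")` yields a single stable URL that any consumer can poll, which is the design property the conversation is circling.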
[22:32:23] ok
[22:32:36] drdee: yes, it is present in wikistats format too http://analytics.wmflabs.org/reports/r52/out_sp/EN/TablesPageViewsMonthlySquidsMobile.htm
[22:33:01] ok; ty
[22:33:01] I don't know what Original means as opposed to the other one
[22:33:15] http://analytics.wmflabs.org/reports/r52/out_sp/EN/TablesPageViewsMonthlySquidsOriginalMobile.htm
[22:40:23] I copied all of them
[22:52:27] ty
[22:53:25] I think I know the problem in wikistats
[22:53:51] some time ago we agreed we'll drop the format with IP|countrycode
[22:53:59] and we'll put the countrycode at the end as a new column
[22:54:26] well, I ran it against the old format .. and that's what happened there
[22:54:51] ok
[22:54:55] please rerun :)
[23:02:06] finishing out from home. back in 15-20 or so
[23:16:05] drdee: maybe we could make it so that we can make those changes to the stream itself
[23:16:51] drdee: what happens if we don't do that is that we have heterogenous data
[23:17:17] and that leads to if(data_is_of_type1){}else if(data_is_of_type2){...} else if(...) ...
[23:18:08] or, alternatively, it means that someone has to go in regularly and convert the data, so it's all in the expected format
[23:18:46] I guess because that someone is me, that's why I'm proposing that we change them upstream
[23:18:57] I wouldn't mind doing the change if I knew where to do it
[23:21:19] I mean the change to the program that's generating the stream
[23:22:17] * average goes back to wikistats code
[23:22:21] back
[23:37:45] drdee: you there ?
[23:37:48] yes
[23:38:03] I need to make a decision
[23:38:08] either I use udp-filter to geocode
[23:38:14] or I use Geo::IP module from maxmind in the code
[23:38:20] what would you do ?
[23:38:31] I'm in the hangout
[23:38:35] if you wanna talk about this
[23:38:37] does Geo::IP use the C library?
[23:38:50] yes
[23:39:19] then I would say use Geo::IP
[23:39:27] it's the official one from maxmind. they have API for multiple languages, Ruby, Python, C, Perl etc
[23:39:27] and make sure you choose the right cache setting :)
[23:39:34] ok, sounds good to me
[23:40:36] ok
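The two log layouts argued about above — the old combined `IP|countrycode` field versus the newer trailing country-code column — are exactly the kind of heterogeneity that forces the `if(data_is_of_type1){...} else if(...)` chains average complains about. Until the change lands upstream in the stream generator, a normalization shim keeps that branching in one place. Field positions here are assumptions for illustration, not the real wikistats layout.

```python
# Sketch: normalize the two squid-log layouts discussed above into one shape.
# Old format: the IP field carries "IP|countrycode" packed together.
# New format: the country code travels as a separate trailing column.
def normalize(fields):
    """Return (ip, country_code) regardless of which layout a row uses."""
    ip_field = fields[0]          # assumed position of the client-IP field
    if "|" in ip_field:           # old combined format
        ip, country = ip_field.split("|", 1)
        return ip, country
    return ip_field, fields[-1]   # new format: country code as last column
```

Everything downstream then consumes one `(ip, country)` shape, so a rerun against either vintage of data produces consistent country columns instead of the zeroed-out reports seen earlier.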