[00:00:44] :( [00:01:11] Is there an easy way to visualise changes to an article history over time other than manually screengrabbing and putting into a video on my own? [00:02:24] buh. [00:02:32] what do you mean by "visualise"? [00:02:59] i think it's a rather complicated problem since there isn't really any structure to text [00:03:18] unless you just mean "flash a color behind deletions and insertions as they appear"? [00:03:19] http://ozziesport.com/wp-content/uploads/2012/11/Tyan-Taylor.wmv [00:03:21] Like that. [00:03:33] That would be nice but wouldn't even need that. [00:03:55] I'll look, but this is probably a question for someone who better-knows tools that people have made for wp. [00:03:58] maybe drdee? [00:04:22] Not a big deal. I can manually compile it. [00:04:35] Presenting at a conference on Tuesday or Wednesday. [00:06:31] That's what I'd do [00:06:40] Wouldn't be too hard to do in a webbrowser with JS. [00:15:10] I'm manually getting them. A little monotonous but getting done. [00:17:26] you could probably just download a dump, but that'd definitely require all manner of annoying parsing [00:17:52] I just want one example page. [00:18:28] Presenting at a conference about sourcing issues for women's football in Africa articles when not in Africa and in dealing with Wikipedia standards [00:18:35] *nod* [00:18:55] Realised I was expected to do a powerpoint. [00:19:00] would kind of be a neat browser-extension [00:19:14] add a button on each history page and use the API [00:19:18] mmm :) [00:19:26] I've seen the visualization done. [00:19:34] I can't recall where I saw it though. [00:19:48] It is really nice though for showing people how articles can and do develop. [00:20:02] For my audience, I'd like to contrast edits with page views. ;) [00:20:11] but that's neither here nor there. [00:20:51] I put together a big report on the state of Australian women's sport material on Wikipedia, Commons and Wikinews. [00:21:15] Which will be presented through my university to parts of the Australian government later this year. [00:38:14] purplepopple: totally agree with dschoon [00:38:27] i am calling it a day, dschoon, see you tomorrow :) [00:38:38] see ya drdee [00:44:02] dschoon, today was a very productive day btw: ottomata puppetized storm, i fixed instrumentation for hive, otto and i fixed the hadoop job history server and robh racked the dell machines [00:44:15] nice [00:44:19] that's hot shit [00:44:26] i wrote a bunch of limn code [00:44:37] and wasted half the day trying to get internet. [00:44:41] (tragic) [00:46:14] :) [00:52:35] somehow after 20 years of ISP's that still seems to be problematic :( [00:52:41] now i am really leaving [00:52:51] :) nite [00:54:55] they seriously are [00:54:56] night man [12:41:50] good morning peoples [13:47:36] man, that's way too early! [13:47:45] milimetric1! [13:59:55] milimetric morning and ^^ [14:00:05] morning! [14:00:17] oh :) [14:00:24] um, yeah, I am super-obsessed [14:00:38] there's an amazing elegant solution here. I can friggin smell it [14:00:48] but I don't think I have enough time to figure it out before the demo [14:01:04] either way, I think we're scrapping this approach until after december [14:01:12] but like - SOO close :) [14:06:46] check this out: http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad it shows raw PHP source code for me ??!??!?!??! [14:08:26] milimetric ,before the demo, you mean today? 
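
A minimal sketch of the "add a button on each history page and use the API" idea above: the MediaWiki API will hand back one article's revision metadata (timestamps, authors, sizes), which is enough raw material to animate how a page grew over time. The article title and output file below are only illustrative.

# fetch revision metadata for one article, oldest first, as JSON
curl -sG 'https://en.wikipedia.org/w/api.php' \
  --data-urlencode 'action=query' \
  --data-urlencode 'prop=revisions' \
  --data-urlencode 'titles=Tyan Taylor' \
  --data-urlencode 'rvprop=ids|timestamp|user|size' \
  --data-urlencode 'rvlimit=500' \
  --data-urlencode 'rvdir=newer' \
  --data-urlencode 'format=json' > revisions.json
# individual diffs between consecutive revisions can then be pulled with
# action=compare&fromrev=...&torev=...
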
[14:08:36] yep [14:09:13] hey drdee that's a very cool dashboard [14:09:20] what is it :) [14:09:46] is it real and pointing to our cluster? [14:09:50] it's WMF"s standard monitoring infrastructure for all our boxes [14:09:59] that view is on the current 7 Cisco's [14:10:10] so yes this is baby kraken [14:23:19] mooooooorning ottomata [14:23:22] morning! [14:24:37] quick question, it seems that analytics1001 is not currently used for MR jobs, any specific reason? [14:26:24] whatcha mean [14:26:26] ? [14:26:46] as in like datanode and nodemanager are not running there? [14:26:59] because its the namenode, and cdh4 recommends keeping namenode and datanodes separate [14:27:15] yes, look at http://analytics1001.wikimedia.org:8088/cluster/nodes [14:27:31] but then we are underutilizing a very beefy machine [14:28:35] we might wanna move all the stuff on an01 to another box, or not? [14:30:43] yeah probably so [14:30:48] when we set this up we just didn't have any other machines [14:31:27] i wonder if we could do failover testing when we do that! [14:32:42] aight [14:33:01] so, what's up with webstatscollector? [14:33:26] average_drifter and i wrote a new version that we could deploy [14:33:39] ideally side-by-side with the old version to make sure it works [14:33:48] ok [14:33:58] where does it run? i don't really know much about this guy [14:34:06] oh this is a udp2log thing right/ [14:34:07] ? [14:34:09] which guy? [14:34:14] i think on locke [14:34:15] webstatscollector [14:34:19] :D [14:34:20] locke [14:34:23] it runs [14:34:35] and it just consumes a udp2log stream [14:34:38] aye [14:34:43] without the PIPE command [14:34:52] so it's unsampled [14:35:04] webstatscollector consists of two components: [14:35:22] a) daemon that stores data in berekely db and writes db to a file every hour [14:35:40] b) filter that does the filtering, we deprecated filter and moved the functionality to udp-filter [14:35:41] # domas' stuff. [14:35:41] # (This looks like a bunch of C to filter for mobile pages [14:35:41] # and output things by language.) [14:35:41] pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815 [14:35:42] ? [14:35:50] yes that's it [14:36:07] so that's 'filter' and we are deprecating that part in the new version [14:36:12] ok [14:36:17] we will use udp-filter [14:36:26] but there is also a daemon running called collector [14:36:58] yeah [14:36:59] i see that [14:37:06] how is it launched? [14:37:10] let's wait for average_drifter to show up [14:37:13] ok [14:37:17] the daemon? [14:37:21] yeah [14:37:30] i think: './collector' [14:37:36] yeah but by who? [14:37:41] its running on locke [14:37:42] no init script though [14:37:49] but we provided some command line parameters to customize the port it's running on [14:37:52] and I don't see it in puppet [14:37:56] i have no clue [14:38:04] it could be that it's not yet puppetized [14:38:10] probably not [14:38:32] we will give you a shiny new debian package [14:38:42] how's that? [14:38:46] coool [14:40:52] brb need some juicy coffee, my coffee shop got broken in yesterday, the entire front door was kicked in [14:54:29] milimetric: node.js has been installed on the jenkins ci server [14:54:40] ah, cool [14:55:16] so we don't need travis ci then maybe [14:55:44] but I am not sure we'll be able to get to unit testing until after December. It'd be awesome if we could [14:58:12] true, it's not high urgency for sure but just wanted to let you know [15:04:27] ottomata, is it okay if i start fiddling with the FairScheduler for Hadoop today? 
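
For the an01/analytics1001 question above, the same information as the :8088 nodes page can be pulled from a shell with the stock Hadoop 2 / CDH4 CLIs, which makes it easy to confirm which hosts are actually serving as DataNodes and NodeManagers:

hdfs dfsadmin -report | grep '^Name:'   # DataNodes currently registered with the NameNode
yarn node -list                         # NodeManagers registered with the ResourceManager
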
[15:04:32] sure [15:04:59] i was thinking of two queues: one for core jobs and one for regular users [15:05:52] sounds good, i don't know that much about fair scheduler yet [15:05:59] only what it is for [15:06:08] that makes two of us :D [15:06:24] started looking at it the other day, but stopped because I wanted to have a use to test it with [15:06:30] i could easily just set the parameters you had mentioned [15:06:36] but it'd be nice if we could compare before and after [15:07:02] compare what? [15:07:10] right now it's First In First Out [15:07:20] for jobs that are submitted [15:07:48] with FairScheduler you can basically manage the priorities of different jobs [15:08:22] i'd like to see it work, so [15:08:24] so you need at least to submit multiple jobs at the same time [15:08:25] for example [15:08:27] right [15:08:35] so we submit a buncha jobs, (of two types) [15:08:41] ok [15:08:41] and see that the ones last in line have to wait for the first ones [15:08:44] then, we change [15:08:48] and see them execute in parallel [15:08:50] ok [15:08:58] first coffee then fiddling ;) [15:09:01] k [15:23:26] oh hi [15:58:12] hey average_drifter [15:58:26] can you create a debian package for webstatscollector [15:58:26] ? [16:15:30] ottomata, are you editing mapred-site.xml? [16:15:46] nope [16:15:52] is puppet running? [16:16:08] ah yes! [16:16:08] grrr [16:16:10] probably a cronjob [16:16:13] i made changes but nano says " File was modified since you opened it, continue saving ? " [16:16:41] ok, stopped puppet again and commented out a cronjob [16:16:48] that restarted puppet [16:16:49] ty [16:20:37] ottomata, can i restart hadoop? [16:20:44] sure [16:39:26] morning all [16:39:44] milimetric: so i didn't write much code last night, but i *did* read quite a bit of code [16:39:51] dude [16:40:00] I awoke with like awesome power [16:40:05] haha [16:40:07] sleep does that :) [16:40:08] my brain was dead [16:40:12] DEAD man [16:40:12] :) [16:40:21] I'm writing this thing the way knockout was meant to be used [16:40:36] anything for me to pull? [16:40:38] I just had to go google.com -> "knockout async" and refresh my mind a bit [16:41:06] um still working on it but if you pull my fork and look at /models/Graph.co you'll see that part at least working now [16:41:15] k. [16:41:17] I'm working on simplifying all the children [16:41:20] like, it still does nothing now [16:41:21] i wrote some useful things, i think. [16:41:26] but it's not dead [16:41:30] cool [16:41:45] I mean I still think we abandon this 'cause I failed but it's definitely useful [16:41:48] for later [16:42:02] well [16:43:38] i created a new branch, and did some refactoring. [16:43:54] i mostly got rid of `www` and made a `views` directory for the server views. [16:44:21] i also set it up so the templates could still be written in jade but work in knockout [16:44:49] i thought about using that template loader thing, but decided against it. [16:45:18] my father once told me that the most accurate metaphor for the heart and mind are an elephant and a donkey tied to the same chariot, you being in the chariot [16:45:30] our hearts are definitely strong :) [16:45:36] poor donkey [16:46:00] haha [16:46:21] seriously though, we have to make a cool headed decision later today [16:46:39] last 100 yards, lets sprint so we have all the data we need [16:46:47] did you pull my changes? [16:46:50] for the server stuff? 
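
A rough sketch of the two-queue idea (core jobs vs regular users) as a fair scheduler allocation file. Element and property names follow the upstream YARN fair scheduler documentation; whether the CDH4 build of the time uses exactly these names (and mapred-site.xml vs yarn-site.xml) is an assumption worth checking, and the weights are illustrative.

cat > fair-scheduler.xml <<'EOF'
<?xml version="1.0"?>
<allocations>
  <!-- production/core jobs get the larger share when the cluster is contended -->
  <queue name="core">
    <weight>3.0</weight>
  </queue>
  <!-- ad-hoc jobs from regular users -->
  <queue name="users">
    <weight>1.0</weight>
  </queue>
</allocations>
EOF
# and the ResourceManager is switched away from the default FIFO behaviour with:
#   yarn.resourcemanager.scheduler.class =
#     org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
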
[16:47:04] i'm afraid to because i want to be isolated a little longer while I fight with this [16:47:09] then I will [16:47:10] mk [16:47:15] i just get conflicts when i pull. [16:47:24] oh that sucks [16:47:44] was it working with just your stuff? [16:47:55] sorta. [16:47:59] the server was working. [16:48:19] the response from /dashboards.json was different in master than you were expecting [16:48:29] so i made some changes to the stuff that handled responses [16:49:51] because yours just loads files [16:49:54] which isn't quite right [16:51:18] i'll fix it again, but please pull this time [17:12:13] packages ready [17:12:15] on build1 [17:12:19] /home/spetrea/wikistats [17:12:24] udp-filters_0.3.23_amd64.deb [17:12:28] webstatscollector_0.2.34_amd64.deb [17:12:44] ottomata: do you have some time so we can deploy these ? [17:15:21] yeah, let's do after standup [17:16:30] ok [17:24:39] battery dying, back on in a bit [17:30:59] average_drifter: just sent you your recommendation letter, let me know if all is good [17:52:48] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [18:01:07] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [18:01:12] ottomata ^^ [18:32:22] hey dudes, webstast collector time? [18:36:16] yooooo [18:39:18] average_drifter ^^ [18:41:02] hey hey ready [18:41:15] ok, so [18:41:22] this what i recommend: [18:41:23] let's do webstats collector first [18:41:29] i can add that deb to wikimedia apt [18:41:31] should I do that? [18:41:33] keep the current stuff running on locke [18:41:41] i think so [18:42:03] then setup udp-filter not to filter out bots (stefan knows how to do this) [18:42:47] yes, there's a switch [18:42:54] let's run it for 48 hours and compare the old and new files on monday [18:43:16] -B is the switch for bot detection [18:44:36] -B turns it on or off? [18:44:47] on [18:45:29] average_drifter, how do you set udp-filter in webstatscollector mode? [18:45:47] (there is a switch for that as well) [18:47:21] the switch is -t [18:47:38] so udp-filter has a switch called -t which does exactly what the filter in webstatscollector does [18:47:39] okay so just run udp-filter -t [18:48:26] for filter, we still need collector? [18:48:42] wait, is there a reason to use the .deb then? if collector is already installed and running? [18:48:48] we only have to change the udp2log filter then, right? [18:49:41] so webstatscollector provides a collector (and filter, but we moved the functionality for the filter in udp-filters) [18:50:04] the new collector is only required if we enable bots filtering [18:50:06] has collector changed at all in the .deb? [18:50:08] hm [18:50:11] ok [18:50:15] which we are not doing atm [18:50:21] so i guess the old collector is fine [18:50:37] so let's just do udp-filter for now, and do webstats collector if/when we have to [18:50:42] but then you have to run it on a box not being locke [18:51:03] because old collector uses a hardcoded port [18:52:30] but, filter is already running as a udp2log filter [18:52:34] in the udp2log config file [18:52:40] so if we don't do anything to collector at all [18:52:48] and just change the line in udp2log-locke config [18:52:55] it shoudl be the same, right? [18:53:01] pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815 [18:53:03] changes to [18:53:07] right but then we cannot compare the results [18:53:09] pipe 1 udp-filter -t [18:53:11] or add a line [18:53:13] or whatever [18:53:17] right? [18:53:47] is that in order to multiplex the output ? 
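
Concretely, the change under discussion is a one-line edit to the udp2log config: -t makes udp-filter emit the same per-page stream the collector expects, and -B would additionally turn on bot filtering (not wanted for this comparison). The installed path of udp-filter is an assumption here.

# current line on locke:
pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815
# candidate replacement, or a second parallel line for a side-by-side run:
pipe 1 /usr/bin/udp-filter -t | log2udp -h 127.0.0.1 -p 3815
# (for a true side-by-side test, -h would point at a collector running on another box)
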
[18:53:59] bwer? [18:54:04] syntax looks good but i would be more relaxed if we would spin this on a different box [18:54:09] we can do that [18:54:18] especially with udp2log multicast stream [18:54:23] we can try this on stat1 even [18:54:25] kool [18:54:33] wait, what does collector do? [18:54:35] i'm so confused [18:54:41] does it send to udp2log? [18:54:48] no [18:54:48] on locke? [18:54:57] collector listens on a port and receives data specifically crafter for it from udp-filter [18:55:01] udp2log -> filter -> collector [18:55:03] ohhhhhhhh [18:55:04] ok [18:55:04] *crafted [18:55:08] cool, oh [18:55:08] oh ok [18:55:15] and now udp-filter can send to collector? [18:55:23] uhmmmmm [18:55:34] use this | log2udp -h 127.0.0.1 -p 3815 [18:55:36] sorry [18:55:46] udp2log > filter > log2 udp > collector [18:56:00] ok [18:56:02] and we are changing to [18:56:03] and filter is now udp-filter [18:56:09] ok [18:56:17] so yeah, we can run this elsewhere, if we have collector installed [18:56:22] check [18:56:24] is collector on stat1 already? [18:56:28] don't' think so [18:56:38] i'm just going to copy the /a/webstats dir over there [18:56:39] afaik it's only installed on locke [18:56:39] and do it that way [18:57:13] sounds good [18:58:41] back in a hour or so [18:59:21] I'll be looking at the editor configuration for wikistats in the meantime [19:13:21] ung, average_drifter, i can't run webstats on stat1 [19:13:24] different versions of libdb [19:14:26] ottomata: I built it on build1 [19:14:41] ottomata: can you install libdb with the same version from build1 ? [19:14:50] no, bulid1 is lucid, stat1 is precise [19:14:54] oh [19:14:55] libdb is at 4.8 or somethign now [19:15:00] then should we build it on stat1 ? [19:15:14] or [19:15:17] maybe you can change dep [19:15:18] >= [19:15:18] ? [19:15:28] alright, give me a version and I'll change dep [19:15:28] :) [19:15:44] or I can check on stat1 [19:15:48] to see what the version is of libdb [19:15:54] oh hmmmmmm, but in this case the package name contains the version [19:16:06] libdb4.6 vs. libdb4.8 [19:16:12] or libdb5.1 [19:16:41] hm, [19:16:45] ungh [19:17:06] wait [19:17:10] do we really need to run this at the same time? [19:17:14] why don't we just take a sampled file [19:17:18] and run it through old filter [19:17:21] and then take the same file [19:17:26] and run it through udp-filter [19:17:29] and compare? [19:18:53] we can try that, but can I update the libdb also [19:18:55] so it fits [19:19:08] so on stat1 [19:19:08] p libdb4.8:i386 - Berkeley v4.8 Database Libraries [runtime] [19:19:23] oh that is avail on build1? [19:19:24] cool [19:19:26] that should work then [19:19:52] ah yeah [19:19:55] 4.8 is on build1 [19:19:58] link against that and it should be fine [19:20:03] 4.8 on stat1 [19:20:06] but it's not installed [19:20:20] it says "p" , so I think it should say "i" if it was installed (libdb 4.8 I mean) [19:20:34] i A libdb4.8 - Berkeley v4.8 Database Libraries [runtime] [19:20:38] p libdb4.8:i386 - Berkeley v4.8 Database Libraries [runtime] [19:20:42] yeah, its also on stat 1? 
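
The "take a sampled file and run it through both" idea above sidesteps the libdb and box questions entirely; roughly the following, with the sample path purely illustrative:

# run the same sampled input through the old filter and the new udp-filter, then compare
head -n 1000000 /a/squid/archive/sampled-1000.log > /tmp/sample.log
/a/webstats/bin/filter < /tmp/sample.log > /tmp/filter.out
udp-filter -t          < /tmp/sample.log > /tmp/udp-filter.out
diff /tmp/filter.out /tmp/udp-filter.out | head
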
[19:20:42] yeah it is [19:20:43] i libdb4.8 [19:20:43] p libdb4.8-dev - Berkeley v4.8 Database Libraries [development] [19:20:47] p libdb4.8-dev:i386 - Berkeley v4.8 Database Libraries [development] [19:20:54] p libdb4o-cil-dev - native OODBMS for CLI - development files [19:20:58] oh wait [19:21:00] i libdb4.8 [19:21:01] it is installed [19:21:03] so stat1 has libdb 4.8 [19:21:09] ja [19:21:11] so does build1 [19:21:17] so if you build on build1 and link against 4.8 [19:21:19] should be fine [19:21:19] right? [19:21:26] yes [19:21:31] k cool [19:22:33] also, FYI, we need a precise and a lucid version of udp-filter [19:22:35] .dev [19:22:37] .deb* [19:23:15] so you should build one on build2 as well, and name the package with the dist [19:23:18] for example [19:23:19] right now we have [19:23:33] udp-filter_0.2.6-1~lucid_amd64.deb [19:23:33] and [19:23:33] udp-filter_0.2.6-1~precise_amd64.deb [19:24:09] here are the instructions I have to follow to put a package in our apt repo: [19:24:10] http://wikitech.wikimedia.org/view/Reprepro#Importing_packages [19:24:15] (oh maybe you can't read those) [19:25:04] For distribution, use the distribution that the package has been compiled for, and under. Usually, any given compiled package should be for one distribution only, e.g. hardy-wikimedia OR lucid-wikimedia. This should match the field in the package's Changelog. [19:28:09] reading [19:30:07] so, for these [19:30:17] i left of the '-wikimedia' bit when I built them before [19:30:24] and just named them after their ubuntu dist name [19:30:30] so lucid and precise [19:30:42] the apt repository contains these files: [19:30:54] root@brewster:/srv/wikimedia/pool/main/u/udp-filter# ls -l [19:30:54] total 1448 [19:30:54] -rw-r--r-- 1 root root 21692 2012-09-06 19:18 udp-filter_0.2.6-1~lucid_amd64.deb [19:30:54] -rw-r--r-- 1 root root 1363 2012-09-06 19:18 udp-filter_0.2.6-1~lucid.dsc [19:30:54] -rw-r--r-- 1 root root 706034 2012-09-06 19:18 udp-filter_0.2.6-1~lucid.tar.gz [19:30:54] -rw-r--r-- 1 root root 21922 2012-09-06 19:18 udp-filter_0.2.6-1~precise_amd64.deb [19:30:54] -rw-r--r-- 1 root root 1414 2012-09-06 19:18 udp-filter_0.2.6-1~precise.dsc [19:30:55] -rw-r--r-- 1 root root 713991 2012-09-06 19:18 udp-filter_0.2.6-1~precise.tar.gz [20:13:05] be back in 1h [20:29:51] ottoman, drdee: have either of you constructed the appropriate pig LOAD call for importing squid logs? [20:30:01] ottomata ^ [20:30:46] i don't mind doing it myself, but I figured I would check, cause I recall hearing that you had andrew, but I didn't see it in the kraken/src/main/pig scripts [20:31:16] nvm [20:31:18] found it: log_fields = LOAD '$INPUT' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent); [20:32:17] yes [20:32:23] ah gool [20:32:24] yeah [20:59:04] i take it we're skipping demo day today? [20:59:44] i mean, i'm okay with this [20:59:48] i have tons to do, so does dan [21:00:53] okay, officially over :) [21:01:29] demo? [21:01:37] oh [21:01:39] bye demo [21:01:43] yeah, unclear [21:01:51] seems like drdee has to be here to make it happen [21:03:41] no big. we can always do it whenever. [21:04:01] but ottomata is afk as well, and we just have tons to do in limn-land [21:04:05] so i'm fine with skipping otu [21:04:57] i'm here! 
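
Putting the packaging pieces above together, the per-distribution build and import would look roughly like this; the version, package file names and the precise-wikimedia distribution name are echoed from the surrounding discussion rather than verified, so treat them as illustrative:

# on build2 (precise): bump the changelog with a dist-suffixed version, then build
dch -v 0.3.23-1~precise "rebuild for precise"
debuild -us -uc -b
# repeat on build1 with -v 0.3.23-1~lucid, then import each .deb on brewster
# into the matching distribution:
reprepro -b /srv/wikimedia includedeb precise-wikimedia udp-filter_0.3.23-1~precise_amd64.deb
reprepro -b /srv/wikimedia includedeb lucid-wikimedia   udp-filter_0.3.23-1~lucid_amd64.deb
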
[21:05:38] no worries :) [21:05:39] aw dschoon, invisible SVG elements again! [21:05:45] I have it almost there! [21:05:46] :) [21:05:47] stop using templates! [21:06:15] here is my demo [21:06:16] http://cl.ly/image/0o3G0r1K1O1L [21:06:26] haha [21:06:29] <3 iterm2 [21:06:36] that's 12 dells [21:06:47] how many do we have total again? [21:06:48] an11 is installed, it worked so i'm doing the other 11 in parallel [21:06:51] 27 [21:06:51] 11 720s? [21:06:53] uh, 28 [21:06:53] hm [21:06:54] oh [21:06:56] 720s [21:06:56] 12 [21:07:01] huh. i thought we had 11 [21:07:10] * robla drops out of the hangout since y'all aren't there [21:07:11] naw, an11 -an22 inclusive [21:07:17] 10 ciscos, 11 720s, 5 420s? [21:07:21] (4xx) [21:07:37] 12 [21:07:39] 720s [21:07:40] hokay. [21:07:42] 10 ciscos [21:07:46] 27 then? [21:07:48] 5 420s [21:07:52] uhhh right [21:08:00] coolio. [21:08:28] and 288T in the 720s [21:08:38] 2T * 12 disks * 12 machines [21:08:44] yup [21:08:51] word. [21:08:57] so, the first 2 disks on each machine, are raid 1 for os [21:08:58] are their specs different? [21:09:01] i forget. [21:09:07] than? each other? don't think so [21:09:10] they ahve 48G mem [21:09:17] that's a pretty huge OS partition, isn't it? [21:09:28] i guess it doesn't matter [21:09:31] no [21:09:32] since we'd have io contention [21:09:34] os partition is 30G [21:09:38] the rest is unpartitioned [21:09:40] but right [21:09:50] we can use it for something, just probably not hdfs space [21:09:53] *blinks* the disks are 2T, and the first two are OS, but they're 30G? [21:10:02] yes [21:10:21] i was gonna make it 100G, but paravoid preferred that I make a more reusable partman recipe, so we compromised at 30G [21:10:29] we can add more partitions on those disks if we need to [21:10:45] sure [21:10:51] gah! black market gas man is not calling me back! [21:13:41] so dschoon, i'm not going to configure these guys today, but we should talk about machine allocation soon, now that we have all (well, minus 2) machines available [21:13:52] absolutely. [21:14:04] we got 3 zks on the 420s [21:14:06] i think that's good [21:14:12] not today, though. i am going insane with limn. [21:14:15] yes, that seems fine for now. [21:14:19] ahhh ok ok [21:14:21] do we have them in ganglia? [21:14:27] because we need to monitor load [21:14:30] no, only the ciscos right now [21:14:38] i'll work on getting everything in the cluster in ganglia once these are all up [21:14:42] we also really need to get a JMX monitoring solution set up [21:14:43] badly [21:15:03] i never saw a reply to https://www.mediawiki.org/wiki/Analytics/Kraken/JMX_Monitoring [21:15:19] i read your doc and looked at one of the solutions, was hoping for an awesome plug and play one, but they all seemed kinda in depth [21:15:49] as in, i'd need to understand the available jmx stats and configure the monitoring software to look at what I want [21:16:07] yeah, sadly. [21:16:16] would be cool if there was one that would take a list of JMX connections and, do like jconsole but aggregated [21:16:28] yep. [21:16:39] Zabbix *almost* does that [21:16:41] but not quite. [21:16:57] which means we tried to hack on it, and god is the source awful [21:17:03] i would be game to try Jolokia [21:17:14] but i can't make any guarantees [21:17:17] ah yeah, that's the one I read about [21:17:17] yeah [21:18:46] i've used zabbix a bit before, so I have some experience there [21:18:56] and it has jmx support, but still, i'd have to manually add them all in [21:21:10] yay! 
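
On the JMX thread above, the attraction of Jolokia is that once its JVM agent is attached to a daemon, every JMX bean becomes a plain HTTP/JSON read with no per-metric configuration. Agent jar path, host and port below are assumptions (8778 is Jolokia's usual default).

# attach the agent via the daemon's java options at startup:
#   -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0
# then any JMX attribute is one HTTP call away, e.g. heap usage of that JVM:
curl -s http://analytics1011.eqiad.wmnet:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage
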
[21:21:11] http://cl.ly/image/3N0o3c3E212I [21:33:35] sorry guys for running behind schedule….did i miss demo friday? [21:33:53] you did :) [21:33:54] we didn't do it [21:33:58] here is my demo [21:34:02] http://cl.ly/image/3N0o3c3E212I [21:34:24] that was totally not 1 hour brb kind of thing [21:34:37] ottomata WOOOOOOOOOOOT [21:35:20] ottomata, did things work out with webstatscollector? [21:37:05] 2 probs: webstats collector was built against a libdb lib that wasn't avail on stat1, so stefan is rebuilding [21:37:06] also [21:37:14] udp-filter needs to be built for both lucid and precise [21:37:14] k [21:37:21] before I can put it in apt repo [21:37:35] For distribution, use the distribution that the package has been compiled for, and under. Usually, any given compiled package should be for one distribution only, e.g. hardy-wikimedia OR lucid-wikimedia. This should match the field in the package's Changelog. [21:37:36] which machine(s) are still running lucid? [21:37:40] locke :) [21:37:45] arrgghh [21:37:57] pretty much anything that was installed before this year [21:38:12] locke, emery, etc. [21:38:25] are you worried that upgrading them will cause issues? [21:39:48] root@build2:/home/spetrea# aptitude install reprepro [21:39:48] The following NEW packages will be installed: libarchive12{a} libgpgme11{a} libnettle4{a} libpth20{a} reprepro [21:39:51] The following packages will be REMOVED: libpgm-5.1-0{u} [21:40:01] is it ok if I let it remove libpgm ? [21:40:16] nothing's depending on it AFAICS [21:40:37] yeah it's just a VM and we can always put it back [21:41:20] i can't login to hue myself either :) [21:41:54] hmmm me neither! [21:42:54] hmm, yes i can [21:43:04] drdee, somethign is weird with redirect, [21:43:07] log in from here [21:43:07] http://analytics1001.eqiad.wmnet:8888/ [21:43:09] erosen: have either of you constructed the appropriate pig LOAD call for importing squid logs? ==> yes [21:43:14] (or .wikimedia.org) [21:43:17] yeah [21:43:21] thanks [21:45:40] ottomata, can't login to hue either [21:45:44] i am gonna restart it [21:46:21] wait, i jsut did [21:46:29] strip off your path porition of your url [21:46:31] im in it [21:46:34] sorry [21:46:36] maybe i'm breaking it? [21:46:53] me too [21:46:57] no you are not [21:47:00] ? [21:47:01] louisdang, try again as well [21:47:09] erosen: you are not breaking hue [21:47:12] oh yeah [21:47:16] i knew that [21:47:22] for some reason I thought you were telling me I wasn't in it [21:47:27] ohhhh [21:47:30] drdee, works now [21:47:35] cool [21:47:46] don't know what was the issue [21:48:28] erosen: what's the problem? [21:48:39] i don't have an issue [21:48:46] sorry for the poor communication [21:48:55] i was just trying to give evidence that it was working [21:49:06] sorry i was referring to pig load [21:51:05] oh I was just going to go through the process of creating a working load call [21:51:10] but I figured it had been done [21:51:15] but then I found what I was looking for [21:54:50] k [21:58:06] drdee: I've a got another grunt/pig question [21:58:12] shoot [21:58:15] what is the way to run a pig script through hue? [21:58:22] I've been messing around with grunt [21:58:33] what exactly do you mean? [21:58:33] but I can't remember how to run a file [21:58:43] do you want to fiddle with a pig console? [21:58:54] or do you want to launch a completed pig script? 
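
The LOAD statement quoted above is the top of exactly such a "completed pig script"; as a stand-in (this is not the actual count.pig from the repo), a minimal script that counts requests per HTTP status could look like:

cat > count.pig <<'PIG'
log_fields = LOAD '$INPUT' USING PigStorage(' ') AS (hostname:chararray,
    udplog_sequence:chararray, timestamp:chararray, request_time:chararray,
    remote_addr:chararray, http_status:chararray, bytes_sent:chararray,
    request_method:chararray, uri:chararray, proxy_host:chararray,
    content_type:chararray, referer:chararray, x_forwarded_for:chararray,
    user_agent);
by_status = GROUP log_fields BY http_status;
counts    = FOREACH by_status GENERATE group, COUNT(log_fields);
STORE counts INTO '$OUTPUT';
PIG
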
[21:58:55] like if I wanted to keep my commands in a script and then execute the whole script [21:58:59] the latter [21:59:02] k [21:59:05] 1 sec [21:59:13] as i understand it grunt is a pig console right? [21:59:24] das right [21:59:27] coo [21:59:46] go to oozie editor [21:59:59] click workflows [22:00:21] yeah [22:00:46] click on 'Pig' that is an example Pig job [22:00:57] so you would have to create your own oozie workflow [22:01:08] (Create button on the right of your screen) [22:01:25] i see [22:01:38] i think you can also call exec script.pig from grunt [22:01:53] yes you can [22:02:03] but its unclear how you make jars available to grunt [22:02:17] which jars? [22:02:29] i guess and udf-containing jar [22:02:40] right, that's something i should take care off [22:02:50] and make sure it's always found [22:03:04] but maybe you just define it using a path in hdfs [22:03:21] well if you create an oozie job [22:03:27] then you can specify particular jar files [22:03:32] and their locations [22:03:36] the oozie/workflow just gives you an hdfs file browser for choosing the script [22:03:51] yes [22:04:10] but you can also add 'params', 'files' and 'archives' [22:04:16] so you would need to specify archive [22:04:23] yeah [22:04:33] cool [22:04:43] well this helps [22:04:49] if you want you can share your screen and i can help [22:04:50] I'll keep playing around [22:04:54] kool [22:04:57] I'm good for now [22:04:57] thanks [22:05:01] perfect [22:05:12] ok guys, i gotta go pay too much money for gas [22:05:13] :/ [22:05:20] my dad told me I shoulda filled up before the hurricane [22:05:21] what's the rate? [22:05:23] and I did not listen [22:05:42] $14 / gal [22:05:49] ????!??!?!?!?!???!!??!?!? [22:06:00] either that or wait 4 hours and maybe get gas [22:06:15] * drdee 's eyes are rolling [22:06:29] that sucks monkeyballs [22:06:40] all right have a good weekend! [22:06:56] mmmk laters all! [22:07:00] laterz [23:18:41] louisdang and drdee: have either of you successfully run pig scripts through ooze? [23:18:44] oozie* [23:19:01] I haven't run pig scripts through oozie [23:19:16] trying to run count.pig, but to no avail [23:19:19] cool [23:19:34] as far as you know count.pig works correctly, though? [23:19:34] ok let me try [23:19:49] i have a workflow up there already called zero reports [23:19:56] though you're welcome to try it from scratch [23:19:57] I've ran it on my local machine on test data and drdee ran it on kraken as far as I know [23:20:04] cool [23:21:21] i have [23:21:47] i'll have a look as well [23:22:09] thanks [23:22:29] drdee: is the workflow you made still on ooze? 
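
And the two launch paths from the exchange above, once such a script exists; the input/output paths and the jar name are illustrative, and (as suggested) the input should be a folder rather than a single file:

# from a login shell against the cluster:
pig -param INPUT=/wmf/raw/digi-malaysia -param OUTPUT=/user/erosen/status_counts -f count.pig
# or from an already-open grunt session:
#   grunt> exec count.pig
# a UDF jar can be pulled in by the script itself, e.g. REGISTER kraken-udfs.jar;
# in the Oozie editor the same jar is listed under "files"/"archives" instead.
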
[23:23:13] nah, i don't have a working pig example yet [23:23:21] let me look at your stuff [23:23:22] autocorrect doesn't like oozie [23:23:31] it is running now [23:23:37] but it has been running for like an hour [23:23:45] and it is a single digi-malaysia file [23:25:27] mmmmm not sure if it's actually running [23:25:35] yeha [23:25:39] i suspect as much [23:26:14] http://analytics1001.eqiad.wmnet:8088/cluster/apps/ACCEPTED [23:26:25] so it's accepted but not yet running AFAICT [23:26:44] one thing i noticed, you should not specify a single file as input [23:26:53] just give the entire input folder [23:27:14] cool [23:27:20] I was looking for documentation about that [23:27:33] it will read everything by default [23:27:38] cool [23:27:55] btw, I couldn't view that link [23:28:04] not sure why there are 5 apps accepted but not running [23:28:13] in the link, replace eqiad.wmnet with wikimedia.org [23:28:19] did that [23:28:24] still didn't work [23:28:24] still doesn't work? [23:28:41] take it back [23:28:45] ok [23:28:46] i deleted the port too [23:28:54] works with the port [23:29:04] so you've got 3 jobs [23:29:36] yeah [23:29:39] not intentional [23:29:54] I think some of them are from grunt sessions which hung [23:30:13] or one of them, maybe [23:30:49] mmmmmmm i am digging through some logs to see what's going on [23:30:55] cool [23:30:58] no rush [23:31:10] how did you kill your jobs? [23:31:12] I'm probably going to take a break from pig stuff anyway [23:31:19] i didn't [23:31:27] you just killed pig? [23:31:29] i just opened up a new grunt window .... [23:31:34] hehe [23:31:38] :D [23:31:49] i wasn't sure it was a job hanging, I thought it was just grunt freezing [23:31:53] k [23:32:40] i should probably kill Job6326735994145031411.jar and PigLatin:DefaultJobName [23:32:46] what is the best interface for that? [23:33:06] don't know yet, i do it in the shell [23:33:19] i see [23:33:36] brb [23:33:50] one pain point I've noticed so far is that a lot of the documentation is written assuming you've just started up a grunt session from a shell [23:34:03] it's not clear how much that matters [23:34:06] yeah [23:34:19] i would for now recommend to use the pig shell and run your jobs in there [23:34:19] but when I was looking into referencing an external pig script it seemed like it might matter [23:34:31] then once you know it runs move it to oozie [23:34:37] yeah [23:34:37] the feedback loop is tighter [23:34:40] i agree [23:34:43] I had tried that first [23:35:02] but grunt couldn't find my scripts [23:35:05] which was weird [23:36:31] but grunt was probably expecting your files on the local filesystem and not on hdfs [23:36:38] (or the other way around :D ) [23:37:00] hmm [23:37:51] it seemed like it was operating on hdfs [23:38:24] my pig script that is showed you, the jars were all on the local filesystem [23:38:39] hmm [23:38:45] local meaning stat1001? [23:38:52] or an10001 [23:39:41] also fwiw, i was trying to reference a pig script, not a jar [23:39:52] though I suspect they work the same way [23:41:20] local is an1001
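
For the hung jobs at the end there, "doing it in the shell" is roughly the following; the job and application ids are illustrative and would come from -list or the :8088 UI:

mapred job -list                           # running and accepted MapReduce jobs
mapred job -kill job_1351892400000_0042    # kill one by its job id
# the YARN-side equivalent, using the application id shown at :8088:
#   yarn application -kill application_1351892400000_0042
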