[00:09:59] pgehres: there should be two extra fields [00:10:07] a language accept http header [00:10:15] so accept stuff like en-us [00:10:32] and x-carrier header, that should be 2 or 4 characters long [00:10:33] iirc [00:10:54] dschoon: this is not about zero, this applies to all frontend cache servers [00:11:02] drdee: okay, i can confirm that i am seeing the new fields [00:11:16] just trying to figure out why why regexes still work ... [00:11:20] ok [00:11:24] let me know if you have issues [00:11:36] brb in 90 minutes [00:11:36] my only issue is that I am not having issues [00:11:46] cool! [00:13:16] aha! python match is ignoring the new fields since they are at the end [00:13:17] phew [00:13:26] * pgehres is not crazy [00:15:46] so, the only issue that I see is that cp1043.wikimedia.org seems to still be on the old format [00:51:08] drdee: why is the graph Erik attached to the email so spiky ? [00:51:44] spikey ? dunno the correct way to write it [00:52:24] in any case, I would expect it to have no spikes [00:53:11] I saw this in some other graphs related to req/minute [01:39:53] http://www.regexper.com/ [14:21:34] http://stat1.wikimedia.org/spetrea/new_pageview_mobile_reports/r3/pageviews.html [14:21:39] hello [14:21:44] I have a problem with color ramps [14:21:53] if anyone has dealt with this please let me know [14:22:12] the problem is basically I need to describe graphically the delta between two months [14:22:22] so the delta is basically a percentage of increaase or decrease [14:22:58] and now the problem is that there is some code in wikistats related to the color ramp, I had a look at it, it's like 400 lines.. uhm, trying to make it simple [14:23:09] so I can understand it also.. [14:23:38] if you know some module that does color ramps and it allows me to set the spectrum (like for example I want just green,red and white in the spectrum) [14:23:45] please let me know [14:24:02] it doesn't matter what language the module is in, I can port the code [14:24:31] the link above shows the spectrum I have and the color ramp [14:26:41] in the meantime I'll try to modify the code to get the spectrum needed [15:01:21] morning [15:02:26] hey average_drifter [15:02:32] d3 does color ramps very well [15:04:34] morning [15:04:36] colorScale = d3.scale.linear().domain([-110,90]).range(['green','red','white']) would make a scale that maps that domain to green, red, and white [15:05:03] then you can go colorScale(inputVal). You can try porting the code from just that part of d3: https://github.com/mbostock/d3/wiki/Quantitative-Scales#wiki-linear [15:05:20] hi otto [15:05:40] i saw last night that your script worked but not for you? [15:06:36] still not working for me [15:06:57] i can run it as diederik, but he has the wrong kraken.jar so it doesn't work, i'm thikning maybe my new kraken.jar has a problem [15:07:30] hey milimetric [15:07:42] milimetric: I have to have a look at d3 [15:07:56] i'll help you find the code for that scale [15:08:18] https://github.com/mbostock/d3/blob/master/src/scale/linear.js [15:08:51] * average_drifter is having a look [15:10:45] so it looks like that's mostly setup code, and then the actual scaling is done by this: https://github.com/mbostock/d3/blob/master/src/scale/polylinear.js or by this: https://github.com/mbostock/d3/blob/master/src/scale/bilinear.js [15:17:23] milimetric: where is the implementation of interpolate ? 
[15:17:28] milimetric, since you asked, and I got no other brain bouncers here...:) [15:17:33] I have grepped through the d3 code and found multiple matches [15:17:35] so, I compiled a new kraken.jar file [15:17:51] It has a pig UDF called GeoIpLookup.java [15:17:52] average_drifter: I'm looking too [15:18:07] otto, which repo is this committed to? [15:18:07] I modified it so that it returns a 7th field for continent [15:18:11] kraken [15:18:32] https://github.com/wmf-analytics/kraken/blob/master/src/main/java/org/wikimedia/analytics/kraken/pig/GeoIpLookup.java [15:18:43] pig has a local and mapred mode [15:18:48] local mode works with files on your local filesystem [15:18:52] https://github.com/mbostock/d3/blob/master/src/core/interpolate.js [15:18:55] mapred mode is default and works with hdfs filesystem [15:18:57] average_drifter ^^ [15:19:04] milimetric: thanks ! [15:19:22] otto, gotcha [15:19:44] so, I just tried running diederiks blog.pig script [15:19:53] with his version of kraken.jar (the old one) [15:19:54] and then mine [15:19:56] it works with his [15:19:58] mine gives [15:20:12] java.io.FileNotFoundException: File does not exist: null [15:20:16] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787) [15:20:20] in mapred mode [15:20:39] but my new kraken.jar will work in local mode [15:20:55] hm, so it's just fetching the file from hadoop that fails [15:21:08] (note, the blog.pig script is not using the new continent field, its not even using the GeoIpLookup UDF) [15:21:18] right [15:21:33] can you run blog.pig with your new jar? [15:21:45] so it seems that my modification, or maybe my recompilation broke something [15:21:51] not in mapred mode [15:21:53] local mode is fine [15:21:56] hmmmm, maybe, hmmm [15:22:11] maybe when I compiled it didn't get the correct hadoop deps and now the hadoop stuff doesn't work [15:24:46] is it too complicated for me to build and run the pig script? [15:25:46] no, shouldn't be, it was pretty easy for me [15:25:51] checkout kraken somewhere [15:25:53] run [15:26:00] checkout on an01 probably [15:26:01] then run [15:26:07] mvn compile [15:26:09] mvn package [15:26:15] that shoudl give you a kraken.jar in target/ [15:26:20] cool, trying [15:26:32] make some working directory somewhere, and cp the things you need there: [15:26:48] so doing it locally is not going to work because of access problems, right? [15:26:58] you'll need a file to work on, but it will [15:28:20] cp kraken/target/kraken.jar ./pigdir [15:28:20] cp /home/otto/pig/krakensrc/blog0.log ./pigdir [15:28:20] cp /home/diederik/geoip-1.2.5.jar ./pigdir [15:28:20] cp /usr/share/GeoIP/GeoIP{,City}.dat ./pigdir [15:28:37] cp /home/otto/krakensrc/piggybank.jar ./pigdir [15:28:46] cd pigdir [15:28:59] hadoop fs -put GeoIP.dat [15:28:59] hadoop fs -put GeoIPCity.dat [15:32:26] morning guys [15:32:40] morning drdee [15:32:42] averag_drifter, milimetric, ottomata [15:33:02] hm so I'm not allowed to ssh into an01 [15:33:09] is kripke an ok place to try? [15:33:16] to try what? [15:33:22] naw that won't work [15:33:25] compiling kraken and running pig [15:33:26] you can't ssh into an01? [15:33:31] analytics1001.wikimedia.org [15:33:38] drdee, my problem is not missing jars or .dat files [15:33:42] my new kraken.jar doesn't work [15:33:46] ok [15:33:53] did you push your commit? [15:33:53] yeah, i get permission denied [15:34:01] i can compile kraken.jar [15:34:26] yes [15:34:29] hm [15:34:34] yeah, try it [15:34:41] maybe my key isn't on an01? 
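A minimal sketch of the kind of script being tested here, to make the local vs. mapred distinction concrete; the file names and the shortened field list are illustrative, not the actual blog.pig:

    -- Illustrative only. Run "pig -x local test.pig" to read blog0.log from the local
    -- filesystem; plain "pig test.pig" uses the default mapred mode, so the input and the
    -- GeoIP .dat files must first be pushed to HDFS (the hadoop fs -put steps above).
    REGISTER 'kraken.jar';
    REGISTER 'geoip-1.2.5.jar';
    REGISTER 'piggybank.jar';

    -- the real webrequest line has more fields (including the new accept-language and
    -- x-carrier headers mentioned earlier); this truncated schema is an assumption
    LOG_FIELDS = LOAD 'blog0.log' USING PigStorage(' ')
        AS (hostname:chararray, sequence:long, timestamp:chararray,
            request_time:chararray, ip:chararray, http_status:chararray);

    GROUPED = GROUP LOG_FIELDS BY http_status;
    COUNTS  = FOREACH GROUPED GENERATE group AS http_status, COUNT_STAR(LOG_FIELDS) AS hits;

    STORE COUNTS INTO 'blog_status_counts' USING PigStorage('\t');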
[15:34:45] yeah lemme see about that [15:35:11] yup, you are not on them, lemme fix that [15:42:01] ottomata, 1 sec working on it [15:43:28] ok on an01:~/kraken_new.jar; try that one [15:48:08] drdee, same error when I use that jar [15:48:14] yes me too [15:48:26] at least i can replicate it now :) [15:48:42] are we compilling with some wrong version of hadoop deps [15:48:48] that is keeping the new .jar from working with hadoop? [15:48:51] cause it works in local mode [15:51:00] no [15:51:08] i think i know the problem [15:51:11] oh? [15:51:23] read line 321 https://github.com/wmf-analytics/kraken/blob/25ec7e8f360c1e87fc3b0b83758dbb8a03b7d94e/src/main/java/org/wikimedia/analytics/kraken/pig/GeoIpLookup.java [15:51:51] hm [15:51:55] we have no support for both ip4 and ip6 addresses [15:52:07] however the getCacheFiles() always tries to load both db's [15:52:15] let me fix this [15:52:19] hm, ok [15:52:26] and we only supply 1 db [15:52:31] so it's trying to load 'null' [15:52:45] that doesn't work obviously [15:52:59] i mean, i guess that makes sense, but this was happening before too [15:53:01] I didn't change that bit [15:53:12] and I'm testing with your blog.pig script, not my new continent one [15:53:13] i know [15:53:18] and blog.pig doesn't even use GeoIpLookup [15:53:51] but it's the getCacheFiles() that distributes the dat file(s) to all nodes [15:53:58] and that's always required by all functions [15:54:04] yea, no i mean, what you are saying seems to make sense [15:54:11] for this error, and it is worth looking into [15:54:15] but, this worked before, and not now [15:54:18] and that didn't change at all [15:54:34] unless, your .jar was made before the ip6dat cache file stuff was added or something [15:54:39] exactly [15:54:45] that's what happened [15:54:50] the old kraken.jar did not have this et [15:54:54] ah [15:54:55] hold on [15:54:58] I think I forgot to modify getCacheFile for the ip6dat support [15:55:03] yes:) [15:55:06] i am fixing it now [16:00:26] I'm mostly done with the Funnel UDF too: https://gist.github.com/4443809 [16:00:59] ottomata, fix tested and pushed [16:01:09] cool louisdang! reading it right now [16:01:31] ottomata, you can copy an:/home/diederik/kraken_new.jar that one works now [16:02:02] ok cool [16:02:55] sorry for all the hassle ;) [16:03:21] yeah sorry about that, my mistake [16:03:47] louisdang; that looks sweet, let's push it and try to come up with some unit-tests [16:03:53] ok [16:06:14] pushed [16:06:58] drdee, i'm still getting the same error [16:07:49] * drdee is puzzled [16:08:45] can you reproduce? [16:08:46] the problem is with the load statement isn't it? [16:09:02] Message: java.io.FileNotFoundException: File does not exist: null [16:09:02] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787) [16:09:09] i just ran the pig script with the new kraken_jar and it works [16:09:18] did you update your pig script to use the new kraken.jar ? [16:09:44] ok, if it works for you then I must be doing something wrong [16:09:51] i copied the new kraken.jar to my cwd [16:10:05] pig script is just doing [16:10:06] REGISTER 'kraken.jar' [16:10:52] but where is your pig script? [16:11:08] i'm currently using [16:11:17] sorry where do you run your pig script from? 
[16:11:22] /home/otto/pig/krakensrc/c/blog.pig [16:11:29] cwd == /home/otto/pig/krakensrc/c [16:11:45] just to be sure, I cped your kraken_new.jar as is [16:11:51] but that folder has still the old kraken.jar [16:11:53] and then modified blog.pig to register that [16:11:56] where? [16:12:06] /home/otto/pig/krakensrc/c [16:12:17] /home/otto/pig/krakensrc [16:12:19] /c [16:12:30] i'm in a new dir to try to isolate things [16:12:36] and now i'm registering kraken_new.jar anyway [16:12:45] in pig/krakensrc/c/blog.pig [16:12:50] ok [16:13:17] can i try it? [16:13:22] drdee, did you update getCountryCode too or just GeoIPLookup? [16:13:22] ja please [16:13:37] oh yeah, blog.pig uses getCountryCode, not GeoIpLookup [16:14:10] script is running from /home/otto/pig/krakensrc/c [16:14:18] are you running blog.pig? [16:14:22] no continent.pig [16:14:29] aye, I saw you edited that just now [16:14:36] i was about to try that, since it uses GeoIpLookup [16:14:42] blog.pig is what I was testing with [16:14:47] it doesn't use GeoIpLookup [16:14:47] oohhhhhhhhhh [16:15:01] i thought you were looking at the continent stuff [16:15:11] i was trying to use as few new pieces as possible [16:15:25] and we knew that blog.pig worked for you in the past, and I was gettin gthe error with it [16:15:26] hold onn [16:15:33] but I betcha you need to fix the other UDFs that load geoip files too [16:16:01] yeah, continent.pig looks like it is working for me [16:16:07] since it uses your fixed GeoIpLookup [16:16:22] jasaa, works, cool [16:16:29] blog.pig also works forme [16:18:34] hmmmm [16:18:36] ottomata, so all is good? [16:18:38] but this would need fixed too, no? [16:18:38] https://github.com/wmf-analytics/kraken/blob/master/src/main/java/org/wikimedia/analytics/kraken/pig/GetCountryCode.java#L52 [16:18:52] continent.pig works for me, and that's all i'm using right now, so all is good for me [16:18:59] but I think you should make the same fixes in your other UDFs [16:19:09] aight [16:19:11] or mabye even abstract out the Geo .dat using UDfs [16:19:16] so you don't ahve duplicated functions [16:19:40] class PigGeocoder extends EvanFunc ... [16:19:47] class GetCountryCode extends PigGeocoder [16:19:49] or something [16:20:00] class GeoIpLookup extends PigGeocoder [16:20:00] yes that's even better, thanks for the tip [16:20:16] i will open an issue [16:21:11] drdee, as soon as I get this cleaned up a bit, can you help me write an oozie job for mobile continent stuff? [16:21:24] i need to know how to do it, and we want to get something up so that milimetric can make limn automatically graph the results [16:21:27] YES! [16:21:48] but i am also having an oozie issue, with the coordinator part [16:21:54] i'm running continent pig on 2013 mobile logs now, just as a test [16:21:54] so we can work on that together [16:21:59] if that works then i'll clean up a bit then we can do that [16:22:01] let me know when you ned me [16:22:07] ok, prob in 20 or 30 mins [16:22:13] perfect [16:22:17] * drdee is getting coffee [16:22:39] milimetric, just so I can start thinking about it, what format do you need these files to be in? [16:23:56] drdee, should we put the new kraken.jar (and maybe geoip-1.2.5.jar) at /libs in hdfs? 
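To make the continent.pig discussion easier to follow, a hypothetical sketch of how the GeoIpLookup UDF might be wired into a script; only the class path and the fact that it now emits continent code and continent name as its last two fields are confirmed above, so the constructor argument, the input path, and the other output field names are assumptions:

    -- Hypothetical usage sketch; the argument and output schema of GeoIpLookup are assumed.
    REGISTER 'kraken.jar';
    REGISTER 'geoip-1.2.5.jar';

    -- assumed: the argument names the GeoIP database distributed to the nodes via getCacheFiles()
    DEFINE GeoIpLookup org.wikimedia.analytics.kraken.pig.GeoIpLookup('GeoIPCity.dat');

    LOG_FIELDS = LOAD '/wmf/raw/webrequest/webrequest-mobile/*/part*' USING PigStorage(' ')
        AS (hostname:chararray, sequence:long, timestamp:chararray, ip:chararray);

    -- the first six output field names are placeholders; the last two are the new fields
    GEO = FOREACH LOG_FIELDS GENERATE
              timestamp,
              FLATTEN(GeoIpLookup(ip)) AS (country_code:chararray, country_name:chararray,
                                           region:chararray, city:chararray,
                                           latitude:double, longitude:double,
                                           continent_code:chararray, continent_name:chararray);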
[16:24:51] ottomata, dschoon was working on the map file format and I think he tweaked it a bit [16:24:55] yes we should have one canonical place to put that stuff [16:25:15] but for the timeseries format, it's just basic csv, I'll paste an example [16:25:32] /libs already has some of those files [16:25:39] i'd rather it be /wmf/lib maybe [16:25:47] ok [16:25:48] not sure if I created /libs or if someone else did [16:25:51] or if it is in use right now [16:25:56] http://dev-reportcard.wmflabs.org/data/datafiles/rc/rc_comscore_region_uv.csv [16:25:58] how about, we make it /wmf/lib [16:26:00] i think you did [16:26:07] and I copy the new files into /wmf/lib, and just start using that [16:26:08] oozie has /user/oozie/share/libs [16:26:10] i'll leave /libs for now [16:27:01] /user/oozie/share/lib [16:27:02] not plural [16:27:04] but yeah [16:28:09] milimetric, if we have time data in the timestamp as well [16:28:11] is that ok? [16:28:13] what format should that be? [16:28:23] dschoon wanted mobile continents to be per hour [16:28:39] I think it would still parse ok [16:28:48] 2008/07/01 HH:MM:SS [16:28:48] ? [16:28:48] if not I'd just fix the parsing because we need hours yea [16:28:52] yeah, that works [16:29:11] ok cool [16:29:31] hm, drdee, I'm going to add continent name here too :p [16:29:34] as an 8th field [16:29:48] country name comes back, why not continent name too [16:45:01] out to get some stuff, bb in 30m [17:18:54] back [17:19:08] welcome! [17:19:13] :D [17:32:09] ottomata, this is very cool: https://github.com/twitter/ambrose [17:32:25] can we install that :D :D [17:32:57] i think I did once…. [17:33:04] but yeah, could puppetize [17:33:07] adding it to my todo [17:33:10] k [17:40:11] http://www.slideshare.net/hortonworks/new-features-in-pig-011 [17:40:17] cool stuff is coming! [17:41:57] oo looks nice [17:42:04] is cube basically a shortcut for group by? [17:42:09] group by and count? [17:42:30] it looks like that, i have to study it a bit more in detail [17:42:58] i think it creates multidimensional group by and count [17:52:22] morning [17:52:26] morning [17:53:01] did we get all of pgehres|away's questions answered last night? [17:53:32] about the log cutover? [17:53:51] in the future, it'd probably be wise to defer such changes to the morning the next day, so everyone has a chance to react [17:57:40] dschoon, define morning :) [17:57:54] not 8pm EST [17:58:26] actually it's real close to the UTC midnight [17:58:48] which is helpful for erik [17:58:49] z [17:58:58] well, true [17:59:02] but we have a lot of consumers [17:59:11] it was kinda disconcerting to see FR all frantic [17:59:27] why were they frantic, i thought they said we could do it after the fundraiser? [17:59:38] yes, and we communicated this many times [18:00:31] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90?authuser=1 [18:25:11] drdee, ops puppet things are slow atm, what's up with your oozie thing? [18:26:14] hmm, wait weird, ok [18:26:22] the snmptrap is getting to nagios fine [18:26:32] when I run it 'Last Update' changes [18:26:33] but [18:26:37] it isn't changing the status [18:27:38] hm [18:28:18] oops, wrong chat [18:28:20] meant that for ops [19:19:57] hmmmm, drdee, or louisdang, either of you got a sec to help me with a pig script? [19:20:21] ottomata, sure [19:20:31] i'm basically trying to do this [19:20:32] http://stackoverflow.com/questions/11578815/pivoting-in-pig [19:22:12] ottomata, does the given answer work?
[19:23:05] something isn't working right [19:23:15] https://gist.github.com/4446225 [19:25:41] i think the problem is that it is filtering on the original count input, rather than the current group [19:26:03] in that FOREACH, is there a way to refer to the current bag? [19:26:12] (i'm not even sure if I am using the correct terms) [19:28:14] group? [19:28:46] let me check [19:31:55] I'm not familiar with using FILTER statements in nested foreach [19:32:45] is it related to: https://issues.apache.org/jira/browse/PIG-1798? [19:33:26] hm, not sure but I don't think so [19:33:45] i think i'm just doing somethign wrong... [19:36:59] oh [19:37:22] what's the count column called in the original load statement ottomata? [19:37:55] I think it col1.value should be col1.total for example [19:38:19] (sorry, in meeting) [19:38:32] k [19:58:32] I'm pumped [19:58:44] just fixed my vimrc , damn molokai and xterm-256color [19:59:07] solarized is awesome for gvim .. but not very sure about the console.. [19:59:25] maybe an if statement in the .vimrc makes sense to select the right colorscheme depending on gvim/vim [20:15:53] louisdang [20:16:03] its called value, but it can be named whatever [20:16:07] COUNT = FOREACH GROUPED GENERATE [20:16:07] FLATTEN(group) AS (hour, continent_name), [20:16:07] COUNT_STAR($1) as value PARALLEL 1; [20:16:10] ottomata, ok [20:16:46] yeah, i'm changed the main text color to something a little brighter/darker for the teminal [20:16:54] not sure If i like it that much [20:26:54] drdee, can help, my head is in this pig thing at the moment though [20:26:56] how's oozie? [20:27:10] i am still stuck with the coordinator [20:27:19] maybe you can try it with a different user account [20:27:36] or maybe it's permission thing in Hue, i have no clue to be honest [20:33:50] louisdang: [20:33:51] https://gist.github.com/4446964 [20:33:58] what is different between these two examples? [20:34:03] drdee you can look too :) [20:34:08] looking [20:34:46] brb [20:37:06] mmmmmm and i assume that the stack overflow example works? [20:38:12] yes, [20:38:16] that gist shows the result [20:38:42] that's me running it [20:39:35] i'll add dump count and dump input in that gist [20:39:36] to make it more clear [20:39:49] got it [20:39:58] is Value a Pig keyword? [20:40:11] because in the stack overflow example it is capitalized and in your code it is not [20:45:26] erg, internet was so good here most of the day [20:46:15] drdee, no [20:46:17] it is a named field [20:47:08] ok [20:49:21] ok updated gist [20:49:21] https://gist.github.com/4446964 [20:49:27] with more of the scripts [20:50:08] you can see that grp and GROUPED_COUNT are of the same data hierarchy [20:50:28] yup [20:50:43] COUNT is also a pig function so maybe not good idea to call your variable like that [20:50:53] a key (Id in grp, hour in GROUPED_COUNT) and then a bag [20:50:53] hmmmm [20:50:55] oh hmmm [20:50:57] maybe yeah [20:50:57] ok [20:53:53] drdee, you might be right about that, it still isn't workign but it is happier atm [20:55:09] new error? [20:55:20] no, just empty results [20:55:25] ok :) [21:05:02] ungghh this should be workingnnnnnggggg [21:12:03] lemme see if cafe internet is working again [21:13:14] ja better [21:13:44] ergh, dunno why this isn't working, drdee, i'm going to walk outside for a second, then let's look at oozie [21:13:54] sounds good [21:24:02] oook [21:24:05] coordinator joooob [21:24:41] what's that mean? 
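For reference, the pivot being attempted in the gist above generally looks like this in Pig; the relation and field names here are illustrative, and the count column is renamed to avoid clashing with the COUNT keyword, as suggested above:

    -- COUNTS is assumed to have the shape (hour, continent_name, hits), i.e. the output of
    -- the earlier GROUP/COUNT_STAR step with the value column renamed.
    BY_HOUR = GROUP COUNTS BY hour;

    PIVOTED = FOREACH BY_HOUR {
        -- inside a nested FOREACH, COUNTS refers to the bag of tuples belonging to the
        -- current group, so these FILTERs run per hour rather than against the whole relation
        africa = FILTER COUNTS BY continent_name == 'Africa';
        europe = FILTER COUNTS BY continent_name == 'Europe';
        GENERATE group AS hour,
                 SUM(africa.hits) AS africa_hits,
                 SUM(europe.hits) AS europe_hits;
    };

If the string literals in the FILTERs do not exactly match the continent_name values the UDF emits (case or spelling), the nested bags come back empty and the sums are null, which is one plausible explanation for the empty results mentioned above.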
:p [21:25:03] drdee [21:25:12] yo [21:25:29] go to http://hue.analytics.wikimedia.org/oozie/ [21:25:35] and click on Coordinators [21:25:59] the coordinator is the actual scheduler, the workflow describes what (which script etc.) the coordinator describes when [21:26:12] hm ok [21:26:40] so what' up? [21:26:48] it doesn't run? [21:26:52] when it is supposed to? [21:27:18] right so create a new coordinator [21:27:32] k [21:27:41] as workflow use Pig job blog.wikimedia.org [21:27:51] k [21:28:00] then go down to 'data' [21:28:06] and click on 'Datasets' [21:28:07] frequency? [21:28:12] whatever [21:28:18] this is just to debug [21:28:22] do hourly [21:29:16] create an input dataset, that reads data from /wmf/raw/webrequest/webrequest_blog/ [21:29:24] and an output dataset that stores it somewhere [21:29:54] then back in the coordinator page [21:30:26] add the input dataset and output dataset that you just created [21:30:44] there will be 2 pulldown menu's [21:30:51] one is empty and one has the dataset names [21:30:59] select the dataset and do save [21:31:03] you will get an error [21:33:09] name [21:33:12] this field is required? [21:33:14] yes [21:33:18] hm [21:33:26] but it's not being populaed [21:33:36] so that's why i said i think it's a bug [21:33:50] or a config error but that just seems unlikely [21:34:39] hmm [21:34:43] The inputs and outputs of the workflow must be mapped to some data. Click Add and select a dataset from the Dataset drop-down menu and map it to one variable of your workflow. [21:34:56] does the workflow have to define input parameter variables? [21:35:18] yes [21:38:37] i mean, maybe the workflow itself has to define variables that get popluated in that dropdown list [21:38:43] but we aren't defining any? [21:40:15] drdee, in this case, why does it need inputs and outputs [21:40:22] isn't the input hardcoded in your blog.pig script? [21:40:51] and you are defining output in the pig action [21:40:53] output=/user/diederik/blog/temp_results [21:41:40] no the input is not hardcoded in blog.pig [21:41:53] you have to specify an input and output param on the CLI [21:42:27] but you could be right that I misunderstanding the 'name' thing although why wouldn't they call it 'param'? :) [21:43:36] dunno, [21:43:45] if you are using the blog.pig script in your homedir, it is hardcoded [21:44:02] LOG_FIELDS = LOAD '/wmf/raw/webrequest/webrequest-blog/*' USING PigStorage(' ') AS ( [21:44:11] also, you might wnat to import [21:44:18] /wmf/raw/webrequest/webrequest-blog/*/part* [21:44:30] there are some empty _SUCCESS files in the import directories that aren't part of the data [21:45:56] aight, ok that's stupid let met fix that [21:48:31] ok copied new blog.pig to hdfs [21:48:53] but wait, didn't you ahve this working? and its not really blog.pig if you don't hardcode the input :p [21:50:28] right it's more generic [21:50:34] let's try again [21:50:40] hmmm, drdee [21:50:40] http://hue.analytics.wikimedia.org/oozie/list_oozie_coordinator/0000005-121220185229624-oozie-oozi-C [21:50:57] mmmmmm [21:50:58] indeed [21:51:37] the log says no action to complete [21:51:42] buuut, it has the datasets there, right? [21:51:47] explain this to me [21:51:53] what is a dataset to a coordinator? [21:51:59] how does that info get passed to your pig action [21:52:02] or whatever action? 
[21:53:10] i am even surprised that a coordinator is running [21:53:26] these are all good questions which we both are trying to figure out :D [21:53:46] i just tried to edit the coordinator using the updated workflow [21:53:53] but the name drop down menu is still empty [21:55:14] a dataset (AFAIK) defines an input/output param that is required by the workflow [21:55:46] ok, where are those defined in the workflow? [21:56:05] in the pig script using the $input and $output variable [21:56:34] how are those passed to the pig script? [21:56:43] check http://hue.analytics.wikimedia.org/oozie/edit_action/114 [21:56:57] look at params [21:56:58] ok right, then you don't need any datasets then, right? [21:57:05] those are manually entered in the action [21:57:09] you do, for the workflow it's only a single run [21:57:18] the coordinator is for recurring runs [21:57:20] so how does it get passed from coordinator to workflow to action [21:57:23] ? [21:57:25] dunno [21:57:33] that's what i am trying to figure out as well [21:57:51] i think there is a 2nd step after saving the coordinator (which we can't) [21:57:59] that allows you to map the dataset to a param [21:58:08] i saved the coordinator somehow [21:58:44] i don't see that anywhere though [21:58:49] are you sure? [21:58:59] of? if I saved it? [21:59:05] yes that you saved it [21:59:05] http://hue.analytics.wikimedia.org/oozie/edit_coordinator/38 [21:59:14] oh, I saved it with no inputs or outputs [21:59:23] dunno why the submitted coordinator shows them [21:59:29] mmmm [22:00:21] can you make your coordinator 'is_shared' ? (under advanced) [22:00:52] done [22:01:41] do you know the difference between an action param and an action argument? [22:01:52] semantics???? :D [22:01:54] drdee, dschoon, the deploy script is still in the works, I apologize. I'm being very slow and methodical because I'm getting rid of unneeded code, planning for the future, etc. [22:02:03] no worries :) [22:02:06] I'll stop work tomorrow no matter what [22:02:19] it's an important piece of infrastructure [22:02:21] ottomata, i have an idea [22:02:22] naw, they are different selections in the ui [22:02:37] when I use those terms, its a matter of context usually [22:02:47] if i'm defining a func and talking about the variables there, I call them parameters [22:02:53] but if I am calling a func, I call them arguments [22:03:13] buuut, i dunno what t hey mean in a oozie actionm [22:04:13] drdee, reading this page [22:04:18] looks like they use arguments for pig script input [22:04:18] http://archive.cloudera.com/cdh/3/oozie/WorkflowFunctionalSpec.html#a3.2.3_Pig_Action [22:04:31] -param [22:04:31] INPUT=${inputDir} [22:04:31] -param [22:04:31] OUTPUT=${outputDir}/pig-output3 [22:06:24] ok [22:10:19] check http://hue.analytics.wikimedia.org/oozie/edit_coordinator/6 [22:10:24] that is an example coordinator [22:10:46] it does not have a pulldown menu [22:10:52] just a text input field [22:12:28] hmm, ok but why are you trying to use datasets if you don't know how they are used? :p [22:12:43] you defined the input and output in your pig action [22:13:21] Creating a Coordinator [22:13:22] To create a coordinator: [22:13:23] Click the Create button at the top right of the Action Chooser. [22:13:25] In the Name field, type a name. [22:13:26] In the Workflow drop-down list, choose a workflow that the coordinator will schedule. [22:13:27] In the Frequency area, specify how often the workflow will be scheduled and how many times it will run. [22:13:28] Click Save. 
The coordinator editor opens. Proceed with Editing a Coordinator. [22:13:32] >>>> In the Name field, type a name. [22:13:39] so why do we get a pulldown menu? [22:14:08] ottomata: you defined the input and output in your pig action [22:14:26] well that was just the first step in making sure that the oozie workflow actually works [22:14:52] okay how about this: [22:15:09] hue detects that the pig script uses more than 1 variable [22:15:23] but has a bug in populating the pulldown menu [22:15:56] the 'Creating a Coordinator' stuff came from https://ccp.cloudera.com/display/CDH4DOC/Oozie+Editor+and+Dashboard#OozieEditorandDashboard-Coordinators [22:16:41] i don't think hue is going to detect if pig uses parameters [22:16:48] i mean, ja it might be a hue bug [22:16:57] ignore my comment about the name stuff [22:16:59] but i think you are right to try to get it to work as is [22:17:07] that refers to the name of the coordinator [22:17:42] The inputs and outputs of the workflow must be mapped to some data. Click Add and select a dataset from the Dataset drop-down menu and map it to one variable of your workflow. [22:17:43] If no datasets exist, follow the procedure in Creating a Dataset. [22:17:44] Select a dataset from the Dataset drop-down menu. [22:17:45] Click Save. [22:18:03] so that actually says nothing about the name pulldown menu [22:18:36] >>>>> and map it to one variable of your workflow. [22:18:51] so the workflow needs to define the input and output variable [22:19:00] and that should show up in the name pulldown menu [22:19:05] (name is really a stupid name) [22:19:22] oh it worked! [22:19:50] i mean, i got the coordinator to run the pig script with hardcoding the params in the pig action [22:19:58] so, at least we got that far :) [22:20:41] ah ha! [22:20:43] got it [22:20:47] you have to parameterize your arguments [22:20:50] so by adding [22:20:56] input=${INPUT} [22:20:59] in the pig acation [22:21:07] GOT IT [22:21:08] the name drop down was populated [22:21:19] we both figured it out at the same time [22:21:20] D [22:21:21] :D [22:21:25] OUTPUT=${output} [22:21:46] in the pig script i use $input and $output so i guess it should be lowercase [22:22:09] AWESOME!!!!! [22:23:01] okay coordinator save [22:23:01] d [22:23:22] arrgghghh [22:23:34] ${HOUR} is not a parameter [22:24:06] http://hue.analytics.wikimedia.org/oozie/list_oozie_coordinator/0000140-121220185229624-oozie-oozi-C [22:25:09] THAAAAANKS OTTOMATA [22:26:06] well no [22:26:08] i mean [22:26:19] you should use [22:26:24] output=${WHATEVER_YOU_WANT} [22:26:35] output=${OUTPUT} is what I was using [22:26:47] but yeah, that hour thing looks annoying [22:26:53] might ahve to change the way hadoop kafka is importing [22:27:15] yeah the hour thing is definitely annoying [22:27:22] oh, well, actually [22:27:28] when you submit the job [22:27:35] milimetric: i pushed maps [22:27:35] it asks you for unknown params [22:27:42] only supports fill for now [22:27:49] still needs the hover box [22:27:54] i will debug this :D [22:27:57] but that's easy. it's like legend. [22:28:10] cool [22:28:25] you'll need to pull both develop and reportcard-data feature/d3 [22:28:47] good stuff [22:28:47] i'll merge now [22:28:53] test first :) [22:28:58] i have a meeting in a few, so i'll be back after that. [22:28:59] bbl [22:29:43] cool, can't wait to play with this. 
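To make the parameter fix concrete, this is roughly what a parameterized blog.pig-style script and its wiring look like; the field list and the sample aggregation are illustrative:

    -- $input and $output are Pig parameters, substituted at launch time
    LOG_FIELDS = LOAD '$input' USING PigStorage(' ')
        AS (hostname:chararray, sequence:long, timestamp:chararray, uri:chararray);

    BY_HOST = GROUP LOG_FIELDS BY hostname;
    HITS    = FOREACH BY_HOST GENERATE group AS hostname, COUNT_STAR(LOG_FIELDS) AS hits;

    STORE HITS INTO '$output';

    -- On the command line the same substitution is done with:
    --   pig -param input='/wmf/raw/webrequest/webrequest-blog/*/part*' \
    --       -param output=/user/diederik/blog/temp_results blog.pig
    -- In the Hue pig action, adding parameters of the form
    --   input=${INPUT}
    --   output=${OUTPUT}
    -- is what exposes INPUT and OUTPUT as workflow variables, so the coordinator's dataset
    -- "Name" drop-down has something to bind to.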
I won't until I'm done deploying [22:34:15] ok cool, glad we got that far drdee [22:42:35] ottomata, so i don't think i can actually really run the coordinator [22:42:53] because we cannot parameterize the hour component in the filename of the imported files from kafka [22:43:34] right, that's a prob [22:43:52] what is ${MINUTE} then? [22:43:55] daily minute number? [22:49:00] no, i think just between 0 and 60 [22:49:38] really? naw can't be [22:49:45] how else could you choose a time? [22:49:53] not sure if i understand you [22:50:08] oh because you can't parametrize the hour [22:50:17] you have to do it through the $MINUTE? [22:50:27] could be the case, dunno [22:50:31] i will google a bit [22:55:25] k [23:14:02] ottomata: http://oozie.apache.org/docs/3.2.0-incubating/client/apidocs/src-html/org/apache/oozie/client/CoordinatorJob.html [23:14:07] oozie does know HOUR [23:15:03] ah cool [23:16:26] oh actually Hue also works with HOUR [23:16:34] the fact that it asks for a small input is a small known bug [23:16:35] http://grokbase.com/t/cloudera/hue-user/12bacst98y/hue-oozie-coodinator [23:16:44] and will be fixed soon [23:17:14] you just have to leave the HOUR field empty [23:17:59] I get this error right now: [23:17:59] Coord Action Input Check Error: org.apache.oozie.service.HadoopAccessorException: E0904: Scheme [null] not supported in uri [/wmf/raw/webrequest/webrequest-blog/2013-01-03_16.22] [23:30:03] ottomata: we ran into another oozie bug: https://issues.apache.org/jira/browse/OOZIE-1144 [23:30:11] which has been fixed in trunk [23:42:13] aye hm [23:42:14] ok [23:42:18] i'm outttiiieee [23:42:21] ttyl boys