[13:56:51] morning guys
[13:56:58] AND HAPPY NEW YEAR!!!!!!
[14:23:23] goooooooooooooooood morning ottomata and the best wishes for 2013!!!!!
[14:23:29] mooorning!
[14:23:33] danke and back atcha
[14:24:27] have all the kafka issues been resolved now? (as David's email implied)
[14:25:00] this might be a good report to read https://issues.apache.org/jira/browse/KAFKA-180 (mentioned to me by ori-l)
[14:35:44] ottomata, i am also pretty sure that you are the kind of guy that went to times square ;)
[14:35:58] ha
[14:36:00] No way man
[14:36:06] i'm in VA anyway, I just hung with fam
[14:36:18] couple of younger cousins, aunts and uncles
[14:36:33] scattergories, etc :)
[14:37:07] also, as far as I can tell, all the kafka hadoop imports have been good since we looked at it last week
[16:13:34] hey drdee
[16:13:56] if I want to compile a new kraken.jar, what's the best way to do so?
[16:30:46] i think mvn compile should do it, but i'm not sure
[16:31:30] and it is configured to put the jar in target
[16:40:18] ottomata, what did you push to kraken?
[16:40:54] i will add a build script that compiles the jar, generates the javadocs, and pushes them to stat1001
[16:41:15] i got it, mostly
[16:41:23] i want to add a way to get continent code
[16:41:30] haven't pushed yet, but i'm trying it now
[16:41:43] i added a 6th field to the tuple returned by GeoIPLookup
[16:41:49] maven should work, i put a lot of work in it
[16:42:55] the command is mvn compile IIRC
[16:45:19] yeah, i think i got it, mvn compile and mvn package
[16:45:21] runs tests too :)
[16:45:33] getting this right now in pig:
[16:45:34] Could not resolve org.wikimedia.analytics.kraken.pig.GeoIPLookup using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
[16:45:34] hm
[16:45:41] even though I did register kraken.jar
[16:46:05] oh
[16:46:07] maybe capital P :p
[16:46:29] yup, doh, that was it
[16:50:11] :D
[16:50:37] it should also run FindBugs, and do the Javadocs, so probably best to have a simple build script
[16:50:50] i think the other command was mvn site
[16:51:07] that will deploy the javadocs and the FindBugs report
[17:15:43] ottomata, would this be a good day to do the log format change?
[17:16:03] at the very least the x_carrier http header?
[17:16:33] sure! except I can't do it myself
[17:16:49] who do we need?
[17:16:59] notpeter?
[17:20:55] maybe, or asher
[17:21:05] ok
[17:21:22] or paravoid?
[17:21:30] he is on rt duty
[17:22:17] maybe
[17:22:23] haven't worked with him on that stuff before, but he could probably do it
[17:22:29] i'll ask, hold on
[17:35:04] drdee, have you had problems accessing files from pig before? I keep getting this, even though my files exist and should be readable:
[17:35:09] Message: java.io.FileNotFoundException: File does not exist: null
[17:35:13] Input(s):
[17:35:13] Failed to read data from "hdfs://analytics1010.eqiad.wmnet/user/otto/blog1"
[17:36:34] which function?
[17:36:52] i assume that's the LOAD function
[17:38:43] I get it when I run your blog.pig script even
[17:38:48] Failed to read data from "/wmf/raw/webrequest/webrequest-blog/*"
[17:38:51] does that happen to you?
[17:38:53] no
[17:38:57] this is new to me
[17:39:06] right, but maybe something has changed, does it happen to you now?
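[editor's note] The registration problem worked out above ([16:45:34]–[16:46:29]) might look like this minimal Pig sketch. Only the class name org.wikimedia.analytics.kraken.pig.GeoIPLookup and the jar name come from the log; the jar path, alias, and constructor arguments are assumptions:

```pig
-- minimal sketch, assuming kraken.jar was built into target/ by `mvn package`
REGISTER 'target/kraken.jar';

-- UDF class names are resolved case-sensitively: getting a letter's case wrong
-- (e.g. GeoIpLookup vs GeoIPLookup) yields the
-- "Could not resolve ... using imports" error quoted above
DEFINE geoip org.wikimedia.analytics.kraken.pig.GeoIPLookup();

logs = LOAD '/wmf/raw/webrequest/webrequest-blog/*' AS (line:chararray);
-- ... then apply geoip(...) to an IP field extracted from each line ...
```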
[17:39:14] maybe an issue related to reading the geoip db
[17:39:17] talking to robla,
[17:39:22] ok
[17:39:22] i'll check soon
[17:39:35] naw, not related to that, it says it can't read the input files
[17:40:39] hm, nope, even when I ran it as you it gave me the same error
[17:40:40] weird
[18:01:46] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90?authuser=1
[18:01:48] drdee
[18:01:58] still talking with robla
[18:02:00] coming soon
[18:13:51] ottomata: no more zero imports, but the udp2log.log is still full of 'Pipe restarted'
[18:14:32] hm ok, betcha we're dropping a few lines then
[18:14:57] they're not timestamped, so it's difficult to tell when they happened, but if the file's mtime is anything to go by, they're still happening (last line of udp2log.log in analytics04, for example, w/mtime of four hours ago)
[18:15:26] yeah, definitely four hours ago: [2013-01-02 16:54:54 UTC+0000] Packet rate: 35.066 k/s
[18:20:03] ottomata: also, FYI: https://gerrit.wikimedia.org/r/#/c/41942/
[18:20:17] i made the change for the event stream going into kraken too
[18:22:15] hm, ok that's cool, thanks!
[18:28:40] brb coffee
[18:28:56] fyi, dschoon, i just fixed it so that the default hue group now has access to the relevant applications in hue
[18:29:03] so dartar should be able to get to oozie etc.
[18:29:03] yay
[18:29:05] that's good.
[18:29:09] brb a moment.
[18:29:11] ok
[18:29:24] drdee, do we need the no-data-nda group anymore?
[18:29:40] i think we still need it
[18:29:45] for what?
[18:29:51] it doesn't do anything with file permissions
[18:30:47] aiigh
[18:30:52] i vote kill it
[18:31:04] ottomata, do you want me to add everything on my followup lists to your asana todos?
[18:31:17] not sure what's on those, but ok
[18:31:23] if there is something not relevant we can talk about it
[18:31:34] today I was going to work with paravoid to deploy log format changes for zero people
[18:31:48] things like "follow up on an07"
[18:32:08] ah ok
[18:32:09] not all of it is urgent
[18:32:14] almost none
[18:32:16] but it's mostly for you
[18:32:18] i've got an07 on my todos anyway
[18:32:21] but i'll bump that ticket
[18:32:21] aiight
[18:32:30] i just tend to poke you during scrum about them
[18:32:36] ottomata: re: https://gerrit.wikimedia.org/r/#/c/41942/ -- can you +1 it to document for asher that the change is ok by you?
[18:32:37] because i have them as recurring tasks
[18:33:44] ori-l, done
[18:33:48] thanks!
[19:03:45] this is kind of ridiculous
[19:04:00] to update the maintainers page, i have had to create a template that has no fewer than three templated IF statements
[19:08:18] dschoon: are you surprised?
[19:08:36] is there a polite answer to that question?
[19:11:31] it was more a rhetorical question ;)
[19:13:00] :)
[19:16:15] erosen: what's your preferred mediawiki username?
[19:16:19] https://www.mediawiki.org/wiki/User:Evan_(WMF) ?
[19:19:37] dschoon: Evan_(WMF) is my wmf SUL account
[19:20:03] is that what you use for editing MW?
[19:20:05] .org
[19:22:18] erosen: ^^
[19:22:30] basically, I'm going to add our projects to https://www.mediawiki.org/wiki/Developers/Maintainers
[19:22:31] i don't think I actually have a mediawiki account
[19:22:34] which is a small problem
[19:22:35] and i want to know what username you want listed
[19:22:41] yeah
[19:22:41] i suspected you were working on that
[19:22:53] same question for average_drifter
[19:22:55] i didn't realize you meant the mediawiki.org wiki
[19:24:53] dschoon: i'm just going to go with erosen
[19:25:02] I am creating one now
[19:25:07] but I can't solve the captcha, hehe
[19:25:25] lol
[19:25:31] what about Evan (WMF)?
[19:25:37] finally got the captcha, but erosen is too similar to E.Rosen
[19:25:38] why not use that?
[19:25:49] I hate usernames with non-alphanumeric characters
[19:26:00] but i may have to
[19:26:05] because E.Rosen already exists
[19:26:08] dschoon, do you need the public hdfs thing to be accessible outside of the office and without the http auth?
[19:26:26] Hmm.
[19:26:30] I think that would be best...
[19:26:46] Since the goal is to publish data / visualizations
[19:26:48] we need to be very sure we only expose aggregate datasets if we do that
[19:26:49] much more annoying :p
[19:26:52] sorry :(
[19:26:57] drdee, of course.
[19:27:02] yeah, drdee, this is a separate directory
[19:27:04] it'll be in a special directory, /public
[19:28:42] dschoon: quick question about mediawiki.org usernames: is the (WMF) suffix a convention on mediawiki?
[19:28:50] no idea.
[19:30:47] ok
[19:30:52] just use Evan (WMF)
[19:31:09] it looks like mediawiki is part of the unified login world, so the same account already exists
[19:35:15] dschoon: okay, one last change, sorry about the hassle: I'm gonna go with my personal account: erosen
[19:35:22] i mean **evanrosen**
[19:35:24] geez
[19:35:25] ...
[19:35:32] full link to user page, pls
[19:35:36] aah
[19:36:01] http://en.wikipedia.org/wiki/User:Evanrosen
[19:36:44] ...that's not a mediawiki.org account
[19:36:51] like, don't you edit pages on mw.org?
[19:37:00] i guess it doesn't matter where the link points to
[19:37:22] like, when i edit https://www.mediawiki.org/wiki/Analytics
[19:37:23] http://www.mediawiki.org/wiki/User:Evanrosen
[19:37:26] i am https://www.mediawiki.org/wiki/User:dsc
[19:37:29] k
[19:37:35] (christ, heh)
[19:37:41] sorry about that
[19:37:46] i don't ever edit mw.org pages
[19:37:54] until now!
[19:38:50] where do you guys keep your documentation?
[19:38:59] (he says naively)
[19:40:22] dschoon, do you need to be able to browse, or just read a known url?
[19:40:45] just read.
[19:40:45] office.wikimedia.org
[19:40:48] cool
[19:40:53] office isn't world-readable though
[19:40:58] but that makes sense
[19:48:20] dschoon, just emailed about /wmf/public being public :)
[19:48:23] yay!
[19:48:27] woo
[20:04:18] drdee: https://gerrit.wikimedia.org/r/41979
[20:04:24] drdee: it's in progress
[20:04:27] ok
[20:15:39] brb lunch
[20:50:13] erosen, ottomata, milimetric, drdee -- could you review https://www.mediawiki.org/wiki/Developers/Maintainers to make sure it's right, and I didn't miss anything?
[20:50:35] I didn't have a MW.org username for average_drifter, so he's not listed as an owner.
[20:50:46] spetrea maybe?
[20:50:55] looks good to me
[20:51:03] dschoon: the analytics/global-dev/reportcard is a dead repo, any ideas on how to best communicate that?
[20:51:16] yep https://www.mediawiki.org/wiki/User:Spetrea
[20:51:20] delete it from the list.
[20:51:26] and update the gerrit description
[20:51:31] does gerrit already have the ability to delete a repo?
[20:51:42] no, not to my knowledge
[20:51:59] because you never make mistakes using gerrit :D
[20:52:05] indeed!
[20:52:10] i'll update the mw.org page
[20:52:17] yay
[20:58:13] dschoon: you related to this guy: http://ethanschoonover.com/ ?
[21:01:57] nope
[21:02:12] he did solarized, right?
[21:05:07] yup
[21:05:21] he uses a tiling window manager
[21:05:29] I switched from wmii to i3
[21:05:30] woah
[21:05:44] ethan schoonover apparently uses xmonad
[21:05:53] ah yeah.
[21:05:57] i remember reading about that.
[21:30:42] erosen I love solarized, I think it just cured my headache
[21:30:52] awesome
[21:31:09] i have been on something of a solarized kick lately too
[21:31:34] https://github.com/sigurdga/ls-colors-solarized
[21:31:48] https://github.com/tomislav/osx-lion-terminal.app-colors-solarized
[21:35:11] it doesn't have enough contrast for me
[21:37:09] I JUST GOT A DINOSAUR ANALYTICS SHIRT IN THE MAIL
[21:37:14] I HEART YOU GUYS
[21:37:42] (and also RobH)
[21:37:50] finally!
[21:38:40] you have to wear it for the next 4 weeks straight
[21:39:01] MAYBE I WILL
[21:48:49] oh cool, it finally came
[21:49:03] RobH is the awesome person that found that one
[22:16:04] !log changed log formats to include Accept-Language and X-Carrier on all frontend cache servers
[22:16:07] Logged the message, Master
[22:16:20] ty ottomata!
[22:21:50] ottomata, show me where the pig script is tomorrow. I'd love to try my hand at it
[22:24:56] ottomata, my pig script is still working, i will have a look at it before milimetric goes ballistic :D
[22:25:32] i haven't committed mine, will do so now...
[22:25:46] on which machine is it?
[22:25:48] an01?
[22:26:21] found it
[22:28:59] milimetric:
[22:28:59] https://github.com/wmf-analytics/kraken/blob/master/src/main/pig/geocode_group_by_continent.pig
[22:30:17] ottomata, not sure if we have tried using GeoIPCity.dat before
[22:31:37] i had problems with city
[22:31:39] sorry
[22:31:40] with regular
[22:31:42] why not use city?
[22:31:43] we have it
[22:31:46] and it has more data
[22:31:58] i was able to get this script to work in local mode
[22:32:32] so then it's most likely a path issue
[22:33:21] can you run it?
[22:33:39] reading your script now
[22:35:11] i'm peacing out for the eve, i'm around just afk for a bit though
[22:37:12] ottomata
[22:37:14] fixed it
[22:37:29] you didn't have GeoIPCity.dat on your local fs
[22:37:54] so when you run pig in the shell without Hue then all dependencies should be in the local fs
[22:38:05] when pig is scheduled in oozie
[22:38:17] all jars and stuff need to be in hdfs in your home folder (/user/otto/)
[22:38:28] yes i did
[22:38:32] so your script works
[22:38:36] and ran
[22:38:42] output is in /user/diederik/foo2/
[22:38:54] GeoIPCity.dat is in my /user/otto dir
[22:38:57] local fs not hdfs
[22:39:06] local only has to be in cwd
[22:39:20] ?
[22:39:23] you said hdfs
[22:39:28] drdee
[22:39:28] all jars and stuff need to be in hdfs in your home folder (/user/otto/)
[22:39:34] so
[22:39:38] if you run pig using oozie
[22:39:40] in local mode, the jars have to be in your cwd
[22:39:49] in hadoop mode, they have to be in your hdfs home dir
[22:39:53] drdee: so when you run pig in the shell without Hue then all dependencies should be in the local fs
[22:40:12] that's fine, i've done that too, but not in homedir, in cwd, right?
[22:40:18] also, the error i was getting said explicitly
[22:40:23] it couldn't find the input file
[22:40:29] check /home/otto/pig/krakensrc
[22:40:35] there is no GeoIPCity.dat in that folder
[22:40:47] Failed to read data from "hdfs://analytics1010.eqiad.wmnet/user/otto/blog1"
[22:41:02] ah, when i was running in local i had the full path
[22:41:08] to /usr/share/GeoIP/GeoIPCity.dat
[22:41:12] it works in local mode
[22:41:15] just not in hadoop mode
[22:41:16] for me
[22:41:20] can you run it in hadoop mode?
[22:41:26] yes i did
[22:41:35] and i had GeoIPCity.dat in my cwd
[22:41:56] hmmmm, so you need GeoIPCity.dat in your local fs cwd when you run pig in hadoop mode?
[22:42:17] alright, hm, i will try that tomorrow then
[22:42:19] yes, unless you specify the full path in your pig script
[22:42:26] but not sure if pig supports that
[22:42:34] it worked in local mode
[22:42:39] not sure what it does in hadoop mode
[22:42:48] the dat file in your cwd
[22:47:05] you are saying local cwd
[22:48:07] yup
[22:49:10] ok, will try that
[22:49:19] check http://jobs.analytics.wikimedia.org/cluster
[22:49:27] you can see that it works
[23:16:17] drdee: git2deblogs is fixed
[23:16:25] drdee: please pull and try to run it again
[23:16:26] awesome!
[23:21:38] i filed two minor issues: https://github.com/wmf-analytics/debianize/issues
[23:29:03] dschoon: mind confirming that the 2 new fields are added?
[23:29:30] pgehres: is this about zero?
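[editor's note] The dependency-placement rule drdee and ottomata converge on above can be summarized in one small Pig sketch. The paths are hypothetical examples, and whether GeoIPLookup takes the database filename as a constructor argument is an assumption; the log only establishes where the files must live:

```pig
-- Where these dependencies are resolved from depends on how Pig is run:
--   local mode (pig -x local): kraken.jar and GeoIPCity.dat must be in the
--     local cwd, or be referenced by a full local path such as
--     /usr/share/GeoIP/GeoIPCity.dat
--   hadoop mode run from the shell: drdee had GeoIPCity.dat in his local cwd
--   scheduled via oozie: jars and the .dat need to be in your HDFS home dir,
--     e.g. hdfs://analytics1010.eqiad.wmnet/user/otto/
REGISTER 'kraken.jar';
DEFINE geoip org.wikimedia.analytics.kraken.pig.GeoIPLookup('GeoIPCity.dat');
```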
[23:29:35] i'm about to jump in a meeting in 1m
[23:29:38] * pgehres shrugs
[23:29:47] who should I ask then :-)
[23:30:43] * pgehres is updating the fundraising analytics scripts
[23:32:04] er
[23:32:09] is this about zero?
[23:32:16] i think they are for zero
[23:32:18] ottomata is the best person to ask
[23:32:22] drdee might be able to help
[23:32:25] but i'll be back in a few :)
[23:32:26] brb
[23:32:28] kk
[23:39:27] drdee: issue list clear
[23:56:54] Isarra: it's not recommended that you use IRC under a root account
[23:57:05] :D
[23:58:31] I don't. I just keep this ident out of spite.