[15:28:47] average: hey, can I help you?
[15:29:07] i've got a lull at the moment, i can work on some other stuff, but i've got some time to help you with your stuff
[15:31:03] milimetric: Can I add a public IP to limn1 and then move the gp.wmflabs.org name over?
[15:31:13] qchris: do you need a public IP?
[15:31:17] can you just use the proxy?
[15:31:38] Limn0 had one. No idea if we really need it.
[15:31:56] ottomata: I am thinking of using Christian's toolbelt
[15:31:59] ottomata: How could you add a hostname if you do not have a proper IP?
[15:32:07] qchris: is it ok if I ask you some stuff about compiling/using the toolbelt?
[15:32:22] average: Sure.
[15:32:58] ottomata: (Isn't adding host names tied to https://wikitech.wikimedia.org/wiki/Special:NovaAddress ?)
[15:34:04] average, yeah, we don't need that anymore
[15:34:06] sorry
[15:34:06] i mean
[15:34:09] qchris
[15:34:18] the ip address was before the http proxy was available
[15:34:29] ottomata: ah ok. All the better :-D
[15:34:39] qchris:
[15:34:39] https://wikitech.wikimedia.org/wiki/Special:NovaProxy
[15:34:51] Thanks!
[15:34:52] ottomata: right now I'm using this to get uncompressed stuff from hadoop: hdfs dfs -text
[15:34:59] probably will have to remove your hostname from the address though
[15:35:13] ok, average, i'm confused about your problem though
[15:35:25] you are trying to test using the hcatalog serde with some data that has utf-8 in it?
[15:35:27] milimetric: ping
[15:36:01] ls
[15:36:10] heheh
[15:38:07] ottomata: ok, so this is where I am right now: hcatalog is breaking on my box with the data I have there
[15:38:19] where did the data that is on your box come from?
[15:38:29] ottomata: the data on my box comes from an26, by running the command I wrote above ^^
[15:39:08] ottomata: I am tempted to believe that hcatalog knows UTF-8, since it hasn't had problems in production so far
[15:39:34] what I'm left with is that the command I used to get a slice of data from an26 is not UTF-8 aware
[15:39:49] hence me wanting to use qchris's toolbelt to get UTF-8 aware data
[15:40:45] why -text?
[15:40:55] why not just take the snappy compressed file as a whole and use it?
[15:41:01] -get?
[15:41:19] I could do that, but then I'd have to configure my local hadoop/hive to understand that
[15:41:25] shouldn't you though?
[15:41:32] if you are trying to do stuff that will work in our prod env?
[15:41:49] you're right
[15:41:56] also, is that hard? you just have to add the hcatalog jar (which comes with hdfs and cdh4 puppet)
[15:42:08] and then create a table in the same way that we do in prod
[15:42:21] ok, I'll get a snappy thing and I'll continue working out these hive/hadoop settings. This is where I'm at so far: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.1/CDH4-Installation-Guide/cdh4ig_topic_23_5.html
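
A rough sketch of the local setup ottomata is suggesting, as shell commands. The jar path, the JsonSerDe class, the SEQUENCEFILE storage clause, and the column subset below are illustrative guesses, not the actual prod DDL (which is not shown in this log):

    # Copy the snappy-compressed file out of HDFS byte-for-byte;
    # unlike `hdfs dfs -text`, -get performs no decompression or decoding,
    # so the file arrives exactly as prod wrote it.
    hdfs dfs -get /wmf/data/external/webrequest_mobile/hourly/2014/02/26/06/webrequest_mobile.21.0.189203.2863735321 .

    # Register the HCatalog jar (path is a guess for a CDH4 install) and
    # create an external table over one hourly directory, mirroring the
    # prod approach; the columns here are a hypothetical subset.
    hive -e "
      ADD JAR /usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar;
      CREATE EXTERNAL TABLE webrequest_mobile (
        hostname   STRING,
        dt         STRING,
        ip         STRING,
        user_agent STRING
      )
      ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe'
      STORED AS SEQUENCEFILE
      LOCATION '/wmf/data/external/webrequest_mobile/hourly/2014/02/26/06';
    "

Reading the snappy-compressed files locally still requires the native snappy libraries, which is what the Cloudera page linked above covers.
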
[15:43:56] average, can you just use vagrant and puppet-cdh4?
[15:44:30] I'd rather keep my local hive for speed reasons
[15:44:49] I will, however, peek at the hive/hadoop configs in vagrant-puppet-cdh4 to steal the snappy-related stuff from them
[15:45:15] oook
[15:46:28] I know we have 3 environments right now: 1) Prod 2) Vagrant Puppet CDH4 3) My local hive
[15:46:44] prod is very close to vagrant puppet
[15:46:46] I don't mean to add complexity through 3), however I feel that speed is important when developing
[15:47:05] does it really run that much slower for you?
[15:47:25] it does, since it's another VM I have to run
[15:47:36] another?
[15:47:36] ha
[15:47:42] how many are you running?
[15:48:33] well, now I have Chrome in a VM, so every time I do standup I have to fire up a VM; that's 1-2G of RAM and this laptop has 4G
[15:48:49] i have a few VMs I use
[15:48:52] but I only run one at a time
[15:49:03] vagrant suspend; cd ../othervm; vagrant up
[15:49:26] then there's another memory hog called Eclipse
[15:49:32] hah
[15:49:37] so it adds up
[15:49:58] maybe someone should buy you a new laptop; cough cough tnegrin
[15:50:49] nono, it's fine
[15:55:01] average: I just had a look and kraken-toolbelt does not choke on non-ascii. Try running
[15:55:07] hadoop jar /home/qchris/kraken-toolbelt-0.0.2-SNAPSHOT.jar org.wikimedia.analytics.kraken.toolbelt.Dump /wmf/data/external/webrequest_mobile/hourly/2014/02/26/06/webrequest_mobile.21.0.189203.2863735321 | head -n 717 | tail -n 1
[15:55:11] on analytics1010.
[15:55:54] The part after "Media Shop" in the "user_agent" is not 7bit.
[15:56:09] And it works without problems.
[16:05:11] I am all for reproducible environments in which we all might run into the same problems and can repro each other's issues; in my experience that saves time in the long run.
[16:17:08] milimetric_: About the gp.wmflabs.org migration. Can I just add the hostname to limn1, or will that break things on your side?
[16:18:12] qchris: i don't think that will break anything, aside from the limn0 instance of it
[16:18:22] qchris: no breakage here
[16:18:26] that's what I was going to do
[16:18:33] Ok. I'll try it then.
[16:18:35] if you can do that and kick the tires to make sure things are still working, that'd be great
[16:18:39] the other hostnames are moved
[16:21:51] Waiting for the TTL to pass ...
[17:47:09] hey milimetric, we're setting up the room for the hangout on air - I can send a link to attendees to point them to the stream instead of the hangout. Also, do you want me to advertise this more broadly?
[17:49:00] Hangout on air? Are we recording again?
[17:49:08] hi DarTar
[17:49:16] qchris: i'm talking to tnegrin right now
[17:49:28] we know what the implications are, and we're trying to make a decision
[17:49:48] Ok.
[17:50:24] we're not going to do it this time
[17:50:38] DarTar: ^
[17:50:52] yep, no problem
[17:50:56] Erik M asked to do it but I forgot the implications
[17:51:18] let's discuss at the retro
[17:51:18] also, setting up an account for hangout on air is a huge PITA
[18:14:22] hrm
[18:14:42] anyone who knows anything about our geoiplookup; I don't suppose it's possible to get access to the actual mmdb files on stat2, is it? If so, where would I look?
[18:15:29] ummmm, what are mmdb files?
[18:15:30] Ironholds: The dat files are underneath /usr/share/GeoIP
[18:15:31] probably?
[18:15:32] yeah
[18:15:35] oh, MaxMind
[18:15:37] ja
[18:15:49] qchris, aha, thankee!
[18:16:15] Ironholds: yw
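
One detail worth spelling out from that last exchange: .mmdb is MaxMind's newer GeoIP2 format, while the files qchris points at under /usr/share/GeoIP are the older legacy .dat format, which the geoiplookup CLI can query directly. A minimal example, assuming the geoip-bin tools are installed (the IP below is just a documentation placeholder, not one from this log):

    # Look an address up against one specific legacy .dat database file:
    geoiplookup -f /usr/share/GeoIP/GeoIP.dat 192.0.2.1

The same -f flag works against the other .dat files in that directory, e.g. the city-level database if it is present.
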