[07:25:10] damn keyboard [16:00:07] milimetric: this is totally weird [16:00:20] labs? [16:00:20] the namenode crashes when I try to do hive things [16:00:20] yeah [16:00:33] hm [16:00:53] improper java version? [16:01:29] don't think so [16:01:34] dunno it's just dying with no output [16:01:49] can you restart the namenode ? [16:02:05] yeah [16:02:07] i can start it and use it fine [16:02:15] and then I can start hive-server2 and hive-metastore [16:02:24] but then when I try to do something with hive, the namenode crashes [16:02:48] stacktrace ? [16:05:23] oof and these nodes are so slow [16:05:52] hm, i should make a good vagrant instance for this, eh? [16:05:54] i need a new one anyway [16:05:57] and have been meaning to :p [16:06:06] haven't done it because others hadn't needed it [16:06:09] now you guys do! [16:06:41] well I can manage over here in my local setup [16:06:51] aye [16:07:04] i want one for me anyway, that uses the cdh4 module and the same setups [16:07:10] not going to do it now [16:07:14] but maybe in a few weeks or something [16:07:30] i had a messy instance that i was using for it, but it is too messy now [16:07:33] need something cleaner [16:10:03] hmm you know, it isn't hive that is causing it to die [16:10:06] it just dies after a while [16:10:19] with no error messages [16:10:25] i'm running in a foreground process [16:10:26] and it just died [16:11:08] is there a way to increase log verbosity ?
[16:13:14] ottomata: try to do a namenode recovery [16:13:20] ottomata: haddop namenode -recover [16:14:34] *hadoop [16:14:44] or hdfs namenode -recover [16:14:54] ottomata: read about this here http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/ [16:15:23] but maybe backup stuff first if it's possible [16:15:40] uhm, not sure if it's an fs corruption problem though [16:20:35] yeah we don't care about any of the data here [16:20:39] we just want the things to work [16:20:46] but i don't think this is a hdfs problem [16:46:10] milimetric: [16:46:14] i think your node just isn't powerful enough [16:46:21] heh [16:46:21] 1 processor, 2G mem [16:46:29] mysql can't even start with default values as is [16:46:30] ok, cool, i'll delete it [16:46:48] yeah i'd do at least 2 processors on these [16:46:59] ok [16:47:07] and maybe like 8G? [16:53:47] (PS1) Milimetric: Config files can be specified in environment [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/97753 [16:53:59] (CR) Milimetric: [C: 2 V: 2] Config files can be specified in environment [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/97753 (owner: Milimetric) [16:55:31] ottomata: how do I permanently set environment variables properly? [16:55:41] in /etc/profile.d ? [16:56:25] hm, no... i want these to be available to the whole system [16:56:44] i made wikimetrics read from env variables [16:56:53] but i don't know how to set them properly in labs [16:56:59] average: any idea? [16:59:00] milimetric: on ubuntu /etc/environment https://help.ubuntu.com/community/EnvironmentVariables#System-wide_environment_variables [16:59:22] huh, ok [16:59:30] i saw that but everyone says something different [16:59:33] i'll try it, thanks! [17:01:33] milimetric: I gave you a link to official Ubuntu documentation. The server you deploy on is most likely Ubuntu. 
There is a note in there about /etc/profile.d/ "Note: Some systems now use an envvar.sh placed in the /etc/profile.d/ directory to set system wide environment strings." [17:02:55] my machine doesn't have /etc/profile.d/envvar.sh . I'm running Ubuntu Raring 13.04 on my laptop [17:03:46] I have some other machines with Ubuntu precise 12.04 LTS on them, and those don't have that envvar.sh either [17:04:30] but, the good part is you can try it out and check with printenv or just env , to see what variables are set and what values they have. Apparently you can put your vars in many different files. One of them must be right [17:04:39] milimetric: why do you want to set global env variables? [17:04:40] I told you yesterday it would be done by the end of today [17:05:19] ottomata has a point, you might want to write your settings in a configuration file instead of system-wide environment variables [17:05:36] it's hard to explain [17:05:54] but environment variables are much easier because I have no way of specifying [17:06:01] through wsgi [17:06:06] what config file to use [17:07:12] you can read your configuration from /etc/wikimetrics/config.json for example [17:07:19] that's where configuration files normally sit [17:07:25] ugh [17:07:29] wikimetrics is dead :( [17:07:39] so it doesn't find those variables in /etc/environment [17:09:39] milimetric: still not sure I understand [17:10:07] so in wikimetrics.wsgi, I can only do something like "from wikimetrics.configurables import app as application" [17:10:13] and then mod_wsgi takes it from there [17:10:50] so I have command line parameters for specifying each config file [17:10:54] like you can do this: [17:11:04] wikimetrics --mode web --db-config blah.yaml [17:11:15] but you can't do that from wikimetrics.wsgi [17:11:51] so, in the argument parser, I'm giving --db-config the default os.environ['WIKIMETRICS_DB_CONFIG'] or 'wikimetrics/config/db_config.yaml' [17:12:20] this lets it work in all the 3 ways that people
want to run it [17:12:40] SO, the problem is, if I add things to /etc/environment, echo $WIKIMETRICS_DB_CONFIG seems to work [17:12:49] BUT, print os.environ does not show that variable [17:13:06] yes, because mod_wsgi might mess with your env variables... [17:13:40] oh weird... now it does [17:13:49] but yea, apparently mod_wsgi messes with the variable [17:14:07] so os.environ has my stuff if I just run python [17:14:12] but not from the wsgi process [17:14:14] wtf [17:14:20] SetEnv [17:14:27] inside your VirtualHost blocks in Apache [17:14:28] in apache? [17:14:30] k [17:18:42] * average gazes at the window while snow starts to fall [17:19:21] * yuvipanda notes that we have a uwsgi module in puppet now, and mod_wsgi is not really preferred in the python world anyway [17:19:21] yeah that's apache, and will probably work [17:20:14] yuvipanda: I didn't know that mod_wsgi is not preferred. Do you have a good source for me to read up? [17:20:20] moment [17:21:01] damn... ottomata, average, i SetEnv and it doesn't seem to be picking it up [17:21:03] wth [17:21:12] anyone wanna jump in the hangout? [17:21:28] milimetric: have a read through this first http://drumcoder.co.uk/blog/2010/nov/12/apache-environment-variables-and-mod_wsgi/ [17:21:41] k [17:21:52] i did read that, it's exactly what i did... 
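The argparse pattern milimetric describes, an environment variable as the default with a packaged file as the fallback, can be sketched roughly like this. The variable name and fallback path come from the conversation above; the rest is illustrative and not the actual wikimetrics code:

```python
import argparse
import os

# Hypothetical sketch: the default for --db-config is taken from the
# WIKIMETRICS_DB_CONFIG environment variable when it is set, otherwise
# from a config file shipped with the package.
parser = argparse.ArgumentParser(prog='wikimetrics')
parser.add_argument(
    '--db-config',
    default=os.environ.get('WIKIMETRICS_DB_CONFIG',
                           'wikimetrics/config/db_config.yaml'),
    help='path to the database config file',
)

# An explicit flag always wins over the environment/default.
args = parser.parse_args(['--db-config', '/tmp/override.yaml'])
print(args.db_config)
```

This is what lets the same code run from the command line, from upstart, and under mod_wsgi: each caller just has a different way of getting the path into place before argparse runs.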
[17:22:11] milimetric: try to isolate the problem with a bare-bone python app that sends out the env variables in the body of the page so you can see them in your browser [17:22:23] SetEnv WIKIMETRICS_DB_CONFIG "/var/lib/wikimetrics-config/db_config.yaml" [17:22:23] SetEnv WIKIMETRICS_WEB_CONFIG "/var/lib/wikimetrics-config/web_config.yaml" [17:22:23] SetEnv WIKIMETRICS_QUEUE_CONFIG "/var/lib/wikimetrics-config/queue_config.yaml" [17:22:23] WSGIDaemonProcess api user=www-data group=wikidev threads=10 python-path=/usr/lib/wikimetrics/wikimetrics [17:22:23] WSGIScriptAlias / /usr/lib/wikimetrics/wikimetrics/api.wsgi [17:22:53] maybe that WSGIDaemonProcess is messing it up [17:23:22] milimetric: put WSGIDaemonProcess before your SetEnvs [17:26:01] http://stackoverflow.com/questions/9016504/apache-setenv-not-working-as-expected-with-mod-wsgi [17:26:03] ugh [17:26:09] i ... hate this [17:27:12] halfak: hmm, so I was looking for concrete writing, and found http://blog.pythonanywhere.com/36/ and http://nichol.as/benchmark-of-python-web-servers and others [17:27:30] Thanks yuvipanda. :) [17:27:32] halfak: uwsgi has also been definitely faster / easier to setup / use in my personal experience [17:29:20] Good to know. I was using CherryPy for a while, but made the switch to Apache2 when I started supporting SSL. I'll have to check out uwsgi. [17:29:37] I do a lot of awkward things to live in Apache land. [17:30:25] halfak: heh, yeah. I don't use apache for anything - even the mw I used to run was on nginx+php-fpm [17:30:53] halfak: I'll also be adding direct uwsgi protocol support to the labs proxy soon, so that'll increase performance [17:32:32] halfak: oh, btw - I'm planning on building a tool that lets anonymous users run SQL queries against the labsdb via a simple web interface. Think it'll be useful to the research community? [17:32:45] Oh god yes. How do I help. [17:32:49] ^? 
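average's suggestion of a bare-bones app that dumps variables is a good way to isolate this. A minimal sketch (not wikimetrics code) that also shows the gotcha the linked posts describe: under mod_wsgi, Apache's SetEnv values arrive in the per-request `environ` dict passed to the app, not in `os.environ`:

```python
import os

def application(environ, start_response):
    # Print both environments side by side so you can see where a
    # SetEnv value actually lands (request environ vs. os.environ).
    lines = ['--- request environ ---']
    lines += ['%s=%s' % kv for kv in sorted(environ.items())
              if isinstance(kv[1], str)]
    lines += ['--- os.environ ---']
    lines += ['%s=%s' % kv for kv in sorted(os.environ.items())]
    body = '\n'.join(lines).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body]
```

Point WSGIScriptAlias at a file containing just this and load it in a browser; if WIKIMETRICS_DB_CONFIG only appears in the first section, the SetEnv directives are working but the app needs to read them from the request environ (or seed os.environ itself).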
[17:33:02] :D [17:33:19] I've the arch mapped out, should start building once some of my toollabs stuff is done. [17:33:52] halfak: big question in my mind is if it should be something like jsfiddle - it also stores results as a permalink you can share [17:34:03] I'd really like to auto-log queries for transparency and meta-analysis. [17:34:19] Get out of my head :P [17:34:20] indeed, jsfiddle does that :D [17:34:51] so a jsfiddle like thing, or a simpler 'REPL', where you type queries and hit enter and see a result and keep going, copy paste stuff out and you're done [17:34:51] How would you like to handle the problem of running expensive queries? [17:35:10] Yuvipanda -- Any chance you see ipython notebook fitting into this? [17:35:10] halfak: right now? hard limit them to x minutes, where x is something I can find out experimentally [17:35:24] halfak: so, I've a working ipython notebook implementation already... [17:35:25] :P [17:35:29] halfak: just haven't packaged [17:35:30] it [17:35:48] halfak also it's impossible to do ipython notebooks *anonymously*. you will need a toollabs account [17:35:58] while this tool can be made secure enough to be used just over the web [17:36:00] Good [17:36:51] I've got to run in a minute. Do you have a project page or other docs/description I could check out? [17:36:51] halfak: what would be more useful to the community at large? easier ipython notebooks or the SQL thing? [17:36:59] SQL thing. [17:37:03] deal. [17:37:11] halfak: i've them but they're on physical paper [17:37:21] halfak: i'll mail them to you when I write it up [17:37:21] Lame [17:37:21] lol [17:37:24] email I hope [17:37:26] haha [17:37:26] :D [17:37:34] postage to the US is expensive and slow :P [17:37:38] Hard to ctrl-F paper. [17:37:59] yup, but I was doing it when I take my hourly break from keyboard use [17:38:03] YAY Carpal Tunnel! [17:38:14] yay! [17:38:26] my doctors sent me my vaccination records. I don't need to get stabbed. [17:38:38] yuvipanda: OK.
I'm outta here, but I'll be back to bug you about this query interface later. [17:38:45] halfak: sweet [17:38:45] :) [17:38:59] halfak: happy to find someone interested :P was hard to drum up interest for the ipython thingy :D [17:57:00] (PS1) Milimetric: wsgi needs environment set up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/97760 [17:57:10] (CR) Milimetric: [C: 2 V: 2] wsgi needs environment set up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/97760 (owner: Milimetric) [17:58:08] yuvipanda: did you get that kinesis ? [17:58:17] average: yup! [17:58:19] helps a fair bit [17:58:26] i still take enforced timeouts [18:01:02] cool/q ottomata [18:01:04] damn [18:01:46] :D [18:02:36] heya [18:02:56] ottomata: did you find time to migrate your haproxy solution to the labs dynamic proxy? [18:05:49] no yuvipanda, i don't use it much [18:05:49] but [18:05:52] i did have a use case today [18:06:01] where I wanted to use instance proxy (for dan's cluster) [18:06:02] just for testing [18:06:12] i didn't want to configure anything because it isn't regularly used [18:06:20] so instance-proxy is kinda nice [18:06:25] because it works with ports for all instances by default [18:06:28] without having to configure anything else [18:06:28] yeah, true [18:06:38] this is a more 'permanent' thing, plus ssl. [18:06:44] instance-proxy won't go away anytime soon :D [18:07:06] ok good :) [18:07:07] cool [18:07:07] yeah [18:07:08] awesome [18:07:36] I'll probably upgrade it at some point to be a bit faster [18:07:42] and support websockets and such [18:22:34] ottomata: how do you deal with hadoop nodes running out of RAM ?
[18:22:39] like in a real full-scale setup [18:23:04] I left desktop machine with the nodes all up for like 48h , now I have to vagrant halt; vagrant up; [18:23:13] cause JVM is greedy [18:23:23] never run into problem in prod, because have lots of ram [18:23:36] aaaaah, I see [18:26:43] average: you have to tune the max number of reducers and mappers that can run and how much memory each of them can consume and make sure that the total memory consumption of the reducers and mappers is not larger than the total memory that you have available [18:28:17] drdee: hi [18:28:34] drdee: oh, I see, I'm gonna have to look in these xmls to see where I can set that [18:28:44] drdee: I have some small VMs, they have like 380mb ram each [18:28:53] yeah that's not a lot [18:29:13] the number of mappers is harder to tune as that depends on the block size of your hdfs partition [19:06:31] milimetric: you around? [19:06:38] bah! [19:06:40] yes [19:06:40] sorry [19:06:51] np [20:35:45] heya milimetric, can I get you to review some python code for me? [20:35:50] sure [20:35:53] https://gerrit.wikimedia.org/r/#/c/97830/ [20:36:00] sorry I wasn't so responsive about the env vars before [20:36:03] did you get that settled? [20:37:25] probably the thing that would do with the best review is the flatten_object method in JsonLogster [20:38:49] ok, sorry, was interrupted [20:38:57] looking now [20:38:59] k [20:44:47] I like it ottomata [20:44:58] you could probably make it shorter with reduce but it wouldn't be as clear [20:45:09] so I just looked at flatten_object [20:46:03] reduce lol. guido v. rossum said reduce was mostly redundant, since it can only be used effectively in sum(..) or product(..)
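The flattening technique being reviewed can be sketched like this (the actual flatten_object in Gerrit change 97830 may differ): recurse into dicts, use the `__iter__` check to also recurse into lists while leaving plain values alone, and build dotted keys:

```python
def flatten_object(obj, prefix=''):
    """Flatten nested dicts/lists into one dict with dotted keys.

    Illustrative sketch only, not the code under review.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif hasattr(obj, '__iter__') and not isinstance(obj, str):
        # lists, tuples, etc.: use the element index as the key segment
        # (in Python 3 str also has __iter__, hence the explicit check)
        items = enumerate(obj)
    else:
        # leaf value: nothing left to flatten
        return {prefix: obj}
    flat = {}
    for key, value in items:
        child = '%s.%s' % (prefix, key) if prefix else str(key)
        flat.update(flatten_object(value, child))
    return flat

print(flatten_object({'a': {'b': 1}, 'c': [2, 3]}))
```

The `hasattr(obj, '__iter__')` check is what makes this work with "everything": any iterable container gets descended into without enumerating container types up front.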
[20:47:52] milimetric: there were a couple of much simpler versions I found [20:47:56] but they didn't work with lists in the object [20:48:26] yeah, I like the hasattr __iter__ check [20:48:28] that's nice [20:48:37] because it should work with everything [20:54:34] thanks milimetric [20:54:39] so ja, did your env thing get worked out? [20:54:52] yes [20:55:06] so environment variables are evil [20:55:22] but necessary to get *something* passed to wikimetrics via wsgi [20:55:46] so in wsgi itself, I just made it read from /etc/config/wikimetrics/ [20:56:07] and store it in os.environ['WIKIMETRICS_*_CONFIG'] [20:56:22] ok, that's cool [20:56:28] /etc/config is weird [20:56:34] then I just used my command line parameters to pass in config overrides for the queue (which starts via upstart) [20:56:46] yeah, we can put it wherever when you puppetize it [20:56:51] i don't know these things :) [20:57:01] i had it in /var/lib/wikimetrics-config before [20:57:14] i just randomly guess paths [20:59:04] haha [20:59:09] /etc/wikimetrics? [20:59:14] sure [20:59:20] :) [20:59:22] or well [20:59:24] even so [20:59:28] when do you wanna work on the puppet for that? [20:59:33] if you are deploying this via git deploy, we might need to talk about that more [20:59:36] yep [20:59:41] ummmmm, i have some time right now! i just kinda finished a bunch of things for the day [21:00:09] I've gotta talk to Teresa for a minute [21:00:11] halfak: re: SQL anonymous access, I just had this other idea that's probably awesomer and also much easier / faster to do. Let me know when you have time to talk for a few minutes :D [21:00:11] but sure [21:00:47] k [21:00:52] yuvipanda: Sure. What's up? [21:01:15] halfak: so I was thinking - I could just use mysqlproxy and expose that directly. 
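The fix milimetric lands on, having the .wsgi file itself read config locations and stash them in os.environ before importing the app (since mod_wsgi offers no command line), might look something like this. The directory and variable names follow the chat; the real wikimetrics.wsgi may differ:

```python
import os

# Hypothetical sketch of a wikimetrics-style .wsgi bootstrap. mod_wsgi
# only imports this file, so there is no way to pass --db-config flags;
# instead we seed os.environ before the app module reads it.
CONFIG_ROOT = '/etc/config/wikimetrics'

def seed_environment(root=CONFIG_ROOT):
    for name in ('db', 'web', 'queue'):
        var = 'WIKIMETRICS_%s_CONFIG' % name.upper()
        path = os.path.join(root, '%s_config.yaml' % name)
        # Only set the variable when the file exists, so local
        # development runs keep using the packaged defaults.
        if var not in os.environ and os.path.exists(path):
            os.environ[var] = path

seed_environment()
# The argparse defaults then pick these up when the app is imported:
# from wikimetrics.configurables import app as application
```

This sidesteps the SetEnv/os.environ mismatch entirely: the process environment is populated by Python code that runs in the same interpreter as the app.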
[21:01:30] halfak: so anyone who wants can just directly connect from their local mysql client of choice [21:01:33] no registration needed [21:01:43] we can still limit it aggressively - mysqlproxy lets you do that [21:01:49] and have logs to figure out what people are using [21:02:08] and I think it's fine securitywise too. [21:02:27] So mysqlproxy would make sure we don't get a DoS? [21:02:33] halfak: pretty much [21:02:40] Looks solid. [21:02:48] * halfak is reading up on it for the first time [21:03:04] halfak: :D [21:03:21] halfak: I hadn't thought of it either. I'm also reading up properly only now [21:05:41] halfak: there'll be folks who would find this useful, right? My only worry is that I'll pour effort into it and then it'll just sit there :) [21:06:46] I have little doubt that the public querying interface will be useful. One of the things that we do a lot when discussing data is cite claims with queries to the DB. Linking those queries to a public UI would be amazing. [21:07:13] Personally, the world changed substantially for me as a researcher when I first got access to a live copy of enwiki that I could run long queries on. [21:07:29] alrighty then! [21:07:41] halfak: I'll setup a proxy that supports just enwiki first, and then let's see where that goes? [21:07:49] Sounds good. [21:07:54] halfak: note that this will have *no* UI. [21:07:59] halfak: you connect via a normal mysql client [21:08:07] we can bolt on a web UI later if we really want. [21:08:35] Hmm.. Less useful without the web UI. [21:08:50] I'm brewing some cheap tea ! [21:09:01] But I'm happy to explore the implications with you. [21:09:20] One great example of uses for this proxy was the L2 hackathon that we had a couple of weeks ago. [21:09:40] I can guarantee some UI free use of such a proxy during the next L2 Research hackathon. [21:09:52] when is that? [21:11:06] Not scheduled yet, but probably Feb. [21:11:09] Maybe March [21:11:51] what's a L2 Research hackathon ?
[21:12:05] it sounds like a rocket "L2" [21:12:56] https://meta.wikimedia.org/wiki/Research:L2 [21:13:15] halfak: ah. I expect to have a proxy up way before end of the week :D [21:13:15] L2 is short for Labs^2 ("Labs squared") [21:13:40] oh, that's a clever and sneaky name [21:13:41] The community research lab I'm trying to build on top of/beside labs. [21:14:25] milimetric: is the access to labsdb from wikimetrics puppetized? [21:14:38] no [21:14:50] we're puppetizing wikimetrics as we speak actually [21:14:54] milimetric: ah. [21:15:09] but wait, why would that part specifically need to be puppetized? [21:15:10] I heard there was a solar flare today [21:15:19] milimetric: well, I want to use that specific part :P [21:15:23] isn't it just having some things in your /etc/hosts [21:15:23] in a different project [21:15:25] oh! [21:15:33] can you point me to what needs to be done? [21:15:37] sure [21:15:41] so ottomata did it [21:15:50] yuvipanda: You should join up to #wikimedia-labs2 [21:15:53] but as far as I know, it's just adding all the labs config to /etc/hosts [21:16:05] halfak: set on autojoin :) [21:16:15] milimetric: can you pastebin it? [21:19:14] eh? i dunno how to do that :p [21:22:17] ottomata: definitely, you set up the wikimetrics instance in labs to be able to hit the labsdb hosts [21:22:30] there was one other thing that needed to happen except /etc/hosts [21:23:17] oh hm [21:23:38] i think in prod there will be something else, not sure [21:25:02] yuvipanda: I emailed it to you [21:25:37] milimetric: ty! [22:06:53] alriighty i'm outty laatas [22:46:25] (PS1) Milimetric: Teresa Tso found some bugs [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/97850 [22:46:26] (CR) Milimetric: [C: 2 V: 2] Teresa Tso found some bugs [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/97850 (owner: Milimetric)