[07:23:57] moritzm: gooood morning! Something very weird happened to us yesterday, namely after a restart of the Hadoop Namenode JVMs we got a different garbage collector behavior.. https://grafana.wikimedia.org/dashboard/db/analytics-hadoop (first two graphs) [07:24:28] I rolled back my change (that was a no-op extra Xmx value) but we are still seeing the behavior.. [07:24:52] let me have a look [07:24:54] I checked the new jdk/jre changelog but I didn't find a lot [07:25:58] we are not seeing any horrible regression but the heap size went down to ~2GB and the GC started to collect Young Gen (and also Old, but only once) [07:26:07] that is normal behavior for a JVM [07:26:21] but still it is a completely different behavior [07:26:56] and I checked the max heap size, it should be more than 3GB [07:27:21] it's limited to the namenodes? [07:28:08] sounds like a problem best investigated with some tea, back in five [07:28:13] yeah.. the change that I rolled out was related to a new Xmx value for the Datanodes (you can see that they have changed in the graphs), and they were restarted [07:28:17] sure! [07:28:33] (I keep writing) [07:29:09] so basically the Cloudera distribution does something like this: you set environment variables, like HADOOP_HEAPSIZE, and some daemons will pick them up [07:30:00] I've set HADOOP_HEAPSIZE to 2048, and both datanodes and namenodes picked it up (-Xmx2048m). The namenode also uses other env variables to append JVM parameters, one of them adds -Xmx4096m [07:30:17] which, since it comes last, wins over the first (I checked with a simple test on analytics1001) [07:31:04] after I restarted all the JVMs I saw the weird namenode behavior and I thought that the new Xmx setting was somehow capping the heap size to 2GB [07:31:07] so I rolled back [07:31:13] ran puppet [07:31:22] and restarted only the JVMs of the namenodes [07:31:40] (so the datanodes are running with the new config, since we have an OOM issue) [07:31:54] (our Java daemons do not restart on config changes) [07:33:32] but the namenodes' JVMs are still behaving in the same way [07:33:55] that is not super weird, but it changed radically [07:38:04] I double-checked openjdk software changes, openjdk-7 was updated to 7u95 on the 9th of May and we did restart Hadoop after that, so it's not related to an update of java per se. however, [07:38:57] I remember that Oracle made JDK-related changes in openjdk-8, maybe these got backported to openjdk-7 with the update in May and recent Cloudera config changes now make use of that [07:39:07] I'll look whether I can find what I vaguely remember [07:41:12] sure, thanks! but don't waste too much time, just wanted your opinion to rule out quick paths of investigation.. [07:47:34] mhh, what I had in mind only provides new options for GC OOM handling in java 8 and I couldn't find that in the last java 7 update [07:47:53] this might rather be related to recent cdh changes, not sure [07:51:27] really strange, hopefully I'll come up with some plausible explanation :) [07:54:18] Hi elukey [07:54:37] o/ [07:55:42] which way did you set Xmx? docs say "Heap size specified via the HADOOP_CLIENT_OPTS -Xmx option overrides heap size specified via HADOOP_HEAPSIZE."
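A minimal way to reproduce the "simple test" mentioned above, assuming only that the HotSpot JVM honors the last -Xmx flag on the command line (the commands here are illustrative, not the ones actually run on analytics1001):

    # Ask the JVM which max heap it settles on when two -Xmx flags are passed;
    # MaxHeapSize is printed in bytes, so the second value (4096m) should show up.
    java -Xmx2048m -Xmx4096m -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w MaxHeapSize

    # And check what the running NameNode was actually started with:
    ps -ef | grep -i namenode | grep -o -- '-Xmx[0-9]*[mg]'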
[07:58:32] so we have set HADOOP_NAMENODE_OPTS to 4096 (so -Xmx4096m) a long time ago, meanwhile this time I've set HADOOP_HEAPSIZE to 2048 (that has been rolled back for namenodes) [07:59:08] but even now on analytics1001 I can see bla bla -Xmx1000m bla bla -Xmx4096m [08:00:14] hmm, probably best to reach out to the upstream mailing list I think [08:01:47] on a totally unrelated subject, aqs needs to be updated to the nodejs 4.4.6 security update at some point. it's already working fine for all of restbase, so I wouldn't expect any problems, but let's still do it in steps [08:02:15] moritzm: I'm on the go to test that :) [08:04:22] joal: ok :-) [08:10:48] joal: I can't find a good explanation for why the namenodes are behaving in this way [08:11:07] I mean, there is nothing alarming at the moment, the JVMs are behaving fine [08:11:28] regular minor GC collections and veeeery sporadic old gen ones [08:14:57] plus jconsole shows that the JVM can allocate ~3.7GB of heap size (not sure why it doesn't say 4G, but close enough) [08:31:22] Analytics, Services, cassandra, Patch-For-Review: Refactor the default cassandra monitoring into a separate class - https://phabricator.wikimedia.org/T137422#2436388 (elukey) Open>Resolved a:elukey Changes deployed successfully, thanks a lot @Nicko for all the work! [08:37:02] elukey: batcave for a brainstorm? [08:37:55] sure.. I've also upgraded node on aqs100[456] if you want to test it [08:38:13] elukey: that'll make it kind of easier :) [09:34:24] for the channel (cc moritzm): After a lot of debugging with joal we found a possible explanation, namely that ottomata was deleting tons of files right before I restarted the first namenode JVM (how lucky I am). We went from ~15M to ~6.2M files stored (https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=25&fullscreen) [09:35:11] a restart was probably needed to let the JVM flush out old garbage completely and start from a clean state [09:35:33] ah [09:35:51] it's still a good idea to blame Java first :-) [09:35:57] of course! [09:35:59] :P [09:37:00] joal: and Young Gen collection started exactly when the file number dropped, some minutes before the heap size drop. I think you were telling this to me earlier on but I was checking metrics and my brain can't do multi-tasking really well ;) [09:37:55] yes, definitely, since you were saying that there was a drop right before the major one in heap size [09:37:58] sorry :) [09:38:06] goooooood now I feel better [09:38:10] the cluster is healthy [09:38:15] rolling out again my change [09:39:03] :) [09:44:06] all right, puppet change done, will not force it since the datanodes haven't been restarted and they have the previous config [10:11:38] morning joal ! [10:11:52] Morning addshore :) [10:12:03] ready to pick up where we left off? :D [10:12:09] addshore: I finally had time to investigate a bit yesterday, I have some answers for you [10:12:15] oooh [10:13:34] addshore: Let me finish my aqs tests and I'll be with you more fully [10:13:38] okay! [10:17:09] elukey: eqi aqs1004 [10:17:13] oops:) [10:19:21] elukey: All my tests have been working for aqs on node 4.4.6 [10:19:52] elukey: the new cluster is healthy, my own tests are showing correct results [10:20:16] elukey: I think we are good to deploy on one of the three prod nodes (with your agreement of course) [10:21:13] sure, will do it after lunch.. maybe 1004 and then tomorrow 100[56]?
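A rough sketch, based on the description above, of how both -Xmx flags end up on the NameNode command line under the Cloudera-style env files (the exact file layout and variable wiring are an assumption; the values are the ones from this conversation):

    # e.g. in /etc/hadoop/conf/hadoop-env.sh (layout assumed)
    export HADOOP_HEAPSIZE=2048                                       # becomes the first -Xmx2048m
    export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -Xmx4096m"   # appended after it, so this -Xmx wins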
[10:21:37] hm, elukey, you mean 1001 and then 100[23] I guess [10:21:44] elukey: in that case, yes :D [10:23:15] addshore: So, yesterday your coordinator failed because of bad luck [10:23:21] haha! [10:23:28] addshore: you were testing at the same moment elukey was restarting nodes [10:23:37] okay! [10:23:37] joal: aaaaarrrgh yes [10:23:38] sorry [10:23:41] :D [10:23:45] brainfarting [10:23:48] it happens :) [10:24:15] now addshore, in finding that, I also found that there is another issue [10:25:03] addshore: We have not yet run any oozified spark job in dynamic allocation mode - You are paying that price [10:26:22] elukey: one confirmation please: the dynamic allocation flag set to true is defined for spark-shell and spark-submit, right? [10:27:54] addshore: I want to try to run your job with an explicit dynamic allocation setting (I think that's all it needs) [10:32:48] okay! *loads the docs page* [10:33:12] joal: is that -dynamic or something? ;) [10:33:54] addshore: a bit more complex, currently trying it :) spark.dynamicAllocation.enabled=true [10:42:08] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2436731 (mobrovac) >>! In T134502#2433938, @Ottomata wrote: > If it has meaning, I'm all for keeping it. In the current Ev... [10:49:52] elukey: question [10:50:52] elukey: how the heck do we have different hive conf on analytics nodes and stat nodes ???? [10:51:19] * elukey blames ottomata [10:51:21] addshore: discovering more weird things as I move forward with your patch ! [10:51:29] * elukey has no idea [10:51:29] elukey :D [10:51:50] elukey: I'd go for moritzm's way: always blame java first ;) [10:51:50] jokes aside, no idea.. [10:52:05] :D [10:52:05] I usually blame myself first, then Andrew [10:52:06] elukey: That's kind of a big issue [10:52:12] :P [10:52:44] elukey: let me explain: when spark runs in client mode, it works: it uses the hive-site file on the stat machine, fine [10:53:15] But in cluster mode, the spark application master can be on any node, and it uses the hive-site file over there [10:53:44] elukey: on analytics1044 for instance, the hive-site file still references analytics1015 as metastore :( [10:54:15] ah snap.. now I see [10:54:25] :( [10:54:48] * elukey looks into puppet [10:55:27] I hate analytics1015 [10:55:39] elukey: So do I [10:55:41] joal, path of hive-site? [10:56:27] elukey: I used both /etc/hive/conf.analytics-hadoop/hive-site.xml and /usr/lib/hive/conf/hive-site.xml [10:57:16] so there is no trace of 1015 in puppet, and the hive-site.xml.erb references a metastore_host that is defined only in puppet/hieradata/eqiad/cdh/hive.yaml (correctly as analytics1003) [10:57:26] so could it be that those files are old ones?
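A minimal sketch of what explicitly enabling dynamic allocation looks like on spark-submit for a YARN cluster-mode job; the class and jar names are placeholders, and spark.shuffle.service.enabled is included because dynamic allocation requires the external shuffle service:

    # class and jar names are placeholders; only the two --conf flags are the point
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --class org.example.SomeJob \
      some-job.jar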
[10:57:37] currently not versioned by puppet [10:58:02] elukey: possibly so [10:58:42] yes [10:58:55] so the hive-site.xml is defined only in hive.pp [10:59:00] file { "${config_directory}/hive-site.xml": [10:59:00] content => template($hive_site_template), [10:59:00] mode => $hive_site_mode, [10:59:00] owner => 'hive', [10:59:00] group => 'hive', [10:59:03] require => Package['hive'], [10:59:05] } [10:59:08] elukey: I absolutely don't know how the hive conf got copied [10:59:23] elukey: let's wait for Andrew and see [11:02:06] * elukey nods [11:02:20] addshore: in case you didn't follow --^ [11:02:46] :D [11:49:09] hey team :] [12:03:59] Hi mforns [12:04:51] HI joal [12:04:52] :] [12:05:37] mforns: so, checkpointing [12:06:31] joal, yes.. batcave? [12:06:37] mforns: sure, OMW [12:27:11] a-team: stepping afk for ~1.5hrs, will be back soon. [12:27:42] sure elukey [12:43:59] Hi ottomata [12:44:16] hiii [12:44:44] We have questions for you when elukey gets back :) [12:44:48] ottomata: --^ [12:47:43] ok! [12:47:44] i am ready [12:50:28] joal: just a random question unrelated to what we were doing before. But I'm guessing it only makes sense to make oozie jobs etc for data that is in hadoop? [12:50:45] addshore: correct : oozie relies on Hadoop to run [12:51:08] addshore: For regular queries on Mysql for instance, we use mforns' awesome ReportUpdater [12:51:24] ooooh, are there docs for that? [12:51:49] addshore: I bet there are, but I never used it (/shame on me/) [12:51:49] as I am doing a bunch of stuff that just uses db queries / dump scans / api queries right now running daily / weekly etc [12:51:54] addshore, https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater :] [12:51:55] hehe [12:52:12] mforns: is this just for db related things really? [12:52:44] ahh yes, that's the thing that makes the tsv files and stuff :) [12:53:15] addshore, reportupdater periodically executes sql queries or scripts that generate sql-like output [12:53:31] addshore, and generates those report files, yes [12:54:33] okay! I need to try and take another look at https://gerrit.wikimedia.org/r/#/c/269467/ (may try and get it in my next sprint) [13:40:23] * elukey backkk [13:40:33] Yay, was waiting for you elukey :) [13:40:44] :) [13:41:04] ottomata: hellooooo [13:41:43] HIII [13:44:57] good news: we found what happened yesterday [13:45:17] https://grafana.wikimedia.org/dashboard/db/analytics-hadoop [13:45:35] I added HDFS total files [13:46:28] oh nice :) [13:46:28] hah [13:46:30] yeah [13:46:31] makes sense [13:46:31] so when we deleted files the HDFS namenodes said YESSSS [13:46:45] but we did two things at the same time, aligning them perfectly [13:46:47] ahhhh [13:46:48] i see [13:46:50] WHAT ARE THE CHANCEEESSSS [13:46:51] so mem dropped [13:46:54] makes a lot of sense [13:46:54] yeah [13:46:57] nice [13:47:03] but only when I restarted the JVM [13:47:07] ottomata: log cleanup is actually good ;) [13:47:11] yes!! [13:47:12] yeah [13:47:16] nuria found the proper setting too [13:47:21] hadoop will take care of removing old aggregated log files [13:47:21] :) [13:47:27] so all good, I rolled out the change again [13:47:32] \o/ [13:47:34] ottomata: Wow, great ! [13:47:41] ottomata: What is the setting? [13:47:54] https://gerrit.wikimedia.org/r/#/c/297643/2/templates/hadoop/yarn-site.xml.erb [13:48:41] ottomata: great, thx :) [13:49:38] ottomata: other thing - joal discovered that hive-site.xml on the hadoop workers still references analytics1015 as hive metastore.. [13:50:29] HM [13:51:08] weird [13:51:09] ja?
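A quick way to confirm which metastore a given node's hive-site.xml still points at, assuming only the standard hive.metastore.uris property and the two paths quoted above:

    # Run on a worker (e.g. analytics1044) and on a stat box to compare:
    grep -A1 'hive.metastore.uris' /etc/hive/conf.analytics-hadoop/hive-site.xml
    grep -A1 'hive.metastore.uris' /usr/lib/hive/conf/hive-site.xml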
[13:51:10] hm [13:51:19] maybe hive-site.xml isn't rendered there by puppet anymore [13:52:20] yeah, or maybe it used to be and now it isn't? [13:52:29] yes, makes sense [13:52:32] looking made me remember [13:52:39] we used to include the hive client class on all workers [13:52:47] only so we could have hcatalog there [13:52:54] but, then we realized we could just require the hcatalog package [13:52:56] and not hive [13:53:03] so we should be able to remove /etc/hive [13:53:09] maybe purge hive client [13:53:21] I think joal needs it [13:53:27] ottomata: actually we need it if we want to use spark hiveContext [13:53:43] hm [13:53:47] ottomata: in shell or client mode, it uses the statX hive-site.xml [13:54:02] but in cluster mode it needs it on the workers? [13:54:06] this sounds familiar... [13:54:10] but when launched by oozie, in cluster mode, the app-master is on any node, and looks for the file [13:54:17] right [13:54:21] joal, back, do we have something more to talk about? [13:54:33] mforns: hm, don't think so [13:54:38] or can I continue with what we spoke about? [13:54:39] hm. [13:54:40] OK [13:54:41] yeah [13:54:43] ok. [13:54:43] mforns: if you want we can pair on duplicating the thing [13:54:44] so hm [13:55:00] yeah, and cdh::spark will create a symlink to hive-site.xml in its config directory if cdh::hive is included [13:55:08] hmmm [13:55:16] yeah, and it doesn't have it now [13:55:18] hm [13:55:29] joal: does spark hive in cluster mode work now? [13:55:33] i would think it wouldn't... [13:55:45] since the /etc/spark/conf/hive-site.xml symlink doesn't exist [13:55:53] joal, oh yea, Dan and I briefly spoke yesterday about the same stuff, he had the same idea we had yesterday, to collect and re-parallelize the rdds every once in a while [13:55:56] ottomata: I think it doesn't [13:56:06] joal, this would also be an option [13:56:18] joal: do you have an easy test case we can try real quick? [13:56:29] mforns: could be, but really I think the overhead comes from RDDs for small data [13:56:41] ottomata: Will find one, yes [13:56:47] ottomata: give me a minute [13:56:52] k [13:57:41] joal, ok [13:57:50] let's try duplicating the code [13:57:57] mforns: At least trying with lists will make us sure :) [13:58:02] aha [13:58:53] joal, so let me know when you have time for pairing [13:59:13] mforns: sure, need to find an example for ottomata, then pair :) [13:59:18] joal, ok [13:59:29] mforns: I'm sorry, every time I offer to pair, something else shows up ! [13:59:43] joal, no problem, do you want me to wait for you? [13:59:59] mforns: no thank you, please move forward, I'll catch up [14:00:06] ok [14:16:22] ottomata: I got proof :) [14:16:30] ja? :) [14:16:44] So, the job in client mode succeeded [14:17:57] then, the job in cluster mode (from stat1002) fails - It needs hive stuff (some jars and the hive-site file) [14:18:18] Then the job in cluster mode succeeds from stat1002 with the jars and hive file passed in [14:18:32] And the job fails from analytics1030 with the same files passed in [14:19:04] the last job is stuck with this last line in error.log: 16/07/07 14:14:19 INFO metastore: Trying to connect to metastore with URI thrift://analytics1015.eqiad.wmnet:9083 [14:19:09] ottomata: --^ [14:19:55] HM [14:20:02] joal you launched spark from analytics1030? [14:20:05] to do that? [14:20:11] correct ottomata [14:20:16] spark-submit [14:20:18] huh [14:20:18] hmmm [14:20:32] This is kinda what happens when launched from oozie [14:20:54] aye makes sense [14:20:58] ottomata: oozie launches stuff from anywhere (how are we supposed to catch them up after ...)
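A quick check for the symlink mentioned above, on any host you suspect (the paths are the ones from the discussion; whether they exist on a given node is exactly what is being debugged):

    ls -l /etc/spark/conf/hive-site.xml /etc/hive/conf/hive-site.xml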
[14:21:07] just surprised that it was even able to read hive-site.xml [14:21:11] since it isn't in spark/conf [14:21:25] oh ottomata I supplied the path [14:21:29] oh, you did? [14:21:32] indeed [14:21:34] what happens without /etc/hive/conf in the path? [14:21:38] it just fails differently? [14:21:41] I also need to do it in cluster mode from stat1002 [14:21:57] ottomata: batcave? [14:22:10] mmmmm, kinda hard atm, am at a cafe, lots of folks and noise [14:22:15] k [14:22:46] spark manages to grab the hive conf on its own when launched in client mode [14:22:56] when in cluster mode, it doesn't do it by default [14:24:57] hm [14:25:15] weird. hm. [14:25:27] so, wait, in client mode, it knows to look in hive-site.xml? [14:25:29] no. [14:25:36] it is in /etc/spark/conf on stat1002 [14:25:41] so it grabs it from there [14:26:08] but in cluster mode, it never knows where to look? [14:26:15] hm [14:26:16] ottomata: correct [14:26:26] how do you provide it? in spark or in the classpath? [14:26:29] ottomata: plus, it needs some hive jars (datanucleus) [14:26:35] hm [14:26:52] ok, well, the simple solution here is to re-include hive-client on the datanodes [14:26:58] ottomata: https://gist.github.com/jobar/7ccfccdb28bcb657df74f03b1ea37a6a [14:27:04] am a little worried, i feel like there was some conflict [14:27:09] but, maybe we can try on one [14:27:26] --files, crazy, hm [14:27:42] ottomata: I can even try to fake a hive-site with correct params from an analytics machine if you want [14:27:54] ottomata: This is the suggested way to do it [14:28:16] ottomata: However, the fact that oozie can do it from anywhere makes it even crazier [14:28:50] joal: perhaps you should grab the hive-site.xml from hdfs? [14:28:52] in cluster mode? [14:28:53] is that possible? [14:28:57] we deploy that with refinery. [14:29:14] hm, I didn't try - Will do it now [14:29:19] ottomata: --^ [14:29:20] ah, but you need the jars too? [14:29:21] hm [14:29:23] hm [14:29:47] hm, yeah if you need those datanucleus jars too, we should ensure they are on the potential appmasters too [14:29:48] hm [14:30:19] i mean, we could deploy those to hdfs too i guess...buuuut [14:30:19] hm [14:30:27] it seems better to just include the hive client class on workers [14:30:33] i'll try it on one and see what happens [14:31:27] ottomata: file from hdfs fails :( [14:31:46] k [14:31:50] well, i think you need the jars anyway. [14:31:55] so i'm going to try including the class on analytics1030 [14:32:13] oh actually I'm wrong, it worked ! [14:32:16] ottomata: --^ [14:32:29] ottomata: possibly we could have the jars on hdfs as well [14:32:34] ottomata: Will try that [14:32:41] yeah we could [14:32:42] buuut [14:32:46] sounds pretty annoying [14:32:59] ottomata: To maintain, you mean? [14:33:12] yeah [14:33:16] hmmm [14:33:18] except... [14:33:23] oozie does things like that [14:33:28] copies its lib jars to hdfs [14:33:31] hm [14:33:35] ottomata: Couldn't we puppetize the copy? [14:34:06] ottomata: like as part of the hive install, copy those files to hdfs? [14:34:10] hmm joal [14:34:12] they are already there! [14:34:15] as part of the oozie sharelib [14:34:23] /user/oozie/share/lib/lib_20160223160848/hive [14:34:31] or hive2 [14:34:32] ? [14:34:32] hm [14:34:40] oh [14:34:41] and spark!
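For reference, a rough sketch of the kind of cluster-mode spark-submit being tried here, shipping hive-site.xml and the datanucleus jars along with the job so the application master can reach the metastore from any worker (the gist above has the real invocation; the class name, jar names, and paths below are illustrative assumptions):

    # class, jar names, and paths are illustrative
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files /etc/hive/conf/hive-site.xml \
      --jars /usr/lib/hive/lib/datanucleus-core.jar,/usr/lib/hive/lib/datanucleus-api-jdo.jar,/usr/lib/hive/lib/datanucleus-rdbms.jar \
      --class org.example.SomeJob \
      some-job.jar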
[14:34:52] /user/oozie/share/lib/lib_20160223160848/spark [14:37:09] well ottomata, seems that everything is already there indeed (with hive-site in refinery) [14:37:28] ottomata: I'll try to remember that oozie has jars for everything on hdfs :) [14:37:41] ottomata: will try with those paths, many thanks ! [14:38:25] ok joal, cool [14:38:40] ottomata: Funny thing is, there is a (not big) version difference between the datanucleus in the hive folder and the one in the oozie shared libs [14:38:40] i need to clean up the hive client stuff on the worker nodes though then [14:38:45] yeah i see that [14:38:55] hopefully it's ok [14:39:06] hm, are the datanucleus jars not in /usr/lib/spark somewhere? [14:39:54] hm, nope [14:40:05] ottomata: makes sense they ask to include them ;) [14:40:31] hm, joal so, it might be annoying to hardcode links to that oozie sharelib dir though [14:40:33] HMMM [14:40:37] i think that is dynamic somehow... [14:41:10] ottomata: first thing, let's provide them manually through --jars [14:41:19] On startup, Oozie will look for the newest lib_ directory and use that. [14:43:06] http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/ [14:43:19] Below are the various ways to include a jar with your workflow: [14:43:19] 1. Set oozie.libpath=/path/to/jars,another/path/to/jars in job.properties. [14:43:30] • There is no need to ever point this at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties. [14:44:08] so joal, ja, i *think* for oozie, if we use hive-site.xml from hdfs [14:44:10] the rest will just work [14:44:17] since datanucleus is already in the oozie sharelib [14:44:26] makes sense ottomata !!! [14:44:50] joal: aqs1001 upgraded :) [14:44:58] ottomata: however it doesn't when we use spark-submit as-is in cluster mode (but who does) [14:45:04] great elukey !!! [14:45:06] Thanks :) [14:45:30] hm [14:45:34] but that would be nice too [14:45:35] yeah [14:45:44] i guess if you do that you have to manually specify the path in the oozie sharelib [14:45:46] buuut [14:45:46] hm [14:45:51] joal: i'm still looking at including hive client [14:45:55] ottomata: most of the time, when launching jobs, we do it in client mode, in order to get the logs [14:45:56] that may be a fine option too [14:45:58] checking [14:46:01] k [14:48:07] hm, joal for my test, i don't get any conflicts... [14:48:13] i'm going to merge this, and run on analytics1030 [14:48:17] i think it might be good to just do this [14:48:23] doesn't hurt to have hive client on worker nodes... [14:48:33] ottomata: ok [14:48:51] ottomata: currently checking without the jars and with the refinery hive-site on oozie [14:49:09] k [14:49:11] joal, man... the duplicated code is practically the same... but I cannot manage to write it as a single generic thing... [14:49:20] :D [14:49:39] mforns: will be with you in a few minutes [14:55:18] ottomata: works great :) [14:55:42] addshore: ottomata found a solution for us :) [14:55:49] :D [14:55:58] in a meeting now, and then going bouldering :/ [14:56:17] addshore: ooold analytics saying: When somethin's wrong, ask for ottomata [14:56:36] addshore: We'll check up tomorrow morning :) [14:56:45] haha! [14:56:46] addshore: Enjoy bouldering ! [14:56:55] I will :) Talk tomorrow! [14:57:00] Later addshore [14:57:06] mforns: Ready I am ! [14:57:10] Thanks for all the help (again) :) [14:57:17] to the batcave! [14:57:20] OMW !
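Putting the pieces quoted from the Cloudera post together, a hedged sketch of what this looks like from the job side (the property name is the standard Oozie one; the CLI calls assume OOZIE_URL is set, otherwise pass -oozie with the server URL):

    # job.properties fragment, so Oozie pulls the sharelib (spark, hive, datanucleus, ...) in automatically:
    #   oozie.use.system.libpath=true
    #
    # Inspecting what the sharelib actually contains:
    oozie admin -shareliblist spark
    oozie admin -shareliblist hive
    # or list the HDFS path quoted above directly:
    hdfs dfs -ls /user/oozie/share/lib/lib_20160223160848/spark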
[14:57:48] ottomata: so, just providing the hive-site.xml file from refinery works great [14:58:00] ottomata: no need for the jars [14:58:22] mforns: Am I in the alternate cave? [14:58:25] ok great [14:58:29] joal, mmm [14:58:50] I was [14:58:53] joal: i think we should leave it like that then...i can def include hive-client no problem..but i might have to do a little puppet rejiggering with roles to make it make sense [14:58:54] not sure. [14:59:08] ottomata: At least we have a way [14:59:08] hmm, but hmmm [14:59:17] so keep it easier for you for now ;) [14:59:19] hmmm [14:59:28] but if I don't rejigger...i have to clean up all the worker nodes [14:59:28] hehehh [14:59:33] :D [14:59:36] which is the path of least resistance?! [14:59:38] not sure yet... [14:59:39] Mess up or clean up ;) [14:59:41] i might rejigger! [15:02:22] hmmm, hey elukey, we should move the net-topology stuff into hiera [15:02:27] and use that for analytics_hadoop_hosts [15:05:39] definitely, it could be a good option [15:06:10] I went for the easiest and least impactful one while you were away, didn't want to break everything :) [15:06:35] aye cool [15:06:41] running home for meetings, back shortly [15:09:30] joal: I finally added analytics to the aqs notification groups [15:09:35] so we should get alarms in here too [15:09:42] elukey: YAY ! [15:09:48] elukey: Thanks [15:09:58] elukey: I just had a quick look, and nothing seems wrong [15:10:03] (PS2) Mforns: [WIP] Process MediaWiki User history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/297268 (https://phabricator.wikimedia.org/T138861) [15:11:02] joal, it's in gerrit ^ and in cloud9, still with the final processing bug, but working until 100 iterations [15:11:14] (CR) jenkins-bot: [V: -1] [WIP] Process MediaWiki User history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/297268 (https://phabricator.wikimedia.org/T138861) (owner: Mforns) [15:29:02] milimetric, mforns : let's do it in the batcave :) [15:29:04] batcave or eventbus? I guess batcave [15:29:08] ok :] [15:29:11] k [15:29:15] mforns: you read my mind :) [15:29:18] hehe [15:35:42] ottomata: hola. [15:35:50] holaaaa [15:36:00] ottomata: do you prefer me doing the changes for the logging in yarn_site_extra_properties? [15:36:16] hmm, not sure i have a huge preference [15:36:17] let's see [15:36:37] the cdh module has log aggregation on by default, and this is not (currently) a parameter [15:36:52] that's fine, as i can't think of a situation where you'd want log agg off [15:37:04] but if it is on, the rotation should def be on [15:37:09] so hm [15:37:12] right [15:37:16] so module then [15:37:18] or, at least a parameter [15:37:19] yeah [15:37:20] i think so [15:37:21] k [15:37:27] so ya, let's make it a module param [15:37:38] k [15:38:58] joal: fyi, i am including hive client [15:39:01] seems fine.
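For context on the "proper setting" discussed earlier, the rendered yarn-site.xml ends up carrying the standard YARN log-aggregation properties; the exact retention value is whatever the linked gerrit change uses, the 30-day figure below is only an example:

    <!-- yarn-site.xml fragment; property names are the standard YARN ones, values illustrative -->
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>2592000</value> <!-- 30 days -->
    </property>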
[15:50:00] Analytics, GLAM-Tech: Track pageview stats for outreach.wikimedia.org - https://phabricator.wikimedia.org/T118987#2437769 (Sadads) [16:01:08] a-team: standddupppp [16:01:26] get up, stand up, don't give up the fight [16:18:00] (PS1) Milimetric: Upgrade Node version to 4.4.6 [analytics/aqs] - https://gerrit.wikimedia.org/r/297807 (https://phabricator.wikimedia.org/T139493) [16:57:52] Analytics: Make top pages for WP:MED articles - https://phabricator.wikimedia.org/T139324#2428005 (Milimetric) p:Triage>Normal [17:01:08] Analytics, Collaboration-Team-Interested, Community-Tech, Editing-Analysis, and 6 others: statistics about edit conflicts according to page type - https://phabricator.wikimedia.org/T139019#2416933 (Milimetric) Putting this on our Radar until the project is better defined (right now it's a bit mor... [17:07:58] Analytics: Varnishkafka should auto-reconnect to abandoned VSM - https://phabricator.wikimedia.org/T138747#2408827 (Nuria) What about sequence numbers? Now when varnish restarts we are restarting varnishkafka and thus sequence numbers go to zero. If varnishkafka reconnects to teh new varnish instance automa... [17:12:25] Analytics: Better identify varnish/vcl timeouts on camus - https://phabricator.wikimedia.org/T138511#2438115 (Milimetric) [17:17:23] Analytics: Better identify varnish/vcl timeouts on camus - https://phabricator.wikimedia.org/T138511#2438153 (Milimetric) p:Triage>Normal [17:20:17] Analytics: Split opera mini in proxy or turbo mode - https://phabricator.wikimedia.org/T138505#2438167 (Milimetric) p:Triage>Normal [17:20:54] Analytics: Better identify varnish/vcl timeouts on camus - https://phabricator.wikimedia.org/T138511#2438170 (Nuria) Unless camus has a proper end_dt it is going to asign one and it will likely be the current timestamp. We need to fix that issues if these records really correspond to true requests. Also I... 
[17:27:10] Analytics: Spike: Evaluate alternatives to varnishkafka: varnishevents - https://phabricator.wikimedia.org/T138426#2438203 (Milimetric) [17:28:44] Analytics: Create ops dashboard with info like ipv6 traffic split - https://phabricator.wikimedia.org/T138396#2399103 (Milimetric) p:Triage>Normal [17:29:17] hangouts died for me [17:29:19] Analytics: Capacity projections of pageview API document on wikitech - https://phabricator.wikimedia.org/T138318#2396838 (Milimetric) p:Triage>Normal [17:32:15] Analytics: Puppetize MirrorMaker - https://phabricator.wikimedia.org/T138267#2395161 (Milimetric) [17:32:17] Analytics, Analytics-Cluster: Puppetize and deploy MirrorMaker using confluent packages - https://phabricator.wikimedia.org/T134184#2438216 (Milimetric) [17:32:21] Analytics, Analytics-Cluster: Puppetize and deploy MirrorMaker using confluent packages - https://phabricator.wikimedia.org/T134184#2256991 (Milimetric) p:Triage>Normal [17:32:52] Analytics: Upgrade Kafka (non-analytics cluster) - https://phabricator.wikimedia.org/T138265#2395136 (Milimetric) p:Triage>Normal [17:33:29] Analytics: Set up dedicated Druid Zookeeper - https://phabricator.wikimedia.org/T138263#2438222 (Milimetric) p:Triage>Normal [17:33:55] Analytics: Puppetize pivot UI - https://phabricator.wikimedia.org/T138262#2438227 (Milimetric) p:Triage>Normal [17:34:37] Analytics: Productionize Druid Pageview Pipeline - https://phabricator.wikimedia.org/T138261#2438230 (Milimetric) [17:34:39] Analytics-Kanban: Productionize Druid loader - https://phabricator.wikimedia.org/T138264#2438232 (Milimetric) [17:34:59] Analytics: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#2387140 (Milimetric) p:Triage>Normal [17:36:00] Analytics: Remove outdated docs regarding dashboard info - https://phabricator.wikimedia.org/T137883#2382422 (Milimetric) p:Triage>Normal [17:38:36] Analytics, MediaWiki-API, User-bd808: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321#2364785 (Milimetric) Moving to radar for now, but when you prioritize and define this, we can help code the Oozie jobs that would get this done. Just let... [17:42:59] Analytics, MediaWiki-API, User-bd808: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321#2438254 (bd808) >>! In T137321#2438240, @Milimetric wrote: > Moving to radar for now, but when you prioritize and define this, we can help code the Oozie... 
[17:45:06] Analytics, Patch-For-Review: Implement a standard page title normalization algorithm (same as mediawiki) - https://phabricator.wikimedia.org/T126669#2020367 (Milimetric) p:High>Normal [17:49:35] Analytics: Investigate where Kafka records will almost all null fields are coming from - https://phabricator.wikimedia.org/T136844#2438311 (Milimetric) Open>Resolved a:Milimetric Haven't been able to reproduce, it might have been a fluke [17:54:51] Analytics: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2438329 (Milimetric) a:Nuria [17:55:03] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2064831 (Milimetric) [18:00:25] Analytics: Puppetize pivot UI - https://phabricator.wikimedia.org/T138262#2438361 (Milimetric) [18:00:27] Analytics: Set up a deployment repository for Pivot - https://phabricator.wikimedia.org/T136640#2438360 (Milimetric) [18:00:35] Analytics: Set up a deployment repository for Pivot - https://phabricator.wikimedia.org/T136640#2342416 (Milimetric) p:Triage>Normal [18:00:53] Analytics, Pageviews-API: Enable pageviews on nl/be.wikimedia.org {melc} - https://phabricator.wikimedia.org/T127804#2438363 (Milimetric) [18:00:55] Analytics, Pageviews-API: Count pageviews for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2438365 (Milimetric) [18:01:09] Analytics, GLAM-Tech: Track pageview stats for outreach.wikimedia.org - https://phabricator.wikimedia.org/T118987#2438366 (Milimetric) [18:01:11] Analytics, Pageviews-API: Count pageviews for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2130851 (Milimetric) [18:24:24] elukey: you still around? [19:16:37] ottomata: let me know if you think it's ok to restart the namenode with the new changes, cause i would like to learn how to do that [19:36:47] cool nuria_ [19:36:51] merged cdh change, looks good to me! [19:36:57] next is to update the cdh submodule in ops puppet [19:36:59] want to do that too? [19:37:11] ottomata: yes, talking to milimetric , back in a bit [19:37:21] k [20:27:20] ottomata: updating module [20:34:48] cool [20:38:47] ottomata: done [20:38:58] ottomata: let me know if you want to restart the node and such [20:55:16] nuria_: cool, let's do it tomorrow? [20:55:20] since it's almost end of day here for me [20:55:21] ja? [20:55:21] ottomata: yessir [20:55:26] ottomata: sounds great [20:55:37] k ping me, we do it together whenev you are ready tomorrow [20:55:46] ottomata: will do thnaks man [20:55:47] milimetric: still around? [20:55:49] *thanks [20:56:06] hey ottomata yeah [20:56:30] wanna brain bounce some v simple async js with a poor soul such as me? [20:56:43] omw to the cave [20:57:00] uh... google's being silly [21:29:46] ottomata: so, the librdkafka/confluent-kafka-python libs have balanced consuming by default, right? [21:32:26] as long as the consumers belong to a consumer group i suppose [21:46:40] yes [21:47:01] madhuvishy: it is built into librdkafka now [21:47:12] ottomata: yeah was just reading the docs [21:47:31] aye cool [21:47:42] i gotta run for the eve, be back later! [21:54:44] ottomata: okay! I reviewed [21:54:50] bye :) [23:32:04] Anyone able to check the schema of a table in labswiki?