[17:00:37] heya dsaez!
[17:00:50] hi ottomata!
[17:00:54] first q, i see a few PySparkShells running in yarn
[17:00:59] just checking that those are all intentional:
[17:01:08] also, we are getting some spark notebook integration very close to working
[17:01:15] is that something useful to you...and would you like to try it out?
[17:07:24] ottomata sure
[17:07:37] ottomata, first, these are intentional
[17:07:45] hope that is not using too many resources
[17:07:58] spark notebook is always interesting
[17:08:11] nope it's ok!
[17:08:13] just checking
[17:08:24] thought maybe you just had a bunch of open forgotten shells
[17:08:38] i'm in a meeting for just a bit more, then will install it and tell you how...
[17:09:18] great
[18:00:25] ok dsaez got a sec to give it a try?
[18:44:37] ottomata, now!
[18:44:55] k
[18:45:02] dsaez: on either notebook machine
[18:45:06] i guess notebook1003
[18:45:11] see if you have new kernels available
[18:45:13] and try em
[18:45:13] :)
[18:45:29] let me see, I've never tried 1003
[18:46:00] no permission for 1003, would it work on 1006?
[18:48:16] dsaez: oh
[18:48:20] have you not used SWAP before?
[18:48:38] https://wikitech.wikimedia.org/wiki/SWAP#Access
[18:48:48] let me see
[18:51:42] oh yeah! lots of stuff there!
[18:53:23] ottomata, looks great, I'm trying to understand what, for example, Spark YARN - SQL is, any documentation?
[18:53:59] dsaez: you've used pyspark before right?
[18:54:07] it's the difference between local and --master YARN
[18:54:16] SQL is just a SQL CLI using spark
[18:54:23] so you can query Hive, etc., without code, but still via spark
[18:54:24] like Hive
[18:54:25] but spark
[18:54:47] they match the spark2-* shells you can get
[18:54:50] on the CLI
[18:54:55] pyspark2
[18:54:57] etc.
[18:55:39] amazing! thanks
[18:57:05] oh, great, and now the xmldumps are there
[19:24:00] ottomata, I'm getting some errors, should I open a ticket?
[19:24:15] dsaez: sure!
[19:49:25] dsaez: quick comment
[19:49:30] it looks like your code is spark 1
[19:49:36] you shouldn't need to instantiate a sqlContext
[19:49:47] the spark version there is spark2
[19:49:54] ottomata, I needed to create it, because it was not there
[19:50:19] ottomata, I just copy/pasted my working code from a spark2 shell; the first problem was that the sqlContext was not there, so I created it.
[19:50:23] hm
[19:51:06] I've added that to the ticket
[19:51:39] dsaez: in spark 2
[19:51:43] you shouldn't need sqlContext
[19:51:48] it is accessible via the spark session
[19:51:49] so
[19:51:51] I know, but it is not there
[19:51:52] spark.sql
[19:51:52] or
[19:51:55] spark.udf
[19:52:01] ok
[19:52:05] let me try that
[19:52:13] not that that will solve your problem...
[19:52:14] :p
[19:52:23] but usually I see sqlContext coming within the session
[19:53:45] NameError: name 'sqlContext' is not defined
[19:53:52] right
[19:53:55] you don't need sqlContext
[19:53:58] what are you trying to do?
[19:54:06] you want to register a UDF
[19:54:10] or execute SQL, right?
[19:54:16] just call the methods on the spark session
[19:54:21] spark.sql("SELECT ...")
[19:54:22] or
[19:54:43] spark.udf.register(...)
[19:55:34] https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
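
(A minimal sketch of the Spark 2 usage ottomata describes above, assuming a SWAP PySpark kernel where a SparkSession named `spark` is already defined; the table name and UDF are hypothetical examples for illustration, not anything from the log.)

    # In a Spark 2 notebook kernel the SparkSession is usually pre-created as `spark`;
    # there is no separate sqlContext to instantiate.
    from pyspark.sql.types import IntegerType

    # Run SQL directly on the session (table name is hypothetical):
    spark.sql("SELECT page_title FROM some_db.some_table LIMIT 10").show()

    # Register a UDF on the session instead of on a sqlContext
    # (UDF name and function are made up for illustration):
    spark.udf.register("title_length", lambda s: len(s) if s else 0, IntegerType())
    spark.sql("SELECT title_length('Main_Page') AS title_len").show()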