How to use Spark to query Hive?
I am trying to use Spark to run queries on a Hive table.
I have followed many articles on the internet, but with no success.
I have moved the hive-site.xml file to the Spark conf location.
Could you please explain how to do this? I am using Spark 1.6.
Thank you in advance.
Please find my code below.
import sqlContext.implicits._
import org.apache.spark.sql.SaveMode
// Read the raw CSV (note: eBayText is never used again; it reads the same file).
val eBayText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val hospitalDataText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
// Drop the header row.
val header = hospitalDataText.first()
val hospitalData = hospitalDataText.filter(a => a != header)
// Map each CSV line to a case class and convert the RDD to a DataFrame.
case class Services(uhid: String, locationid: String, doctorid: String)
val hData = hospitalData.map(_.split(",")).map(p => Services(p(0), p(1), p(2)))
val hosService = hData.toDF()
// Append the DataFrame as Parquet files directly under the Hive warehouse path.
hosService.write.format("parquet").mode(SaveMode.Append).save("/user/hive/warehouse/hosdata")
This code created the 'hosdata' folder at the specified path, containing the data in Parquet format.
But when I went to Hive to check whether a table had been created, I could not see any table named 'hosdata'.
So I ran the commands below.
hosService.write.mode("overwrite").saveAsTable("hosData")
sqlContext.sql("show tables").show
which shows the following result:
+--------------------+-----------+
| tableName|isTemporary|
+--------------------+-----------+
| hosdata| false|
+--------------------+-----------+
But again, when I check in Hive, I cannot see a table named 'hosdata'.
Could anyone let me know what step I am missing?
apache-spark hive
asked Nov 15 '18 at 2:41 by Kedar Divekar (edited Nov 16 '18 at 4:15)
1 Answer
There are multiple ways to query Hive using Spark:
- As in the Hive CLI, you can run the same queries through Spark SQL.
- The spark-shell lets you run Spark code in which you define the required objects, such as a Hive context built on top of the Spark configuration. Its sql() method then lets you execute the same queries you would have run in Hive (see the sketch below).
Performance tuning is also an important aspect; you can use broadcast joins and other techniques for faster execution.
Hope this helps.
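A minimal sketch (added as an illustration, not part of the original answer): in Spark 1.6, the usual reason a saveAsTable call does not show up in Hive is that a plain SQLContext was used, so the table goes into Spark's local metastore instead of the Hive metastore. Assuming hive-site.xml is in Spark's conf directory, and reusing the file path and columns from the question, something like the following should create a table that is visible from the Hive CLI:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SaveMode
// Use a HiveContext (not a plain SQLContext) so that saveAsTable registers the
// table in the Hive metastore; `sc` is the SparkContext provided by spark-shell.
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._
// Rebuild the DataFrame exactly as in the question.
case class Services(uhid: String, locationid: String, doctorid: String)
val raw = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val header = raw.first()
val hosService = raw.filter(_ != header)
  .map(_.split(","))
  .map(p => Services(p(0), p(1), p(2)))
  .toDF()
// saveAsTable through the HiveContext creates a managed Hive table,
// so it should also be visible from the Hive CLI / beeline.
hosService.write.mode(SaveMode.Overwrite).saveAsTable("hosdata")
// Query it the same way you would in Hive.
hiveContext.sql("SHOW TABLES").show()
hiveContext.sql("SELECT uhid, locationid FROM hosdata LIMIT 10").show()
Note that a spark-shell built with Hive support already provides sqlContext as a HiveContext; if hive-site.xml is not picked up, the table typically lands in an embedded Derby metastore (a metastore_db directory) in the shell's working directory, which would explain why it is visible from Spark but not from Hive.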
answered Nov 15 '18 at 7:43 by Rahul Singh Bajaj
I have added the exact code above. Could you please let me know what I am missing there?
– Kedar Divekar, Nov 21 '18 at 4:39