Pyspark Impala jdbc Driver does not support this optional feature

Multi tool use
up vote
0
down vote
favorite
I am using pyspark for spark streaming. I am able to stream and create the dataframe properly with no issues. I was also able to insert data into Impala table created with only a few(5) sampled columns out of the overall columns(72) in the message from Kafka. But when I create a new a table with proper data types and columns, similarly the dataframe now has all the columns mentioned in the message of Kafka stream. I get the below exception.
java.sql.SQLFeatureNotSupportedException: [Cloudera]JDBC Driver does not support this optional feature.
at com.cloudera.impala.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.setNull(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:627)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have searched a lot on this, but could not find any solution on this. I enabled debug logs as well, still it won't mention what feature does the driver not support.
Any help or proper guidance would be appreciated.
Thank you
Version details :
pyspark : 2.2.0
Kafka : 0.10.2
Cloudera : 5.15.0
Cloudera Impala : 2.12.0-cdh5.15.0
Cloudera Impala JDBC driver : 2.6.4
The code I have used :
import json
from pyspark import SparkContext,SparkConf,HiveContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SparkSession,Row
from pyspark.sql.functions import lit
from pyspark.sql.types import *
conf = SparkConf().setAppName("testkafkarecvstream")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)
spark = SparkSession.builder.appName("testkafkarecvstream").getOrCreate()
jdbcUrl = "jdbc:impala://hostname:21050/dbName;AuthMech=0;"
fields = [
StructField("column_name01", StringType(), True),
StructField("column_name02", StringType(), True),
StructField("column_name03", DoubleType(), True),
StructField("column_name04", StringType(), True),
StructField("column_name05", IntegerType(), True),
StructField("column_name06", StringType(), True),
.....................
StructField("column_name72", StringType(), True),
]
schema = StructType(fields)
def make_rows(parts):
customRow = Row(column_name01=datatype(parts['column_name01']),
.....,
column_name72=datatype(parts['column_name72'])
)
return customRow
def createDFToParquet(rdd):
try:
df = spark.createDataFrame(rdd,schema)
df.show()df.write.jdbc(jdbcUrl,
table="table_name",
mode="append",)
except Exception as e:
print str(e)
zkNode = "zkNode_name:2181"
topic = "topic_name"
# Reciever method
kvs = KafkaUtils.createStream(ssc,
zkNode,
"consumer-group-id",
topic:5,
"auto.offset.reset" : "smallest")
lines = kvs.map(lambda x: x[1])
conv = lines.map(lambda x: json.loads(x))
table = conv.map(makeRows)
table.foreachRDD(createDFToParquet)
table.pprint()
ssc.start()
ssc.awaitTermination()
jdbc pyspark spark-streaming cloudera impala
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
up vote
0
down vote
favorite
I am using pyspark for spark streaming. I am able to stream and create the dataframe properly with no issues. I was also able to insert data into Impala table created with only a few(5) sampled columns out of the overall columns(72) in the message from Kafka. But when I create a new a table with proper data types and columns, similarly the dataframe now has all the columns mentioned in the message of Kafka stream. I get the below exception.
java.sql.SQLFeatureNotSupportedException: [Cloudera]JDBC Driver does not support this optional feature.
at com.cloudera.impala.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.setNull(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:627)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have searched a lot on this, but could not find any solution on this. I enabled debug logs as well, still it won't mention what feature does the driver not support.
Any help or proper guidance would be appreciated.
Thank you
Version details :
pyspark : 2.2.0
Kafka : 0.10.2
Cloudera : 5.15.0
Cloudera Impala : 2.12.0-cdh5.15.0
Cloudera Impala JDBC driver : 2.6.4
The code I have used :
import json
from pyspark import SparkContext,SparkConf,HiveContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SparkSession,Row
from pyspark.sql.functions import lit
from pyspark.sql.types import *
conf = SparkConf().setAppName("testkafkarecvstream")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)
spark = SparkSession.builder.appName("testkafkarecvstream").getOrCreate()
jdbcUrl = "jdbc:impala://hostname:21050/dbName;AuthMech=0;"
fields = [
StructField("column_name01", StringType(), True),
StructField("column_name02", StringType(), True),
StructField("column_name03", DoubleType(), True),
StructField("column_name04", StringType(), True),
StructField("column_name05", IntegerType(), True),
StructField("column_name06", StringType(), True),
.....................
StructField("column_name72", StringType(), True),
]
schema = StructType(fields)
def make_rows(parts):
customRow = Row(column_name01=datatype(parts['column_name01']),
.....,
column_name72=datatype(parts['column_name72'])
)
return customRow
def createDFToParquet(rdd):
try:
df = spark.createDataFrame(rdd,schema)
df.show()df.write.jdbc(jdbcUrl,
table="table_name",
mode="append",)
except Exception as e:
print str(e)
zkNode = "zkNode_name:2181"
topic = "topic_name"
# Reciever method
kvs = KafkaUtils.createStream(ssc,
zkNode,
"consumer-group-id",
topic:5,
"auto.offset.reset" : "smallest")
lines = kvs.map(lambda x: x[1])
conv = lines.map(lambda x: json.loads(x))
table = conv.map(makeRows)
table.foreachRDD(createDFToParquet)
table.pprint()
ssc.start()
ssc.awaitTermination()
jdbc pyspark spark-streaming cloudera impala
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Are you trying to define a Array or struct?
– karma4917
Nov 9 at 16:37
Can you show some code you tried so far? Also, what version ofJDBC
are you using forImpala
?
– karma4917
Nov 9 at 16:54
Q : Are you trying to define a Array or struct? I have defined an array for the schema. Q : Also, what version of JDBC are you using for Impala? A : Cloudera Impala JDBC driver : 2.6.4
– Kaustubh Desai
yesterday
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am using pyspark for spark streaming. I am able to stream and create the dataframe properly with no issues. I was also able to insert data into Impala table created with only a few(5) sampled columns out of the overall columns(72) in the message from Kafka. But when I create a new a table with proper data types and columns, similarly the dataframe now has all the columns mentioned in the message of Kafka stream. I get the below exception.
java.sql.SQLFeatureNotSupportedException: [Cloudera]JDBC Driver does not support this optional feature.
at com.cloudera.impala.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.setNull(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:627)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have searched a lot on this, but could not find any solution on this. I enabled debug logs as well, still it won't mention what feature does the driver not support.
Any help or proper guidance would be appreciated.
Thank you
Version details :
pyspark : 2.2.0
Kafka : 0.10.2
Cloudera : 5.15.0
Cloudera Impala : 2.12.0-cdh5.15.0
Cloudera Impala JDBC driver : 2.6.4
The code I have used :
import json
from pyspark import SparkContext,SparkConf,HiveContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SparkSession,Row
from pyspark.sql.functions import lit
from pyspark.sql.types import *
conf = SparkConf().setAppName("testkafkarecvstream")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)
spark = SparkSession.builder.appName("testkafkarecvstream").getOrCreate()
jdbcUrl = "jdbc:impala://hostname:21050/dbName;AuthMech=0;"
fields = [
StructField("column_name01", StringType(), True),
StructField("column_name02", StringType(), True),
StructField("column_name03", DoubleType(), True),
StructField("column_name04", StringType(), True),
StructField("column_name05", IntegerType(), True),
StructField("column_name06", StringType(), True),
.....................
StructField("column_name72", StringType(), True),
]
schema = StructType(fields)
def make_rows(parts):
customRow = Row(column_name01=datatype(parts['column_name01']),
.....,
column_name72=datatype(parts['column_name72'])
)
return customRow
def createDFToParquet(rdd):
try:
df = spark.createDataFrame(rdd,schema)
df.show()df.write.jdbc(jdbcUrl,
table="table_name",
mode="append",)
except Exception as e:
print str(e)
zkNode = "zkNode_name:2181"
topic = "topic_name"
# Reciever method
kvs = KafkaUtils.createStream(ssc,
zkNode,
"consumer-group-id",
topic:5,
"auto.offset.reset" : "smallest")
lines = kvs.map(lambda x: x[1])
conv = lines.map(lambda x: json.loads(x))
table = conv.map(makeRows)
table.foreachRDD(createDFToParquet)
table.pprint()
ssc.start()
ssc.awaitTermination()
jdbc pyspark spark-streaming cloudera impala
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
I am using pyspark for spark streaming. I am able to stream and create the dataframe properly with no issues. I was also able to insert data into Impala table created with only a few(5) sampled columns out of the overall columns(72) in the message from Kafka. But when I create a new a table with proper data types and columns, similarly the dataframe now has all the columns mentioned in the message of Kafka stream. I get the below exception.
java.sql.SQLFeatureNotSupportedException: [Cloudera]JDBC Driver does not support this optional feature.
at com.cloudera.impala.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
at com.cloudera.impala.jdbc.common.SPreparedStatement.setNull(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:627)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have searched a lot on this, but could not find any solution on this. I enabled debug logs as well, still it won't mention what feature does the driver not support.
Any help or proper guidance would be appreciated.
Thank you
Version details :
pyspark : 2.2.0
Kafka : 0.10.2
Cloudera : 5.15.0
Cloudera Impala : 2.12.0-cdh5.15.0
Cloudera Impala JDBC driver : 2.6.4
The code I have used :
import json
from pyspark import SparkContext,SparkConf,HiveContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SparkSession,Row
from pyspark.sql.functions import lit
from pyspark.sql.types import *
conf = SparkConf().setAppName("testkafkarecvstream")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)
spark = SparkSession.builder.appName("testkafkarecvstream").getOrCreate()
jdbcUrl = "jdbc:impala://hostname:21050/dbName;AuthMech=0;"
fields = [
StructField("column_name01", StringType(), True),
StructField("column_name02", StringType(), True),
StructField("column_name03", DoubleType(), True),
StructField("column_name04", StringType(), True),
StructField("column_name05", IntegerType(), True),
StructField("column_name06", StringType(), True),
.....................
StructField("column_name72", StringType(), True),
]
schema = StructType(fields)
def make_rows(parts):
customRow = Row(column_name01=datatype(parts['column_name01']),
.....,
column_name72=datatype(parts['column_name72'])
)
return customRow
def createDFToParquet(rdd):
try:
df = spark.createDataFrame(rdd,schema)
df.show()df.write.jdbc(jdbcUrl,
table="table_name",
mode="append",)
except Exception as e:
print str(e)
zkNode = "zkNode_name:2181"
topic = "topic_name"
# Reciever method
kvs = KafkaUtils.createStream(ssc,
zkNode,
"consumer-group-id",
topic:5,
"auto.offset.reset" : "smallest")
lines = kvs.map(lambda x: x[1])
conv = lines.map(lambda x: json.loads(x))
table = conv.map(makeRows)
table.foreachRDD(createDFToParquet)
table.pprint()
ssc.start()
ssc.awaitTermination()
jdbc pyspark spark-streaming cloudera impala
jdbc pyspark spark-streaming cloudera impala
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited yesterday
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
asked Nov 9 at 12:53
Kaustubh Desai
12
12
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Kaustubh Desai is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Are you trying to define a Array or struct?
– karma4917
Nov 9 at 16:37
Can you show some code you tried so far? Also, what version ofJDBC
are you using forImpala
?
– karma4917
Nov 9 at 16:54
Q : Are you trying to define a Array or struct? I have defined an array for the schema. Q : Also, what version of JDBC are you using for Impala? A : Cloudera Impala JDBC driver : 2.6.4
– Kaustubh Desai
yesterday
add a comment |
Are you trying to define a Array or struct?
– karma4917
Nov 9 at 16:37
Can you show some code you tried so far? Also, what version ofJDBC
are you using forImpala
?
– karma4917
Nov 9 at 16:54
Q : Are you trying to define a Array or struct? I have defined an array for the schema. Q : Also, what version of JDBC are you using for Impala? A : Cloudera Impala JDBC driver : 2.6.4
– Kaustubh Desai
yesterday
Are you trying to define a Array or struct?
– karma4917
Nov 9 at 16:37
Are you trying to define a Array or struct?
– karma4917
Nov 9 at 16:37
Can you show some code you tried so far? Also, what version of
JDBC
are you using for Impala
?– karma4917
Nov 9 at 16:54
Can you show some code you tried so far? Also, what version of
JDBC
are you using for Impala
?– karma4917
Nov 9 at 16:54
Q : Are you trying to define a Array or struct? I have defined an array for the schema. Q : Also, what version of JDBC are you using for Impala? A : Cloudera Impala JDBC driver : 2.6.4
– Kaustubh Desai
yesterday
Q : Are you trying to define a Array or struct? I have defined an array for the schema. Q : Also, what version of JDBC are you using for Impala? A : Cloudera Impala JDBC driver : 2.6.4
– Kaustubh Desai
yesterday
add a comment |
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Kaustubh Desai is a new contributor. Be nice, and check out our Code of Conduct.
Kaustubh Desai is a new contributor. Be nice, and check out our Code of Conduct.
Kaustubh Desai is a new contributor. Be nice, and check out our Code of Conduct.
Kaustubh Desai is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53226069%2fpyspark-impala-jdbc-driver-does-not-support-this-optional-feature%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
MkOooiuegt V7H2JbE9o7P9kjmTBAedRPhDzs1QL fBtQKY TG5fC18n,AXiz6 DRMzzwAOXt BmZmFf7lzwWYb
Are you trying to define a Array or struct?
– karma4917
Nov 9 at 16:37
Can you show some code you tried so far? Also, what version of
JDBC
are you using forImpala
?– karma4917
Nov 9 at 16:54
Q : Are you trying to define a Array or struct? I have defined an array for the schema. Q : Also, what version of JDBC are you using for Impala? A : Cloudera Impala JDBC driver : 2.6.4
– Kaustubh Desai
yesterday