Parse Dataframe and store output in a single file [duplicate]
This question already has an answer here:
Spark split a column value into multiple rows
1 answer
I have a data frame using Spark SQL in Scala with columns A and B with values:
A | B
1 a|b|c
2 b|d
3 d|e|f
I need to store the output in a single text file in the following format:
1 a
1 b
1 c
2 b
2 d
3 d
3 e
3 f
How can I do that?
scala apache-spark apache-spark-sql
marked as duplicate by user6910411
Nov 10 at 10:56
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
edited Nov 10 at 9:41
SCouto
asked Nov 10 at 8:59
Nick
2 Answers
Accepted answer (score: 2)
You can get the desired DataFrame with explode and split. The delimiter is escaped because split takes a regular expression:
val resultDF = df.withColumn("B", explode(split($"B", "\\|")))
Result
+---+---+
| A| B|
+---+---+
| 1| a|
| 1| b|
| 1| c|
| 2| b|
| 2| d|
| 3| d|
| 3| e|
| 3| f|
+---+---+
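To see what the split pattern does outside Spark, here is a small plain-Scala sketch. It is only an illustration of the logic, not the Spark API: the object name SplitDemo and the values rows, escaped, unescaped and lines are hypothetical, and flatMap on an ordinary collection stands in for explode.

```scala
// Plain-Scala illustration (no Spark needed); all names here are hypothetical.
object SplitDemo {
  // Sample data from the question: (A, B) pairs with a pipe-delimited B.
  val rows: Seq[(Int, String)] = Seq((1, "a|b|c"), (2, "b|d"), (3, "d|e|f"))

  // String.split takes a regex, so the pipe is escaped to match it literally.
  val escaped: Array[String] = "a|b|c".split("\\|")

  // Unescaped, "|" is an alternation of two empty patterns, so it matches
  // between characters instead of at the pipe.
  val unescaped: Array[String] = "a|b|c".split("|")

  // flatMap plays the role of explode: one output line per (A, token) pair.
  val lines: Seq[String] = rows.flatMap { case (a, b) =>
    b.split("\\|").toSeq.map(token => s"$a $token")
  }

  def main(args: Array[String]): Unit = lines.foreach(println)
}
```

Running main prints one "A token" pair per line, which is the layout the question asks for.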
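Then you can save it as a single file with coalesce(1). Map each Row to a space-separated string first, otherwise the file contains Row.toString output such as [1,a]. Note that desiredPath becomes a directory holding a single part file:
resultDF.coalesce(1).rdd.map(_.mkString(" ")).saveAsTextFile("desiredPath")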
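Since the comments on both answers ran into missing imports, a minimal end-to-end sketch with everything spelled out may help. It assumes Spark SQL 2.x or later on the classpath and a local SparkSession; the object name ExplodeToSingleFile, the helper explodeB, the app name and the local master setting are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

// Hypothetical wrapper object; only the explode/split/coalesce calls come from the answer.
object ExplodeToSingleFile {
  // One row per pipe-separated token of B; split takes a regex, hence "\\|".
  def explodeB(df: DataFrame): DataFrame =
    df.withColumn("B", explode(split(col("B"), "\\|")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-demo")
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Sample data from the question.
    val df = Seq((1, "a|b|c"), (2, "b|d"), (3, "d|e|f")).toDF("A", "B")
    val resultDF = explodeB(df)

    // coalesce(1) leaves one partition, so the output directory holds one part file.
    // saveAsTextFile fails if the target directory already exists.
    resultDF.coalesce(1).rdd.map(_.mkString(" ")).saveAsTextFile("desiredPath")

    spark.stop()
  }
}
```

Using col("B") instead of $"B" keeps the helper independent of spark.implicits._, which is only needed here for toDF.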
answered Nov 10 at 9:47
SCouto
explode function is not recognized in my code. What dependency do I need to add?
– Nick
Nov 10 at 10:17
this should be enough: import org.apache.spark.sql.functions._
– SCouto
Nov 10 at 10:20
Answer (score: 0)
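You can do something like this:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, split}
val df: DataFrame = ??? // placeholder for your input DataFrame with columns A and B
val resDF = df.withColumn("B", explode(split(col("B"), "\\|"))) // split takes a regex, so the pipe is escaped
resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")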
answered Nov 10 at 9:47
Chitral Verma
explode(split(col : this part of your code is not recognized
– Nick
Nov 10 at 10:15
col comes from org.apache.spark.sql.functions
– Chitral Verma
Nov 10 at 11:16