Parse Dataframe and store output in a single file [duplicate]









This question already has an answer here:



  • Spark split a column value into multiple rows (1 answer)



I have a DataFrame in Spark SQL (Scala) with columns A and B holding the following values:



A   B
1   a|b|c
2   b|d
3   d|e|f


I need to store the output in a single text file in the following format:



1 a
1 b
1 c
2 b
2 d
3 d
3 e
3 f


How can I do that?










scala apache-spark apache-spark-sql

edited Nov 10 at 9:41 by SCouto
asked Nov 10 at 8:59 by Nick

marked as duplicate by user6910411, Nov 10 at 10:56

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

          2 Answers

















          Accepted answer (score 2)










          You can get the desired DataFrame with an explode and a split:



          val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


          Then you can save it to a single file with coalesce(1):



           resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")
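
          Note that .rdd on a DataFrame gives an RDD[Row], so saveAsTextFile writes each row as "[1,a]" rather than "1 a". A minimal sketch of one way to match the requested layout, assuming the resultDF built above:

           // Sketch (not part of the original answer): format each Row as "A B" text before saving
           resultDF.rdd
             .map(row => s"${row.get(0)} ${row.get(1)}")  // Row -> "1 a"
             .coalesce(1)                                  // one partition -> one part file
             .saveAsTextFile("desiredPath")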





          answered Nov 10 at 9:47 by SCouto




















          • explode function is not recognized in my code. What dependency do I need to add?
            – Nick
            Nov 10 at 10:17






          • this should be enough: import org.apache.spark.sql.functions._
            – SCouto
            Nov 10 at 10:20
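
          For completeness, a short sketch of the imports the accepted snippet relies on (assuming the SparkSession is available as a value named spark, which the $"B" syntax needs):

           import org.apache.spark.sql.functions.{explode, split}  // explode and split used above
           import spark.implicits._                                // enables the $"B" column syntax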

















          Answer (score 0)













          You can do something like this:



          val df = ???
          val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

          resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")
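
          A small end-to-end sketch under the same assumptions (a SparkSession available as spark, plus the sample data from the question). Note that csv(...) writes a directory containing a single part-* file rather than a file with exactly that name:

           import org.apache.spark.sql.functions.{col, explode, split}
           import spark.implicits._  // for toDF on a local Seq

           // Hypothetical sample data matching the question
           val df = Seq((1, "a|b|c"), (2, "b|d"), (3, "d|e|f")).toDF("A", "B")

           val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))
           resDF.show()  // 8 rows: (1,a) (1,b) (1,c) (2,b) (2,d) (3,d) (3,e) (3,f)

           // Space-separated output, written as a directory containing one part file
           resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")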





          answered Nov 10 at 9:47 by Chitral Verma




















          • explode(split(col : this part of your code is not recognized
            – Nick
            Nov 10 at 10:15










          • col comes from org.apache.spark.sql.functions
            – Chitral Verma
            Nov 10 at 11:16
















