Using Spark SQL and the Spark DataFrame API, how can we find the column name holding the minimum value in a row?
I have a DataFrame df with 4 columns:
+-------+-------+-------+-------+
| dist1 | dist2 | dist3 | dist4 |
+-------+-------+-------+-------+
|    42 |    53 |    24 |    17 |
+-------+-------+-------+-------+
The output I want is:
dist4
It seems easy, but I could not find a proper solution using the DataFrame API or a Spark SQL query.
sql apache-spark apache-spark-sql
asked Nov 15 '18 at 6:44 by stackoverflow, edited Nov 15 '18 at 7:29
5 Answers
You can do something like this:
import org.apache.spark.sql.functions._
val cols = df.columns
// min over (value, index) pairs picks the smallest value; its index maps back to a column name
val u1 = udf((s: Seq[Int]) => cols(s.zipWithIndex.min._2))
df.withColumn("res", u1(array("*")))
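For reference, a minimal end-to-end check on the question's single-row data (a sketch; it assumes a spark-shell or SparkSession context where spark.implicits._ is available for toDF):
import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF

val df = Seq((42, 53, 24, 17)).toDF("dist1", "dist2", "dist3", "dist4")
val cols = df.columns
// zipWithIndex pairs each value with its position; min picks the smallest value,
// and ._2 recovers its position, which indexes back into the column names
val u1 = udf((s: Seq[Int]) => cols(s.zipWithIndex.min._2))

df.withColumn("res", u1(array("*"))).show()
// +-----+-----+-----+-----+-----+
// |dist1|dist2|dist3|dist4|  res|
// +-----+-----+-----+-----+-----+
// |   42|   53|   24|   17|dist4|
// +-----+-----+-----+-----+-----+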
answered Nov 15 '18 at 8:59 by Chitral Verma
Thanks a lot... worked perfectly :)
– stackoverflow
Nov 15 '18 at 10:02
I have a doubt. I did it another way, with df.first.toSeq.asInstanceOf[Seq[DoubleType]] and then if statements comparing by index. It gives me the result, but which one is faster and the better solution?
– stackoverflow
Nov 15 '18 at 10:15
You may use the least function:
select least(dist1,dist2,dist3,dist4) as min_dist
from yourTable;
For the opposite case, greatest may be used.
EDIT: To detect the column names, the following may be used to get rows:
select inline(array(struct(42, 'dist1'), struct(53, 'dist2'),
struct(24, 'dist3'), struct(17, 'dist4') ))
42 dist1
53 dist2
24 dist3
17 dist4
and then the min function may be applied to get dist4.
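Putting the two steps together, a hedged sketch of a full query (using named_struct so the exploded columns get stable names v and c; this assumes the single-row table from the question, since ORDER BY ... LIMIT 1 collapses all rows):
select c as min_col
from (
  select inline(array(
    named_struct('v', dist1, 'c', 'dist1'),
    named_struct('v', dist2, 'c', 'dist2'),
    named_struct('v', dist3, 'c', 'dist3'),
    named_struct('v', dist4, 'c', 'dist4')
  ))
  from yourTable
) t
order by v
limit 1;
-- returns: dist4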
answered Nov 15 '18 at 6:56 by Barbaros Özhan, edited Nov 15 '18 at 8:49
@Barbaros Özhan, I want the column name having the minimum value. least() finds the minimum value; I want the column name.
– stackoverflow
Nov 15 '18 at 7:29
Try this:
df.show
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  1|  2|  3|  4|
|  5|  4|  3|  1|
+---+---+---+---+
import org.apache.spark.sql.functions._
// tag each value with its column name, e.g. 1 in column A becomes "1,A"
val temp_df = df.columns.foldLeft(df) { (acc, colName) => acc.withColumn(colName, concat(col(colName), lit("," + colName))) }
val minval = udf((ar: Seq[String]) => ar.min.split(",")(1)) // minimum tagged string -> its column name
val result = temp_df.withColumn("least", split(concat_ws(":", temp_df.columns.map(col(_)): _*), ":")).withColumn("least_col", minval(col("least")))
result.show
+---+---+---+---+--------------------+---------+
|  A|  B|  C|  D|               least|least_col|
+---+---+---+---+--------------------+---------+
|1,A|2,B|3,C|4,D|[1,A, 2,B, 3,C, 4,D]|        A|
|5,A|4,B|3,C|1,D|[5,A, 4,B, 3,C, 1,D]|        D|
+---+---+---+---+--------------------+---------+
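One caveat: ar.min compares the tagged strings lexicographically, which works for the single-digit sample here but would rank "10,A" before "2,B". A sketch of a numeric-safe variant (minvalSafe is a name introduced for illustration, assuming numeric column values):
// parse the value before the comma and compare numerically,
// then return the column-name part after the comma
val minvalSafe = udf((ar: Seq[String]) => ar.minBy(_.split(",")(0).toDouble).split(",")(1))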
answered Nov 15 '18 at 10:22 by Sathiyan S
An RDD way, without udf()s:
scala> import scala.collection.mutable.WrappedArray
scala> import org.apache.spark.sql.Row
scala> import org.apache.spark.sql.types.{StructField, StringType}
scala> val df = Seq((1,2,3,4),(5,4,3,1)).toDF("A","B","C","D")
df: org.apache.spark.sql.DataFrame = [A: int, B: int ... 2 more fields]
scala> val df2 = df.withColumn("arr", array(df.columns.map(col(_)):_*))
df2: org.apache.spark.sql.DataFrame = [A: int, B: int ... 3 more fields]
scala> val rowarr = df.columns
rowarr: Array[String] = Array(A, B, C, D)
scala> val rdd1 = df2.rdd.map { x => val p = x.getAs[WrappedArray[Int]]("arr").toArray; val q = rowarr(p.indexWhere(_ == p.min)); Row.merge(x, Row(q)) }
rdd1: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[83] at map at <console>:47
scala> spark.createDataFrame(rdd1,df2.schema.add(StructField("mincol",StringType))).show
+---+---+---+---+------------+------+
|  A|  B|  C|  D|         arr|mincol|
+---+---+---+---+------------+------+
|  1|  2|  3|  4|[1, 2, 3, 4]|     A|
|  5|  4|  3|  1|[5, 4, 3, 1]|     D|
+---+---+---+---+------------+------+
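If the helper arr column is not wanted in the final output, it can be dropped (a small usage note):
scala> spark.createDataFrame(rdd1, df2.schema.add(StructField("mincol", StringType))).drop("arr").show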
answered Nov 15 '18 at 12:34 by stack0114106
You could access the row's schema, retrieve the list of field names from it, look up each row value by name, and figure it out that way.
see: https://spark.apache.org/docs/2.3.2/api/scala/index.html#org.apache.spark.sql.Row
It would look roughly like this (a sketch, assuming all columns are Int):
dataframe.map { row =>
  val fieldNames: List[String] = row.schema.fieldNames.toList // extract names from the schema
  // retrieve each field's value by name and keep the name of the smallest one
  fieldNames.minBy(name => row.getAs[Int](name))
}
This would yield a Dataset[String] (an Encoder[String], e.g. from spark.implicits._, must be in scope).
answered Nov 15 '18 at 8:38 by Dominic Egger