Using Spark SQL and Spark DataFrames, how can we find the column name based on the minimum value in a row?










I have a dataframe df with 4 columns:



+-------+-------+-------+-------+
| dist1 | dist2 | dist3 | dist4 |
+-------+-------+-------+-------+
|    42 |    53 |    24 |    17 |
+-------+-------+-------+-------+


The output I want is:



dist4



It seems easy, but I could not find a proper solution using the DataFrame API or a Spark SQL query.










sql apache-spark apache-spark-sql

asked Nov 15 '18 at 6:44 by stackoverflow, edited Nov 15 '18 at 7:29
5 Answers
Score -1, answered Nov 15 '18 at 8:59 by Chitral Verma

You can do something like this:



import org.apache.spark.sql.functions._

// column names, in the same order as the values inside each row
val cols = df.columns

// return the name at the index of the smallest value
val u1 = udf((s: Seq[Int]) => cols(s.zipWithIndex.min._2))

// array("*") packs all columns of the row into a single array column
df.withColumn("res", u1(array("*")))
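
On the question's single-row frame this yields dist4. A minimal check (my own construction of the example frame, assuming a spark-shell style session with spark.implicits._ in scope):

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((42, 53, 24, 17)).toDF("dist1", "dist2", "dist3", "dist4")
val cols = df.columns
val u1 = udf((s: Seq[Int]) => cols(s.zipWithIndex.min._2))

df.withColumn("res", u1(array("*"))).show()
// +-----+-----+-----+-----+-----+
// |dist1|dist2|dist3|dist4|  res|
// +-----+-----+-----+-----+-----+
// |   42|   53|   24|   17|dist4|
// +-----+-----+-----+-----+-----+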





• Thanks a lot, worked perfectly :)

  – stackoverflow, Nov 15 '18 at 10:02
• I have a doubt: I did it another way, with df.first.toSeq.asInstanceOf[Seq[DoubleType]] followed by if statements comparing by index. It gives me the result, but which one is the faster and better solution?

  – stackoverflow, Nov 15 '18 at 10:15
Score 1, answered Nov 15 '18 at 6:56 by Barbaros Özhan, edited Nov 15 '18 at 8:49

You may use the least function:



          select least(dist1,dist2,dist3,dist4) as min_dist
          from yourTable;


For the opposite case, greatest may be used.



EDIT:
To detect the column name, the following may be used to get the values as (value, name) rows:



select inline(array(struct(42, 'dist1'), struct(53, 'dist2'),
                    struct(24, 'dist3'), struct(17, 'dist4')))

42 dist1
53 dist2
24 dist3
17 dist4


and then a min over the value column may be applied to pick dist4.
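
Putting the two steps together, a sketch of my own (not part of the original answer), assuming the frame is registered as a temp view named yourTable and that Spark's multi-column aliasing for generators is available; the order by/limit trick is valid here only because the table has a single row:

df.createOrReplaceTempView("yourTable")

spark.sql("""
  select col_name
  from (select inline(array(
                 struct(dist1, 'dist1'), struct(dist2, 'dist2'),
                 struct(dist3, 'dist3'), struct(dist4, 'dist4')))
               as (dist, col_name)
        from yourTable) t
  order by dist
  limit 1
""").show()   // prints dist4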






• @Barbaros Özhan I want the column name having the minimum value. least() finds the minimum value; I want the column name.

  – stackoverflow, Nov 15 '18 at 7:29
Score 1, answered Nov 15 '18 at 10:22 by Sathiyan S

Try this:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

df.show
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  1|  2|  3|  4|
|  5|  4|  3|  1|
+---+---+---+---+

// tag every value with its column name, e.g. 1 in column A becomes "1,A"
val temp_df = df.columns.foldLeft(df) { (acc: DataFrame, colName: String) =>
  acc.withColumn(colName, concat(col(colName), lit("," + colName)))
}

// the smallest tagged string carries the wanted column name after the comma
val minval = udf((ar: Seq[String]) => ar.min.split(",")(1))

val result = temp_df
  .withColumn("least", split(concat_ws(":", temp_df.columns.map(col(_)): _*), ":"))
  .withColumn("least_col", minval(col("least")))

result.show
+---+---+---+---+--------------------+---------+
|  A|  B|  C|  D|               least|least_col|
+---+---+---+---+--------------------+---------+
|1,A|2,B|3,C|4,D|[1,A, 2,B, 3,C, 4,D]|        A|
|5,A|4,B|3,C|1,D|[5,A, 4,B, 3,C, 1,D]|        D|
+---+---+---+---+--------------------+---------+

Note that ar.min compares the tagged strings lexicographically, so this is fine for the single-digit values above but can mis-order multi-digit numbers.





Score 1, answered Nov 15 '18 at 12:34 by stack0114106

The RDD way, without udf()s:



scala> import scala.collection.mutable.WrappedArray
import scala.collection.mutable.WrappedArray

scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala> import org.apache.spark.sql.types.{StringType, StructField}
import org.apache.spark.sql.types.{StringType, StructField}

scala> val df = Seq((1,2,3,4),(5,4,3,1)).toDF("A","B","C","D")
df: org.apache.spark.sql.DataFrame = [A: int, B: int ... 2 more fields]

scala> val df2 = df.withColumn("arr", array(df.columns.map(col(_)):_*))
df2: org.apache.spark.sql.DataFrame = [A: int, B: int ... 3 more fields]

scala> val rowarr = df.columns
rowarr: Array[String] = Array(A, B, C, D)

scala> val rdd1 = df2.rdd.map{ x => val p = x.getAs[WrappedArray[Int]]("arr").toArray; val q = rowarr(p.indexWhere(_ == p.min)); Row.merge(x, Row(q)) }
rdd1: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[83] at map at <console>:47

scala> spark.createDataFrame(rdd1, df2.schema.add(StructField("mincol", StringType))).show
+---+---+---+---+------------+------+
|  A|  B|  C|  D|         arr|mincol|
+---+---+---+---+------------+------+
|  1|  2|  3|  4|[1, 2, 3, 4]|     A|
|  5|  4|  3|  1|[5, 4, 3, 1]|     D|
+---+---+---+---+------------+------+





Score -1, answered Nov 15 '18 at 8:38 by Dominic Egger

You could access the row's schema, retrieve a list of field names from it, look up each row's values by name, and figure it out that way.



              see: https://spark.apache.org/docs/2.3.2/api/scala/index.html#org.apache.spark.sql.Row



It would look roughly like this:



dataframe.map { row =>
  val schema = row.schema
  val fieldNames: List[String] = ???   // extract the names from the schema
  fieldNames.foldLeft(("", 0))(???)._1 // retrieve each field's value by its name and retain the minimum
}

This would yield a Dataset[String].
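
One possible completion of this sketch (my own filling of the ??? placeholders, not the answerer's code; it assumes integer columns and a Spark session with spark.implicits._ in scope for the String encoder):

import org.apache.spark.sql.Dataset
import spark.implicits._

val minColNames: Dataset[String] = dataframe.map { row =>
  val fieldNames = row.schema.fieldNames.toList
  // walk the fields, keeping the (name, value) pair with the smallest value
  fieldNames.tail.foldLeft((fieldNames.head, row.getAs[Int](fieldNames.head))) {
    case ((bestName, bestValue), name) =>
      val v = row.getAs[Int](name)
      if (v < bestValue) (name, v) else (bestName, bestValue)
  }._1
}

minColNames.show()   // prints dist4 for the question's row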





