Why is Key always 0 when creating map









My code is supposed to extract a Map from a DataFrame. The map will be used later for some calculations (mapping each Credit to the best-matching original Billing). However, the first step is already failing: the TransactionId is always retrieved as 0.



Simplified version of the code:



case class SalesTransaction(
  CustomerId : Int,
  Score : Int,
  Revenue : Double,
  Type : String,
  Credited : Double = 0.0,
  LinkedTransactionId : Int = 0,
  IsProcessed : Boolean = false
)

val df = Seq(
  (1, 1, 123, "Sales", 100),
  (1, 2, 122, "Credit", 100),
  (1, 3, 99, "Sales", 70),
  (1, 4, 101, "Sales", 77),
  (1, 5, 102, "Credit", 75),
  (1, 6, 98, "Sales", 71),
  (2, 7, 200, "Sales", 55),
  (2, 8, 220, "Sales", 55),
  (2, 9, 200, "Credit", 50),
  (2, 10, 205, "Sales", 50)
).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")
  .withColumn("Revenue", $"Revenue".cast(DoubleType))
  .repartition($"CustomerId")

//map generation:
val m2 : Map[Int, SalesTransaction] =
  df.map(row => (
    row.getAs("TransactionId")
    , new SalesTransaction(row.getAs("CustomerId")
      , row.getAs("TransactionAttributesScore")
      , row.getAs("Revenue")
      , row.getAs("TransactionType")
    )
  )
  ).collect.toMap

m2.foreach(m => println("key: " + m._1 +" Value: "+ m._2))


The output contains only the very last record, because every value returned by row.getAs("TransactionId") is null (which shows up as 0 in the m2 Map). The tuple created in each iteration is therefore (null, [current row SalesTransaction]), and each entry overwrites the previous one under the same key.



Could you please advise me on what might be wrong with my code? I'm quite new to Scala and must be missing some syntactic nuance here.










      scala apache-spark






      asked Nov 8 at 18:57









      Dan























          2 Answers

















                  accepted






You can also use row.getAs[Int]("TransactionId"), as shown below:



val m2 : Map[Int, SalesTransaction] =
  df.map(row => (
    row.getAs[Int]("TransactionId"),
    new SalesTransaction(row.getAs("CustomerId"),
      row.getAs("TransactionAttributesScore"),
      row.getAs("Revenue"),
      row.getAs("TransactionType"))
  )
  ).collect.toMap


It is always better to use the explicitly typed version of getAs (that is, getAs[T]) to avoid errors like this.
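As a minimal sketch of the same idea with every field read through an explicit type parameter, the following assumes a local SparkSession and a Revenue column that is already a Double. The names TypedGetAsSketch and toTransactionMap are hypothetical, and collecting the rows before mapping them is a choice made for this sketch (it avoids needing an Encoder for the tuple), not something the answer prescribes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only for this illustration.
object TypedGetAsSketch {

  case class SalesTransaction(
    CustomerId: Int,
    Score: Int,
    Revenue: Double,
    Type: String,
    Credited: Double = 0.0,
    LinkedTransactionId: Int = 0,
    IsProcessed: Boolean = false
  )

  // Collect the rows to the driver, then read each field with an explicit
  // type parameter so nothing is left to type inference.
  def toTransactionMap(df: DataFrame): Map[Int, SalesTransaction] =
    df.collect().map { row =>
      row.getAs[Int]("TransactionId") -> SalesTransaction(
        row.getAs[Int]("CustomerId"),
        row.getAs[Int]("TransactionAttributesScore"),
        row.getAs[Double]("Revenue"),
        row.getAs[String]("TransactionType")
      )
    }.toMap

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small sample.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-getAs-sketch")
      .getOrCreate()
    import spark.implicits._

    // A few rows from the question, with Revenue written as Double literals.
    val df = Seq(
      (1, 1, 123, "Sales", 100.0),
      (1, 2, 122, "Credit", 100.0),
      (2, 7, 200, "Sales", 55.0)
    ).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")

    toTransactionMap(df).toSeq.sortBy(_._1).foreach { case (k, v) =>
      println("key: " + k + " Value: " + v)
    }

    spark.stop()
  }
}
```

Because each TransactionId is read as an Int, every row should end up under its own key. For large DataFrames, collecting to the driver may not be appropriate.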






                  answered Nov 10 at 16:36









                  user238607























The issue is related to the data type obtained from row.getAs("TransactionId"), even though the underlying $"TransactionId" column is an integer. Converting the value explicitly resolved the issue:



//… code above unchanged
val m2 : Map[Int, SalesTransaction] =
  df.map { row =>
    val mKey : Int = row.getAs("TransactionId") //forcing into Int variable
    val mValue : SalesTransaction = new SalesTransaction(row.getAs("CustomerId")
      , row.getAs("TransactionAttributesScore")
      , row.getAs("Revenue")
      , row.getAs("TransactionType")
    )
    (mKey, mValue)
  }.collect.toMap
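The reason the mKey : Int annotation helps can be sketched without Spark. A generic method shaped like getAs[T] gets its T either from an explicit type parameter or from the type the surrounding code expects. FakeRow and ExpectedTypeSketch below are hypothetical stand-ins written for this illustration; they are not Spark's Row.

```scala
// Hypothetical stand-in for a row: field values are stored untyped.
final class FakeRow(values: Map[String, Any]) {
  // Same shape as getAs: the caller decides T, and the value is cast to it.
  def getAs[T](name: String): T = values(name).asInstanceOf[T]
}

object ExpectedTypeSketch {
  val row = new FakeRow(Map("TransactionId" -> 7, "TransactionType" -> "Sales"))

  // Explicit type parameter: T is stated directly as Int.
  val byTypeParam = row.getAs[Int]("TransactionId")

  // Type annotation on the val: the expected type Int lets the compiler choose T = Int.
  val byAnnotation: Int = row.getAs("TransactionId")

  // A tuple built from explicitly typed reads has the intended element types.
  val pair: (Int, String) =
    (row.getAs[Int]("TransactionId"), row.getAs[String]("TransactionType"))

  def main(args: Array[String]): Unit =
    println("byTypeParam=" + byTypeParam + " byAnnotation=" + byAnnotation + " pair=" + pair)
}
```

When neither an explicit parameter nor an expected type is available, as is likely the case for the first element of the tuple in the question, the compiler has nothing to pin T to, so stating the type is the safer habit.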





                          answered Nov 9 at 20:44









                          Dan




























                               
