Why is Key always 0 when creating map
up vote
0
down vote
favorite
My code is supposed to extract a Map
from a dataframe
. The map will be used later for some calculations (mapping Credit to best matching original Billing). However the first step is failing already - the TransactionId
is always retrieved as 0.
Simplified version of the code:
case class SalesTransaction(
CustomerId : Int,
Score : Int,
Revenue : Double,
Type : String,
Credited : Double = 0.0,
LinkedTransactionId : Int = 0,
IsProcessed : Boolean = false
)
val df = Seq(
(1, 1, 123, "Sales", 100),
(1, 2, 122, "Credit", 100),
(1, 3, 99, "Sales", 70),
(1, 4, 101, "Sales", 77),
(1, 5, 102, "Credit", 75),
(1, 6, 98, "Sales", 71),
(2, 7, 200, "Sales", 55),
(2, 8, 220, "Sales", 55),
(2, 9, 200, "Credit", 50),
(2, 10, 205, "Sales", 50)
).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")
.withColumn("Revenue", $"Revenue".cast(DoubleType))
.repartition($"CustomerId")
//map generation:
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs("TransactionId")
, new SalesTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
)
).collect.toMap
m2.foreach(m => println("key: " + m._1 +" Value: "+ m._2))
The output has only the very last record, because all values captured by row.getAs("TransactionId")
is null (i.e. translates as 0 in the m2 Map) thus tuple created in each iteration is (null, [current row SalesTransaction])
.
Could you please advice me what might be wrong with my code? I'm quite new to Scala and must be missing some syntactical nuance here.
scala apache-spark
add a comment |
up vote
0
down vote
favorite
My code is supposed to extract a Map
from a dataframe
. The map will be used later for some calculations (mapping Credit to best matching original Billing). However the first step is failing already - the TransactionId
is always retrieved as 0.
Simplified version of the code:
case class SalesTransaction(
CustomerId : Int,
Score : Int,
Revenue : Double,
Type : String,
Credited : Double = 0.0,
LinkedTransactionId : Int = 0,
IsProcessed : Boolean = false
)
val df = Seq(
(1, 1, 123, "Sales", 100),
(1, 2, 122, "Credit", 100),
(1, 3, 99, "Sales", 70),
(1, 4, 101, "Sales", 77),
(1, 5, 102, "Credit", 75),
(1, 6, 98, "Sales", 71),
(2, 7, 200, "Sales", 55),
(2, 8, 220, "Sales", 55),
(2, 9, 200, "Credit", 50),
(2, 10, 205, "Sales", 50)
).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")
.withColumn("Revenue", $"Revenue".cast(DoubleType))
.repartition($"CustomerId")
//map generation:
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs("TransactionId")
, new SalesTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
)
).collect.toMap
m2.foreach(m => println("key: " + m._1 +" Value: "+ m._2))
The output has only the very last record, because all values captured by row.getAs("TransactionId")
is null (i.e. translates as 0 in the m2 Map) thus tuple created in each iteration is (null, [current row SalesTransaction])
.
Could you please advice me what might be wrong with my code? I'm quite new to Scala and must be missing some syntactical nuance here.
scala apache-spark
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
My code is supposed to extract a Map
from a dataframe
. The map will be used later for some calculations (mapping Credit to best matching original Billing). However the first step is failing already - the TransactionId
is always retrieved as 0.
Simplified version of the code:
case class SalesTransaction(
CustomerId : Int,
Score : Int,
Revenue : Double,
Type : String,
Credited : Double = 0.0,
LinkedTransactionId : Int = 0,
IsProcessed : Boolean = false
)
val df = Seq(
(1, 1, 123, "Sales", 100),
(1, 2, 122, "Credit", 100),
(1, 3, 99, "Sales", 70),
(1, 4, 101, "Sales", 77),
(1, 5, 102, "Credit", 75),
(1, 6, 98, "Sales", 71),
(2, 7, 200, "Sales", 55),
(2, 8, 220, "Sales", 55),
(2, 9, 200, "Credit", 50),
(2, 10, 205, "Sales", 50)
).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")
.withColumn("Revenue", $"Revenue".cast(DoubleType))
.repartition($"CustomerId")
//map generation:
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs("TransactionId")
, new SalesTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
)
).collect.toMap
m2.foreach(m => println("key: " + m._1 +" Value: "+ m._2))
The output has only the very last record, because all values captured by row.getAs("TransactionId")
is null (i.e. translates as 0 in the m2 Map) thus tuple created in each iteration is (null, [current row SalesTransaction])
.
Could you please advice me what might be wrong with my code? I'm quite new to Scala and must be missing some syntactical nuance here.
scala apache-spark
My code is supposed to extract a Map
from a dataframe
. The map will be used later for some calculations (mapping Credit to best matching original Billing). However the first step is failing already - the TransactionId
is always retrieved as 0.
Simplified version of the code:
case class SalesTransaction(
CustomerId : Int,
Score : Int,
Revenue : Double,
Type : String,
Credited : Double = 0.0,
LinkedTransactionId : Int = 0,
IsProcessed : Boolean = false
)
val df = Seq(
(1, 1, 123, "Sales", 100),
(1, 2, 122, "Credit", 100),
(1, 3, 99, "Sales", 70),
(1, 4, 101, "Sales", 77),
(1, 5, 102, "Credit", 75),
(1, 6, 98, "Sales", 71),
(2, 7, 200, "Sales", 55),
(2, 8, 220, "Sales", 55),
(2, 9, 200, "Credit", 50),
(2, 10, 205, "Sales", 50)
).toDF("CustomerId", "TransactionId", "TransactionAttributesScore", "TransactionType", "Revenue")
.withColumn("Revenue", $"Revenue".cast(DoubleType))
.repartition($"CustomerId")
//map generation:
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs("TransactionId")
, new SalesTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
)
).collect.toMap
m2.foreach(m => println("key: " + m._1 +" Value: "+ m._2))
The output has only the very last record, because all values captured by row.getAs("TransactionId")
is null (i.e. translates as 0 in the m2 Map) thus tuple created in each iteration is (null, [current row SalesTransaction])
.
Could you please advice me what might be wrong with my code? I'm quite new to Scala and must be missing some syntactical nuance here.
scala apache-spark
scala apache-spark
asked Nov 8 at 18:57
Dan
716
716
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
accepted
You can also use row.getAs[Int]("TransactionId")
as shown below :
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs[Int]("TransactionId"),
new SalesTransaction(row.getAs("CustomerId"),
row.getAs("TransactionAttributesScore"),
row.getAs("Revenue"),
row.getAs("TransactionType"))
)
).collect.toMap
It is always better to use the casted version of getAs to avoid errors like this.
add a comment |
up vote
0
down vote
The issue is related to data type obtained from row.getAs("TransactionId")
. Despite underlying $"TransactionId"
being integer. Converting the input explicitly resolved the issue:
//… code above unchanged
val m2 : Map[Int, SlTransaction] =
df.map(row =>
val mKey : Int = row.getAs("TransactionId") //forcing into Int variable
val mValue : SlTransaction = new SlTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
(mKey, mValue)
).collect.toMap
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
You can also use row.getAs[Int]("TransactionId")
as shown below :
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs[Int]("TransactionId"),
new SalesTransaction(row.getAs("CustomerId"),
row.getAs("TransactionAttributesScore"),
row.getAs("Revenue"),
row.getAs("TransactionType"))
)
).collect.toMap
It is always better to use the casted version of getAs to avoid errors like this.
add a comment |
up vote
1
down vote
accepted
You can also use row.getAs[Int]("TransactionId")
as shown below :
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs[Int]("TransactionId"),
new SalesTransaction(row.getAs("CustomerId"),
row.getAs("TransactionAttributesScore"),
row.getAs("Revenue"),
row.getAs("TransactionType"))
)
).collect.toMap
It is always better to use the casted version of getAs to avoid errors like this.
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
You can also use row.getAs[Int]("TransactionId")
as shown below :
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs[Int]("TransactionId"),
new SalesTransaction(row.getAs("CustomerId"),
row.getAs("TransactionAttributesScore"),
row.getAs("Revenue"),
row.getAs("TransactionType"))
)
).collect.toMap
It is always better to use the casted version of getAs to avoid errors like this.
You can also use row.getAs[Int]("TransactionId")
as shown below :
val m2 : Map[Int, SalesTransaction] =
df.map(row => (
row.getAs[Int]("TransactionId"),
new SalesTransaction(row.getAs("CustomerId"),
row.getAs("TransactionAttributesScore"),
row.getAs("Revenue"),
row.getAs("TransactionType"))
)
).collect.toMap
It is always better to use the casted version of getAs to avoid errors like this.
answered Nov 10 at 16:36
user238607
680711
680711
add a comment |
add a comment |
up vote
0
down vote
The issue is related to data type obtained from row.getAs("TransactionId")
. Despite underlying $"TransactionId"
being integer. Converting the input explicitly resolved the issue:
//… code above unchanged
val m2 : Map[Int, SlTransaction] =
df.map(row =>
val mKey : Int = row.getAs("TransactionId") //forcing into Int variable
val mValue : SlTransaction = new SlTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
(mKey, mValue)
).collect.toMap
add a comment |
up vote
0
down vote
The issue is related to data type obtained from row.getAs("TransactionId")
. Despite underlying $"TransactionId"
being integer. Converting the input explicitly resolved the issue:
//… code above unchanged
val m2 : Map[Int, SlTransaction] =
df.map(row =>
val mKey : Int = row.getAs("TransactionId") //forcing into Int variable
val mValue : SlTransaction = new SlTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
(mKey, mValue)
).collect.toMap
add a comment |
up vote
0
down vote
up vote
0
down vote
The issue is related to data type obtained from row.getAs("TransactionId")
. Despite underlying $"TransactionId"
being integer. Converting the input explicitly resolved the issue:
//… code above unchanged
val m2 : Map[Int, SlTransaction] =
df.map(row =>
val mKey : Int = row.getAs("TransactionId") //forcing into Int variable
val mValue : SlTransaction = new SlTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
(mKey, mValue)
).collect.toMap
The issue is related to data type obtained from row.getAs("TransactionId")
. Despite underlying $"TransactionId"
being integer. Converting the input explicitly resolved the issue:
//… code above unchanged
val m2 : Map[Int, SlTransaction] =
df.map(row =>
val mKey : Int = row.getAs("TransactionId") //forcing into Int variable
val mValue : SlTransaction = new SlTransaction(row.getAs("CustomerId")
, row.getAs("TransactionAttributesScore")
, row.getAs("Revenue")
, row.getAs("TransactionType")
)
(mKey, mValue)
).collect.toMap
answered Nov 9 at 20:44
Dan
716
716
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53214409%2fwhy-is-key-always-0-when-creating-map%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown