Strange slowdown in some simple scala code
up vote
0
down vote
favorite
I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)
type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here
I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.
I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:
def index(id: String, map: Lookup, reverse: Reverse): Int =
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number
Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.
scala
add a comment |
up vote
0
down vote
favorite
I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)
type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here
I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.
I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:
def index(id: String, map: Lookup, reverse: Reverse): Int =
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number
Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.
scala
Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25
Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11
You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00
The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)
type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here
I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.
I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:
def index(id: String, map: Lookup, reverse: Reverse): Int =
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number
Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.
scala
I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)
type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here
I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.
I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:
def index(id: String, map: Lookup, reverse: Reverse): Int =
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number
Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.
scala
scala
edited Nov 10 at 14:51
asked Nov 9 at 18:59
Luis Sisamon
43
43
Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25
Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11
You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00
The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49
add a comment |
Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25
Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11
You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00
The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49
Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25
Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25
Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11
Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11
You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00
You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00
The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49
The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
What is mapCellRvs
? Default scala Map
's .size
(and .keys.size
, which is the same thing) simply counts all elements by scanning them linearly.
Try replacing mapCellRvs.keys.size == 0
with mapCellRvs.isEmpty
...
Also, reverse.keys.max
is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
What is mapCellRvs
? Default scala Map
's .size
(and .keys.size
, which is the same thing) simply counts all elements by scanning them linearly.
Try replacing mapCellRvs.keys.size == 0
with mapCellRvs.isEmpty
...
Also, reverse.keys.max
is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
add a comment |
up vote
1
down vote
What is mapCellRvs
? Default scala Map
's .size
(and .keys.size
, which is the same thing) simply counts all elements by scanning them linearly.
Try replacing mapCellRvs.keys.size == 0
with mapCellRvs.isEmpty
...
Also, reverse.keys.max
is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
add a comment |
up vote
1
down vote
up vote
1
down vote
What is mapCellRvs
? Default scala Map
's .size
(and .keys.size
, which is the same thing) simply counts all elements by scanning them linearly.
Try replacing mapCellRvs.keys.size == 0
with mapCellRvs.isEmpty
...
Also, reverse.keys.max
is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.
What is mapCellRvs
? Default scala Map
's .size
(and .keys.size
, which is the same thing) simply counts all elements by scanning them linearly.
Try replacing mapCellRvs.keys.size == 0
with mapCellRvs.isEmpty
...
Also, reverse.keys.max
is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.
answered Nov 9 at 19:56
Dima
23.2k32234
23.2k32234
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
add a comment |
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15
add a comment |
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231812%2fstrange-slowdown-in-some-simple-scala-code%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25
Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11
You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00
The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49