Strange slowdown in some simple scala code









up vote
0
down vote

favorite












I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)



type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here


I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.



I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:



def index(id: String, map: Lookup, reverse: Reverse): Int = 
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number




Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.










share|improve this question























  • Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
    – Dima
    Nov 9 at 19:25










  • Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
    – Luis Sisamon
    Nov 10 at 11:11










  • You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
    – Dima
    Nov 10 at 13:00










  • The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
    – Luis Sisamon
    Nov 10 at 14:49















up vote
0
down vote

favorite












I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)



type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here


I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.



I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:



def index(id: String, map: Lookup, reverse: Reverse): Int = 
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number




Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.










share|improve this question























  • Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
    – Dima
    Nov 9 at 19:25










  • Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
    – Luis Sisamon
    Nov 10 at 11:11










  • You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
    – Dima
    Nov 10 at 13:00










  • The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
    – Luis Sisamon
    Nov 10 at 14:49













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)



type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here


I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.



I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:



def index(id: String, map: Lookup, reverse: Reverse): Int = 
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number




Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.










share|improve this question















I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)



type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here


I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.



I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:



def index(id: String, map: Lookup, reverse: Reverse): Int = 
if (map.contains(id))
map(id)._1
else
val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
reverse += ( number -> id)
map += (id -> (number, 0.toFloat))
number




Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.







scala






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 10 at 14:51

























asked Nov 9 at 18:59









Luis Sisamon

43




43











  • Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
    – Dima
    Nov 9 at 19:25










  • Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
    – Luis Sisamon
    Nov 10 at 11:11










  • You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
    – Dima
    Nov 10 at 13:00










  • The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
    – Luis Sisamon
    Nov 10 at 14:49

















  • Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
    – Dima
    Nov 9 at 19:25










  • Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
    – Luis Sisamon
    Nov 10 at 11:11










  • You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
    – Dima
    Nov 10 at 13:00










  • The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
    – Luis Sisamon
    Nov 10 at 14:49
















Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25




Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25












Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11




Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11












You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00




You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00












The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49





The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49













1 Answer
1






active

oldest

votes

















up vote
1
down vote













What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.



Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...



Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.






share|improve this answer




















  • Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
    – Luis Sisamon
    Nov 10 at 11:13











  • Mutable map is no less "sinful" then a mutable variable. More so if anything.
    – Dima
    Nov 10 at 13:02










  • I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
    – Luis Sisamon
    Nov 10 at 14:48










  • Ok, I am not religious, so ... If you have more questions about scala, let me know.
    – Dima
    Nov 10 at 19:41










  • I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
    – Luis Sisamon
    Nov 10 at 21:15










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231812%2fstrange-slowdown-in-some-simple-scala-code%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote













What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.



Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...



Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.






share|improve this answer




















  • Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
    – Luis Sisamon
    Nov 10 at 11:13











  • Mutable map is no less "sinful" then a mutable variable. More so if anything.
    – Dima
    Nov 10 at 13:02










  • I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
    – Luis Sisamon
    Nov 10 at 14:48










  • Ok, I am not religious, so ... If you have more questions about scala, let me know.
    – Dima
    Nov 10 at 19:41










  • I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
    – Luis Sisamon
    Nov 10 at 21:15














up vote
1
down vote













What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.



Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...



Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.






share|improve this answer




















  • Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
    – Luis Sisamon
    Nov 10 at 11:13











  • Mutable map is no less "sinful" then a mutable variable. More so if anything.
    – Dima
    Nov 10 at 13:02










  • I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
    – Luis Sisamon
    Nov 10 at 14:48










  • Ok, I am not religious, so ... If you have more questions about scala, let me know.
    – Dima
    Nov 10 at 19:41










  • I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
    – Luis Sisamon
    Nov 10 at 21:15












up vote
1
down vote










up vote
1
down vote









What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.



Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...



Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.






share|improve this answer












What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.



Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...



Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 9 at 19:56









Dima

23.2k32234




23.2k32234











  • Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
    – Luis Sisamon
    Nov 10 at 11:13











  • Mutable map is no less "sinful" then a mutable variable. More so if anything.
    – Dima
    Nov 10 at 13:02










  • I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
    – Luis Sisamon
    Nov 10 at 14:48










  • Ok, I am not religious, so ... If you have more questions about scala, let me know.
    – Dima
    Nov 10 at 19:41










  • I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
    – Luis Sisamon
    Nov 10 at 21:15
















  • Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
    – Luis Sisamon
    Nov 10 at 11:13











  • Mutable map is no less "sinful" then a mutable variable. More so if anything.
    – Dima
    Nov 10 at 13:02










  • I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
    – Luis Sisamon
    Nov 10 at 14:48










  • Ok, I am not religious, so ... If you have more questions about scala, let me know.
    – Dima
    Nov 10 at 19:41










  • I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
    – Luis Sisamon
    Nov 10 at 21:15















Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13





Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13













Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02




Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02












I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48




I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48












Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41




Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41












I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15




I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231812%2fstrange-slowdown-in-some-simple-scala-code%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo