Strange slowdown in some simple scala code

up vote
0
down vote

favorite

I am processing a large number of records (CDRS) that are essentially (who, where, how much), to save space I use a lookup to map the strings into integer and aggregate the traffic on a map of maps (who maps to a map (where maps how much)

type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here

I have found a strange behavior, when I build the lookup tables in advance the code runs as expected, however when I start processing and build the maps on the fly it slows down as it processes the records.

I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:

def index(id: String, map: Lookup, reverse: Reverse): Int = 
 if (map.contains(id)) 
 map(id)._1
 else 
 val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
 reverse += ( number -> id)
 map += (id -> (number, 0.toFloat))
 number

Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.

edited Nov 10 at 14:51

asked Nov 9 at 18:59

Luis Sisamon

Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25

Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11

You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00

The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49

add a comment |

up vote
0
down vote

favorite

type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here

I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:

def index(id: String, map: Lookup, reverse: Reverse): Int = 
 if (map.contains(id)) 
 map(id)._1
 else 
 val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
 reverse += ( number -> id)
 map += (id -> (number, 0.toFloat))
 number

Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.

edited Nov 10 at 14:51

asked Nov 9 at 18:59

Luis Sisamon

Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25

Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11

You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00

The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49

add a comment |

up vote
0
down vote

favorite

type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here

I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:

def index(id: String, map: Lookup, reverse: Reverse): Int = 
 if (map.contains(id)) 
 map(id)._1
 else 
 val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
 reverse += ( number -> id)
 map += (id -> (number, 0.toFloat))
 number

Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.

edited Nov 10 at 14:51

asked Nov 9 at 18:59

Luis Sisamon

type CDR = (String, String, Int)
type Lookup = scala.collection.mutable.HashMap[String, (Int, Float)]
type Traffic = scala.collection.mutable.HashMap[Int,scala.collection.mutable.HashMap[Int,Int]]enter code here

I use the same function to build the lookup tables for this comparison. I essentially check if the code for the lookup is there, if not i create a new entry (it is a mutable map), like this:

def index(id: String, map: Lookup, reverse: Reverse): Int = 
 if (map.contains(id)) 
 map(id)._1
 else 
 val number = if (map.keys.size == 0) 0 else reverse.keys.max + 1
 reverse += ( number -> id)
 map += (id -> (number, 0.toFloat))
 number

Am I missing something here?
EDIT----> I can no longer reproduce the slowdown. I will assume I was either too tired or dumber than usual. Running time now seems to be same as I expected to be.

scala

edited Nov 10 at 14:51

asked Nov 9 at 18:59

Luis Sisamon

edited Nov 10 at 14:51

asked Nov 9 at 18:59

Luis Sisamon

edited Nov 10 at 14:51

asked Nov 9 at 18:59

Luis Sisamon

asked Nov 9 at 18:59

Luis Sisamon

asked Nov 9 at 18:59

Luis Sisamon

Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25

Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11

You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00

The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49

add a comment |

Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25

Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11

You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00

The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49

Not sure what surprises you ... Building lookup tables as you go is slower than ... well ... not building them :)
– Dima
Nov 9 at 19:25

Because i actually build them. I execute exactly the same instructions. I insert the same number of entries into the lookup tables.
– Luis Sisamon
Nov 10 at 11:11

You are talking about two different cases, but saying you execute exactly the same instructions. If instructions are the same, so are the cases.
– Dima
Nov 10 at 13:00

The order of execution was different. But lets forget, I can no longer reproduce the slowdown. They run now in exactly the same time either filling the look up table up front or filling on the fly.
– Luis Sisamon
Nov 10 at 14:49

add a comment |

1 Answer
1

active

oldest

votes

up vote
1
down vote

What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.

Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...

Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.

answered Nov 9 at 19:56

Dima

23.2k32234

Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13

Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02

I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48

Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41

I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231812%2fstrange-slowdown-in-some-simple-scala-code%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
1
down vote

What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.

Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...

Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.

answered Nov 9 at 19:56

Dima

23.2k32234

Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13

Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02

I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48

Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41

I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

add a comment |

up vote
1
down vote

What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.

Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...

Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.

answered Nov 9 at 19:56

Dima

23.2k32234

Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13

Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02

I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48

Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41

I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

add a comment |

up vote
1
down vote

What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.

Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...

Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.

answered Nov 9 at 19:56

Dima

23.2k32234

What is mapCellRvs? Default scala Map's .size (and .keys.size, which is the same thing) simply counts all elements by scanning them linearly.

Try replacing mapCellRvs.keys.size == 0 with mapCellRvs.isEmpty ...

Also, reverse.keys.max is linear as well. You may want to just remember the max somewhere separately, rather than compute it every time.

answered Nov 9 at 19:56

Dima

23.2k32234

answered Nov 9 at 19:56

Dima

23.2k32234

answered Nov 9 at 19:56

Dima

23.2k32234

answered Nov 9 at 19:56

Dima

23.2k32234

Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13

Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02

I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48

Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41

I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

add a comment |

Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13

Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02

I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48

Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41

I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

Thanks for your comment, i used a mutable variable before but you know, mutability, sin,... those things. But then it is linear when i build the lookup table before processing the records or I build as I go.
– Luis Sisamon
Nov 10 at 11:13

Mutable map is no less "sinful" then a mutable variable. More so if anything.
– Dima
Nov 10 at 13:02

I know, I know... but for this I am afraid i have no other choice (I need to aggregate some billons rows) Not using a var is my penance :)
– Luis Sisamon
Nov 10 at 14:48

Ok, I am not religious, so ... If you have more questions about scala, let me know.
– Dima
Nov 10 at 19:41

I have tried to use inmutable as much as possible and I have hit the wall now. I had a first, iterative, solution using several mutable trees. Now I have a recursive one with just one mutable map with essentially no loss of performance. I will open a new question for that. Thanks a lot for your help!
– Luis Sisamon
Nov 10 at 21:15

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb