Concurrent processes a lot slower than single process
I am modelling and solving a nonlinear program (NLP) with AMPL and single-threaded CPLEX (I explicitly constrain CPLEX to one thread) on CentOS 7. The processor is an Intel i7-8700 with 6 independent cores, and I am solving 6 independent test instances.
When I run these tests sequentially, the total elapsed time is much shorter (by about 63%) than when I run the 6 instances concurrently. The runs are independent processes that read distinct data files and write results to distinct output files. I have also tried solving the tests sequentially with multiple threads, and the times were similar to the single-threaded sequential runs.
I have checked the behaviour of these processes with top/htop, and they are scheduled on different cores. So my question is: why does running these tests concurrently have such a large impact on elapsed time, when they are individual processes, each using only one thread and running on a different core?
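For concreteness, the concurrent launch is essentially the following minimal sketch (the run-file names are placeholders; each .run script reads its own data file, sets option cplex_options 'threads=1';, and writes its own output file):

```python
import subprocess

# Placeholder names: instance1.run ... instance6.run each read their own
# data, constrain CPLEX to one thread, and write their own output.
runs = [f"instance{i}.run" for i in range(1, 7)]

# Concurrent case: one OS process per instance, all started at once.
procs = [subprocess.Popen(["ampl", r]) for r in runs]
for p in procs:
    p.wait()

# Sequential case, for comparison: one instance at a time.
# for r in runs:
#     subprocess.run(["ampl", r], check=True)
```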
Any thoughts would be appreciated.
multithreading concurrency parallel-processing cplex ampl
asked Nov 14 '18 at 16:02
Drakkar Noir
I agree with Andy that there is resource starvation somewhere, but I think it is memory rather than disk IO. Neither AMPL nor CPLEX does a lot of IO, but the LPs/MIPs that CPLEX solves while solving an NLP via AMPL can get quite large. Can you check with top/htop how much memory you have and how much each individual solve process consumes?
– LaszloLadanyi
Nov 14 '18 at 18:25
Agreed - it could be disk and/or RAM, depending on file size. One way to tell is to run it a couple of times in a row: if the first run is significantly slower than the second, then it's probably disk IO being sped up by page caching on the second and later runs. If not, the bottleneck is likely elsewhere.
– Andy
Nov 14 '18 at 21:11
Thanks for all the comments, they are very helpful. I have checked memory: each process uses about 3% of memory, and I have 62.5 GB of RAM. I read the data, and after that there is only computation to solve the models. I have tried taskset to see if it would give me some improvement in terms of ALU and FPU sharing, but the times remain slow even when each process is pinned to a different core (see the sketch below). Could it be cache misses?
– Drakkar Noir
Nov 15 '18 at 23:16
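(For reference, an equivalent way to pin each solver to its own core from the launcher sketched in the question, instead of using taskset, would be os.sched_setaffinity in each child process; the core ids below are illustrative.)

```python
import os
import subprocess

runs = [f"instance{i}.run" for i in range(1, 7)]  # placeholder names, as above

procs = []
for core, runfile in enumerate(runs):
    # Pin each child to one core before it execs ampl (Linux-only call).
    procs.append(subprocess.Popen(
        ["ampl", runfile],
        preexec_fn=lambda c=core: os.sched_setaffinity(0, {c}),
    ))

for p in procs:
    p.wait()
```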
1 Answer
It's very easy to make many threads perform worse than a single thread. The key to a successful multi-threaded speedup is understanding not just that the program is multi-threaded, but exactly how your threads interact. Here are a few questions to ask yourself as you review your code:
1) Do the individual threads share resources? If so, what are those resources, and does accessing them block other threads?
2) What's the slowest resource your multi-threaded code relies on? A common (and often neglected) bottleneck is disk IO. Multiple threads can process data much faster, but they won't make a disk read any faster, and in many cases multithreading makes it much worse (e.g. thrashing).
3) Is access to common resources properly synchronized?
To this end, and without knowing more about your problem, I'd recommend:
a) Don't read different files from different threads. You want to keep your disk IO as sequential as possible, and that is easier from a single thread. Maybe batch-read the files from a single thread and then farm them out for processing (see the sketch below).
b) Keep your threads as autonomous as possible - any communication back and forth will cause thread contention and slow things down.
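To make (a) concrete, here is a minimal sketch of the read-then-farm-out pattern using Python's standard library, purely for illustration; the data directory, file pattern, and the solve stand-in are placeholders rather than your actual AMPL/CPLEX workflow:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def solve(payload: bytes) -> int:
    # Stand-in for the CPU-bound solve of one instance; it just does some
    # arithmetic over the data so the sketch is runnable.
    return sum(payload) % 1_000_000


def main() -> None:
    files = sorted(Path("data").glob("*.dat"))  # placeholder location/pattern

    # Read everything from the main thread so the disk sees one sequential
    # stream of requests...
    payloads = [f.read_bytes() for f in files]

    # ...then hand only the CPU-bound work to worker processes, one per
    # instance, so the workers never compete for the disk.
    with ProcessPoolExecutor(max_workers=max(1, len(payloads))) as pool:
        results = list(pool.map(solve, payloads))

    for f, result in zip(files, results):
        print(f.name, result)


if __name__ == "__main__":
    main()
```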
answered Nov 14 '18 at 17:04
Andy