Concurrent processes a lot slower than single process

I am modelling and solving a nonlinear program (NLP) with AMPL and single-threaded CPLEX (I explicitly constrain CPLEX to use only one thread) on CentOS 7. I am using a processor with six independent cores (Intel i7-8700) to solve six independent test instances.



When I run these tests sequentially, it is much faster (by about 63% in elapsed time) than when I run the six instances concurrently. The tests are executed as independent processes, each reading a distinct data file and writing results to a distinct output file. I have also tried solving the tests sequentially with multithreading enabled, and the times were similar to the sequential single-threaded runs.
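
For illustration, here is a minimal sketch (in Python, with placeholder script names; the real runs invoke AMPL command files in the same way) of the two launch patterns being compared:

    import subprocess
    import time

    # Placeholder AMPL command files, one per test instance.
    SCRIPTS = [f"instance{i}.run" for i in range(1, 7)]

    def run_sequential():
        """Solve the six instances one after another; return elapsed wall time."""
        start = time.perf_counter()
        for script in SCRIPTS:
            subprocess.run(["ampl", script], check=True)
        return time.perf_counter() - start

    def run_concurrent():
        """Launch all six instances at once and wait for them; return elapsed wall time."""
        start = time.perf_counter()
        procs = [subprocess.Popen(["ampl", script]) for script in SCRIPTS]
        for p in procs:
            p.wait()
        return time.perf_counter() - start

    if __name__ == "__main__":
        print(f"sequential: {run_sequential():.1f} s")
        print(f"concurrent: {run_concurrent():.1f} s")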



I have checked the behaviour of these processes with top/htop, and they are scheduled on different cores. So my question is: why does running these tests concurrently have such a large impact on elapsed time, when they are separate single-threaded processes each solving on its own core?



Any thoughts would be appreciated.







Tags: multithreading, concurrency, parallel-processing, cplex, ampl

asked Nov 14 '18 at 16:02 by Drakkar Noir

  • I agree with Andy that there is resource starvation somewhere, but I think it is memory and not disk IO. Neither AMPL nor CPLEX does a lot of IO, but the LPs/MIPs that CPLEX solves while solving an NLP via AMPL can get quite large. Can you check with top/htop how much memory you have and how much the individual solve processes consume?
    – LaszloLadanyi, Nov 14 '18 at 18:25

  • Agreed - it could be disk IO and/or RAM, depending on file size. One way to tell is to run it a couple of times in a row: if the first run is significantly slower than the second, then it is probably disk IO being sped up by page caching on the second and later runs. If not, the bottleneck is likely elsewhere.
    – Andy, Nov 14 '18 at 21:11

  • Thanks for all the comments; they are very helpful. I have checked memory: each process uses about 3% of it, and I have 62.5 GB of RAM. I read the data, and after that there is only computation to solve the models. I have tried taskset to see whether it would give some improvement in terms of ALU and FPU sharing, but times remain slow even when assigning different cores. Maybe it is cache misses?
    – Drakkar Noir, Nov 15 '18 at 23:16
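
Following up on the comments above about memory use and taskset, here is a minimal sketch (Python, Linux-only, with hypothetical PIDs) of inspecting each solver process's resident memory and allowed CPUs from /proc and pinning each one to its own core, roughly what top/htop report and what taskset -cp does:

    import os

    # Hypothetical PIDs of the six running solver processes (placeholders).
    SOLVER_PIDS = [12345, 12346, 12347, 12348, 12349, 12350]

    def proc_status(pid, field):
        """Return one field (e.g. 'VmRSS' or 'Cpus_allowed_list') from /proc/<pid>/status."""
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return line.split(":", 1)[1].strip()
        return None

    for core, pid in enumerate(SOLVER_PIDS):
        print(pid,
              "RSS:", proc_status(pid, "VmRSS"),
              "allowed CPUs:", proc_status(pid, "Cpus_allowed_list"))
        # Pin the process to one logical CPU (what taskset -cp <core> <pid> does);
        # map logical CPU numbers to physical cores as needed on a hyper-threaded chip.
        os.sched_setaffinity(pid, {core})

If memory and affinity look fine, one way to test the cache-miss hypothesis from the last comment is to run one solve in each mode under perf stat -e cache-references,cache-misses: all six processes share the same L3 cache and memory controller, so contention there would not show up as high memory use or CPU migration.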

1 Answer

It's very easy to make many threads perform worse than a single thread. The key to successful multi-threading and speedup is to understand not just that the program is multi-threaded, but exactly how your threads interact. Here are a few questions you should ask yourself as you review your code:

1) Do the individual threads share resources? If so, what are those resources, and does accessing them block other threads?

2) What's the slowest resource your multi-threaded code relies on? A common (and often neglected) bottleneck is disk IO. Multiple threads can process data much faster, but they won't make a disk read any faster, and in many cases multithreading can make it much worse (e.g. thrashing).

3) Is access to common resources properly synchronized?

To this end, and without knowing more about your problem, I'd recommend:

a) Not reading different files from different threads. You want to keep your disk IO as sequential as possible, and this is easier from a single thread. Perhaps batch-read the files from a single thread and then farm them out for processing.

b) Keeping your threads as autonomous as possible - any communication back and forth will cause contention and slow things down.
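
A minimal sketch of recommendation (a) - reading the inputs from a single process and then farming the CPU-bound work out to a pool of workers - could look like this (Python, with placeholder file names and a placeholder solve step):

    from multiprocessing import Pool
    from pathlib import Path

    # Placeholder input files; in the question these would be the per-instance data files.
    DATA_FILES = [Path(f"instance{i}.dat") for i in range(1, 7)]

    def read_all(files):
        """Read every input file from a single process so disk IO stays sequential."""
        return [(f.name, f.read_text()) for f in files]

    def solve(item):
        """Placeholder for the CPU-bound solve; it works only on data already in memory."""
        name, data = item
        # ... build and solve the model here ...
        return name, len(data)  # stand-in for a real result

    if __name__ == "__main__":
        batches = read_all(DATA_FILES)          # sequential IO, one reader
        with Pool(processes=6) as pool:         # one worker per core
            results = pool.map(solve, batches)  # CPU-bound work farmed out
        print(results)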

answered Nov 14 '18 at 17:04 by Andy