I don't know what is wrong with this loop parallelization

I do not know why, but the parallelization of this loop is not working correctly:

#pragma omp parallel for private(d)
for ( i = 0 ; i < nt1 ; i++ ) {
    d = distancia(tabla1[i], tabla2[j]);
    if ( d < dm ) {
        #pragma omp critical
        if ( d < dm ) {
            dm = d;
            im = i;
        }
    }
}

ps[j] = im;


The d variable is private, and dm and im are updated inside a critical zone, but when I run this code (the full code is bigger, this is just a small part of it) the output differs from the sequential version.
Thanks!

c parallel-processing openmp

asked Nov 13 '18 at 14:56 by edoelas

  • Please provide a Minimal, Complete, and Verifiable example with a clear description of expected and actual output. Note: you must not use a double-checked-locking pattern like this. It is incorrect to even read dm unprotected. It is, however, unlikely that this is the only problem in your code. – Zulan, Nov 13 '18 at 15:15

  • This looks like it might appear inside a loop over variable j. If so, then consider parallelizing the outer loop instead of the inner. Very possibly, you will that way be able to do without a critical section, which would be an enormous win. – John Bollinger, Nov 13 '18 at 19:28
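
A minimal sketch of the outer-loop approach suggested in the comment above, assuming the code sits inside a j loop over nt2 (as in the self-answer below); the float types and the MAX_LON + 1 bound are assumptions borrowed from the rest of the thread:

/* Sketch only: parallelize the outer j loop so each thread owns whole
   j iterations; dm and im then become per-iteration locals and no
   critical section is needed. Assumes distancia() is thread-safe and
   returns values <= MAX_LON. */
#pragma omp parallel for
for ( int j = 0 ; j < nt2 ; j++ ) {
    float dm = MAX_LON + 1;    /* local best distance for this j */
    int   im = -1;             /* index of that best distance */
    for ( int i = 0 ; i < nt1 ; i++ ) {
        float d = distancia(tabla1[i], tabla2[j]);
        if ( d < dm ) {
            dm = d;
            im = i;
        }
    }
    ps[j] = im;                /* each thread writes a distinct ps[j] */
}
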
2 Answers

The first if ( d < dm ) is outside of the critical section and can be the cause of a race condition. It should be removed, since the same check is already made inside the critical section. If your purpose is to avoid entering the critical part too often, this is not the way to do it (see below). You may also have to manually flush dm (see the OpenMP documentation on #pragma omp flush).



The distancia function may also not be thread-safe (i.e. it may rely on some global state that cannot safely be modified by several threads concurrently).



Also, it is normal that the results differ if some of the distance values are equal. Since there is no rule to specifically handle that case (e.g. taking the smallest im), the index of the first minimum value encountered is kept: it may not be the same in the serial and parallel versions, as the iterations are not executed in the same order.



Unless the distancia function is CPU-intensive, your code is likely to run slower than the serial version for several reasons, notably the synchronization overhead of the critical section and the cache-sharing issue with the variable dm.
It is better to use a reduction approach, where each thread computes the minimum over the iterations it was given to process, and the global minimum is taken as the smallest of the per-thread local minima.



float dm = INFINITY;
int im = -1;

#pragma omp parallel shared(dm,im)
{
    // Declare private variables
    float d, dmin_thread;
    int imin_thread;            // index of the minimum, so an int
    dmin_thread = INFINITY;
    imin_thread = -1;

    // loop through i
    #pragma omp for
    for ( i = 0 ; i < nt1 ; i++ ) {
        d = some_function(i);
        // the following lines operate on thread-private variables
        if ( d < dmin_thread ) {
            dmin_thread = d;
            imin_thread = i;
        }
    } // end for

    // Reduction part after the loop has been completed
    // The critical section is entered only once by each thread
    #pragma omp critical
    {
        if ( dmin_thread < dm ) {
            dm = dmin_thread;
            im = imin_thread;
        }
    } // end critical
} // end parallel

if ( im == -1 )
    throw_some_error();
else
    do_something_with(im);





answered Nov 14 '18 at 10:29 by Brice (edited Nov 14 '18 at 10:38)
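
As an alternative to the explicit critical-section reduction in the answer above, OpenMP 4.0 and later can express a min-with-index reduction directly through #pragma omp declare reduction. A minimal sketch under the same assumptions as that answer (some_function and nt1 are its placeholders; the struct and reduction name minloc are illustrative; the caveat about ties between equal distances still applies):

#include <math.h>                       /* INFINITY */

struct minloc { float d; int i; };

/* Combiner: keep the pair with the smaller distance. */
#pragma omp declare reduction(minloc : struct minloc :              \
        omp_out = (omp_in.d < omp_out.d ? omp_in : omp_out))        \
        initializer(omp_priv = { INFINITY, -1 })

struct minloc best = { INFINITY, -1 };

#pragma omp parallel for reduction(minloc : best)
for ( int i = 0 ; i < nt1 ; i++ ) {
    float d = some_function(i);
    if ( d < best.d ) {                 /* best is thread-private here */
        best.d = d;
        best.i = i;
    }
}
/* After the loop, best.d is the global minimum and best.i its index. */
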
I believe the problem was that, even with the critical zone, the sequential version will always keep the lowest i value among equal distances. By adding the condition (dm == d && i < im) to the conditionals, we make sure the code still behaves the same way. I'm not sure if this is the best solution, but it's pretty simple and it worked for me.
Here is the full code with the changes:



for ( j = 0 ; j < nt2 ; j++ ) {
    dm = MAX_LON + 1;
    #pragma omp parallel for private(d)
    for ( i = 0 ; i < nt1 ; i++ ) {
        d = distancia(tabla1[i], tabla2[j]);
        if ( d < dm || (dm == d && i < im) ) {
            #pragma omp critical
            if ( d < dm || (dm == d && i < im) ) {
                dm = d;
                im = i;
            }
        }
    }
    ps[j] = im;
}






answered Nov 16 '18 at 22:35 by edoelas (edited Nov 16 '18 at 22:41)