I don't know what is wrong with this loop parallelization
I do not know why, but the parallelization of this loop is not working correctly:

#pragma omp parallel for private(d)
for ( i = 0 ; i < nt1 ; i++ ) {
    d = distancia(tabla1[i], tabla2[j]);
    if ( d < dm ) {
        #pragma omp critical
        {
            if ( d < dm ) {
                dm = d;
                im = i;
            }
        }
    }
}
ps[j] = im;

The d variable is private, and dm and im are updated inside a critical section, but when I run this code (the full program is bigger; this is just a small part of it) the output differs from the sequential version.

Thanks!
c parallel-processing openmp
Please provide a Minimal, Complete, and Verifiable example with a clear description of expected and actual output. Note: you must not use a double-checked-locking pattern like this. It is incorrect to even read dm unprotected. It is, however, unlikely that this is the only problem in your code. – Zulan, Nov 13 '18 at 15:15
This looks like it might appear inside a loop over variable j. If so, then consider parallelizing the outer loop instead of the inner. Very possibly, you will that way be able to do without a critical section, which would be an enormous win. – John Bollinger, Nov 13 '18 at 19:28
asked Nov 13 '18 at 14:56 – edoelas
2 Answers
The first if ( d < dm ) is outside of the critical section, and reading the shared dm unprotected there is a race condition. It should be removed, since the same test is already made inside the critical section. If your purpose is to avoid entering the critical section too often, this is not the way to do it (see below). You may also have to manually flush dm (see the explanations of #pragma omp flush in the OpenMP specification).

The distancia function may also not be thread-safe (i.e. it may rely on some global state that cannot be edited concurrently by several threads).

Also, it is normal for the results to differ when some distance values are equal. Since there is no rule for treating that case specifically (e.g. take the smallest im), the index of the first minimum value encountered is kept: it may not be the same in the serial and parallel versions, because the iterations are not executed in the same order.

Unless the distancia function is CPU-intensive, this code is likely to run slower than the serial version for several reasons, notably the synchronization overhead of the critical section and cache-line sharing on the variable dm.

It is better to use a reduction approach, where each thread computes the minimum over the iterations it was given, and the global minimum is taken as the smallest of the per-thread minima:
float dm = INFINITY;
int im = -1;
#pragma omp parallel shared(dm, im)
{
    // Thread-private variables
    float d, dmin_thread = INFINITY;
    int imin_thread = -1;

    // Loop through i; iterations are split among the threads
    #pragma omp for
    for ( i = 0 ; i < nt1 ; i++ ) {
        d = some_function(i);
        // The following lines operate on thread-private variables only
        if ( d < dmin_thread ) {
            dmin_thread = d;
            imin_thread = i;
        }
    } // end for

    // Reduction part, after the loop has completed:
    // the critical section is entered only once per thread
    #pragma omp critical
    {
        if ( dmin_thread < dm ) {
            dm = dmin_thread;
            im = imin_thread;
        }
    } // end critical
} // end parallel

if ( im == -1 )
    throw_some_error();
else
    do_something_with(im);
I believe the problem was that, even with the critical section, the sequential version will always keep the lowest i value among equal distances. Adding the condition (dm == d && i < im) to the conditionals makes sure the parallel code behaves the same way. I'm not sure this is the best solution, but it's pretty simple and it worked for me.
Here is the full code with the changes:

for ( j = 0 ; j < nt2 ; j++ ) {
    dm = MAX_LON + 1;
    #pragma omp parallel for private(d)
    for ( i = 0 ; i < nt1 ; i++ ) {
        d = distancia(tabla1[i], tabla2[j]);
        if ( d < dm || (dm == d && i < im) ) {
            #pragma omp critical
            {
                if ( d < dm || (dm == d && i < im) ) {
                    dm = d;
                    im = i;
                }
            }
        }
    }
    ps[j] = im;
}
answered Nov 14 '18 at 10:29, edited Nov 14 '18 at 10:38 – Brice
answered Nov 16 '18 at 22:35, edited Nov 16 '18 at 22:41 – edoelas