Parallel Addition of Vectors using RcppParallel
I am trying to parallelise the addition of (large) vectors using RcppParallel. That's what I've come up with.
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <Rcpp.h>
#include <assert.h>
using namespace RcppParallel;
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector directVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    results = first + second;
    return results;
}

// [[Rcpp::export]]
NumericVector loopVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    for(unsigned i = 0; i != first.length(); i++)
        results[i] = first[i] + second[i];
    return results;
}

struct VectorAddition : public Worker {
    const RVector<double> first, second;
    RVector<double> results;
    VectorAddition(const NumericVector one, const NumericVector two, NumericVector three) : first(one), second(two), results(three) {}
    void operator()(std::size_t a1, std::size_t a2) {
        std::transform(first.begin() + a1, first.begin() + a2,
                       second.begin() + a1,
                       results.begin() + a1,
                       [](double i, double j) { return i + j; });
    }
};

// [[Rcpp::export]]
NumericVector parallelVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    VectorAddition myVectorAddition(first, second, results);
    parallelFor(0, first.length(), myVectorAddition);
    return results;
}
It seems to work, but it doesn't speed things up (at least not on a 4-core machine).
> v1 <- 1:1000000
> v2 <- 1000000:1
> all(directVectorAddition(v1, v2) == loopVectorAddition(v1, v2))
[1] TRUE
> all(directVectorAddition(v1, v2) == parallelVectorAddition(v1, v2))
[1] TRUE
> result <- benchmark(v1 + v2, directVectorAddition(v1, v2), loopVectorAddition(v1, v2), parallelVectorAddition(v1, v2), order="relative")
> result[,1:4]
                            test replications elapsed relative
1                        v1 + v2          100   0.206    1.000
4 parallelVectorAddition(v1, v2)          100   0.993    4.820
2   directVectorAddition(v1, v2)          100   1.015    4.927
3     loopVectorAddition(v1, v2)          100   1.056    5.126
Can this be implemented more efficiently?
Thanks a lot in advance,
mce
vector rcpp rcppparallel
edited Nov 9 at 21:36
asked Nov 9 at 21:31
mce
1 Answer
Rookie mistake :) Your functions take Rcpp::NumericVector arguments, but you create the data with the sequence operator, which produces integer values. So every call first has to copy its input into a new double vector!
Make it
v1 <- as.double(1:1000000)
v2 <- as.double(1000000:1)
instead, and on a machine with lots of cores (at work) I then see
R> result[,1:4]
                            test replications elapsed relative
4 parallelVectorAddition(v1, v2)          100   0.301    1.000
2   directVectorAddition(v1, v2)          100   0.424    1.409
1                        v1 + v2          100   0.436    1.449
3     loopVectorAddition(v1, v2)          100   0.736    2.445
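To make the copy concrete, here is a rough plain-C++ analogue of what happens when integer data reaches a function that wants doubles. This is only a sketch: it uses std::vector instead of Rcpp types, and the names coerceToDouble, addVectors and addIntegerInput are made up for illustration. The point is that the conversion is a full pass over the data plus a fresh allocation, and it happens once per argument before any addition starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the coercion that takes place when an R integer
// vector is passed to a NumericVector parameter: a new double buffer is
// allocated and every element is converted.
std::vector<double> coerceToDouble(const std::vector<int>& x) {
    return std::vector<double>(x.begin(), x.end());
}

// Element-wise addition of two equally long double vectors.
std::vector<double> addVectors(const std::vector<double>& a,
                               const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Integer input: two extra O(n) copies are made before the addition runs.
std::vector<double> addIntegerInput(const std::vector<int>& a,
                                    const std::vector<int>& b) {
    return addVectors(coerceToDouble(a), coerceToDouble(b));
}
```

With double input only addVectors runs, which is why switching to as.double() changes the benchmark so much.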
The example is still not that impressive because the operation itself is cheap, whereas the parallel approach needs to allocate memory, copy data to workers, collect the results again, and so on.
But the good news is that you wrote your parallel code correctly. Not a small task.
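For anyone wondering what the worker contract looks like outside RcppParallel, below is a minimal sketch in plain C++. It is not how RcppParallel is implemented: AddRange and forEachChunk are hypothetical names, and the driver runs the chunks one after another instead of on threads. It only shows the idea that the worker is handed a half-open index range and must touch nothing outside it, which is what makes the chunks safe to run concurrently. The grain argument here is an assumption modelled loosely on the optional grain size that parallelFor accepts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker with the same shape as VectorAddition above:
// operator() handles only the half-open index range [begin, end).
struct AddRange {
    const std::vector<double>& first;
    const std::vector<double>& second;
    std::vector<double>& results;

    void operator()(std::size_t begin, std::size_t end) const {
        std::transform(first.begin() + begin, first.begin() + end,
                       second.begin() + begin,
                       results.begin() + begin,
                       [](double i, double j) { return i + j; });
    }
};

// Sequential stand-in for a parallel scheduler: splits [0, n) into chunks of
// at most `grain` elements and hands each chunk to the worker. A real
// scheduler would dispatch the chunks to different threads.
// Returns the number of chunks that were processed.
template <typename Worker>
std::size_t forEachChunk(std::size_t n, std::size_t grain, const Worker& worker) {
    assert(grain > 0);
    std::size_t chunks = 0;
    for (std::size_t begin = 0; begin < n; begin += grain) {
        worker(begin, std::min(begin + grain, n));
        ++chunks;
    }
    return chunks;
}
```

Each chunk costs a function call and some bookkeeping, so for an operation as cheap as one addition per element, very small chunks can easily cost more than they save.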
answered Nov 9 at 21:41
Dirk Eddelbuettel
OK! Thanks a lot for the quick reply!
– mce
Nov 9 at 21:46