Parallel Addition of Vectors using RcppParallel

I am trying to parallelise the addition of large vectors using RcppParallel. This is what I've come up with:

// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <Rcpp.h>
#include <algorithm>   // std::transform
#include <cassert>
using namespace RcppParallel;
using namespace Rcpp;

// Vectorised addition via Rcpp sugar
// [[Rcpp::export]]
NumericVector directVectorAddition(NumericVector first, NumericVector second) {
  assert(first.length() == second.length());
  NumericVector results(first.length());
  results = first + second;
  return results;
}

// Plain element-by-element loop
// [[Rcpp::export]]
NumericVector loopVectorAddition(NumericVector first, NumericVector second) {
  assert(first.length() == second.length());
  NumericVector results(first.length());
  for (R_xlen_t i = 0; i != first.length(); i++)
    results[i] = first[i] + second[i];
  return results;
}

// Worker for parallelFor: each call handles the index range [a1, a2)
struct VectorAddition : public Worker {
  // RVector: thread-safe accessor wrapping the R vector's memory (no copy)
  const RVector<double> first, second;
  RVector<double> results;

  VectorAddition(const NumericVector one, const NumericVector two, NumericVector three)
    : first(one), second(two), results(three) {}

  void operator()(std::size_t a1, std::size_t a2) {
    std::transform(first.begin() + a1, first.begin() + a2,
                   second.begin() + a1,
                   results.begin() + a1,
                   [](double i, double j) { return i + j; });
  }
};

// [[Rcpp::export]]
NumericVector parallelVectorAddition(NumericVector first, NumericVector second) {
  assert(first.length() == second.length());
  NumericVector results(first.length());
  VectorAddition myVectorAddition(first, second, results);
  parallelFor(0, first.length(), myVectorAddition);
  return results;
}

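The body of operator() relies on the binary overload of std::transform over the half-open index range [a1, a2) that parallelFor hands to each call. As a standalone sketch of just that step (plain C++, with std::vector<double> standing in for RVector<double>; the function name add_range is hypothetical and not part of RcppParallel), the following writes only the requested slice of the output:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VectorAddition::operator(): for every i in the
// half-open range [a1, a2), writes first[i] + second[i] into results[i].
// Elements outside [a1, a2) are left untouched.
void add_range(const std::vector<double>& first,
               const std::vector<double>& second,
               std::vector<double>& results,
               std::size_t a1, std::size_t a2) {
    assert(a1 <= a2 && a2 <= first.size());
    assert(second.size() == first.size() && results.size() == first.size());
    const auto b = static_cast<std::ptrdiff_t>(a1);
    const auto e = static_cast<std::ptrdiff_t>(a2);
    std::transform(first.begin() + b, first.begin() + e,  // first input range
                   second.begin() + b,                     // start of second input
                   results.begin() + b,                    // start of output
                   [](double x, double y) { return x + y; });
}
```

For example, with first = {1, 2, 3, 4, 5}, second = {5, 4, 3, 2, 1} and an all-zero results, add_range(first, second, results, 1, 3) leaves results as {0, 6, 6, 0, 0}. Because each call touches only its own slice, calls on disjoint ranges can run on different threads without interfering.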
It seems to work, but it doesn't speed things up (at least not on a 4-core machine):

> v1 <- 1:1000000
> v2 <- 1000000:1
> all(directVectorAddition(v1, v2) == loopVectorAddition(v1, v2))
[1] TRUE
> all(directVectorAddition(v1, v2) == parallelVectorAddition(v1, v2))
[1] TRUE
> result <- benchmark(v1 + v2, directVectorAddition(v1, v2), loopVectorAddition(v1, v2), parallelVectorAddition(v1, v2), order="relative")
> result[,1:4]
                            test replications elapsed relative
1                        v1 + v2          100   0.206    1.000
4 parallelVectorAddition(v1, v2)          100   0.993    4.820
2   directVectorAddition(v1, v2)          100   1.015    4.927
3     loopVectorAddition(v1, v2)          100   1.056    5.126

Can this be implemented more efficiently?

Thanks a lot in advance,

mce

Tags: vector, rcpp, rcppparallel

edited Nov 9 at 21:36
asked Nov 9 at 21:31 by mce

Accepted answer

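Rookie mistake :) Your functions take Rcpp::NumericVector (i.e. double) arguments, but your test data is created with the sequence operator (:), which yields integer vectors. So every call first has to coerce both inputs to double, which forces a copy onto all of your functions! That is also why plain v1 + v2 looked so fast: R adds the two integer vectors directly, without any conversion.
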
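The cost described above can be pictured with a plain-C++ analogy (a sketch only, not Rcpp's actual code path; as_double and add_as_double are hypothetical names). When an R integer vector is passed where an Rcpp::NumericVector is expected, Rcpp has to coerce it into a newly allocated double vector. In the sketch, double input is used in place, while integer input is converted element by element into a fresh buffer on every call:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical analogy for passing an R integer vector where an
// Rcpp::NumericVector (double) is expected.

// Already double: the data is used as-is, no copy is made.
const std::vector<double>& as_double(const std::vector<double>& x) {
    return x;
}

// Integer input: a new double buffer is allocated and every element converted.
std::vector<double> as_double(const std::vector<int>& x) {
    return std::vector<double>(x.begin(), x.end());
}

// Adds two vectors as doubles; for int input, the conversion (allocation plus
// a full pass over the data) happens on every call, before any addition.
template <typename T>
std::vector<double> add_as_double(const std::vector<T>& first,
                                  const std::vector<T>& second) {
    assert(first.size() == second.size());
    const auto& a = as_double(first);   // binds to the original or to a temporary copy
    const auto& b = as_double(second);
    std::vector<double> results(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) results[i] = a[i] + b[i];
    return results;
}
```

Here as_double(d) returns a reference to d itself, whereas as_double(n) for a std::vector<int> n allocates and fills a new std::vector<double>. Converting once up front with as.double() in R moves that cost out of the timed calls.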
Make it

v1 <- as.double(1:1000000)
v2 <- as.double(1000000:1)

instead, and on a machine with lots of cores (at work) I then see:

R> result[,1:4]
                            test replications elapsed relative
4 parallelVectorAddition(v1, v2)          100   0.301    1.000
2   directVectorAddition(v1, v2)          100   0.424    1.409
1                        v1 + v2          100   0.436    1.449
3     loopVectorAddition(v1, v2)          100   0.736    2.445

The example is still not that impressive, because the operation itself is cheap, whereas the parallel approach has its own overhead: allocating memory, handing the data to the workers, collecting the results again, and so on.

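One common way to keep that overhead in check is to split the work only when each chunk is large enough to be worth a thread. RcppParallel's parallelFor accepts a grain-size argument for a similar purpose; the sketch below is a plain-C++ illustration with std::thread, not RcppParallel's implementation, and the names parallel_add, min_chunk and max_threads are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: element-wise addition split across std::threads.
// Each thread writes to its own disjoint half-open range [begin, end) of
// `results`, so no locking is needed. If the input is too small to give every
// thread at least `min_chunk` elements, fewer threads are used (down to a
// purely serial loop), since thread start-up would outweigh the cheap addition.
// Returns the number of chunks actually used.
std::size_t parallel_add(const std::vector<double>& first,
                         const std::vector<double>& second,
                         std::vector<double>& results,
                         std::size_t min_chunk,
                         std::size_t max_threads) {
    assert(first.size() == second.size() && first.size() == results.size());
    assert(min_chunk > 0 && max_threads > 0);
    const std::size_t n = first.size();

    // At most max_threads chunks, each with at least min_chunk elements.
    const std::size_t chunks =
        std::min(max_threads, std::max<std::size_t>(1, n / min_chunk));

    auto work = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) results[i] = first[i] + second[i];
    };

    if (chunks == 1) {  // serial fallback: no threads are started
        work(0, n);
        return 1;
    }

    std::vector<std::thread> threads;
    threads.reserve(chunks);
    const std::size_t step = n / chunks, rest = n % chunks;
    std::size_t begin = 0;
    for (std::size_t c = 0; c < chunks; ++c) {
        // The first `rest` chunks get one extra element so all n are covered.
        const std::size_t end = begin + step + (c < rest ? 1 : 0);
        threads.emplace_back(work, begin, end);
        begin = end;
    }
    for (auto& t : threads) t.join();
    return chunks;
}
```

With 1,000 elements, min_chunk = 100 and max_threads = 4, the work is split into four chunks; with min_chunk = 5000 it stays serial. Choosing the threshold is a tuning question that depends on the machine and on how expensive the per-element operation is; for something as cheap as a single addition it is likely to be large.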

But the good news is that you wrote your parallel code correctly, which is not a small task.

answered Nov 9 at 21:41 by Dirk Eddelbuettel

• OK! Thanks a lot for the quick reply! – mce, Nov 9 at 21:46