select highest correlated pairs from a matrix SAS










0















I have a dataset like this



data have;
do i = 1 to 1000;
y = ranuni(0);
x1 = y ** 2;
x2 = x1 ** 3;
x3 = x2 - x1/2;
output;
end;
run;


I build a correlation matrix like this:



proc corr
data = have
out = correlation_matrix
(where = (_TYPE_ = "CORR"))
noprint;
run;


I've attempted to think out loud some code which would achieve something similar to what I'm looking for, the syntax nor the logic is correct but I'm having a go at describing what I'm looking for



proc sort
data = correlation_matrix
by _NAME_;
run;

data _temp;
set correlation_matrix;
array col[*] _numeric_;

by _NAME_;

do i = 1 to dim(col);
if col(i) > 0.6 then do;
%let list = append(vname(col));
end;
run;


And from the correlation matrix, I'm looking for a way to return pairs which have a correlation of 60% or is above some threshold, the pairs I will then use to build scatter plot / histogram matrices like this



proc contents;
data = high_correlation_pairs
out = contents
noprint;
run;

proc sort
data = contents
nodupkey;
by name;
run;

proc sql noprint;
select name INTO: highly_correlated_pairs
separated by " "
from contents
;
quit;

ODS GRAPHICS /
IMAGEMAP=OFF;
OPTIONS VALIDVARNAME=ANY;
PROC SGSCATTER
DATA=have;
TITLE "Scatter Plot Matrix";
FOOTNOTE;
MATRIX &highly_correlated_pairs
/
DIAGONAL=(HISTOGRAM )
START=TOPLEFT
NOLEGEND
;
RUN;
TITLE; FOOTNOTE;


I'm just unsure how to select variables from the matrix which have a pair of over 60% correlation, it could even by a by NAME return the columns with corr over 60%










share|improve this question


























    0















    I have a dataset like this



    data have;
    do i = 1 to 1000;
    y = ranuni(0);
    x1 = y ** 2;
    x2 = x1 ** 3;
    x3 = x2 - x1/2;
    output;
    end;
    run;


    I build a correlation matrix like this:



    proc corr
    data = have
    out = correlation_matrix
    (where = (_TYPE_ = "CORR"))
    noprint;
    run;


    I've attempted to think out loud some code which would achieve something similar to what I'm looking for, the syntax nor the logic is correct but I'm having a go at describing what I'm looking for



    proc sort
    data = correlation_matrix
    by _NAME_;
    run;

    data _temp;
    set correlation_matrix;
    array col[*] _numeric_;

    by _NAME_;

    do i = 1 to dim(col);
    if col(i) > 0.6 then do;
    %let list = append(vname(col));
    end;
    run;


    And from the correlation matrix, I'm looking for a way to return pairs which have a correlation of 60% or is above some threshold, the pairs I will then use to build scatter plot / histogram matrices like this



    proc contents;
    data = high_correlation_pairs
    out = contents
    noprint;
    run;

    proc sort
    data = contents
    nodupkey;
    by name;
    run;

    proc sql noprint;
    select name INTO: highly_correlated_pairs
    separated by " "
    from contents
    ;
    quit;

    ODS GRAPHICS /
    IMAGEMAP=OFF;
    OPTIONS VALIDVARNAME=ANY;
    PROC SGSCATTER
    DATA=have;
    TITLE "Scatter Plot Matrix";
    FOOTNOTE;
    MATRIX &highly_correlated_pairs
    /
    DIAGONAL=(HISTOGRAM )
    START=TOPLEFT
    NOLEGEND
    ;
    RUN;
    TITLE; FOOTNOTE;


    I'm just unsure how to select variables from the matrix which have a pair of over 60% correlation, it could even by a by NAME return the columns with corr over 60%










    share|improve this question
























      0












      0








      0








      I have a dataset like this



      data have;
      do i = 1 to 1000;
      y = ranuni(0);
      x1 = y ** 2;
      x2 = x1 ** 3;
      x3 = x2 - x1/2;
      output;
      end;
      run;


      I build a correlation matrix like this:



      proc corr
      data = have
      out = correlation_matrix
      (where = (_TYPE_ = "CORR"))
      noprint;
      run;


      I've attempted to think out loud some code which would achieve something similar to what I'm looking for, the syntax nor the logic is correct but I'm having a go at describing what I'm looking for



      proc sort
      data = correlation_matrix
      by _NAME_;
      run;

      data _temp;
      set correlation_matrix;
      array col[*] _numeric_;

      by _NAME_;

      do i = 1 to dim(col);
      if col(i) > 0.6 then do;
      %let list = append(vname(col));
      end;
      run;


      And from the correlation matrix, I'm looking for a way to return pairs which have a correlation of 60% or is above some threshold, the pairs I will then use to build scatter plot / histogram matrices like this



      proc contents;
      data = high_correlation_pairs
      out = contents
      noprint;
      run;

      proc sort
      data = contents
      nodupkey;
      by name;
      run;

      proc sql noprint;
      select name INTO: highly_correlated_pairs
      separated by " "
      from contents
      ;
      quit;

      ODS GRAPHICS /
      IMAGEMAP=OFF;
      OPTIONS VALIDVARNAME=ANY;
      PROC SGSCATTER
      DATA=have;
      TITLE "Scatter Plot Matrix";
      FOOTNOTE;
      MATRIX &highly_correlated_pairs
      /
      DIAGONAL=(HISTOGRAM )
      START=TOPLEFT
      NOLEGEND
      ;
      RUN;
      TITLE; FOOTNOTE;


      I'm just unsure how to select variables from the matrix which have a pair of over 60% correlation, it could even by a by NAME return the columns with corr over 60%










      share|improve this question














      I have a dataset like this



      data have;
      do i = 1 to 1000;
      y = ranuni(0);
      x1 = y ** 2;
      x2 = x1 ** 3;
      x3 = x2 - x1/2;
      output;
      end;
      run;


      I build a correlation matrix like this:



      proc corr
      data = have
      out = correlation_matrix
      (where = (_TYPE_ = "CORR"))
      noprint;
      run;


      I've attempted to think out loud some code which would achieve something similar to what I'm looking for, the syntax nor the logic is correct but I'm having a go at describing what I'm looking for



      proc sort
      data = correlation_matrix
      by _NAME_;
      run;

      data _temp;
      set correlation_matrix;
      array col[*] _numeric_;

      by _NAME_;

      do i = 1 to dim(col);
      if col(i) > 0.6 then do;
      %let list = append(vname(col));
      end;
      run;


      And from the correlation matrix, I'm looking for a way to return pairs which have a correlation of 60% or is above some threshold, the pairs I will then use to build scatter plot / histogram matrices like this



      proc contents;
      data = high_correlation_pairs
      out = contents
      noprint;
      run;

      proc sort
      data = contents
      nodupkey;
      by name;
      run;

      proc sql noprint;
      select name INTO: highly_correlated_pairs
      separated by " "
      from contents
      ;
      quit;

      ODS GRAPHICS /
      IMAGEMAP=OFF;
      OPTIONS VALIDVARNAME=ANY;
      PROC SGSCATTER
      DATA=have;
      TITLE "Scatter Plot Matrix";
      FOOTNOTE;
      MATRIX &highly_correlated_pairs
      /
      DIAGONAL=(HISTOGRAM )
      START=TOPLEFT
      NOLEGEND
      ;
      RUN;
      TITLE; FOOTNOTE;


      I'm just unsure how to select variables from the matrix which have a pair of over 60% correlation, it could even by a by NAME return the columns with corr over 60%







      arrays select macros sas select-into






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 13 '18 at 11:22









      7828221978282219

      124




      124






















          2 Answers
          2






          active

          oldest

          votes


















          0














          You can get the pairs like this - the key is the vname function, which returns the name of an array element:



          data high_corrs;
          set correlation_matrix;
          array coefs i--x3;
          length var1 var2 $32.;
          do j = 1 to dim(coefs);
          corr = coefs(j);
          if _n_ < j and corr > 0.6 then do;
          var1 = vname(coefs(_n_));
          var2 = vname(coefs(j));
          output;
          end;
          end;
          keep var1 var2 corr;
          run;


          Perhaps from there you can work out the rest?






          share|improve this answer























          • thanks, I'll get looking at this

            – 78282219
            Nov 14 '18 at 13:46


















          0














          Edit: to include complete answer:



          PROC TRANSPOSE is used to transpose the correlation matric to x,y pairs and subset to correlations of interest. A macro variable is created to use in PROC SGSCATTER.



          NOTE: PLOTREQUESTS=x1*x2 x1*y x2*x3 x2*y



          data have;
          do i = 1 to 1000;
          y = ranuni(0);
          x1 = y ** 2;
          x2 = x1 ** 3;
          x3 = x2 - x1/2;
          output;
          end;
          run;
          proc corr data=have out=corr noprint;
          run;
          proc transpose name=with data=corr out=pair(where=(.6 le abs(col1) lt 1));
          where _type_ eq 'CORR';
          by _name_ notsorted;
          run;
          data pairV / view=pairv;
          set pair;
          call sortc(_name_,with);
          run;
          proc sort data=pairv out=pair2 nodupkey;
          by _name_ with;
          run;
          proc sql noprint;
          select catx('*',_name_,with) into :plotrequests separated by ' ' from pair2;
          quit;
          %put NOTE: &=plotrequests;
          proc sgscatter data=have;
          plot &plotrequests;
          run;
          quit;


          enter image description here






          share|improve this answer

























          • Thanks, I'll start building on this

            – 78282219
            Nov 14 '18 at 13:46










          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53279943%2fselect-highest-correlated-pairs-from-a-matrix-sas%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0














          You can get the pairs like this - the key is the vname function, which returns the name of an array element:



          data high_corrs;
          set correlation_matrix;
          array coefs i--x3;
          length var1 var2 $32.;
          do j = 1 to dim(coefs);
          corr = coefs(j);
          if _n_ < j and corr > 0.6 then do;
          var1 = vname(coefs(_n_));
          var2 = vname(coefs(j));
          output;
          end;
          end;
          keep var1 var2 corr;
          run;


          Perhaps from there you can work out the rest?






          share|improve this answer























          • thanks, I'll get looking at this

            – 78282219
            Nov 14 '18 at 13:46















          0














          You can get the pairs like this - the key is the vname function, which returns the name of an array element:



          data high_corrs;
          set correlation_matrix;
          array coefs i--x3;
          length var1 var2 $32.;
          do j = 1 to dim(coefs);
          corr = coefs(j);
          if _n_ < j and corr > 0.6 then do;
          var1 = vname(coefs(_n_));
          var2 = vname(coefs(j));
          output;
          end;
          end;
          keep var1 var2 corr;
          run;


          Perhaps from there you can work out the rest?






          share|improve this answer























          • thanks, I'll get looking at this

            – 78282219
            Nov 14 '18 at 13:46













          0












          0








          0







          You can get the pairs like this - the key is the vname function, which returns the name of an array element:



          data high_corrs;
          set correlation_matrix;
          array coefs i--x3;
          length var1 var2 $32.;
          do j = 1 to dim(coefs);
          corr = coefs(j);
          if _n_ < j and corr > 0.6 then do;
          var1 = vname(coefs(_n_));
          var2 = vname(coefs(j));
          output;
          end;
          end;
          keep var1 var2 corr;
          run;


          Perhaps from there you can work out the rest?






          share|improve this answer













          You can get the pairs like this - the key is the vname function, which returns the name of an array element:



          data high_corrs;
          set correlation_matrix;
          array coefs i--x3;
          length var1 var2 $32.;
          do j = 1 to dim(coefs);
          corr = coefs(j);
          if _n_ < j and corr > 0.6 then do;
          var1 = vname(coefs(_n_));
          var2 = vname(coefs(j));
          output;
          end;
          end;
          keep var1 var2 corr;
          run;


          Perhaps from there you can work out the rest?







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 13 '18 at 14:10









          user667489user667489

          7,96721227




          7,96721227












          • thanks, I'll get looking at this

            – 78282219
            Nov 14 '18 at 13:46

















          • thanks, I'll get looking at this

            – 78282219
            Nov 14 '18 at 13:46
















          thanks, I'll get looking at this

          – 78282219
          Nov 14 '18 at 13:46





          thanks, I'll get looking at this

          – 78282219
          Nov 14 '18 at 13:46













          0














          Edit: to include complete answer:



          PROC TRANSPOSE is used to transpose the correlation matric to x,y pairs and subset to correlations of interest. A macro variable is created to use in PROC SGSCATTER.



          NOTE: PLOTREQUESTS=x1*x2 x1*y x2*x3 x2*y



          data have;
          do i = 1 to 1000;
          y = ranuni(0);
          x1 = y ** 2;
          x2 = x1 ** 3;
          x3 = x2 - x1/2;
          output;
          end;
          run;
          proc corr data=have out=corr noprint;
          run;
          proc transpose name=with data=corr out=pair(where=(.6 le abs(col1) lt 1));
          where _type_ eq 'CORR';
          by _name_ notsorted;
          run;
          data pairV / view=pairv;
          set pair;
          call sortc(_name_,with);
          run;
          proc sort data=pairv out=pair2 nodupkey;
          by _name_ with;
          run;
          proc sql noprint;
          select catx('*',_name_,with) into :plotrequests separated by ' ' from pair2;
          quit;
          %put NOTE: &=plotrequests;
          proc sgscatter data=have;
          plot &plotrequests;
          run;
          quit;


          enter image description here






          share|improve this answer

























          • Thanks, I'll start building on this

            – 78282219
            Nov 14 '18 at 13:46















          0














          Edit: to include complete answer:



          PROC TRANSPOSE is used to transpose the correlation matric to x,y pairs and subset to correlations of interest. A macro variable is created to use in PROC SGSCATTER.



          NOTE: PLOTREQUESTS=x1*x2 x1*y x2*x3 x2*y



          data have;
          do i = 1 to 1000;
          y = ranuni(0);
          x1 = y ** 2;
          x2 = x1 ** 3;
          x3 = x2 - x1/2;
          output;
          end;
          run;
          proc corr data=have out=corr noprint;
          run;
          proc transpose name=with data=corr out=pair(where=(.6 le abs(col1) lt 1));
          where _type_ eq 'CORR';
          by _name_ notsorted;
          run;
          data pairV / view=pairv;
          set pair;
          call sortc(_name_,with);
          run;
          proc sort data=pairv out=pair2 nodupkey;
          by _name_ with;
          run;
          proc sql noprint;
          select catx('*',_name_,with) into :plotrequests separated by ' ' from pair2;
          quit;
          %put NOTE: &=plotrequests;
          proc sgscatter data=have;
          plot &plotrequests;
          run;
          quit;


          enter image description here






          share|improve this answer

























          • Thanks, I'll start building on this

            – 78282219
            Nov 14 '18 at 13:46













          0












          0








          0







          Edit: to include complete answer:



          PROC TRANSPOSE is used to transpose the correlation matric to x,y pairs and subset to correlations of interest. A macro variable is created to use in PROC SGSCATTER.



          NOTE: PLOTREQUESTS=x1*x2 x1*y x2*x3 x2*y



          data have;
          do i = 1 to 1000;
          y = ranuni(0);
          x1 = y ** 2;
          x2 = x1 ** 3;
          x3 = x2 - x1/2;
          output;
          end;
          run;
          proc corr data=have out=corr noprint;
          run;
          proc transpose name=with data=corr out=pair(where=(.6 le abs(col1) lt 1));
          where _type_ eq 'CORR';
          by _name_ notsorted;
          run;
          data pairV / view=pairv;
          set pair;
          call sortc(_name_,with);
          run;
          proc sort data=pairv out=pair2 nodupkey;
          by _name_ with;
          run;
          proc sql noprint;
          select catx('*',_name_,with) into :plotrequests separated by ' ' from pair2;
          quit;
          %put NOTE: &=plotrequests;
          proc sgscatter data=have;
          plot &plotrequests;
          run;
          quit;


          enter image description here






          share|improve this answer















          Edit: to include complete answer:



          PROC TRANSPOSE is used to transpose the correlation matric to x,y pairs and subset to correlations of interest. A macro variable is created to use in PROC SGSCATTER.



          NOTE: PLOTREQUESTS=x1*x2 x1*y x2*x3 x2*y



          data have;
          do i = 1 to 1000;
          y = ranuni(0);
          x1 = y ** 2;
          x2 = x1 ** 3;
          x3 = x2 - x1/2;
          output;
          end;
          run;
          proc corr data=have out=corr noprint;
          run;
          proc transpose name=with data=corr out=pair(where=(.6 le abs(col1) lt 1));
          where _type_ eq 'CORR';
          by _name_ notsorted;
          run;
          data pairV / view=pairv;
          set pair;
          call sortc(_name_,with);
          run;
          proc sort data=pairv out=pair2 nodupkey;
          by _name_ with;
          run;
          proc sql noprint;
          select catx('*',_name_,with) into :plotrequests separated by ' ' from pair2;
          quit;
          %put NOTE: &=plotrequests;
          proc sgscatter data=have;
          plot &plotrequests;
          run;
          quit;


          enter image description here







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 14 '18 at 18:00

























          answered Nov 13 '18 at 14:11









          data _null_data _null_

          5,342610




          5,342610












          • Thanks, I'll start building on this

            – 78282219
            Nov 14 '18 at 13:46

















          • Thanks, I'll start building on this

            – 78282219
            Nov 14 '18 at 13:46
















          Thanks, I'll start building on this

          – 78282219
          Nov 14 '18 at 13:46





          Thanks, I'll start building on this

          – 78282219
          Nov 14 '18 at 13:46

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53279943%2fselect-highest-correlated-pairs-from-a-matrix-sas%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Use pre created SQLite database for Android project in kotlin

          Darth Vader #20

          Ondo