How to reshape data from long to wide format?









up vote
192
down vote

favorite
70












I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.










share|improve this question



















  • 1




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 3




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 2




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 1




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 1




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13














up vote
192
down vote

favorite
70












I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.










share|improve this question



















  • 1




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 3




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 2




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 1




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 1




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13












up vote
192
down vote

favorite
70









up vote
192
down vote

favorite
70






70





I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.










share|improve this question















I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.







r reshape r-faq






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 22 '17 at 16:25









Taryn

187k45284348




187k45284348










asked May 4 '11 at 22:27









Steve

1,82472428




1,82472428







  • 1




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 3




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 2




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 1




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 1




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13












  • 1




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 3




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 2




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 1




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 1




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13







1




1




possible duplicate of Reshape three column data frame to matrix
– Frank
Oct 8 '13 at 20:53




possible duplicate of Reshape three column data frame to matrix
– Frank
Oct 8 '13 at 20:53




3




3




@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
– smci
Apr 11 '14 at 5:21




@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
– smci
Apr 11 '14 at 5:21




2




2




@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
– Frank
Apr 11 '14 at 16:49




@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
– Frank
Apr 11 '14 at 16:49




1




1




^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
– smci
Apr 11 '14 at 21:03




^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
– smci
Apr 11 '14 at 21:03




1




1




@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
– Frank
Apr 11 '14 at 21:13




@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
– Frank
Apr 11 '14 at 21:13












9 Answers
9






active

oldest

votes

















up vote
186
down vote



accepted










Using reshape function:



reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





share|improve this answer
















  • 12




    +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
    – aL3xa
    May 5 '11 at 0:07







  • 102




    Good luck figuring out the arguments you need though
    – hadley
    May 5 '11 at 1:40







  • 2




    @hadley, yepp... that's true! =)
    – aL3xa
    May 5 '11 at 9:45






  • 2




    reshape is an outstanding example for a horrible function API. It is very close to useless.
    – NoBackingDown
    Oct 26 '17 at 15:18






  • 2




    The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
    – Brian D
    Nov 17 '17 at 17:11


















up vote
103
down vote













The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



library(tidyr)
spread(dat1, key = numbers, value = value)


From github,




tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







share|improve this answer


















  • 2




    Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
    – Jake
    Apr 12 '17 at 13:01

















up vote
63
down vote













You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



library(reshape)
cast(dat1, name ~ numbers)


Or using reshape2



library(reshape2)
dcast(dat1, name ~ numbers)





share|improve this answer






















  • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
    – Steve
    May 4 '11 at 22:45






  • 7




    +1 And use reshape2 for a performance gain.
    – Andrie
    May 6 '11 at 11:56






  • 1




    It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
    – thelatemail
    Jun 21 '17 at 22:37

















up vote
27
down vote













Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



(Reference: Efficient reshaping using data.tables)



library(data.table)

setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")

# name 1 2 3 4
# 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


And, as of data.table v1.9.6 we can cast on multiple columns



## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

# name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
# 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





share|improve this answer


















  • 3




    data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
    – joel.wilson
    Aug 31 '17 at 12:06






  • 3




    I personally think this is the best answer
    – cloudscomputes
    Oct 20 '17 at 4:37

















up vote
22
down vote













Using your example dataframe, we could:



xtabs(value ~ name + numbers, data = dat1)





share|improve this answer


















  • 2




    this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
    – cloudscomputes
    Oct 20 '17 at 4:44

















up vote
14
down vote













Other two options:



Base package:



df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df


sqldf package:



library(sqldf)
sqldf('SELECT name,
MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
FROM dat1
GROUP BY name')





share|improve this answer



























    up vote
    9
    down vote













    Using base R aggregate function:



    aggregate(value ~ name, dat1, I)

    # name value.1 value.2 value.3 value.4
    #1 firstName 0.4145 -0.4747 0.0659 -0.5024
    #2 secondName -0.8259 0.1669 -0.8962 0.1681





    share|improve this answer


















    • 1




      why use the paste function instead of the I (identity function)?
      – Onyambu
      Dec 25 '17 at 4:04

















    up vote
    4
    down vote













    There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




    The whole system is based on two primitives or operators
    cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
    operators have pivot, un-pivot, one-hot encode, transpose, moving
    multiple rows and columns, and many other transforms as simple special
    cases.



    It is easy to write many different operations in terms of the
    cdata primitives. These operators can work-in memory or at big data
    scale (with databases and Apache Spark; for big data use the
    cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
    variants). The transforms are controlled by a control table that
    itself is a diagram of (or picture of) the transform.




    We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



    library(cdata)
    # first build the control table
    pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
    columnToTakeKeysFrom = 'numbers', # this will become column headers
    columnToTakeValuesFrom = 'value', # this contains data
    sep="_") # optional for making column names

    # perform the move of data to columns
    dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
    keyColumns = c('name'), # this(these) column(s) should stay untouched
    controlTable = pivotControlTable# control table above
    )
    dat_wide

    #> name numbers_1 numbers_2 numbers_3 numbers_4
    #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
    #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





    share|improve this answer



























      up vote
      2
      down vote













      The base reshape function works perfectly fine:



      df <- data.frame(
      year = c(rep(2000, 12), rep(2001, 12)),
      month = rep(1:12, 2),
      values = rnorm(24)
      )
      df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
      df_wide


      Where




      • idvar is the column of classes that separates rows


      • timevar is the column of classes to cast wide


      • v.names is the column containing numeric values


      • direction specifies wide or long format

      • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

      If no idvar exists, create one before using the reshape() function:



      df$id <- c(rep("year1", 12), rep("year2", 12))
      df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
      df_wide


      Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






      share|improve this answer






















        Your Answer






        StackExchange.ifUsing("editor", function ()
        StackExchange.using("externalEditor", function ()
        StackExchange.using("snippets", function ()
        StackExchange.snippets.init();
        );
        );
        , "code-snippets");

        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "1"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );













         

        draft saved


        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5890584%2fhow-to-reshape-data-from-long-to-wide-format%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown

























        9 Answers
        9






        active

        oldest

        votes








        9 Answers
        9






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        186
        down vote



        accepted










        Using reshape function:



        reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





        share|improve this answer
















        • 12




          +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
          – aL3xa
          May 5 '11 at 0:07







        • 102




          Good luck figuring out the arguments you need though
          – hadley
          May 5 '11 at 1:40







        • 2




          @hadley, yepp... that's true! =)
          – aL3xa
          May 5 '11 at 9:45






        • 2




          reshape is an outstanding example for a horrible function API. It is very close to useless.
          – NoBackingDown
          Oct 26 '17 at 15:18






        • 2




          The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
          – Brian D
          Nov 17 '17 at 17:11















        up vote
        186
        down vote



        accepted










        Using reshape function:



        reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





        share|improve this answer
















        • 12




          +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
          – aL3xa
          May 5 '11 at 0:07







        • 102




          Good luck figuring out the arguments you need though
          – hadley
          May 5 '11 at 1:40







        • 2




          @hadley, yepp... that's true! =)
          – aL3xa
          May 5 '11 at 9:45






        • 2




          reshape is an outstanding example for a horrible function API. It is very close to useless.
          – NoBackingDown
          Oct 26 '17 at 15:18






        • 2




          The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
          – Brian D
          Nov 17 '17 at 17:11













        up vote
        186
        down vote



        accepted







        up vote
        186
        down vote



        accepted






        Using reshape function:



        reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





        share|improve this answer












        Using reshape function:



        reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered May 4 '11 at 23:20









        Chase

        47.9k12115147




        47.9k12115147







        • 12




          +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
          – aL3xa
          May 5 '11 at 0:07







        • 102




          Good luck figuring out the arguments you need though
          – hadley
          May 5 '11 at 1:40







        • 2




          @hadley, yepp... that's true! =)
          – aL3xa
          May 5 '11 at 9:45






        • 2




          reshape is an outstanding example for a horrible function API. It is very close to useless.
          – NoBackingDown
          Oct 26 '17 at 15:18






        • 2




          The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
          – Brian D
          Nov 17 '17 at 17:11













        • 12




          +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
          – aL3xa
          May 5 '11 at 0:07







        • 102




          Good luck figuring out the arguments you need though
          – hadley
          May 5 '11 at 1:40







        • 2




          @hadley, yepp... that's true! =)
          – aL3xa
          May 5 '11 at 9:45






        • 2




          reshape is an outstanding example for a horrible function API. It is very close to useless.
          – NoBackingDown
          Oct 26 '17 at 15:18






        • 2




          The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
          – Brian D
          Nov 17 '17 at 17:11








        12




        12




        +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
        – aL3xa
        May 5 '11 at 0:07





        +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
        – aL3xa
        May 5 '11 at 0:07





        102




        102




        Good luck figuring out the arguments you need though
        – hadley
        May 5 '11 at 1:40





        Good luck figuring out the arguments you need though
        – hadley
        May 5 '11 at 1:40





        2




        2




        @hadley, yepp... that's true! =)
        – aL3xa
        May 5 '11 at 9:45




        @hadley, yepp... that's true! =)
        – aL3xa
        May 5 '11 at 9:45




        2




        2




        reshape is an outstanding example for a horrible function API. It is very close to useless.
        – NoBackingDown
        Oct 26 '17 at 15:18




        reshape is an outstanding example for a horrible function API. It is very close to useless.
        – NoBackingDown
        Oct 26 '17 at 15:18




        2




        2




        The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
        – Brian D
        Nov 17 '17 at 17:11





        The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
        – Brian D
        Nov 17 '17 at 17:11













        up vote
        103
        down vote













        The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



        library(tidyr)
        spread(dat1, key = numbers, value = value)


        From github,




        tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



        Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







        share|improve this answer


















        • 2




          Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
          – Jake
          Apr 12 '17 at 13:01














        up vote
        103
        down vote













        The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



        library(tidyr)
        spread(dat1, key = numbers, value = value)


        From github,




        tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



        Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







        share|improve this answer


















        • 2




          Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
          – Jake
          Apr 12 '17 at 13:01












        up vote
        103
        down vote










        up vote
        103
        down vote









        The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



        library(tidyr)
        spread(dat1, key = numbers, value = value)


        From github,




        tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



        Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







        share|improve this answer














        The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



        library(tidyr)
        spread(dat1, key = numbers, value = value)


        From github,




        tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



        Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.








        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Jul 29 '15 at 16:39

























        answered Jul 29 '14 at 19:37









        Gregor

        61.3k988163




        61.3k988163







        • 2




          Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
          – Jake
          Apr 12 '17 at 13:01












        • 2




          Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
          – Jake
          Apr 12 '17 at 13:01







        2




        2




        Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
        – Jake
        Apr 12 '17 at 13:01




        Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
        – Jake
        Apr 12 '17 at 13:01










        up vote
        63
        down vote













        You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



        library(reshape)
        cast(dat1, name ~ numbers)


        Or using reshape2



        library(reshape2)
        dcast(dat1, name ~ numbers)





        share|improve this answer






















        • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
          – Steve
          May 4 '11 at 22:45






        • 7




          +1 And use reshape2 for a performance gain.
          – Andrie
          May 6 '11 at 11:56






        • 1




          It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
          – thelatemail
          Jun 21 '17 at 22:37














        up vote
        63
        down vote













        You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



        library(reshape)
        cast(dat1, name ~ numbers)


        Or using reshape2



        library(reshape2)
        dcast(dat1, name ~ numbers)





        share|improve this answer






















        • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
          – Steve
          May 4 '11 at 22:45






        • 7




          +1 And use reshape2 for a performance gain.
          – Andrie
          May 6 '11 at 11:56






        • 1




          It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
          – thelatemail
          Jun 21 '17 at 22:37












        up vote
        63
        down vote










        up vote
        63
        down vote









        You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



        library(reshape)
        cast(dat1, name ~ numbers)


        Or using reshape2



        library(reshape2)
        dcast(dat1, name ~ numbers)





        share|improve this answer














        You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



        library(reshape)
        cast(dat1, name ~ numbers)


        Or using reshape2



        library(reshape2)
        dcast(dat1, name ~ numbers)






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited May 26 '15 at 14:52









        David Arenburg

        77.3k1092156




        77.3k1092156










        answered May 4 '11 at 22:42









        Ista

        7,03312426




        7,03312426











        • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
          – Steve
          May 4 '11 at 22:45






        • 7




          +1 And use reshape2 for a performance gain.
          – Andrie
          May 6 '11 at 11:56






        • 1




          It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
          – thelatemail
          Jun 21 '17 at 22:37
















        • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
          – Steve
          May 4 '11 at 22:45






        • 7




          +1 And use reshape2 for a performance gain.
          – Andrie
          May 6 '11 at 11:56






        • 1




          It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
          – thelatemail
          Jun 21 '17 at 22:37















        thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
        – Steve
        May 4 '11 at 22:45




        thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
        – Steve
        May 4 '11 at 22:45




        7




        7




        +1 And use reshape2 for a performance gain.
        – Andrie
        May 6 '11 at 11:56




        +1 And use reshape2 for a performance gain.
        – Andrie
        May 6 '11 at 11:56




        1




        1




        It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
        – thelatemail
        Jun 21 '17 at 22:37




        It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
        – thelatemail
        Jun 21 '17 at 22:37










        up vote
        27
        down vote













        Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



        (Reference: Efficient reshaping using data.tables)



        library(data.table)

        setDT(dat1)
        dcast(dat1, name ~ numbers, value.var = "value")

        # name 1 2 3 4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


        And, as of data.table v1.9.6 we can cast on multiple columns



        ## add an extra column
        dat1[, value2 := value * 2]

        ## cast multiple value columns
        dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

        # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





        share|improve this answer


















        • 3




          data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
          – joel.wilson
          Aug 31 '17 at 12:06






        • 3




          I personally think this is the best answer
          – cloudscomputes
          Oct 20 '17 at 4:37














        up vote
        27
        down vote













        Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



        (Reference: Efficient reshaping using data.tables)



        library(data.table)

        setDT(dat1)
        dcast(dat1, name ~ numbers, value.var = "value")

        # name 1 2 3 4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


        And, as of data.table v1.9.6 we can cast on multiple columns



        ## add an extra column
        dat1[, value2 := value * 2]

        ## cast multiple value columns
        dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

        # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





        share|improve this answer


















        • 3




          data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
          – joel.wilson
          Aug 31 '17 at 12:06






        • 3




          I personally think this is the best answer
          – cloudscomputes
          Oct 20 '17 at 4:37












        up vote
        27
        down vote










        up vote
        27
        down vote









        Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



        (Reference: Efficient reshaping using data.tables)



        library(data.table)

        setDT(dat1)
        dcast(dat1, name ~ numbers, value.var = "value")

        # name 1 2 3 4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


        And, as of data.table v1.9.6 we can cast on multiple columns



        ## add an extra column
        dat1[, value2 := value * 2]

        ## cast multiple value columns
        dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

        # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





        share|improve this answer














        Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



        (Reference: Efficient reshaping using data.tables)



        library(data.table)

        setDT(dat1)
        dcast(dat1, name ~ numbers, value.var = "value")

        # name 1 2 3 4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


        And, as of data.table v1.9.6 we can cast on multiple columns



        ## add an extra column
        dat1[, value2 := value * 2]

        ## cast multiple value columns
        dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

        # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
        # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
        # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Mar 27 '16 at 22:51

























        answered Mar 27 '16 at 22:35









        SymbolixAU

        16k32885




        16k32885







        • 3




          data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
          – joel.wilson
          Aug 31 '17 at 12:06






        • 3




          I personally think this is the best answer
          – cloudscomputes
          Oct 20 '17 at 4:37












        • 3




          data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
          – joel.wilson
          Aug 31 '17 at 12:06






        • 3




          I personally think this is the best answer
          – cloudscomputes
          Oct 20 '17 at 4:37







        3




        3




        data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
        – joel.wilson
        Aug 31 '17 at 12:06




        data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
        – joel.wilson
        Aug 31 '17 at 12:06




        3




        3




        I personally think this is the best answer
        – cloudscomputes
        Oct 20 '17 at 4:37




        I personally think this is the best answer
        – cloudscomputes
        Oct 20 '17 at 4:37










        up vote
        22
        down vote













        Using your example dataframe, we could:



        xtabs(value ~ name + numbers, data = dat1)





        share|improve this answer


















        • 2




          this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
          – cloudscomputes
          Oct 20 '17 at 4:44














        up vote
        22
        down vote













        Using your example dataframe, we could:



        xtabs(value ~ name + numbers, data = dat1)





        share|improve this answer


















        • 2




          this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
          – cloudscomputes
          Oct 20 '17 at 4:44












        up vote
        22
        down vote










        up vote
        22
        down vote









        Using your example dataframe, we could:



        xtabs(value ~ name + numbers, data = dat1)





        share|improve this answer














        Using your example dataframe, we could:



        xtabs(value ~ name + numbers, data = dat1)






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Sep 2 '16 at 7:37









        zx8754

        28.5k76394




        28.5k76394










        answered May 4 '11 at 22:58









        Jim M.

        4,48411430




        4,48411430







        • 2




          this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
          – cloudscomputes
          Oct 20 '17 at 4:44












        • 2




          this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
          – cloudscomputes
          Oct 20 '17 at 4:44







        2




        2




        this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
        – cloudscomputes
        Oct 20 '17 at 4:44




        this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
        – cloudscomputes
        Oct 20 '17 at 4:44










        up vote
        14
        down vote













        Other two options:



        Base package:



        df <- unstack(dat1, form = value ~ numbers)
        rownames(df) <- unique(dat1$name)
        df


        sqldf package:



        library(sqldf)
        sqldf('SELECT name,
        MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
        MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
        MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
        MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
        FROM dat1
        GROUP BY name')





        share|improve this answer
























          up vote
          14
          down vote













          Other two options:



          Base package:



          df <- unstack(dat1, form = value ~ numbers)
          rownames(df) <- unique(dat1$name)
          df


          sqldf package:



          library(sqldf)
          sqldf('SELECT name,
          MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
          MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
          MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
          MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
          FROM dat1
          GROUP BY name')





          share|improve this answer






















            up vote
            14
            down vote










            up vote
            14
            down vote









            Other two options:



            Base package:



            df <- unstack(dat1, form = value ~ numbers)
            rownames(df) <- unique(dat1$name)
            df


            sqldf package:



            library(sqldf)
            sqldf('SELECT name,
            MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
            MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
            MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
            MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
            FROM dat1
            GROUP BY name')





            share|improve this answer












            Other two options:



            Base package:



            df <- unstack(dat1, form = value ~ numbers)
            rownames(df) <- unique(dat1$name)
            df


            sqldf package:



            library(sqldf)
            sqldf('SELECT name,
            MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
            MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
            MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
            MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
            FROM dat1
            GROUP BY name')






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Jul 14 '15 at 17:44









            mpalanco

            7,07313239




            7,07313239




















                up vote
                9
                down vote













                Using base R aggregate function:



                aggregate(value ~ name, dat1, I)

                # name value.1 value.2 value.3 value.4
                #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                #2 secondName -0.8259 0.1669 -0.8962 0.1681





                share|improve this answer


















                • 1




                  why use the paste function instead of the I (identity function)?
                  – Onyambu
                  Dec 25 '17 at 4:04














                up vote
                9
                down vote













                Using base R aggregate function:



                aggregate(value ~ name, dat1, I)

                # name value.1 value.2 value.3 value.4
                #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                #2 secondName -0.8259 0.1669 -0.8962 0.1681





                share|improve this answer


















                • 1




                  why use the paste function instead of the I (identity function)?
                  – Onyambu
                  Dec 25 '17 at 4:04












                up vote
                9
                down vote










                up vote
                9
                down vote









                Using base R aggregate function:



                aggregate(value ~ name, dat1, I)

                # name value.1 value.2 value.3 value.4
                #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                #2 secondName -0.8259 0.1669 -0.8962 0.1681





                share|improve this answer














                Using base R aggregate function:



                aggregate(value ~ name, dat1, I)

                # name value.1 value.2 value.3 value.4
                #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                #2 secondName -0.8259 0.1669 -0.8962 0.1681






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Dec 25 '17 at 4:05









                Onyambu

                15.2k1519




                15.2k1519










                answered Sep 2 '16 at 7:52









                Ronak Shah

                27.5k93551




                27.5k93551







                • 1




                  why use the paste function instead of the I (identity function)?
                  – Onyambu
                  Dec 25 '17 at 4:04












                • 1




                  why use the paste function instead of the I (identity function)?
                  – Onyambu
                  Dec 25 '17 at 4:04







                1




                1




                why use the paste function instead of the I (identity function)?
                – Onyambu
                Dec 25 '17 at 4:04




                why use the paste function instead of the I (identity function)?
                – Onyambu
                Dec 25 '17 at 4:04










                up vote
                4
                down vote













                There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                The whole system is based on two primitives or operators
                cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                operators have pivot, un-pivot, one-hot encode, transpose, moving
                multiple rows and columns, and many other transforms as simple special
                cases.



                It is easy to write many different operations in terms of the
                cdata primitives. These operators can work-in memory or at big data
                scale (with databases and Apache Spark; for big data use the
                cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                variants). The transforms are controlled by a control table that
                itself is a diagram of (or picture of) the transform.




                We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                library(cdata)
                # first build the control table
                pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                columnToTakeKeysFrom = 'numbers', # this will become column headers
                columnToTakeValuesFrom = 'value', # this contains data
                sep="_") # optional for making column names

                # perform the move of data to columns
                dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                keyColumns = c('name'), # this(these) column(s) should stay untouched
                controlTable = pivotControlTable# control table above
                )
                dat_wide

                #> name numbers_1 numbers_2 numbers_3 numbers_4
                #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





                share|improve this answer
























                  up vote
                  4
                  down vote













                  There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                  The whole system is based on two primitives or operators
                  cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                  operators have pivot, un-pivot, one-hot encode, transpose, moving
                  multiple rows and columns, and many other transforms as simple special
                  cases.



                  It is easy to write many different operations in terms of the
                  cdata primitives. These operators can work-in memory or at big data
                  scale (with databases and Apache Spark; for big data use the
                  cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                  variants). The transforms are controlled by a control table that
                  itself is a diagram of (or picture of) the transform.




                  We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                  library(cdata)
                  # first build the control table
                  pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                  columnToTakeKeysFrom = 'numbers', # this will become column headers
                  columnToTakeValuesFrom = 'value', # this contains data
                  sep="_") # optional for making column names

                  # perform the move of data to columns
                  dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                  keyColumns = c('name'), # this(these) column(s) should stay untouched
                  controlTable = pivotControlTable# control table above
                  )
                  dat_wide

                  #> name numbers_1 numbers_2 numbers_3 numbers_4
                  #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                  #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





                  share|improve this answer






















                    up vote
                    4
                    down vote










                    up vote
                    4
                    down vote









                    There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                    The whole system is based on two primitives or operators
                    cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                    operators have pivot, un-pivot, one-hot encode, transpose, moving
                    multiple rows and columns, and many other transforms as simple special
                    cases.



                    It is easy to write many different operations in terms of the
                    cdata primitives. These operators can work-in memory or at big data
                    scale (with databases and Apache Spark; for big data use the
                    cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                    variants). The transforms are controlled by a control table that
                    itself is a diagram of (or picture of) the transform.




                    We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                    library(cdata)
                    # first build the control table
                    pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                    columnToTakeKeysFrom = 'numbers', # this will become column headers
                    columnToTakeValuesFrom = 'value', # this contains data
                    sep="_") # optional for making column names

                    # perform the move of data to columns
                    dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                    keyColumns = c('name'), # this(these) column(s) should stay untouched
                    controlTable = pivotControlTable# control table above
                    )
                    dat_wide

                    #> name numbers_1 numbers_2 numbers_3 numbers_4
                    #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                    #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





                    share|improve this answer












                    There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                    The whole system is based on two primitives or operators
                    cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                    operators have pivot, un-pivot, one-hot encode, transpose, moving
                    multiple rows and columns, and many other transforms as simple special
                    cases.



                    It is easy to write many different operations in terms of the
                    cdata primitives. These operators can work-in memory or at big data
                    scale (with databases and Apache Spark; for big data use the
                    cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                    variants). The transforms are controlled by a control table that
                    itself is a diagram of (or picture of) the transform.




                    We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                    library(cdata)
                    # first build the control table
                    pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                    columnToTakeKeysFrom = 'numbers', # this will become column headers
                    columnToTakeValuesFrom = 'value', # this contains data
                    sep="_") # optional for making column names

                    # perform the move of data to columns
                    dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                    keyColumns = c('name'), # this(these) column(s) should stay untouched
                    controlTable = pivotControlTable# control table above
                    )
                    dat_wide

                    #> name numbers_1 numbers_2 numbers_3 numbers_4
                    #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                    #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357






                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Dec 23 '17 at 23:01









                    dmi3kno

                    1,738521




                    1,738521




















                        up vote
                        2
                        down vote













                        The base reshape function works perfectly fine:



                        df <- data.frame(
                        year = c(rep(2000, 12), rep(2001, 12)),
                        month = rep(1:12, 2),
                        values = rnorm(24)
                        )
                        df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                        df_wide


                        Where




                        • idvar is the column of classes that separates rows


                        • timevar is the column of classes to cast wide


                        • v.names is the column containing numeric values


                        • direction specifies wide or long format

                        • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                        If no idvar exists, create one before using the reshape() function:



                        df$id <- c(rep("year1", 12), rep("year2", 12))
                        df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                        df_wide


                        Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






                        share|improve this answer


























                          up vote
                          2
                          down vote













                          The base reshape function works perfectly fine:



                          df <- data.frame(
                          year = c(rep(2000, 12), rep(2001, 12)),
                          month = rep(1:12, 2),
                          values = rnorm(24)
                          )
                          df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                          df_wide


                          Where




                          • idvar is the column of classes that separates rows


                          • timevar is the column of classes to cast wide


                          • v.names is the column containing numeric values


                          • direction specifies wide or long format

                          • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                          If no idvar exists, create one before using the reshape() function:



                          df$id <- c(rep("year1", 12), rep("year2", 12))
                          df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                          df_wide


                          Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






                          share|improve this answer
























                            up vote
                            2
                            down vote










                            up vote
                            2
                            down vote









                            The base reshape function works perfectly fine:



                            df <- data.frame(
                            year = c(rep(2000, 12), rep(2001, 12)),
                            month = rep(1:12, 2),
                            values = rnorm(24)
                            )
                            df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                            df_wide


                            Where




                            • idvar is the column of classes that separates rows


                            • timevar is the column of classes to cast wide


                            • v.names is the column containing numeric values


                            • direction specifies wide or long format

                            • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                            If no idvar exists, create one before using the reshape() function:



                            df$id <- c(rep("year1", 12), rep("year2", 12))
                            df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                            df_wide


                            Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






                            share|improve this answer














                            The base reshape function works perfectly fine:



                            df <- data.frame(
                            year = c(rep(2000, 12), rep(2001, 12)),
                            month = rep(1:12, 2),
                            values = rnorm(24)
                            )
                            df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                            df_wide


                            Where




                            • idvar is the column of classes that separates rows


                            • timevar is the column of classes to cast wide


                            • v.names is the column containing numeric values


                            • direction specifies wide or long format

                            • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                            If no idvar exists, create one before using the reshape() function:



                            df$id <- c(rep("year1", 12), rep("year2", 12))
                            df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                            df_wide


                            Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.







                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Aug 29 at 3:00









                            SymbolixAU

                            16k32885




                            16k32885










                            answered Aug 2 at 23:50









                            Adam Erickson

                            1,6141320




                            1,6141320



























                                 

                                draft saved


                                draft discarded















































                                 


                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function ()
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5890584%2fhow-to-reshape-data-from-long-to-wide-format%23new-answer', 'question_page');

                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                Popular posts from this blog

                                Use pre created SQLite database for Android project in kotlin

                                Darth Vader #20

                                Ondo