Using dplyr to get multiple max values of a dataframe










2















I have a dataframe that looks something like this:



 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>


In all rows, there is either a Variable1/Value1 pair, or a Variable2/Value2 pair.
I'm trying to use dplyr with a combination of group_by and filter functions to generate a dataframe that has (1) only one row for each name and (2) the max-Value pairs for both 1 and 2. In this case, my final table would look something like:



 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>


I can't seem to get the correct flow of grouping to execute this properly. One solution I've thought of is to split the dataframe into two for each pair, filter, using group_by and filter on each separately, and then joining them back together based on Name, but I was wondering if anybody had a more straightforward solution to offer. Maybe something with summarise?



Thank you!



EDIT: Corrected missing NA values in table.










share|improve this question
























  • aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

    – W-B
    Nov 13 '18 at 1:55











  • In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

    – r2evans
    Nov 13 '18 at 2:06












  • These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

    – TactfulCactus
    Nov 13 '18 at 2:23












  • Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

    – neilfws
    Nov 13 '18 at 2:34











  • They were meant to be NA. Thanks! Fixed.

    – TactfulCactus
    Nov 13 '18 at 2:50















2















I have a dataframe that looks something like this:



 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>


In all rows, there is either a Variable1/Value1 pair, or a Variable2/Value2 pair.
I'm trying to use dplyr with a combination of group_by and filter functions to generate a dataframe that has (1) only one row for each name and (2) the max-Value pairs for both 1 and 2. In this case, my final table would look something like:



 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>


I can't seem to get the correct flow of grouping to execute this properly. One solution I've thought of is to split the dataframe into two for each pair, filter, using group_by and filter on each separately, and then joining them back together based on Name, but I was wondering if anybody had a more straightforward solution to offer. Maybe something with summarise?



Thank you!



EDIT: Corrected missing NA values in table.










share|improve this question
























  • aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

    – W-B
    Nov 13 '18 at 1:55











  • In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

    – r2evans
    Nov 13 '18 at 2:06












  • These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

    – TactfulCactus
    Nov 13 '18 at 2:23












  • Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

    – neilfws
    Nov 13 '18 at 2:34











  • They were meant to be NA. Thanks! Fixed.

    – TactfulCactus
    Nov 13 '18 at 2:50













2












2








2








I have a dataframe that looks something like this:



 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>


In all rows, there is either a Variable1/Value1 pair, or a Variable2/Value2 pair.
I'm trying to use dplyr with a combination of group_by and filter functions to generate a dataframe that has (1) only one row for each name and (2) the max-Value pairs for both 1 and 2. In this case, my final table would look something like:



 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>


I can't seem to get the correct flow of grouping to execute this properly. One solution I've thought of is to split the dataframe into two for each pair, filter, using group_by and filter on each separately, and then joining them back together based on Name, but I was wondering if anybody had a more straightforward solution to offer. Maybe something with summarise?



Thank you!



EDIT: Corrected missing NA values in table.










share|improve this question
















I have a dataframe that looks something like this:



 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>


In all rows, there is either a Variable1/Value1 pair, or a Variable2/Value2 pair.
I'm trying to use dplyr with a combination of group_by and filter functions to generate a dataframe that has (1) only one row for each name and (2) the max-Value pairs for both 1 and 2. In this case, my final table would look something like:



 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>


I can't seem to get the correct flow of grouping to execute this properly. One solution I've thought of is to split the dataframe into two for each pair, filter, using group_by and filter on each separately, and then joining them back together based on Name, but I was wondering if anybody had a more straightforward solution to offer. Maybe something with summarise?



Thank you!



EDIT: Corrected missing NA values in table.







r dplyr






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 13 '18 at 2:49







TactfulCactus

















asked Nov 13 '18 at 1:43









TactfulCactusTactfulCactus

233




233












  • aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

    – W-B
    Nov 13 '18 at 1:55











  • In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

    – r2evans
    Nov 13 '18 at 2:06












  • These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

    – TactfulCactus
    Nov 13 '18 at 2:23












  • Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

    – neilfws
    Nov 13 '18 at 2:34











  • They were meant to be NA. Thanks! Fixed.

    – TactfulCactus
    Nov 13 '18 at 2:50

















  • aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

    – W-B
    Nov 13 '18 at 1:55











  • In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

    – r2evans
    Nov 13 '18 at 2:06












  • These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

    – TactfulCactus
    Nov 13 '18 at 2:23












  • Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

    – neilfws
    Nov 13 '18 at 2:34











  • They were meant to be NA. Thanks! Fixed.

    – TactfulCactus
    Nov 13 '18 at 2:50
















aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55





aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55













In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06






In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06














These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23






These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23














Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34





Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34













They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50





They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50












1 Answer
1






active

oldest

votes


















1














library(dplyr)

df[is.na(df)] <- 0

df1 <-
df %>%
select(1:3)

df1_max <-
df %>%
group_by(Name) %>%
summarise(Value1 = max(Value1))

df2 <-
df %>%
select(c(1, 4:5))

df2_max <-
df %>%
group_by(Name) %>%
summarise(Value2 = max(Value2))

result <- left_join(
left_join(df1_max, df1),
left_join(df2_max, df2) %>%
distinct()
) %>%
select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA


Result:



result

# A tibble: 3 x 5
Name Variable1 Value1 Variable2 Value2
<chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA


Notes:



  • In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.


  • The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.


  • You are missing two values in your data and I considered them to be NA.






share|improve this answer




















  • 1





    Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

    – TactfulCactus
    Nov 13 '18 at 2:51










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53272580%2fusing-dplyr-to-get-multiple-max-values-of-a-dataframe%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














library(dplyr)

df[is.na(df)] <- 0

df1 <-
df %>%
select(1:3)

df1_max <-
df %>%
group_by(Name) %>%
summarise(Value1 = max(Value1))

df2 <-
df %>%
select(c(1, 4:5))

df2_max <-
df %>%
group_by(Name) %>%
summarise(Value2 = max(Value2))

result <- left_join(
left_join(df1_max, df1),
left_join(df2_max, df2) %>%
distinct()
) %>%
select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA


Result:



result

# A tibble: 3 x 5
Name Variable1 Value1 Variable2 Value2
<chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA


Notes:



  • In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.


  • The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.


  • You are missing two values in your data and I considered them to be NA.






share|improve this answer




















  • 1





    Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

    – TactfulCactus
    Nov 13 '18 at 2:51















1














library(dplyr)

df[is.na(df)] <- 0

df1 <-
df %>%
select(1:3)

df1_max <-
df %>%
group_by(Name) %>%
summarise(Value1 = max(Value1))

df2 <-
df %>%
select(c(1, 4:5))

df2_max <-
df %>%
group_by(Name) %>%
summarise(Value2 = max(Value2))

result <- left_join(
left_join(df1_max, df1),
left_join(df2_max, df2) %>%
distinct()
) %>%
select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA


Result:



result

# A tibble: 3 x 5
Name Variable1 Value1 Variable2 Value2
<chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA


Notes:



  • In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.


  • The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.


  • You are missing two values in your data and I considered them to be NA.






share|improve this answer




















  • 1





    Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

    – TactfulCactus
    Nov 13 '18 at 2:51













1












1








1







library(dplyr)

df[is.na(df)] <- 0

df1 <-
df %>%
select(1:3)

df1_max <-
df %>%
group_by(Name) %>%
summarise(Value1 = max(Value1))

df2 <-
df %>%
select(c(1, 4:5))

df2_max <-
df %>%
group_by(Name) %>%
summarise(Value2 = max(Value2))

result <- left_join(
left_join(df1_max, df1),
left_join(df2_max, df2) %>%
distinct()
) %>%
select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA


Result:



result

# A tibble: 3 x 5
Name Variable1 Value1 Variable2 Value2
<chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA


Notes:



  • In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.


  • The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.


  • You are missing two values in your data and I considered them to be NA.






share|improve this answer















library(dplyr)

df[is.na(df)] <- 0

df1 <-
df %>%
select(1:3)

df1_max <-
df %>%
group_by(Name) %>%
summarise(Value1 = max(Value1))

df2 <-
df %>%
select(c(1, 4:5))

df2_max <-
df %>%
group_by(Name) %>%
summarise(Value2 = max(Value2))

result <- left_join(
left_join(df1_max, df1),
left_join(df2_max, df2) %>%
distinct()
) %>%
select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA


Result:



result

# A tibble: 3 x 5
Name Variable1 Value1 Variable2 Value2
<chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA


Notes:



  • In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.


  • The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.


  • You are missing two values in your data and I considered them to be NA.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 13 '18 at 2:57

























answered Nov 13 '18 at 2:34









prosoitosprosoitos

935419




935419







  • 1





    Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

    – TactfulCactus
    Nov 13 '18 at 2:51












  • 1





    Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

    – TactfulCactus
    Nov 13 '18 at 2:51







1




1





Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51





Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53272580%2fusing-dplyr-to-get-multiple-max-values-of-a-dataframe%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo