Using dplyr to get multiple max values of a dataframe

I have a dataframe that looks something like this:

 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>

In all rows, there is either a Variable1/Value1 pair, or a Variable2/Value2 pair.
I'm trying to use dplyr with a combination of group_by and filter functions to generate a dataframe that has (1) only one row for each name and (2) the max-Value pairs for both 1 and 2. In this case, my final table would look something like:

 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>

I can't seem to get the correct flow of grouping to execute this properly. One solution I've thought of is to split the dataframe into two for each pair, filter, using group_by and filter on each separately, and then joining them back together based on Name, but I was wondering if anybody had a more straightforward solution to offer. Maybe something with summarise?

Thank you!

EDIT: Corrected missing NA values in table.

edited Nov 13 '18 at 2:49

asked Nov 13 '18 at 1:43

TactfulCactus

233

aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55

In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06

These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23

Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34

They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50

add a comment |

I have a dataframe that looks something like this:

 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>

 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>

Thank you!

EDIT: Corrected missing NA values in table.

edited Nov 13 '18 at 2:49

asked Nov 13 '18 at 1:43

TactfulCactus

233

aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55

In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06

These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23

Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34

They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50

add a comment |

I have a dataframe that looks something like this:

 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>

 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>

Thank you!

EDIT: Corrected missing NA values in table.

edited Nov 13 '18 at 2:49

asked Nov 13 '18 at 1:43

TactfulCactus

233

I have a dataframe that looks something like this:

 Name Variable1 Value1 Variable2 Value2
1 Michael A 2 <NA> <NA>
2 Michael B 3 <NA> <NA>
3 Michael <NA> <NA> X 10
4 John B 5 <NA> <NA>
5 John E 3 <NA> <NA>
6 John <NA> <NA> Y 20
7 John <NA> <NA> Z 40
8 Paul C 10 <NA> <NA>
9 Paul D 5 <NA> <NA>

 Name Variable1 Value1 Variable2 Value2
1 Michael B 3 X 10
2 John B 5 Z 40
3 Paul C 10 <NA> <NA>

Thank you!

EDIT: Corrected missing NA values in table.

r dplyr

edited Nov 13 '18 at 2:49

asked Nov 13 '18 at 1:43

TactfulCactus

233

edited Nov 13 '18 at 2:49

asked Nov 13 '18 at 1:43

TactfulCactus

233

edited Nov 13 '18 at 2:49

asked Nov 13 '18 at 1:43

TactfulCactus

233

asked Nov 13 '18 at 1:43

TactfulCactus

233

asked Nov 13 '18 at 1:43

TactfulCactus

233

aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55

In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06

These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23

Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34

They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50

add a comment |

aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55

In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06

These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23

Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34

They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50

aggregate(df, by=list(Name), + FUN=max, na.rm=TRUE)

– W-B
Nov 13 '18 at 1:55

In a dplyr method: group_by(x, Name) %>% summarise_all(max). In data.table-speak: x[,lapply(.SD, max),by="Name"].

– r2evans
Nov 13 '18 at 2:06

These solutions don't seem to work -- both give me errors regarding dealing with factors, presumably because max doesn't work with the columns Variable1 and Variable2. I want to be able to get the max of Value1 and Value2, and then get the associated Variable as well.

– TactfulCactus
Nov 13 '18 at 2:23

Are the missing value for Variable 2 in rows 4 and 5 blank, or NA?

– neilfws
Nov 13 '18 at 2:34

They were meant to be NA. Thanks! Fixed.

– TactfulCactus
Nov 13 '18 at 2:50

add a comment |

1 Answer
1

active

oldest

votes

library(dplyr)

df[is.na(df)] <- 0

df1 <-
 df %>%
 select(1:3)

df1_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value1 = max(Value1))

df2 <-
 df %>%
 select(c(1, 4:5))

df2_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value2 = max(Value2))

result <- left_join(
 left_join(df1_max, df1),
 left_join(df2_max, df2) %>%
 distinct()
) %>%
 select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA

Result:

result

# A tibble: 3 x 5
 Name Variable1 Value1 Variable2 Value2
 <chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA

Notes:

In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.

The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.

You are missing two values in your data and I considered them to be NA.

edited Nov 13 '18 at 2:57

answered Nov 13 '18 at 2:34

prosoitos

935419

1

Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53272580%2fusing-dplyr-to-get-multiple-max-values-of-a-dataframe%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

library(dplyr)

df[is.na(df)] <- 0

df1 <-
 df %>%
 select(1:3)

df1_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value1 = max(Value1))

df2 <-
 df %>%
 select(c(1, 4:5))

df2_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value2 = max(Value2))

result <- left_join(
 left_join(df1_max, df1),
 left_join(df2_max, df2) %>%
 distinct()
) %>%
 select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA

Result:

result

# A tibble: 3 x 5
 Name Variable1 Value1 Variable2 Value2
 <chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA

Notes:

In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.

The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.

You are missing two values in your data and I considered them to be NA.

edited Nov 13 '18 at 2:57

answered Nov 13 '18 at 2:34

prosoitos

935419

1

Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

add a comment |

library(dplyr)

df[is.na(df)] <- 0

df1 <-
 df %>%
 select(1:3)

df1_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value1 = max(Value1))

df2 <-
 df %>%
 select(c(1, 4:5))

df2_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value2 = max(Value2))

result <- left_join(
 left_join(df1_max, df1),
 left_join(df2_max, df2) %>%
 distinct()
) %>%
 select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA

Result:

result

# A tibble: 3 x 5
 Name Variable1 Value1 Variable2 Value2
 <chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA

Notes:

In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.

The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.

You are missing two values in your data and I considered them to be NA.

edited Nov 13 '18 at 2:57

answered Nov 13 '18 at 2:34

prosoitos

935419

1

Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

add a comment |

library(dplyr)

df[is.na(df)] <- 0

df1 <-
 df %>%
 select(1:3)

df1_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value1 = max(Value1))

df2 <-
 df %>%
 select(c(1, 4:5))

df2_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value2 = max(Value2))

result <- left_join(
 left_join(df1_max, df1),
 left_join(df2_max, df2) %>%
 distinct()
) %>%
 select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA

Result:

result

# A tibble: 3 x 5
 Name Variable1 Value1 Variable2 Value2
 <chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA

Notes:

In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.

The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.

You are missing two values in your data and I considered them to be NA.

edited Nov 13 '18 at 2:57

answered Nov 13 '18 at 2:34

prosoitos

935419

library(dplyr)

df[is.na(df)] <- 0

df1 <-
 df %>%
 select(1:3)

df1_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value1 = max(Value1))

df2 <-
 df %>%
 select(c(1, 4:5))

df2_max <-
 df %>%
 group_by(Name) %>%
 summarise(Value2 = max(Value2))

result <- left_join(
 left_join(df1_max, df1),
 left_join(df2_max, df2) %>%
 distinct()
) %>%
 select(c(1, 3, 2, 5, 4))

result[result == 0] <- NA

Result:

result

# A tibble: 3 x 5
 Name Variable1 Value1 Variable2 Value2
 <chr> <chr> <dbl> <chr> <dbl>
1 John B 5 Z 40
2 Michael B 3 X 10
3 Paul C 10 <NA> NA

Notes:

In data frames, the elements in a row are not independent from each other. So you shouldn't be able to dissociate them from each other. But this is what you are trying to do since the max value for Variable1 and the max value for Variable2 are not on the same row, yet you want your result to have the max for each (thus creating a row that does not exist in your original data frame). This is why I split your data frame into 2, then re-associated them with a join.

The max of NA and a value is NA (we don't know what value NA is since it is missing, so we cannot know what the max is). Yet, you are considering NA to be inferior to any value. Since this is not how R works, I had to assign 0 to your NA to get the max the way you consider it, then reassign NA to 0 at the end.

You are missing two values in your data and I considered them to be NA.

edited Nov 13 '18 at 2:57

answered Nov 13 '18 at 2:34

prosoitos

935419

edited Nov 13 '18 at 2:57

answered Nov 13 '18 at 2:34

prosoitos

935419

answered Nov 13 '18 at 2:34

prosoitos

935419

answered Nov 13 '18 at 2:34

prosoitos

935419

1

Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

add a comment |

1

Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

Okay! That all makes sense to me and was what I was thinking, though I didn't know about the max function with NA. Thanks for the explanation!

– TactfulCactus
Nov 13 '18 at 2:51

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb