R Summarise dplyr grouped data with certain rows excluded based on another column










2















I would like to summarize data across multiple columns, based on all rows other than rows with a certain value in a separate grouping variable column. For example, in the df below, I want to get the medians of A, B, C, D, and E based on values from rows that are not assigned to the cluster matching a given row.



df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

df %>%
group_by(cluster) %>%
summarise_at(toupper(letters[1:5]), funs(m = fun_i_need_help_with(.)))


fun_i_need_help_with would give the equivalent of:



 first row: median(df[which(df$cluster != 1), "A"])
second row: median(df[which(df$cluster != 2), "A"])
and so on...


I can do it with nested for loops, but it's pretty slow to run and doesn't seem like a good R-like solution.



for(col in toupper(letters[1:5]))
for(clust in unique(df$cluster))
df[which(df$cluster == clust), col] <-
median(df[which(df$cluster != clust), col])











share|improve this question

















  • 1





    Random note: LETTERS[1:5] is the same as toupper(letters[1:5])

    – Taiki Sakai
    Nov 15 '18 at 1:28











  • Thanks, that is good to know!

    – lblum
    Nov 15 '18 at 17:51















2















I would like to summarize data across multiple columns, based on all rows other than rows with a certain value in a separate grouping variable column. For example, in the df below, I want to get the medians of A, B, C, D, and E based on values from rows that are not assigned to the cluster matching a given row.



df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

df %>%
group_by(cluster) %>%
summarise_at(toupper(letters[1:5]), funs(m = fun_i_need_help_with(.)))


fun_i_need_help_with would give the equivalent of:



 first row: median(df[which(df$cluster != 1), "A"])
second row: median(df[which(df$cluster != 2), "A"])
and so on...


I can do it with nested for loops, but it's pretty slow to run and doesn't seem like a good R-like solution.



for(col in toupper(letters[1:5]))
for(clust in unique(df$cluster))
df[which(df$cluster == clust), col] <-
median(df[which(df$cluster != clust), col])











share|improve this question

















  • 1





    Random note: LETTERS[1:5] is the same as toupper(letters[1:5])

    – Taiki Sakai
    Nov 15 '18 at 1:28











  • Thanks, that is good to know!

    – lblum
    Nov 15 '18 at 17:51













2












2








2








I would like to summarize data across multiple columns, based on all rows other than rows with a certain value in a separate grouping variable column. For example, in the df below, I want to get the medians of A, B, C, D, and E based on values from rows that are not assigned to the cluster matching a given row.



df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

df %>%
group_by(cluster) %>%
summarise_at(toupper(letters[1:5]), funs(m = fun_i_need_help_with(.)))


fun_i_need_help_with would give the equivalent of:



 first row: median(df[which(df$cluster != 1), "A"])
second row: median(df[which(df$cluster != 2), "A"])
and so on...


I can do it with nested for loops, but it's pretty slow to run and doesn't seem like a good R-like solution.



for(col in toupper(letters[1:5]))
for(clust in unique(df$cluster))
df[which(df$cluster == clust), col] <-
median(df[which(df$cluster != clust), col])











share|improve this question














I would like to summarize data across multiple columns, based on all rows other than rows with a certain value in a separate grouping variable column. For example, in the df below, I want to get the medians of A, B, C, D, and E based on values from rows that are not assigned to the cluster matching a given row.



df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

df %>%
group_by(cluster) %>%
summarise_at(toupper(letters[1:5]), funs(m = fun_i_need_help_with(.)))


fun_i_need_help_with would give the equivalent of:



 first row: median(df[which(df$cluster != 1), "A"])
second row: median(df[which(df$cluster != 2), "A"])
and so on...


I can do it with nested for loops, but it's pretty slow to run and doesn't seem like a good R-like solution.



for(col in toupper(letters[1:5]))
for(clust in unique(df$cluster))
df[which(df$cluster == clust), col] <-
median(df[which(df$cluster != clust), col])








r dplyr






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 15 '18 at 1:07









lblumlblum

8314




8314







  • 1





    Random note: LETTERS[1:5] is the same as toupper(letters[1:5])

    – Taiki Sakai
    Nov 15 '18 at 1:28











  • Thanks, that is good to know!

    – lblum
    Nov 15 '18 at 17:51












  • 1





    Random note: LETTERS[1:5] is the same as toupper(letters[1:5])

    – Taiki Sakai
    Nov 15 '18 at 1:28











  • Thanks, that is good to know!

    – lblum
    Nov 15 '18 at 17:51







1




1





Random note: LETTERS[1:5] is the same as toupper(letters[1:5])

– Taiki Sakai
Nov 15 '18 at 1:28





Random note: LETTERS[1:5] is the same as toupper(letters[1:5])

– Taiki Sakai
Nov 15 '18 at 1:28













Thanks, that is good to know!

– lblum
Nov 15 '18 at 17:51





Thanks, that is good to know!

– lblum
Nov 15 '18 at 17:51












1 Answer
1






active

oldest

votes


















2














A solution using the tidyverse.



set.seed(123)

df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

library(tidyverse)

df2 <- map_dfr(unique(df$cluster),
~df %>%
filter(cluster != .x) %>%
summarize_at(vars(-cluster), funs(median(.))) %>%
# Add a label to show the content of this row is not from a certain cluster number
mutate(not_cluster = .x))
df2
# A B C D E not_cluster
# 1 2.070508 5.110683 0.1820251 3.553918 0.7920827 1
# 2 2.070508 5.400771 -0.6260044 3.688640 0.5333446 2
# 3 1.920165 5.428832 -0.2769652 3.490191 0.8543568 3
# 4 1.769823 5.400771 -0.2250393 3.426464 0.5971152 4
# 5 1.769823 5.400771 -0.3288912 3.426464 0.5971152 5





share|improve this answer























  • Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

    – lblum
    Nov 15 '18 at 17:54










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53311031%2fr-summarise-dplyr-grouped-data-with-certain-rows-excluded-based-on-another-colum%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














A solution using the tidyverse.



set.seed(123)

df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

library(tidyverse)

df2 <- map_dfr(unique(df$cluster),
~df %>%
filter(cluster != .x) %>%
summarize_at(vars(-cluster), funs(median(.))) %>%
# Add a label to show the content of this row is not from a certain cluster number
mutate(not_cluster = .x))
df2
# A B C D E not_cluster
# 1 2.070508 5.110683 0.1820251 3.553918 0.7920827 1
# 2 2.070508 5.400771 -0.6260044 3.688640 0.5333446 2
# 3 1.920165 5.428832 -0.2769652 3.490191 0.8543568 3
# 4 1.769823 5.400771 -0.2250393 3.426464 0.5971152 4
# 5 1.769823 5.400771 -0.3288912 3.426464 0.5971152 5





share|improve this answer























  • Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

    – lblum
    Nov 15 '18 at 17:54















2














A solution using the tidyverse.



set.seed(123)

df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

library(tidyverse)

df2 <- map_dfr(unique(df$cluster),
~df %>%
filter(cluster != .x) %>%
summarize_at(vars(-cluster), funs(median(.))) %>%
# Add a label to show the content of this row is not from a certain cluster number
mutate(not_cluster = .x))
df2
# A B C D E not_cluster
# 1 2.070508 5.110683 0.1820251 3.553918 0.7920827 1
# 2 2.070508 5.400771 -0.6260044 3.688640 0.5333446 2
# 3 1.920165 5.428832 -0.2769652 3.490191 0.8543568 3
# 4 1.769823 5.400771 -0.2250393 3.426464 0.5971152 4
# 5 1.769823 5.400771 -0.3288912 3.426464 0.5971152 5





share|improve this answer























  • Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

    – lblum
    Nov 15 '18 at 17:54













2












2








2







A solution using the tidyverse.



set.seed(123)

df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

library(tidyverse)

df2 <- map_dfr(unique(df$cluster),
~df %>%
filter(cluster != .x) %>%
summarize_at(vars(-cluster), funs(median(.))) %>%
# Add a label to show the content of this row is not from a certain cluster number
mutate(not_cluster = .x))
df2
# A B C D E not_cluster
# 1 2.070508 5.110683 0.1820251 3.553918 0.7920827 1
# 2 2.070508 5.400771 -0.6260044 3.688640 0.5333446 2
# 3 1.920165 5.428832 -0.2769652 3.490191 0.8543568 3
# 4 1.769823 5.400771 -0.2250393 3.426464 0.5971152 4
# 5 1.769823 5.400771 -0.3288912 3.426464 0.5971152 5





share|improve this answer













A solution using the tidyverse.



set.seed(123)

df = data.frame(cluster = c(1:5, 1:3, 1:2),
A = rnorm(10, 2),
B = rnorm(10, 5),
C = rnorm(10, 0.4),
D = rnorm(10, 3),
E = rnorm(10, 1))

library(tidyverse)

df2 <- map_dfr(unique(df$cluster),
~df %>%
filter(cluster != .x) %>%
summarize_at(vars(-cluster), funs(median(.))) %>%
# Add a label to show the content of this row is not from a certain cluster number
mutate(not_cluster = .x))
df2
# A B C D E not_cluster
# 1 2.070508 5.110683 0.1820251 3.553918 0.7920827 1
# 2 2.070508 5.400771 -0.6260044 3.688640 0.5333446 2
# 3 1.920165 5.428832 -0.2769652 3.490191 0.8543568 3
# 4 1.769823 5.400771 -0.2250393 3.426464 0.5971152 4
# 5 1.769823 5.400771 -0.3288912 3.426464 0.5971152 5






share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 15 '18 at 1:31









wwwwww

28.4k112344




28.4k112344












  • Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

    – lblum
    Nov 15 '18 at 17:54

















  • Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

    – lblum
    Nov 15 '18 at 17:54
















Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

– lblum
Nov 15 '18 at 17:54





Thanks, this works exactly as I hoped. I'll have to read a bit about map_dfr so that I really understand what's happening there.

– lblum
Nov 15 '18 at 17:54



















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53311031%2fr-summarise-dplyr-grouped-data-with-certain-rows-excluded-based-on-another-colum%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo