extracting rows depending on the value of multiple columns
I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.
dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))
col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y
So in this case I would need to return just the following:
col1 col2 col3
1 A Z
1 A Y
If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?
Thanks
r
add a comment |
I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.
dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))
col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y
So in this case I would need to return just the following:
col1 col2 col3
1 A Z
1 A Y
If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?
Thanks
r
Well, I managed to get the desired output but I wonder if it work with your real data. I got it by usingas.data.frame(apply(dat, 2, unique))
. Let me know if this works for you.
– Cris
Nov 14 '18 at 23:59
Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?
– Gopala
Nov 15 '18 at 0:30
@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.
– user1658170
Nov 15 '18 at 1:04
add a comment |
I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.
dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))
col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y
So in this case I would need to return just the following:
col1 col2 col3
1 A Z
1 A Y
If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?
Thanks
r
I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.
dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))
col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y
So in this case I would need to return just the following:
col1 col2 col3
1 A Z
1 A Y
If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?
Thanks
r
r
edited Nov 15 '18 at 1:03
user1658170
asked Nov 14 '18 at 23:29
user1658170user1658170
3011718
3011718
Well, I managed to get the desired output but I wonder if it work with your real data. I got it by usingas.data.frame(apply(dat, 2, unique))
. Let me know if this works for you.
– Cris
Nov 14 '18 at 23:59
Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?
– Gopala
Nov 15 '18 at 0:30
@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.
– user1658170
Nov 15 '18 at 1:04
add a comment |
Well, I managed to get the desired output but I wonder if it work with your real data. I got it by usingas.data.frame(apply(dat, 2, unique))
. Let me know if this works for you.
– Cris
Nov 14 '18 at 23:59
Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?
– Gopala
Nov 15 '18 at 0:30
@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.
– user1658170
Nov 15 '18 at 1:04
Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using
as.data.frame(apply(dat, 2, unique))
. Let me know if this works for you.– Cris
Nov 14 '18 at 23:59
Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using
as.data.frame(apply(dat, 2, unique))
. Let me know if this works for you.– Cris
Nov 14 '18 at 23:59
Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?
– Gopala
Nov 15 '18 at 0:30
Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?
– Gopala
Nov 15 '18 at 0:30
@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.
– user1658170
Nov 15 '18 at 1:04
@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.
– user1658170
Nov 15 '18 at 1:04
add a comment |
2 Answers
2
active
oldest
votes
Try this with:
library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()
If dat
is as follows:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))
You get no rows as follows:
# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>
If dat
is as you provided in the original post, you get the output as you needed:
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
Notice that I am using length(unique())
in the filter instead of n_distinct
because there is a dplyr
bug that makes n_distinct
in a filter of a grouped data frame run extremely slowly.
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
add a comment |
Nice easy dplyr
solution:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))
library(dplyr)
dat %>% group_by(col1,col2) %>% distinct()
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
In your version of dat
you have a space in one of your col2
values, if that's not a typo, you'd need to fix that first so that distinct()
aggregates correctly:
dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53310320%2fextracting-rows-depending-on-the-value-of-multiple-columns%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Try this with:
library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()
If dat
is as follows:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))
You get no rows as follows:
# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>
If dat
is as you provided in the original post, you get the output as you needed:
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
Notice that I am using length(unique())
in the filter instead of n_distinct
because there is a dplyr
bug that makes n_distinct
in a filter of a grouped data frame run extremely slowly.
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
add a comment |
Try this with:
library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()
If dat
is as follows:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))
You get no rows as follows:
# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>
If dat
is as you provided in the original post, you get the output as you needed:
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
Notice that I am using length(unique())
in the filter instead of n_distinct
because there is a dplyr
bug that makes n_distinct
in a filter of a grouped data frame run extremely slowly.
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
add a comment |
Try this with:
library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()
If dat
is as follows:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))
You get no rows as follows:
# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>
If dat
is as you provided in the original post, you get the output as you needed:
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
Notice that I am using length(unique())
in the filter instead of n_distinct
because there is a dplyr
bug that makes n_distinct
in a filter of a grouped data frame run extremely slowly.
Try this with:
library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()
If dat
is as follows:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))
You get no rows as follows:
# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>
If dat
is as you provided in the original post, you get the output as you needed:
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
Notice that I am using length(unique())
in the filter instead of n_distinct
because there is a dplyr
bug that makes n_distinct
in a filter of a grouped data frame run extremely slowly.
answered Nov 15 '18 at 0:37
GopalaGopala
7,14322049
7,14322049
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
add a comment |
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
This does the trick. Thanks
– user1658170
Nov 15 '18 at 1:12
add a comment |
Nice easy dplyr
solution:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))
library(dplyr)
dat %>% group_by(col1,col2) %>% distinct()
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
In your version of dat
you have a space in one of your col2
values, if that's not a typo, you'd need to fix that first so that distinct()
aggregates correctly:
dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
add a comment |
Nice easy dplyr
solution:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))
library(dplyr)
dat %>% group_by(col1,col2) %>% distinct()
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
In your version of dat
you have a space in one of your col2
values, if that's not a typo, you'd need to fix that first so that distinct()
aggregates correctly:
dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
add a comment |
Nice easy dplyr
solution:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))
library(dplyr)
dat %>% group_by(col1,col2) %>% distinct()
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
In your version of dat
you have a space in one of your col2
values, if that's not a typo, you'd need to fix that first so that distinct()
aggregates correctly:
dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()
Nice easy dplyr
solution:
dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))
library(dplyr)
dat %>% group_by(col1,col2) %>% distinct()
# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y
In your version of dat
you have a space in one of your col2
values, if that's not a typo, you'd need to fix that first so that distinct()
aggregates correctly:
dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()
edited Nov 15 '18 at 0:08
answered Nov 15 '18 at 0:03
Mako212Mako212
4,2971927
4,2971927
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
add a comment |
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.
– Gopala
Nov 15 '18 at 0:33
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53310320%2fextracting-rows-depending-on-the-value-of-multiple-columns%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using
as.data.frame(apply(dat, 2, unique))
. Let me know if this works for you.– Cris
Nov 14 '18 at 23:59
Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?
– Gopala
Nov 15 '18 at 0:30
@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.
– user1658170
Nov 15 '18 at 1:04