extracting rows depending on the value of multiple columns










0















I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.



dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y


So in this case I would need to return just the following:



 col1 col2 col3
1 A Z
1 A Y


If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?



Thanks










share|improve this question
























  • Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

    – Cris
    Nov 14 '18 at 23:59












  • Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

    – Gopala
    Nov 15 '18 at 0:30











  • @ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

    – user1658170
    Nov 15 '18 at 1:04















0















I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.



dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y


So in this case I would need to return just the following:



 col1 col2 col3
1 A Z
1 A Y


If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?



Thanks










share|improve this question
























  • Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

    – Cris
    Nov 14 '18 at 23:59












  • Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

    – Gopala
    Nov 15 '18 at 0:30











  • @ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

    – user1658170
    Nov 15 '18 at 1:04













0












0








0








I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.



dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y


So in this case I would need to return just the following:



 col1 col2 col3
1 A Z
1 A Y


If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?



Thanks










share|improve this question
















I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.



dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
"col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
"col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y


So in this case I would need to return just the following:



 col1 col2 col3
1 A Z
1 A Y


If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?



Thanks







r






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 1:03







user1658170

















asked Nov 14 '18 at 23:29









user1658170user1658170

3011718




3011718












  • Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

    – Cris
    Nov 14 '18 at 23:59












  • Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

    – Gopala
    Nov 15 '18 at 0:30











  • @ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

    – user1658170
    Nov 15 '18 at 1:04

















  • Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

    – Cris
    Nov 14 '18 at 23:59












  • Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

    – Gopala
    Nov 15 '18 at 0:30











  • @ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

    – user1658170
    Nov 15 '18 at 1:04
















Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59






Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59














Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30





Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30













@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04





@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04












2 Answers
2






active

oldest

votes


















1














Try this with:



library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()


If dat is as follows:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))


You get no rows as follows:



# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>


If dat is as you provided in the original post, you get the output as you needed:



# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.






share|improve this answer























  • This does the trick. Thanks

    – user1658170
    Nov 15 '18 at 1:12


















0














Nice easy dplyr solution:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:



dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()





share|improve this answer

























  • This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

    – Gopala
    Nov 15 '18 at 0:33










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53310320%2fextracting-rows-depending-on-the-value-of-multiple-columns%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














Try this with:



library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()


If dat is as follows:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))


You get no rows as follows:



# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>


If dat is as you provided in the original post, you get the output as you needed:



# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.






share|improve this answer























  • This does the trick. Thanks

    – user1658170
    Nov 15 '18 at 1:12















1














Try this with:



library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()


If dat is as follows:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))


You get no rows as follows:



# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>


If dat is as you provided in the original post, you get the output as you needed:



# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.






share|improve this answer























  • This does the trick. Thanks

    – user1658170
    Nov 15 '18 at 1:12













1












1








1







Try this with:



library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()


If dat is as follows:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))


You get no rows as follows:



# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>


If dat is as you provided in the original post, you get the output as you needed:



# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.






share|improve this answer













Try this with:



library(dplyr)
dat %>%
group_by(col1, col2) %>%
filter(length(unique(col3)) > 1) %>%
distinct()


If dat is as follows:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA,
-15L))


You get no rows as follows:



# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>


If dat is as you provided in the original post, you get the output as you needed:



# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 15 '18 at 0:37









GopalaGopala

7,14322049




7,14322049












  • This does the trick. Thanks

    – user1658170
    Nov 15 '18 at 1:12

















  • This does the trick. Thanks

    – user1658170
    Nov 15 '18 at 1:12
















This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12





This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12













0














Nice easy dplyr solution:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:



dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()





share|improve this answer

























  • This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

    – Gopala
    Nov 15 '18 at 0:33















0














Nice easy dplyr solution:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:



dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()





share|improve this answer

























  • This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

    – Gopala
    Nov 15 '18 at 0:33













0












0








0







Nice easy dplyr solution:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:



dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()





share|improve this answer















Nice easy dplyr solution:



dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z",
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
col1 col2 col3
<dbl> <chr> <chr>
1 1 A Z
2 1 A Y


In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:



dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 15 '18 at 0:08

























answered Nov 15 '18 at 0:03









Mako212Mako212

4,2971927




4,2971927












  • This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

    – Gopala
    Nov 15 '18 at 0:33

















  • This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

    – Gopala
    Nov 15 '18 at 0:33
















This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33





This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53310320%2fextracting-rows-depending-on-the-value-of-multiple-columns%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo