extracting rows depending on the value of multiple columns

I have a large dataframe that is simplified below. Given the following data frame structure, I need to collapse to return two distinct rows where col3 has different values but col1 and col2 have unique values.

dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
 "col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
 "col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

 col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y

So in this case I would need to return just the following:

 col1 col2 col3
 1 A Z
 1 A Y

If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?

Thanks

edited Nov 15 '18 at 1:03

asked Nov 14 '18 at 23:29

user1658170

3011718

Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59

Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30

@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04

add a comment |

dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
 "col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
 "col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

 col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y

So in this case I would need to return just the following:

 col1 col2 col3
 1 A Z
 1 A Y

If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?

Thanks

edited Nov 15 '18 at 1:03

asked Nov 14 '18 at 23:29

user1658170

3011718

Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59

Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30

@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04

add a comment |

dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
 "col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
 "col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

 col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y

So in this case I would need to return just the following:

 col1 col2 col3
 1 A Z
 1 A Y

If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?

Thanks

edited Nov 15 '18 at 1:03

asked Nov 14 '18 at 23:29

user1658170

3011718

dat <- data.frame("col1" = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
 "col2" = c( "A","A", "A", "A", "A", "A", "A", "A", "A", "A"," A", "A", "A", "A", "A"),
 "col3" = c( "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y"))

 col1 col2 col3
1 1 A Z
2 1 A Z
3 1 A Z
4 1 A Z
5 1 A Z
6 1 A Z
7 1 A Z
8 1 A Z
9 1 A Z
10 1 A Y
11 1 A Y
12 1 A Y
13 1 A Y
14 1 A Y
15 1 A Y

So in this case I would need to return just the following:

 col1 col2 col3
 1 A Z
 1 A Y

If however, col3 was only z's I would return no rows. I can get counts of these data with the table function but I need to see the actual rows.
Any ideas?

Thanks

edited Nov 15 '18 at 1:03

asked Nov 14 '18 at 23:29

user1658170

3011718

edited Nov 15 '18 at 1:03

asked Nov 14 '18 at 23:29

user1658170

3011718

edited Nov 15 '18 at 1:03

asked Nov 14 '18 at 23:29

user1658170

3011718

asked Nov 14 '18 at 23:29

user1658170

3011718

asked Nov 14 '18 at 23:29

user1658170

3011718

Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59

Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30

@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04

add a comment |

Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59

Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30

@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04

Well, I managed to get the desired output but I wonder if it work with your real data. I got it by using as.data.frame(apply(dat, 2, unique)). Let me know if this works for you.

– Cris
Nov 14 '18 at 23:59

Where do I see col1 = col2 in the example data? Do you mean for each unique combination of col1, col2 values?

– Gopala
Nov 15 '18 at 0:30

@ Gopala, yes, that's what I mean. I apologize for the poor wording. I have updated to make more sense.

– user1658170
Nov 15 '18 at 1:04

add a comment |

2 Answers
2

active

oldest

votes

Try this with:

library(dplyr)
dat %>%
 group_by(col1, col2) %>%
 filter(length(unique(col3)) > 1) %>%
 distinct()

If dat is as follows:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA, 
-15L))

You get no rows as follows:

# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>

If dat is as you provided in the original post, you get the output as you needed:

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.

answered Nov 15 '18 at 0:37

Gopala

7,14322049

This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12

add a comment |

Nice easy dplyr solution:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA, 
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:

dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()

edited Nov 15 '18 at 0:08

answered Nov 15 '18 at 0:03

Mako212

4,2971927

This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53310320%2fextracting-rows-depending-on-the-value-of-multiple-columns%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Try this with:

library(dplyr)
dat %>%
 group_by(col1, col2) %>%
 filter(length(unique(col3)) > 1) %>%
 distinct()

If dat is as follows:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA, 
-15L))

You get no rows as follows:

# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>

If dat is as you provided in the original post, you get the output as you needed:

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.

answered Nov 15 '18 at 0:37

Gopala

7,14322049

This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12

add a comment |

Try this with:

library(dplyr)
dat %>%
 group_by(col1, col2) %>%
 filter(length(unique(col3)) > 1) %>%
 distinct()

If dat is as follows:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA, 
-15L))

You get no rows as follows:

# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>

If dat is as you provided in the original post, you get the output as you needed:

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.

answered Nov 15 '18 at 0:37

Gopala

7,14322049

This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12

add a comment |

Try this with:

library(dplyr)
dat %>%
 group_by(col1, col2) %>%
 filter(length(unique(col3)) > 1) %>%
 distinct()

If dat is as follows:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA, 
-15L))

You get no rows as follows:

# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>

If dat is as you provided in the original post, you get the output as you needed:

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.

answered Nov 15 '18 at 0:37

Gopala

7,14322049

Try this with:

library(dplyr)
dat %>%
 group_by(col1, col2) %>%
 filter(length(unique(col3)) > 1) %>%
 distinct()

If dat is as follows:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z")), class = "data.frame", row.names = c(NA, 
-15L))

You get no rows as follows:

# A tibble: 0 x 3
# Groups: col1, col2 [0]
# ... with 3 variables: col1 <dbl>, col2 <chr>, col3 <chr>

If dat is as you provided in the original post, you get the output as you needed:

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

Notice that I am using length(unique()) in the filter instead of n_distinct because there is a dplyr bug that makes n_distinct in a filter of a grouped data frame run extremely slowly.

answered Nov 15 '18 at 0:37

Gopala

7,14322049

answered Nov 15 '18 at 0:37

Gopala

7,14322049

answered Nov 15 '18 at 0:37

Gopala

7,14322049

answered Nov 15 '18 at 0:37

Gopala

7,14322049

This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12

add a comment |

This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12

This does the trick. Thanks

– user1658170
Nov 15 '18 at 1:12

add a comment |

Nice easy dplyr solution:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA, 
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:

dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()

edited Nov 15 '18 at 0:08

answered Nov 15 '18 at 0:03

Mako212

4,2971927

This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

add a comment |

Nice easy dplyr solution:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA, 
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:

dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()

edited Nov 15 '18 at 0:08

answered Nov 15 '18 at 0:03

Mako212

4,2971927

This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

add a comment |

Nice easy dplyr solution:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA, 
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:

dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()

edited Nov 15 '18 at 0:08

answered Nov 15 '18 at 0:03

Mako212

4,2971927

Nice easy dplyr solution:

dat <- structure(list(col1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), col2 = c("A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A"), col3 = c("Z", "Z", "Z", "Z", "Z", 
"Z", "Z", "Z", "Z", "Y", "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA, 
-15L))

library(dplyr)

dat %>% group_by(col1,col2) %>% distinct()

# A tibble: 2 x 3
# Groups: col1, col2 [1]
 col1 col2 col3 
 <dbl> <chr> <chr>
1 1 A Z 
2 1 A Y

In your version of dat you have a space in one of your col2 values, if that's not a typo, you'd need to fix that first so that distinct() aggregates correctly:

dat %>% mutate(col2 = trimws(col2)) %>% group_by(col1,col2) %>% distinct()

edited Nov 15 '18 at 0:08

answered Nov 15 '18 at 0:03

Mako212

4,2971927

edited Nov 15 '18 at 0:08

answered Nov 15 '18 at 0:03

Mako212

4,2971927

answered Nov 15 '18 at 0:03

Mako212

4,2971927

answered Nov 15 '18 at 0:03

Mako212

4,2971927

This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

add a comment |

This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

This won't work to produce 'no rows' output when col3 value is common throughout - as the poster suggests the output should be.

– Gopala
Nov 15 '18 at 0:33

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

t1HpTsD vnwP,Ra,Y651KgV F,5j n,fq8oj7CW

搜尋此網誌

Pfthb