R remove rows from dataframe after occurrences of a value reach a limit

I have an R data frame sorted by its first column.

Each value in the first column appears in many rows.

I want to keep the first 200 rows for each value of the first column, and remove all the others.

So for example, if I start with 300 "1 whatever..." rows and 400 "2 whatever..." rows, what I want is 400 rows: the first 200 "1" rows, then the first 200 "2" rows.

Thanks in advance...

asked Nov 11 at 3:18

Phil Rennert

Please make this question reproducible. This includes sample code (including listing non-base R packages), sample data (e.g., dput(head(x))), and expected output. Refs: stackoverflow.com/questions/5963269, stackoverflow.com/help/mcve, and stackoverflow.com/tags/r/info.
– r2evans
Nov 11 at 4:43

Though if I had to guess, it could be something like do.call(rbind.data.frame, by(mtcars, mtcars$cyl, head, n=3)).
– r2evans
Nov 11 at 4:43
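
For illustration, the guess in the comment above can be tried on data shaped like the question's example. This is only a minimal sketch: the data frame `df` and its columns `id` and `value` are made up here and are not the asker's data. The second variant, built on `ave()`, is an alternative that filters rows in place and so keeps the original row names.

```r
# Hypothetical data matching the question's example: 300 "1" rows, 400 "2" rows.
df <- data.frame(id = c(rep(1, 300), rep(2, 400)), value = 1:700)

# Variant 1 (from the comment): split by id, take head() of each piece, re-bind.
res_by <- do.call(rbind.data.frame, by(df, df$id, head, n = 200))

# Variant 2: number the rows within each id, then keep those numbered 1..200.
row_in_group <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)
res_ave <- df[row_in_group <= 200, ]

table(res_by$id)   # row counts per id
table(res_ave$id)  # row counts per id
```

Both variants select rows by position within each group, so the values in the other columns play no part in which rows are kept.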

r rows counting

1 Answer

Please make your questions reproducible in the future, and include what steps you have already tried. Example data is another useful tool to help us answer you more quickly.

Here is a little example I made up using the dplyr package:

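library(dplyr) # group_by(), filter(), row_number(); also provides %>%

data <- data.frame(X = c(rep(1, 300), rep(2, 300)), Y = 1:600)

# Keep the first 200 rows of each group, in their existing order.
# top_n(200) would instead rank rows by the last column and keep ties,
# so a group could come back with more than 200 rows.
subdata <- data %>%
  group_by(X) %>%
  filter(row_number() <= 200) %>%
  ungroup()

This will end with 400 rows: the first 200 '1' rows and the first 200 '2' rows. Let me know if you have any issues.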

edited Nov 11 at 10:17

answered Nov 11 at 10:09

RAB


Thanks; this worked, partly. I did everything you said, working with my dataframe; but when I exported subdata and looked at it, there were 247 lines with the first value of userID (the first column in my dataframe), then 222 lines with the next value of userID, then 215 lines with the next value, then 235 with the next, etc. So this is pruning the number of rows for each userID, but not uniformly. I haven't used dplyr before, and I don't know why.
– Phil Rennert
Nov 12 at 17:32

That's weird. Can you show me your data using the dput(data) command? There could be something odd going on if some columns are factors, but I can do some testing to see if I can reproduce your issue.
– RAB
Nov 12 at 20:23

Okay, thanks. I output the file with dput. It's 219K, 3400 lines (the original is 12,600 lines). I can't just attach it, can I? The first lines look like: structure(list(userID = c(78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, and so on for all the lines for user 78, then the next one... Can I get it to you?
– Phil Rennert
Nov 14 at 22:16
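
The asker's data is not shown, so the following is only one plausible explanation for the uneven group sizes reported above. When no `wt` is given, `top_n()` orders rows by the last column of the data frame, and it returns extra rows when values tie at the cut-off. The sketch below uses made-up data (the `score` column and user 79 are hypothetical) to show the effect, next to a position-based selection that does not depend on the values.

```r
library(dplyr)

# Hypothetical data: two users with 300 rows each; the last column repeats a lot.
d <- data.frame(userID = rep(c(78L, 79L), each = 300),
                score  = rep(rep(1:4, each = 75), times = 2))

# top_n() ranks by the last column (score) and keeps every row tied at the cut-off,
# so each user can end up with more than 200 rows.
ties <- d %>% group_by(userID) %>% top_n(200) %>% ungroup()
table(ties$userID)

# Selecting by position within each group never returns more than 200 rows per user.
firsts <- d %>% group_by(userID) %>% filter(row_number() <= 200) %>% ungroup()
table(firsts$userID)
```

In dplyr 1.0.0 and later, `slice_head(n = 200)` after `group_by()` expresses the same position-based selection.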
