R: remove rows from a data frame once occurrences of a value reach a limit

I have an R data frame sorted by its first column.

Each value in the first column appears in many rows.

I want to keep the first 200 rows for each first-column value and remove all the others.

For example, if I start with 300 "1 whatever..." rows and 400 "2 whatever..." rows, I want to end up with 400 rows: the first 200 "1" rows, then the first 200 "2" rows.

Thanks in advance...
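
A minimal, reproducible sketch of the data described above. The column names `id` and `value` are hypothetical, since the real data frame is not shown. In base R, one way to keep the first 200 rows per `id` is to build a running row counter within each group with `ave()` and subset on it:

```r
# Hypothetical example data: 300 rows with first value 1, 400 with first value 2
df <- data.frame(id = c(rep(1, 300), rep(2, 400)), value = 1:700)

# Position of each row within its id group: 1, 2, 3, ... restarting for each id
pos <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)

# Keep only rows whose within-group position is at most 200
result <- df[pos <= 200, ]

table(result$id)  # 200 rows for id 1 and 200 rows for id 2
```

Because `ave()` numbers rows in their existing order within each group, this keeps the first 200 occurrences of each value even if the groups were not contiguous.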

Tags: r, rows, counting

asked Nov 11 at 3:18 by Phil Rennert

  • Please make this question reproducible. This includes sample code (including listing non-base R packages), sample data (e.g., dput(head(x))), and expected output. Refs: stackoverflow.com/questions/5963269, stackoverflow.com/help/mcve, and stackoverflow.com/tags/r/info.
    – r2evans
    Nov 11 at 4:43

  • Though if I had to guess, it could be something like do.call(rbind.data.frame, by(mtcars, mtcars$cyl, head, n=3)).
    – r2evans
    Nov 11 at 4:43
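
Applied to the example from the question (the data frame `df` and its columns `id` and `value` are hypothetical), the `by()`/`head()` idea from the comment above might look like this sketch. `by()` splits the data frame by `id` and calls `head(n = 200)` on each piece, and `do.call(rbind, ...)` stacks the pieces back together; for data frames, `rbind` dispatches to `rbind.data.frame`, the function named in the comment.

```r
# Hypothetical example data: 300 rows with id 1, 400 rows with id 2
df <- data.frame(id = c(rep(1, 300), rep(2, 400)), value = 1:700)

# One data frame per id, each truncated to its first 200 rows
pieces <- by(df, df$id, head, n = 200)

# Stack the pieces into a single data frame (dispatches to rbind.data.frame)
result <- do.call(rbind, pieces)

table(result$id)  # 200 rows for id 1 and 200 rows for id 2
```

`rownames(result) <- NULL` resets the row names if the ones produced by `rbind` get in the way.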

1 Answer

In the future, please make your questions reproducible and include the steps you have already tried. Example data also helps us answer you more quickly.



Here is a little example I made up using the dplyr package:



library(dplyr)    # group_by() and top_n()
library(magrittr) # %>% pipe (dplyr also re-exports it)

data <- data.frame(X = c(rep(1, 300), rep(2, 300)), Y = 1:600)

subdata <- data %>%
  group_by(X) %>%
  top_n(200)  # no wt given, so rows are ranked by the last column, Y


This will end with 400 rows: 200 '1' rows and 200 '2' rows (in each group, the 200 rows with the largest Y). Let me know if you have any issues.
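
For comparison, a sketch on the same example data, assuming dplyr 1.0.0 or later for `slice_head()`. According to the dplyr documentation, `top_n()` without a `wt` argument ranks by the last column, here `Y`, so it keeps the 200 largest `Y` values in each group (the last 200 rows of each block in this data) rather than the first 200 rows. `slice_head()` selects rows by position instead:

```r
library(dplyr)

data <- data.frame(X = c(rep(1, 300), rep(2, 300)), Y = 1:600)

# Ranked by value: wt = Y spells out what top_n(200) does by default here
by_value <- data %>%
  group_by(X) %>%
  top_n(200, wt = Y) %>%
  ungroup()

# Selected by position: the first 200 rows of each X group
by_position <- data %>%
  group_by(X) %>%
  slice_head(n = 200) %>%
  ungroup()

range(by_value$Y[by_value$X == 1])        # 101 300
range(by_position$Y[by_position$X == 1])  # 1 200
```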

edited Nov 11 at 10:17
answered Nov 11 at 10:09 by RAB

  • Thanks; this worked, partly. I did everything you said, working with my dataframe; but when I exported subdata and looked at it, there were 247 lines with the first value of userID (the first column in my dataframe), then 222 lines with the next value of userID, then 215 lines with the next value, then 235 with the next, etc. So this is pruning the number of rows for each userID, but not uniformly. I haven't used dplyr before, and I don't know why.
    – Phil Rennert
    Nov 12 at 17:32

  • That's weird. Can you show me your data using dput(data)? Something odd may be going on if some columns are factors, but I can do some testing to see whether I can reproduce your issue.
    – RAB
    Nov 12 at 20:23

  • Okay, thanks. I output the file with dput. It's 219K, 3,400 lines (the original is 12,600 lines). I can't just attach it, can I? The first lines look like: structure(list(userID = c(78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, and so on for all the lines for user 78, then the next one... Can I get it to you?
    – Phil Rennert
    Nov 14 at 22:16
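
One likely explanation for the uneven counts (247, 222, 215, ...) is ties, though this is an assumption since the full data were not posted. `top_n()` ranks with `min_rank()`, so every row tied at the cutoff value of the ranking column is kept, and a group can end up with more than n rows. The small data frame below is hypothetical (the `score` column is made up) and shows the effect with n = 2, followed by a position-based alternative using `filter(row_number() <= n)`:

```r
library(dplyr)

# Hypothetical data: the last column (score) contains ties within each userID
df <- data.frame(userID = c(rep(78, 5), rep(79, 5)),
                 score  = c(1, 2, 2, 2, 3, 4, 4, 4, 4, 5))

# top_n(2) keeps every row whose rank is <= 2, so ties at the cutoff all survive
tied <- df %>%
  group_by(userID) %>%
  top_n(2, wt = score) %>%
  ungroup()
table(tied$userID)     # 4 rows for 78, 5 rows for 79

# Position-based: exactly the first 2 rows of each userID, in row order
first_n <- df %>%
  group_by(userID) %>%
  filter(row_number() <= 2) %>%
  ungroup()
table(first_n$userID)  # 2 rows each
```

Swapping `top_n(200)` for `filter(row_number() <= 200)` in the answer's pipeline should therefore give at most 200 rows per userID, taken in their existing row order.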
