R remove rows from dataframe after occurrences of a value reach a limit
I have an R dataframe sorted by the first value.
There are many different rows with each first value.
I want to keep the first 200 rows with each first value, and remove all the others.
So, for example, if I start with 300 "1 whatever..." rows and 400 "2 whatever..." rows, what I want is 400 rows: the first 200 "1" rows, then the first 200 "2" rows.
Thanks in advance...
r rows counting
Please make this question reproducible. This includes sample code (including listing non-base R packages), sample data (e.g., dput(head(x))), and expected output. Refs: stackoverflow.com/questions/5963269, stackoverflow.com/help/mcve, and stackoverflow.com/tags/r/info.
– r2evans
Nov 11 at 4:43
Though if I had to guess, it could be something like do.call(rbind.data.frame, by(mtcars, mtcars$cyl, head, n=3)).
– r2evans
Nov 11 at 4:43
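The one-liner in the comment above uses mtcars as a stand-in. Here is a minimal base R sketch of the same idea with the row counts from the question, assuming a hypothetical data frame `df` with a grouping column `id` and a payload column `value` (none of these names come from the original post):

```r
# Toy data standing in for the asker's data frame (hypothetical names):
# 300 rows with id 1 followed by 400 rows with id 2, already sorted by id.
df <- data.frame(id = c(rep(1, 300), rep(2, 400)), value = seq_len(700))

# Split by id, keep the first 200 rows of each piece, then stack the pieces.
first_200 <- do.call(rbind, by(df, df$id, head, n = 200))

# Rows kept per id.
table(first_200$id)
```

Because `head()` works by position, the rows kept are the first 200 of each group in their existing order, regardless of the values in the other columns.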
asked Nov 11 at 3:18
Phil Rennert
1 Answer
Please make questions reproducible in the future, and include the steps you have already tried. Example data also helps us answer you more quickly.
Here is a little example I made up using the dplyr package:
    library(dplyr)    # group_by() and top_n()
    library(magrittr) # %>% pipe operator
    data <- data.frame(X = c(rep(1, 300), rep(2, 300)), Y = 1:600)
    subdata <- data %>%
      group_by(X) %>%
      top_n(200)
This will end with 400 rows: 200 '1' rows and 200 '2' rows. Let me know if you have any issues.
Thanks; this worked, partly. I did everything you said, working with my dataframe; but when I exported subdata and looked at it, there were 247 lines with the first value of userID (the first column in my dataframe), then 222 lines with the next value of userID, then 215 lines with the next value, then 235 with the next, etc. So this is pruning the number of rows for each userID, but not uniformly. I haven't used dplyr before, and I don't know why.
– Phil Rennert
Nov 12 at 17:32
That's weird. Can you show me your data using the dput(data) command? There could be some weirdness going on if some columns are factors or something, but I can do some testing to see if I can reproduce your issue.
– RAB
Nov 12 at 20:23
Okay, thanks. I output the file with dput. It's 219K, 3400 lines (original is 12,600 lines). I can't just attach it, can I? The first lines look like: structure(list(userID = c(78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, 78L, and so on for all the lines for user 78, then the next one... Can I get it to you?
– Phil Rennert
Nov 14 at 22:16
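One plausible explanation for the uneven counts, not confirmed in the thread: `top_n()` selects rows by rank rather than by position. When no `wt` is given it ranks by the last column, and it keeps every row tied at the cutoff, so a group can end up with more than 200 rows. A minimal sketch of the difference, assuming a hypothetical `score` column with repeated values (only `userID` appears in the thread):

```r
library(dplyr)

# Toy data with many ties in the ranking column (hypothetical values).
df <- data.frame(userID = c(rep(78, 300), rep(79, 400)),
                 score  = rep(1:10, 70))

# Rank-based: keeps the rows with the largest scores, including all ties
# at the cutoff, so a group's row count can exceed 200.
by_rank <- df %>%
  group_by(userID) %>%
  top_n(200, score)

# Position-based: keeps the first 200 rows of each group in their existing order.
by_position <- df %>%
  group_by(userID) %>%
  filter(row_number() <= 200) %>%
  ungroup()

count(by_rank, userID)
count(by_position, userID)
```

With dplyr 1.0.0 or later, `slice_head(n = 200)` after `group_by()` is an equivalent position-based alternative to the `filter(row_number() <= 200)` step.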
edited Nov 11 at 10:17
answered Nov 11 at 10:09
RAB