Repeated Sampling
I have a question about repeated sampling. Let's say I am interested in the distribution of sample means. So what I would do is generate 10000 times a sample of size 1000 and look at the mean of each sample. Can I instead just take one sample of size 10000*1000 and then look at the mean of the first 1000 elements than from 1001 to 2000 and so on?
r random statistics sampling
add a comment |
I have a question about repeated sampling. Let's say I am interested in the distribution of sample means. So what I would do is generate 10000 times a sample of size 1000 and look at the mean of each sample. Can I instead just take one sample of size 10000*1000 and then look at the mean of the first 1000 elements than from 1001 to 2000 and so on?
r random statistics sampling
Yes, but it's simpler to do it the first way:X <- replicate(10000, rnorm(1000)); colMeans(X)
. Instead ofrnorm
use the distribution of your choice. And you shouldset.seed(<something>)
before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38
add a comment |
I have a question about repeated sampling. Let's say I am interested in the distribution of sample means. So what I would do is generate 10000 times a sample of size 1000 and look at the mean of each sample. Can I instead just take one sample of size 10000*1000 and then look at the mean of the first 1000 elements than from 1001 to 2000 and so on?
r random statistics sampling
I have a question about repeated sampling. Let's say I am interested in the distribution of sample means. So what I would do is generate 10000 times a sample of size 1000 and look at the mean of each sample. Can I instead just take one sample of size 10000*1000 and then look at the mean of the first 1000 elements than from 1001 to 2000 and so on?
r random statistics sampling
r random statistics sampling
asked Nov 11 '18 at 23:41
Johannes Heß
1
1
Yes, but it's simpler to do it the first way:X <- replicate(10000, rnorm(1000)); colMeans(X)
. Instead ofrnorm
use the distribution of your choice. And you shouldset.seed(<something>)
before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38
add a comment |
Yes, but it's simpler to do it the first way:X <- replicate(10000, rnorm(1000)); colMeans(X)
. Instead ofrnorm
use the distribution of your choice. And you shouldset.seed(<something>)
before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38
Yes, but it's simpler to do it the first way:
X <- replicate(10000, rnorm(1000)); colMeans(X)
. Instead of rnorm
use the distribution of your choice. And you should set.seed(<something>)
before generating pseudo-random numbers.– Rui Barradas
Nov 12 '18 at 11:38
Yes, but it's simpler to do it the first way:
X <- replicate(10000, rnorm(1000)); colMeans(X)
. Instead of rnorm
use the distribution of your choice. And you should set.seed(<something>)
before generating pseudo-random numbers.– Rui Barradas
Nov 12 '18 at 11:38
add a comment |
4 Answers
4
active
oldest
votes
If you're controlling for the seed, both approaches should yield identical outcomes:
set.seed(1)
mean(sample(1:9, 3))
#[1] 5.666667
mean(sample(1:9, 3))
#[1] 4
mean(sample(1:9, 3))
# [1] 5.333333
set.seed(1)
x <- sample(1:9)
mean(x[1:3])
#[1] 5.666667
mean(x[4:6])
#[1] 4
mean(x[7:9])
# [1] 5.333333
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
add a comment |
Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be normally distributed with a mean of 0.5.
# set seed to make reproducible
set.seed(95014)
# generate 10,000 means of 1,000 items pulled from a uniform distribution
mean_x <- NULL
for (i in 1:10000)
mean_x <- c(mean_x,mean(runif(1000)))
hist(mean_x)
...and the output:
add a comment |
@ Len Greski
I can also do it that way right?
a <- runif(10000000)
j <- 1
x <- NULL
while (j <= 10000000)
x <- c(x,mean(a[j:(j+999)]))
j <- j + 1000
x
hist(x)
Yes, you can also accomplish the same result with the code you posted above if you addset.seed(95014)
before thea <- runif(...)
line. You can confirm this by calculating the difference betweenx
in your code andmean_x
in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11
add a comment |
I would say yes. In taking 10,000,000 samples you've randomly sampled most of the experimental space. If you set.seed the same for both the approaches you mention you get the exact same answer. If you change the seed and run a t-test, the results are not significantly different.
#First Method
seed <- 5554
set.seed(seed)
group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))
set.seed(seed)
mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))
#Method you propose
set.seed(5554)
big_sample <- data.frame(
group=rep(1:10000, each=1000),
samples=rnorm(10000 * 1000, 0, 1)
)
group_means_2 <- aggregate(samples ~ group,
FUN = mean,
data=big_sample)
mean_of_means_2 <- mean(group_means_2$samples)
#comparison
mean_of_means_1 == mean_of_means_2
t.test(group_of_means_1, group_means_2$samples)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53254368%2frepeated-sampling%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
If you're controlling for the seed, both approaches should yield identical outcomes:
set.seed(1)
mean(sample(1:9, 3))
#[1] 5.666667
mean(sample(1:9, 3))
#[1] 4
mean(sample(1:9, 3))
# [1] 5.333333
set.seed(1)
x <- sample(1:9)
mean(x[1:3])
#[1] 5.666667
mean(x[4:6])
#[1] 4
mean(x[7:9])
# [1] 5.333333
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
add a comment |
If you're controlling for the seed, both approaches should yield identical outcomes:
set.seed(1)
mean(sample(1:9, 3))
#[1] 5.666667
mean(sample(1:9, 3))
#[1] 4
mean(sample(1:9, 3))
# [1] 5.333333
set.seed(1)
x <- sample(1:9)
mean(x[1:3])
#[1] 5.666667
mean(x[4:6])
#[1] 4
mean(x[7:9])
# [1] 5.333333
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
add a comment |
If you're controlling for the seed, both approaches should yield identical outcomes:
set.seed(1)
mean(sample(1:9, 3))
#[1] 5.666667
mean(sample(1:9, 3))
#[1] 4
mean(sample(1:9, 3))
# [1] 5.333333
set.seed(1)
x <- sample(1:9)
mean(x[1:3])
#[1] 5.666667
mean(x[4:6])
#[1] 4
mean(x[7:9])
# [1] 5.333333
If you're controlling for the seed, both approaches should yield identical outcomes:
set.seed(1)
mean(sample(1:9, 3))
#[1] 5.666667
mean(sample(1:9, 3))
#[1] 4
mean(sample(1:9, 3))
# [1] 5.333333
set.seed(1)
x <- sample(1:9)
mean(x[1:3])
#[1] 5.666667
mean(x[4:6])
#[1] 4
mean(x[7:9])
# [1] 5.333333
answered Nov 11 '18 at 23:59
12b345b6b78
767115
767115
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
add a comment |
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21
add a comment |
Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be normally distributed with a mean of 0.5.
# set seed to make reproducible
set.seed(95014)
# generate 10,000 means of 1,000 items pulled from a uniform distribution
mean_x <- NULL
for (i in 1:10000)
mean_x <- c(mean_x,mean(runif(1000)))
hist(mean_x)
...and the output:
add a comment |
Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be normally distributed with a mean of 0.5.
# set seed to make reproducible
set.seed(95014)
# generate 10,000 means of 1,000 items pulled from a uniform distribution
mean_x <- NULL
for (i in 1:10000)
mean_x <- c(mean_x,mean(runif(1000)))
hist(mean_x)
...and the output:
add a comment |
Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be normally distributed with a mean of 0.5.
# set seed to make reproducible
set.seed(95014)
# generate 10,000 means of 1,000 items pulled from a uniform distribution
mean_x <- NULL
for (i in 1:10000)
mean_x <- c(mean_x,mean(runif(1000)))
hist(mean_x)
...and the output:
Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be normally distributed with a mean of 0.5.
# set seed to make reproducible
set.seed(95014)
# generate 10,000 means of 1,000 items pulled from a uniform distribution
mean_x <- NULL
for (i in 1:10000)
mean_x <- c(mean_x,mean(runif(1000)))
hist(mean_x)
...and the output:
answered Nov 12 '18 at 0:15
Len Greski
3,1201421
3,1201421
add a comment |
add a comment |
@ Len Greski
I can also do it that way right?
a <- runif(10000000)
j <- 1
x <- NULL
while (j <= 10000000)
x <- c(x,mean(a[j:(j+999)]))
j <- j + 1000
x
hist(x)
Yes, you can also accomplish the same result with the code you posted above if you addset.seed(95014)
before thea <- runif(...)
line. You can confirm this by calculating the difference betweenx
in your code andmean_x
in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11
add a comment |
@ Len Greski
I can also do it that way right?
a <- runif(10000000)
j <- 1
x <- NULL
while (j <= 10000000)
x <- c(x,mean(a[j:(j+999)]))
j <- j + 1000
x
hist(x)
Yes, you can also accomplish the same result with the code you posted above if you addset.seed(95014)
before thea <- runif(...)
line. You can confirm this by calculating the difference betweenx
in your code andmean_x
in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11
add a comment |
@ Len Greski
I can also do it that way right?
a <- runif(10000000)
j <- 1
x <- NULL
while (j <= 10000000)
x <- c(x,mean(a[j:(j+999)]))
j <- j + 1000
x
hist(x)
@ Len Greski
I can also do it that way right?
a <- runif(10000000)
j <- 1
x <- NULL
while (j <= 10000000)
x <- c(x,mean(a[j:(j+999)]))
j <- j + 1000
x
hist(x)
answered Nov 12 '18 at 1:32
Johannes Heß
1
1
Yes, you can also accomplish the same result with the code you posted above if you addset.seed(95014)
before thea <- runif(...)
line. You can confirm this by calculating the difference betweenx
in your code andmean_x
in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11
add a comment |
Yes, you can also accomplish the same result with the code you posted above if you addset.seed(95014)
before thea <- runif(...)
line. You can confirm this by calculating the difference betweenx
in your code andmean_x
in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11
Yes, you can also accomplish the same result with the code you posted above if you add
set.seed(95014)
before the a <- runif(...)
line. You can confirm this by calculating the difference between x
in your code and mean_x
in mine. The result will be a vector of zeroes.– Len Greski
Nov 22 '18 at 15:11
Yes, you can also accomplish the same result with the code you posted above if you add
set.seed(95014)
before the a <- runif(...)
line. You can confirm this by calculating the difference between x
in your code and mean_x
in mine. The result will be a vector of zeroes.– Len Greski
Nov 22 '18 at 15:11
add a comment |
I would say yes. In taking 10,000,000 samples you've randomly sampled most of the experimental space. If you set.seed the same for both the approaches you mention you get the exact same answer. If you change the seed and run a t-test, the results are not significantly different.
#First Method
seed <- 5554
set.seed(seed)
group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))
set.seed(seed)
mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))
#Method you propose
set.seed(5554)
big_sample <- data.frame(
group=rep(1:10000, each=1000),
samples=rnorm(10000 * 1000, 0, 1)
)
group_means_2 <- aggregate(samples ~ group,
FUN = mean,
data=big_sample)
mean_of_means_2 <- mean(group_means_2$samples)
#comparison
mean_of_means_1 == mean_of_means_2
t.test(group_of_means_1, group_means_2$samples)
add a comment |
I would say yes. In taking 10,000,000 samples you've randomly sampled most of the experimental space. If you set.seed the same for both the approaches you mention you get the exact same answer. If you change the seed and run a t-test, the results are not significantly different.
#First Method
seed <- 5554
set.seed(seed)
group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))
set.seed(seed)
mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))
#Method you propose
set.seed(5554)
big_sample <- data.frame(
group=rep(1:10000, each=1000),
samples=rnorm(10000 * 1000, 0, 1)
)
group_means_2 <- aggregate(samples ~ group,
FUN = mean,
data=big_sample)
mean_of_means_2 <- mean(group_means_2$samples)
#comparison
mean_of_means_1 == mean_of_means_2
t.test(group_of_means_1, group_means_2$samples)
add a comment |
I would say yes. In taking 10,000,000 samples you've randomly sampled most of the experimental space. If you set.seed the same for both the approaches you mention you get the exact same answer. If you change the seed and run a t-test, the results are not significantly different.
#First Method
seed <- 5554
set.seed(seed)
group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))
set.seed(seed)
mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))
#Method you propose
set.seed(5554)
big_sample <- data.frame(
group=rep(1:10000, each=1000),
samples=rnorm(10000 * 1000, 0, 1)
)
group_means_2 <- aggregate(samples ~ group,
FUN = mean,
data=big_sample)
mean_of_means_2 <- mean(group_means_2$samples)
#comparison
mean_of_means_1 == mean_of_means_2
t.test(group_of_means_1, group_means_2$samples)
I would say yes. In taking 10,000,000 samples you've randomly sampled most of the experimental space. If you set.seed the same for both the approaches you mention you get the exact same answer. If you change the seed and run a t-test, the results are not significantly different.
#First Method
seed <- 5554
set.seed(seed)
group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))
set.seed(seed)
mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))
#Method you propose
set.seed(5554)
big_sample <- data.frame(
group=rep(1:10000, each=1000),
samples=rnorm(10000 * 1000, 0, 1)
)
group_means_2 <- aggregate(samples ~ group,
FUN = mean,
data=big_sample)
mean_of_means_2 <- mean(group_means_2$samples)
#comparison
mean_of_means_1 == mean_of_means_2
t.test(group_of_means_1, group_means_2$samples)
answered Nov 12 '18 at 5:47
Kgrey
1613
1613
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53254368%2frepeated-sampling%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Yes, but it's simpler to do it the first way:
X <- replicate(10000, rnorm(1000)); colMeans(X)
. Instead ofrnorm
use the distribution of your choice. And you shouldset.seed(<something>)
before generating pseudo-random numbers.– Rui Barradas
Nov 12 '18 at 11:38