Mean of data with different probabilities
I'm currently facing the following problem:
During an experiment I gathered the counts of events per channel (detector). Plotting the counts against the channels gives almost a normal distribution. I'd now like to calculate the mean of this data set. The problem is that not all of the events that generated the data occur with the same probability, but the probability for each channel is known.
To me this situation seems very similar to calculating the mean of a histogram, so I would take the middle value of each channel, multiply it by the count in that channel, sum all the values up and then divide by the total number of channels.
My implementation for this is:
import numpy as np
import matplotlib.pyplot as plt
counts = ... # see at the end of the post for the data set in question
channels = np.arange(1,len(counts)+1)
channel_probability = ...  # probability for the different parts of the channels
mean = sum((channels+1)/2 * counts)/len(counts)
plt.figure()
plt.plot(channels, counts)  # channels on the x-axis, counts on the y-axis, matching the labels
plt.stem([mean], [100])
plt.xlabel("channels")
plt.ylabel("counts")
plt.show()
The problem is that this assumes all events occur with the same probability. So I tried the naive approach of also multiplying by the probability:
mean = sum((channels+1)/2 * counts * channel_probability)/len(counts)
However, this gave completely unreasonable results. Can someone explain how to find the mean of such a distribution and how to calculate it?
As mentioned above, here is something similar to the data set I'm using:
counts = np.array([2.05209753, 2.07860064, 2.06269877, 2.0706497, 2.07595033, 2.03619567,
2.03619567, 2.06269877, 2.02029381, 2.00439194, 2.01499318, 1.9937907,
1.98583977, 1.99909132, 1.99909132, 2.00439194, 1.98583977, 1.98849008,
1.99644101, 2.01499318, 2.00439194, 2.0176435, 2.02824474, 1.99909132,
2.00174163, 2.03354536, 2.05474784, 2.05474784, 2.04944722, 2.11305467,
2.07330002, 2.13955778, 2.18461305, 2.19256399, 2.21906709, 2.25617144,
2.23496895, 2.25617144, 2.31182796, 2.32772982, 2.36483417, 2.3992882,
2.42844162, 2.49734969, 2.56890807, 2.56095714, 2.59541118, 2.59541118,
2.63516583, 2.68817204, 2.6272149, 2.66961987, 2.6272149, 2.66961987,
2.60336211, 2.62191428, 2.56890807, 2.5503559, 2.53975466, 2.52385279,
2.45229441, 2.42844162, 2.39133727, 2.29592609, 2.27737392, 2.26147206,
2.21906709, 2.14220809, 2.17666212, 2.09185219, 2.03619567, 2.02824474,
2.05209753, 2.00439194, 1.97788884, 1.97788884, 1.9672876, 1.96463729,
1.96993791, 1.95403604, 1.94608511, 1.9434348, 1.9434348, 1.93548387,
1.93813418, 1.9434348, 1.94078449, 1.93813418, 1.94078449, 1.9434348])
python scientific-computing
asked Nov 2 '18 at 13:38
Sito
This sounds like a question for Cross Validated, because it is about the statistical concept rather than its implementation in a script.
– Mr. T
Nov 2 '18 at 14:33
Could you please share the number of channels and the probability of each channel?
– David
Nov 12 '18 at 18:01
1 Answer
My assumptions:
- You have the counts of events for each channel.
- You know the probability of each channel.
Suppose you have a fair nine-sided die with a number on each side:
numbs = [10, 24, 26, 8, 17, 6, 9, 15, 20]
Each number has the same probability, 1/9. You may ask: what is the expected value of the die? With Python it is easy:
prob_li = []
for num, prob in zip(numbs, [1/9] * 9):
    prob_li.append(num * prob)  # value times its probability
print(sum(prob_li))  # about 15
If the sides have different probabilities, for example
probs = [1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45]
the expected value is
prob_li = []
for num, prob in zip(numbs, probs):
    prob_li.append(num * prob)  # value times its probability
print(sum(prob_li))  # about 16.13
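The two loops above can also be written with NumPy. This is a minimal sketch, not part of the original answer: np.dot sums the element-wise products, and np.average with weights= divides by the sum of the weights, which is 1 for these probabilities, so both give the same expected value.

```python
import numpy as np

numbs = np.array([10, 24, 26, 8, 17, 6, 9, 15, 20])
probs = np.array([1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45])

# Fair die: every side has probability 1/9, so the expected value is the plain mean.
fair_ev = numbs.mean()

# Loaded die: expected value = sum of value * probability.
loaded_ev = np.dot(numbs, probs)

# np.average normalizes by probs.sum(), which is 1 here, so it agrees with np.dot.
loaded_ev_avg = np.average(numbs, weights=probs)

print(fair_ev)                               # 15.0
print(round(loaded_ev, 2))                   # 16.13
print(np.isclose(loaded_ev, loaded_ev_avg))  # True
```

np.average is convenient when the weights do not already sum to 1, because it does the normalization for you.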
Now suppose you build a matrix in which column i has probability probs[i]:
import numpy as np

np.random.seed(4)
mat = np.random.randint(6, 20, size=(3, 9))
mat is a matrix with shape (3, 9). I would compute the expected value as
result = mat * probs  # broadcasting weights each column by its probability
print(sum(mat.mean(axis=0) * probs))  # 12.82
print(sum(result.sum(axis=0)))        # 38.46
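Up to rounding, 38.46 is exactly three times 12.82. Summing the probability-weighted entries over all three rows, instead of averaging over them, multiplies the expected value by the number of rows. A small check of that identity (my own sketch, using the same seed and shape; it checks the relationship rather than specific printed values):

```python
import numpy as np

np.random.seed(4)
mat = np.random.randint(6, 20, size=(3, 9))
probs = np.array([1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45])

# Expected value: average each column over the rows, then weight by probs.
ev_of_means = np.dot(mat.mean(axis=0), probs)

# Summing over the rows instead of averaging scales the result by the row count.
weighted_total = (mat * probs).sum()

n_rows = mat.shape[0]
print(np.isclose(weighted_total, n_rows * ev_of_means))  # True
```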
To me, 12.82 makes more sense than 38.46. Also, since you said that plotting the counts against the channels gives almost a normal distribution, you would only need to find the mean of each channel and then the expected value.
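A different, commonly used formulation for the data in the question is a weighted mean of the channel numbers: weight each channel by its count times its probability and divide by the sum of those weights. Dividing by len(counts), as in the question, does not give a mean unless the weights happen to sum to len(counts). The sketch below assumes channel_probability holds one weight per channel that should multiply the counts; the helper name weighted_channel_mean and the short arrays are hypothetical placeholders, not the real data. If the probability is instead a detection efficiency, the counts would typically be divided by it before averaging.

```python
import numpy as np

def weighted_channel_mean(counts, channel_probability):
    """Mean channel number, weighting each channel by count * probability.

    Assumption: channel_probability is a per-channel weight, same length as counts.
    """
    counts = np.asarray(counts, dtype=float)
    channel_probability = np.asarray(channel_probability, dtype=float)
    channels = np.arange(1, len(counts) + 1)  # channel numbers 1..N, as in the question
    weights = counts * channel_probability
    # np.average divides by weights.sum(), so the weights need not sum to 1
    return np.average(channels, weights=weights)

# Hypothetical placeholder data, symmetric around channel 3
counts = [1.0, 2.0, 4.0, 2.0, 1.0]

# Equal probability for every channel (the overall scale does not matter)
uniform = [1.0] * 5
print(weighted_channel_mean(counts, uniform))  # 3.0

# Giving the higher channels more weight pulls the mean upwards
skewed = [0.1, 0.1, 0.2, 0.3, 0.3]
print(weighted_channel_mean(counts, skewed))
```

With equal probabilities this reduces to the ordinary count-weighted histogram mean, which is a useful sanity check before plugging in the real channel_probability.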
answered Nov 13 '18 at 3:41
David
Sorry for not replying to your comment earlier; I didn't have time until now. Your solution produces reasonable results for me, so I marked it as the answer. Thank you very much!
– Sito
Nov 14 '18 at 14:06