Mean of data with different probabilities

I'm currently facing the following problem:
During an experiment I gathered the counts of events per channel (detector). Plotting the counts against the channels gives almost a normal distribution. I'd now like to calculate the mean of this data set. The problem is that not all of the events that generated the data occur with the same probability, but the probability for each channel is known.

To me this situation seems very similar to calculating the mean of a histogram, so I would take the middle value of each channel, multiply it by the corresponding counts, sum up the products and then divide by the total number of channels.

My implementation for this is:

import numpy as np
import matplotlib.pyplot as plt
counts = ...  # see the end of the post for the data set in question
channels = np.arange(1, len(counts) + 1)
channel_probability = ...  # probability for different parts of channels

mean = sum((channels+1)/2 * counts)/len(counts)

plt.figure()
plt.plot(channels, counts)  # channels on the x-axis, matching the axis labels below
plt.stem([mean], [100])
plt.xlabel("channels")
plt.ylabel("counts")
plt.show()

The problem is that this assumes the same probability for all the events. Therefore I tried the naive approach of just multiplying by the probability as well:

mean = sum((channels+1)/2 * counts * channel_probability)/len(counts)

But this of course only led to completely unreasonable results... So, can someone maybe explain how I would find the mean of such a distribution and how to calculate it?

As mentioned above, here is something similar to the data set I'm using:

counts = np.array([2.05209753, 2.07860064, 2.06269877, 2.0706497, 2.07595033, 2.03619567,
 2.03619567, 2.06269877, 2.02029381, 2.00439194, 2.01499318, 1.9937907,
 1.98583977, 1.99909132, 1.99909132, 2.00439194, 1.98583977, 1.98849008,
 1.99644101, 2.01499318, 2.00439194, 2.0176435, 2.02824474, 1.99909132,
 2.00174163, 2.03354536, 2.05474784, 2.05474784, 2.04944722, 2.11305467,
 2.07330002, 2.13955778, 2.18461305, 2.19256399, 2.21906709, 2.25617144,
 2.23496895, 2.25617144, 2.31182796, 2.32772982, 2.36483417, 2.3992882,
 2.42844162, 2.49734969, 2.56890807, 2.56095714, 2.59541118, 2.59541118,
 2.63516583, 2.68817204, 2.6272149, 2.66961987, 2.6272149, 2.66961987,
 2.60336211, 2.62191428, 2.56890807, 2.5503559, 2.53975466, 2.52385279,
 2.45229441, 2.42844162, 2.39133727, 2.29592609, 2.27737392, 2.26147206,
 2.21906709, 2.14220809, 2.17666212, 2.09185219, 2.03619567, 2.02824474,
 2.05209753, 2.00439194, 1.97788884, 1.97788884, 1.9672876, 1.96463729,
 1.96993791, 1.95403604, 1.94608511, 1.9434348, 1.9434348, 1.93548387,
 1.93813418, 1.9434348, 1.94078449, 1.93813418, 1.94078449, 1.9434348])

asked Nov 2 '18 at 13:38

Sito

192212

This sounds like a question for CrossValidated because it is rather a question about the statistical concept than its implementation into a script.

– Mr. T
Nov 2 '18 at 14:33

Could you please share the number of channels and the probability of each channel?

– David
Nov 12 '18 at 18:01


python scientific-computing


1 Answer

My assumptions:

You have the counts of events for each channel.

You know the probability of each channel.

Suppose you have a fair nine-sided die, with a number on each side.

numbs = [10, 24, 26, 8, 17, 6, 9, 15, 20]

Each number has the same probability: 1/9. You may ask, what is the expected value of the die? With Python it is easy to compute.

prob_li = []
for l, prob in zip(numbs, [1/9] * 9):
    prob_li.append(l * prob)  # value times its probability

print(sum(prob_li)) # 15

If the probability of each side changes, say something like

probs = [1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45]

the expected value is

prob_li = []
for l, prob in zip(numbs, probs):
    prob_li.append(l * prob)  # value times its probability

print(sum(prob_li)) # 16.13
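The same expected value can also be written without an explicit loop. The following is only a sketch of an equivalent NumPy formulation, not something the loop above requires; the names expected and expected_avg are introduced here for illustration. Note that np.average divides by the sum of the weights, so it only matches the plain dot product because probs sums to 1.

```python
import numpy as np

numbs = np.array([10, 24, 26, 8, 17, 6, 9, 15, 20])
probs = np.array([1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45])

# Expected value as the dot product of values and probabilities.
expected = np.dot(numbs, probs)

# np.average normalises by sum(weights); probs sums to 1 here,
# so both expressions agree.
expected_avg = np.average(numbs, weights=probs)

print(f"{expected:.2f}")  # 16.13
```

If the weights did not sum to 1, np.average would still return a properly normalised weighted mean, whereas the dot product would not.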

Now suppose that you construct a matrix, and each column has a probability probs[i]

import numpy as np

np.random.seed(4)
mat = np.random.randint(6, 20, size=(3, 9))

mat is a matrix with shape (3, 9). I would find the expected value as

result = mat * probs  # weight each column by its probability
print(sum(mat.mean(axis=0) * probs))  # 12.82
print(sum(result.sum(axis=0)))  # 38.46

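To me, 12.82 makes more sense than 38.46: the second figure adds up all three rows instead of averaging them, which is why it is three times as large. Besides, since you said that plotting the counts against the channels gives almost a normal distribution, you would only need to find the mean of each channel and then take the expected value.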
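Applied to counts per channel, the same idea might look like the sketch below. It is an illustration under stated assumptions, not a tested solution for the data in the question: weighted_channel_mean and channel_weight are hypothetical names, the toy counts are made up, and whether a known channel probability should enter as a multiplying or a dividing factor depends on what that probability means physically. The one fixed point is that the mean of a histogram divides by the total weight (the summed counts), not by the number of channels.

```python
import numpy as np

def weighted_channel_mean(channels, counts, channel_weight=None):
    """Mean channel position, with each channel weighted by its counts.

    channel_weight is an optional, hypothetical per-channel factor
    (for example derived from a known channel probability).
    """
    channels = np.asarray(channels, dtype=float)
    weights = np.asarray(counts, dtype=float)
    if channel_weight is not None:
        weights = weights * np.asarray(channel_weight, dtype=float)
    # np.average divides by sum(weights), i.e. by the total (weighted) counts.
    return np.average(channels, weights=weights)

# Toy data (made up): a symmetric peak centred on channel 3.
toy_counts = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
toy_channels = np.arange(1, len(toy_counts) + 1)

print(weighted_channel_mean(toy_channels, toy_counts))  # 3.0
```

Passing channel_weight shifts the mean toward the channels that receive a larger factor; with equal factors the result is unchanged.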

answered Nov 13 '18 at 3:41

David

1357

Sorry for not answering to your comment, didn't have time till now. Your solution produces reasonable results for me, so I marked it as answered. Thank you very much!

– Sito
Nov 14 '18 at 14:06

