Mean of data with different probabilities










I'm currently facing the following problem:
During an experiment I gathered the counts of events per channel (detector). Plotting the counts against the channels gives almost a normal distribution. I'd now like to calculate the mean of this data set. The problem is that not all of the events that generated the data occur with the same probability, but the probability for each channel is known.



To me this situation seems very similar to calculating the mean of a histogram. I would therefore take the middle value of each channel, multiply it by the corresponding count, sum all the products and then divide by the total number of channels.



My implementation for this is:



import numpy as np
import matplotlib.pyplot as plt

counts = ...  # see the end of the post for the data set in question
channels = np.arange(1, len(counts) + 1)
channel_probability = ...  # probability for different parts of channels

# First attempt: treat the data like a histogram
mean = sum((channels + 1) / 2 * counts) / len(counts)

plt.figure()
plt.plot(channels, counts)  # channels on the x-axis, counts on the y-axis
plt.stem([mean], [100])
plt.xlabel("channels")
plt.ylabel("counts")
plt.show()


The problem is that this assumes the same probability for all the events. I therefore tried the naive approach of simply multiplying by the probability as well:



mean = sum((channels + 1) / 2 * counts * channel_probability) / len(counts)


But this of course only led to completely unreasonable results... So, can someone maybe explain how I would find the mean of such a distribution and how to calculate it?




As mentioned above, here is something similar to the data set I'm using:



counts = np.array([
    2.05209753, 2.07860064, 2.06269877, 2.0706497, 2.07595033, 2.03619567,
    2.03619567, 2.06269877, 2.02029381, 2.00439194, 2.01499318, 1.9937907,
    1.98583977, 1.99909132, 1.99909132, 2.00439194, 1.98583977, 1.98849008,
    1.99644101, 2.01499318, 2.00439194, 2.0176435, 2.02824474, 1.99909132,
    2.00174163, 2.03354536, 2.05474784, 2.05474784, 2.04944722, 2.11305467,
    2.07330002, 2.13955778, 2.18461305, 2.19256399, 2.21906709, 2.25617144,
    2.23496895, 2.25617144, 2.31182796, 2.32772982, 2.36483417, 2.3992882,
    2.42844162, 2.49734969, 2.56890807, 2.56095714, 2.59541118, 2.59541118,
    2.63516583, 2.68817204, 2.6272149, 2.66961987, 2.6272149, 2.66961987,
    2.60336211, 2.62191428, 2.56890807, 2.5503559, 2.53975466, 2.52385279,
    2.45229441, 2.42844162, 2.39133727, 2.29592609, 2.27737392, 2.26147206,
    2.21906709, 2.14220809, 2.17666212, 2.09185219, 2.03619567, 2.02824474,
    2.05209753, 2.00439194, 1.97788884, 1.97788884, 1.9672876, 1.96463729,
    1.96993791, 1.95403604, 1.94608511, 1.9434348, 1.9434348, 1.93548387,
    1.93813418, 1.9434348, 1.94078449, 1.93813418, 1.94078449, 1.9434348,
])









python scientific-computing






asked Nov 2 '18 at 13:38 by Sito












  • This sounds like a question for CrossValidated, because it is more about the statistical concept than about its implementation in a script.

    – Mr. T
    Nov 2 '18 at 14:33











  • Could you please share the number of channels and the probability of each channel?

    – David
    Nov 12 '18 at 18:01

















1 Answer
My assumptions:



  • You have the counts of events for each channel.

  • You know the probability of each channel.

Suppose you have a fair die with nine sides, each showing a number.



numbs = [10, 24, 26, 8, 17, 6, 9, 15, 20]



Each number has the same probability: 1/9. You may ask, what is the expected value of the die? With Python it is easy:



prob_li = []
for l, prob in zip(numbs, [1/9] * 9):
    # each side contributes its value times its probability
    prob_li.append(l * prob)

print(sum(prob_li))  # approximately 15


If the probability of each side changes, say something like



probs = [1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45]



the expected value is



prob_li = []
for l, prob in zip(numbs, probs):
    # each side contributes its value times its own probability
    prob_li.append(l * prob)

print(sum(prob_li))  # approximately 16.13
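
The same expected value can be computed without an explicit loop. The following is only a minimal sketch, assuming NumPy is available; `expected_value` is a hypothetical helper name introduced for illustration, and the check that the probabilities sum to 1 is an added assumption about the input:

```python
import numpy as np

numbs = [10, 24, 26, 8, 17, 6, 9, 15, 20]
probs = [1/9, 1/9, 1/9, 1/9, 1/9, 1/10, 1/20, 1/20, 11/45]


def expected_value(values, probabilities):
    """Return sum(values[i] * probabilities[i]) for a discrete distribution."""
    values = np.asarray(values, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)
    # Assumption: the probabilities describe a full distribution and sum to 1.
    if not np.isclose(probabilities.sum(), 1.0):
        raise ValueError("probabilities must sum to 1")
    # The dot product is the sum of the element-wise products.
    return float(np.dot(values, probabilities))


print(round(expected_value(numbs, probs), 2))  # 16.13
```

Passing `[1/9] * 9` as the probabilities reproduces the equal-probability case from above.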


Now suppose that you construct a matrix in which column i has the probability probs[i]:



import numpy as np

np.random.seed(4)  # fixed seed so the example is reproducible
mat = np.random.randint(6, 20, size=(3, 9))


mat is a matrix with shape (3, 9). I would find the expected value as:



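result = mat * probs  # weight each column by its probability
print(sum(mat.mean(axis=0) * probs))  # 12.82 (column means, then weighted sum)
print(sum(result.sum(axis=0)))  # 38.46 (column sums of the already weighted matrix)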


To me, 12.82 makes more sense than 38.46. Besides, since you said that plotting the counts against the channels gives almost a normal distribution, you would only need to find the mean of each channel and then compute the expected value.
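
For the channel data in the question, one possible reading is a weighted average over the channel numbers. The sketch below is an illustration under assumptions, not a confirmed solution: the counts and probabilities are made up, `weighted_channel_mean` is a hypothetical helper name, and treating `counts * probs` as the weight of each channel is only one interpretation of "probability per channel".

```python
import numpy as np


def weighted_channel_mean(channels, weights):
    """Weighted mean of channel numbers: sum(w * x) / sum(w)."""
    channels = np.asarray(channels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # np.average divides by the sum of the weights, so the result always
    # lies between the smallest and the largest channel number.
    return float(np.average(channels, weights=weights))


# Made-up example data (not the data set from the question).
counts = np.array([2.0, 4.0, 8.0, 4.0, 2.0])
probs = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
channels = np.arange(1, len(counts) + 1)

# Plain histogram mean: every channel is weighted by its count only.
print(round(weighted_channel_mean(channels, counts), 2))  # 3.0

# Assumption: each channel's weight is its count times its probability.
print(round(weighted_channel_mean(channels, counts * probs), 2))  # 2.41
```

A weighted mean is normalised by the sum of the weights. Dividing by `len(counts)` instead does not normalise them, which may be one reason the attempts in the question gave values outside the channel range.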






answered Nov 13 '18 at 3:41 by David












  • Sorry for not replying to your comment, I didn't have time until now. Your solution produces reasonable results for me, so I marked it as answered. Thank you very much!

    – Sito
    Nov 14 '18 at 14:06
















