pandas - vectorized formula computation with nans

I have a DataFrame (called signal) that is a simple time series with 5 columns. This is what its .describe() output looks like:

                ES           NK           NQ           YM
count  5294.000000  6673.000000  4798.000000  3415.000000
mean     -0.000340     0.000074    -0.000075    -0.000420
std       0.016726     0.018401     0.023868     0.015399
min      -0.118724    -0.156342    -0.144667    -0.103101
25%      -0.008862    -0.010297    -0.011481    -0.008162
50%      -0.001422    -0.000590    -0.001747    -0.001324
75%       0.007069     0.009163     0.009841     0.006304
max       0.156365     0.192686     0.181245     0.132630

I want to apply a simple function on every single row, and receive back a matrix with the same dimensions:

weights = -2 * signal.subtract(signal.mean(axis=1), axis=0).divide(signal.sub(signal.mean(axis=1), axis=0).abs().sum(axis=1), axis=0)

However, when I run this line, the program gets stuck. I believe the issue comes from the columns having different lengths and containing NaNs. Dropping or filling the NaNs is not an option: for any row that contains a NaN, I want that NaN to simply be excluded from the computation. A temporary solution would be to do this iteratively using .iterrows(), but that is not efficient.

Are there any smart solutions to this problem?

asked Nov 9 at 22:09

Évariste Galois

30912

What are you trying to do?
– coldspeed
Nov 9 at 22:12

For every row, I want to apply a formula on that row that gives me a new value for each column. I want to expand that operation downwards without looping over every row.
– Évariste Galois
Nov 9 at 22:13

My question is: what is that operation? Perhaps a scikit-learn package will do it better.
– coldspeed
Nov 9 at 22:14

The formula for one row is: w = (-1/N) * (r_i - r_mean), where N is the number of non-NaN values in the row, r_i is each column value, and r_mean is the average across the columns.
– Évariste Galois
Nov 9 at 22:24

When you say "the program gets stuck", do you mean that it raises an error? If so, could you please include the text of the error message in your question?
– tel
Nov 10 at 2:06
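
The per-row formula given in the comments above, w = (-1/N) * (r_i - r_mean), can be sketched in vectorized form roughly as follows. This is only a minimal illustration, not the asker's code: the toy `signal` frame and the names `n`, `r_mean` and `weights` are assumptions made for the example, and it relies on `count` and `mean` skipping NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `signal`; NaN marks a missing observation.
signal = pd.DataFrame(
    {
        "ES": [0.01, np.nan, 0.03],
        "NK": [0.02, 0.04, np.nan],
        "NQ": [np.nan, 0.00, 0.01],
        "YM": [0.03, 0.02, 0.02],
    }
)

n = signal.count(axis=1)      # N: number of non-NaN values in each row
r_mean = signal.mean(axis=1)  # row mean over the non-NaN values (skipna=True by default)

# w = (-1/N) * (r_i - r_mean), broadcast down the rows with axis=0
weights = signal.sub(r_mean, axis=0).mul(-1.0 / n, axis=0)

print(weights)
```

Cells that were NaN in the input stay NaN in the output, while N and the row mean are computed from the observed values only, so no explicit loop over rows is needed.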

python pandas numpy


1 Answer

accepted

The pandas mean and sum methods already exclude NaN values by default (see the description of the skipna keyword in their documentation). Additionally, subtract and divide accept a fill_value keyword argument:

fill_value : None or float value, default None

Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing

So you may be able to get what you want by setting fill_value=0 in the calls to subtract, and fill_value=1 in the calls to divide.

However, I suspect that the default behavior (NaN is ignored in mean and sum, NaN - anything = NaN, NaN / anything = NaN) is what you actually want. In that case, your problem isn't directly related to NaNs, and you will need to clarify your statement "when I run this line, the program gets stuck" in order to get a useful answer.
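
As a rough check of that default behavior, the sketch below runs the question's formula on a small made-up frame. The data and the intermediate names `row_mean`, `demeaned` and `scale` are assumptions for illustration only; the real `signal` is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in the second row.
signal = pd.DataFrame(
    {"ES": [0.01, np.nan], "NK": [-0.02, 0.03], "NQ": [0.04, -0.01]}
)

row_mean = signal.mean(axis=1)                 # NaN skipped: row 1 averages two values
demeaned = signal.subtract(row_mean, axis=0)   # NaN - mean stays NaN
scale = demeaned.abs().sum(axis=1)             # NaN skipped in the sum as well
weights = -2 * demeaned.divide(scale, axis=0)  # NaN / scale stays NaN

print(weights)
```

With the defaults, the missing cell stays NaN in `weights`, and the remaining entries of each row are scaled so that their absolute values sum to 2.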

edited Nov 10 at 2:18

answered Nov 10 at 2:09

tel

3,3011427
