pandas - vectorized formula computation with nans
I have a DataFrame (called signal) that is a simple time series with 5 columns. This is what its .describe() looks like:
                ES           NK           NQ           YM
count  5294.000000  6673.000000  4798.000000  3415.000000
mean     -0.000340     0.000074    -0.000075    -0.000420
std       0.016726     0.018401     0.023868     0.015399
min      -0.118724    -0.156342    -0.144667    -0.103101
25%      -0.008862    -0.010297    -0.011481    -0.008162
50%      -0.001422    -0.000590    -0.001747    -0.001324
75%       0.007069     0.009163     0.009841     0.006304
max       0.156365     0.192686     0.181245     0.132630
I want to apply a simple function to every row and get back a matrix with the same dimensions:
weights = -2 * signal.subtract(signal.mean(axis=1), axis=0).divide(signal.sub(signal.mean(axis=1), axis=0).abs().sum(axis=1), axis=0)
However, when I run this line, the program gets stuck. I believe the issue comes from the NaNs: the columns have different lengths (different numbers of non-NaN values). Dropping or filling the NaNs is not an option; for any row that contains a NaN, I want that NaN simply to be excluded from the computation. A temporary workaround would be to loop over the rows with .iterrows(), but that is not efficient.
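For reference, here is a minimal self-contained sketch that runs the vectorized line next to an .iterrows() loop on a tiny made-up frame. The column names come from the describe() output above; the values, and the helper names demeaned and weights_loop, are invented for illustration. On this toy input both versions should agree and produce NaN only where the input was NaN, which suggests the expression already handles missing values row by row. It does not reproduce the "stuck" behavior.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for `signal` (values invented for illustration).
signal = pd.DataFrame(
    {
        "ES": [0.01, -0.02, np.nan],
        "NK": [0.03, np.nan, 0.01],
        "NQ": [-0.01, 0.04, np.nan],
        "YM": [np.nan, 0.00, 0.02],
    }
)

# Vectorized version of the expression from the question.
# DataFrame.mean and DataFrame.sum skip NaN by default (skipna=True).
demeaned = signal.sub(signal.mean(axis=1), axis=0)
weights = -2 * demeaned.div(demeaned.abs().sum(axis=1), axis=0)

# Row-by-row version with iterrows(), for comparison only.
rows = []
for idx, row in signal.iterrows():
    d = row - row.mean()  # Series.mean() also skips NaN
    rows.append(-2 * d / d.abs().sum())
# Each Series in `rows` is one output row; transpose so rows become rows again.
weights_loop = pd.concat(rows, axis=1).T

print(weights)
print(weights_loop)
```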
Are there any smart solutions to this problem?
python pandas numpy
asked Nov 9 at 22:09
Évariste Galois
What are you trying to do?
– coldspeed
Nov 9 at 22:12
For every row, I want to apply a formula on that row that gives me a new value for each column. I want to expand that operation downwards without looping over every row.
– Évariste Galois
Nov 9 at 22:13
My question is: what is that operation? Perhaps a scikit-learn package will do it better.
– coldspeed
Nov 9 at 22:14
The formula for one row is w_i = (-1/N) * (r_i - r_mean), where N is the number of non-NaN values in the row, r_i is the value in column i, and r_mean is the average across the columns.
– Évariste Galois
Nov 9 at 22:24
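The per-row formula from this comment can be vectorized without a loop. A minimal sketch, assuming N counts only non-NaN entries (DataFrame.count skips NaN) and r_mean is the NaN-skipping row mean; comment_weights is a hypothetical helper name and the sample data is made up. Note that this is not the same normalization as the line in the question, which uses -2 and divides by the sum of absolute deviations instead of by N.

```python
import numpy as np
import pandas as pd


def comment_weights(signal: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: row-wise w_i = (-1/N) * (r_i - r_mean), ignoring NaN.

    N is the number of non-NaN values in each row and r_mean is the
    row mean over those same values. NaN inputs stay NaN in the output.
    """
    n = signal.count(axis=1)               # non-NaN values per row
    r_mean = signal.mean(axis=1)           # skipna=True by default
    demeaned = signal.sub(r_mean, axis=0)  # NaN - anything = NaN
    return demeaned.mul(-1.0 / n, axis=0)  # scale each row by -1/N


# Made-up example: row 0 has no NaN, row 1 has one NaN in NK.
signal = pd.DataFrame({"ES": [1.0, 2.0], "NK": [3.0, np.nan], "NQ": [5.0, 4.0]})
w = comment_weights(signal)
print(w)
```

A row that is entirely NaN gets N = 0; pandas division by zero yields inf rather than raising, and multiplying inf by the NaN deviations should leave that row as NaN.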
When you say "the program gets stuck", do you mean that it raises an error? If so, could you please include the text of the error message in your question?
– tel
Nov 10 at 2:06
1 Answer
Accepted answer (score 2)
The thing is, the pandas mean and sum methods already exclude NaN values by default (see the description of the skipna keyword in their documentation). Additionally, subtract and divide accept a fill_value keyword argument:
    fill_value : None or float value, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
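The quoted fill_value rule can be checked on two small hypothetical DataFrames: a NaN on only one side is replaced before subtracting, while a NaN on both sides stays NaN. The sketch deliberately subtracts one DataFrame from another rather than a row-mean Series as in the question; whether fill_value is accepted when the other operand is a Series may depend on the pandas version, so that case should be checked separately before relying on it.

```python
import numpy as np
import pandas as pd

# Hypothetical operands: row 0 has no NaN, row 1 is NaN only in `a`,
# row 2 is NaN in both.
a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
b = pd.DataFrame({"x": [0.5, 2.0, np.nan]})

plain = a.sub(b)                 # a NaN on either side propagates
filled = a.sub(b, fill_value=0)  # a one-sided NaN is treated as 0 first

print(plain)
print(filled)
```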
So you may be able to get what you want by setting fill_value=0 in the calls to subtract, and fill_value=1 in the calls to divide.
However, I suspect that the default behavior (NaN is ignored in mean and sum, NaN - anything = NaN, NaN * anything = NaN) is what you actually want. In that case, your problem isn't directly related to NaNs, and you're going to have to clarify your statement "when I run this line, the program gets stuck" in order to get a useful answer.
edited Nov 10 at 2:18
answered Nov 10 at 2:09
tel