Conditional Replace Pandas
I'm probably doing something very stupid, but I'm stumped.
I have a dataframe, and I want to replace the values in a particular column that exceed a value with zero. I had thought this was a way of achieving this:
df[df.my_channel > 20000].my_channel = 0
If I copy the channel into a new data frame it's simple:
df2 = df.my_channel
df2[df2 > 20000] = 0
This does exactly what I want, but it does not seem to work when the channel is part of the original dataframe.
python replace pandas conditional
Found what I think you were looking for here.
– feetwet
Jan 5 '17 at 20:21
edited Aug 17 '18 at 3:03 by lmiguelvargasf
asked Feb 6 '14 at 16:16 by BMichell
5 Answers
The .ix indexer works for pandas versions prior to 0.20.0, but it has been deprecated since pandas 0.20.0, so you should avoid using it. Instead, you can use the .loc or .iloc indexers. You can solve this problem with:
mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0
mask selects the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 in the column named column_name for the rows where mask holds.
Update:
In this case, you should use loc: if you use iloc with this mask, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
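As a minimal sketch of the approach above, the following runs the loc assignment on made-up data (the values and the extra column named other are assumptions for illustration; only my_channel comes from the question). It also shows one way around the iloc limitation mentioned above, under the assumption that you convert the mask to a plain boolean array and look up the column position yourself:

```python
import pandas as pd

# Hypothetical sample data; only the column name my_channel comes from the question.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 30000], "other": [1, 2, 3, 4]})

# Label-based: loc accepts the boolean Series directly.
mask = df["my_channel"] > 20000
df.loc[mask, "my_channel"] = 0

# Position-based: iloc needs a plain boolean array and an integer column position.
df2 = pd.DataFrame({"my_channel": [100, 25000, 20000, 30000], "other": [1, 2, 3, 4]})
mask2 = (df2["my_channel"] > 20000).to_numpy()
col_pos = df2.columns.get_loc("my_channel")
df2.iloc[mask2, col_pos] = 0

print(df["my_channel"].tolist())
print(df2["my_channel"].tolist())
```

Both frames end up with the same column, and the other column is left untouched because only one column is named in the assignment.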
lmiguelvargasf's answer should be tagged as the correct one, given the recent changes to pandas.
– ramhiser
Jun 12 '17 at 19:08
Can you use iloc with this kind of mask? It doesn't seem to work for me (although loc works fine). If iloc doesn't work in this case, maybe it's worth clarifying that loc should replace ix to solve this problem, and in other situations might be replaced by iloc?
– LangeHaare
Jul 14 '17 at 16:26
@LangeHaare, I just tried what you said, and you are right, it does not work for iloc. I will update my answer to address this issue. Thank you so much for letting me know.
– lmiguelvargasf
Jul 14 '17 at 16:39
This should be the answer.
– Rutger Hofste
Oct 10 '17 at 15:28
In one statement: df.loc[df.my_channel > 20000, 'my_channel'] = 0
The use of 'mask' is confusing, as no mask function is used.
– Martien Lubberink
Dec 8 '18 at 19:13
Try
df.loc[df.my_channel > 20000, 'my_channel'] = 0
Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.
Thank you. I found my own solution too, which was: df.my_channel[df.my_channel > 20000] = 0
– BMichell
Feb 6 '14 at 16:40
@BMichell I think your solution might start giving you warnings in 0.13; I haven't had a chance to try it yet.
– lowtech
Feb 6 '14 at 19:14
This yields a warning: /opt/anaconda3/envs/python35/lib/python3.5/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/… """Entry point for launching an IPython kernel.
– Rutger Hofste
Oct 10 '17 at 15:25
@RutgerHofste thanks for mentioning that, yet another argument to never use Python 3
– lowtech
Oct 10 '17 at 15:29
I personally like using the np.where function, which works as follows:
df['X'] = np.where(df['Y']>=50, 'yes', 'no')
In your case you would want:
import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
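To make the np.where pattern concrete, here is a small sketch on invented data (the values and the flag column are assumptions, not part of the original answer). np.where(condition, a, b) picks from a where the condition is True and from b elsewhere, element by element:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 30000]})

# Values above the threshold become 0; all others keep their original value.
df["my_channel"] = np.where(df["my_channel"] > 20000, 0, df["my_channel"])

# The same function can also build a label column, as in the 'yes'/'no' example.
df["flag"] = np.where(df["my_channel"] == 0, "zeroed", "kept")

print(df)
```

Note that np.where returns a NumPy array covering the whole column, so every row is rewritten even when only a few values change.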
I like np.where too; only the "." needed to be removed from the statement, so it should be: df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
– Sagar Shah
Feb 16 '18 at 22:02
The reason your original dataframe does not update is that chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:
When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
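The effect can be reproduced with a short sketch on made-up data (the values are assumptions for illustration). Selecting rows with a boolean mask builds a new DataFrame, so the chained assignment writes to a temporary object and the original stays as it was, while a single loc call writes to the original:

```python
import warnings

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 30000]})

with warnings.catch_warnings():
    # pandas warns about chained assignment; silence it for this demonstration only.
    warnings.simplefilter("ignore")
    # Step 1: df[...] with a boolean mask returns a copy.
    # Step 2: the assignment lands on that temporary copy, which is then discarded.
    df[df["my_channel"] > 20000]["my_channel"] = 0

unchanged = df["my_channel"].tolist()

# A single indexing operation on df itself modifies the original.
df.loc[df["my_channel"] > 20000, "my_channel"] = 0
changed = df["my_channel"].tolist()

print(unchanged)
print(changed)
```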
You have a few alternatives:
loc + Boolean indexing
loc may be used for setting values and supports Boolean masks:
df.loc[df['my_channel'] > 20000, 'my_channel'] = 0
mask + Boolean indexing
You can assign to your series:
df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)
Or you can update your series in place (this call is itself a chained operation, so in newer pandas versions with Copy-on-Write it may leave df unchanged; the assignment above is the safer form):
df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
np.where + Boolean indexing
You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.
df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])
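As a rough illustration of the mask alternative, the sketch below uses invented data (an assumption) and also shows Series.where, which is the mirror image of Series.mask: mask replaces values where the condition is True, where keeps values where the condition is True and replaces the rest.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 30000]})
s = df["my_channel"]

masked = s.mask(s > 20000, 0)    # replace where the condition is True
kept = s.where(s <= 20000, 0)    # keep where the condition is True, replace elsewhere

# Assign the result back rather than relying on inplace=True.
df["my_channel"] = masked

print(masked.tolist())
print(masked.equals(kept))
```

Assigning the result back to the column avoids any question of whether an in-place call reached the original dataframe.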
I would use a lambda function on a Series of a DataFrame like this:
f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)
I do not assert that this is an efficient way, but it works fine.
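Note that the lambda as written returns 1 for every value that is not above the threshold. A hedged sketch of how it might be adapted to the original question, keeping the other values, is shown here on made-up data (the values and the name keep_or_zero are assumptions), next to a vectorized equivalent for comparison:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_column": [50, 150, 100, 300]})

# Element-wise version: return x itself when it does not exceed the threshold.
keep_or_zero = lambda x: 0 if x > 100 else x
mapped = df["my_column"].map(keep_or_zero)

# Vectorized version of the same rule, evaluated on the whole column at once.
vectorized = df["my_column"].mask(df["my_column"] > 100, 0)

print(mapped.tolist())
print(vectorized.tolist())
```

Both give the same values; map calls the Python function once per element, which is why it tends to be slower on large columns.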
This is inefficient and not recommended as it involves a Python-level loop in a row-wise operation.
– jpp
Nov 10 '18 at 4:49
Thank you, I guess we can use loc here, like df.loc[:, 'my_column'] = df['my_column'].map(f). I do not know if it is as fast as the ones you added below.
– cyber-math
Nov 10 '18 at 4:56
Nope, still slow as you are still operating row-wise rather than column-wise.
– jpp
Nov 10 '18 at 5:00
Answer (.loc with a mask variable): answered Jun 1 '17 at 15:18 by lmiguelvargasf, edited Jul 14 '17 at 16:47
Answer (df.loc in one statement): answered Feb 6 '14 at 16:24 by lowtech, edited Jan 26 at 21:40
Answer (np.where): answered Feb 14 '18 at 20:41 by seeiespi, edited Jun 9 '18 at 22:27 by ccpizza
Answer (chained indexing and alternatives): answered Nov 10 '18 at 4:45 by jpp, edited Dec 11 '18 at 11:19
Answer (lambda with map): answered Nov 10 '18 at 4:18 by cyber-math