Conditional Replace Pandas

I'm probably doing something very stupid, but I'm stumped.

I have a dataframe, and I want to replace the values in a particular column that exceed a value with zero. I had thought this was a way of achieving this:

df[df.my_channel > 20000].my_channel = 0

If I copy the channel into a new data frame it's simple:

df2 = df.my_channel 

df2[df2 > 20000] = 0

This does exactly what I want, but it does not seem to work with the channel as part of the original dataframe.

edited Aug 17 '18 at 3:03

lmiguelvargasf


asked Feb 6 '14 at 16:16

BMichell


Found what I think you were looking for here.

– feetwet
Jan 5 '17 at 20:21


python replace pandas conditional


5 Answers

The .ix indexer works in pandas versions prior to 0.20.0, but it has been deprecated since pandas 0.20.0 (and was removed in pandas 1.0), so you should avoid it. Use the .loc or .iloc indexers instead. You can solve this problem with:

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

mask selects the rows in which df.my_channel > 20000 is True, and df.loc[mask, column_name] = 0 sets the value 0 in the column named column_name for the rows where mask holds.

Update:
In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
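To make the loc approach concrete, here is a minimal sketch on made-up data. Only the column name my_channel and the threshold 20000 come from the question; the sample values and the extra column other are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({
    "my_channel": [10, 25000, 300, 40000],
    "other": ["a", "b", "c", "d"],
})

# Boolean Series aligned on df's index: True where the value exceeds the threshold.
mask = df["my_channel"] > 20000

# A single .loc call selects the rows (mask) and the column (label) together,
# so the assignment is applied to df itself rather than to a temporary object.
df.loc[mask, "my_channel"] = 0

result = df["my_channel"].tolist()
print(result)
```

Only the rows where mask is True are touched, and the other column is left alone. If positional indexing is really needed, converting the mask to a NumPy array first (mask.to_numpy()) is one option that should be accepted by iloc, but loc is the more direct tool here.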

edited Jul 14 '17 at 16:47

answered Jun 1 '17 at 15:18

lmiguelvargasf


8

lmiguelvargasf's answer should be tagged as the correct one, given the recent changes to pandas.

– ramhiser
Jun 12 '17 at 19:08

1

Can you use iloc with this kind of mask? It doesn't seem to work for me (although loc works fine). If iloc doesn't work in this case maybe it's worth clarifying that loc should replace ix to solve this problem, and in other situations might be replaced by iloc?

– LangeHaare
Jul 14 '17 at 16:26

2

@LangeHaare, I just tried what you said, and you are right, it does not work for iloc. I will update my answer to address this issue. Thank you so much for letting me know.

– lmiguelvargasf
Jul 14 '17 at 16:39

this should be the answer

– Rutger Hofste
Oct 10 '17 at 15:28

1

in one statement: df.loc[df.my_channel > 20000, 'my_channel'] = 0 The use of 'mask' is confusing, as no mask function is used

– Martien Lubberink
Dec 8 '18 at 19:13


Try

df.loc[df.my_channel > 20000, 'my_channel'] = 0

Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.

edited Jan 26 at 21:40

answered Feb 6 '14 at 16:24

lowtech


7

Thank you. I found my own solution too, which was: df.my_channel[df.my_channel >20000] = 0

– BMichell
Feb 6 '14 at 16:40

2

@BMichell I think your solution might start giving you warnings in 0.13, didn't have a chance to try yet

– lowtech
Feb 6 '14 at 19:14

yield error: /opt/anaconda3/envs/python35/lib/python3.5/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/… """Entry point for launching an IPython kernel.

– Rutger Hofste
Oct 10 '17 at 15:25

@RutgerHofste thanks for mentioning that, yet another argument never use Python3

– lowtech
Oct 10 '17 at 15:29


I personally like using the np.where function which works as follows:

df['X'] = np.where(df['Y']>=50, 'yes', 'no')

In your case you would want:

import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
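As a rough end-to-end illustration of the np.where approach, the sketch below uses invented sample values (an assumption; the question does not show its data) and keeps the intermediate array in a separate variable so its type is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [10, 25000, 300, 40000]})

# np.where(condition, value_if_true, value_if_false) works element-wise
# and returns a plain NumPy array with one entry per row.
replaced = np.where(df["my_channel"] > 20000, 0, df["my_channel"])

# Assigning the array back overwrites the whole column in one step.
df["my_channel"] = replaced

result = df["my_channel"].tolist()
print(result)
```

Because the whole column is rebuilt and reassigned, no chained indexing is involved; the trade-off is that every value is rewritten, not just the ones above the threshold.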

edited Jun 9 '18 at 22:27

ccpizza


answered Feb 14 '18 at 20:41

seeiespi


2

I like np.where too only "." needs to be remove from statement. so it should be. df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

– Sagar Shah
Feb 16 '18 at 22:02


Your original dataframe does not update because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:

When setting values in a pandas object, care must be taken to avoid
what is called chained indexing.
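The copy-versus-original behaviour can be sketched by splitting the chained expression into its two steps. The sample values are assumptions, and the warning filter is only there because older pandas versions may emit a SettingWithCopyWarning at the marked line.

```python
import warnings

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [10, 25000, 300, 40000]})

# Step 1 of the chained expression: boolean indexing builds a new DataFrame.
subset = df[df["my_channel"] > 20000]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # older pandas may warn about setting on a copy
    # Step 2: the assignment lands on that new object, not on df.
    subset["my_channel"] = 0

unchanged_after_chained = df["my_channel"].tolist() == [10, 25000, 300, 40000]

# A single .loc call addresses df directly, so the change sticks.
df.loc[df["my_channel"] > 20000, "my_channel"] = 0
after_loc = df["my_channel"].tolist()
print(unchanged_after_chained, after_loc)
```

The first assignment succeeds silently on the temporary object, which is why the original code appears to do nothing.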

You have a few alternatives:-

`loc` + Boolean indexing

loc may be used for setting values and supports Boolean masks:

df.loc[df['my_channel'] > 20000, 'my_channel'] = 0

`mask` + Boolean indexing

You can assign to your series:

df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)

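Or you can update your series in place. Note that this is itself a chained operation on df['my_channel'], so in recent pandas versions (especially with copy-on-write enabled) it may warn and leave df unchanged; assigning the result back, as above, is the safer form: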

df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
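For readers unsure how mask relates to its sibling where, here is a small sketch on a standalone Series with made-up values (an assumption for illustration). Both calls return a new Series and leave the input untouched.

```python
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series([10, 25000, 300, 40000], name="my_channel")
cond = s > 20000

# mask replaces the values where the condition is True.
masked = s.mask(cond, 0)

# where keeps the values where its condition is True and replaces the rest,
# so negating the condition gives the same result as mask.
kept = s.where(~cond, 0)

print(masked.tolist())
```

Since neither call modifies s, the result has to be assigned back to the dataframe column to take effect.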

`np.where` + Boolean indexing

You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.

df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])

edited Dec 11 '18 at 11:19

answered Nov 10 '18 at 4:45

jpp


-1

I would use a lambda function on a Series of a DataFrame like this:

f = lambda x: 0 if x > 20000 else x  # values at or below the threshold are kept unchanged
df['my_column'] = df['my_column'].map(f)

I do not assert that this is an efficient way, but it works fine.

answered Nov 10 '18 at 4:18

cyber-math


1

This is inefficient and not recommended as it involves a Python-level loop in a row-wise operation.

– jpp
Nov 10 '18 at 4:49

Thank you, I guess we can use loc here, like df.loc[: , 'my_column'] = df['my_column'].map(f) . I do not know if it is fast like the ones you added below.

– cyber-math
Nov 10 '18 at 4:56

1

Nope, still slow as you are still operating row-wise rather than column-wise.

– jpp
Nov 10 '18 at 5:00
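The performance point in these comments can be checked with a rough sketch like the one below. It assumes the lambda is adapted to keep values at or below the threshold unchanged (an assumption, not the answer's exact function), uses randomly generated data, and defines two hypothetical helpers, with_map and with_mask. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 random integers between 0 and 39,999.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 40000, size=100_000), name="my_column")


def with_map(series):
    # Calls the Python lambda once per element.
    return series.map(lambda x: 0 if x > 20000 else x)


def with_mask(series):
    # One vectorized comparison and one vectorized replacement.
    return series.mask(series > 20000, 0)


# Both approaches should produce identical values.
same_result = with_map(s).equals(with_mask(s))

map_seconds = timeit.timeit(lambda: with_map(s), number=3)
mask_seconds = timeit.timeit(lambda: with_mask(s), number=3)
print(f"same result: {same_result}")
print(f"map:  {map_seconds:.4f}s for 3 runs")
print(f"mask: {mask_seconds:.4f}s for 3 runs")
```

On typical setups the vectorized version is expected to be noticeably faster, which is the reason the loc, mask, and np.where answers are usually preferred.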



