Conditional Replace Pandas










66















I'm probably doing something very stupid, but I'm stumped.



I have a dataframe, and I want to replace the values in a particular column that exceed a threshold with zero. I thought this would achieve it:



df[df.my_channel > 20000].my_channel = 0


If I copy the channel into a new data frame it's simple:



df2 = df.my_channel 

df2[df2 > 20000] = 0


This does exactly what I want, but it does not seem to work with the channel as part of the original dataframe.


































  • Found what I think you were looking for here.

    – feetwet
    Jan 5 '17 at 20:21















python replace pandas conditional














edited Aug 17 '18 at 3:03 by lmiguelvargasf

asked Feb 6 '14 at 16:16 by BMichell
























5 Answers


















93














The .ix indexer works in pandas versions prior to 0.20.0, but it has been deprecated since 0.20.0 (and was removed in pandas 1.0), so you should avoid it. Instead, use the .loc or .iloc indexers. You can solve this problem with:



mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0


mask selects the rows in which df.my_channel > 20000 is True, and df.loc[mask, column_name] = 0 sets the value 0 in the column named column_name for the rows where mask holds.
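As a quick check of the .loc approach, here is a minimal sketch on a small made-up frame. The sample values and the extra column `other` are assumptions for illustration; only the column name `my_channel` and the threshold come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name and threshold come from the question.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000], "other": [1, 2, 3, 4]})

mask = df["my_channel"] > 20000   # boolean Series, True where the value exceeds the threshold
df.loc[mask, "my_channel"] = 0    # a single indexing call: rows by mask, column by label

print(df)
```

Because the row selection and the column selection happen in one `.loc` call, the assignment acts on `df` itself rather than on an intermediate object, and the other columns are left alone.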



Update:
In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
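To illustrate the update above, here is a hedged sketch showing that .iloc rejects a boolean Series but accepts the same mask as a plain NumPy array. The sample data is made up, and the exact exception type is an assumption that may differ between pandas versions, so both likely types are caught.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})
mask = df["my_channel"] > 20000

try:
    df.iloc[mask]  # a boolean Series carries an index, which positional indexing refuses
except (NotImplementedError, ValueError) as exc:
    print(type(exc).__name__, exc)

# .iloc is purely positional: pass a bare boolean array and an integer column position.
col_pos = df.columns.get_loc("my_channel")
df.iloc[mask.to_numpy(), col_pos] = 0

print(df)
```

For label-based columns like this one, .loc remains the simpler choice; the .iloc form is mainly useful when you only have positions.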


























  • 8





    lmiguelvargasf's answer should be tagged as the correct one, given the recent changes to pandas.

    – ramhiser
    Jun 12 '17 at 19:08







  • 1





    Can you use iloc with this kind of mask? It doesn't seem to work for me (although loc works fine). If iloc doesn't work in this case maybe it's worth clarifying that loc should replace ix to solve this problem, and in other situations might be replaced by iloc?

    – LangeHaare
    Jul 14 '17 at 16:26






  • 2





    @LangeHaare, I just tried what you said, and you are right, it does not work for iloc. I will update my answer to address this issue. Thank you so much for letting me know.

    – lmiguelvargasf
    Jul 14 '17 at 16:39











  • this should be the answer

    – Rutger Hofste
    Oct 10 '17 at 15:28






  • 1





    in one statement: df.loc[df.my_channel > 20000, 'my_channel'] = 0 The use of 'mask' is confusing, as no mask function is used

    – Martien Lubberink
    Dec 8 '18 at 19:13



















61














Try



df.loc[df.my_channel > 20000, 'my_channel'] = 0


Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.


























  • 7





    Thank you. I found my own solution too, which was: df.my_channel[df.my_channel >20000] = 0

    – BMichell
    Feb 6 '14 at 16:40







  • 2





    @BMichell I think your solution might start giving you warnings in 0.13, didn't have a chance to try yet

    – lowtech
    Feb 6 '14 at 19:14











  • yield error: /opt/anaconda3/envs/python35/lib/python3.5/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/… """Entry point for launching an IPython kernel.

    – Rutger Hofste
    Oct 10 '17 at 15:25











  • @RutgerHofste thanks for mentioning that, yet another argument never use Python3

    – lowtech
    Oct 10 '17 at 15:29


















17














I personally like using the np.where function which works as follows:



df['X'] = np.where(df['Y']>=50, 'yes', 'no')


In your case you would want:



import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
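Here is a small sketch of both np.where uses on made-up data. The sample values, the `label` column, and its strings are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})

# np.where(condition, value_if_true, value_if_false) chooses element by element.
df["label"] = np.where(df["my_channel"] > 20000, "too high", "ok")

# Passing the original column as the "false" branch leaves the other values unchanged.
df["my_channel"] = np.where(df["my_channel"] > 20000, 0, df["my_channel"])

print(df)
```

Note that np.where always builds a complete new array, so the whole column is reassigned even when only a few values change.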


























  • 2





    I like np.where too only "." needs to be remove from statement. so it should be. df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

    – Sagar Shah
    Feb 16 '18 at 22:02



















7














Your original dataframe does not update because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:




When setting values in a pandas object, care must be taken to avoid
what is called chained indexing.
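A minimal sketch of what goes wrong with the statement from the question, using made-up sample values. Warnings are silenced here only to keep the demo quiet; whether and which warning pandas emits depends on the version.

```python
import warnings

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [100, 25000, 300, 40000]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide any chained-assignment warning for this demo
    # Step 1: df[...] with a boolean mask builds a new, temporary DataFrame.
    # Step 2: the assignment writes into that temporary object, which is then discarded.
    df[df.my_channel > 20000].my_channel = 0

print(df["my_channel"].tolist())  # the original column is untouched
```

The two steps are separate indexing operations, which is why pandas cannot route the assignment back to `df`.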




You have a few alternatives:




loc + Boolean indexing



loc may be used for setting values and supports Boolean masks:



df.loc[df['my_channel'] > 20000, 'my_channel'] = 0



mask + Boolean indexing



You can assign to your series:



df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)


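Or you can update your series in place. Note that this is itself a chained operation, so with Copy-on-Write enabled (the default from pandas 3.0) it is not expected to modify df; prefer the assignment form above: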



df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
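As a sketch of how Series.mask behaves, here it is next to its counterpart Series.where on made-up values. The data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series([100, 25000, 300, 40000], name="my_channel")

masked = s.mask(s > 20000, 0)   # replace values where the condition is True
kept = s.where(s <= 20000, 0)   # keep values where the condition is True, replace the rest

print(masked.tolist())
print(kept.tolist())
```

Both calls return a new Series and leave `s` unchanged, which is why the result is assigned back to the column in the first form above.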



np.where + Boolean indexing



You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.



df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])





































    -1














I would use a lambda function on a Series of the DataFrame, like this:

f = lambda x: 0 if x > 100 else x  # values at or below the threshold are kept unchanged
df['my_column'] = df['my_column'].map(f)

I do not claim that this is an efficient way, but it works fine.
























    • 1





      This is inefficient and not recommended as it involves a Python-level loop in a row-wise operation.

      – jpp
      Nov 10 '18 at 4:49











    • Thank you, I guess we can use loc here, like df.loc[: , 'my_column'] = df['my_column'].map(f) . I do not know if it is fast like the ones you added below.

      – cyber-math
      Nov 10 '18 at 4:56







    • 1





      Nope, still slow as you are still operating row-wise rather than column-wise.

      – jpp
      Nov 10 '18 at 5:00
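To make the point in these comments concrete, here is a hedged sketch comparing an element-wise map with a vectorized replacement. The sample values and the threshold of 100 are illustrative, and the lambda here leaves values at or below the threshold unchanged.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_column": [50, 150, 99, 1000]})

# Element-wise: map calls the Python function once per value.
via_map = df["my_column"].map(lambda x: 0 if x > 100 else x)

# Vectorized: one comparison and one masked replacement over the whole column.
via_mask = df["my_column"].mask(df["my_column"] > 100, 0)

print(via_map.tolist())
print(via_mask.tolist())
```

Both produce the same values; the difference is that the vectorized form avoids a Python-level call per element, which tends to matter on large columns.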











Answer with 93 votes: answered Jun 1 '17 at 15:18 by lmiguelvargasf; edited Jul 14 '17 at 16:47







Answer with 61 votes: answered Feb 6 '14 at 16:24 by lowtech; edited Jan 26 at 21:40







Answer with 17 votes: answered Feb 14 '18 at 20:41 by seeiespi; edited Jun 9 '18 at 22:27 by ccpizza







    • 2





      I like np.where too only "." needs to be remove from statement. so it should be. df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

      – Sagar Shah
      Feb 16 '18 at 22:02













    • 2





      I like np.where too only "." needs to be remove from statement. so it should be. df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

      – Sagar Shah
      Feb 16 '18 at 22:02








    2




    2





    I like np.where too only "." needs to be remove from statement. so it should be. df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

    – Sagar Shah
    Feb 16 '18 at 22:02






    I like np.where too only "." needs to be remove from statement. so it should be. df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

    – Sagar Shah
    Feb 16 '18 at 22:02












    7














    The reason your original dataframe does not update is because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:




    When setting values in a pandas object, care must be taken to avoid
    what is called chained indexing.




    You have a few alternatives:-




    loc + Boolean indexing



    loc may be used for setting values and supports Boolean masks:



    df.loc[df['my_channel'] > 20000, 'my_channel'] = 0



    mask + Boolean indexing



    You can assign to your series:



    df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)


    Or you can update your series in place:



    df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)



    np.where + Boolean indexing



    You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.



    df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])





        edited Dec 11 '18 at 11:19
        answered Nov 10 '18 at 4:45
        jpp





















            -1














            I would use a lambda function on a Series of the DataFrame, like this:



            # Replace values above the threshold with 0 and leave the rest unchanged
            f = lambda x: 0 if x > 100 else x
            df['my_column'] = df['my_column'].map(f)


            I don't claim that this is an efficient way, but it works.






            answered Nov 10 '18 at 4:18
            cyber-math







            • This is inefficient and not recommended, as it involves a Python-level loop in a row-wise operation.
              – jpp, Nov 10 '18 at 4:49

            • Thank you. I guess we can use loc here, like df.loc[:, 'my_column'] = df['my_column'].map(f). I do not know if it is as fast as the ones you added below.
              – cyber-math, Nov 10 '18 at 4:56

            • Nope, still slow, as you are still operating row-wise rather than column-wise.
              – jpp, Nov 10 '18 at 5:00
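The efficiency point in these comments can be checked with a rough timing sketch. Everything here is an assumption for illustration: the random data, the column name my_column, the threshold 100, and a lambda that returns the original value when the condition is not met. Timings will vary by machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 random integers between 0 and 199.
rng = np.random.default_rng(0)
df = pd.DataFrame({"my_column": rng.integers(0, 200, size=100_000)})


def with_map():
    # Calls the Python lambda once per element.
    return df["my_column"].map(lambda x: 0 if x > 100 else x)


def with_mask():
    # Evaluates the comparison and replacement on the whole column at once.
    return df["my_column"].mask(df["my_column"] > 100, 0)


# Both approaches should produce the same values.
same_values = bool((with_map() == with_mask()).all())
print("same values:", same_values)

# Rough timings over a few repetitions.
print("map :", timeit.timeit(with_map, number=5))
print("mask:", timeit.timeit(with_mask, number=5))
```

On typical setups the vectorised call is expected to be noticeably faster, which is the point jpp makes above.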












