Find non-overlap instances (5-day stock returns) of rolling window product which meet condition









up vote
0
down vote

favorite












I have a time series of Amazon close price from 31 Dec 2009 to present.



I am trying to find the number of instances where 5-day returns of Amzn falls more than 15%



Date Open High Low Close Adj Close Volume
2009-12-30 138.399994 138.399994 135.279999 136.490005 136.490005 6913200
2009-12-31 137.089996 137.279999 134.520004 134.520004 134.520004 4523000



## to get the 1-day returns
df['returns'] = df['Close'] / df['Close'].shift(1)

## to get the rolling 5-day performance
df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())


## filter returns -15% or more
df2 = df[df['roll']<.85]


While the above outputs a table of rows where 5-day returns is -15% or more, the output of the df2 has overlapping 5-day windows.



1) How can I output in a list the rolling 5-day window as a column in the data frame => for example : ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3']



2) How can I remove overlapping periods in df2. If there is any days in the 5-day window that overlaps with another row, only keep 1 of the row.










share|improve this question



























    up vote
    0
    down vote

    favorite












    I have a time series of Amazon close price from 31 Dec 2009 to present.



    I am trying to find the number of instances where 5-day returns of Amzn falls more than 15%



    Date Open High Low Close Adj Close Volume
    2009-12-30 138.399994 138.399994 135.279999 136.490005 136.490005 6913200
    2009-12-31 137.089996 137.279999 134.520004 134.520004 134.520004 4523000



    ## to get the 1-day returns
    df['returns'] = df['Close'] / df['Close'].shift(1)

    ## to get the rolling 5-day performance
    df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())


    ## filter returns -15% or more
    df2 = df[df['roll']<.85]


    While the above outputs a table of rows where 5-day returns is -15% or more, the output of the df2 has overlapping 5-day windows.



    1) How can I output in a list the rolling 5-day window as a column in the data frame => for example : ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3']



    2) How can I remove overlapping periods in df2. If there is any days in the 5-day window that overlaps with another row, only keep 1 of the row.










    share|improve this question

























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I have a time series of Amazon close price from 31 Dec 2009 to present.



      I am trying to find the number of instances where 5-day returns of Amzn falls more than 15%



      Date Open High Low Close Adj Close Volume
      2009-12-30 138.399994 138.399994 135.279999 136.490005 136.490005 6913200
      2009-12-31 137.089996 137.279999 134.520004 134.520004 134.520004 4523000



      ## to get the 1-day returns
      df['returns'] = df['Close'] / df['Close'].shift(1)

      ## to get the rolling 5-day performance
      df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())


      ## filter returns -15% or more
      df2 = df[df['roll']<.85]


      While the above outputs a table of rows where 5-day returns is -15% or more, the output of the df2 has overlapping 5-day windows.



      1) How can I output in a list the rolling 5-day window as a column in the data frame => for example : ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3']



      2) How can I remove overlapping periods in df2. If there is any days in the 5-day window that overlaps with another row, only keep 1 of the row.










      share|improve this question















      I have a time series of Amazon close price from 31 Dec 2009 to present.



      I am trying to find the number of instances where 5-day returns of Amzn falls more than 15%



      Date Open High Low Close Adj Close Volume
      2009-12-30 138.399994 138.399994 135.279999 136.490005 136.490005 6913200
      2009-12-31 137.089996 137.279999 134.520004 134.520004 134.520004 4523000



      ## to get the 1-day returns
      df['returns'] = df['Close'] / df['Close'].shift(1)

      ## to get the rolling 5-day performance
      df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())


      ## filter returns -15% or more
      df2 = df[df['roll']<.85]


      While the above outputs a table of rows where 5-day returns is -15% or more, the output of the df2 has overlapping 5-day windows.



      1) How can I output in a list the rolling 5-day window as a column in the data frame => for example : ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3']



      2) How can I remove overlapping periods in df2. If there is any days in the 5-day window that overlaps with another row, only keep 1 of the row.







      python pandas dataframe machine-learning quantitative-finance






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 9 at 13:23

























      asked Nov 9 at 12:52









      Lko

      506




      506






















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          0
          down vote













          I'm not 100% sure what you mean in question 1, but if you want to extract all the dates from the dataframe index and put them into a list you can use list(df.index). If you meant that you would like to have a column which contains the dates used in each 5-day rolling window then this is likely to be non-trivial and a bit hacky. Pandas won't allow you just to calculate this with a roll. However here is a working solution



          def find_dates(df, center=False):

          dates = df.index.values
          dates_list = np.zeros((dates.shape[0],5), dtype=object)
          if center:
          for i,r in enumerate([2,1,0,-1,-2]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,len(dates)-1,len(dates)-2]

          else:
          for i,r in enumerate([4,3,2,1,0]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,2,3]

          dates_list = [list(d) if j not in nul_dates_numbers else None for j,d in enumerate(dates_list)]
          return dates_list


          # make a quick dataframe
          index_leters = 'a b c d e f g h i j'
          indexes = index_leters.split(' ')
          df = pd.DataFrame('B': list(np.arange(len(indexes))), index=indexes)

          center = False # can set to False
          #apply rolling function
          df['roll']= pd.rolling_apply(df.B,5,lambda x : x.prod(), center=center)

          # extract index windows (will work on dates)
          df['dates'] = find_dates(df, center=center)


          2) Assuming that you have data for each date then a simple way of removing overlapping periods is just to slice the dataframe to only keep every 5 rows



          ## to get the rolling 5-day performance 
          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())
          df_nonoverlapping = df[::5]


          Also if you would like to have the date assigned to each window as the middle element in your window. ie in the window ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3'] you want the date to be '2010-1-1' then you should use



          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod(), center=True) 


          The way you have it now then date index for each 5 day window will be set as the last one, ie '2010-1-3' in this case.



          Finally you should note that you will have a few NaN values in your 'roll' column because you can't calculate the 5 day rolling average at both the start and end of your dataframe. SO expect to have 4 NaN values in your 'roll' column.






          share|improve this answer






















          • The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
            – Lko
            Nov 9 at 13:55











          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53226063%2ffind-non-overlap-instances-5-day-stock-returns-of-rolling-window-product-which%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          0
          down vote













          I'm not 100% sure what you mean in question 1, but if you want to extract all the dates from the dataframe index and put them into a list you can use list(df.index). If you meant that you would like to have a column which contains the dates used in each 5-day rolling window then this is likely to be non-trivial and a bit hacky. Pandas won't allow you just to calculate this with a roll. However here is a working solution



          def find_dates(df, center=False):

          dates = df.index.values
          dates_list = np.zeros((dates.shape[0],5), dtype=object)
          if center:
          for i,r in enumerate([2,1,0,-1,-2]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,len(dates)-1,len(dates)-2]

          else:
          for i,r in enumerate([4,3,2,1,0]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,2,3]

          dates_list = [list(d) if j not in nul_dates_numbers else None for j,d in enumerate(dates_list)]
          return dates_list


          # make a quick dataframe
          index_leters = 'a b c d e f g h i j'
          indexes = index_leters.split(' ')
          df = pd.DataFrame('B': list(np.arange(len(indexes))), index=indexes)

          center = False # can set to False
          #apply rolling function
          df['roll']= pd.rolling_apply(df.B,5,lambda x : x.prod(), center=center)

          # extract index windows (will work on dates)
          df['dates'] = find_dates(df, center=center)


          2) Assuming that you have data for each date then a simple way of removing overlapping periods is just to slice the dataframe to only keep every 5 rows



          ## to get the rolling 5-day performance 
          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())
          df_nonoverlapping = df[::5]


          Also if you would like to have the date assigned to each window as the middle element in your window. ie in the window ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3'] you want the date to be '2010-1-1' then you should use



          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod(), center=True) 


          The way you have it now then date index for each 5 day window will be set as the last one, ie '2010-1-3' in this case.



          Finally you should note that you will have a few NaN values in your 'roll' column because you can't calculate the 5 day rolling average at both the start and end of your dataframe. SO expect to have 4 NaN values in your 'roll' column.






          share|improve this answer






















          • The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
            – Lko
            Nov 9 at 13:55















          up vote
          0
          down vote













          I'm not 100% sure what you mean in question 1, but if you want to extract all the dates from the dataframe index and put them into a list you can use list(df.index). If you meant that you would like to have a column which contains the dates used in each 5-day rolling window then this is likely to be non-trivial and a bit hacky. Pandas won't allow you just to calculate this with a roll. However here is a working solution



          def find_dates(df, center=False):

          dates = df.index.values
          dates_list = np.zeros((dates.shape[0],5), dtype=object)
          if center:
          for i,r in enumerate([2,1,0,-1,-2]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,len(dates)-1,len(dates)-2]

          else:
          for i,r in enumerate([4,3,2,1,0]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,2,3]

          dates_list = [list(d) if j not in nul_dates_numbers else None for j,d in enumerate(dates_list)]
          return dates_list


          # make a quick dataframe
          index_leters = 'a b c d e f g h i j'
          indexes = index_leters.split(' ')
          df = pd.DataFrame('B': list(np.arange(len(indexes))), index=indexes)

          center = False # can set to False
          #apply rolling function
          df['roll']= pd.rolling_apply(df.B,5,lambda x : x.prod(), center=center)

          # extract index windows (will work on dates)
          df['dates'] = find_dates(df, center=center)


          2) Assuming that you have data for each date then a simple way of removing overlapping periods is just to slice the dataframe to only keep every 5 rows



          ## to get the rolling 5-day performance 
          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())
          df_nonoverlapping = df[::5]


          Also if you would like to have the date assigned to each window as the middle element in your window. ie in the window ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3'] you want the date to be '2010-1-1' then you should use



          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod(), center=True) 


          The way you have it now then date index for each 5 day window will be set as the last one, ie '2010-1-3' in this case.



          Finally you should note that you will have a few NaN values in your 'roll' column because you can't calculate the 5 day rolling average at both the start and end of your dataframe. SO expect to have 4 NaN values in your 'roll' column.






          share|improve this answer






















          • The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
            – Lko
            Nov 9 at 13:55













          up vote
          0
          down vote










          up vote
          0
          down vote









          I'm not 100% sure what you mean in question 1, but if you want to extract all the dates from the dataframe index and put them into a list you can use list(df.index). If you meant that you would like to have a column which contains the dates used in each 5-day rolling window then this is likely to be non-trivial and a bit hacky. Pandas won't allow you just to calculate this with a roll. However here is a working solution



          def find_dates(df, center=False):

          dates = df.index.values
          dates_list = np.zeros((dates.shape[0],5), dtype=object)
          if center:
          for i,r in enumerate([2,1,0,-1,-2]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,len(dates)-1,len(dates)-2]

          else:
          for i,r in enumerate([4,3,2,1,0]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,2,3]

          dates_list = [list(d) if j not in nul_dates_numbers else None for j,d in enumerate(dates_list)]
          return dates_list


          # make a quick dataframe
          index_leters = 'a b c d e f g h i j'
          indexes = index_leters.split(' ')
          df = pd.DataFrame('B': list(np.arange(len(indexes))), index=indexes)

          center = False # can set to False
          #apply rolling function
          df['roll']= pd.rolling_apply(df.B,5,lambda x : x.prod(), center=center)

          # extract index windows (will work on dates)
          df['dates'] = find_dates(df, center=center)


          2) Assuming that you have data for each date then a simple way of removing overlapping periods is just to slice the dataframe to only keep every 5 rows



          ## to get the rolling 5-day performance 
          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())
          df_nonoverlapping = df[::5]


          Also if you would like to have the date assigned to each window as the middle element in your window. ie in the window ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3'] you want the date to be '2010-1-1' then you should use



          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod(), center=True) 


          The way you have it now then date index for each 5 day window will be set as the last one, ie '2010-1-3' in this case.



          Finally you should note that you will have a few NaN values in your 'roll' column because you can't calculate the 5 day rolling average at both the start and end of your dataframe. SO expect to have 4 NaN values in your 'roll' column.






          share|improve this answer














          I'm not 100% sure what you mean in question 1, but if you want to extract all the dates from the dataframe index and put them into a list you can use list(df.index). If you meant that you would like to have a column which contains the dates used in each 5-day rolling window then this is likely to be non-trivial and a bit hacky. Pandas won't allow you just to calculate this with a roll. However here is a working solution



          def find_dates(df, center=False):

          dates = df.index.values
          dates_list = np.zeros((dates.shape[0],5), dtype=object)
          if center:
          for i,r in enumerate([2,1,0,-1,-2]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,len(dates)-1,len(dates)-2]

          else:
          for i,r in enumerate([4,3,2,1,0]):
          dates_list[:,i]=np.roll(dates,r)
          nul_dates_numbers = [0,1,2,3]

          dates_list = [list(d) if j not in nul_dates_numbers else None for j,d in enumerate(dates_list)]
          return dates_list


          # make a quick dataframe
          index_leters = 'a b c d e f g h i j'
          indexes = index_leters.split(' ')
          df = pd.DataFrame('B': list(np.arange(len(indexes))), index=indexes)

          center = False # can set to False
          #apply rolling function
          df['roll']= pd.rolling_apply(df.B,5,lambda x : x.prod(), center=center)

          # extract index windows (will work on dates)
          df['dates'] = find_dates(df, center=center)


          2) Assuming that you have data for each date then a simple way of removing overlapping periods is just to slice the dataframe to only keep every 5 rows



          ## to get the rolling 5-day performance 
          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod())
          df_nonoverlapping = df[::5]


          Also if you would like to have the date assigned to each window as the middle element in your window. ie in the window ['2009-12-30',2009-12-31','2010-1-1','2010-1-2','2010-1-3'] you want the date to be '2010-1-1' then you should use



          df['roll']= pd.rolling_apply(df.returns,5,lambda x : x.prod(), center=True) 


          The way you have it now then date index for each 5 day window will be set as the last one, ie '2010-1-3' in this case.



          Finally you should note that you will have a few NaN values in your 'roll' column because you can't calculate the 5 day rolling average at both the start and end of your dataframe. SO expect to have 4 NaN values in your 'roll' column.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 9 at 15:36

























          answered Nov 9 at 13:52









          James Fulton

          212




          212











          • The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
            – Lko
            Nov 9 at 13:55

















          • The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
            – Lko
            Nov 9 at 13:55
















          The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
          – Lko
          Nov 9 at 13:55





          The removal of overlapping windows comes after the filtering condition. So after filtering for rolling windows where returns is -10% or more, there could be rows where the periods overlap. I want to remove those periods. df[::5] does not work in this case as it comes before the filtering condition
          – Lko
          Nov 9 at 13:55


















           

          draft saved


          draft discarded















































           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53226063%2ffind-non-overlap-instances-5-day-stock-returns-of-rolling-window-product-which%23new-answer', 'question_page');

          );

          Post as a guest














































































          Popular posts from this blog

          Use pre created SQLite database for Android project in kotlin

          Darth Vader #20

          Ondo