Create FY based on the range of date in pandas










1















I am working on dates and FY.
My working data set contains ten thousands of rows with date.
I want to create a new column to identify the season or FY.
The sample data frame is as follow:



df = pd.DataFrame()
df['date'] = ['10/08/2018','12/09/2018','15/08/2017','16/05/2018']


what I wanted to do is to create a new column season based on the range of the date.
for example,
if the month of the date is from AUgust to June, it will be considered as a season.
It means that the date ranges from 01/08/2005 to 30/06/2006, the season will be 2005-06.



For the sample date frame, the expected output will be as follow:



 date season
10/082018 2018-19
12/09/2018 2018-19
15/08/2017 2017-18
16/05/2018 2017-18


Below is my expected output:



enter image description here



How would I define the range and the season??
Thanks,



Zep.










share|improve this question
























  • Please provide a Minimal, Complete, and Verifiable example. Images are difficult to interpret.

    – Sandeep Kadapa
    Nov 13 '18 at 5:02















1















I am working on dates and FY.
My working data set contains ten thousands of rows with date.
I want to create a new column to identify the season or FY.
The sample data frame is as follow:



df = pd.DataFrame()
df['date'] = ['10/08/2018','12/09/2018','15/08/2017','16/05/2018']


what I wanted to do is to create a new column season based on the range of the date.
for example,
if the month of the date is from AUgust to June, it will be considered as a season.
It means that the date ranges from 01/08/2005 to 30/06/2006, the season will be 2005-06.



For the sample date frame, the expected output will be as follow:



 date season
10/082018 2018-19
12/09/2018 2018-19
15/08/2017 2017-18
16/05/2018 2017-18


Below is my expected output:



enter image description here



How would I define the range and the season??
Thanks,



Zep.










share|improve this question
























  • Please provide a Minimal, Complete, and Verifiable example. Images are difficult to interpret.

    – Sandeep Kadapa
    Nov 13 '18 at 5:02













1












1








1








I am working on dates and FY.
My working data set contains ten thousands of rows with date.
I want to create a new column to identify the season or FY.
The sample data frame is as follow:



df = pd.DataFrame()
df['date'] = ['10/08/2018','12/09/2018','15/08/2017','16/05/2018']


what I wanted to do is to create a new column season based on the range of the date.
for example,
if the month of the date is from AUgust to June, it will be considered as a season.
It means that the date ranges from 01/08/2005 to 30/06/2006, the season will be 2005-06.



For the sample date frame, the expected output will be as follow:



 date season
10/082018 2018-19
12/09/2018 2018-19
15/08/2017 2017-18
16/05/2018 2017-18


Below is my expected output:



enter image description here



How would I define the range and the season??
Thanks,



Zep.










share|improve this question
















I am working on dates and FY.
My working data set contains ten thousands of rows with date.
I want to create a new column to identify the season or FY.
The sample data frame is as follow:



df = pd.DataFrame()
df['date'] = ['10/08/2018','12/09/2018','15/08/2017','16/05/2018']


what I wanted to do is to create a new column season based on the range of the date.
for example,
if the month of the date is from AUgust to June, it will be considered as a season.
It means that the date ranges from 01/08/2005 to 30/06/2006, the season will be 2005-06.



For the sample date frame, the expected output will be as follow:



 date season
10/082018 2018-19
12/09/2018 2018-19
15/08/2017 2017-18
16/05/2018 2017-18


Below is my expected output:



enter image description here



How would I define the range and the season??
Thanks,



Zep.







python pandas date date-range






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 13 '18 at 5:05







Zephyr

















asked Nov 13 '18 at 4:58









ZephyrZephyr

461110




461110












  • Please provide a Minimal, Complete, and Verifiable example. Images are difficult to interpret.

    – Sandeep Kadapa
    Nov 13 '18 at 5:02

















  • Please provide a Minimal, Complete, and Verifiable example. Images are difficult to interpret.

    – Sandeep Kadapa
    Nov 13 '18 at 5:02
















Please provide a Minimal, Complete, and Verifiable example. Images are difficult to interpret.

– Sandeep Kadapa
Nov 13 '18 at 5:02





Please provide a Minimal, Complete, and Verifiable example. Images are difficult to interpret.

– Sandeep Kadapa
Nov 13 '18 at 5:02












1 Answer
1






active

oldest

votes


















1














You can use the condition based on month and concatenate the year part of the date after converting to string



df.date = pd.to_datetime(df.date, format = '%d/%m/%Y')
cond = df.date.dt.month >=8
df['season'] = np.where(cond, df.date.dt.year.apply(str) + '-' + (df.date.dt.year+1).apply(str).str[2:], (df.date.dt.year-1).apply(str) + '-' + df.date.dt.year.apply(str).str[2:])


date season
0 2018-08-10 2018-19
1 2018-09-12 2018-19
2 2017-08-15 2017-18
3 2018-05-16 2017-18





share|improve this answer























  • Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

    – Zephyr
    Nov 13 '18 at 5:29











  • @Zephyr, df['date'].dt.date will give you the date component of datetime

    – Vaishali
    Nov 13 '18 at 5:30







  • 1





    @Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

    – Sandeep Kadapa
    Nov 13 '18 at 5:37











  • @SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

    – Vaishali
    Nov 13 '18 at 5:44






  • 1





    @Vaishali Oh that's interesting and thank you for letting me know that.

    – Sandeep Kadapa
    Nov 13 '18 at 5:55










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53274077%2fcreate-fy-based-on-the-range-of-date-in-pandas%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














You can use the condition based on month and concatenate the year part of the date after converting to string



df.date = pd.to_datetime(df.date, format = '%d/%m/%Y')
cond = df.date.dt.month >=8
df['season'] = np.where(cond, df.date.dt.year.apply(str) + '-' + (df.date.dt.year+1).apply(str).str[2:], (df.date.dt.year-1).apply(str) + '-' + df.date.dt.year.apply(str).str[2:])


date season
0 2018-08-10 2018-19
1 2018-09-12 2018-19
2 2017-08-15 2017-18
3 2018-05-16 2017-18





share|improve this answer























  • Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

    – Zephyr
    Nov 13 '18 at 5:29











  • @Zephyr, df['date'].dt.date will give you the date component of datetime

    – Vaishali
    Nov 13 '18 at 5:30







  • 1





    @Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

    – Sandeep Kadapa
    Nov 13 '18 at 5:37











  • @SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

    – Vaishali
    Nov 13 '18 at 5:44






  • 1





    @Vaishali Oh that's interesting and thank you for letting me know that.

    – Sandeep Kadapa
    Nov 13 '18 at 5:55















1














You can use the condition based on month and concatenate the year part of the date after converting to string



df.date = pd.to_datetime(df.date, format = '%d/%m/%Y')
cond = df.date.dt.month >=8
df['season'] = np.where(cond, df.date.dt.year.apply(str) + '-' + (df.date.dt.year+1).apply(str).str[2:], (df.date.dt.year-1).apply(str) + '-' + df.date.dt.year.apply(str).str[2:])


date season
0 2018-08-10 2018-19
1 2018-09-12 2018-19
2 2017-08-15 2017-18
3 2018-05-16 2017-18





share|improve this answer























  • Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

    – Zephyr
    Nov 13 '18 at 5:29











  • @Zephyr, df['date'].dt.date will give you the date component of datetime

    – Vaishali
    Nov 13 '18 at 5:30







  • 1





    @Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

    – Sandeep Kadapa
    Nov 13 '18 at 5:37











  • @SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

    – Vaishali
    Nov 13 '18 at 5:44






  • 1





    @Vaishali Oh that's interesting and thank you for letting me know that.

    – Sandeep Kadapa
    Nov 13 '18 at 5:55













1












1








1







You can use the condition based on month and concatenate the year part of the date after converting to string



df.date = pd.to_datetime(df.date, format = '%d/%m/%Y')
cond = df.date.dt.month >=8
df['season'] = np.where(cond, df.date.dt.year.apply(str) + '-' + (df.date.dt.year+1).apply(str).str[2:], (df.date.dt.year-1).apply(str) + '-' + df.date.dt.year.apply(str).str[2:])


date season
0 2018-08-10 2018-19
1 2018-09-12 2018-19
2 2017-08-15 2017-18
3 2018-05-16 2017-18





share|improve this answer













You can use the condition based on month and concatenate the year part of the date after converting to string



df.date = pd.to_datetime(df.date, format = '%d/%m/%Y')
cond = df.date.dt.month >=8
df['season'] = np.where(cond, df.date.dt.year.apply(str) + '-' + (df.date.dt.year+1).apply(str).str[2:], (df.date.dt.year-1).apply(str) + '-' + df.date.dt.year.apply(str).str[2:])


date season
0 2018-08-10 2018-19
1 2018-09-12 2018-19
2 2017-08-15 2017-18
3 2018-05-16 2017-18






share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 13 '18 at 5:21









VaishaliVaishali

19.6k41030




19.6k41030












  • Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

    – Zephyr
    Nov 13 '18 at 5:29











  • @Zephyr, df['date'].dt.date will give you the date component of datetime

    – Vaishali
    Nov 13 '18 at 5:30







  • 1





    @Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

    – Sandeep Kadapa
    Nov 13 '18 at 5:37











  • @SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

    – Vaishali
    Nov 13 '18 at 5:44






  • 1





    @Vaishali Oh that's interesting and thank you for letting me know that.

    – Sandeep Kadapa
    Nov 13 '18 at 5:55

















  • Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

    – Zephyr
    Nov 13 '18 at 5:29











  • @Zephyr, df['date'].dt.date will give you the date component of datetime

    – Vaishali
    Nov 13 '18 at 5:30







  • 1





    @Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

    – Sandeep Kadapa
    Nov 13 '18 at 5:37











  • @SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

    – Vaishali
    Nov 13 '18 at 5:44






  • 1





    @Vaishali Oh that's interesting and thank you for letting me know that.

    – Sandeep Kadapa
    Nov 13 '18 at 5:55
















Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

– Zephyr
Nov 13 '18 at 5:29





Thanks Vaishali. I converted the Date to datetime and got with the following format. time data Timestamp('2018-08-10 00:00:00') how can I remove the time part from date time? Thanks for your help.

– Zephyr
Nov 13 '18 at 5:29













@Zephyr, df['date'].dt.date will give you the date component of datetime

– Vaishali
Nov 13 '18 at 5:30






@Zephyr, df['date'].dt.date will give you the date component of datetime

– Vaishali
Nov 13 '18 at 5:30





1




1





@Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

– Sandeep Kadapa
Nov 13 '18 at 5:37





@Vaishali I would suggest to use year = df.date.dt.year;year_str = df.date.dt.year.astype(str);np.where(cond,year_str + '-' + (year+1).astype(str).str[2:], (year-1).astype(str) + '-' + year_str.str[2:]) to speed up the process since you have lot of intermediate same variables.

– Sandeep Kadapa
Nov 13 '18 at 5:37













@SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

– Vaishali
Nov 13 '18 at 5:44





@SandeepKadapa, may be you compared the time for the given number of rows. Try it on df = pd.concat([df]*10000, ignore_index=True) and times are identical.

– Vaishali
Nov 13 '18 at 5:44




1




1





@Vaishali Oh that's interesting and thank you for letting me know that.

– Sandeep Kadapa
Nov 13 '18 at 5:55





@Vaishali Oh that's interesting and thank you for letting me know that.

– Sandeep Kadapa
Nov 13 '18 at 5:55

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53274077%2fcreate-fy-based-on-the-range-of-date-in-pandas%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo