Pad a data frame according to a frequency for each group

I have a pandas.DataFrame df with a pandas.DatetimeIndex and a column named group_column.
I need df to have a minutely frequency (meaning there is a row for every minute).

However, this needs to be the case for every value in group_column, so any given minute can potentially have several rows, one per group.

NOTE:

  1. group_column can have hundreds of unique values.

  2. Some groups can "last" several minutes and others can last for days; the edges are determined by the first and last appearances of each value in group_column.

Example

Input:

import pandas as pd

dates = [pd.Timestamp('2018-01-01 12:00'), pd.Timestamp('2018-01-01 12:01'),
         pd.Timestamp('2018-01-01 12:01'), pd.Timestamp('2018-01-01 12:03'),
         pd.Timestamp('2018-01-01 12:04')]
df = pd.DataFrame({'group_column': ['a', 'a', 'b', 'a', 'b'],
                   'data_column': [1.2, 2.2, 4, 1, 2]}, index=dates)

                    group_column  data_column
2018-01-01 12:00:00            a          1.2
2018-01-01 12:01:00            a          2.2
2018-01-01 12:01:00            b          4.0
2018-01-01 12:03:00            a          1.0
2018-01-01 12:04:00            b          2.0

Desired output:

                    group_column  data_column
2018-01-01 12:00:00            a          1.2
2018-01-01 12:01:00            a          2.2
2018-01-01 12:02:00            a          2.2
2018-01-01 12:03:00            a          1.0
2018-01-01 12:01:00            b          4.0
2018-01-01 12:02:00            b          4.0
2018-01-01 12:03:00            b          4.0
2018-01-01 12:04:00            b          2.0

My attempt

I have done this, but it seems highly inefficient:

def group_resample(df, group_column_name):
    # iterate over the groups one by one and pad each to minutely frequency
    values = df[group_column_name].unique()
    for value in values:
        df_g = df.loc[df[group_column_name] == value]
        df_g = df_g.asfreq('min', 'pad')
        yield df_g

df_padded = pd.concat(group_resample(df, 'group_column'))
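
For what it's worth, the per-group edges that the padding has to respect can be inspected directly; a minimal sketch using the example df above:

# Each group's padding range runs from its first to its last timestamp.
print(df.groupby('group_column')
        .apply(lambda g: (g.index.min(), g.index.max())))
# a -> first 12:00, last 12:03; b -> first 12:01, last 12:04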

python pandas

asked Nov 11 at 10:19 – moshevi

  • I think you should change group_column to be part of the index (df.set_index('group_column', inplace=True, drop=False)), then try to figure out how to make resampling work, either (ideally) on the whole DataFrame, or perhaps on one group at a time. See stackoverflow.com/questions/15799162/…
    – John Zwinck, Nov 11 at 10:22

2 Answers

Use GroupBy.apply with asfreq:

df1 = (df.groupby('group_column')
         .apply(lambda x: x.asfreq('min', 'pad'))
         .reset_index(level=0, drop=True))
print(df1)

                    group_column  data_column
2018-01-01 12:00:00            a          1.2
2018-01-01 12:01:00            a          2.2
2018-01-01 12:02:00            a          2.2
2018-01-01 12:03:00            a          1.0
2018-01-01 12:01:00            b          4.0
2018-01-01 12:02:00            b          4.0
2018-01-01 12:03:00            b          4.0
2018-01-01 12:04:00            b          2.0

– jezrael, answered Nov 11 at 10:22
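
A note on the reset_index(level=0, drop=True) step: GroupBy.apply prepends the group key as the outermost index level, and dropping that level restores the plain DatetimeIndex. A minimal sketch, assuming the example df from the question:

tmp = df.groupby('group_column').apply(lambda x: x.asfreq('min', 'pad'))
print(tmp.index.names)  # ['group_column', None] -> apply added a MultiIndex level
print(tmp.reset_index(level=0, drop=True).index.names)  # [None] -> DatetimeIndex again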

My approach would be

df2 = df.groupby('group_column').resample('min').ffill().reset_index(level=0, drop=True)
print(df2)

                     data_column group_column
2018-01-01 12:00:00          1.2            a
2018-01-01 12:01:00          2.2            a
2018-01-01 12:02:00          2.2            a
2018-01-01 12:03:00          1.0            a
2018-01-01 12:01:00          4.0            b
2018-01-01 12:02:00          4.0            b
2018-01-01 12:03:00          4.0            b
2018-01-01 12:04:00          2.0            b

– SpghttCd, answered Nov 11 at 15:43
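
Both answers pad within each group's own first/last timestamps. A quick sanity check of that property (a sketch, again assuming the example df):

# Each group should contribute one row per minute between its edges:
# 'a' spans 12:00-12:03 and 'b' spans 12:01-12:04, i.e. 4 rows each.
print(df2.groupby('group_column').size().to_dict())  # {'a': 4, 'b': 4}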

  • Why is this better than jezrael's answer? Faster?
    – moshevi, Nov 11 at 16:38

  • I didn't mean to have a better solution; it's simply mine in the first place. I didn't time it, and from my experience here it's generally a good idea to examine whatever @jezrael posts, at least with regard to pandas. However, in this case he uses apply, which my approach does not, so it might be worth comparing performance...
    – SpghttCd, Nov 11 at 16:48

  • In the meantime I was able to timeit them, and @jezrael's code is about 25% faster. I always thought "no apply is better", but obviously that is wrong here. The "why" should perhaps be asked of jezrael...
    – SpghttCd, Nov 11 at 22:12
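
For reference, a minimal sketch of how the two approaches could be timed against each other (the wrapper names f_apply and f_resample are hypothetical; absolute numbers depend on the data and the pandas version):

import timeit

def f_apply():
    # jezrael's approach: GroupBy.apply + asfreq
    return (df.groupby('group_column')
              .apply(lambda x: x.asfreq('min', 'pad'))
              .reset_index(level=0, drop=True))

def f_resample():
    # SpghttCd's approach: groupby + resample + ffill
    return (df.groupby('group_column')
              .resample('min').ffill()
              .reset_index(level=0, drop=True))

print(timeit.timeit(f_apply, number=100))
print(timeit.timeit(f_resample, number=100))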