pandas match date in one df with timeframe in another, then groupby-sum
I have two dataframes, test1
and test2
. For each ID
value in test2
, I want to check the date
in test2
and compare it to the date ranges for that same ID
value in test1
. If any of the date
's in test2
are within a date range in test1
, sum the amount
column and assign that sum as an additional column in test1
.
Output:
So the new test1
df will have a column amount_sum
which is the sum of all amounts in test2
where the date
is within the date range of test1
- for that ID
import random
import string
test1 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'date1':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)],
'date2':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(101, 200, 1))) for n in range(100)]
)
test2 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'amount':[random.choice([1,2,3,5,10]) for n in range(100)],
'date':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)]
)
python pandas
add a comment |
I have two dataframes, test1
and test2
. For each ID
value in test2
, I want to check the date
in test2
and compare it to the date ranges for that same ID
value in test1
. If any of the date
's in test2
are within a date range in test1
, sum the amount
column and assign that sum as an additional column in test1
.
Output:
So the new test1
df will have a column amount_sum
which is the sum of all amounts in test2
where the date
is within the date range of test1
- for that ID
import random
import string
test1 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'date1':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)],
'date2':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(101, 200, 1))) for n in range(100)]
)
test2 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'amount':[random.choice([1,2,3,5,10]) for n in range(100)],
'date':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)]
)
python pandas
add a comment |
I have two dataframes, test1
and test2
. For each ID
value in test2
, I want to check the date
in test2
and compare it to the date ranges for that same ID
value in test1
. If any of the date
's in test2
are within a date range in test1
, sum the amount
column and assign that sum as an additional column in test1
.
Output:
So the new test1
df will have a column amount_sum
which is the sum of all amounts in test2
where the date
is within the date range of test1
- for that ID
import random
import string
test1 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'date1':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)],
'date2':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(101, 200, 1))) for n in range(100)]
)
test2 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'amount':[random.choice([1,2,3,5,10]) for n in range(100)],
'date':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)]
)
python pandas
I have two dataframes, test1
and test2
. For each ID
value in test2
, I want to check the date
in test2
and compare it to the date ranges for that same ID
value in test1
. If any of the date
's in test2
are within a date range in test1
, sum the amount
column and assign that sum as an additional column in test1
.
Output:
So the new test1
df will have a column amount_sum
which is the sum of all amounts in test2
where the date
is within the date range of test1
- for that ID
import random
import string
test1 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'date1':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)],
'date2':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(101, 200, 1))) for n in range(100)]
)
test2 = pd.DataFrame(
'ID':[''.join(random.choice(string.ascii_letters[0:4]) for _ in range(3)) for n in range(100)],
'amount':[random.choice([1,2,3,5,10]) for n in range(100)],
'date':[pd.to_datetime(random.choice(['01-01-2018','05-01-2018','06-01-2018','08-01-2018','09-01-2018'])) + pd.DateOffset(int(np.random.randint(0, 100, 1))) for n in range(100)]
)
python pandas
python pandas
edited Nov 13 '18 at 6:03
jchaykow
asked Nov 13 '18 at 5:55
jchaykowjchaykow
540320
540320
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
Use:
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
Sample:
#https://stackoverflow.com/q/21494489
np.random.seed(123)
#https://stackoverflow.com/a/50559321/2901002
def gen(start, end, n):
start_u = start.value//10**9
end_u = end.value//10**9
return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
n = 10
test1 = pd.DataFrame(
'ID':np.random.choice(list('abc'), n),
'date1': gen(pd.to_datetime('2010-01-01'),pd.to_datetime('2010-03-01'), n).floor('d'),
'date2':gen(pd.to_datetime('2010-03-01'),pd.to_datetime('2010-06-01'), n).floor('d')
)
m = 5
test2 = pd.DataFrame(
'ID': np.random.choice(list('abc'), m),
'amount':np.random.randint(10, size=m),
'date':gen(pd.to_datetime('2010-01-01'), pd.to_datetime('2010-06-01'), m).floor('d')
)
print (test1)
ID date1 date2
0 c 2010-01-15 2010-05-22
1 b 2010-02-08 2010-04-16
2 c 2010-01-24 2010-04-12
3 c 2010-02-01 2010-04-09
4 a 2010-01-19 2010-05-20
5 c 2010-01-27 2010-05-24
6 c 2010-02-23 2010-03-15
7 b 2010-01-31 2010-05-09
8 c 2010-02-23 2010-03-29
9 b 2010-01-08 2010-03-07
print (test2)
ID amount date
0 a 4 2010-05-15
1 b 6 2010-03-26
2 a 1 2010-01-07
3 b 5 2010-02-07
4 a 6 2010-04-13
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
print (df)
ID date1 date2 amount date
6 b 2010-02-08 2010-04-16 6.0 2010-03-26
8 b 2010-01-31 2010-05-09 6.0 2010-03-26
9 b 2010-01-31 2010-05-09 5.0 2010-02-07
11 b 2010-01-08 2010-03-07 5.0 2010-02-07
12 a 2010-01-19 2010-05-20 4.0 2010-05-15
14 a 2010-01-19 2010-05-20 6.0 2010-04-13
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
print (test)
ID date1 date2 amount
0 c 2010-01-15 2010-05-22 NaN
1 b 2010-02-08 2010-04-16 6.0
2 c 2010-01-24 2010-04-12 NaN
3 c 2010-02-01 2010-04-09 NaN
4 a 2010-01-19 2010-05-20 10.0
5 c 2010-01-27 2010-05-24 NaN
6 c 2010-02-23 2010-03-15 NaN
7 b 2010-01-31 2010-05-09 11.0
8 c 2010-02-23 2010-03-29 NaN
9 b 2010-01-08 2010-03-07 5.0
What if anID
shows up more than once intest2
? Will the amount be summed when you outer join?
– jchaykow
Nov 13 '18 at 7:17
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53274669%2fpandas-match-date-in-one-df-with-timeframe-in-another-then-groupby-sum%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Use:
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
Sample:
#https://stackoverflow.com/q/21494489
np.random.seed(123)
#https://stackoverflow.com/a/50559321/2901002
def gen(start, end, n):
start_u = start.value//10**9
end_u = end.value//10**9
return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
n = 10
test1 = pd.DataFrame(
'ID':np.random.choice(list('abc'), n),
'date1': gen(pd.to_datetime('2010-01-01'),pd.to_datetime('2010-03-01'), n).floor('d'),
'date2':gen(pd.to_datetime('2010-03-01'),pd.to_datetime('2010-06-01'), n).floor('d')
)
m = 5
test2 = pd.DataFrame(
'ID': np.random.choice(list('abc'), m),
'amount':np.random.randint(10, size=m),
'date':gen(pd.to_datetime('2010-01-01'), pd.to_datetime('2010-06-01'), m).floor('d')
)
print (test1)
ID date1 date2
0 c 2010-01-15 2010-05-22
1 b 2010-02-08 2010-04-16
2 c 2010-01-24 2010-04-12
3 c 2010-02-01 2010-04-09
4 a 2010-01-19 2010-05-20
5 c 2010-01-27 2010-05-24
6 c 2010-02-23 2010-03-15
7 b 2010-01-31 2010-05-09
8 c 2010-02-23 2010-03-29
9 b 2010-01-08 2010-03-07
print (test2)
ID amount date
0 a 4 2010-05-15
1 b 6 2010-03-26
2 a 1 2010-01-07
3 b 5 2010-02-07
4 a 6 2010-04-13
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
print (df)
ID date1 date2 amount date
6 b 2010-02-08 2010-04-16 6.0 2010-03-26
8 b 2010-01-31 2010-05-09 6.0 2010-03-26
9 b 2010-01-31 2010-05-09 5.0 2010-02-07
11 b 2010-01-08 2010-03-07 5.0 2010-02-07
12 a 2010-01-19 2010-05-20 4.0 2010-05-15
14 a 2010-01-19 2010-05-20 6.0 2010-04-13
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
print (test)
ID date1 date2 amount
0 c 2010-01-15 2010-05-22 NaN
1 b 2010-02-08 2010-04-16 6.0
2 c 2010-01-24 2010-04-12 NaN
3 c 2010-02-01 2010-04-09 NaN
4 a 2010-01-19 2010-05-20 10.0
5 c 2010-01-27 2010-05-24 NaN
6 c 2010-02-23 2010-03-15 NaN
7 b 2010-01-31 2010-05-09 11.0
8 c 2010-02-23 2010-03-29 NaN
9 b 2010-01-08 2010-03-07 5.0
What if anID
shows up more than once intest2
? Will the amount be summed when you outer join?
– jchaykow
Nov 13 '18 at 7:17
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
add a comment |
Use:
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
Sample:
#https://stackoverflow.com/q/21494489
np.random.seed(123)
#https://stackoverflow.com/a/50559321/2901002
def gen(start, end, n):
start_u = start.value//10**9
end_u = end.value//10**9
return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
n = 10
test1 = pd.DataFrame(
'ID':np.random.choice(list('abc'), n),
'date1': gen(pd.to_datetime('2010-01-01'),pd.to_datetime('2010-03-01'), n).floor('d'),
'date2':gen(pd.to_datetime('2010-03-01'),pd.to_datetime('2010-06-01'), n).floor('d')
)
m = 5
test2 = pd.DataFrame(
'ID': np.random.choice(list('abc'), m),
'amount':np.random.randint(10, size=m),
'date':gen(pd.to_datetime('2010-01-01'), pd.to_datetime('2010-06-01'), m).floor('d')
)
print (test1)
ID date1 date2
0 c 2010-01-15 2010-05-22
1 b 2010-02-08 2010-04-16
2 c 2010-01-24 2010-04-12
3 c 2010-02-01 2010-04-09
4 a 2010-01-19 2010-05-20
5 c 2010-01-27 2010-05-24
6 c 2010-02-23 2010-03-15
7 b 2010-01-31 2010-05-09
8 c 2010-02-23 2010-03-29
9 b 2010-01-08 2010-03-07
print (test2)
ID amount date
0 a 4 2010-05-15
1 b 6 2010-03-26
2 a 1 2010-01-07
3 b 5 2010-02-07
4 a 6 2010-04-13
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
print (df)
ID date1 date2 amount date
6 b 2010-02-08 2010-04-16 6.0 2010-03-26
8 b 2010-01-31 2010-05-09 6.0 2010-03-26
9 b 2010-01-31 2010-05-09 5.0 2010-02-07
11 b 2010-01-08 2010-03-07 5.0 2010-02-07
12 a 2010-01-19 2010-05-20 4.0 2010-05-15
14 a 2010-01-19 2010-05-20 6.0 2010-04-13
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
print (test)
ID date1 date2 amount
0 c 2010-01-15 2010-05-22 NaN
1 b 2010-02-08 2010-04-16 6.0
2 c 2010-01-24 2010-04-12 NaN
3 c 2010-02-01 2010-04-09 NaN
4 a 2010-01-19 2010-05-20 10.0
5 c 2010-01-27 2010-05-24 NaN
6 c 2010-02-23 2010-03-15 NaN
7 b 2010-01-31 2010-05-09 11.0
8 c 2010-02-23 2010-03-29 NaN
9 b 2010-01-08 2010-03-07 5.0
What if anID
shows up more than once intest2
? Will the amount be summed when you outer join?
– jchaykow
Nov 13 '18 at 7:17
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
add a comment |
Use:
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
Sample:
#https://stackoverflow.com/q/21494489
np.random.seed(123)
#https://stackoverflow.com/a/50559321/2901002
def gen(start, end, n):
start_u = start.value//10**9
end_u = end.value//10**9
return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
n = 10
test1 = pd.DataFrame(
'ID':np.random.choice(list('abc'), n),
'date1': gen(pd.to_datetime('2010-01-01'),pd.to_datetime('2010-03-01'), n).floor('d'),
'date2':gen(pd.to_datetime('2010-03-01'),pd.to_datetime('2010-06-01'), n).floor('d')
)
m = 5
test2 = pd.DataFrame(
'ID': np.random.choice(list('abc'), m),
'amount':np.random.randint(10, size=m),
'date':gen(pd.to_datetime('2010-01-01'), pd.to_datetime('2010-06-01'), m).floor('d')
)
print (test1)
ID date1 date2
0 c 2010-01-15 2010-05-22
1 b 2010-02-08 2010-04-16
2 c 2010-01-24 2010-04-12
3 c 2010-02-01 2010-04-09
4 a 2010-01-19 2010-05-20
5 c 2010-01-27 2010-05-24
6 c 2010-02-23 2010-03-15
7 b 2010-01-31 2010-05-09
8 c 2010-02-23 2010-03-29
9 b 2010-01-08 2010-03-07
print (test2)
ID amount date
0 a 4 2010-05-15
1 b 6 2010-03-26
2 a 1 2010-01-07
3 b 5 2010-02-07
4 a 6 2010-04-13
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
print (df)
ID date1 date2 amount date
6 b 2010-02-08 2010-04-16 6.0 2010-03-26
8 b 2010-01-31 2010-05-09 6.0 2010-03-26
9 b 2010-01-31 2010-05-09 5.0 2010-02-07
11 b 2010-01-08 2010-03-07 5.0 2010-02-07
12 a 2010-01-19 2010-05-20 4.0 2010-05-15
14 a 2010-01-19 2010-05-20 6.0 2010-04-13
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
print (test)
ID date1 date2 amount
0 c 2010-01-15 2010-05-22 NaN
1 b 2010-02-08 2010-04-16 6.0
2 c 2010-01-24 2010-04-12 NaN
3 c 2010-02-01 2010-04-09 NaN
4 a 2010-01-19 2010-05-20 10.0
5 c 2010-01-27 2010-05-24 NaN
6 c 2010-02-23 2010-03-15 NaN
7 b 2010-01-31 2010-05-09 11.0
8 c 2010-02-23 2010-03-29 NaN
9 b 2010-01-08 2010-03-07 5.0
Use:
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
Sample:
#https://stackoverflow.com/q/21494489
np.random.seed(123)
#https://stackoverflow.com/a/50559321/2901002
def gen(start, end, n):
start_u = start.value//10**9
end_u = end.value//10**9
return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
n = 10
test1 = pd.DataFrame(
'ID':np.random.choice(list('abc'), n),
'date1': gen(pd.to_datetime('2010-01-01'),pd.to_datetime('2010-03-01'), n).floor('d'),
'date2':gen(pd.to_datetime('2010-03-01'),pd.to_datetime('2010-06-01'), n).floor('d')
)
m = 5
test2 = pd.DataFrame(
'ID': np.random.choice(list('abc'), m),
'amount':np.random.randint(10, size=m),
'date':gen(pd.to_datetime('2010-01-01'), pd.to_datetime('2010-06-01'), m).floor('d')
)
print (test1)
ID date1 date2
0 c 2010-01-15 2010-05-22
1 b 2010-02-08 2010-04-16
2 c 2010-01-24 2010-04-12
3 c 2010-02-01 2010-04-09
4 a 2010-01-19 2010-05-20
5 c 2010-01-27 2010-05-24
6 c 2010-02-23 2010-03-15
7 b 2010-01-31 2010-05-09
8 c 2010-02-23 2010-03-29
9 b 2010-01-08 2010-03-07
print (test2)
ID amount date
0 a 4 2010-05-15
1 b 6 2010-03-26
2 a 1 2010-01-07
3 b 5 2010-02-07
4 a 6 2010-04-13
#outer join both df by ID columns
df = test1.merge(test2, on='ID', how='outer')
#filter by range
df = df[(df.date > df.date1) & (df.date < df.date2)]
print (df)
ID date1 date2 amount date
6 b 2010-02-08 2010-04-16 6.0 2010-03-26
8 b 2010-01-31 2010-05-09 6.0 2010-03-26
9 b 2010-01-31 2010-05-09 5.0 2010-02-07
11 b 2010-01-08 2010-03-07 5.0 2010-02-07
12 a 2010-01-19 2010-05-20 4.0 2010-05-15
14 a 2010-01-19 2010-05-20 6.0 2010-04-13
#thank you @Abhi for alternative
#df = df[df.date.between(df.date1, df.date2, inclusive=False)]
#aggregate sum
s = df.groupby(['ID','date1','date2'])['amount'].sum()
#add new column to test1
test = test1.join(s, on=['ID','date1','date2'])
print (test)
ID date1 date2 amount
0 c 2010-01-15 2010-05-22 NaN
1 b 2010-02-08 2010-04-16 6.0
2 c 2010-01-24 2010-04-12 NaN
3 c 2010-02-01 2010-04-09 NaN
4 a 2010-01-19 2010-05-20 10.0
5 c 2010-01-27 2010-05-24 NaN
6 c 2010-02-23 2010-03-15 NaN
7 b 2010-01-31 2010-05-09 11.0
8 c 2010-02-23 2010-03-29 NaN
9 b 2010-01-08 2010-03-07 5.0
edited Nov 13 '18 at 8:46
answered Nov 13 '18 at 6:39
jezraeljezrael
331k24273351
331k24273351
What if anID
shows up more than once intest2
? Will the amount be summed when you outer join?
– jchaykow
Nov 13 '18 at 7:17
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
add a comment |
What if anID
shows up more than once intest2
? Will the amount be summed when you outer join?
– jchaykow
Nov 13 '18 at 7:17
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
What if an
ID
shows up more than once in test2
? Will the amount be summed when you outer join?– jchaykow
Nov 13 '18 at 7:17
What if an
ID
shows up more than once in test2
? Will the amount be summed when you outer join?– jchaykow
Nov 13 '18 at 7:17
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
@jchaykow - aded sample data for check it
– jezrael
Nov 13 '18 at 8:46
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53274669%2fpandas-match-date-in-one-df-with-timeframe-in-another-then-groupby-sum%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown