Selecting rows of a Pandas dataframe based on a dict

I have a pandas dataframe that only have two columns like this

 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750

The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.

l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))

I have a dictionary with me like this

 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
 Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
 Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
 Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
 Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
 Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
 Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
 Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')

The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data

If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.

How to implement this in pandas in a straight forward way rather than using multiple loops.?

Expected output will look like this

126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0

Only significant rows are shown

edited Nov 15 '18 at 6:49

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25

5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26

How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26

Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28

1

reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31

|
show 3 more comments

I have a pandas dataframe that only have two columns like this

 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750

The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.

l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))

I have a dictionary with me like this

 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
 Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
 Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
 Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
 Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
 Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
 Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
 Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')

The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data

If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.

How to implement this in pandas in a straight forward way rather than using multiple loops.?

Expected output will look like this

126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0

Only significant rows are shown

edited Nov 15 '18 at 6:49

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25

5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26

How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26

Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28

1

reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31

|
show 3 more comments

I have a pandas dataframe that only have two columns like this

 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750

The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.

l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))

I have a dictionary with me like this

 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
 Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
 Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
 Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
 Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
 Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
 Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
 Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')

The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data

If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.

How to implement this in pandas in a straight forward way rather than using multiple loops.?

Expected output will look like this

126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0

Only significant rows are shown

edited Nov 15 '18 at 6:49

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

I have a pandas dataframe that only have two columns like this

 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750

The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.

l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))

I have a dictionary with me like this

 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
 Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
 Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
 Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
 Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
 Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
 Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
 Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')

The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data

If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.

How to implement this in pandas in a straight forward way rather than using multiple loops.?

Expected output will look like this

126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0

Only significant rows are shown

python pandas

edited Nov 15 '18 at 6:49

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

edited Nov 15 '18 at 6:49

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

edited Nov 15 '18 at 6:49

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

asked Nov 15 '18 at 6:22

Sreeram TP

3,32431441

Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25

5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26

How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26

Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28

1

reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31

|
show 3 more comments

Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25

5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26

How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26

Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28

1

reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31

Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25

5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26

How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26

Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28

reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31

|
show 3 more comments

1 Answer
1

active

oldest

votes

I believe you need:

d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'), 
 end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
 start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
 .melt('index')
 .set_index('value')
 .groupby('index')
 .resample('10T')['index']
 .ffill()
 .dropna()
 .index
 .get_level_values(1)
 .unique()
 )
#print (v)

l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
 Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0

edited Nov 15 '18 at 14:06

answered Nov 15 '18 at 7:00

jezrael

354k26318394

1

Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09

@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10

Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11

1

oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12

The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16

|
show 15 more comments

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53313565%2fselecting-rows-of-a-pandas-dataframe-based-on-a-dict%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

I believe you need:

d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'), 
 end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
 start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
 .melt('index')
 .set_index('value')
 .groupby('index')
 .resample('10T')['index']
 .ffill()
 .dropna()
 .index
 .get_level_values(1)
 .unique()
 )
#print (v)

l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
 Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0

edited Nov 15 '18 at 14:06

answered Nov 15 '18 at 7:00

jezrael

354k26318394

1

Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09

@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10

Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11

1

oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12

The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16

|
show 15 more comments

I believe you need:

d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'), 
 end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
 start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
 .melt('index')
 .set_index('value')
 .groupby('index')
 .resample('10T')['index']
 .ffill()
 .dropna()
 .index
 .get_level_values(1)
 .unique()
 )
#print (v)

l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
 Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0

edited Nov 15 '18 at 14:06

answered Nov 15 '18 at 7:00

jezrael

354k26318394

1

Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09

@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10

Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11

1

oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12

The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16

|
show 15 more comments

I believe you need:

d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'), 
 end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
 start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
 .melt('index')
 .set_index('value')
 .groupby('index')
 .resample('10T')['index']
 .ffill()
 .dropna()
 .index
 .get_level_values(1)
 .unique()
 )
#print (v)

l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
 Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0

edited Nov 15 '18 at 14:06

answered Nov 15 '18 at 7:00

jezrael

354k26318394

I believe you need:

d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'), 
 end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
 start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
 .melt('index')
 .set_index('value')
 .groupby('index')
 .resample('10T')['index']
 .ffill()
 .dropna()
 .index
 .get_level_values(1)
 .unique()
 )
#print (v)

l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
 Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0

edited Nov 15 '18 at 14:06

answered Nov 15 '18 at 7:00

jezrael

354k26318394

edited Nov 15 '18 at 14:06

answered Nov 15 '18 at 7:00

jezrael

354k26318394

answered Nov 15 '18 at 7:00

jezrael

354k26318394

answered Nov 15 '18 at 7:00

jezrael

354k26318394

1

Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09

@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10

Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11

1

oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12

The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16

|
show 15 more comments

1

Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09

@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10

Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11

1

oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12

The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16

Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09

@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10

Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11

oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12

The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16

|
show 15 more comments

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Pfthb