Selecting rows of a Pandas dataframe based on a dict










2















I have a pandas dataframe that only have two columns like this



 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750


The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.



l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))


I have a dictionary with me like this



 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')


The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data



If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.



How to implement this in pandas in a straight forward way rather than using multiple loops.?



Expected output will look like this



126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0


Only significant rows are shown










share|improve this question
























  • Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

    – timgeb
    Nov 15 '18 at 6:25












  • 5 rows wont give the expected distribution of start and end times. That's why I went for 5000

    – Sreeram TP
    Nov 15 '18 at 6:26











  • How does this matter for the algorithm?

    – timgeb
    Nov 15 '18 at 6:26











  • Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

    – Sreeram TP
    Nov 15 '18 at 6:28






  • 1





    reduced the number of rows

    – Sreeram TP
    Nov 15 '18 at 6:31















2















I have a pandas dataframe that only have two columns like this



 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750


The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.



l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))


I have a dictionary with me like this



 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')


The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data



If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.



How to implement this in pandas in a straight forward way rather than using multiple loops.?



Expected output will look like this



126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0


Only significant rows are shown










share|improve this question
























  • Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

    – timgeb
    Nov 15 '18 at 6:25












  • 5 rows wont give the expected distribution of start and end times. That's why I went for 5000

    – Sreeram TP
    Nov 15 '18 at 6:26











  • How does this matter for the algorithm?

    – timgeb
    Nov 15 '18 at 6:26











  • Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

    – Sreeram TP
    Nov 15 '18 at 6:28






  • 1





    reduced the number of rows

    – Sreeram TP
    Nov 15 '18 at 6:31













2












2








2


0






I have a pandas dataframe that only have two columns like this



 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750


The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.



l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))


I have a dictionary with me like this



 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')


The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data



If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.



How to implement this in pandas in a straight forward way rather than using multiple loops.?



Expected output will look like this



126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0


Only significant rows are shown










share|improve this question
















I have a pandas dataframe that only have two columns like this



 Timestamp X
0 2017-01-01 00:00:00 18450
1 2017-01-01 00:10:00 13787
2 2017-01-01 00:20:00 3249
3 2017-01-01 00:30:00 44354
4 2017-01-01 00:40:00 50750


The Timestamp column is basically 10 minute separated ranging from start of the month to end. To create a sample the following code can be used.



l_data = pd.DataFrame()

l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 00:00:00'), end=pd.Timestamp('2017-01-20 00:00:00'), freq='10T')
l_data['X'] = random.sample(range(0, 100000), len(l_data))


I have a dictionary with me like this



 Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')


The key in the dictionary is starting time and value is the ending time. I want to create a column named L based on this dict in l_data



If the time between key and value in dict is greater than 5 minute I have to mark the Timestamp that falls in that range in l_data as 1.



How to implement this in pandas in a straight forward way rather than using multiple loops.?



Expected output will look like this



126 1/18/2017 21:00 43401 0
127 1/18/2017 21:10 290 0
128 1/18/2017 21:20 92509 0
129 1/18/2017 21:30 64545 0
130 1/18/2017 21:40 47780 1
131 1/18/2017 21:50 53293 1
132 1/18/2017 22:00 45634 0
133 1/18/2017 22:10 51462 0
134 1/18/2017 22:20 44736 0
135 1/18/2017 22:30 11697 1
136 1/18/2017 22:40 82587 1
137 1/18/2017 22:50 76250 0
138 1/18/2017 23:00 33307 0
139 1/18/2017 23:10 25851 0
140 1/18/2017 23:20 71131 0
141 1/18/2017 23:30 88015 0
142 1/18/2017 23:40 45577 0
143 1/18/2017 23:50 76761 0
144 1/19/2017 0:00 45363 0


Only significant rows are shown







python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 6:49







Sreeram TP

















asked Nov 15 '18 at 6:22









Sreeram TPSreeram TP

3,32431441




3,32431441












  • Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

    – timgeb
    Nov 15 '18 at 6:25












  • 5 rows wont give the expected distribution of start and end times. That's why I went for 5000

    – Sreeram TP
    Nov 15 '18 at 6:26











  • How does this matter for the algorithm?

    – timgeb
    Nov 15 '18 at 6:26











  • Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

    – Sreeram TP
    Nov 15 '18 at 6:28






  • 1





    reduced the number of rows

    – Sreeram TP
    Nov 15 '18 at 6:31

















  • Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

    – timgeb
    Nov 15 '18 at 6:25












  • 5 rows wont give the expected distribution of start and end times. That's why I went for 5000

    – Sreeram TP
    Nov 15 '18 at 6:26











  • How does this matter for the algorithm?

    – timgeb
    Nov 15 '18 at 6:26











  • Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

    – Sreeram TP
    Nov 15 '18 at 6:28






  • 1





    reduced the number of rows

    – Sreeram TP
    Nov 15 '18 at 6:31
















Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25






Can you please make an example with 5 rows instead of 5000? Then you could post the expected output and I could verify my answer before posting it.

– timgeb
Nov 15 '18 at 6:25














5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26





5 rows wont give the expected distribution of start and end times. That's why I went for 5000

– Sreeram TP
Nov 15 '18 at 6:26













How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26





How does this matter for the algorithm?

– timgeb
Nov 15 '18 at 6:26













Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28





Sometimes start and end will lie within 10 mints range some times it will be between two timestamps.

– Sreeram TP
Nov 15 '18 at 6:28




1




1





reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31





reduced the number of rows

– Sreeram TP
Nov 15 '18 at 6:31












1 Answer
1






active

oldest

votes


















1














I believe you need:



d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'),
end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]



#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
.melt('index')
.set_index('value')
.groupby('index')
.resample('10T')['index']
.ffill()
.dropna()
.index
.get_level_values(1)
.unique()
)
#print (v)



l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0





share|improve this answer




















  • 1





    Give me some time to verify the result

    – Sreeram TP
    Nov 15 '18 at 7:09











  • @SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

    – jezrael
    Nov 15 '18 at 7:10











  • Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

    – Sreeram TP
    Nov 15 '18 at 7:11






  • 1





    oops, so not understand it. give me some time.

    – jezrael
    Nov 15 '18 at 7:12











  • The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

    – Sreeram TP
    Nov 15 '18 at 7:16












Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53313565%2fselecting-rows-of-a-pandas-dataframe-based-on-a-dict%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














I believe you need:



d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'),
end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]



#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
.melt('index')
.set_index('value')
.groupby('index')
.resample('10T')['index']
.ffill()
.dropna()
.index
.get_level_values(1)
.unique()
)
#print (v)



l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0





share|improve this answer




















  • 1





    Give me some time to verify the result

    – Sreeram TP
    Nov 15 '18 at 7:09











  • @SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

    – jezrael
    Nov 15 '18 at 7:10











  • Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

    – Sreeram TP
    Nov 15 '18 at 7:11






  • 1





    oops, so not understand it. give me some time.

    – jezrael
    Nov 15 '18 at 7:12











  • The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

    – Sreeram TP
    Nov 15 '18 at 7:16
















1














I believe you need:



d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'),
end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]



#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
.melt('index')
.set_index('value')
.groupby('index')
.resample('10T')['index']
.ffill()
.dropna()
.index
.get_level_values(1)
.unique()
)
#print (v)



l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0





share|improve this answer




















  • 1





    Give me some time to verify the result

    – Sreeram TP
    Nov 15 '18 at 7:09











  • @SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

    – jezrael
    Nov 15 '18 at 7:10











  • Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

    – Sreeram TP
    Nov 15 '18 at 7:11






  • 1





    oops, so not understand it. give me some time.

    – jezrael
    Nov 15 '18 at 7:12











  • The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

    – Sreeram TP
    Nov 15 '18 at 7:16














1












1








1







I believe you need:



d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'),
end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]



#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
.melt('index')
.set_index('value')
.groupby('index')
.resample('10T')['index']
.ffill()
.dropna()
.index
.get_level_values(1)
.unique()
)
#print (v)



l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0





share|improve this answer















I believe you need:



d = pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'),
end=pd.Timestamp('2017-01-19 04:00:00'), freq='10T')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]



#print (l_data)

df = pd.DataFrame('start': list(d.keys()),'end': list(d.values()))
#fikter by 5 minutes
df = df[(df['end'] - df['start']) > pd.Timedelta(5*60, 's')]

#correct 1 minutes end time
s = df['end'].dt.floor('10T')
df['end1'] = s.where((df['end'] - s) < pd.Timedelta(60, 's'), s + pd.Timedelta(10*60, 's'))
print (df)
start end end1
0 2017-01-18 21:45:02.120 2017-01-18 21:50:29.040 2017-01-18 21:50:00
1 2017-01-18 21:51:02.120 2017-01-18 22:52:00.040 2017-01-18 23:00:00
2 2017-01-18 22:52:02.120 2017-01-18 22:57:59.760 2017-01-18 23:00:00
4 2017-01-18 23:44:52.800 2017-01-18 23:50:30.040 2017-01-18 23:50:00
6 2017-01-19 01:40:32.800 2017-01-19 01:55:30.040 2017-01-19 02:00:00
7 2017-01-19 01:57:32.800 2017-01-19 02:04:30.040 2017-01-19 02:10:00

#for each group resample by 10min and add missimg datetimes
v = (df.reset_index()[['start','end1','index']]
.melt('index')
.set_index('value')
.groupby('index')
.resample('10T')['index']
.ffill()
.dropna()
.index
.get_level_values(1)
.unique()
)
#print (v)



l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print (l_data.head(20))
Timestamp expected L
0 2017-01-18 20:00:00 0 0
1 2017-01-18 20:10:00 0 0
2 2017-01-18 20:20:00 0 0
3 2017-01-18 20:30:00 0 0
4 2017-01-18 20:40:00 0 0
5 2017-01-18 20:50:00 0 0
6 2017-01-18 21:00:00 0 0
7 2017-01-18 21:10:00 0 0
8 2017-01-18 21:20:00 0 0
9 2017-01-18 21:30:00 0 0
10 2017-01-18 21:40:00 0 0
11 2017-01-18 21:50:00 1 1
12 2017-01-18 22:00:00 1 1
13 2017-01-18 22:10:00 1 1
14 2017-01-18 22:20:00 1 1
15 2017-01-18 22:30:00 1 1
16 2017-01-18 22:40:00 1 1
17 2017-01-18 22:50:00 1 1
18 2017-01-18 23:00:00 1 1
19 2017-01-18 23:10:00 0 0






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 15 '18 at 14:06

























answered Nov 15 '18 at 7:00









jezraeljezrael

354k26318394




354k26318394







  • 1





    Give me some time to verify the result

    – Sreeram TP
    Nov 15 '18 at 7:09











  • @SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

    – jezrael
    Nov 15 '18 at 7:10











  • Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

    – Sreeram TP
    Nov 15 '18 at 7:11






  • 1





    oops, so not understand it. give me some time.

    – jezrael
    Nov 15 '18 at 7:12











  • The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

    – Sreeram TP
    Nov 15 '18 at 7:16













  • 1





    Give me some time to verify the result

    – Sreeram TP
    Nov 15 '18 at 7:09











  • @SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

    – jezrael
    Nov 15 '18 at 7:10











  • Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

    – Sreeram TP
    Nov 15 '18 at 7:11






  • 1





    oops, so not understand it. give me some time.

    – jezrael
    Nov 15 '18 at 7:12











  • The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

    – Sreeram TP
    Nov 15 '18 at 7:16








1




1





Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09





Give me some time to verify the result

– Sreeram TP
Nov 15 '18 at 7:09













@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10





@SreeramTP - sure, not sure about condition, noew i have less like 5 minutes, maybe need more - then change < to > in conditions.

– jezrael
Nov 15 '18 at 7:10













Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11





Since Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000') is more than 5 minutes i want the timestamp Timestamp('2017-01-18 21:40:00.040000') marked as 1

– Sreeram TP
Nov 15 '18 at 7:11




1




1





oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12





oops, so not understand it. give me some time.

– jezrael
Nov 15 '18 at 7:12













The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16






The conditions are if the value - key is more than 5 mints mark the time in l_data that corresponds to value. There may be samples like this Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:43:00.040000'). The difference is more than 5 mints but value is past Timestamp('2017-01-18 21:40:00.040000') then mark Timestamp('2017-01-18 21:50:00.040000')

– Sreeram TP
Nov 15 '18 at 7:16




















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53313565%2fselecting-rows-of-a-pandas-dataframe-based-on-a-dict%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo