How does WINDOW TUMBLING work in KSQL? My query returns the same result with or without a tumbling window
I am using a KSQL stream and counting the events that arrive in each 5-minute period. Here is my query:
select count(*), created_on_date from TABLE_NAME window tumbling (size 5 minutes) group by created_on_date;
It returns:
2 | 2018-11-13 09:54:50
3 | 2018-11-13 09:54:49
3 | 2018-11-13 09:54:52
3 | 2018-11-13 09:54:51
3 | 2018-11-13 09:54:50
The same query without the tumbling window:
select count(*), created_on_date from OP_UPDATE_ONLY group by created_on_date;
returns:
1 | 2018-11-13 09:55:08
2 | 2018-11-13 09:55:09
1 | 2018-11-13 09:55:10
3 | 2018-11-13 09:55:09
4 | 2018-11-13 09:55:12
Both queries return the same results, so what difference does the tumbling window make?
apache-kafka ksql
edited Nov 14 '18 at 23:44 by cricket_007
asked Nov 13 '18 at 11:02 by Ritu
1 Answer
A tumbling window is a rolling aggregation: it counts the number of events per key within fixed-size, non-overlapping windows of time. The window of time is based on the timestamp of your stream, which by default is inherited from your Kafka message but can be overridden with WITH (TIMESTAMP='my_column'). So you could pass created_on_date as the timestamp column, and the windows would then be computed from its values.
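To make the window assignment concrete, here is a minimal plain-Python simulation (not KSQL or Kafka Streams code) of a 5-minute tumbling count. It assumes, as Kafka Streams documents for its tumbling windows, that windows are aligned to the epoch, so each timestamp falls into exactly one window. The helper names window_start and count_per_window and the sample timestamps are hypothetical, and events are grouped by nothing other than the window itself.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
SIZE = timedelta(minutes=5)  # mirrors WINDOW TUMBLING (SIZE 5 MINUTES)


def window_start(ts: datetime, size: timedelta = SIZE) -> datetime:
    """Start of the tumbling window containing ts (windows aligned to the epoch)."""
    return ts - (ts - EPOCH) % size


def count_per_window(timestamps, size: timedelta = SIZE) -> Counter:
    """Count events per window start, like COUNT(*) over a tumbling window
    with no finer grouping key."""
    return Counter(window_start(ts, size) for ts in timestamps)


# Hypothetical event timestamps (one-second granularity).
events = [
    datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    for s in [
        "2018-11-13 09:54:49",
        "2018-11-13 09:54:50",
        "2018-11-13 09:54:50",
        "2018-11-13 09:55:08",
        "2018-11-13 09:55:12",
        "2018-11-13 10:01:30",
    ]
]

# Print each window start with the number of events that fell into it.
for start, n in sorted(count_per_window(events).items()):
    print(start, n)
```

With this sample data the events land in three windows starting at 09:50, 09:55 and 10:00, holding 3, 2 and 1 events respectively.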
The second query aggregates over the entire stream of messages. Because your message happens to contain a timestamp, grouping by it gives the illusion of a time-based aggregation. However, if you wanted to find out, for example, how many events occurred within an hour, this would be of no use: you can only count at the grain of created_on_date.
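This also explains why the two queries in the question look identical: the windowed query still groups by created_on_date, which changes every second, so each group fits entirely inside one 5-minute window and the window does not change any count. The following plain-Python sketch simulates that, with hypothetical data and helper names and assuming epoch-aligned windows. Grouping by the raw timestamp gives the same counts with and without a window, while a question such as "how many events per hour?" needs the events bucketed by time rather than grouped by the raw timestamp. In KSQL terms that likely means windowing on created_on_date as the timestamp column and grouping by a coarser column, not by created_on_date itself.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime, size: timedelta) -> datetime:
    """Start of the epoch-aligned tumbling window of length `size` containing ts."""
    return ts - (ts - EPOCH) % size


# Hypothetical created_on_date values with one-second granularity.
events = [
    datetime(2018, 11, 13, 9, 55, 8),
    datetime(2018, 11, 13, 9, 55, 9),
    datetime(2018, 11, 13, 9, 55, 9),
    datetime(2018, 11, 13, 9, 55, 10),
    datetime(2018, 11, 13, 10, 20, 0),
]

# Like GROUP BY created_on_date with no window: one group per distinct second.
by_second = Counter(events)

# Like GROUP BY created_on_date with WINDOW TUMBLING (SIZE 5 MINUTES): the
# group is (second, window), but each second lies in exactly one window, so
# dropping the window part gives back exactly the same counts.
by_second_and_window = Counter(
    (ts, window_start(ts, timedelta(minutes=5))) for ts in events
)
same_counts = {ts: n for (ts, _), n in by_second_and_window.items()} == dict(by_second)
print("window changed nothing:", same_counts)

# "How many events per hour?" needs events bucketed by time, not grouped by
# the raw per-second value, e.g. a 1-hour tumbling window with no finer key.
per_hour = Counter(window_start(ts, timedelta(hours=1)) for ts in events)
for start, n in sorted(per_hour.items()):
    print(start, n)
```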
So the first approach, with a window, is usually the right one, because you typically want to answer a business question about an aggregation within a given time period, not over the course of an arbitrary stream of data.
answered Nov 13 '18 at 11:22 by Robin Moffatt