How does a tumbling window work in KSQL? The query returns the same result with or without the tumbling window










I am using a KSQL stream and counting the events that arrive every 5 minutes. Here is my query:



select count(*), created_on_date from TABLE_NAME window tumbling (size 5 minutes) group by created_on_date;


It returns:



2 | 2018-11-13 09:54:50
3 | 2018-11-13 09:54:49
3 | 2018-11-13 09:54:52
3 | 2018-11-13 09:54:51
3 | 2018-11-13 09:54:50


The query without the tumbling window:



select count(*), created_on_date from OP_UPDATE_ONLY group by created_on_date;


Result:



1 | 2018-11-13 09:55:08
2 | 2018-11-13 09:55:09
1 | 2018-11-13 09:55:10
3 | 2018-11-13 09:55:09
4 | 2018-11-13 09:55:12


Both queries return the same results, so what difference does the tumbling window make?










      apache-kafka ksql






      edited Nov 14 '18 at 23:44









      cricket_007











      asked Nov 13 '18 at 11:02









      Ritu






















          1 Answer














          A tumbling window is a fixed-size, non-overlapping window of time; a windowed aggregation counts the events for each key within each window. The window is based on the timestamp of your stream, which is inherited from the Kafka message by default but can be overridden with WITH (TIMESTAMP='my_column'). So you could pass created_on_date as the timestamp column and then aggregate by the values there.
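To make the windowing concrete, here is a minimal Python sketch (a simulation, not KSQL itself) of how a tumbling window assigns each event to one fixed bucket and counts per key within it. It assumes windows are aligned to the Unix epoch, as Kafka Streams tumbling windows are; the function names and the sample events are hypothetical.

```python
from collections import Counter

WINDOW_SIZE_MS = 5 * 60 * 1000  # 5-minute tumbling window


def window_start(timestamp_ms, size_ms=WINDOW_SIZE_MS):
    """Return the start of the tumbling window containing timestamp_ms."""
    # Windows are assumed to be aligned to the epoch, so the start is the
    # timestamp rounded down to a multiple of the window size.
    return timestamp_ms - (timestamp_ms % size_ms)


def tumbling_count(events, size_ms=WINDOW_SIZE_MS):
    """Count events per (window start, key); events are (timestamp_ms, key) pairs."""
    counts = Counter()
    for timestamp_ms, key in events:
        counts[(window_start(timestamp_ms, size_ms), key)] += 1
    return dict(counts)


# Hypothetical events: (event time in ms, grouping key)
events = [
    (0, "a"),
    (299_999, "a"),  # last millisecond of the first window
    (300_000, "a"),  # first millisecond of the second window
    (310_000, "b"),
]
print(tumbling_count(events))
```

Each event lands in exactly one window, so the same key can produce a separate count for every 5-minute window in which it appears.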



          The second query aggregates over the entire stream of messages. Since you happen to have a timestamp in the message itself, grouping by it gives the illusion of a time-based aggregation. However, if you wanted to find out how many events occurred within, say, an hour, it would be no use: you can only count at the grain of created_on_date.
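The difference in grain can be sketched in Python as well (again a simulation rather than KSQL). The sample created_on_date values are hypothetical, and the one-hour window is computed by truncating the timestamp, which assumes the values are in UTC so that hour boundaries match epoch-aligned windows.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical created_on_date values, one per event
created_on_dates = [
    "2018-11-13 09:54:49",
    "2018-11-13 09:54:50",
    "2018-11-13 09:54:50",
    "2018-11-13 09:55:08",
    "2018-11-13 10:02:30",
]

# Unwindowed GROUP BY created_on_date: one count per distinct second
per_second = Counter(created_on_dates)


def hour_window_start(text):
    """Truncate a timestamp string to the start of its one-hour window."""
    parsed = datetime.strptime(text, FMT)
    return parsed.replace(minute=0, second=0).strftime(FMT)


# One-hour tumbling window on the event time: one count per hour
per_hour = Counter(hour_window_start(text) for text in created_on_dates)

print(dict(per_second))
print(dict(per_hour))
```

Grouping by the raw value can never merge events from different seconds, whereas the windowed count answers "how many events in this hour" directly.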



          So the first query, with a window, is usually the right approach: you typically want to answer a business question about an aggregation within a given time period, not over the course of an arbitrary stream of data.






                answered Nov 13 '18 at 11:22









                Robin Moffatt




























