r calculating rolling average with window based on value (not number of rows or date/time variable)










I'm quite new to the packages for calculating rolling averages in R, and I hope you can point me in the right direction.



I have the following data as an example:



ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
library(dplyr)
# data.frame() replaces the deprecated data_frame() wrapped in as.data.frame()
df <- data.frame(ms, correct)

"ms" holds time points in milliseconds and "correct" records whether a specific action was performed correctly (1 = correct, 0 = not correct).

I'd like to calculate the percentage correct (i.e. the average of "correct") over windows spanning a set number of milliseconds. As you can see, some time points are missing and others occur multiple times, so I do not want a filter based on row number. I've looked into packages such as "tidyquant", but it seems to me that these kinds of packages need a date/time variable rather than a numerical variable to determine the window over which values are averaged. Is there a way to define the window on the numerical value of df$ms?



Many thanks!










r filter smoothing rolling-computation rolling-average

edited Feb 8 at 7:25 by zx8754
asked Nov 13 '18 at 20:51 by RmyjuloR






















3 Answers

Try out:

library(dplyr)

# count the number of observations per ms value
df <- df %>%
  group_by(ms) %>%
  mutate(Nb.values = n()) %>%
  ungroup()

# sum the correct responses per ms value; the factor has one level per
# integer ms between min and max so that missing time points can be filled in
df2 <- setNames(
  aggregate(correct ~ factor(df$ms, levels = as.character(seq(min(df$ms), max(df$ms), 1))),
            df, sum),
  c("ms", "Count.correct"))

# complete the data frame (adds a row for every unused factor level)
df2 <- tidyr::complete(df2, ms)
df2$ms <- as.numeric(levels(df2$ms))[df2$ms]
df2 <- df2 %>% left_join(distinct(df[, c("ms", "Nb.values")]), by = "ms")

# rolling proportion correct over 5 consecutive ms values (window = ms to ms + 4)
df2 %>%
  mutate(Window = paste(ms, ms + 4, sep = "-"),
         Rolling.correct =
           zoo::rollapply(Count.correct, 5, sum, na.rm = TRUE,
                          partial = TRUE, fill = NA, align = "left") /
           zoo::rollapply(Nb.values, 5, sum, na.rm = TRUE,
                          partial = TRUE, fill = NA, align = "left"))

# A tibble: 43 x 5
      ms Count.correct Nb.values Window  Rolling.correct
   <dbl>         <dbl>     <int> <chr>             <dbl>
 1   300             2         3 300-304            0.40
 2   301             0         1 301-305            0.00
 3   302            NA        NA 302-306            0.25
 4   303             0         1 303-307            0.25
 5   304            NA        NA 304-308            0.25
 6   305             0         2 305-309            0.25
 7   306             1         1 306-310            0.25
 8   307            NA        NA 307-311            0.00
 9   308             0         1 308-312            0.20
10   309            NA        NA 309-313            0.25
# ... with 33 more rows
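
The same forward-looking window (ms to ms + 4) can also be computed directly from the raw rows, without building a per-ms table first. The following is a minimal base R sketch under that assumption, not part of the original answer; the helper name rolling_by_value and its arguments are hypothetical.

```r
# Example data from the question
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)

# Hypothetical helper: mean of `value` over each window [start, start + width - 1],
# with one window starting at every integer between min(time) and max(time)
rolling_by_value <- function(time, value, width = 5) {
  starts <- seq(min(time), max(time), by = 1)
  ratio <- vapply(starts, function(s) {
    in_window <- time >= s & time <= s + width - 1
    # a window that contains no observations yields NA instead of NaN
    if (any(in_window)) mean(value[in_window]) else NA_real_
  }, numeric(1))
  data.frame(start = starts, end = starts + width - 1, ratio = ratio)
}

res <- rolling_by_value(ms, correct, width = 5)
head(res, 3)
```

For the first three windows this gives 0.4, 0 and 0.25, matching the values worked out by hand in the comments. Each window scans the whole vector, which is fine for small data but may be slow for very long series.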





• This looks neat! Is this also possible with a sliding window, i.e. windows that go 300-304, 301-305, 302-306, etc.?
  – RmyjuloR, Nov 13 '18 at 22:21

• Hmm, in that case it makes sense to start with one value per ms and then apply the window when computing the average. I edited my answer.
  – ANG, Nov 13 '18 at 23:43

• This is coming close to what I want, but if you look at the original data, the values for the first 3 windows should be: 300-304 --> 2/5 values = 0.4; 301-305 --> 0/4 values = 0; 302-306 --> 1/4 values = 0.25.
  – RmyjuloR, Nov 14 '18 at 1:08

• Ah OK, this means that we also have to consider the number of values in each window. See my edit.
  – ANG, Nov 14 '18 at 10:49

• Thank you for doing the math! I guess my brain was a bit cooked after exploring all the different packages :$
  – RmyjuloR, Nov 14 '18 at 13:42


















You can try cut(). For example, to divide ms into 5 groups overall, you can do:

df$ms_factor <- cut(df$ms, 5)

df_new <- df %>% group_by(ms_factor) %>% summarise(mean = mean(correct))
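
Note that cut(df$ms, 5) splits the range of ms into five equal-width intervals, not into windows of 5 ms. The following is a minimal sketch, not taken from the answer, of fixed non-overlapping 5 ms bins built from explicit breaks. It uses only the first ten rows of the question's data, and the names width, breaks, bin and binned are illustrative. right = FALSE makes each interval closed on the left, so the smallest value (300) is kept; with cut()'s default right = TRUE, a value equal to the first break becomes NA.

```r
# First ten rows of the example data from the question
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0)

width <- 5
# Breaks every 5 ms, extended past the maximum so the last value gets a bin
breaks <- seq(min(ms), max(ms) + width, by = width)

# Left-closed intervals: [300,305), [305,310), [310,315)
bin <- cut(ms, breaks = breaks, right = FALSE)

# Mean of `correct` per bin; the result has columns `window` and `x`
binned <- aggregate(correct, by = list(window = bin), FUN = mean)
binned
```

These bins do not overlap, so this is a binned mean rather than a sliding one.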





• I'd actually like a rolling average for a predefined window. For example, with a window of 5 ms: an average for the windows 300-304, 301-305, 302-306, etc., running up to the max value of ms.
  – RmyjuloR, Nov 13 '18 at 21:53

• In that case you can try something like this: df$ms_factor <- cut(df$ms, seq(300, 345, by = 5))
  – pooja p, Nov 13 '18 at 22:26

• This gives me the windows 300-305, 305-310, 310-315, etc., right? Could the code be altered to calculate a sliding window instead, so 300-304, 301-305, 302-306, etc.?
  – RmyjuloR, Nov 14 '18 at 1:34


















This can be done with base R:

calculate_irregular_ratio <- function(df, time_var = "ms", window_var = 5, calc_var = "correct") {
  # for each row, average calc_var over all rows whose time lies in [x - window_var, x]
  sapply(df[[time_var]], function(x) {
    in_window <- df[[time_var]] >= (x - window_var) & df[[time_var]] <= x
    round(mean(df[[calc_var]][in_window]), 2)
  })
}

You can apply it as follows (the default window is 5 ms; change it via the window_var parameter):

df$window_5_ratio <- calculate_irregular_ratio(df, window_var = 5)

For your data this gives (first 10 rows only):

    ms correct window_5_ratio
1  300       1           0.67
2  300       1           0.67
3  300       0           0.67
4  301       0           0.50
5  303       0           0.40
6  305       0           0.29
7  305       0           0.29
8  306       1           0.20
9  308       0           0.20
10 310       0           0.17

It behaves like a rolling mean, but it does not rely on row positions. Instead, the window is defined by the values in a column.

For instance, at rows 6 and 7 it takes the value of the current row (305 ms) and calculates the ratio over all rows whose ms lies between 305 - 5 = 300 and 305 inclusive: 2 correct out of 7 values, yielding 0.29.

You can of course modify the function yourself. For example, if you'd like a window of 5 to mean 301-305 rather than 300-305, add + 1 after x - window_var.
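
The "+ 1" modification mentioned above can be sketched as follows. This is an illustrative variant rather than the answer's own code: the helper name trailing_ratio and the width argument are hypothetical, rounding is left out, and only the first ten rows of the question's data are used.

```r
# First ten rows of the example data from the question
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0)

# Hypothetical variant: trailing window covering exactly `width` integer
# time values, i.e. [x - width + 1, x], so width = 5 at x = 305 means 301-305
trailing_ratio <- function(time, value, width = 5) {
  vapply(time, function(x) {
    in_window <- time >= (x - width + 1) & time <= x
    mean(value[in_window])
  }, numeric(1))
}

ratio <- trailing_ratio(ms, correct, width = 5)
data.frame(ms, correct, ratio)
```

At 305 ms the window is now 301-305, which contains four values and no correct responses, so the ratio is 0 instead of the 0.29 obtained with the 300-305 window.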






zoo::rollapply answer ("Try out:"): answered Nov 13 '18 at 21:54 by ANG, edited Nov 14 '18 at 12:00












cut answer: answered Nov 13 '18 at 21:03 by pooja p












            • I'd actually like a rolling average for a predefined window. For example a window of 5 ms: an average for the window 300-304, 301-305, 302-306, etc. Runnig till the max value of ms.

              – RmyjuloR
              Nov 13 '18 at 21:53











            • In that case you can try something like this: df$ms_factor <- cut(df$ms, seq(300, 345, by = 5))

              – pooja p
              Nov 13 '18 at 22:26












            • This gives me the windows 300-305, 305-310, 310-315, etc., right? Could the code be altered to be calculating for a sliding window, so: 300-304, 301-305, 302-306, etc?

              – RmyjuloR
              Nov 14 '18 at 1:34































            This can be done with base R:



            calculate_irregular_ratio <- function(df, time_var = "ms", window_var = 5, calc_var = "correct") {
              # for each row, average calc_var over all rows whose time lies in [x - window_var, x]
              sapply(df[[time_var]], function(x) round(mean(df[[calc_var]][df[[time_var]] >= (x - window_var) & df[[time_var]] <= x]), 2))
            }



            You can apply it as follows (window_var defaults to 5 ms; pass a different value to change the window):



            df$window_5_ratio <- calculate_irregular_ratio(df, window_var = 5)


            For your data, this gives (first 10 rows shown):



                 ms correct window_5_ratio
            1   300       1           0.67
            2   300       1           0.67
            3   300       0           0.67
            4   301       0           0.50
            5   303       0           0.40
            6   305       0           0.29
            7   305       0           0.29
            8   306       1           0.20
            9   308       0           0.20
            10  310       0           0.17


            It behaves like a rolling mean, but the window is defined by the values in a column rather than by a number of rows.



            For instance, at rows 6 and 7 it takes the value of the current row (305 ms) and averages correct over all rows whose ms lies between 305 - 5 = 300 and 305 inclusive, yielding 2/7 ≈ 0.29.



            You can of course modify the function yourself. Note that the window is inclusive at both ends, so window_var = 5 spans 300-305. If you would rather have it mean 301-305, use x - window_var + 1 as the lower bound.
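If you need one value per window (300-304, 301-305, 302-306, ...) rather than one value per row, a possible variant is sketched below. It is an illustration, not part of the function above: it assumes integer millisecond values, and the names `width`, `starts` and `result` are made up for the example.

```r
# Data from the question
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)

# Each window covers `width` integer ms values: start .. start + width - 1
width <- 5

# One window per integer start value, so the last window ends at max(ms)
starts <- seq(min(ms), max(ms) - width + 1)

# Mean of `correct` inside each window; NA if a window contains no rows
window_mean <- vapply(starts, function(s) {
  in_window <- ms >= s & ms <= s + width - 1
  if (any(in_window)) mean(correct[in_window]) else NA_real_
}, numeric(1))

result <- data.frame(start = starts,
                     end = starts + width - 1,
                     mean_correct = window_mean)
head(result)
```

Because the windows are laid out on a regular grid of start values, missing time points simply contribute no rows, and repeated time points are all counted in every window that contains them.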














                edited Nov 13 '18 at 22:27

























                answered Nov 13 '18 at 22:20









                arg0naut


























