Get difference with closest previous row in a group which meets criterion










0















I'm trying, for each row, to calculate the difference with the closest previous row belonging to the same group which meets a certain criterion.



Suppose I have the following dataframe:



s <- read.table(text = "Visit_num Patient Day Admitted
1 1 2015/01/01 Yes
2 1 2015/01/10 No
3 1 2015/01/15 Yes
4 1 2015/02/10 No
5 1 2015/03/08 Yes
6 2 2015/01/01 Yes
7 2 2015/04/01 No
8 2 2015/04/10 No
9 3 2015/04/01 No
10 3 2015/04/10 No", header = T, sep = "")


For each Visit_num and for each Patient, I'd like to get the difference with the closest row for which the patient was admitted (i.e. Yes). Note column day is ordered by day, and time unit for this example is days.
Here is what I wanted my dataframe to look like:



Visit_num Patient Day Admitted Diff_days
1 1 2015/01/01 Yes NA
2 1 2015/01/10 No 9
3 1 2015/01/15 Yes 14
4 1 2015/02/10 No 26
5 1 2015/03/08 Yes 52
6 2 2015/01/01 Yes NA
7 2 2015/04/01 No 90
8 2 2015/04/10 No 99
9 3 2015/04/01 No NA
10 3 2015/04/10 No NA


Any help is appreciated.










share|improve this question




























    0















    I'm trying, for each row, to calculate the difference with the closest previous row belonging to the same group which meets a certain criterion.



    Suppose I have the following dataframe:



    s <- read.table(text = "Visit_num Patient Day Admitted
    1 1 2015/01/01 Yes
    2 1 2015/01/10 No
    3 1 2015/01/15 Yes
    4 1 2015/02/10 No
    5 1 2015/03/08 Yes
    6 2 2015/01/01 Yes
    7 2 2015/04/01 No
    8 2 2015/04/10 No
    9 3 2015/04/01 No
    10 3 2015/04/10 No", header = T, sep = "")


    For each Visit_num and for each Patient, I'd like to get the difference with the closest row for which the patient was admitted (i.e. Yes). Note column day is ordered by day, and time unit for this example is days.
    Here is what I wanted my dataframe to look like:



    Visit_num Patient Day Admitted Diff_days
    1 1 2015/01/01 Yes NA
    2 1 2015/01/10 No 9
    3 1 2015/01/15 Yes 14
    4 1 2015/02/10 No 26
    5 1 2015/03/08 Yes 52
    6 2 2015/01/01 Yes NA
    7 2 2015/04/01 No 90
    8 2 2015/04/10 No 99
    9 3 2015/04/01 No NA
    10 3 2015/04/10 No NA


    Any help is appreciated.










    share|improve this question


























      0












      0








      0








      I'm trying, for each row, to calculate the difference with the closest previous row belonging to the same group which meets a certain criterion.



      Suppose I have the following dataframe:



      s <- read.table(text = "Visit_num Patient Day Admitted
      1 1 2015/01/01 Yes
      2 1 2015/01/10 No
      3 1 2015/01/15 Yes
      4 1 2015/02/10 No
      5 1 2015/03/08 Yes
      6 2 2015/01/01 Yes
      7 2 2015/04/01 No
      8 2 2015/04/10 No
      9 3 2015/04/01 No
      10 3 2015/04/10 No", header = T, sep = "")


      For each Visit_num and for each Patient, I'd like to get the difference with the closest row for which the patient was admitted (i.e. Yes). Note column day is ordered by day, and time unit for this example is days.
      Here is what I wanted my dataframe to look like:



      Visit_num Patient Day Admitted Diff_days
      1 1 2015/01/01 Yes NA
      2 1 2015/01/10 No 9
      3 1 2015/01/15 Yes 14
      4 1 2015/02/10 No 26
      5 1 2015/03/08 Yes 52
      6 2 2015/01/01 Yes NA
      7 2 2015/04/01 No 90
      8 2 2015/04/10 No 99
      9 3 2015/04/01 No NA
      10 3 2015/04/10 No NA


      Any help is appreciated.










      share|improve this question
















      I'm trying, for each row, to calculate the difference with the closest previous row belonging to the same group which meets a certain criterion.



      Suppose I have the following dataframe:



      s <- read.table(text = "Visit_num Patient Day Admitted
      1 1 2015/01/01 Yes
      2 1 2015/01/10 No
      3 1 2015/01/15 Yes
      4 1 2015/02/10 No
      5 1 2015/03/08 Yes
      6 2 2015/01/01 Yes
      7 2 2015/04/01 No
      8 2 2015/04/10 No
      9 3 2015/04/01 No
      10 3 2015/04/10 No", header = T, sep = "")


      For each Visit_num and for each Patient, I'd like to get the difference with the closest row for which the patient was admitted (i.e. Yes). Note column day is ordered by day, and time unit for this example is days.
      Here is what I wanted my dataframe to look like:



      Visit_num Patient Day Admitted Diff_days
      1 1 2015/01/01 Yes NA
      2 1 2015/01/10 No 9
      3 1 2015/01/15 Yes 14
      4 1 2015/02/10 No 26
      5 1 2015/03/08 Yes 52
      6 2 2015/01/01 Yes NA
      7 2 2015/04/01 No 90
      8 2 2015/04/10 No 99
      9 3 2015/04/01 No NA
      10 3 2015/04/10 No NA


      Any help is appreciated.







      r






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 14 '18 at 20:34







      DEYVISON MEDEIROS

















      asked Nov 14 '18 at 18:44









      DEYVISON MEDEIROSDEYVISON MEDEIROS

      92




      92






















          1 Answer
          1






          active

          oldest

          votes


















          0














          Here is an option with tidyverse. Convert the 'Day' to Date class, arrange by 'Patient', 'Day', grouped by 'Patient' get the difference of adjacent 'Day', create a group 'grp' based on the occurrence of 'Yes' in 'Admitted' and take the cumulative sum of 'Diff_days'



          library(tidyverse)
          s %>%
          mutate(Day = ymd(Day)) %>%
          arrange(Patient, Day) %>%
          group_by(Patient) %>%
          mutate(Diff_days = c(NA, diff(Day))) %>%
          group_by(grp = cumsum(lag(Admitted == "Yes", default = TRUE)), add = TRUE) %>%
          mutate(Diff_days = cumsum(replace_na(Diff_days, 0))) %>%
          ungroup %>%
          select(-grp) %>%
          mutate(Diff_days = na_if(Diff_days, 0))
          # A tibble: 8 x 5
          # Visit_num Patient Day Admitted Diff_days
          # <int> <int> <date> <fct> <dbl>
          #1 1 1 2015-01-01 Yes NA
          #2 2 1 2015-01-10 No 9
          #3 3 1 2015-01-15 Yes 14
          #4 4 1 2015-02-10 No 26
          #5 5 1 2015-03-08 Yes 52
          #6 6 2 2015-01-01 Yes NA
          #7 7 2 2015-04-01 No 90
          #8 8 2 2015-04-10 No 99





          share|improve this answer

























          • Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:47












          • @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

            – akrun
            Nov 14 '18 at 19:48






          • 1





            Thank you very much for your answer @akrun. That worked!

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:58












          • I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

            – DEYVISON MEDEIROS
            Nov 14 '18 at 20:36











          • @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

            – akrun
            Nov 15 '18 at 2:01










          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53306840%2fget-difference-with-closest-previous-row-in-a-group-which-meets-criterion%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0














          Here is an option with tidyverse. Convert the 'Day' to Date class, arrange by 'Patient', 'Day', grouped by 'Patient' get the difference of adjacent 'Day', create a group 'grp' based on the occurrence of 'Yes' in 'Admitted' and take the cumulative sum of 'Diff_days'



          library(tidyverse)
          s %>%
          mutate(Day = ymd(Day)) %>%
          arrange(Patient, Day) %>%
          group_by(Patient) %>%
          mutate(Diff_days = c(NA, diff(Day))) %>%
          group_by(grp = cumsum(lag(Admitted == "Yes", default = TRUE)), add = TRUE) %>%
          mutate(Diff_days = cumsum(replace_na(Diff_days, 0))) %>%
          ungroup %>%
          select(-grp) %>%
          mutate(Diff_days = na_if(Diff_days, 0))
          # A tibble: 8 x 5
          # Visit_num Patient Day Admitted Diff_days
          # <int> <int> <date> <fct> <dbl>
          #1 1 1 2015-01-01 Yes NA
          #2 2 1 2015-01-10 No 9
          #3 3 1 2015-01-15 Yes 14
          #4 4 1 2015-02-10 No 26
          #5 5 1 2015-03-08 Yes 52
          #6 6 2 2015-01-01 Yes NA
          #7 7 2 2015-04-01 No 90
          #8 8 2 2015-04-10 No 99





          share|improve this answer

























          • Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:47












          • @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

            – akrun
            Nov 14 '18 at 19:48






          • 1





            Thank you very much for your answer @akrun. That worked!

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:58












          • I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

            – DEYVISON MEDEIROS
            Nov 14 '18 at 20:36











          • @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

            – akrun
            Nov 15 '18 at 2:01















          0














          Here is an option with tidyverse. Convert the 'Day' to Date class, arrange by 'Patient', 'Day', grouped by 'Patient' get the difference of adjacent 'Day', create a group 'grp' based on the occurrence of 'Yes' in 'Admitted' and take the cumulative sum of 'Diff_days'



          library(tidyverse)
          s %>%
          mutate(Day = ymd(Day)) %>%
          arrange(Patient, Day) %>%
          group_by(Patient) %>%
          mutate(Diff_days = c(NA, diff(Day))) %>%
          group_by(grp = cumsum(lag(Admitted == "Yes", default = TRUE)), add = TRUE) %>%
          mutate(Diff_days = cumsum(replace_na(Diff_days, 0))) %>%
          ungroup %>%
          select(-grp) %>%
          mutate(Diff_days = na_if(Diff_days, 0))
          # A tibble: 8 x 5
          # Visit_num Patient Day Admitted Diff_days
          # <int> <int> <date> <fct> <dbl>
          #1 1 1 2015-01-01 Yes NA
          #2 2 1 2015-01-10 No 9
          #3 3 1 2015-01-15 Yes 14
          #4 4 1 2015-02-10 No 26
          #5 5 1 2015-03-08 Yes 52
          #6 6 2 2015-01-01 Yes NA
          #7 7 2 2015-04-01 No 90
          #8 8 2 2015-04-10 No 99





          share|improve this answer

























          • Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:47












          • @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

            – akrun
            Nov 14 '18 at 19:48






          • 1





            Thank you very much for your answer @akrun. That worked!

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:58












          • I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

            – DEYVISON MEDEIROS
            Nov 14 '18 at 20:36











          • @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

            – akrun
            Nov 15 '18 at 2:01













          0












          0








          0







          Here is an option with tidyverse. Convert the 'Day' to Date class, arrange by 'Patient', 'Day', grouped by 'Patient' get the difference of adjacent 'Day', create a group 'grp' based on the occurrence of 'Yes' in 'Admitted' and take the cumulative sum of 'Diff_days'



          library(tidyverse)
          s %>%
          mutate(Day = ymd(Day)) %>%
          arrange(Patient, Day) %>%
          group_by(Patient) %>%
          mutate(Diff_days = c(NA, diff(Day))) %>%
          group_by(grp = cumsum(lag(Admitted == "Yes", default = TRUE)), add = TRUE) %>%
          mutate(Diff_days = cumsum(replace_na(Diff_days, 0))) %>%
          ungroup %>%
          select(-grp) %>%
          mutate(Diff_days = na_if(Diff_days, 0))
          # A tibble: 8 x 5
          # Visit_num Patient Day Admitted Diff_days
          # <int> <int> <date> <fct> <dbl>
          #1 1 1 2015-01-01 Yes NA
          #2 2 1 2015-01-10 No 9
          #3 3 1 2015-01-15 Yes 14
          #4 4 1 2015-02-10 No 26
          #5 5 1 2015-03-08 Yes 52
          #6 6 2 2015-01-01 Yes NA
          #7 7 2 2015-04-01 No 90
          #8 8 2 2015-04-10 No 99





          share|improve this answer















          Here is an option with tidyverse. Convert the 'Day' to Date class, arrange by 'Patient', 'Day', grouped by 'Patient' get the difference of adjacent 'Day', create a group 'grp' based on the occurrence of 'Yes' in 'Admitted' and take the cumulative sum of 'Diff_days'



          library(tidyverse)
          s %>%
          mutate(Day = ymd(Day)) %>%
          arrange(Patient, Day) %>%
          group_by(Patient) %>%
          mutate(Diff_days = c(NA, diff(Day))) %>%
          group_by(grp = cumsum(lag(Admitted == "Yes", default = TRUE)), add = TRUE) %>%
          mutate(Diff_days = cumsum(replace_na(Diff_days, 0))) %>%
          ungroup %>%
          select(-grp) %>%
          mutate(Diff_days = na_if(Diff_days, 0))
          # A tibble: 8 x 5
          # Visit_num Patient Day Admitted Diff_days
          # <int> <int> <date> <fct> <dbl>
          #1 1 1 2015-01-01 Yes NA
          #2 2 1 2015-01-10 No 9
          #3 3 1 2015-01-15 Yes 14
          #4 4 1 2015-02-10 No 26
          #5 5 1 2015-03-08 Yes 52
          #6 6 2 2015-01-01 Yes NA
          #7 7 2 2015-04-01 No 90
          #8 8 2 2015-04-10 No 99






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 14 '18 at 19:30

























          answered Nov 14 '18 at 19:17









          akrunakrun

          415k13204278




          415k13204278












          • Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:47












          • @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

            – akrun
            Nov 14 '18 at 19:48






          • 1





            Thank you very much for your answer @akrun. That worked!

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:58












          • I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

            – DEYVISON MEDEIROS
            Nov 14 '18 at 20:36











          • @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

            – akrun
            Nov 15 '18 at 2:01

















          • Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:47












          • @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

            – akrun
            Nov 14 '18 at 19:48






          • 1





            Thank you very much for your answer @akrun. That worked!

            – DEYVISON MEDEIROS
            Nov 14 '18 at 19:58












          • I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

            – DEYVISON MEDEIROS
            Nov 14 '18 at 20:36











          • @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

            – akrun
            Nov 15 '18 at 2:01
















          Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

          – DEYVISON MEDEIROS
          Nov 14 '18 at 19:47






          Sorry, quick follow up. What if instead of Day as date (2015-01-01) I had a datetime object (2015-01-01 00:15)? Would "ymd" and "diff" functions work?

          – DEYVISON MEDEIROS
          Nov 14 '18 at 19:47














          @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

          – akrun
          Nov 14 '18 at 19:48





          @DEYVISONMEDEIROS If it is already in Datetime, class then you can just do mutate(Day = as.Date(Day)) %>%

          – akrun
          Nov 14 '18 at 19:48




          1




          1





          Thank you very much for your answer @akrun. That worked!

          – DEYVISON MEDEIROS
          Nov 14 '18 at 19:58






          Thank you very much for your answer @akrun. That worked!

          – DEYVISON MEDEIROS
          Nov 14 '18 at 19:58














          I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

          – DEYVISON MEDEIROS
          Nov 14 '18 at 20:36





          I found only one problem with the solution: in case the patient was never admitted, I'm still getting a cumulative sum for diff_days. Any way to avoid that (I wanted it to be 0 or NA)? I have modified the example to make it more clear

          – DEYVISON MEDEIROS
          Nov 14 '18 at 20:36













          @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

          – akrun
          Nov 15 '18 at 2:01





          @DEYVISONMEDEIROS You make add an if/else condition to this, i.e. group_by(Patient) %>% mutate(Diff_days = if(all(Admitted == "No")) NA else c(NA, diff(Day))) and make changes in the rest of the code

          – akrun
          Nov 15 '18 at 2:01



















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53306840%2fget-difference-with-closest-previous-row-in-a-group-which-meets-criterion%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Use pre created SQLite database for Android project in kotlin

          Darth Vader #20

          Ondo