fill the gap between dates in a data frame



.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








0















I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way



df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)

> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345


I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):



 car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0


My first idea was to generate a "dates dataframe as a sequence this way



min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")

gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()


and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr










share|improve this question



















  • 2





    tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))

    – hrbrmstr
    Nov 15 '18 at 13:55












  • great! but I can't set the ending date this way to 2014, for instance...

    – Forge
    Nov 15 '18 at 14:08











  • totally can. i'll expand my answer

    – hrbrmstr
    Nov 15 '18 at 14:08











  • data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]

    – Andre Elrico
    Nov 15 '18 at 14:17

















0















I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way



df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)

> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345


I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):



 car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0


My first idea was to generate a "dates dataframe as a sequence this way



min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")

gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()


and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr










share|improve this question



















  • 2





    tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))

    – hrbrmstr
    Nov 15 '18 at 13:55












  • great! but I can't set the ending date this way to 2014, for instance...

    – Forge
    Nov 15 '18 at 14:08











  • totally can. i'll expand my answer

    – hrbrmstr
    Nov 15 '18 at 14:08











  • data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]

    – Andre Elrico
    Nov 15 '18 at 14:17













0












0








0


0






I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way



df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)

> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345


I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):



 car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0


My first idea was to generate a "dates dataframe as a sequence this way



min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")

gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()


and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr










share|improve this question
















I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way



df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)

> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345


I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):



 car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0


My first idea was to generate a "dates dataframe as a sequence this way



min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")

gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()


and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr







r






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 14:00







Forge

















asked Nov 15 '18 at 13:48









ForgeForge

391316




391316







  • 2





    tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))

    – hrbrmstr
    Nov 15 '18 at 13:55












  • great! but I can't set the ending date this way to 2014, for instance...

    – Forge
    Nov 15 '18 at 14:08











  • totally can. i'll expand my answer

    – hrbrmstr
    Nov 15 '18 at 14:08











  • data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]

    – Andre Elrico
    Nov 15 '18 at 14:17












  • 2





    tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))

    – hrbrmstr
    Nov 15 '18 at 13:55












  • great! but I can't set the ending date this way to 2014, for instance...

    – Forge
    Nov 15 '18 at 14:08











  • totally can. i'll expand my answer

    – hrbrmstr
    Nov 15 '18 at 14:08











  • data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]

    – Andre Elrico
    Nov 15 '18 at 14:17







2




2





tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))

– hrbrmstr
Nov 15 '18 at 13:55






tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))

– hrbrmstr
Nov 15 '18 at 13:55














great! but I can't set the ending date this way to 2014, for instance...

– Forge
Nov 15 '18 at 14:08





great! but I can't set the ending date this way to 2014, for instance...

– Forge
Nov 15 '18 at 14:08













totally can. i'll expand my answer

– hrbrmstr
Nov 15 '18 at 14:08





totally can. i'll expand my answer

– hrbrmstr
Nov 15 '18 at 14:08













data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]

– Andre Elrico
Nov 15 '18 at 14:17





data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]

– Andre Elrico
Nov 15 '18 at 14:17












2 Answers
2






active

oldest

votes


















3














Expanding on my comment:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


Heavyweight tidyverse way:



dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...


All base R:



merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf

xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)

dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...


Head-to-head:



microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100


You can also do the expansion (e.g. years as you asked):



tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))


or



merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)


And, then layer in other metadata:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df

merge(x2, car2country_df)


or via tidyverse:



tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)





share|improve this answer

























  • could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

    – Forge
    Nov 15 '18 at 17:19












  • updated at the end

    – hrbrmstr
    Nov 15 '18 at 18:08


















0














Check out tidyr package and complete. Use fill = list(amount = 0)) to fill in missing values as indicated by @markus below.



library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows





share|improve this answer




















  • 2





    Add fill = list(amount = 0)) to complete.

    – markus
    Nov 15 '18 at 13:55












  • I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

    – Forge
    Nov 15 '18 at 14:13











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320909%2ffill-the-gap-between-dates-in-a-data-frame%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














Expanding on my comment:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


Heavyweight tidyverse way:



dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...


All base R:



merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf

xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)

dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...


Head-to-head:



microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100


You can also do the expansion (e.g. years as you asked):



tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))


or



merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)


And, then layer in other metadata:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df

merge(x2, car2country_df)


or via tidyverse:



tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)





share|improve this answer

























  • could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

    – Forge
    Nov 15 '18 at 17:19












  • updated at the end

    – hrbrmstr
    Nov 15 '18 at 18:08















3














Expanding on my comment:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


Heavyweight tidyverse way:



dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...


All base R:



merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf

xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)

dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...


Head-to-head:



microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100


You can also do the expansion (e.g. years as you asked):



tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))


or



merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)


And, then layer in other metadata:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df

merge(x2, car2country_df)


or via tidyverse:



tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)





share|improve this answer

























  • could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

    – Forge
    Nov 15 '18 at 17:19












  • updated at the end

    – hrbrmstr
    Nov 15 '18 at 18:08













3












3








3







Expanding on my comment:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


Heavyweight tidyverse way:



dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...


All base R:



merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf

xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)

dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...


Head-to-head:



microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100


You can also do the expansion (e.g. years as you asked):



tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))


or



merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)


And, then layer in other metadata:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df

merge(x2, car2country_df)


or via tidyverse:



tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)





share|improve this answer















Expanding on my comment:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


Heavyweight tidyverse way:



dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...


All base R:



merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf

xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)

dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...


Head-to-head:



microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100


You can also do the expansion (e.g. years as you asked):



tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))


or



merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)


And, then layer in other metadata:



read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf


merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)

data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df

merge(x2, car2country_df)


or via tidyverse:



tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 15 '18 at 18:08

























answered Nov 15 '18 at 14:02









hrbrmstrhrbrmstr

62.1k694155




62.1k694155












  • could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

    – Forge
    Nov 15 '18 at 17:19












  • updated at the end

    – hrbrmstr
    Nov 15 '18 at 18:08

















  • could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

    – Forge
    Nov 15 '18 at 17:19












  • updated at the end

    – hrbrmstr
    Nov 15 '18 at 18:08
















could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

– Forge
Nov 15 '18 at 17:19






could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far

– Forge
Nov 15 '18 at 17:19














updated at the end

– hrbrmstr
Nov 15 '18 at 18:08





updated at the end

– hrbrmstr
Nov 15 '18 at 18:08













0














Check out tidyr package and complete. Use fill = list(amount = 0)) to fill in missing values as indicated by @markus below.



library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows





share|improve this answer




















  • 2





    Add fill = list(amount = 0)) to complete.

    – markus
    Nov 15 '18 at 13:55












  • I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

    – Forge
    Nov 15 '18 at 14:13















0














Check out tidyr package and complete. Use fill = list(amount = 0)) to fill in missing values as indicated by @markus below.



library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows





share|improve this answer




















  • 2





    Add fill = list(amount = 0)) to complete.

    – markus
    Nov 15 '18 at 13:55












  • I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

    – Forge
    Nov 15 '18 at 14:13













0












0








0







Check out tidyr package and complete. Use fill = list(amount = 0)) to fill in missing values as indicated by @markus below.



library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows





share|improve this answer















Check out tidyr package and complete. Use fill = list(amount = 0)) to fill in missing values as indicated by @markus below.



library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 15 '18 at 13:58

























answered Nov 15 '18 at 13:54









jasbnerjasbner

2,036619




2,036619







  • 2





    Add fill = list(amount = 0)) to complete.

    – markus
    Nov 15 '18 at 13:55












  • I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

    – Forge
    Nov 15 '18 at 14:13












  • 2





    Add fill = list(amount = 0)) to complete.

    – markus
    Nov 15 '18 at 13:55












  • I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

    – Forge
    Nov 15 '18 at 14:13







2




2





Add fill = list(amount = 0)) to complete.

– markus
Nov 15 '18 at 13:55






Add fill = list(amount = 0)) to complete.

– markus
Nov 15 '18 at 13:55














I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

– Forge
Nov 15 '18 at 14:13





I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?

– Forge
Nov 15 '18 at 14:13

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320909%2ffill-the-gap-between-dates-in-a-data-frame%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

How to how show current date and time by default on contact form 7 in WordPress without taking input from user in datetimepicker

Syphilis

Darth Vader #20