fill the gap between dates in a data frame
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way
df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)
> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345
I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):
car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0
My first idea was to generate a "dates dataframe as a sequence this way
min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")
gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()
and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr
r
add a comment |
I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way
df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)
> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345
I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):
car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0
My first idea was to generate a "dates dataframe as a sequence this way
min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")
gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()
and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr
r
2
tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
– hrbrmstr
Nov 15 '18 at 13:55
great! but I can't set the ending date this way to 2014, for instance...
– Forge
Nov 15 '18 at 14:08
totally can. i'll expand my answer
– hrbrmstr
Nov 15 '18 at 14:08
data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]
– Andre Elrico
Nov 15 '18 at 14:17
add a comment |
I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way
df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)
> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345
I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):
car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0
My first idea was to generate a "dates dataframe as a sequence this way
min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")
gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()
and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr
r
I have a dataframe with collected data by year-month but sometimes there is a month where no data is collected. This way
df <- read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford, 2013,01,2345"), sep = ",", header = TRUE)
> df
car year month amount
1 Mazda 2012 2 2344
2 Ford 2012 4 235234
3 Mazda 2012 3 3455
4 Mazda 2012 4 43554
5 Mazda 2012 5 9854
6 Mazda 2012 6 32556
7 Ford 2013 1 2345
I want fill the gaps this way (given two dates, "from" = 2012/01/01 , "to" = 2013/12/01):
car year month amount
Mazda 2012 1 0
Ford 2012 1 0
Ford 2012 2 0
Mazda 2012 2 2344
Ford 2012 3 0
Ford 2012 4 235234
Mazda 2012 3 3455
Mazda 2012 4 43554
Mazda 2012 5 9854
Mazda 2012 6 32556
Mazda 2012 7 0
Mazda 2012 8 0
Mazda 2012 9 0
Mazda 2012 10 0
Mazda 2012 11 0
Mazda 2012 12 0
Ford 2013 1 2345
Ford 2013 2 0
Ford 2013 3 0
Ford 2013 4 0
Ford 2013 5 0
Ford 2013 6 0
Ford 2013 7 0
Ford 2013 8 0
Ford 2013 9 0
Ford 2013 10 0
Ford 2013 11 0
Ford 2013 12 0
Mazda 2013 1 0
Mazda 2013 2 0
Mazda 2013 3 0
Mazda 2013 4 0
Mazda 2013 5 0
Mazda 2013 6 0
Mazda 2013 7 0
Mazda 2013 8 0
Mazda 2013 9 0
Mazda 2013 10 0
Mazda 2013 11 0
Mazda 2013 12 0
My first idea was to generate a "dates dataframe as a sequence this way
min.date <- as.Date("2012/01/01")
max.date <- as.Date("2013/12/01")
gen.dates <-
seq(from = mid.date,
to = max.date,
by = "month") %>% as.data.frame()
and then JOIN but I discovered that is not as simple as it seems, so I assume there is a cleaner way maybe using dplyr
r
r
edited Nov 15 '18 at 14:00
Forge
asked Nov 15 '18 at 13:48
ForgeForge
391316
391316
2
tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
– hrbrmstr
Nov 15 '18 at 13:55
great! but I can't set the ending date this way to 2014, for instance...
– Forge
Nov 15 '18 at 14:08
totally can. i'll expand my answer
– hrbrmstr
Nov 15 '18 at 14:08
data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]
– Andre Elrico
Nov 15 '18 at 14:17
add a comment |
2
tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
– hrbrmstr
Nov 15 '18 at 13:55
great! but I can't set the ending date this way to 2014, for instance...
– Forge
Nov 15 '18 at 14:08
totally can. i'll expand my answer
– hrbrmstr
Nov 15 '18 at 14:08
data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]
– Andre Elrico
Nov 15 '18 at 14:17
2
2
tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
– hrbrmstr
Nov 15 '18 at 13:55
tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
– hrbrmstr
Nov 15 '18 at 13:55
great! but I can't set the ending date this way to 2014, for instance...
– Forge
Nov 15 '18 at 14:08
great! but I can't set the ending date this way to 2014, for instance...
– Forge
Nov 15 '18 at 14:08
totally can. i'll expand my answer
– hrbrmstr
Nov 15 '18 at 14:08
totally can. i'll expand my answer
– hrbrmstr
Nov 15 '18 at 14:08
data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]
– Andre Elrico
Nov 15 '18 at 14:17
data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]
– Andre Elrico
Nov 15 '18 at 14:17
add a comment |
2 Answers
2
active
oldest
votes
Expanding on my comment:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
Heavyweight tidyverse
way:
dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...
All base R:
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf
xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)
dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...
Head-to-head:
microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100
You can also do the expansion (e.g. years as you asked):
tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))
or
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
And, then layer in other metadata:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df
merge(x2, car2country_df)
or via tidyverse
:
tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
add a comment |
Check out tidyr
package and complete
. Use fill = list(amount = 0))
to fill in missing values as indicated by @markus below.
library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows
2
Addfill = list(amount = 0))
tocomplete
.
– markus
Nov 15 '18 at 13:55
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320909%2ffill-the-gap-between-dates-in-a-data-frame%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Expanding on my comment:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
Heavyweight tidyverse
way:
dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...
All base R:
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf
xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)
dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...
Head-to-head:
microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100
You can also do the expansion (e.g. years as you asked):
tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))
or
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
And, then layer in other metadata:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df
merge(x2, car2country_df)
or via tidyverse
:
tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
add a comment |
Expanding on my comment:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
Heavyweight tidyverse
way:
dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...
All base R:
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf
xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)
dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...
Head-to-head:
microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100
You can also do the expansion (e.g. years as you asked):
tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))
or
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
And, then layer in other metadata:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df
merge(x2, car2country_df)
or via tidyverse
:
tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
add a comment |
Expanding on my comment:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
Heavyweight tidyverse
way:
dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...
All base R:
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf
xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)
dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...
Head-to-head:
microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100
You can also do the expansion (e.g. years as you asked):
tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))
or
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
And, then layer in other metadata:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df
merge(x2, car2country_df)
or via tidyverse
:
tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)
Expanding on my comment:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
Heavyweight tidyverse
way:
dplyr::glimpse(
tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
)
## Observations: 48
## Variables: 4
## $ car <chr> "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "Ford", "For...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 0, 0, 235234, 0, 0, 0, 0, 0, 0, 0, 0, 2345, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2344, 3455, 43554...
All base R:
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> xdf
xdf$amount <- ifelse(is.na(xdf$amount), 0, xdf$amount)
dplyr::glimpse(xdf)
## Observations: 48
## Variables: 4
## $ car <fct> Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Mazda, Ma...
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013...
## $ month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8...
## $ amount <dbl> 0, 2344, 3455, 43554, 9854, 32556, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23...
Head-to-head:
microbenchmark::microbenchmark(
tidy = tidyr::complete(xdf, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0)),
dplyr = xdf %>%
group_by(car, year) %>%
complete(month = 1:12, fill = list(amount = 0)),
base =
merge(
expand.grid(car = unique(xdf$car), year = unique(xdf$year), month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
)
## Unit: microseconds
## expr min lq mean median uq max neval
## tidy 2553.802 3036.262 4233.912 3613.672 5046.737 12219.712 100
## dplyr 5639.261 6851.680 9396.590 7686.171 10273.043 70357.399 100
## base 848.400 1055.845 1593.015 1194.247 1656.759 9594.898 100
You can also do the expansion (e.g. years as you asked):
tidyr::complete(xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))
or
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
And, then layer in other metadata:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df
merge(x2, car2country_df)
or via tidyverse
:
tidyr::complete(
xdf, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0)
) %>%
dplyr::left_join(car2country_df)
edited Nov 15 '18 at 18:08
answered Nov 15 '18 at 14:02
hrbrmstrhrbrmstr
62.1k694155
62.1k694155
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
add a comment |
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
could be possible to add another column just to be preserved?. COUNTRY(JP for MAzda and USA for Ford) I ve tried but no success so far
– Forge
Nov 15 '18 at 17:19
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
updated at the end
– hrbrmstr
Nov 15 '18 at 18:08
add a comment |
Check out tidyr
package and complete
. Use fill = list(amount = 0))
to fill in missing values as indicated by @markus below.
library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows
2
Addfill = list(amount = 0))
tocomplete
.
– markus
Nov 15 '18 at 13:55
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
add a comment |
Check out tidyr
package and complete
. Use fill = list(amount = 0))
to fill in missing values as indicated by @markus below.
library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows
2
Addfill = list(amount = 0))
tocomplete
.
– markus
Nov 15 '18 at 13:55
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
add a comment |
Check out tidyr
package and complete
. Use fill = list(amount = 0))
to fill in missing values as indicated by @markus below.
library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows
Check out tidyr
package and complete
. Use fill = list(amount = 0))
to fill in missing values as indicated by @markus below.
library(tidyr)
library(dplyr)
df %>% group_by(car,year) %>% complete(month = 1:12, fill = list(amount = 0))
# A tibble: 48 x 4
# Groups: car, year [4]
# car year month amount
# <fct> <int> <int> <dbl>
# 1 " Ford" 2012 1 0
# 2 " Ford" 2012 2 0
# 3 " Ford" 2012 3 0
# 4 " Ford" 2012 4 235234
# 5 " Ford" 2012 5 0
# 6 " Ford" 2012 6 0
# 7 " Ford" 2012 7 0
# 8 " Ford" 2012 8 0
# 9 " Ford" 2012 9 0
#10 " Ford" 2012 10 0
# ... with 38 more rows
edited Nov 15 '18 at 13:58
answered Nov 15 '18 at 13:54
jasbnerjasbner
2,036619
2,036619
2
Addfill = list(amount = 0))
tocomplete
.
– markus
Nov 15 '18 at 13:55
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
add a comment |
2
Addfill = list(amount = 0))
tocomplete
.
– markus
Nov 15 '18 at 13:55
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
2
2
Add
fill = list(amount = 0))
to complete
.– markus
Nov 15 '18 at 13:55
Add
fill = list(amount = 0))
to complete
.– markus
Nov 15 '18 at 13:55
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
I get 36 rows but 48 are expected. Your code does not fill the gaps for Mazda vehicles. I was not clear enough, my bad. Sometimes ending year is 2018. So how can I set the starting/ending date using your aproach?
– Forge
Nov 15 '18 at 14:13
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320909%2ffill-the-gap-between-dates-in-a-data-frame%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
tidyr::complete(df, car = unique(car), year = unique(year), month=1:12, fill=list(amount=0))
– hrbrmstr
Nov 15 '18 at 13:55
great! but I can't set the ending date this way to 2014, for instance...
– Forge
Nov 15 '18 at 14:08
totally can. i'll expand my answer
– hrbrmstr
Nov 15 '18 at 14:08
data.table::setDT(df)[,.SD[.(month=1:12),, on = "month"], by = .(year, car)]
– Andre Elrico
Nov 15 '18 at 14:17