Using Loops to pull in data for a certain range of dates depending on multiple column values
First and foremost, thank you for taking your time to view/answer my question.
I asked this question before, albeit very un-clearly, however, I think I am close to a solution and hoping you can help me out!
I have 2 main df's: Customer (base table), and Top_Customers (a subset of the top n customers per quarter)
Both tables have the same layout, seen below:
Cust_ID Date QTR Sales Action Link_Cust_ID
1 1/1/18 2018 Q1 23 NA NA
1 1/2/18 2018 Q1 22.2 NA NA
1 1/3/18 2018 Q1 12.1 2 5
1 1/4/18 2018 Q1 14.1 5 NA
3 1/1/18 2018 Q1 101 NA NA
3 1/2/18 2018 Q1 55 2 18
... ... ... ... ... ...
Sometimes - a customer might have reference to another customer - hence the link_cust_id column might be populated with the ID of another customer.
My Goal is: If the Action column == 2, I want to include that customer (from Link_Cust_ID) in my Top_Customer table from the date the link_cust_id is populated to the END OF THE QUARTER
For example, I would include Cust_ID = 5 from 1/3/18 to 3/31/18 (end of the quarter) from the above chart
I've been trying at this for a while now and have come up with the following code (which currently doesn't work, but I think the idea is there)
Linking_ID <- function(data)
if (data$link_type == 2)
temp.linkid <- data$link_cust_id[i] #stores the linked customer_id
temp.date <- data$Date[i] #stores the date linking occurs
temp.data <- customer_data %>% group_by(Quarter) %>% filter(customer_id = temp.linkid & Date >= temp.date)
#the above line of code is suppose to subset only link_customer_id data from the link_date to the end of the quarter
data <- rbind(data, temp.data)
I am not great with loops, and try and not use them much in my code, but I might not have a choice in this situation. If you think another way might be better, please do suggest it!
dput for base table (all customers)
structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
dput for top_customer tble:
structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
r dplyr
add a comment |
First and foremost, thank you for taking your time to view/answer my question.
I asked this question before, albeit very un-clearly, however, I think I am close to a solution and hoping you can help me out!
I have 2 main df's: Customer (base table), and Top_Customers (a subset of the top n customers per quarter)
Both tables have the same layout, seen below:
Cust_ID Date QTR Sales Action Link_Cust_ID
1 1/1/18 2018 Q1 23 NA NA
1 1/2/18 2018 Q1 22.2 NA NA
1 1/3/18 2018 Q1 12.1 2 5
1 1/4/18 2018 Q1 14.1 5 NA
3 1/1/18 2018 Q1 101 NA NA
3 1/2/18 2018 Q1 55 2 18
... ... ... ... ... ...
Sometimes - a customer might have reference to another customer - hence the link_cust_id column might be populated with the ID of another customer.
My Goal is: If the Action column == 2, I want to include that customer (from Link_Cust_ID) in my Top_Customer table from the date the link_cust_id is populated to the END OF THE QUARTER
For example, I would include Cust_ID = 5 from 1/3/18 to 3/31/18 (end of the quarter) from the above chart
I've been trying at this for a while now and have come up with the following code (which currently doesn't work, but I think the idea is there)
Linking_ID <- function(data)
if (data$link_type == 2)
temp.linkid <- data$link_cust_id[i] #stores the linked customer_id
temp.date <- data$Date[i] #stores the date linking occurs
temp.data <- customer_data %>% group_by(Quarter) %>% filter(customer_id = temp.linkid & Date >= temp.date)
#the above line of code is suppose to subset only link_customer_id data from the link_date to the end of the quarter
data <- rbind(data, temp.data)
I am not great with loops, and try and not use them much in my code, but I might not have a choice in this situation. If you think another way might be better, please do suggest it!
dput for base table (all customers)
structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
dput for top_customer tble:
structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
r dplyr
can you share both dataframes withdput
?
– Mike
Nov 12 '18 at 16:19
Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon
– yungpadewon
Nov 12 '18 at 16:28
add a comment |
First and foremost, thank you for taking your time to view/answer my question.
I asked this question before, albeit very un-clearly, however, I think I am close to a solution and hoping you can help me out!
I have 2 main df's: Customer (base table), and Top_Customers (a subset of the top n customers per quarter)
Both tables have the same layout, seen below:
Cust_ID Date QTR Sales Action Link_Cust_ID
1 1/1/18 2018 Q1 23 NA NA
1 1/2/18 2018 Q1 22.2 NA NA
1 1/3/18 2018 Q1 12.1 2 5
1 1/4/18 2018 Q1 14.1 5 NA
3 1/1/18 2018 Q1 101 NA NA
3 1/2/18 2018 Q1 55 2 18
... ... ... ... ... ...
Sometimes - a customer might have reference to another customer - hence the link_cust_id column might be populated with the ID of another customer.
My Goal is: If the Action column == 2, I want to include that customer (from Link_Cust_ID) in my Top_Customer table from the date the link_cust_id is populated to the END OF THE QUARTER
For example, I would include Cust_ID = 5 from 1/3/18 to 3/31/18 (end of the quarter) from the above chart
I've been trying at this for a while now and have come up with the following code (which currently doesn't work, but I think the idea is there)
Linking_ID <- function(data)
if (data$link_type == 2)
temp.linkid <- data$link_cust_id[i] #stores the linked customer_id
temp.date <- data$Date[i] #stores the date linking occurs
temp.data <- customer_data %>% group_by(Quarter) %>% filter(customer_id = temp.linkid & Date >= temp.date)
#the above line of code is suppose to subset only link_customer_id data from the link_date to the end of the quarter
data <- rbind(data, temp.data)
I am not great with loops, and try and not use them much in my code, but I might not have a choice in this situation. If you think another way might be better, please do suggest it!
dput for base table (all customers)
structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
dput for top_customer tble:
structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
r dplyr
First and foremost, thank you for taking your time to view/answer my question.
I asked this question before, albeit very un-clearly, however, I think I am close to a solution and hoping you can help me out!
I have 2 main df's: Customer (base table), and Top_Customers (a subset of the top n customers per quarter)
Both tables have the same layout, seen below:
Cust_ID Date QTR Sales Action Link_Cust_ID
1 1/1/18 2018 Q1 23 NA NA
1 1/2/18 2018 Q1 22.2 NA NA
1 1/3/18 2018 Q1 12.1 2 5
1 1/4/18 2018 Q1 14.1 5 NA
3 1/1/18 2018 Q1 101 NA NA
3 1/2/18 2018 Q1 55 2 18
... ... ... ... ... ...
Sometimes - a customer might have reference to another customer - hence the link_cust_id column might be populated with the ID of another customer.
My Goal is: If the Action column == 2, I want to include that customer (from Link_Cust_ID) in my Top_Customer table from the date the link_cust_id is populated to the END OF THE QUARTER
For example, I would include Cust_ID = 5 from 1/3/18 to 3/31/18 (end of the quarter) from the above chart
I've been trying at this for a while now and have come up with the following code (which currently doesn't work, but I think the idea is there)
Linking_ID <- function(data)
if (data$link_type == 2)
temp.linkid <- data$link_cust_id[i] #stores the linked customer_id
temp.date <- data$Date[i] #stores the date linking occurs
temp.data <- customer_data %>% group_by(Quarter) %>% filter(customer_id = temp.linkid & Date >= temp.date)
#the above line of code is suppose to subset only link_customer_id data from the link_date to the end of the quarter
data <- rbind(data, temp.data)
I am not great with loops, and try and not use them much in my code, but I might not have a choice in this situation. If you think another way might be better, please do suggest it!
dput for base table (all customers)
structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
dput for top_customer tble:
structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
r dplyr
r dplyr
edited Nov 12 '18 at 16:33
yungpadewon
asked Nov 12 '18 at 16:11
yungpadewonyungpadewon
396
396
can you share both dataframes withdput
?
– Mike
Nov 12 '18 at 16:19
Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon
– yungpadewon
Nov 12 '18 at 16:28
add a comment |
can you share both dataframes withdput
?
– Mike
Nov 12 '18 at 16:19
Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon
– yungpadewon
Nov 12 '18 at 16:28
can you share both dataframes with
dput
?– Mike
Nov 12 '18 at 16:19
can you share both dataframes with
dput
?– Mike
Nov 12 '18 at 16:19
Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon
– yungpadewon
Nov 12 '18 at 16:28
Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon
– yungpadewon
Nov 12 '18 at 16:28
add a comment |
2 Answers
2
active
oldest
votes
You can do the following:
First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.
library(dplyr)
link_ids_action2 <- customer %>%
select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>%
filter(Action == 2) %>%
mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31",
grepl("Q2", Quarter) ~ "2018-06-30",
grepl("Q3", Quarter) ~ "2018-09-30",
grepl("Q4", Quarter) ~ "2018-12-31"),
Quarter_end_dates = as.Date(Quarter_end_dates)) %>%
select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)
link_ids_action2
# A tibble: 2 x 4
# Link_Cust_ID New_Cust_ID Start_date Quarter_end_dates
# <dbl> <dbl> <dttm> <date>
# 1 5 1 2018-01-03 00:00:00 2018-03-31
# 2 18 3 2018-01-02 00:00:00 2018-03-31
Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates
to filter properly on your datetime. Select only those columns you need for your top_customer
data.
new_top_customers <-
right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>%
filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>%
select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)
new_top_customers
# A tibble: 2 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 2 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
No with bind_rows
you can add the new top customers:
bind_rows(top_customer, new_top_customers)
# A tibble: 10 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 1 2018-01-01 00:00:00 2018 Q1 23 NA NA
# 2 1 2018-01-02 00:00:00 2018 Q1 22.2 NA NA
# 3 1 2018-01-03 00:00:00 2018 Q1 12.1 2 5
# 4 1 2018-01-04 00:00:00 2018 Q1 14.1 NA NA
# 5 3 2018-01-01 00:00:00 2018 Q1 101 NA NA
# 6 3 2018-01-02 00:00:00 2018 Q1 55 2 18
# 7 3 2018-01-03 00:00:00 2018 Q1 56 NA NA
# 8 3 2018-01-04 00:00:00 2018 Q1 55 NA NA
# 9 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 10 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
Data
customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55),
Action = c(NA, NA, 2, NA, NA, 2, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
add a comment |
I think this might help. You don't need to do loops here.
all_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
library(dplyr)
#get where action is equal to 2
top_cust2 <- filter(top_cust, Action == 2)
#join on cust_id to link_cust_id
#Then filter where date is greater or equal too
#reference date and in the same quarter
bth <- inner_join(all_cust,top_cust2,
by =c("Cust_ID"="Link_Cust_ID")) %>%
filter(Date.x >= Date.y, Quarter.x == Quarter.y)
#remove .y columns from all_cust
bth <- bth[,!grepl(".y",colnames(bth))]
#drop .x from variable names
colnames(bth) <- gsub(".x","",colnames(bth))
cmb <- bind_rows(top_cust, bth)
1
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53266010%2fusing-loops-to-pull-in-data-for-a-certain-range-of-dates-depending-on-multiple-c%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can do the following:
First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.
library(dplyr)
link_ids_action2 <- customer %>%
select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>%
filter(Action == 2) %>%
mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31",
grepl("Q2", Quarter) ~ "2018-06-30",
grepl("Q3", Quarter) ~ "2018-09-30",
grepl("Q4", Quarter) ~ "2018-12-31"),
Quarter_end_dates = as.Date(Quarter_end_dates)) %>%
select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)
link_ids_action2
# A tibble: 2 x 4
# Link_Cust_ID New_Cust_ID Start_date Quarter_end_dates
# <dbl> <dbl> <dttm> <date>
# 1 5 1 2018-01-03 00:00:00 2018-03-31
# 2 18 3 2018-01-02 00:00:00 2018-03-31
Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates
to filter properly on your datetime. Select only those columns you need for your top_customer
data.
new_top_customers <-
right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>%
filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>%
select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)
new_top_customers
# A tibble: 2 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 2 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
No with bind_rows
you can add the new top customers:
bind_rows(top_customer, new_top_customers)
# A tibble: 10 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 1 2018-01-01 00:00:00 2018 Q1 23 NA NA
# 2 1 2018-01-02 00:00:00 2018 Q1 22.2 NA NA
# 3 1 2018-01-03 00:00:00 2018 Q1 12.1 2 5
# 4 1 2018-01-04 00:00:00 2018 Q1 14.1 NA NA
# 5 3 2018-01-01 00:00:00 2018 Q1 101 NA NA
# 6 3 2018-01-02 00:00:00 2018 Q1 55 2 18
# 7 3 2018-01-03 00:00:00 2018 Q1 56 NA NA
# 8 3 2018-01-04 00:00:00 2018 Q1 55 NA NA
# 9 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 10 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
Data
customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55),
Action = c(NA, NA, 2, NA, NA, 2, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
add a comment |
You can do the following:
First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.
library(dplyr)
link_ids_action2 <- customer %>%
select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>%
filter(Action == 2) %>%
mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31",
grepl("Q2", Quarter) ~ "2018-06-30",
grepl("Q3", Quarter) ~ "2018-09-30",
grepl("Q4", Quarter) ~ "2018-12-31"),
Quarter_end_dates = as.Date(Quarter_end_dates)) %>%
select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)
link_ids_action2
# A tibble: 2 x 4
# Link_Cust_ID New_Cust_ID Start_date Quarter_end_dates
# <dbl> <dbl> <dttm> <date>
# 1 5 1 2018-01-03 00:00:00 2018-03-31
# 2 18 3 2018-01-02 00:00:00 2018-03-31
Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates
to filter properly on your datetime. Select only those columns you need for your top_customer
data.
new_top_customers <-
right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>%
filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>%
select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)
new_top_customers
# A tibble: 2 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 2 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
No with bind_rows
you can add the new top customers:
bind_rows(top_customer, new_top_customers)
# A tibble: 10 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 1 2018-01-01 00:00:00 2018 Q1 23 NA NA
# 2 1 2018-01-02 00:00:00 2018 Q1 22.2 NA NA
# 3 1 2018-01-03 00:00:00 2018 Q1 12.1 2 5
# 4 1 2018-01-04 00:00:00 2018 Q1 14.1 NA NA
# 5 3 2018-01-01 00:00:00 2018 Q1 101 NA NA
# 6 3 2018-01-02 00:00:00 2018 Q1 55 2 18
# 7 3 2018-01-03 00:00:00 2018 Q1 56 NA NA
# 8 3 2018-01-04 00:00:00 2018 Q1 55 NA NA
# 9 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 10 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
Data
customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55),
Action = c(NA, NA, 2, NA, NA, 2, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
add a comment |
You can do the following:
First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.
library(dplyr)
link_ids_action2 <- customer %>%
select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>%
filter(Action == 2) %>%
mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31",
grepl("Q2", Quarter) ~ "2018-06-30",
grepl("Q3", Quarter) ~ "2018-09-30",
grepl("Q4", Quarter) ~ "2018-12-31"),
Quarter_end_dates = as.Date(Quarter_end_dates)) %>%
select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)
link_ids_action2
# A tibble: 2 x 4
# Link_Cust_ID New_Cust_ID Start_date Quarter_end_dates
# <dbl> <dbl> <dttm> <date>
# 1 5 1 2018-01-03 00:00:00 2018-03-31
# 2 18 3 2018-01-02 00:00:00 2018-03-31
Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates
to filter properly on your datetime. Select only those columns you need for your top_customer
data.
new_top_customers <-
right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>%
filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>%
select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)
new_top_customers
# A tibble: 2 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 2 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
No with bind_rows
you can add the new top customers:
bind_rows(top_customer, new_top_customers)
# A tibble: 10 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 1 2018-01-01 00:00:00 2018 Q1 23 NA NA
# 2 1 2018-01-02 00:00:00 2018 Q1 22.2 NA NA
# 3 1 2018-01-03 00:00:00 2018 Q1 12.1 2 5
# 4 1 2018-01-04 00:00:00 2018 Q1 14.1 NA NA
# 5 3 2018-01-01 00:00:00 2018 Q1 101 NA NA
# 6 3 2018-01-02 00:00:00 2018 Q1 55 2 18
# 7 3 2018-01-03 00:00:00 2018 Q1 56 NA NA
# 8 3 2018-01-04 00:00:00 2018 Q1 55 NA NA
# 9 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 10 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
Data
customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55),
Action = c(NA, NA, 2, NA, NA, 2, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
You can do the following:
First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.
library(dplyr)
link_ids_action2 <- customer %>%
select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>%
filter(Action == 2) %>%
mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31",
grepl("Q2", Quarter) ~ "2018-06-30",
grepl("Q3", Quarter) ~ "2018-09-30",
grepl("Q4", Quarter) ~ "2018-12-31"),
Quarter_end_dates = as.Date(Quarter_end_dates)) %>%
select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)
link_ids_action2
# A tibble: 2 x 4
# Link_Cust_ID New_Cust_ID Start_date Quarter_end_dates
# <dbl> <dbl> <dttm> <date>
# 1 5 1 2018-01-03 00:00:00 2018-03-31
# 2 18 3 2018-01-02 00:00:00 2018-03-31
Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates
to filter properly on your datetime. Select only those columns you need for your top_customer
data.
new_top_customers <-
right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>%
filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>%
select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)
new_top_customers
# A tibble: 2 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 2 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
No with bind_rows
you can add the new top customers:
bind_rows(top_customer, new_top_customers)
# A tibble: 10 x 6
# Cust_ID Date Quarter Sales Action Link_Cust_ID
# <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
# 1 1 2018-01-01 00:00:00 2018 Q1 23 NA NA
# 2 1 2018-01-02 00:00:00 2018 Q1 22.2 NA NA
# 3 1 2018-01-03 00:00:00 2018 Q1 12.1 2 5
# 4 1 2018-01-04 00:00:00 2018 Q1 14.1 NA NA
# 5 3 2018-01-01 00:00:00 2018 Q1 101 NA NA
# 6 3 2018-01-02 00:00:00 2018 Q1 55 2 18
# 7 3 2018-01-03 00:00:00 2018 Q1 56 NA NA
# 8 3 2018-01-04 00:00:00 2018 Q1 55 NA NA
# 9 5 2018-01-03 00:00:00 2018 Q1 10.5 NA NA
# 10 5 2018-01-04 00:00:00 2018 Q1 11.1 NA NA
Data
customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55),
Action = c(NA, NA, 2, NA, NA, 2, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
edited Nov 12 '18 at 16:56
answered Nov 12 '18 at 16:50
kathkath
3,980724
3,980724
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
add a comment |
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help!
– yungpadewon
Nov 12 '18 at 18:59
add a comment |
I think this might help. You don't need to do loops here.
all_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
library(dplyr)
#get where action is equal to 2
top_cust2 <- filter(top_cust, Action == 2)
#join on cust_id to link_cust_id
#Then filter where date is greater or equal too
#reference date and in the same quarter
bth <- inner_join(all_cust,top_cust2,
by =c("Cust_ID"="Link_Cust_ID")) %>%
filter(Date.x >= Date.y, Quarter.x == Quarter.y)
#remove .y columns from all_cust
bth <- bth[,!grepl(".y",colnames(bth))]
#drop .x from variable names
colnames(bth) <- gsub(".x","",colnames(bth))
cmb <- bind_rows(top_cust, bth)
1
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
add a comment |
I think this might help. You don't need to do loops here.
all_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
library(dplyr)
#get where action is equal to 2
top_cust2 <- filter(top_cust, Action == 2)
#join on cust_id to link_cust_id
#Then filter where date is greater or equal too
#reference date and in the same quarter
bth <- inner_join(all_cust,top_cust2,
by =c("Cust_ID"="Link_Cust_ID")) %>%
filter(Date.x >= Date.y, Quarter.x == Quarter.y)
#remove .y columns from all_cust
bth <- bth[,!grepl(".y",colnames(bth))]
#drop .x from variable names
colnames(bth) <- gsub(".x","",colnames(bth))
cmb <- bind_rows(top_cust, bth)
1
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
add a comment |
I think this might help. You don't need to do loops here.
all_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
library(dplyr)
#get where action is equal to 2
top_cust2 <- filter(top_cust, Action == 2)
#join on cust_id to link_cust_id
#Then filter where date is greater or equal too
#reference date and in the same quarter
bth <- inner_join(all_cust,top_cust2,
by =c("Cust_ID"="Link_Cust_ID")) %>%
filter(Date.x >= Date.y, Quarter.x == Quarter.y)
#remove .y columns from all_cust
bth <- bth[,!grepl(".y",colnames(bth))]
#drop .x from variable names
colnames(bth) <- gsub(".x","",colnames(bth))
cmb <- bind_rows(top_cust, bth)
I think this might help. You don't need to do loops here.
all_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5),
Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000,
1514764800, 1514851200, 1514937600, 1515024000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1"),
Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56,
55, NA, NA, 10.5, 11.1),
Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA),
Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date =
structure(c(1514764800,
1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600,
1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1",
"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1,
14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2,
NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
library(dplyr)
#get where action is equal to 2
top_cust2 <- filter(top_cust, Action == 2)
#join on cust_id to link_cust_id
#Then filter where date is greater or equal too
#reference date and in the same quarter
bth <- inner_join(all_cust,top_cust2,
by =c("Cust_ID"="Link_Cust_ID")) %>%
filter(Date.x >= Date.y, Quarter.x == Quarter.y)
#remove .y columns from all_cust
bth <- bth[,!grepl(".y",colnames(bth))]
#drop .x from variable names
colnames(bth) <- gsub(".x","",colnames(bth))
cmb <- bind_rows(top_cust, bth)
edited Nov 12 '18 at 17:33
answered Nov 12 '18 at 16:49
MikeMike
888317
888317
1
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
add a comment |
1
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
1
1
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
I tried this solution @mike, it is very good indeed. thank you very much!
– yungpadewon
Nov 13 '18 at 15:28
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53266010%2fusing-loops-to-pull-in-data-for-a-certain-range-of-dates-depending-on-multiple-c%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
can you share both dataframes with
dput
?– Mike
Nov 12 '18 at 16:19
Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon
– yungpadewon
Nov 12 '18 at 16:28