Product purchases - if consumer buys product x how likely are they to buy product y

I have some data on store-level purchases. A random sample of the data looks like the following:

 PANID WEEK L1
966 3357632 2011-02-21 PIZZA
352 3357632 2009-11-09 SALTY
68 3357632 2012-06-18 BEER
65 3357632 2012-03-05 BEER
43 3108696 2011-10-31 BEER
672 3144766 2010-03-29 SALTY
70 3357632 2012-06-18 BEER
810 3144766 2012-06-18 SALTY
546 3144766 2008-05-05 SALTY
933 3357632 2009-06-15 PIZZA

(EDIT: This random sample contains 2012 rows; the data I provided was filtered to 2007 - 2010 because of the character limit.)

There are 3 PANIDs across 4 years' worth of data for 3 categories of products (BEER, SALTY, PIZZA). I am trying to find out whether people who bought BEER in a given week also bought SALTY, i.e. to construct a conditional probability: given that PANID 3144766 bought SALTY in WEEK 2009-06-15, what is the probability that they will also buy BEER? I want to do the same for the other pairs: given that they bought BEER, the probability of them buying PIZZA, and finally, given that they bought PIZZA, the probability of them buying SALTY.

For example, the person below bought 3 packets of SALTY, one unit of BEER and 2 units of PIZZA in that week, but in a different week they might have bought just BEER and PIZZA, or just SALTY.

> data %>%
+ group_by(PANID) %>%
+ filter(WEEK == "2009-06-15") %>%
+ filter(PANID == "3144766")
# A tibble: 6 x 3
# Groups: PANID [1]
 PANID WEEK L1 
 <int> <date> <chr>
1 3144766 2009-06-15 BEER 
2 3144766 2009-06-15 SALTY
3 3144766 2009-06-15 SALTY
4 3144766 2009-06-15 SALTY
5 3144766 2009-06-15 PIZZA
6 3144766 2009-06-15 PIZZA

Data:

EDIT1: Data removed due to character limit. Can be found here: https://textuploader.com/db1kf

EDIT2:

I ran the following code and got the output below:

#Probability of buying BEER or SALTY if PIZZA was bought
dat %>% 
 group_by(PIZZA > 0) %>% 
 summarise(beer = sum(BEER > 0) / n(), nobeer = sum(BEER == 0) / n(),
 salty = sum(SALTY > 0) / n(), nosalty = sum(SALTY == 0) / n())

#Probability of buying SALTY or PIZZA if BEER was bought
dat %>% 
 group_by(BEER > 0) %>% 
 summarise(pizza = sum(PIZZA > 0) / n(), nopizza = sum(PIZZA == 0) / n(),
 salty = sum(SALTY > 0) / n(), nosalty = sum(SALTY == 0) / n())


#Probability of buying BEER or PIZZA if SALTY was bought
dat %>% 
 group_by(SALTY > 0) %>% 
 summarise(pizza = sum(PIZZA > 0) / n(), nopizza = sum(PIZZA == 0) / n(),
 beer = sum(BEER > 0) / n(), nobeer = sum(BEER == 0) / n())

Output:

1)

# A tibble: 2 x 5
 `PIZZA > 0` beer nobeer salty nosalty
 <lgl> <dbl> <dbl> <dbl> <dbl>
1 FALSE 0.333 0.667 0.833 0.167
2 TRUE 0.257 0.743 0.586 0.414

2)

# A tibble: 2 x 5
 `BEER > 0` pizza nopizza salty nosalty
 <lgl> <dbl> <dbl> <dbl> <dbl>
1 FALSE 0.371 0.629 0.843 0.157
2 TRUE 0.290 0.710 0.532 0.468

3)

# A tibble: 2 x 5
 `SALTY > 0` pizza nopizza beer nobeer
 <lgl> <dbl> <dbl> <dbl> <dbl>
1 FALSE 0.569 0.431 0.569 0.431
2 TRUE 0.272 0.728 0.219 0.781

Just to check that my understanding is correct: if I buy PIZZA, I have a 0.586 probability of buying SALTY and a 0.414 probability of not buying SALTY (table 1). However, if I buy SALTY, I have a 0.272 probability of buying PIZZA and a 0.728 probability of not buying PIZZA (table 3)?

edited Nov 14 '18 at 16:09

asked Nov 13 '18 at 20:53

user113156

This seems to be more a question on statistical analysis (what do you need to do to calculate the probability) than it is a question in R (how do you code R to do the thing you need to do to calculate the probability).

– iod
Nov 13 '18 at 21:02

Yes, that is what I am trying to do: calculate the conditional probability that, given person X bought BEER, the same person (on the same shopping trip, i.e. the same WEEK) will also put SALTY in their basket. This would suggest that beer and salty products are complements and should have a higher probability of being combined than other products in the dataset, e.g. DIAPERS.

– user113156
Nov 13 '18 at 21:06

I'm just saying this is a statistics question, not an R question. Anyway hint re the R part: df %>% group_by(PANID, year, WEEK,L1) %>% summarize(n=n()) %>% tidyr::spread(L1, n)

– iod
Nov 13 '18 at 21:09

That's quite helpful, thanks! This gives me the total purchases each consumer made for each product in a given WEEK?

– user113156
Nov 13 '18 at 21:19

That is correct

– iod
Nov 13 '18 at 21:22

1 Answer

I'm not 100% sure this is what you're looking for, so let me know if I'm off track.

We start with what I suggested in the comment (slightly adjusted to replace the NAs with 0):

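df <- df %>%
  group_by(PANID, year, WEEK, L1) %>%
  summarize(n = n()) %>%           # number of items per person-week-category
  tidyr::spread(L1, n, fill = 0)   # one column per category; 0 where nothing was bought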

This gives us a wide-data format where for each person-week, we see the number of purchases of each of the three food types, e.g.:

> head(df,3)
# A tibble: 3 x 6
# Groups: PANID, year, WEEK [3]
 PANID year WEEK BEER PIZZA SALTY
 <int> <int> <date> <dbl> <dbl> <dbl>
1 3108696 2007 2007-12-31 2 4 6
2 3108696 2008 2008-01-21 0 2 2
3 3108696 2008 2008-02-04 1 0 2
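To make the reshaping step concrete, here is a minimal sketch on made-up data. The `purchases` table and its values are hypothetical, the `year` column is left out for brevity, and `count()` plus `pivot_wider()` are swapped in as a newer equivalent of `group_by() %>% summarize()` plus `spread()`; this is not the exact code used above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same long format as the question
# (one row per purchased item); the values are made up for illustration.
purchases <- tibble(
  PANID = c(1, 1, 1, 1, 2, 2),
  WEEK  = as.Date(c("2009-06-15", "2009-06-15", "2009-06-15",
                    "2009-06-22", "2009-06-15", "2009-06-15")),
  L1    = c("BEER", "SALTY", "SALTY", "PIZZA", "BEER", "PIZZA")
)

# Count items per person-week-category, then spread the categories into columns.
# Person-weeks with no purchase in a category get 0 instead of NA.
wide <- purchases %>%
  count(PANID, WEEK, L1) %>%
  pivot_wider(names_from = L1, values_from = n, values_fill = 0)

wide
```

Each row of `wide` is one person-week, so any later proportion is a proportion of person-weeks, not of individual items.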

Now we can create a table that gives the probability for buying either BEER or SALTY (of any amount) if PIZZA (of any amount) was purchased in the same week:

df %>% group_by(PIZZA>0) %>% 
 summarise(beer=sum(BEER>0)/n(),nobeer=sum(BEER==0)/n(),
 salty=sum(SALTY>0)/n(),nosalty=sum(SALTY==0)/n())

Result:

# A tibble: 2 x 5
 `PIZZA > 0` beer nobeer salty nosalty
 <lgl> <dbl> <dbl> <dbl> <dbl>
1 FALSE 0.333 0.667 0.833 0.167
2 TRUE 0.257 0.743 0.586 0.414

So we can see that in weeks when PIZZA was purchased, the share of weeks with BEER (0.257 vs. 0.333) and with SALTY (0.586 vs. 0.833) is lower than in weeks when PIZZA was not purchased.

The same can be done for BEER and SALTY, of course.
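If you want all the pairs at once rather than one `group_by()` per product, something like the following base R sketch should work. The `wide` data frame here is hypothetical (made-up counts in the wide format shown above), and `cond_prob` is just an illustrative name.

```r
# Hypothetical person-week counts in the wide format (made-up values).
wide <- data.frame(
  BEER  = c(2, 0, 1, 0, 0, 3),
  PIZZA = c(4, 2, 0, 0, 1, 0),
  SALTY = c(6, 2, 2, 1, 0, 0)
)

# Turn the counts into bought / not-bought indicators (a logical matrix).
bought <- wide > 0
items  <- colnames(bought)

# cond_prob[given, target] = P(target bought | given bought):
# restrict to the person-weeks where `given` was bought, then take the
# share of those weeks in which `target` was bought as well.
cond_prob <- sapply(items, function(target) {
  sapply(items, function(given) mean(bought[bought[, given], target]))
})

cond_prob
```

Rows are the product you condition on and columns are the product whose probability you read off, so `cond_prob["PIZZA", "SALTY"]` and `cond_prob["SALTY", "PIZZA"]` are different quantities and will generally differ.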

An alternative, since we have a numerical variable for each of the three foods, would be to calculate correlation or even regression, but that's not what you asked for.
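For what it's worth, a rough sketch of that alternative could look like the code below. It uses bought / not-bought indicators rather than the raw counts, and the `weeks` data frame is entirely made up (a hypothetical 2x2 table of person-weeks), so treat it as an illustration of the idea rather than an analysis of your data.

```r
# Hypothetical 2x2 counts of person-weeks (made-up numbers):
#   BEER & SALTY: 30, BEER only: 10, SALTY only: 20, neither: 40
weeks <- data.frame(
  beer  = rep(c(1, 1, 0, 0), times = c(30, 10, 20, 40)),
  salty = rep(c(1, 0, 1, 0), times = c(30, 10, 20, 40))
)

# Correlation between the two 0/1 indicators (the phi coefficient).
phi <- cor(weeks$beer, weeks$salty)

# Logistic regression: how do the odds of buying SALTY change
# in weeks when BEER is bought?
fit <- glm(salty ~ beer, data = weeks, family = binomial)

# exp(coefficient) is the odds ratio; for this table it is
# (30 / 10) / (20 / 40) = 6.
odds_ratio <- exp(coef(fit)[["beer"]])
odds_ratio
```

An odds ratio above 1 (or a positive correlation) would point towards the two products being bought together more often than apart, which is the "complements" pattern asked about in the comments.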

answered Nov 14 '18 at 14:10

iod

Yes, I believe this is what I was hoping for. The regression will be the next step. I have made an edit to my original post regarding my understanding of the output probabilities.

– user113156
Nov 14 '18 at 16:11

Great. Don't forget to accept! Re your question - yes, your interpretation is correct. How this happens may be clearer if you look at the absolute numbers, by removing all the /n()s from the code.

– iod
Nov 14 '18 at 18:12
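To spell out why the two directions differ, write N(.) for the number of person-weeks in which something was bought (this notation is mine, not from the code). Both probabilities share the same numerator but divide by different totals:

```latex
P(\text{SALTY} \mid \text{PIZZA}) = \frac{N(\text{SALTY and PIZZA})}{N(\text{PIZZA})} \approx 0.586,
\qquad
P(\text{PIZZA} \mid \text{SALTY}) = \frac{N(\text{SALTY and PIZZA})}{N(\text{SALTY})} \approx 0.272

\frac{P(\text{SALTY} \mid \text{PIZZA})}{P(\text{PIZZA} \mid \text{SALTY})}
  = \frac{N(\text{SALTY})}{N(\text{PIZZA})}
  \approx \frac{0.586}{0.272} \approx 2.15
```

Taking the rounded values from tables 1 and 3 at face value, this says there are roughly twice as many person-weeks with SALTY as with PIZZA, which is why the same set of joint purchases is a large share of the PIZZA weeks but a small share of the SALTY weeks.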
