Get the latest record or the first transaction (sale) BigQuery SQL standard









up vote
1
down vote

favorite












I have a table on BigQuery with the following columns:



user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1


My goal is to have a table in the following format as output:



path transactions
site2 > site3 2
site2 1
site4 > site5 0


What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.



So far I manage use the following query, but it is not the desired output.



SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC









share|improve this question





















  • what you mean by reset the path - please explain more. also do you allow only two referrals in the path?
    – Mikhail Berlyant
    Nov 9 at 18:06











  • It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
    – Legos
    Nov 9 at 18:15










  • tnx. i see. got it
    – Mikhail Berlyant
    Nov 9 at 18:15














up vote
1
down vote

favorite












I have a table on BigQuery with the following columns:



user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1


My goal is to have a table in the following format as output:



path transactions
site2 > site3 2
site2 1
site4 > site5 0


What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.



So far I manage use the following query, but it is not the desired output.



SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC









share|improve this question





















  • what you mean by reset the path - please explain more. also do you allow only two referrals in the path?
    – Mikhail Berlyant
    Nov 9 at 18:06











  • It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
    – Legos
    Nov 9 at 18:15










  • tnx. i see. got it
    – Mikhail Berlyant
    Nov 9 at 18:15












up vote
1
down vote

favorite









up vote
1
down vote

favorite











I have a table on BigQuery with the following columns:



user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1


My goal is to have a table in the following format as output:



path transactions
site2 > site3 2
site2 1
site4 > site5 0


What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.



So far I manage use the following query, but it is not the desired output.



SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC









share|improve this question













I have a table on BigQuery with the following columns:



user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1


My goal is to have a table in the following format as output:



path transactions
site2 > site3 2
site2 1
site4 > site5 0


What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.



So far I manage use the following query, but it is not the desired output.



SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC






sql google-bigquery






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 9 at 18:03









Legos

4226




4226











  • what you mean by reset the path - please explain more. also do you allow only two referrals in the path?
    – Mikhail Berlyant
    Nov 9 at 18:06











  • It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
    – Legos
    Nov 9 at 18:15










  • tnx. i see. got it
    – Mikhail Berlyant
    Nov 9 at 18:15
















  • what you mean by reset the path - please explain more. also do you allow only two referrals in the path?
    – Mikhail Berlyant
    Nov 9 at 18:06











  • It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
    – Legos
    Nov 9 at 18:15










  • tnx. i see. got it
    – Mikhail Berlyant
    Nov 9 at 18:15















what you mean by reset the path - please explain more. also do you allow only two referrals in the path?
– Mikhail Berlyant
Nov 9 at 18:06





what you mean by reset the path - please explain more. also do you allow only two referrals in the path?
– Mikhail Berlyant
Nov 9 at 18:06













It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15




It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15












tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15




tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15












1 Answer
1






active

oldest

votes

















up vote
2
down vote



accepted










Below is for BigQuery Standard SQL



#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path


You can test, play with above using dummy data from your question as in below



#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC


with result



Row path transactions 
1 site2 > site3 2
2 site2 1
3 site4 > site5 0





share|improve this answer




















  • The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
    – Legos
    Nov 12 at 18:10






  • 1




    that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
    – Mikhail Berlyant
    Nov 12 at 18:12










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231121%2fget-the-latest-record-or-the-first-transaction-sale-bigquery-sql-standard%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted










Below is for BigQuery Standard SQL



#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path


You can test, play with above using dummy data from your question as in below



#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC


with result



Row path transactions 
1 site2 > site3 2
2 site2 1
3 site4 > site5 0





share|improve this answer




















  • The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
    – Legos
    Nov 12 at 18:10






  • 1




    that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
    – Mikhail Berlyant
    Nov 12 at 18:12














up vote
2
down vote



accepted










Below is for BigQuery Standard SQL



#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path


You can test, play with above using dummy data from your question as in below



#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC


with result



Row path transactions 
1 site2 > site3 2
2 site2 1
3 site4 > site5 0





share|improve this answer




















  • The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
    – Legos
    Nov 12 at 18:10






  • 1




    that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
    – Mikhail Berlyant
    Nov 12 at 18:12












up vote
2
down vote



accepted







up vote
2
down vote



accepted






Below is for BigQuery Standard SQL



#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path


You can test, play with above using dummy data from your question as in below



#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC


with result



Row path transactions 
1 site2 > site3 2
2 site2 1
3 site4 > site5 0





share|improve this answer












Below is for BigQuery Standard SQL



#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path


You can test, play with above using dummy data from your question as in below



#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC


with result



Row path transactions 
1 site2 > site3 2
2 site2 1
3 site4 > site5 0






share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 9 at 18:24









Mikhail Berlyant

52.8k42964




52.8k42964











  • The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
    – Legos
    Nov 12 at 18:10






  • 1




    that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
    – Mikhail Berlyant
    Nov 12 at 18:12
















  • The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
    – Legos
    Nov 12 at 18:10






  • 1




    that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
    – Mikhail Berlyant
    Nov 12 at 18:12















The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10




The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10




1




1




that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12




that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12

















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231121%2fget-the-latest-record-or-the-first-transaction-sale-bigquery-sql-standard%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Use pre created SQLite database for Android project in kotlin

Darth Vader #20

Ondo