Get the latest record or the first transaction (sale) BigQuery SQL standard
up vote
1
down vote
favorite
I have a table on BigQuery with the following columns:
user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1
My goal is to have a table in the following format as output:
path transactions
site2 > site3 2
site2 1
site4 > site5 0
What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.
So far I manage use the following query, but it is not the desired output.
SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC
sql google-bigquery
add a comment |
up vote
1
down vote
favorite
I have a table on BigQuery with the following columns:
user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1
My goal is to have a table in the following format as output:
path transactions
site2 > site3 2
site2 1
site4 > site5 0
What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.
So far I manage use the following query, but it is not the desired output.
SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC
sql google-bigquery
what you mean byreset
the path - please explain more. also do you allow only two referrals in the path?
– Mikhail Berlyant
Nov 9 at 18:06
It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15
tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I have a table on BigQuery with the following columns:
user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1
My goal is to have a table in the following format as output:
path transactions
site2 > site3 2
site2 1
site4 > site5 0
What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.
So far I manage use the following query, but it is not the desired output.
SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC
sql google-bigquery
I have a table on BigQuery with the following columns:
user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5768 20180102 site3 1
My goal is to have a table in the following format as output:
path transactions
site2 > site3 2
site2 1
site4 > site5 0
What I don't understand is how to 'reset' the path for a user that have multiple conversions in the same period, as is the case for user_id = 1234.
So far I manage use the following query, but it is not the desired output.
SELECT
referral_path,
SUM(transactions) AS transactions
FROM (
SELECT
user_id,
STRING_AGG(DISTINCT(referral), ',') AS referral_path,
MAX(transactions) AS transactions
FROM (
SELECT
user_id,
referral,
transactions
FROM
table
ORDER BY
user_id )a
GROUP BY
user_id )b
GROUP BY
referral_path
ORDER BY
transactions DESC
sql google-bigquery
sql google-bigquery
asked Nov 9 at 18:03
Legos
4226
4226
what you mean byreset
the path - please explain more. also do you allow only two referrals in the path?
– Mikhail Berlyant
Nov 9 at 18:06
It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15
tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15
add a comment |
what you mean byreset
the path - please explain more. also do you allow only two referrals in the path?
– Mikhail Berlyant
Nov 9 at 18:06
It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15
tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15
what you mean by
reset
the path - please explain more. also do you allow only two referrals in the path?– Mikhail Berlyant
Nov 9 at 18:06
what you mean by
reset
the path - please explain more. also do you allow only two referrals in the path?– Mikhail Berlyant
Nov 9 at 18:06
It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15
It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15
tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15
tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15
add a comment |
1 Answer
1
active
oldest
votes
up vote
2
down vote
accepted
Below is for BigQuery Standard SQL
#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
You can test, play with above using dummy data from your question as in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
with result
Row path transactions
1 site2 > site3 2
2 site2 1
3 site4 > site5 0
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
1
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
Below is for BigQuery Standard SQL
#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
You can test, play with above using dummy data from your question as in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
with result
Row path transactions
1 site2 > site3 2
2 site2 1
3 site4 > site5 0
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
1
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
add a comment |
up vote
2
down vote
accepted
Below is for BigQuery Standard SQL
#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
You can test, play with above using dummy data from your question as in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
with result
Row path transactions
1 site2 > site3 2
2 site2 1
3 site4 > site5 0
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
1
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
add a comment |
up vote
2
down vote
accepted
up vote
2
down vote
accepted
Below is for BigQuery Standard SQL
#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
You can test, play with above using dummy data from your question as in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
with result
Row path transactions
1 site2 > site3 2
2 site2 1
3 site4 > site5 0
Below is for BigQuery Standard SQL
#standardSQL
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
You can test, play with above using dummy data from your question as in below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
SELECT 1234, '20180102', 'site3', 1 UNION ALL
SELECT 1234, '20180103', 'site2', 1 UNION ALL
SELECT 4567, '20180104', 'site4', 0 UNION ALL
SELECT 4567, '20180105', 'site5', 0 UNION ALL
SELECT 5678, '20180101', 'site2', 0 UNION ALL
SELECT 5678, '20180102', 'site3', 1
)
SELECT
path,
SUM(transaction) transactions
FROM (
SELECT
STRING_AGG(referral, ' > ') path,
SUM(transaction) transaction
FROM (
SELECT
user_id, visit_date, referral, transaction,
IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
)
GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
with result
Row path transactions
1 site2 > site3 2
2 site2 1
3 site4 > site5 0
answered Nov 9 at 18:24
Mikhail Berlyant
52.8k42964
52.8k42964
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
1
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
add a comment |
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
1
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
The solution works fine! I just don't understand "1 PRECEDING" command at the end of the over function. So within a specific user_id, It goes from unbounded preceding (the first one row) to the last?
– Legos
Nov 12 at 18:10
1
1
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
that's correct. this logic catches when group value is changed so then you can use grp in GROUP BY - I recommend you to run inner queries one by one to see how it works - so you will be able to use this approach by your own :o)
– Mikhail Berlyant
Nov 12 at 18:12
add a comment |
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53231121%2fget-the-latest-record-or-the-first-transaction-sale-bigquery-sql-standard%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
what you mean by
reset
the path - please explain more. also do you allow only two referrals in the path?– Mikhail Berlyant
Nov 9 at 18:06
It could have more than 2 referrals. By reset I mean, for the first user (id=1234) for the same period, it has 2 conversions, after the first conversion, the path should restart from site2. Instead of being site2 > site3 > site2.
– Legos
Nov 9 at 18:15
tnx. i see. got it
– Mikhail Berlyant
Nov 9 at 18:15