Select distinct row with an empty column but not when empty column is not part of a duplicate row
Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.
e.g. Input is
id | name | fathername | address
1 | bob | john | street1
1 | bob | john |
2 | amir | khan |
3 | roby | johanson | street3
Output
id | name | fathername | address
1 | bob | john | street1
2 | amir | khan |
3 | roby | johanson | street3
We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.
hive hiveql
add a comment |
Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.
e.g. Input is
id | name | fathername | address
1 | bob | john | street1
1 | bob | john |
2 | amir | khan |
3 | roby | johanson | street3
Output
id | name | fathername | address
1 | bob | john | street1
2 | amir | khan |
3 | roby | johanson | street3
We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.
hive hiveql
add a comment |
Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.
e.g. Input is
id | name | fathername | address
1 | bob | john | street1
1 | bob | john |
2 | amir | khan |
3 | roby | johanson | street3
Output
id | name | fathername | address
1 | bob | john | street1
2 | amir | khan |
3 | roby | johanson | street3
We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.
hive hiveql
Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.
e.g. Input is
id | name | fathername | address
1 | bob | john | street1
1 | bob | john |
2 | amir | khan |
3 | roby | johanson | street3
Output
id | name | fathername | address
1 | bob | john | street1
2 | amir | khan |
3 | roby | johanson | street3
We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.
hive hiveql
hive hiveql
edited Nov 14 '18 at 22:20
VK_217
7,76252447
7,76252447
asked Nov 14 '18 at 22:00
user1859366user1859366
5819
5819
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
You can use GROUP BY
with MAX
:
select id, name, fathername, max(address)
from data
group by id, name, fathername
Or if you want to use select *
:
select *
from data
where address is not null
union
select *
from data
where address is null and id not in (
select id
from data
where address is not null
)
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
1
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
add a comment |
You can prioritize the non-null address row in an order by
using row_number
.
select *
from (select t.*
,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
from tbl t
) t
where rnum = 1
Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309374%2fselect-distinct-row-with-an-empty-column-but-not-when-empty-column-is-not-part-o%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can use GROUP BY
with MAX
:
select id, name, fathername, max(address)
from data
group by id, name, fathername
Or if you want to use select *
:
select *
from data
where address is not null
union
select *
from data
where address is null and id not in (
select id
from data
where address is not null
)
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
1
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
add a comment |
You can use GROUP BY
with MAX
:
select id, name, fathername, max(address)
from data
group by id, name, fathername
Or if you want to use select *
:
select *
from data
where address is not null
union
select *
from data
where address is null and id not in (
select id
from data
where address is not null
)
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
1
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
add a comment |
You can use GROUP BY
with MAX
:
select id, name, fathername, max(address)
from data
group by id, name, fathername
Or if you want to use select *
:
select *
from data
where address is not null
union
select *
from data
where address is null and id not in (
select id
from data
where address is not null
)
You can use GROUP BY
with MAX
:
select id, name, fathername, max(address)
from data
group by id, name, fathername
Or if you want to use select *
:
select *
from data
where address is not null
union
select *
from data
where address is null and id not in (
select id
from data
where address is not null
)
edited Nov 14 '18 at 22:31
answered Nov 14 '18 at 22:08
slaaksoslaakso
3,1751820
3,1751820
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
1
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
add a comment |
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
1
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
Will it work with select * as well because * will also contain address.
– user1859366
Nov 14 '18 at 22:27
1
1
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
No, you will need to use max in the first one. Added version with select *
– slaakso
Nov 14 '18 at 22:32
add a comment |
You can prioritize the non-null address row in an order by
using row_number
.
select *
from (select t.*
,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
from tbl t
) t
where rnum = 1
Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.
add a comment |
You can prioritize the non-null address row in an order by
using row_number
.
select *
from (select t.*
,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
from tbl t
) t
where rnum = 1
Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.
add a comment |
You can prioritize the non-null address row in an order by
using row_number
.
select *
from (select t.*
,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
from tbl t
) t
where rnum = 1
Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.
You can prioritize the non-null address row in an order by
using row_number
.
select *
from (select t.*
,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
from tbl t
) t
where rnum = 1
Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.
answered Nov 14 '18 at 22:32
Vamsi PrabhalaVamsi Prabhala
41.6k42139
41.6k42139
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309374%2fselect-distinct-row-with-an-empty-column-but-not-when-empty-column-is-not-part-o%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown