comparing two pandas dataframes with different column names and finding match
I have two dataframes :
df1:
A B C
1 ss 123
2 sv 234
3 sc 333
df2:
A dd xc
1 ss 123
df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?
python pandas dataframe
add a comment |
I have two dataframes :
df1:
A B C
1 ss 123
2 sv 234
3 sc 333
df2:
A dd xc
1 ss 123
df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?
python pandas dataframe
add a comment |
I have two dataframes :
df1:
A B C
1 ss 123
2 sv 234
3 sc 333
df2:
A dd xc
1 ss 123
df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?
python pandas dataframe
I have two dataframes :
df1:
A B C
1 ss 123
2 sv 234
3 sc 333
df2:
A dd xc
1 ss 123
df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?
python pandas dataframe
python pandas dataframe
asked Nov 14 '18 at 9:47
qwwwqwww
443210
443210
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
Using Numpy comparisons with np.all
with parameter axis=1
for rows:
df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])
df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]
Or:
df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]
print(df3)
A B C
0 1 ss 123
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
1
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
|
show 3 more comments
Additional to Sandeep's answer, can do:
df1[np.all(df1.values == df2.values,1)].any().any()
For getting a boolean.
Or another way:
df1[(df2.values==df1.values).all(1)].any().any()
Or:
pd.merge(df1,df2).equals(df1)
Note: both output True
Check specific column (same as Sandeep's):
df1[col].isin(df2[col]).any()
add a comment |
How to check whether there is a match for that row of df2, in df1?
You can align columns and then check equality of df1
with the only row of df2
:
df2.columns = df1.columns
res = (df1 == df2.iloc[0]).all(1).any() # True
The benefit of this solution is you aren't subsetting df1
(expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True
.
This is still not particularly efficient as you are considering every row in df1
rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
@qwww, Then you have a new question :). Seriously, I'm assuming you havedf1
anddf2
as in your question. You can't automate alignment of columns. How is Pandas meant to knowxc
maps toC
?
– jpp
Nov 14 '18 at 10:41
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297191%2fcomparing-two-pandas-dataframes-with-different-column-names-and-finding-match%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using Numpy comparisons with np.all
with parameter axis=1
for rows:
df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])
df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]
Or:
df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]
print(df3)
A B C
0 1 ss 123
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
1
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
|
show 3 more comments
Using Numpy comparisons with np.all
with parameter axis=1
for rows:
df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])
df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]
Or:
df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]
print(df3)
A B C
0 1 ss 123
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
1
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
|
show 3 more comments
Using Numpy comparisons with np.all
with parameter axis=1
for rows:
df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])
df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]
Or:
df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]
print(df3)
A B C
0 1 ss 123
Using Numpy comparisons with np.all
with parameter axis=1
for rows:
df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])
df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]
Or:
df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]
print(df3)
A B C
0 1 ss 123
edited Nov 14 '18 at 10:19
answered Nov 14 '18 at 9:50
Sandeep KadapaSandeep Kadapa
7,308831
7,308831
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
1
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
|
show 3 more comments
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
1
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?
– qwww
Nov 14 '18 at 9:51
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
@qwww check the update.
– Sandeep Kadapa
Nov 14 '18 at 9:54
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– qwww
Nov 14 '18 at 10:00
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
@qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.
– Sandeep Kadapa
Nov 14 '18 at 10:10
1
1
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
@jezrael Yes, Thanks for the suggestion.
– Sandeep Kadapa
Nov 14 '18 at 10:21
|
show 3 more comments
Additional to Sandeep's answer, can do:
df1[np.all(df1.values == df2.values,1)].any().any()
For getting a boolean.
Or another way:
df1[(df2.values==df1.values).all(1)].any().any()
Or:
pd.merge(df1,df2).equals(df1)
Note: both output True
Check specific column (same as Sandeep's):
df1[col].isin(df2[col]).any()
add a comment |
Additional to Sandeep's answer, can do:
df1[np.all(df1.values == df2.values,1)].any().any()
For getting a boolean.
Or another way:
df1[(df2.values==df1.values).all(1)].any().any()
Or:
pd.merge(df1,df2).equals(df1)
Note: both output True
Check specific column (same as Sandeep's):
df1[col].isin(df2[col]).any()
add a comment |
Additional to Sandeep's answer, can do:
df1[np.all(df1.values == df2.values,1)].any().any()
For getting a boolean.
Or another way:
df1[(df2.values==df1.values).all(1)].any().any()
Or:
pd.merge(df1,df2).equals(df1)
Note: both output True
Check specific column (same as Sandeep's):
df1[col].isin(df2[col]).any()
Additional to Sandeep's answer, can do:
df1[np.all(df1.values == df2.values,1)].any().any()
For getting a boolean.
Or another way:
df1[(df2.values==df1.values).all(1)].any().any()
Or:
pd.merge(df1,df2).equals(df1)
Note: both output True
Check specific column (same as Sandeep's):
df1[col].isin(df2[col]).any()
answered Nov 14 '18 at 9:53
U9-ForwardU9-Forward
16.3k51543
16.3k51543
add a comment |
add a comment |
How to check whether there is a match for that row of df2, in df1?
You can align columns and then check equality of df1
with the only row of df2
:
df2.columns = df1.columns
res = (df1 == df2.iloc[0]).all(1).any() # True
The benefit of this solution is you aren't subsetting df1
(expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True
.
This is still not particularly efficient as you are considering every row in df1
rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
@qwww, Then you have a new question :). Seriously, I'm assuming you havedf1
anddf2
as in your question. You can't automate alignment of columns. How is Pandas meant to knowxc
maps toC
?
– jpp
Nov 14 '18 at 10:41
add a comment |
How to check whether there is a match for that row of df2, in df1?
You can align columns and then check equality of df1
with the only row of df2
:
df2.columns = df1.columns
res = (df1 == df2.iloc[0]).all(1).any() # True
The benefit of this solution is you aren't subsetting df1
(expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True
.
This is still not particularly efficient as you are considering every row in df1
rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
@qwww, Then you have a new question :). Seriously, I'm assuming you havedf1
anddf2
as in your question. You can't automate alignment of columns. How is Pandas meant to knowxc
maps toC
?
– jpp
Nov 14 '18 at 10:41
add a comment |
How to check whether there is a match for that row of df2, in df1?
You can align columns and then check equality of df1
with the only row of df2
:
df2.columns = df1.columns
res = (df1 == df2.iloc[0]).all(1).any() # True
The benefit of this solution is you aren't subsetting df1
(expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True
.
This is still not particularly efficient as you are considering every row in df1
rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.
How to check whether there is a match for that row of df2, in df1?
You can align columns and then check equality of df1
with the only row of df2
:
df2.columns = df1.columns
res = (df1 == df2.iloc[0]).all(1).any() # True
The benefit of this solution is you aren't subsetting df1
(expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True
.
This is still not particularly efficient as you are considering every row in df1
rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.
answered Nov 14 '18 at 10:09
jppjpp
101k2164114
101k2164114
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
@qwww, Then you have a new question :). Seriously, I'm assuming you havedf1
anddf2
as in your question. You can't automate alignment of columns. How is Pandas meant to knowxc
maps toC
?
– jpp
Nov 14 '18 at 10:41
add a comment |
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
@qwww, Then you have a new question :). Seriously, I'm assuming you havedf1
anddf2
as in your question. You can't automate alignment of columns. How is Pandas meant to knowxc
maps toC
?
– jpp
Nov 14 '18 at 10:41
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
what if column numbers are not same?
– qwww
Nov 14 '18 at 10:40
@qwww, Then you have a new question :). Seriously, I'm assuming you have
df1
and df2
as in your question. You can't automate alignment of columns. How is Pandas meant to know xc
maps to C
?– jpp
Nov 14 '18 at 10:41
@qwww, Then you have a new question :). Seriously, I'm assuming you have
df1
and df2
as in your question. You can't automate alignment of columns. How is Pandas meant to know xc
maps to C
?– jpp
Nov 14 '18 at 10:41
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297191%2fcomparing-two-pandas-dataframes-with-different-column-names-and-finding-match%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown