comparing two pandas dataframes with different column names and finding match










1















I have two dataframes :



df1:



A B C
1 ss 123
2 sv 234
3 sc 333


df2:



A dd xc
1 ss 123


df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?










share|improve this question


























    1















    I have two dataframes :



    df1:



    A B C
    1 ss 123
    2 sv 234
    3 sc 333


    df2:



    A dd xc
    1 ss 123


    df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?










    share|improve this question
























      1












      1








      1


      1






      I have two dataframes :



      df1:



      A B C
      1 ss 123
      2 sv 234
      3 sc 333


      df2:



      A dd xc
      1 ss 123


      df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?










      share|improve this question














      I have two dataframes :



      df1:



      A B C
      1 ss 123
      2 sv 234
      3 sc 333


      df2:



      A dd xc
      1 ss 123


      df2 will always have a single row. How to check whether there is a match for that row of df2, in df1?







      python pandas dataframe






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 14 '18 at 9:47









      qwwwqwww

      443210




      443210






















          3 Answers
          3






          active

          oldest

          votes


















          3














          Using Numpy comparisons with np.all with parameter axis=1 for rows:



          df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
          df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])

          df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]


          Or:



          df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]



          print(df3)
          A B C
          0 1 ss 123





          share|improve this answer

























          • What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

            – qwww
            Nov 14 '18 at 9:51












          • @qwww check the update.

            – Sandeep Kadapa
            Nov 14 '18 at 9:54











          • ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

            – qwww
            Nov 14 '18 at 10:00











          • @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

            – Sandeep Kadapa
            Nov 14 '18 at 10:10






          • 1





            @jezrael Yes, Thanks for the suggestion.

            – Sandeep Kadapa
            Nov 14 '18 at 10:21


















          0














          Additional to Sandeep's answer, can do:



          df1[np.all(df1.values == df2.values,1)].any().any()


          For getting a boolean.



          Or another way:



          df1[(df2.values==df1.values).all(1)].any().any()


          Or:



          pd.merge(df1,df2).equals(df1)


          Note: both output True



          Check specific column (same as Sandeep's):



          df1[col].isin(df2[col]).any()





          share|improve this answer






























            0















            How to check whether there is a match for that row of df2, in df1?




            You can align columns and then check equality of df1 with the only row of df2:



            df2.columns = df1.columns

            res = (df1 == df2.iloc[0]).all(1).any() # True


            The benefit of this solution is you aren't subsetting df1 (expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True.



            This is still not particularly efficient as you are considering every row in df1 rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.






            share|improve this answer























            • what if column numbers are not same?

              – qwww
              Nov 14 '18 at 10:40











            • @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

              – jpp
              Nov 14 '18 at 10:41











            Your Answer






            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "1"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297191%2fcomparing-two-pandas-dataframes-with-different-column-names-and-finding-match%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            3














            Using Numpy comparisons with np.all with parameter axis=1 for rows:



            df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
            df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])

            df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]


            Or:



            df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]



            print(df3)
            A B C
            0 1 ss 123





            share|improve this answer

























            • What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

              – qwww
              Nov 14 '18 at 9:51












            • @qwww check the update.

              – Sandeep Kadapa
              Nov 14 '18 at 9:54











            • ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

              – qwww
              Nov 14 '18 at 10:00











            • @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

              – Sandeep Kadapa
              Nov 14 '18 at 10:10






            • 1





              @jezrael Yes, Thanks for the suggestion.

              – Sandeep Kadapa
              Nov 14 '18 at 10:21















            3














            Using Numpy comparisons with np.all with parameter axis=1 for rows:



            df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
            df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])

            df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]


            Or:



            df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]



            print(df3)
            A B C
            0 1 ss 123





            share|improve this answer

























            • What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

              – qwww
              Nov 14 '18 at 9:51












            • @qwww check the update.

              – Sandeep Kadapa
              Nov 14 '18 at 9:54











            • ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

              – qwww
              Nov 14 '18 at 10:00











            • @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

              – Sandeep Kadapa
              Nov 14 '18 at 10:10






            • 1





              @jezrael Yes, Thanks for the suggestion.

              – Sandeep Kadapa
              Nov 14 '18 at 10:21













            3












            3








            3







            Using Numpy comparisons with np.all with parameter axis=1 for rows:



            df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
            df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])

            df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]


            Or:



            df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]



            print(df3)
            A B C
            0 1 ss 123





            share|improve this answer















            Using Numpy comparisons with np.all with parameter axis=1 for rows:



            df1 = pd.DataFrame('A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333])
            df2 = pd.DataFrame('A': [1], 'dd': ['ss'], 'xc': [123])

            df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]


            Or:



            df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]



            print(df3)
            A B C
            0 1 ss 123






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 14 '18 at 10:19

























            answered Nov 14 '18 at 9:50









            Sandeep KadapaSandeep Kadapa

            7,308831




            7,308831












            • What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

              – qwww
              Nov 14 '18 at 9:51












            • @qwww check the update.

              – Sandeep Kadapa
              Nov 14 '18 at 9:54











            • ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

              – qwww
              Nov 14 '18 at 10:00











            • @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

              – Sandeep Kadapa
              Nov 14 '18 at 10:10






            • 1





              @jezrael Yes, Thanks for the suggestion.

              – Sandeep Kadapa
              Nov 14 '18 at 10:21

















            • What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

              – qwww
              Nov 14 '18 at 9:51












            • @qwww check the update.

              – Sandeep Kadapa
              Nov 14 '18 at 9:54











            • ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

              – qwww
              Nov 14 '18 at 10:00











            • @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

              – Sandeep Kadapa
              Nov 14 '18 at 10:10






            • 1





              @jezrael Yes, Thanks for the suggestion.

              – Sandeep Kadapa
              Nov 14 '18 at 10:21
















            What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

            – qwww
            Nov 14 '18 at 9:51






            What if i need to check only specific columns from df2 with df1? like only check for dd, xc from df2 with B,C of df1?

            – qwww
            Nov 14 '18 at 9:51














            @qwww check the update.

            – Sandeep Kadapa
            Nov 14 '18 at 9:54





            @qwww check the update.

            – Sandeep Kadapa
            Nov 14 '18 at 9:54













            ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

            – qwww
            Nov 14 '18 at 10:00





            ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

            – qwww
            Nov 14 '18 at 10:00













            @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

            – Sandeep Kadapa
            Nov 14 '18 at 10:10





            @qwww Can you paste the code snippet giving you error. Also, check if the statements are exactly the same.

            – Sandeep Kadapa
            Nov 14 '18 at 10:10




            1




            1





            @jezrael Yes, Thanks for the suggestion.

            – Sandeep Kadapa
            Nov 14 '18 at 10:21





            @jezrael Yes, Thanks for the suggestion.

            – Sandeep Kadapa
            Nov 14 '18 at 10:21













            0














            Additional to Sandeep's answer, can do:



            df1[np.all(df1.values == df2.values,1)].any().any()


            For getting a boolean.



            Or another way:



            df1[(df2.values==df1.values).all(1)].any().any()


            Or:



            pd.merge(df1,df2).equals(df1)


            Note: both output True



            Check specific column (same as Sandeep's):



            df1[col].isin(df2[col]).any()





            share|improve this answer



























              0














              Additional to Sandeep's answer, can do:



              df1[np.all(df1.values == df2.values,1)].any().any()


              For getting a boolean.



              Or another way:



              df1[(df2.values==df1.values).all(1)].any().any()


              Or:



              pd.merge(df1,df2).equals(df1)


              Note: both output True



              Check specific column (same as Sandeep's):



              df1[col].isin(df2[col]).any()





              share|improve this answer

























                0












                0








                0







                Additional to Sandeep's answer, can do:



                df1[np.all(df1.values == df2.values,1)].any().any()


                For getting a boolean.



                Or another way:



                df1[(df2.values==df1.values).all(1)].any().any()


                Or:



                pd.merge(df1,df2).equals(df1)


                Note: both output True



                Check specific column (same as Sandeep's):



                df1[col].isin(df2[col]).any()





                share|improve this answer













                Additional to Sandeep's answer, can do:



                df1[np.all(df1.values == df2.values,1)].any().any()


                For getting a boolean.



                Or another way:



                df1[(df2.values==df1.values).all(1)].any().any()


                Or:



                pd.merge(df1,df2).equals(df1)


                Note: both output True



                Check specific column (same as Sandeep's):



                df1[col].isin(df2[col]).any()






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 14 '18 at 9:53









                U9-ForwardU9-Forward

                16.3k51543




                16.3k51543





















                    0















                    How to check whether there is a match for that row of df2, in df1?




                    You can align columns and then check equality of df1 with the only row of df2:



                    df2.columns = df1.columns

                    res = (df1 == df2.iloc[0]).all(1).any() # True


                    The benefit of this solution is you aren't subsetting df1 (expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True.



                    This is still not particularly efficient as you are considering every row in df1 rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.






                    share|improve this answer























                    • what if column numbers are not same?

                      – qwww
                      Nov 14 '18 at 10:40











                    • @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

                      – jpp
                      Nov 14 '18 at 10:41
















                    0















                    How to check whether there is a match for that row of df2, in df1?




                    You can align columns and then check equality of df1 with the only row of df2:



                    df2.columns = df1.columns

                    res = (df1 == df2.iloc[0]).all(1).any() # True


                    The benefit of this solution is you aren't subsetting df1 (expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True.



                    This is still not particularly efficient as you are considering every row in df1 rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.






                    share|improve this answer























                    • what if column numbers are not same?

                      – qwww
                      Nov 14 '18 at 10:40











                    • @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

                      – jpp
                      Nov 14 '18 at 10:41














                    0












                    0








                    0








                    How to check whether there is a match for that row of df2, in df1?




                    You can align columns and then check equality of df1 with the only row of df2:



                    df2.columns = df1.columns

                    res = (df1 == df2.iloc[0]).all(1).any() # True


                    The benefit of this solution is you aren't subsetting df1 (expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True.



                    This is still not particularly efficient as you are considering every row in df1 rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.






                    share|improve this answer














                    How to check whether there is a match for that row of df2, in df1?




                    You can align columns and then check equality of df1 with the only row of df2:



                    df2.columns = df1.columns

                    res = (df1 == df2.iloc[0]).all(1).any() # True


                    The benefit of this solution is you aren't subsetting df1 (expensive), but instead constructing a Boolean dataframe / array (cheap) and checking if all values in at least one row are True.



                    This is still not particularly efficient as you are considering every row in df1 rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 14 '18 at 10:09









                    jppjpp

                    101k2164114




                    101k2164114












                    • what if column numbers are not same?

                      – qwww
                      Nov 14 '18 at 10:40











                    • @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

                      – jpp
                      Nov 14 '18 at 10:41


















                    • what if column numbers are not same?

                      – qwww
                      Nov 14 '18 at 10:40











                    • @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

                      – jpp
                      Nov 14 '18 at 10:41

















                    what if column numbers are not same?

                    – qwww
                    Nov 14 '18 at 10:40





                    what if column numbers are not same?

                    – qwww
                    Nov 14 '18 at 10:40













                    @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

                    – jpp
                    Nov 14 '18 at 10:41






                    @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?

                    – jpp
                    Nov 14 '18 at 10:41


















                    draft saved

                    draft discarded
















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297191%2fcomparing-two-pandas-dataframes-with-different-column-names-and-finding-match%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    How to how show current date and time by default on contact form 7 in WordPress without taking input from user in datetimepicker

                    Syphilis

                    Darth Vader #20