Group Pandas Dataframe & validate with condition

























Dataframe:



id  Base  field1  field2  field3
1   Y     AA      BB      CC
1   N     AA      BB      CC
1   N     AA      BB      CC
2   Y     DD      EE      FF
2   N     OO      EE      WT
2   N     DD      JQ      FF
3   Y     MM      NN      TT
3   Y     MM      NN      TT
3   N     MM      NN      TT


I want to group this dataframe by the id column and run two validations on each group.



  1. First, check that the group has exactly one row where Base is 'Y'. If so, use that row as the reference for step 2; otherwise write the error "More than one base Y found for ID" and move on to step 1 for the next ID.


  2. Check that every row with Base 'N' matches the Base 'Y' row in all the other columns, and write the names of the fields that do not match in the Error column. The product column is unique per row and can be ignored in the comparison.


  3. Repeat this for every ID in the dataframe.


The expected result is:



id  product  Base  field1  field2  field3  Error
1   A        Y     AA      BB      CC      Reference value
1   B        N     AA      BB      CC      Pass
1   C        N     AA      BB      CC      Pass
2   D        Y     DD      EE      FF      Reference value
2   E        N     OO      EE      WT      field1, field3 mismatch
2   F        N     DE      JQ      FF      field1, field2 mismatch
3   G        Y     MM      NN      TT      more than 1 Y found for id:
3   H        Y     MM      NN      TT      more than 1 Y found for id:
3   I        N     MM      NN      TT      more than 1 Y found for id:


Any help on this?










      python-3.x pandas






      asked Nov 14 '18 at 8:18 by Osceria (edited Nov 14 '18 at 13:23)






















          1 Answer














          Use custom function:



          import numpy as np
          import pandas as pd

          def f(x):
              # boolean mask marking the rows where Base is 'Y'
              mask = x['Base'] == 'Y'
              # more than one True in the mask means several Y rows in this group
              if mask.sum() > 1:
                  x['Error'] = 'more than 1 base Y found for id:{}'.format(x.name)
              else:
                  # compare every column except Base and product
                  cols = x.columns.difference(['Base', 'product'])
                  # True where a value differs from the Base 'Y' row (rows align on the id index)
                  mask1 = x[cols].ne(x.loc[mask, cols])
                  # if anything differs, join the names of the mismatching columns with dot
                  if mask1.values.any():
                      vals = mask1.dot(mask1.columns + ', ').str.rstrip(', ') + ' mismatch with base'
                      x['Error'] = np.where(mask, 'Base: Y', vals)
                  else:
                      x['Error'] = np.where(mask, 'Base: Y', 'Pass')

              return x

          # group_keys=False keeps the plain id index on pandas versions that would otherwise prepend the group key
          df = df.groupby(level=0, group_keys=False).apply(f)
          print(df)
             product Base field1 field2 field3                             Error
          id
          1        A    Y     AA     BB     CC                           Base: Y
          1        B    N     AA     BB     CC                              Pass
          1        C    N     AA     BB     CC                              Pass
          2        D    Y     DD     EE     FF                           Base: Y
          2        E    N     OO     EE     WT field1, field3 mismatch with base
          2        F    N     DD     JQ     FF         field2 mismatch with base
          3        G    Y     MM     NN     TT more than 1 base Y found for id:3
          3        H    Y     MM     NN     TT more than 1 base Y found for id:3
          3        I    N     MM     NN     TT more than 1 base Y found for id:3
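
The `mask1.dot(mask1.columns + ', ')` step can look opaque, so here is a minimal sketch of what it does. The `mismatch` frame and its values are made up for illustration and are not taken from the real data. Multiplying `True` or `False` by a string keeps or drops that string, and the dot product concatenates whatever is kept in each row.

```python
import pandas as pd

# Hypothetical mismatch flags: True means "this field differs from the base row".
mismatch = pd.DataFrame({
    "field1": [False, True, False],
    "field2": [False, False, True],
    "field3": [False, True, False],
})

# True * "field1, " == "field1, " and False * "field1, " == "", so the dot
# product concatenates the names of the True columns in each row.
names = mismatch.dot(mismatch.columns + ", ").str.rstrip(", ")

print(names.tolist())
```

A row with no mismatches comes out as an empty string, so it needs separate handling if it should read 'Pass'.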


          Sample DataFrame:



          import pandas as pd

          df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                             'product': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                             'Base': ['Y', 'N', 'N', 'Y', 'N', 'N', 'Y', 'Y', 'N'],
                             'field1': ['AA', 'AA', 'AA', 'DD', 'OO', 'DD', 'MM', 'MM', 'MM'],
                             'field2': ['BB', 'BB', 'BB', 'EE', 'EE', 'JQ', 'NN', 'NN', 'NN'],
                             'field3': ['CC', 'CC', 'CC', 'FF', 'WT', 'FF', 'TT', 'TT', 'TT']})
          # id must be the index so that f can group and compare on it
          df = df.set_index('id')
          print(df)
             product Base field1 field2 field3
          id
          1        A    Y     AA     BB     CC
          1        B    N     AA     BB     CC
          1        C    N     AA     BB     CC
          2        D    Y     DD     EE     FF
          2        E    N     OO     EE     WT
          2        F    N     DD     JQ     FF
          3        G    Y     MM     NN     TT
          3        H    Y     MM     NN     TT
          3        I    N     MM     NN     TT
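
The function `f` appears to depend on `id` being the index, so that each row is compared with the base row of its own group. Below is a minimal sketch of a variant that leaves `id` as an ordinary column and checks each 'N' row on its own. The names `add_error_column`, `id_col`, `base_col` and `ignore` are hypothetical, and the "no base Y" message is an assumption, since the question does not say what should happen when a group has no 'Y' row. It also assumes the row index is unique (for example the default RangeIndex).

```python
import pandas as pd


def add_error_column(df, id_col="id", base_col="Base", ignore=("product",)):
    """Return a copy of df with an Error column (assumes a unique row index)."""
    # Fields to compare: everything except the id, the Base flag and ignored columns.
    fields = [c for c in df.columns if c not in (id_col, base_col, *ignore)]
    errors = pd.Series("", index=df.index, dtype=object)

    for key, group in df.groupby(id_col):
        is_base = group[base_col] == "Y"
        n_base = int(is_base.sum())
        if n_base > 1:
            errors.loc[group.index] = f"more than 1 base Y found for id:{key}"
            continue
        if n_base == 0:
            # Assumption: the question does not cover groups without a Y row.
            errors.loc[group.index] = f"no base Y found for id:{key}"
            continue

        reference = group.loc[is_base, fields].iloc[0]  # the single base row
        for idx, row in group.iterrows():
            if row[base_col] == "Y":
                errors.loc[idx] = "Base: Y"
                continue
            # Names of the fields whose value differs from the base row.
            bad = [c for c in fields if row[c] != reference[c]]
            errors.loc[idx] = (", ".join(bad) + " mismatch with base") if bad else "Pass"

    out = df.copy()
    out["Error"] = errors
    return out


df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "product": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
    "Base": ["Y", "N", "N", "Y", "N", "N", "Y", "Y", "N"],
    "field1": ["AA", "AA", "AA", "DD", "OO", "DD", "MM", "MM", "MM"],
    "field2": ["BB", "BB", "BB", "EE", "EE", "JQ", "NN", "NN", "NN"],
    "field3": ["CC", "CC", "CC", "FF", "WT", "FF", "TT", "TT", "TT"],
})
result = add_error_column(df)
print(result)
```

The explicit loop is slower than the vectorised version on large frames, but it does not rely on index alignment and it labels a matching 'N' row as Pass even when another row in the same group mismatches.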





          • @Osceria - Can you edit your question?

            – jezrael
            Nov 14 '18 at 13:00











          • I tried the same code with the same dataframe but did not get the expected results. The actual results in the Error column are "base: Y" in all the rows where base is 'Y' and "Error, field1, field2, field3 mismatch with base" in all rows where base is 'N'. The code doesn't capture the error "more than 1 base Y found for an id:", and it should only highlight the fields where the values do not match.

            – Osceria
            Nov 14 '18 at 13:09











          • @Osceria - Can you test with the sample DataFrame data to see whether it works for you? The answer was edited.

            – jezrael
            Nov 14 '18 at 13:11











          • I tried it with my sample dataset and it does not return the expected results. I edited the question with more details.

            – Osceria
            Nov 14 '18 at 13:26






          • The code works perfectly fine now. Marked as Answer

            – Osceria
            Nov 14 '18 at 13:52










          answered Nov 14 '18 at 9:32 by jezrael (edited Nov 14 '18 at 13:11)











