How to check if pandas dataframe groups have the same data

I have a pandas dataframe as below

id  name  Base  field1  field2          field3
1   AA    Y     Yes     Consumer        Not Applicable
1   BB    N     Yes     Consumer        Not Applicable
2   CC    Y     Yes     Consumer        Not Applicable
2   DD    N     Yes     Not Applicable  Not Applicable
2   EE    N     No      Not Applicable  Modified
3   FF    Y     Yes     Not Applicable  Applicable
3   GG    N     Yes     Not Applicable  Not Applicable
3   HH    N     Yes     Not Applicable  Not Applicable

I want to group this dataframe by the id column, check whether the values in all the other columns are the same within each group, and finally write out the results.

I tried the following to validate the data in each group, but it always returns True:

Code:

result_list = []
for col in df.columns:
    result = df.groupby(level=0)[col].apply(lambda x: len(set(x)) == 1)
    result_list.append(result)

final = pd.concat(result_list, axis=1)

The expected result is

id  name  field1  field2          field3          Error
1   AA    Yes     Consumer        Not Applicable  Pass
1   BB    Yes     Consumer        Not Applicable  Pass
2   CC    Yes     Consumer        Not Applicable  field1, field2, field3 mismatch for ID: 2
2   DD    Yes     Not Applicable  Not Applicable  field1, field2, field3 mismatch for ID: 2
2   EE    No      Not Applicable  Modified        field1, field2, field3 mismatch for ID: 2
3   FF    Yes     Not Applicable  Applicable      field3 mismatch for ID: 3
3   GG    Yes     Not Applicable  Not Applicable  field3 mismatch for ID: 3
3   HH    Yes     Not Applicable  Not Applicable  field3 mismatch for ID: 3

Any help on this?

edited Nov 14 '18 at 10:23

Akhilesh Pandey

549313

asked Nov 13 '18 at 12:52

Osceria

599

What's your desired result, only id = 1 passes your test?

– jpp
Nov 13 '18 at 13:38

Hi, I've updated the dataframe and expected result. let me know if it helps

– Osceria
Nov 13 '18 at 14:02

python-3.x pandas dataframe pandas-groupby


2 Answers

You can get what you want with the following code (assuming that df has an index named id):

def handler(df):
    for col in ['field1', 'field2', 'field3']:
        if df.loc[:, col].nunique() > 1:
            # report the first column with more than one distinct value
            return 'error in {} for id {}'.format(col, df.index[0])
    # no checked column had more than one distinct value
    return 'pass'

result = df.groupby(level=0).apply(handler)
result = df.reset_index().merge(result.to_frame().reset_index(), on='id')

result is:

   id name field1          field2          field3                         0
0   1   AA    Yes        Consumer  Not Applicable                      pass
1   1   BB    Yes        Consumer  Not Applicable                      pass
2   2   CC    Yes        Consumer  Not Applicable  error in field1 for id 2
3   2   DD    Yes  Not Applicable  Not Applicable  error in field1 for id 2
4   2   EE     No  Not Applicable        Modified  error in field1 for id 2
5   3   FF    Yes  Not Applicable      Applicable  error in field3 for id 3
6   3   GG    Yes  Not Applicable  Not Applicable  error in field3 for id 3
7   3   HH    Yes  Not Applicable  Not Applicable  error in field3 for id 3

EDIT: a small change to handler so that it reports every mismatching column instead of only the first one:

def handler(df):
    cols = list()
    for col in ['field1', 'field2', 'field3']:
        if df.loc[:, col].nunique() > 1:
            cols.append(col)
    if cols:
        return 'error in {} for id {}'.format(', '.join(cols), df.index[0])
    else:
        return 'pass'
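For reference, here is a minimal end-to-end sketch of the edited handler applied to the sample data from the question. It assumes that id has been moved into the index (as stated at the top of this answer) and that the Base column is left out. The names FIELDS, status and the output column name Error are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (without the Base column);
# 'id' is set as the index, which is the assumption this answer makes.
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3, 3],
    'name': ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH'],
    'field1': ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes'],
    'field2': ['Consumer', 'Consumer', 'Consumer', 'Not Applicable',
               'Not Applicable', 'Not Applicable', 'Not Applicable',
               'Not Applicable'],
    'field3': ['Not Applicable', 'Not Applicable', 'Not Applicable',
               'Not Applicable', 'Modified', 'Applicable',
               'Not Applicable', 'Not Applicable'],
}).set_index('id')

FIELDS = ['field1', 'field2', 'field3']  # columns to validate (hypothetical name)


def handler(group):
    # Collect every checked column that has more than one distinct value.
    cols = [col for col in FIELDS if group[col].nunique() > 1]
    if cols:
        return 'error in {} for id {}'.format(', '.join(cols), group.index[0])
    return 'pass'


# One message per id, named 'Error' so the merged column is not called 0.
status = df.groupby(level=0).apply(handler).rename('Error')

# Attach the per-group message to every row of that group.
result = df.reset_index().merge(status.reset_index(), on='id')
print(result)
```

Naming the Series before the merge avoids the column labelled 0 that appears in the output above.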

edited Nov 14 '18 at 6:19

answered Nov 13 '18 at 14:52

Poolka

1,5861211

Hi Poolka, this code almost gives the expected result. However, the error column does not show when the data mismatches on more than one field. For id 2 it should read: field1, field2, field3 mismatch for ID: 2. Any thoughts?

– Osceria
Nov 13 '18 at 16:28

The data comparison always happens on the first field of the list and other fields are skipped.

– Osceria
Nov 14 '18 at 0:52

@Osceria The code in the answer is the basis that works and performs something pretty close to what you want. Feel free to modify it (column names, handler, and so on) to meet your expectations. About the issue in the comment - check the EDIT addition.

– Poolka
Nov 14 '18 at 6:25

It works perfectly for my requirements after making slight changes. I just posted another post with a similar question but with an extra check. stackoverflow.com/questions/53295685/…

– Osceria
Nov 14 '18 at 8:39


You could group by id and then aggregate each column by counting its unique values per group; there is a mismatch wherever that count is greater than 1:

df[df.columns.drop('name')].groupby('id').agg(lambda x: len(x.unique()))>1

This gives the following output, from which you could construct your string:

    field1  field2  field3
id
1   False   False   False
2   True    True    True
3   False   False   True
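As a rough sketch of that last step, the boolean table can be turned into one message per id and mapped back onto the rows. This assumes the sample data from the question without the Base column, with id as an ordinary column. The helper describe and the names check_cols, mismatch and errors are illustrative, and the message wording follows the expected output in the question. Because the columns to check are derived from df.columns, nothing is hardcoded apart from the key and label columns.

```python
import pandas as pd

# Sample data copied from the question (without the Base column).
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3, 3],
    'name': ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH'],
    'field1': ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes'],
    'field2': ['Consumer', 'Consumer', 'Consumer', 'Not Applicable',
               'Not Applicable', 'Not Applicable', 'Not Applicable',
               'Not Applicable'],
    'field3': ['Not Applicable', 'Not Applicable', 'Not Applicable',
               'Not Applicable', 'Modified', 'Applicable',
               'Not Applicable', 'Not Applicable'],
})

# Validate every column except the grouping key and the row label.
check_cols = list(df.columns.drop(['id', 'name']))

# True where a column has more than one distinct value within an id.
mismatch = df.groupby('id')[check_cols].nunique() > 1


def describe(row):
    # row is one line of the boolean table; row.name is the id.
    bad = [col for col, flag in row.items() if flag]
    if bad:
        return '{} mismatch for ID: {}'.format(', '.join(bad), row.name)
    return 'Pass'


# One message per id, then broadcast it to every row of that id.
errors = mismatch.apply(describe, axis=1)
df['Error'] = df['id'].map(errors)
print(df)
```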

edited Nov 13 '18 at 17:52

answered Nov 13 '18 at 14:53

Franco Piccolo

1,591714

This helps. What if the column names to be validated differ between iterations? I run this piece in a for loop that iterates over different dataframes (df1, df2, df3) whose columns are different, so I don't want to hardcode field names that keep changing from one dataframe to the next.

– Osceria
Nov 13 '18 at 15:54

See the edit: you can pass the column list with the 'name' column dropped, and it will then work with any number of other fields.

– Franco Piccolo
Nov 13 '18 at 17:52

Okay. In case I add another column(Base) to the dataframe(edited). For every group based on ID, there will be only one 'Y' and other rows in the group will be 'N'. Here, the values of the rows where Base='Y' should be the reference and other rows with Base 'N' should be validated against it. The distinct columns on each row should be noted as an error column. Any thoughts?

– Osceria
Nov 14 '18 at 0:49

That completely changes the scope of the question and the solution. I would suggest writing another question with a different input and output for clarification.

– Franco Piccolo
Nov 14 '18 at 6:53

Okay. Posted here stackoverflow.com/questions/53295685/…

– Osceria
Nov 14 '18 at 8:40
