pandas value_counts to output file

objective

I am trying to automatically generate an EDA report for each column in my dataframe, starting with value_counts().

problem

The problem is that my function doesn't return anything. It prints to the console, but that same output never reaches my text file. Until now I was using this just to generate syntax and then run it line by line in my IDE to look at all the variables, but that is not a very programmatic solution.

notes

Once this is working, I am going to add some syntax for graphs and the output of df.describe(), but for now I can't even get the basics of what I want.

Output doesn't have to be .txt, but I thought that would be easiest while getting this to work.

I tried

import pandas as pd

def EDA(df, name):

    df.name = name # name == string version of df
    print('#', df.name)
    for val in df.columns:
        print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
        print(df[val].value_counts(dropna=False))

path = 'Data/nameofmyfile.csv'

# name of df
activeWD = pd.read_csv(path, skiprows=6)

f = open('Output/outtext.txt', 'a+', encoding='utf-8')
f.write(EDA(activeWD, 'activeWD'))
f.close()

also tried

various versions of replacing print with return

def EDA(df, name):

    df.name = name # name == string version of df
    print('#', df.name)
    for val in df.columns:
        print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
        return(df[val].value_counts(dropna=False))

running file from anaconda prompt

Python SyntaxnewdataEDA.5.py >> Output.outtext.txt

which results in the following codec error:

(base) C:UsersauracollAnalytic ProjectsIDL Attrition>Python Syntaxnewdatanewlife11.5.py >> Output.outtext.txt
sys:1: DtypeWarning: Columns (3,16,39,40,41,42,49) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "Syntaxnewdatanewlife11.5.py", line 46, in <module>
    EDA(activeWD, name='activeWD')
  File "Syntaxnewdatanewlife11.5.py", line 38, in EDA
    print(df[col].value_counts(dropna=False))
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 382-385: character maps to <undefined>

I tried encoding='utf-8' and encoding='ISO-8859-1', neither of which resolves this problem.

I have also tried to save intermediate variables, but they come back as NoneType.

testvar = for val in df.columns: df[val].value_counts(dropna=False)

When I do this, testvar is a NoneType object of the builtins module.

asked Nov 14 '18 at 16:51 by Andrew (edited Nov 14 '18 at 18:13)


Something like df['column'].value_counts().to_frame().reset_index().to_csv(...)? It's a bit long but should work.

– Alex
Nov 14 '18 at 16:55

Would that work for multiple columns? In the past I've used apply to create a single df of value_counts by column, but the output isn't very tidy: each variable gets a new column and each set of values gets unique rows, so it creates a diagonal pattern that is hard to read.

– Andrew
Nov 14 '18 at 16:58

Same problem. If I try to save this as a variable, `for val in df.columns: df[val].value_counts().to_frame().reset_index()`, it saves as NoneType. That is the same problem the code above has.

– Andrew
Nov 14 '18 at 17:04
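A note on the two comments above: a for statement is not an expression, so it cannot be assigned to a variable, whereas a comprehension can. The sketch below is one possible way to collect the counts for every column and stack them into a long, tidy table rather than the diagonal layout described. The toy frame and the column labels 'column', 'value', and 'count' are assumptions made for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV.
toy = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Dog', 'Dog', 'Fish'],
                    'Color': ['Blue', 'Blue', 'Red', 'Orange', 'Orange']})

# A dict comprehension is an expression, so its result can be saved.
counts_by_col = {col: toy[col].value_counts(dropna=False)
                 for col in toy.columns}

frames = []
for col, counts in counts_by_col.items():
    # Turn each Series into a two-column frame with fixed labels,
    # then record which source column the counts came from.
    frame = counts.rename_axis('value').reset_index(name='count')
    frame.insert(0, 'column', col)
    frames.append(frame)

# One row per (column, value) pair.
tidy = pd.concat(frames, ignore_index=True)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = tidy.to_csv(index=False)
print(tidy)
```

Passing a path to to_csv instead (together with encoding='utf-8') would write the same table to disk.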

I would consider adding intermediate steps to make sure your outputs are working as you think. For one thing, though you said you've tried it with return, your current code is trying to write nothing to the file, because your EDA(activeWD, 'activeWD') has no return, and will therefore return None. I would say to change those prints to a return, then assign a variable like x=EDA(activeWD, 'activeWD'), print that, and if it looks right, try to write it to file

– G. Anderson
Nov 14 '18 at 17:11
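The suggestion above can be sketched as follows: build the report as a string, return it, inspect it, and only then write it to a file. This is a minimal illustration rather than the asker's final code; the function name eda_report, the toy frame, and the temporary output path are all hypothetical.

```python
import os
import tempfile

import pandas as pd


def eda_report(df, name):
    """Return the value counts of every column as one string."""
    parts = ['# {}'.format(name)]
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        # to_string() gives the same text that print() would show.
        parts.append('# {}\n{}\n'.format(col, counts.to_string()))
    # Return once, after the loop; a return inside the loop would
    # stop after the first column.
    return '\n'.join(parts)


# Hypothetical toy data standing in for the real CSV.
toy = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Dog', 'Fish'],
                    'Color': ['Blue', 'Blue', 'Red', 'Red']})

report = eda_report(toy, 'toy')
print(report)  # check that it looks right first

# Then write the returned string, with an explicit encoding so the
# file does not depend on the platform default.
out_path = os.path.join(tempfile.mkdtemp(), 'outtext.txt')
with open(out_path, 'w', encoding='utf-8') as f:
    f.write(report)
```

Because the function returns a str, f.write() receives real text instead of None.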


It may be helpful to provide exactly what returns you've tried in your question, since you stated that you know that's the problem

– G. Anderson
Nov 14 '18 at 17:23


Tags: python, pandas, output


1 Answer

Command-line solution, although you can certainly print to file using pure python as your commenters suggested. I'm posting this because you mentioned you already tried using your command prompt and weren't able to get your outputs to print to file. So, edit your script, filename.py as follows...

import pandas as pd

# Small example frame standing in for the real data
df = pd.DataFrame({'Pet':['Cat','Dog','Dog','Dog','Fish'],
                   'Color':['Blue','Blue','Red','Orange','Orange'],
                   'Name':['Henry','Bob','Mary','Doggo','Henry']})

def EDA(df, name):
    df.name = name
    print('#{}\n'.format(df.name))
    for col in df.columns:
        # Header for the column, then its value counts (NaN included)
        print('#{}\n'.format(col))
        print(df[col].value_counts(dropna=False))
        print('\n')

if __name__=='__main__':
    EDA(df, name='test')

Then you should be able to run: python filename.py > output.txt in your terminal.
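For the pure-Python route mentioned at the top of this answer, one option is contextlib.redirect_stdout from the standard library, which sends everything print() emits into a file opened with an explicit encoding. The sketch below is an illustration under assumptions: the toy frame and the temporary output path are hypothetical, and the function mirrors the EDA above without setting df.name.

```python
import contextlib
import os
import tempfile

import pandas as pd


def eda(df, name):
    """Print the value counts of every column."""
    print('#{}\n'.format(name))
    for col in df.columns:
        print('#{}\n'.format(col))
        print(df[col].value_counts(dropna=False))
        print('\n')


# Hypothetical toy data standing in for the real CSV.
toy = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Dog', 'Dog', 'Fish'],
                    'Color': ['Blue', 'Blue', 'Red', 'Orange', 'Orange']})

out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# Everything printed inside the with-block goes to the file, which is
# opened as UTF-8 regardless of the console's code page.
with open(out_path, 'w', encoding='utf-8') as f:
    with contextlib.redirect_stdout(f):
        eda(toy, 'test')

# Read the file back to confirm what was written.
with open(out_path, encoding='utf-8') as f:
    captured = f.read()
```

Since the file's encoding is chosen in open(), this approach should not depend on the console code page the way shell redirection does.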

EDIT

For posterity's sake: OP's issue was not with how they were printing to file. Their CSV contained uncommon characters, and, as the traceback shows, print failed when the console's default cp1252 codec could not encode them. The solution involved setting Python's I/O encoding to UTF-8 before running the code, as shown here: python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>

chcp 65001
set PYTHONIOENCODING=utf-8
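To see why the two commands above help, the following standard-library sketch reproduces the failure without any CSV. The sample string is made up for illustration; it mixes a character that cp1252 can represent with some it cannot. The in-memory streams stand in for a redirected stdout.

```python
import io

# 'é' exists in cp1252; the arrow and the CJK characters do not.
sample = 'caf\u00e9 \u2192 \u65e5\u672c'


def can_encode(text, codec):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


cp1252_ok = can_encode(sample, 'cp1252')
utf8_ok = can_encode(sample, 'utf-8')
print('cp1252:', cp1252_ok, 'utf-8:', utf8_ok)

# A text stream encoded as UTF-8 accepts the same print() call that a
# cp1252 stream rejects.
raw = io.BytesIO()
utf8_stream = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')
print(sample, file=utf8_stream)
utf8_stream.flush()
written = raw.getvalue()

# The same print() against a cp1252 stream raises the error from the question.
cp1252_stream = io.TextIOWrapper(io.BytesIO(), encoding='cp1252')
try:
    print(sample, file=cp1252_stream)
    cp1252_stream.flush()
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding='utf-8') at the top of the script is another way to get the same effect as PYTHONIOENCODING.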

answered Nov 14 '18 at 17:32 by Dascienz (edited Nov 15 '18 at 15:53)

This isn't working for me due to a codec error. I should have specified earlier that I receive a codec error when I try this type of solution. The question has been updated.

– Andrew
Nov 14 '18 at 18:14

@Andrew, that error is due to how you're importing your pandas.DataFrame, not how you're writing to file. There are mixed dtypes within your columns. Please try reading in your pandas.DataFrame with an encoding argument as follows: activeWD = pd.read_csv(path, skiprows=6, encoding='ISO-8859-1').

– Dascienz
Nov 14 '18 at 18:36

Test the code yourself on a different dataset to see if it works; that way you'll be able to narrow the problem down to the dataset you're trying to work with. It works on the example pandas.DataFrame that I wrote out in my answer, but you should be wary of mixed data in the set you're currently working on. Maybe try encoding='utf-8' for dealing with Unicode characters. Also, it's good practice to look at the column values that your code is failing on to better understand the issue.

– Dascienz
Nov 14 '18 at 19:07


Maybe post a toy dataframe which contains the characters your program is failing on?

– Dascienz
Nov 15 '18 at 13:38


Very happy to hear you solved this issue. Good luck on your project!

– Dascienz
Nov 15 '18 at 15:35

