pandas value_counts to output file

























objective



I am trying to automatically generate an EDA report for each column in my dataframe, starting with value_counts().



problem



The problem is that my function doesn't return anything. So while it prints to the console, that same output never reaches my text file. I was using this just to generate syntax and then run it line by line in my IDE to look at all the variables, but that is not a very programmatic solution.



notes



Once this is working, I am going to add some syntax for graphs and the output of df.describe(), but for now I can't even get the basics of what I want.



Output doesn't have to be .txt, but I thought that would be easiest while getting this to work.



I tried



    import pandas as pd

    def EDA(df, name):
        df.name = name  # name == string version of df
        print('#', df.name)
        for val in df.columns:
            print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
            print(df[val].value_counts(dropna=False))

    path = 'Data/nameofmyfile.csv'

    # name of df
    activeWD = pd.read_csv(path, skiprows=6)

    f = open('Output/outtext.txt', 'a+', encoding='utf-8')
    f.write(EDA(activeWD, 'activeWD'))
    f.close()


also tried




  1. various versions of replacing print with return

    def EDA(df, name):
        df.name = name  # name == string version of df
        print('#', df.name)
        for val in df.columns:
            print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
            return(df[val].value_counts(dropna=False))



  2. running file from anaconda prompt



    Python SyntaxnewdataEDA.5.py >> Output.outtext.txt



which results in the following codec error:



    (base) C:\Users\auracoll\Analytic Projects\IDL Attrition>Python Syntaxnewdatanewlife11.5.py >> Output.outtext.txt
    sys:1: DtypeWarning: Columns (3,16,39,40,41,42,49) have mixed types. Specify dtype option on import or set low_memory=False.
    Traceback (most recent call last):
      File "Syntaxnewdatanewlife11.5.py", line 46, in <module>
        EDA(activeWD, name='activeWD')
      File "Syntaxnewdatanewlife11.5.py", line 38, in EDA
        print(df[col].value_counts(dropna=False))
      File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 382-385: character maps to <undefined>


I tried encoding='utf-8' and encoding='ISO-8859-1', neither of which resolves this problem.




  3. I have tried to save intermediate variables, but they come back as NoneType.

    testvar = for val in df.columns:
        df[val].value_counts(dropna=False)

When I do this, testvar is a NoneType object of the builtins module.





























  • Something like df['column'].value_counts().to_frame().reset_index().to_csv(...)? It's a bit long but should work.

    – Alex
    Nov 14 '18 at 16:55












  • Would that work for multiple columns? In the past I've used apply to create a single df of value_counts by column, but the output isn't very tidy: each variable gets a new column and each set of values gets its own rows, so it creates a diagonal pattern that is hard to read.

    – Andrew
    Nov 14 '18 at 16:58











  • Same problem. If I try to save this as a variable, `for val in df.columns: df[val].value_counts().to_frame().reset_index()`, it is saved as NoneType. That is the same problem the code above has.

    – Andrew
    Nov 14 '18 at 17:04











  • I would consider adding intermediate steps to make sure your outputs are working as you think. For one thing, though you said you've tried it with return, your current code is trying to write nothing to the file, because your EDA(activeWD, 'activeWD') has no return, and will therefore return None. I would say to change those prints to a return, then assign a variable like x=EDA(activeWD, 'activeWD'), print that, and if it looks right, try to write it to file

    – G. Anderson
    Nov 14 '18 at 17:11






  • It may be helpful to provide exactly what returns you've tried in your question, since you stated that you know that's the problem

    – G. Anderson
    Nov 14 '18 at 17:23
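
To make the multi-column idea from the comments above concrete, here is a minimal sketch that stacks the value counts of every column into one long table (one row per column/value pair) instead of the diagonal layout Andrew describes, and writes it to CSV. The helper name value_counts_long, the toy data, and the output file name are assumptions made for illustration; none of them come from the original post.

```python
import os
import tempfile

import pandas as pd


def value_counts_long(df):
    """Stack per-column value counts into one (column, value, count) table."""
    parts = []
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        parts.append(pd.DataFrame({
            "column": col,                 # scalar, broadcast to every row
            "value": list(counts.index),   # the distinct values (NaN included)
            "count": counts.to_numpy(),    # how often each value occurs
        }))
    return pd.concat(parts, ignore_index=True)


# Toy data, purely illustrative.
toy = pd.DataFrame({"Pet": ["Cat", "Dog", "Dog"], "Color": ["Blue", "Blue", None]})
report = value_counts_long(toy)

# Hypothetical output location; UTF-8 keeps unusual characters intact.
out_path = os.path.join(tempfile.gettempdir(), "value_counts_report.csv")
report.to_csv(out_path, index=False, encoding="utf-8")
```

Because the loop appends each result to a list, nothing is lost between iterations, which is what goes wrong when a bare for loop is assigned to a variable.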



















































python pandas output














asked Nov 14 '18 at 16:51 by Andrew, edited Nov 14 '18 at 18:13



















1 Answer














Here is a command-line solution, although you can certainly write to the file in pure Python, as your commenters suggested. I'm posting this because you mentioned that you already tried your command prompt and weren't able to get your output into a file. Edit your script, filename.py, as follows:



    import pandas as pd

    # Small example frame standing in for the real data
    df = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Dog', 'Dog', 'Fish'],
                       'Color': ['Blue', 'Blue', 'Red', 'Orange', 'Orange'],
                       'Name': ['Henry', 'Bob', 'Mary', 'Doggo', 'Henry']})

    def EDA(df, name):
        df.name = name
        print('#{}\n'.format(df.name))
        # Print the value counts of every column, NaN included
        for col in df.columns:
            print('#{}\n'.format(col))
            print(df[col].value_counts(dropna=False))
            print('\n')

    if __name__ == '__main__':
        EDA(df, name='test')


Then you should be able to run: python filename.py > output.txt in your terminal.
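
For comparison, the pure-Python route mentioned at the top of this answer could look roughly like the sketch below: the function builds and returns the report as a string, and the caller writes it to a file opened with an explicit UTF-8 encoding. The names eda_report and out_path and the toy data are assumptions for illustration, not part of the original code.

```python
import os
import tempfile

import pandas as pd


def eda_report(df, name):
    """Return the value_counts report for every column as one string."""
    lines = ["# {}".format(name), ""]
    for col in df.columns:
        lines.append("## {}".format(col))
        # to_string() gives the same text that print() would show
        lines.append(df[col].value_counts(dropna=False).to_string())
        lines.append("")
    return "\n".join(lines)


# Toy data, purely illustrative; one name uses a character outside cp1252.
toy = pd.DataFrame({"Pet": ["Cat", "Dog", "Dog"], "Name": ["Łukasz", "Bob", "Bob"]})
text = eda_report(toy, "toy")

# Hypothetical output location; the explicit encoding avoids the console code page.
out_path = os.path.join(tempfile.gettempdir(), "outtext.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(text)
```

Since the function returns a str, f.write receives real text instead of None, and the with block closes the file even if something fails midway.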



EDIT



For posterity's sake: OP's issue was not with how they were writing to the file. Their CSV contained uncommon characters and, as the traceback in the question shows, print failed when it tried to encode them with the Windows default cp1252 codec. The solution was to set Python's I/O encoding to UTF-8 before running the script, as shown here: python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>



    chcp 65001
    set PYTHONIOENCODING=utf-8
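
As a rough illustration of why those two commands help, the sketch below shows a string that the cp1252 codec cannot encode but UTF-8 can, and an in-script alternative that switches stdout to UTF-8. The sample string is an assumption chosen for the demo, and sys.stdout.reconfigure is only available on Python 3.7 and later, so treat this as a sketch rather than the fix the OP actually used.

```python
import sys

# Sample text with a right arrow (U+2192), which cp1252 has no byte for.
text = "a \u2192 b"

try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    # Same error class as in the traceback from the question
    cp1252_ok = False

# UTF-8 can represent every Unicode character.
utf8_bytes = text.encode("utf-8")

# Python 3.7+: make print() emit UTF-8 regardless of the console code page.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
print(text)
```

Setting PYTHONIOENCODING before launching Python has a similar effect without touching the script, which is convenient when the output is redirected with > or >>.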






























  • This isn't working for me due to codec error. I should have specified earlier that when I tried this type of solution I receive a codec error. the question has been updated.

    – Andrew
    Nov 14 '18 at 18:14











  • @Andrew, that error is due to how you're importing your pandas.DataFrame, not how you're writing to file. There are mixed dtypes within your columns. Please try reading in your pandas.DataFrame with an encoding argument as follows: activeWD = pd.read_csv(path, skiprows=6, encoding='ISO-8859-1').

    – Dascienz
    Nov 14 '18 at 18:36







  • Test the code yourself on a different dataset to see if it works; that way you'll be able to narrow it down to the dataset you're trying to work with. It works on the example pandas.DataFrame that I wrote out in my answer, but you should be wary of mixed data in the set you're currently working on. Maybe try encoding='utf-8' for dealing with unicode characters. Also, it's good practice to look at the column values that your code is failing on to better understand the issue.

    – Dascienz
    Nov 14 '18 at 19:07







  • Maybe post a toy dataframe which contains the characters your program is failing on?

    – Dascienz
    Nov 15 '18 at 13:38






  • Very happy to hear you solved this issue. Good luck on your project!

    – Dascienz
    Nov 15 '18 at 15:35
















































































edited Nov 15 '18 at 15:53

























answered Nov 14 '18 at 17:32









Dascienz













  • This isn't working for me because of a codec error. I should have specified earlier that when I tried this type of solution I received a codec error. The question has been updated.

    – Andrew
    Nov 14 '18 at 18:14











  • @Andrew, that error is due to how you're importing your pandas.DataFrame, not how you're writing to file. There are mixed dtypes within your columns. Please try reading in your pandas.DataFrame with an encoding argument as follows: activeWD = pd.read_csv(path, skiprows=6, encoding='ISO-8859-1').

    – Dascienz
    Nov 14 '18 at 18:36







  • Test the code yourself on a different dataset to see if it works; that way you'll be able to narrow the problem down to the dataset you're trying to work with. It works on the example pandas.DataFrame that I wrote out in my answer, but you should be wary of mixed data in the set you're currently working on. Maybe try encoding='utf-8' for dealing with unicode characters. Also, it's good practice to look at the column values that your code is failing on to better understand the issue.

    – Dascienz
    Nov 14 '18 at 19:07







  • Maybe post a toy dataframe which contains the characters your program is failing on?

    – Dascienz
    Nov 15 '18 at 13:38






  • Very happy to hear you solved this issue. Good luck on your project!

    – Dascienz
    Nov 15 '18 at 15:35
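Following up on the advice in the comments to look at the column values the code fails on, here is a minimal sketch for listing them. The function name non_ascii_values and the toy dataframe are invented for this example. It treats any character above code point 127 as suspect, which is an assumption about what a legacy code page might choke on.

```python
import pandas as pd


def non_ascii_values(df):
    """Map each column to its distinct string values containing non-ASCII characters."""
    found = {}
    for col in df.columns:
        # Only string values can carry problematic characters
        values = [v for v in df[col].dropna().unique() if isinstance(v, str)]
        hits = [v for v in values if any(ord(ch) > 127 for ch in v)]
        if hits:
            found[col] = hits
    return found


toy = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Fish'],
                    'Note': ['fine', 'en dash \u2013 here', 'fine']})

for col, hits in non_ascii_values(toy).items():
    # ascii() escapes the characters so this print is safe on any console
    print(col, [ascii(v) for v in hits])
```

Running this on the real dataframe should point to the columns worth inspecting before deciding on an encoding.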



































