pandas value_counts to output file
objective
I am trying to automatically generate an EDA report for each column in my dataframe, starting with value_counts().
problem
The problem is that my function doesn't return anything. So while it does print to the console, it doesn't write that same output to my text file. I was using this just to generate syntax and then run it line by line in my IDE to look at all the variables, but that is not a very programmatic solution.
notes
Once this is working, I am going to add some syntax for graphs and the output of df.describe(), but for now I can't even get the basics of what I want.
Output doesn't have to be .txt, but I thought that would be easiest while getting this to work.
I tried

    import pandas as pd

    def EDA(df, name):
        df.name = name  # name == string version of df
        print('#', df.name)
        for val in df.columns:
            print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
            print(df[val].value_counts(dropna=False))

    path = 'Data/nameofmyfile.csv'
    # name of df
    activeWD = pd.read_csv(path, skiprows=6)
    f = open('Output/outtext.txt', 'a+', encoding='utf-8')
    f.write(EDA(activeWD, 'activeWD'))
    f.close()

also tried
various versions of replacing print with return

    def EDA(df, name):
        df.name = name  # name == string version of df
        print('#', df.name)
        for val in df.columns:
            print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
            return(df[val].value_counts(dropna=False))

running file from anaconda prompt

    Python SyntaxnewdataEDA.5.py >> Output.outtext.txt

which results in the following codec error:

    (base) C:\Users\auracoll\Analytic Projects\IDL Attrition>Python Syntaxnewdatanewlife11.5.py >> Output.outtext.txt
    sys:1: DtypeWarning: Columns (3,16,39,40,41,42,49) have mixed types. Specify dtype option on import or set low_memory=False.
    Traceback (most recent call last):
      File "Syntaxnewdatanewlife11.5.py", line 46, in <module>
        EDA(activeWD, name='activeWD')
      File "Syntaxnewdatanewlife11.5.py", line 38, in EDA
        print(df[col].value_counts(dropna=False))
      File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 382-385: character maps to <undefined>

I tried encoding='utf-8' and encoding='ISO-8859-1', neither of which resolves this problem.
I have also tried to save intermediary variables, which come back as NoneType.

    testvar = for val in df.columns:
        df[val].value_counts(dropna=False)

When I do this, testvar is a NoneType object of the builtins module.
python pandas output
something like df['column'].value_counts().to_frame().reset_index().to_csv(...)? It's a bit long but should work.
– Alex
Nov 14 '18 at 16:55
Would that work for multiple columns? In the past I've used apply to create a single df of value_counts by column, but the output isn't very tidy: each var gets a new column and each set of values gets unique rows, so it creates a diagonal pattern that is hard to read.
– Andrew
Nov 14 '18 at 16:58
Same problem. If I try to save this as a var, for val in df.columns: df[val].value_counts().to_frame().reset_index(), it saves as NoneType. That is the same problem the above code has.
– Andrew
Nov 14 '18 at 17:04
I would consider adding intermediate steps to make sure your outputs are working as you think. For one thing, though you said you've tried it with return, your current code is trying to write nothing to the file, because your EDA(activeWD, 'activeWD') has no return, and will therefore return None. I would say to change those prints to a return, then assign a variable like x=EDA(activeWD, 'activeWD'), print that, and if it looks right, try to write it to file.
– G. Anderson
Nov 14 '18 at 17:11
It may be helpful to provide exactly what returns you've tried in your question, since you stated that you know that's the problem.
– G. Anderson
Nov 14 '18 at 17:23
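The multi-column case raised in these comments could be handled by stacking the per-column counts into one long table, which avoids the diagonal pattern described above. This is only a minimal sketch and not something proposed in the thread: the helper name value_counts_long and the output column names column, value and count are assumptions made for illustration.

```python
import pandas as pd


def value_counts_long(df):
    """Stack per-column value counts into one tidy table.

    Hypothetical helper: one output row per (column, value) pair.
    """
    frames = []
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        frames.append(pd.DataFrame({
            'column': col,                          # scalar, broadcast to every row
            'value': counts.index.astype(object),   # the distinct values
            'count': counts.to_numpy(),             # how often each value occurs
        }))
    # Raises ValueError if df has no columns at all.
    return pd.concat(frames, ignore_index=True)


if __name__ == '__main__':
    toy = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Dog'],
                        'Color': ['Blue', 'Blue', 'Red']})
    print(value_counts_long(toy))
```

The resulting frame can then be saved in one call, for example with to_csv(path, index=False, encoding='utf-8'), where path is whatever output location you choose.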
edited Nov 14 '18 at 18:13
asked Nov 14 '18 at 16:51
Andrew
1 Answer
Command-line solution, although you can certainly print to file using pure python as your commenters suggested. I'm posting this because you mentioned you already tried using your command prompt and weren't able to get your outputs to print to file. So, edit your script, filename.py as follows...

    import pandas as pd

    df = pd.DataFrame({'Pet': ['Cat','Dog','Dog','Dog','Fish'],
                       'Color': ['Blue','Blue','Red','Orange','Orange'],
                       'Name': ['Henry','Bob','Mary','Doggo','Henry']})

    def EDA(df, name):
        df.name = name
        print('#{}\n'.format(df.name))
        for col in df.columns:
            print('#{}\n'.format(col))
            print(df[col].value_counts(dropna=False))
            print('\n')

    if __name__=='__main__':
        EDA(df, name='test')

Then you should be able to run: python filename.py > output.txt in your terminal.
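For the pure-Python route mentioned at the top of this answer, one possible sketch is to build the report as a string and write it with an explicit encoding, so the console's codec is never involved. The function names build_report and write_report and the temporary output path are assumptions for illustration, not part of the original code.

```python
import os
import tempfile

import pandas as pd


def build_report(df, name):
    """Return the per-column value_counts report as one string."""
    parts = ['# {}'.format(name)]
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        parts.append('# {}\n{}'.format(col, counts.to_string()))
    return '\n\n'.join(parts) + '\n'


def write_report(df, name, path):
    """Write the report to path as UTF-8, independent of the console encoding."""
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(build_report(df, name))


if __name__ == '__main__':
    df = pd.DataFrame({'Pet': ['Cat', 'Dog', 'Dog', 'Dog', 'Fish'],
                       'Color': ['Blue', 'Blue', 'Red', 'Orange', 'Orange']})
    # Placeholder location; swap in your own output path.
    out_path = os.path.join(tempfile.gettempdir(), 'eda_report.txt')
    write_report(df, 'test', out_path)
    print('report written to', out_path)
```

Because build_report returns a string rather than printing, the result can be inspected in a variable first and only then written to file, which is the check suggested in the comments on the question.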
EDIT
For posterity's sake, OP's issue was not with how they were printing to file. Their CSV contained uncommon characters, and, per the traceback in the question, the failure occurred when print tried to encode them with the cp1252 codec. The solution involved setting Python's I/O encoding to UTF-8 before running the code, as shown here: python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>

    chcp 65001
    set PYTHONIOENCODING=utf-8

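To see why the output encoding matters, the following sketch reproduces the failure without pandas by pushing text through a stream that uses cp1252, then through one that uses UTF-8. The render helper and the sample string are assumptions for illustration. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') is an in-script alternative to setting PYTHONIOENCODING.

```python
import io
import sys


def render(text, encoding):
    """Print text into an in-memory stream that uses the given encoding.

    Mimics what happens when stdout is redirected to a file.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline='\n')
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


if __name__ == '__main__':
    sample = 'caf\u00e9 \u4e2d\u6587'  # the last two characters are not in cp1252
    try:
        render(sample, 'cp1252')
    except UnicodeEncodeError as exc:
        print('cp1252 failed:', exc.reason)
    print('utf-8 bytes:', render(sample, 'utf-8'))

    # In-script alternative to the environment variable (Python 3.7+).
    if hasattr(sys.stdout, 'reconfigure'):
        sys.stdout.reconfigure(encoding='utf-8')
```

This also suggests why changing the encoding argument of pd.read_csv did not help in the question: that argument controls how the file is decoded on the way in, while the traceback points at encoding on the way out.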
This isn't working for me due to a codec error. I should have specified earlier that when I tried this type of solution I received a codec error. The question has been updated.
– Andrew
Nov 14 '18 at 18:14
@Andrew, that error is due to how you're importing your pandas.DataFrame, not how you're writing to file. There are mixed dtypes within your columns. Please try reading in your pandas.DataFrame with an encoding argument as follows: activeWD = pd.read_csv(path, skiprows=6, encoding='ISO-8859-1').
– Dascienz
Nov 14 '18 at 18:36
Test the code yourself on a different dataset to see if it works; that way you'll be able to narrow it down to the dataset you're trying to work with. It works on the example pandas.DataFrame that I wrote out in my answer, but you should be wary of mixed data in the set you're currently working on. Maybe try encoding='utf-8' for dealing with unicode characters. Also, it's good practice to look at the column values that your code is failing on to better understand the issue.
– Dascienz
Nov 14 '18 at 19:07
Maybe post a toy dataframe which contains the characters your program is failing on?
– Dascienz
Nov 15 '18 at 13:38
Very happy to hear you solved this issue. Good luck on your project!
– Dascienz
Nov 15 '18 at 15:35
edited Nov 15 '18 at 15:53
answered Nov 14 '18 at 17:32
Dascienz
This isn't working for me due to codec error. I should have specified earlier that when I tried this type of solution I receive a codec error. the question has been updated.
– Andrew
Nov 14 '18 at 18:14
@Andrew, that error is due to how you're importing yourpandas.DataFrame, not how you're writing to file. There are mixeddtypeswithin your columns. Please try reading in yourpandas.DataFramewith an encoding argument as follows:activeWD = pd.read_csv(path, skiprows=6, encoding='ISO-8859-1').
– Dascienz
Nov 14 '18 at 18:36
1
Test the code yourself on a different dataset to see if it works, that way you'll be able to narrow it down to the dataset you're trying to work with. It works on the examplepandas.DataFramethat I wrote out in my answer, but you should be wary of mixed data in the set you're currently working on. Maybe tryencoding='utf-8'for dealing with unicode characters. ALSO, it's good practice to look at the column values that your code is failing on to better understand the issue.
– Dascienz
Nov 14 '18 at 19:07
1
Maybe post a toy dataframe which contains the characters your program is failing on?
– Dascienz
Nov 15 '18 at 13:38
1
Very happy to hear you solved this issue. Good luck on your project!
– Dascienz
Nov 15 '18 at 15:35
|
show 10 more comments
This isn't working for me due to codec error. I should have specified earlier that when I tried this type of solution I receive a codec error. the question has been updated.
– Andrew
Nov 14 '18 at 18:14
@Andrew, that error is due to how you're importing yourpandas.DataFrame, not how you're writing to file. There are mixeddtypeswithin your columns. Please try reading in yourpandas.DataFramewith an encoding argument as follows:activeWD = pd.read_csv(path, skiprows=6, encoding='ISO-8859-1').
– Dascienz
Nov 14 '18 at 18:36
1
Test the code yourself on a different dataset to see if it works, that way you'll be able to narrow it down to the dataset you're trying to work with. It works on the examplepandas.DataFramethat I wrote out in my answer, but you should be wary of mixed data in the set you're currently working on. Maybe tryencoding='utf-8'for dealing with unicode characters. ALSO, it's good practice to look at the column values that your code is failing on to better understand the issue.
– Dascienz
Nov 14 '18 at 19:07
1
Maybe post a toy dataframe which contains the characters your program is failing on?
– Dascienz
Nov 15 '18 at 13:38
1
Very happy to hear you solved this issue. Good luck on your project!
– Dascienz
Nov 15 '18 at 15:35
1
Something like df['column'].value_counts().to_frame().reset_index().to_csv(...)? It's a bit long but should work.
– Alex
Nov 14 '18 at 16:55
Would that work for multiple columns? In the past I've used apply to create a single df of value_counts by column, but the output isn't very tidy: each variable gets a new column and each set of values gets its own rows, so it creates a diagonal pattern that is hard to read.
– Andrew
Nov 14 '18 at 16:58
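One way to avoid the diagonal pattern is to stack the counts in long format, one row per (column, value) pair. This is only a sketch: the helper name value_counts_long and the toy frame are hypothetical, not something from the thread.

```python
import pandas as pd


def value_counts_long(df):
    """Stack per-column value counts into one table: column, value, count."""
    frames = []
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        frames.append(
            pd.DataFrame(
                {
                    "column": col,  # scalar, repeated for every row
                    "value": counts.index.to_numpy(),
                    "count": counts.to_numpy(),
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


# Hypothetical toy data standing in for the real dataframe
toy = pd.DataFrame({"dept": ["A", "B", "A"], "level": [1, 2, 2]})
tidy = value_counts_long(toy)
print(tidy)

# to_csv() with no path returns the CSV text; pass a path to write a file instead
csv_text = tidy.to_csv(index=False)
```

Because every variable shares the same three columns, the result stays readable no matter how many columns the original dataframe has.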
Same problem. If I try to save this as a variable, `for val in df.columns: df[val].value_counts().to_frame().reset_index()`, it saves as NoneType. That is the same problem the above code has.
– Andrew
Nov 14 '18 at 17:04
I would consider adding intermediate steps to make sure your outputs are working as you think. For one thing, though you said you've tried it with return, your current code is trying to write nothing to the file, because your EDA(activeWD, 'activeWD') has no return, and will therefore return None. I would say to change those prints to a return, then assign a variable like x=EDA(activeWD, 'activeWD'), print that, and if it looks right, try to write it to file.
– G. Anderson
Nov 14 '18 at 17:11
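The point about returning instead of printing can be sketched as follows. A return inside the loop would exit after the first column, so this version collects the text for every column and returns it once at the end. The function name eda_report, the toy frame, and the output path outtext.txt are hypothetical.

```python
import pandas as pd


def eda_report(df, name):
    """Build the value_counts report as one string instead of printing it."""
    parts = [f"# {name}"]
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        # to_string() renders the whole Series as plain text
        parts.append(f"# {col}\n{counts.to_string()}\n")
    # A single return after the loop, so every column is included
    return "\n".join(parts)


# Hypothetical toy data standing in for activeWD
toy = pd.DataFrame({"dept": ["A", "B", "A", None], "level": [1, 2, 2, 2]})

report = eda_report(toy, "toy")
print(report)  # check that it looks right first

# Then write the same string to file, with an explicit encoding
with open("outtext.txt", "a", encoding="utf-8") as f:
    f.write(report)
```

Since f.write() needs a string, returning None (the default when a function only prints) is what makes the original f.write(EDA(...)) call fail.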
1
It may be helpful to provide exactly what returns you've tried in your question, since you stated that you know that's the problem
– G. Anderson
Nov 14 '18 at 17:23