Turn off BeautifulSoup messy encoding confidence output

I'm using bs4 for my project. It prints messy output with many encoding confidence scores whenever I create a soup instance:

req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")

It works fine, but it produces redundant output. I just want to remove it, but I can't find any information about something like a verbose option.

2018-11-15 10:40:46,286 utf-8 confidence = 0.99
2018-11-15 10:40:46,286 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,287 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,287 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,287 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,287 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,287 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,288 EUC-TW Taiwan confidence = 0.01
2018-11-15 10:40:46,288 windows-1251 Russian confidence = 0.01
2018-11-15 10:40:46,288 KOI8-R Russian confidence = 0.01
2018-11-15 10:40:46,288 ISO-8859-5 Russian confidence = 0.0
2018-11-15 10:40:46,288 MacCyrillic Russian confidence = 0.0
2018-11-15 10:40:46,288 IBM866 Russian confidence = 0.0
2018-11-15 10:40:46,289 IBM855 Russian confidence = 0.01
2018-11-15 10:40:46,289 ISO-8859-7 Greek confidence = 0.0
2018-11-15 10:40:46,289 windows-1253 Greek confidence = 0.0
2018-11-15 10:40:46,289 ISO-8859-5 Bulgairan confidence = 0.0
2018-11-15 10:40:46,289 windows-1251 Bulgarian confidence = 0.01
2018-11-15 10:40:46,290 TIS-620 Thai confidence = 0.0
2018-11-15 10:40:46,290 ISO-8859-9 Turkish confidence = 0.54363730033
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,291 utf-8 confidence = 0.99
2018-11-15 10:40:46,291 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,291 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,291 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,291 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,291 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,292 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,292 EUC-TW Taiwan confidence = 0.01

Please help. Any suggestion is gratefully appreciated!

asked Nov 15 '18 at 3:55

enamoria

650722

Can you add your minimum input and desired output?

– kcorlidy
Nov 16 '18 at 5:47

python beautifulsoup


1 Answer

You can silence these messages by raising the log level of the chardet logger:

import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
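
Why raising the level on one logger is enough can be sketched with the standard library alone: loggers form a hierarchy, and a logger with no level of its own inherits the level of its nearest ancestor. The snippet below is written for Python 3 and does not import chardet; the child name chardet.charsetprober is an assumption used only for illustration, since the actual logger names may differ between chardet versions.

```python
import io
import logging

# Capture log output in memory so the effect is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Assumed child logger name, used only for illustration.
child = logging.getLogger("chardet.charsetprober")

# Emitted: nothing in the hierarchy filters DEBUG messages yet.
child.debug("utf-8 confidence = 0.99")

# Raise the level on the parent logger only.
logging.getLogger("chardet").setLevel(logging.CRITICAL)

# Suppressed: the child has no level of its own and inherits CRITICAL.
child.debug("SHIFT_JIS Japanese confidence = 0.01")

captured = stream.getvalue()
print(captured)
```

Only the first message reaches the handler; the second is dropped because the child logger picks up the level set on its parent.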

In general, to find out which logger produces unwanted log output, do the following:

Trigger the log output by running the code. In this case:

req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")

The responsible logger must then be registered with the logging module, so its name appears in this list:

import logging
# The keys of loggerDict are the logger names (the values are logger objects)
print(list(logging.Logger.manager.loggerDict.keys()))
# [..., 'chardet', ...]
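
As a minimal sketch of that lookup, the snippet below registers two made-up loggers (demo.parser and demo.network are hypothetical names, standing in for whatever libraries you have imported) and prints every registered name. It relies on the keys of loggerDict being the logger names.

```python
import logging

# Hypothetical loggers standing in for the libraries you have imported.
logging.getLogger("demo.parser")
logging.getLogger("demo.network")

# The keys of loggerDict are logger names; sort them for readable output.
names = sorted(logging.Logger.manager.loggerDict)
for name in names:
    print(name)
```

The parent name demo is listed as well, because the logging module registers a placeholder for every missing ancestor. A parent such as chardet can therefore show up even if only its children log anything.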

Try switching off the loggers one by one. Once the messages stop appearing, you know which logger emits them:

import logging
# Iterate over the logger names (the keys of loggerDict), not the values;
# copy them into a list so the registry is not modified while iterating.
for name in list(logging.Logger.manager.loggerDict):
    print(name)
    logger = logging.getLogger(name)
    logger.setLevel(logging.CRITICAL)
    # I have left the exact code here for demonstration purposes
    req = urllib2.Request(url, headers=hdr)
    page = urllib2.urlopen(req, timeout=5)
    soup = BeautifulSoup(page.read(), "lxml")

Then set the log level before the code that emits the logs is run:

import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
# No log output any more from here on
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
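
If you only want to silence the logger around one call rather than for the whole program, one possible approach is a small context manager that restores the previous level afterwards. This is a sketch, not part of bs4 or chardet: the helper name silenced is hypothetical, and the parsing call is left as a comment so the snippet runs without any third-party package.

```python
import logging
from contextlib import contextmanager


@contextmanager
def silenced(name, level=logging.CRITICAL):
    """Temporarily raise the level of the named logger (hypothetical helper)."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore whatever level the logger had before.
        logger.setLevel(previous)


chardet_logger = logging.getLogger("chardet")
level_before = chardet_logger.level

with silenced("chardet"):
    level_inside = chardet_logger.level
    # soup = BeautifulSoup(page.read(), "lxml") would go here

level_after = chardet_logger.level
```

Restoring the level in a finally clause keeps the logger usable for debugging elsewhere, even if the wrapped code raises an exception.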

edited Feb 27 at 13:20

answered Feb 27 at 12:43

Maik Röder

76654
