Turn off BeautifulSoup messy encoding confidence output
I'm using bs4 for my project. It prints messy output with many encoding confidence scores whenever I create a soup
instance:
import urllib2
from bs4 import BeautifulSoup
# url and hdr are defined elsewhere
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
It works fine, but produces redundant output. I just want to suppress it, but I can't find any information about something like a verbose
option.
2018-11-15 10:40:46,286 utf-8 confidence = 0.99
2018-11-15 10:40:46,286 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,287 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,287 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,287 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,287 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,287 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,288 EUC-TW Taiwan confidence = 0.01
2018-11-15 10:40:46,288 windows-1251 Russian confidence = 0.01
2018-11-15 10:40:46,288 KOI8-R Russian confidence = 0.01
2018-11-15 10:40:46,288 ISO-8859-5 Russian confidence = 0.0
2018-11-15 10:40:46,288 MacCyrillic Russian confidence = 0.0
2018-11-15 10:40:46,288 IBM866 Russian confidence = 0.0
2018-11-15 10:40:46,289 IBM855 Russian confidence = 0.01
2018-11-15 10:40:46,289 ISO-8859-7 Greek confidence = 0.0
2018-11-15 10:40:46,289 windows-1253 Greek confidence = 0.0
2018-11-15 10:40:46,289 ISO-8859-5 Bulgairan confidence = 0.0
2018-11-15 10:40:46,289 windows-1251 Bulgarian confidence = 0.01
2018-11-15 10:40:46,290 TIS-620 Thai confidence = 0.0
2018-11-15 10:40:46,290 ISO-8859-9 Turkish confidence = 0.54363730033
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,291 utf-8 confidence = 0.99
2018-11-15 10:40:46,291 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,291 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,291 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,291 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,291 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,292 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,292 EUC-TW Taiwan confidence = 0.01
Please help. Any suggestion is gratefully appreciated!
python beautifulsoup
Can you add your minimum input and desired output?
– kcorlidy
Nov 16 '18 at 5:47
asked Nov 15 '18 at 3:55
enamoria
1 Answer
You can raise the log level of the chardet logger like this:
import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
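Setting the level on the parent logger is enough because child loggers without a level of their own inherit it. A minimal sketch of that behaviour, assuming Python 3; the child name `chardet.charsetprober` is used here only for illustration and is an assumption about how the library names its loggers:

```python
import logging

# Raise the threshold on the parent "chardet" logger only.
logging.getLogger("chardet").setLevel(logging.CRITICAL)

# A child logger with no explicit level (the name is an assumption used
# for illustration) falls back to the nearest ancestor that has one.
child = logging.getLogger("chardet.charsetprober")

print(child.getEffectiveLevel() == logging.CRITICAL)
print(child.isEnabledFor(logging.DEBUG))
```

With the parent at CRITICAL, debug-level records from the child are dropped before they reach any handler.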
In general, to find out which logger produces unwanted output, do the following:
First, run the code so that the logs are emitted. In this case:
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
Then the logger's name must be in this list (the keys of loggerDict are the names; the values are the logger objects):
import logging
print(list(logging.Logger.manager.loggerDict))
[..., 'chardet', ...]
Try switching off the loggers one by one. The name printed just before the first run without log output is the logger that emits it:
import logging
# Iterate over a copy of the logger names (the dict keys), not the Logger objects
for name in list(logging.Logger.manager.loggerDict):
    print(name)
    logger = logging.getLogger(name)
    logger.setLevel(logging.CRITICAL)
    # I have left the exact code here for demonstration purposes
    req = urllib2.Request(url, headers=hdr)
    page = urllib2.urlopen(req, timeout=5)
    soup = BeautifulSoup(page.read(), "lxml")
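The search can also be automated instead of watching the console. The following is only a sketch, assuming Python 3: `ListHandler`, `noisy_operation`, `find_noisy_logger` and the logger name `noisylib.detector` are all hypothetical stand-ins, with `noisy_operation` playing the role of the BeautifulSoup call so the example runs without network access.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records in a list so output can be detected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def noisy_operation():
    # Hypothetical stand-in for the code that triggers the unwanted logs.
    logging.getLogger("noisylib.detector").debug("utf-8 confidence = 0.99")


def find_noisy_logger(operation):
    """Silence registered loggers one at a time and return the first
    name whose silencing stops the output, or None if none does."""
    root = logging.getLogger()
    handler = ListHandler()
    old_root_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    try:
        operation()  # provoke the logs so the loggers get registered
        # Copy the names: the dict may change while loggers are created.
        for name in list(logging.Logger.manager.loggerDict):
            logger = logging.getLogger(name)
            old_level = logger.level
            logger.setLevel(logging.CRITICAL)
            handler.records.clear()
            operation()
            logger.setLevel(old_level)  # undo before trying the next one
            if not handler.records:
                return name
        return None
    finally:
        root.removeHandler(handler)
        root.setLevel(old_root_level)


culprit = find_noisy_logger(noisy_operation)
print(culprit)
```

Unlike the loop above, this version restores each logger's level before moving on, so only the logger under test is silenced at any time.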
Then set the log level before the code that emits the logs is run:
import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
# No log output any more from here on
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
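If the output should be suppressed only around the parsing call, one option is to restore the previous level afterwards. A minimal sketch, assuming Python 3; `quiet_logger` is a hypothetical helper and not part of logging, bs4, or chardet:

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name, level=logging.CRITICAL):
    """Temporarily raise the level of the named logger (hypothetical helper)."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore whatever level was configured before the block.
        logger.setLevel(previous)


before = logging.getLogger("chardet").level
with quiet_logger("chardet") as chardet_logger:
    # The BeautifulSoup(...) call from the answer would go here.
    inside = chardet_logger.getEffectiveLevel()
after = chardet_logger.level
print(inside == logging.CRITICAL, after == before)
```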
edited Feb 27 at 13:20
answered Feb 27 at 12:43
Maik Röder