Turn off BeautifulSoup messy encoding confidence output

























I'm using bs4 in my project. Whenever I create a soup instance, it prints a long, messy list of encoding confidence scores:



import urllib2
from bs4 import BeautifulSoup

req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")


It works fine, apart from the redundant output. I just want to suppress it, but I can't find any information about something like a verbose option.



2018-11-15 10:40:46,286 utf-8 confidence = 0.99
2018-11-15 10:40:46,286 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,287 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,287 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,287 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,287 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,287 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,288 EUC-TW Taiwan confidence = 0.01
2018-11-15 10:40:46,288 windows-1251 Russian confidence = 0.01
2018-11-15 10:40:46,288 KOI8-R Russian confidence = 0.01
2018-11-15 10:40:46,288 ISO-8859-5 Russian confidence = 0.0
2018-11-15 10:40:46,288 MacCyrillic Russian confidence = 0.0
2018-11-15 10:40:46,288 IBM866 Russian confidence = 0.0
2018-11-15 10:40:46,289 IBM855 Russian confidence = 0.01
2018-11-15 10:40:46,289 ISO-8859-7 Greek confidence = 0.0
2018-11-15 10:40:46,289 windows-1253 Greek confidence = 0.0
2018-11-15 10:40:46,289 ISO-8859-5 Bulgairan confidence = 0.0
2018-11-15 10:40:46,289 windows-1251 Bulgarian confidence = 0.01
2018-11-15 10:40:46,290 TIS-620 Thai confidence = 0.0
2018-11-15 10:40:46,290 ISO-8859-9 Turkish confidence = 0.54363730033
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,290 windows-1255 Hebrew confidence = 0.0
2018-11-15 10:40:46,291 utf-8 confidence = 0.99
2018-11-15 10:40:46,291 SHIFT_JIS Japanese confidence = 0.01
2018-11-15 10:40:46,291 EUC-JP Japanese confidence = 0.01
2018-11-15 10:40:46,291 GB2312 Chinese confidence = 0.01
2018-11-15 10:40:46,291 EUC-KR Korean confidence = 0.01
2018-11-15 10:40:46,291 CP949 Korean confidence = 0.01
2018-11-15 10:40:46,292 Big5 Chinese confidence = 0.01
2018-11-15 10:40:46,292 EUC-TW Taiwan confidence = 0.01


Please help. Any suggestion is gratefully appreciated!
































  • Can you add your minimum input and desired output?

    – kcorlidy
    Nov 16 '18 at 5:47















python beautifulsoup
















asked Nov 15 '18 at 3:55









enamoria
























1 Answer














You can set the log level higher like this:



import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
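
Setting the level on the top-level 'chardet' logger is enough if, as I believe but have not verified for every chardet version, the records come from loggers named after chardet's submodules, because child loggers inherit their effective level from the nearest ancestor that has one set. The following is only a sketch of that mechanism in Python 3, using the hypothetical logger names "noisy_lib" and "noisy_lib.prober" in place of chardet so that it runs without any third-party package:

```python
import io
import logging

# Capture everything that reaches the root logger in memory.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Hypothetical child logger standing in for a library submodule.
child = logging.getLogger("noisy_lib.prober")
child.debug("utf-8 confidence = 0.99")  # emitted: effective level comes from root

# Raise the level on the parent only; the child inherits it.
logging.getLogger("noisy_lib").setLevel(logging.CRITICAL)
child.debug("SHIFT_JIS Japanese confidence = 0.01")  # dropped: below CRITICAL

root.removeHandler(handler)
captured = stream.getvalue()
print(captured)
```

Only the first message ends up in `captured`, which is the same effect the three lines above have on the confidence output.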


In general, if you want to find out which logger produces unwanted output, do the following:

First, provoke the log output by running the code. In this case:



req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")


The name of the logger must then be in this list:

import logging
# The keys of loggerDict are the logger names; the values are logger objects
print(list(logging.Logger.manager.loggerDict.keys()))
# [..., 'chardet', ...]
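
As a small illustration of what that registry contains, here is a Python 3 sketch. The name "demo_pkg.detector" is made up for the example, and `logging.Logger.manager.loggerDict` is an implementation detail of the standard library rather than a documented interface, so treat this as a debugging aid only:

```python
import logging

# Creating a dotted logger also registers a placeholder for its parent.
logging.getLogger("demo_pkg.detector")  # hypothetical library logger


def registered_logger_names():
    """Return the names of all loggers created so far, sorted."""
    return sorted(logging.Logger.manager.loggerDict)


names = registered_logger_names()
for name in names:
    entry = logging.Logger.manager.loggerDict[name]
    # PlaceHolder entries are parents that nobody has requested directly yet.
    kind = "placeholder" if isinstance(entry, logging.PlaceHolder) else "logger"
    print(name, kind)
```

A parent such as "demo_pkg" shows up as a placeholder until someone calls `logging.getLogger("demo_pkg")`, at which point it becomes a real logger whose level the children inherit.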


Try switching off the loggers one by one. Once the output stops, the last name printed is the logger that emits it:



import logging
# Iterate over a copy of the names, since running the code may register new loggers
for name in list(logging.Logger.manager.loggerDict):
    print(name)
    logger = logging.getLogger(name)
    logger.setLevel(logging.CRITICAL)
    # I have left the exact code here for demonstration purposes
    req = urllib2.Request(url, headers=hdr)
    page = urllib2.urlopen(req, timeout=5)
    soup = BeautifulSoup(page.read(), "lxml")
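
A different technique from the trial and error described here, offered only as a sketch: every log record carries the name of the logger that created it, so a temporary handler on the root logger can report the culprit directly. The class `NameCollector` and the logger name "demo_lib.charsetprober" are invented for this Python 3 example:

```python
import logging


class NameCollector(logging.Handler):
    """Handler that remembers which logger names produced records."""

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.names = set()

    def emit(self, record):
        self.names.add(record.name)


collector = NameCollector()
root = logging.getLogger()
root.addHandler(collector)
root.setLevel(logging.DEBUG)

# Stand-in for the code that triggers the unwanted output.
logging.getLogger("demo_lib.charsetprober").debug("utf-8 confidence = 0.99")

root.removeHandler(collector)
print(sorted(collector.names))
```

Adding `%(name)s` to the format string of an existing handler gives the same information without a custom class.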


Then set the log level before the code that emits the logs is run:



import logging
logger = logging.getLogger('chardet')
logger.setLevel(logging.CRITICAL)
# No log output any more from here on
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req, timeout=5)
soup = BeautifulSoup(page.read(), "lxml")
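
If the output should only be suppressed around one call and the previous level restored afterwards, a context manager is one option. This is a sketch in Python 3, not something from the answer itself; `quiet` and the logger name "noisy_lib" are hypothetical names chosen for the example:

```python
import contextlib
import logging


@contextlib.contextmanager
def quiet(logger_name, level=logging.CRITICAL):
    """Temporarily raise a logger's level, restoring the old one on exit."""
    logger = logging.getLogger(logger_name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(previous)


# Usage: records below CRITICAL are dropped only inside the block.
with quiet("noisy_lib"):
    logging.getLogger("noisy_lib").warning("suppressed inside the block")
```

In the code above, the `BeautifulSoup(...)` call would go inside a `with quiet("chardet"):` block.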





        edited Feb 27 at 13:20

























        answered Feb 27 at 12:43









        Maik Röder




























