When should a single session instance be used for requests?

























From the aiohttp docs:




[An aiohttp.ClientSession] encapsulates a connection pool (connector instance) and supports keepalives by default. Unless you are connecting to a large, unknown number of different servers over the lifetime of your application, it is suggested you use a single session for the lifetime of your application to benefit from connection pooling.




I have almost always kept a single ClientSession instance (with cookies enabled and a custom connector/adapter*) for any collection of URLs, no matter how heterogeneous those URLs are or how many of them there are. I would like to know if there are downsides to that approach.



I'm hoping to have a more granular, contextual definition of what "large, unknown number of different servers" constitutes in practice. What are the best practices for cases like the one presented below? Should a ClientSession be dedicated to each netloc, rather than a single instance for the whole set?** Is the decision over whether to use a single client session dictated solely by response time?



It is often the case that I have "batches" of endpoints; the netloc within each batch is homogeneous, but the netlocs differ between batches. For example,



urls = [
    'https://aiohttp.readthedocs.io/en/stable/index.html',
    'https://aiohttp.readthedocs.io/en/stable/client_reference.html',
    'https://aiohttp.readthedocs.io/en/stable/web_advanced.html#aiohttp-web-middlewares',

    'https://www.thesaurus.com/',
    'https://www.thesaurus.com/browse/encapsulate',
    'https://www.thesaurus.com/browse/connection?s=t',

    'https://httpbin.org/',
    'https://httpbin.org/#/HTTP_Methods',
    'https://httpbin.org/status/200'
]



To put a number on it, in reality each batch is probably of length 25-50.




*What I do now is limit open connections to any single host by passing a connector instance, aiohttp.TCPConnector(limit_per_host=10), to ClientSession.



**Specifically, 'www.thesaurus.com', 'aiohttp.readthedocs.io', 'httpbin.org' i.e. set(urllib.parse.urlsplit(u).netloc for u in urls).
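
For concreteness, a minimal sketch of that single-session setup (the fetch/fetch_all helper names are illustrative only, not part of any library):

import asyncio
import aiohttp

async def fetch(session, url):
    # Reuses a pooled connection to the URL's host whenever one is free.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.text()

async def fetch_all(urls):
    # One connector (and therefore one connection pool) for the whole run,
    # capped at 10 open connections per host.
    connector = aiohttp.TCPConnector(limit_per_host=10)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

# pages = asyncio.run(fetch_all(urls))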










python python-requests aiohttp

asked Nov 12 '18 at 14:06 by Brad Solomon

          1 Answer
You'd want to use a dedicated session with its own connector when:



1. You want to customise the connector parameters for a set of connections (say, alter the limit per host, the SSL configuration, or the timeouts).

2. You'd run into the default limit of 100 connections, at which point cached connections to existing hosts are just as likely to have been recycled as to still be open.

The latter scenario is what the documentation hints at. Say you have a large number of unique hosts to connect to (where a unique host is a unique combination of hostname, port number, and whether or not SSL is used), but some of those hosts are contacted more often than others. If that 'large number' is > 100, then chances are you have to keep opening new connections to the 'frequent' hosts you already connected to before, because the pool had to close them to make room for a host not currently in the pool. That'll hurt performance.



          But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer. They don't have to compete for free connections from the 'general use' pool with all those infrequent host connections.



In aiohttp you create separate pools by using separate sessions; you then have to define logic to pick which session to use for a given request.
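
For illustration, a rough sketch of that idea applied to the question's batches, keying one session (and thus one pool) off each URL's netloc; the crawl/session_for helpers and the lazy dict are purely hypothetical, not aiohttp API:

import asyncio
import urllib.parse
import aiohttp

async def crawl(urls):
    sessions = {}  # one ClientSession (and connection pool) per netloc

    def session_for(url):
        netloc = urllib.parse.urlsplit(url).netloc
        if netloc not in sessions:
            sessions[netloc] = aiohttp.ClientSession(
                connector=aiohttp.TCPConnector(limit_per_host=10))
        return sessions[netloc]

    async def fetch(url):
        async with session_for(url).get(url) as resp:
            return await resp.text()

    try:
        return await asyncio.gather(*(fetch(u) for u in urls))
    finally:
        # Each session owns its own connector, so each must be closed explicitly.
        await asyncio.gather(*(s.close() for s in sessions.values()))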



For comparison, the requests library (a synchronous HTTP API) handles this a little differently: it lets you register separate transport adapters per URL prefix.
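
A minimal sketch of that requests pattern, using one of the hosts from the question (the pool sizes are arbitrary):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Give one 'frequent' host its own adapter, i.e. its own connection pool...
session.mount('https://aiohttp.readthedocs.io/',
              HTTPAdapter(pool_connections=1, pool_maxsize=10))
# ...while all other URLs fall through to the default 'https://' adapter.
resp = session.get('https://aiohttp.readthedocs.io/en/stable/index.html')
resp.raise_for_status()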






answered Nov 12 '18 at 14:28 by Martijn Pieters

          • Thank you. To confirm my understanding--the limit argument for aiohttp.TCPConnector (default 100) refers to the max number of open connections to a set of hosts, rather than individual/full URIs?

            – Brad Solomon
            Nov 12 '18 at 14:53











          • Secondly, your answer would suggest using a single session and manipulating (1) the limit and limit_per_host parameters to the connector and (2) the parameters to the session instance itself, rather than creating a separate session for each batch. ("But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer."--it seems the easier route would be to simply increase connection limits on 1 session.) Am I interpreting that correctly?

            – Brad Solomon
            Nov 12 '18 at 14:58











          • @BradSolomon: yes, the connection pool limit is there for the number of connections, which are per host. A request for example.com/foo/bar followed by example.com/spam/ham can reuse the open connection to example.com (provided another asyncio task is not still using it).

            – Martijn Pieters
            Nov 12 '18 at 16:23






          • @BradSolomon: and no, manipulating limit and limit_per_host will not help in the scenario outlined. Say the DMOZ project was still alive and you were trying to scrape it. You'd run a series of tasks to connect to DMOZ, retrieve URLs for other sites, and then scrape those other sites. That's a lot of URLs, and you would not want to lose connections to DMOZ. A limit_per_host would cap DMOZ connections, connections to a huge number of other domains would never hit limit_per_host. A bunch of slow external URLs would use up the limit pool, however.

            – Martijn Pieters
            Nov 12 '18 at 16:30










