When should a single session instance be used for requests?
From the aiohttp docs:

[An aiohttp.ClientSession] encapsulates a connection pool (connector instance) and supports keepalives by default. Unless you are connecting to a large, unknown number of different servers over the lifetime of your application, it is suggested you use a single session for the lifetime of your application to benefit from connection pooling.
I have almost always kept a single ClientSession instance (with cookies enabled and a custom connector/adapter*) for any collection of URLs, no matter how heterogeneous those URLs are or how many of them there are. I would like to know whether there are downsides to that approach.
I'm hoping for a more granular, contextual definition of what a "large, unknown number of different servers" constitutes in practice. What are the best practices for cases like the one presented below? Should a ClientSession be dedicated to each netloc, rather than using a single instance for the whole set?** Is the decision over whether to use a single client session dictated solely by response time?
It is often the case that I have "batches" of endpoints; the netloc within each batch is homogeneous, but the netlocs differ between batches. For example:
```python
urls = (
    'https://aiohttp.readthedocs.io/en/stable/index.html',
    'https://aiohttp.readthedocs.io/en/stable/client_reference.html',
    'https://aiohttp.readthedocs.io/en/stable/web_advanced.html#aiohttp-web-middlewares',
    'https://www.thesaurus.com/',
    'https://www.thesaurus.com/browse/encapsulate',
    'https://www.thesaurus.com/browse/connection?s=t',
    'https://httpbin.org/',
    'https://httpbin.org/#/HTTP_Methods',
    'https://httpbin.org/status/200',
)
```
To put a number on it, in reality each batch is probably 25-50 URLs.
*What I do now is limit open connections to any single host by passing a connector instance to ClientSession, namely aiohttp.TCPConnector(limit_per_host=10).
**Specifically 'www.thesaurus.com', 'aiohttp.readthedocs.io', and 'httpbin.org', i.e. set(urllib.parse.urlsplit(u).netloc for u in urls).
Tags: python, python-requests, aiohttp
asked Nov 12 '18 at 14:06 by Brad Solomon
1 Answer

answered Nov 12 '18 at 14:28 by Martijn Pieters♦
You'd want to use a dedicated session with its own connector when:
- You want to customise the connector parameters for a set of connections (say, alter the limit per host, or alter the SSL configuration, or set different timeouts).
- You'd run through the default limit of 100 connections, at which point cached connections to existing hosts are just as likely to have been recycled as to still be open.
The latter scenario is what the documentation hints at. Say you have a larger number of unique hosts to connect to (where a unique host is a unique combination of hostname, port number and whether or not SSL is used), but some of those hosts are being contacted more often than others. If that 'large number' is > 100, then chances are that you have to keep opening new connections for the 'frequent' hosts you already connected to before, because the pool had to close them to create a connection for a host not currently in the pool. That'll hurt performance.
But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer. They don't have to compete for free connections from the 'general use' pool with all those infrequent host connections.
In aiohttp you create separate pools by using separate sessions; you then have to define the logic that picks which session to use for a given request.
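As an illustration only (this sketch is not from the answer; the choice of "frequent" host and the helper names are hypothetical), that routing logic might look something like this: one dedicated session for the frequently contacted host and a general-purpose session for everything else.

```python
import asyncio
from urllib.parse import urlsplit

import aiohttp

FREQUENT_HOST = "aiohttp.readthedocs.io"  # hypothetical frequently contacted host

async def crawl(urls):
    # The dedicated pool keeps keepalive connections to the frequent host from
    # competing with connections to the many infrequently contacted hosts.
    frequent = aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit_per_host=10))
    general = aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=100))

    def pick_session(url: str) -> aiohttp.ClientSession:
        return frequent if urlsplit(url).netloc == FREQUENT_HOST else general

    async def fetch(url: str) -> bytes:
        async with pick_session(url).get(url) as resp:
            return await resp.read()

    try:
        return await asyncio.gather(*(fetch(u) for u in urls))
    finally:
        await frequent.close()
        await general.close()

# asyncio.run(crawl(urls))
```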
For comparison, the requests library (a synchronous HTTP API) handles this a little differently: you can register separate transport adapters per URL prefix.
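A minimal requests sketch of that idea (the prefix and pool sizes here are illustrative assumptions, not from the answer): mount a dedicated HTTPAdapter with a larger pool on the frequently used prefix, and let every other URL fall through to the session's default adapters.

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Dedicated, larger pool for one frequently used host; all other URLs use the
# default adapters mounted on 'http://' and 'https://'.
session.mount(
    "https://aiohttp.readthedocs.io/",
    HTTPAdapter(pool_connections=1, pool_maxsize=20),
)

resp = session.get("https://aiohttp.readthedocs.io/en/stable/index.html")
print(resp.status_code)
```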
Thank you. To confirm my understanding: the limit argument for aiohttp.TCPConnector (default 100) refers to the max number of open connections to a set of hosts, rather than to individual/full URIs? – Brad Solomon, Nov 12 '18 at 14:53
Secondly, your answer would suggest using a single session and manipulating (1) the limit and limit_per_host parameters to the connector and (2) the parameters to the session instance itself, rather than creating a separate session for each batch. ("But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer." It seems the easier route would be to simply increase connection limits on one session.) Am I interpreting that correctly? – Brad Solomon, Nov 12 '18 at 14:58
@BradSolomon: yes, the connection pool limit is there for the number of connections, which are per host. A request for example.com/foo/bar followed by example.com/spam/ham can reuse the open connection to example.com (provided another asyncio task is not still using it). – Martijn Pieters♦, Nov 12 '18 at 16:23
@BradSolomon: and no, manipulating limit and limit_per_host will not help in the scenario outlined. Say the DMOZ project was still alive and you were trying to scrape it. You'd run a series of tasks to connect to DMOZ, retrieve URLs for other sites, and then scrape those other sites. That's a lot of URLs, and you would not want to lose connections to DMOZ. A limit_per_host would cap DMOZ connections; connections to a huge number of other domains would never hit limit_per_host. A bunch of slow external URLs would use up the limit pool, however. – Martijn Pieters♦, Nov 12 '18 at 16:30