When should a single session instance be used for requests?

























From the aiohttp docs:




[An aiohttp.ClientSession] encapsulates a connection pool (connector instance) and supports keepalives by default. Unless you are connecting to a large, unknown number of different servers over the lifetime of your application, it is suggested you use a single session for the lifetime of your application to benefit from connection pooling.




I have almost always kept a single ClientSession instance (with cookies enabled and a custom connector/adapter*) for any collection of URLs, no matter how heterogeneous those URLs are or how many of them there are. I would like to know if there are downsides to that approach.



I'm hoping to have a more granular, contextual definition of what "large, unknown number of different servers" constitutes in practice. What are the best practices for cases like the one presented below? Should a ClientSession be dedicated to each netloc, rather than a single instance for the whole set?** Is the decision over whether to use a single client session dictated solely by response time?



It is often the case that I have "batches" of endpoints; the netloc within each batch is homogeneous, but the netlocs differ between batches. For example,



urls = [
    'https://aiohttp.readthedocs.io/en/stable/index.html',
    'https://aiohttp.readthedocs.io/en/stable/client_reference.html',
    'https://aiohttp.readthedocs.io/en/stable/web_advanced.html#aiohttp-web-middlewares',

    'https://www.thesaurus.com/',
    'https://www.thesaurus.com/browse/encapsulate',
    'https://www.thesaurus.com/browse/connection?s=t',

    'https://httpbin.org/',
    'https://httpbin.org/#/HTTP_Methods',
    'https://httpbin.org/status/200'
]



To put a number on it, in reality each batch is probably of length 25-50.




*What I do now is limit open connections to any single host by passing a connector instance, aiohttp.TCPConnector(limit_per_host=10), to ClientSession.



**Specifically, 'www.thesaurus.com', 'aiohttp.readthedocs.io', 'httpbin.org' i.e. set(urllib.parse.urlsplit(u).netloc for u in urls).
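
For concreteness, a minimal sketch of that single-session setup (the fetch/fetch_all helper names are illustrative only, not part of any library):

import asyncio
import aiohttp

async def fetch(session, url):
    # Reuses a pooled connection to the URL's host whenever one is free.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.text()

async def fetch_all(urls):
    # One connector (and therefore one connection pool) for the whole run,
    # capped at 10 open connections per host.
    connector = aiohttp.TCPConnector(limit_per_host=10)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

# pages = asyncio.run(fetch_all(urls))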










python python-requests aiohttp

asked Nov 12 '18 at 14:06 by Brad Solomon

          1 Answer
You'd want to use a dedicated session with its own connector when:



1. You want to customise the connector parameters for a set of connections (say, alter the limit per host, the SSL configuration, or the timeouts).

2. You'd run into the default limit of 100 connections, at which point cached connections to existing hosts are just as likely to have been recycled as to still be open.

The latter scenario is what the documentation hints at. Say you have a large number of unique hosts to connect to (where a unique host is a unique combination of hostname, port number, and whether or not SSL is used), but some of those hosts are contacted more often than others. If that 'large number' is > 100, then chances are you have to keep opening new connections to the 'frequent' hosts you already connected to before, because the pool had to close them to make room for a host not currently in the pool. That'll hurt performance.



          But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer. They don't have to compete for free connections from the 'general use' pool with all those infrequent host connections.



In aiohttp you create separate pools by using separate sessions; you then have to define logic to pick which session to use for a given request.
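
For illustration, a rough sketch of that idea applied to the question's batches, keying one session (and thus one pool) off each URL's netloc; the crawl/session_for helpers and the lazy dict are purely hypothetical, not aiohttp API:

import asyncio
import urllib.parse
import aiohttp

async def crawl(urls):
    sessions = {}  # one ClientSession (and connection pool) per netloc

    def session_for(url):
        netloc = urllib.parse.urlsplit(url).netloc
        if netloc not in sessions:
            sessions[netloc] = aiohttp.ClientSession(
                connector=aiohttp.TCPConnector(limit_per_host=10))
        return sessions[netloc]

    async def fetch(url):
        async with session_for(url).get(url) as resp:
            return await resp.text()

    try:
        return await asyncio.gather(*(fetch(u) for u in urls))
    finally:
        # Each session owns its own connector, so each must be closed explicitly.
        await asyncio.gather(*(s.close() for s in sessions.values()))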



For comparison, the requests library (a synchronous HTTP API) handles this a little differently: it lets you register separate transport adapters per URL prefix.
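
A minimal sketch of that requests pattern, using one of the hosts from the question (the pool sizes are arbitrary):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Give one 'frequent' host its own adapter, i.e. its own connection pool...
session.mount('https://aiohttp.readthedocs.io/',
              HTTPAdapter(pool_connections=1, pool_maxsize=10))
# ...while all other URLs fall through to the default 'https://' adapter.
resp = session.get('https://aiohttp.readthedocs.io/en/stable/index.html')
resp.raise_for_status()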






answered Nov 12 '18 at 14:28 by Martijn Pieters

          • Thank you. To confirm my understanding--the limit argument for aiohttp.TCPConnector (default 100) refers to the max number of open connections to a set of hosts, rather than individual/full URIs?

            – Brad Solomon
            Nov 12 '18 at 14:53











          • Secondly, your answer would suggest using a single session and manipulating (1) the limit and limit_per_host parameters to the connector and (2) the parameters to the session instance itself, rather than creating a separate session for each batch. ("But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer."--it seems the easier route would be to simply increase connection limits on 1 session.) Am I interpreting that correctly?

            – Brad Solomon
            Nov 12 '18 at 14:58











          • @BradSolomon: yes, the connection pool limit is there for the number of connections, which are per host. A request for example.com/foo/bar followed by example.com/spam/ham can reuse the open connection to example.com (provided another asyncio task is not still using it).

            – Martijn Pieters
            Nov 12 '18 at 16:23






          • @BradSolomon: and no, manipulating limit and limit_per_host will not help in the scenario outlined. Say the DMOZ project was still alive and you were trying to scrape it. You'd run a series of tasks to connect to DMOZ, retrieve URLs for other sites, and then scrape those other sites. That's a lot of URLs, and you would not want to lose connections to DMOZ. A limit_per_host would cap DMOZ connections, connections to a huge number of other domains would never hit limit_per_host. A bunch of slow external URLs would use up the limit pool, however.

            – Martijn Pieters
            Nov 12 '18 at 16:30










