Cloning generators in Python without Tee
I am trying to clone a Python generator object. I am bound by design so that I cannot make other changes, other than having a copy of the generator returned by a function and reusing it. I am aware of the itertools.tee() method, but the documentation says:
> In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
How do I implement this? Is it saying to convert the generator to a list, copy that, and convert both lists to iterators/generators?
python python-3.x generator
asked Nov 13 '18 at 7:16
Prince
738
2 Answers
Yes, it's just as you say (except you don't copy the list):
> Is it saying to convert the generator to a list, copy that and convert both the lists to iterators/generators?
Let's say you have a generator and you want to make five copies of it. Each of these copies needs to yield the same values as your original generator. A simple solution would be to get all the values from your generator, for example by converting it to a list, and then using this list to produce new generators:
```python
def orig(n):
    yield from range(n)

orig_gen = orig(100)
for i in range(90):
    next(orig_gen)
# now we have 10 values left in orig_gen

values_left = list(orig_gen)  # materialize the remaining values once

def copy():
    # each call returns a fresh generator over the same stored values
    yield from values_left

copy_gen1 = copy()
copy_gen2 = copy()
print(next(copy_gen1))  # 90
print(next(copy_gen2))  # 90
```
This can become very expensive, though. The purpose of a generator is to produce new values lazily, on demand. If you convert a generator to a list, all of the remaining values have to be computed up front, whether or not any copy ever uses them. Also, if the generator produces many values, you will end up using huge amounts of memory.
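To make the cost concrete, here is a small sketch that counts how much work the source generator actually does. The `noisy_squares` generator and its `calls` counter are hypothetical, for illustration only. `list()` runs the generator to completion immediately, while `itertools.tee()` only pulls values from the source as the copies ask for them.

```python
from itertools import tee

calls = 0  # how many values the source generator has computed so far

def noisy_squares(n):
    """Hypothetical 'expensive' generator that counts its own work."""
    global calls
    for i in range(n):
        calls += 1
        yield i * i

# list() approach: every value is computed up front
calls = 0
values = list(noisy_squares(1000))
copy1, copy2 = iter(values), iter(values)
next(copy1)
list_calls = calls  # 1000, even though only one value was requested

# tee() approach: values are computed lazily, on demand
calls = 0
t1, t2 = tee(noisy_squares(1000))
next(t1)
next(t2)
tee_calls = calls  # 1: both copies share the single computed value

print(list_calls, tee_calls)
```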
That's why tee() offers a buffered approach. You specify how many copies of the generator you want, and tee() conceptually sets up a deque (a double-ended queue with fast appends and pops at either end) for each copy. When you request a new value from one of your copies, it is taken from that copy's buffer. Only if the buffer is empty is a new value produced from the original generator. A roughly equivalent pure-Python version is given in the docs. Let's say you want 5 copies; using tee() could look like this:
- Create the original generator and use it a bit.
- Create 5 copies with tee(). Each copy has an empty buffer.
- Call next() on copy 1. Since its buffer is empty, we call next() on the original generator. The value is added to all the buffers. It is immediately popped from buffer 1 and returned.
- Call next() on copy 2. The buffer for copy 2 already contains this value, so we pop it from the buffer and return it. No use of the original generator is necessary.
- Call next() on copy 1. In the previous step we didn't have to use the original generator, so buffer 1 is still empty. We call next() on the original generator. The value is added to all buffers. It is immediately taken out of buffer 1 and returned.
- Call next() on copy 3 twice. Both times we can simply read the value from the buffer.
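The walkthrough can be replayed with a simplified deque-per-copy model of `tee()`. This is a sketch of the buffering idea described here, not CPython's actual C implementation, which stores the shared values differently. The hypothetical `deque_tee()` returns the copies together with their buffers so you can inspect how many values each copy is holding after each step. Note the indexing: "copy 1" in the text is `copies[0]`.

```python
from collections import deque

def deque_tee(iterable, n=2):
    """Simplified model of tee(): one deque buffer per copy.

    Returns the copies plus their buffers so buffer sizes can be inspected.
    """
    it = iter(iterable)
    buffers = [deque() for _ in range(n)]

    def gen(mybuffer):
        while True:
            if not mybuffer:
                # This copy's buffer is empty: pull one value from the source
                # and share it with every copy's buffer.
                try:
                    value = next(it)
                except StopIteration:
                    return
                for buf in buffers:
                    buf.append(value)
            yield mybuffer.popleft()

    return [gen(buf) for buf in buffers], buffers


orig_gen = (i for i in range(100))
for _ in range(90):  # use the original generator a bit
    next(orig_gen)

copies, buffers = deque_tee(orig_gen, 5)  # 5 copies, all buffers empty
sizes = [[len(b) for b in buffers]]
returned = []

# Steps from the walkthrough: copy 1, copy 2, copy 1, then copy 3 twice.
for index in (0, 1, 0, 2, 2):
    returned.append(next(copies[index]))
    sizes.append([len(b) for b in buffers])

print(returned)  # [90, 90, 91, 90, 91]
for row in sizes:
    print(row)
```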
If you call next() on the copies at roughly the same rate, this is very efficient: values are removed from all the buffers evenly, so the buffers don't grow and you don't have to keep many values in memory.
But if you only use one of the copies, the buffers for the other copies grow larger and larger. In the extreme case, if you exhaust copy 1 before you touch the other copies, each of their buffers ends up holding every value the generator produced. In this deque-per-copy model, you now have 4 buffers with all the values, instead of just 1 list with all the values in the simple approach.
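Putting the two approaches together, a small helper could pick the strategy based on how the copies will be consumed. This is a sketch under an assumption you have to supply yourself: the hypothetical `clone()` function and its `sequential` flag are not part of any library. `sequential=True` means you expect one copy to read most or all of the data before the others start, which is the case where the docs recommend `list()`.

```python
from itertools import tee

def clone(iterable, n=2, sequential=False):
    """Return n independent iterators over the remaining values of iterable.

    sequential=True: materialize once with list() (one shared list, all
    values computed up front; never finishes for an infinite iterable).
    sequential=False: use itertools.tee(), which buffers lazily.
    """
    if sequential:
        values = list(iterable)
        return tuple(iter(values) for _ in range(n))
    return tee(iterable, n)

# Copy 1 is exhausted before the others are touched: use the list strategy.
a, b, c = clone((v * 10 for v in range(5)), 3, sequential=True)
first = list(a)
rest = (list(b), list(c))

# Copies advance in lockstep: tee() keeps its buffers small.
p, q = clone((v * 10 for v in range(5)), 2)
lockstep = [(next(p), next(q)) for _ in range(3)]

print(first, rest, lockstep)
```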
answered Nov 13 '18 at 7:39
lhk
6,82195590
thanks for the awesome explanation.. :)
– Prince
Nov 13 '18 at 8:56
The docs are saying that you could achieve the same thing by materializing your iterable into a list and then "copying" the iterable, using the list as auxiliary storage, something to the effect of:
```python
>>> def tee_two(iterable):
...     mem = list(iterable)
...     return iter(mem), iter(mem)
...
>>> en = enumerate('abc')
>>> next(en)
(0, 'a')
>>> it1, it2 = tee_two(en)
>>> for i, x in it1:
...     print(i, x)
...
1 b
2 c
>>> for i, x in it2:
...     print(i, x)
...
1 b
2 c
```
Of course, this requires materializing the rest of your iterator, and it is not memory efficient (what if you have an infinite iterator?). But if, as the docs state, "one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()", and it is potentially not much worse memory-wise.
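The infinite-iterator caveat is easy to see with `itertools.count()`: calling `list()` on it would never return, but `tee()` only buffers what the copies have actually requested, so the copies can still be consumed lazily, here a few values at a time with `itertools.islice()`.

```python
from itertools import count, islice, tee

# list(count()) would never return (and would eventually exhaust memory),
# but tee() pulls values from the infinite source only on demand.
evens = count(0, 2)
first_copy, second_copy = tee(evens)

head1 = list(islice(first_copy, 5))   # [0, 2, 4, 6, 8]
head2 = list(islice(second_copy, 3))  # [0, 2, 4], served from tee's buffer
print(head1, head2)
```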
edited Nov 13 '18 at 7:45
answered Nov 13 '18 at 7:33
juanpa.arrivillaga
38k33672
much thanks @juanpa for your answer, i wish I could accept 2 answers here.
– Prince
Nov 13 '18 at 9:03