Cloning generators in Python without Tee

I am trying to clone a Python generator object. My design constrains me: the only change I can make is to have a function return a copy of the generator so that I can reuse it. I am aware of the itertools.tee() method, but the documentation says:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

How do I implement this? Is it saying to convert the generator to a list, copy that and convert both the lists to iterators/generators?

asked Nov 13 '18 at 7:16

Prince

738

python python-3.x generator

2 Answers

Yes, it's just as you say (except you don't copy the list):

Is it saying to convert the generator to a list, copy that and convert both the lists to iterators/generators?

Let's say you have a generator and you want to make five copies of it. Each copy needs to yield the same values as your original generator. A simple solution is to get all the remaining values from your generator, for example by converting it to a list, and then use this list to produce new generators:

def orig(n):
    yield from range(n)

orig_gen = orig(100)

# consume the first 90 values
for _ in range(90):
    next(orig_gen)

# now we have 10 values left in orig_gen
values_left = list(orig_gen)

def copy():
    # each call creates a new, independent generator over the stored values
    yield from values_left

copy_gen1 = copy()
copy_gen2 = copy()
print(next(copy_gen1))  # 90
print(next(copy_gen2))  # 90
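The same idea can be wrapped in a function that returns several independent iterators at once. This is a minimal sketch, assuming the remaining values fit in memory; the name `list_copies` is hypothetical. Only one list is built and every returned iterator reads from it, so the list itself is never copied:

```python
def list_copies(iterable, n=2):
    """Hypothetical helper: return n independent iterators over the
    remaining values of `iterable`, materialized into one shared list."""
    values = list(iterable)  # consumes the source completely, exactly once
    # iter() on a list creates a fresh iterator with its own position;
    # all of them share the same underlying list, nothing is copied
    return tuple(iter(values) for _ in range(n))


def orig(n):
    yield from range(n)


orig_gen = orig(100)
for _ in range(90):
    next(orig_gen)

gens = list_copies(orig_gen, 5)
print(next(gens[0]))       # 90
print(next(gens[1]))       # 90
print(len(list(gens[0])))  # 9 values left in this copy (91 to 99)
```

Because every iterator shares the same list, advancing one copy has no effect on the others.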

This can become very expensive, though. The purpose of a generator is to produce values lazily, one at a time. If you convert a generator to a list, you have to do all the calculations needed to produce those values up front. Also, if the generator produces many values, you will end up using huge amounts of memory.

That's why tee() offers a buffered approach. You specify how many copies you want (the n argument, which defaults to 2), and tee() keeps a buffer for each copy. The docs give a pure-Python equivalent that uses a deque (a double-ended queue with fast appends and pops at both ends) for each copy. When you request a new value from one of the copies, it is taken from that copy's buffer. Only when that buffer is empty is a new value pulled from the original generator. Let's say you want 5 copies. Using tee() could look like this:
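The deque-per-copy scheme can be sketched in a few lines. This is modeled on the pure-Python equivalent in the itertools docs; `buffered_tee` is a hypothetical name, and CPython's real itertools.tee is implemented in C and stores its buffer differently:

```python
from collections import deque


def buffered_tee(iterable, n=2):
    """Sketch of tee()'s buffering: one deque per copy."""
    it = iter(iterable)
    buffers = [deque() for _ in range(n)]

    def gen(my_buffer):
        while True:
            if not my_buffer:             # this copy's buffer is empty:
                try:
                    new_value = next(it)  # pull one value from the source
                except StopIteration:
                    return
                for b in buffers:         # and append it to every buffer
                    b.append(new_value)
            yield my_buffer.popleft()     # serve this copy from its own buffer

    return tuple(gen(b) for b in buffers)


a, b = buffered_tee(range(3))
print(next(a), next(a), next(b))  # 0 1 0
```

Copy `a` pulls 0 and 1 from the source; copy `b` then finds both values already waiting in its own deque.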

1. Create the original generator and use it a bit.
2. Create 5 copies with tee(). Each copy has an empty buffer.
3. Call next() on copy 1. Since its buffer is empty, we call next() on the original generator. The value is added to all the buffers, then immediately popped from buffer 1 and returned.
4. Call next() on copy 2. The buffer for copy 2 already contains this value, so we pop it from the buffer and return it. The original generator is not touched.
5. Call next() on copy 1 again. The previous step did not use the original generator, so buffer 1 is still empty. We call next() on the original generator; the value is added to all buffers, then immediately taken out of buffer 1 and returned.
6. Call next() on copy 3 twice. Both times we can simply read the value from the buffer, which by now holds two values.
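The steps above can be traced with the real itertools.tee. This is a small sketch; the `pulled` list is a hypothetical instrument added to the generator so you can see when the original is actually advanced. As the docs note, once tee() has been called, the original iterator should not be used anywhere else, or the copies will miss values:

```python
from itertools import tee

pulled = []  # records every value the original generator produces


def orig(n):
    for i in range(n):
        pulled.append(i)
        yield i


orig_gen = orig(100)
next(orig_gen)  # use the original a bit (produces 0)
next(orig_gen)  # (produces 1)

copies = tee(orig_gen, 5)  # five copies; nothing is pulled yet

print(next(copies[0]))  # copy 1: pulls 2 from the original -> 2
print(next(copies[1]))  # copy 2: served from the buffer -> 2
print(next(copies[0]))  # copy 1 again: pulls 3 from the original -> 3
print(next(copies[2]), next(copies[2]))  # copy 3 twice, both buffered -> 2 3
print(pulled)  # [0, 1, 2, 3]: only two values pulled after tee()
```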

If you call next() on the copies at roughly the same rate, this is very efficient: values are removed from the buffers about as fast as they are added, so the buffers stay small and you don't have to keep many values in memory.

But if you only use one of the copies, the buffers of the other copies grow larger and larger. In the extreme case, if you exhaust copy 1 before you touch the other copies, each of their buffers ends up holding every value from the generator. You now have 4 buffers with all the values, instead of just 1 list with all the values in the simple approach.

answered Nov 13 '18 at 7:39

lhk

6,82195590

Thanks for the awesome explanation. :)

– Prince
Nov 13 '18 at 8:56

The docs are saying that you could achieve the same thing by materializing your iterable into a list and then "copying" the iterable, using the list as auxiliary storage, something to the effect of:

>>> def tee_two(iterable):
...     mem = list(iterable)
...     return iter(mem), iter(mem)
...
>>> en = enumerate('abc')
>>> next(en)
(0, 'a')
>>> it1, it2 = tee_two(en)
>>> for i, x in it1:
...     print(i, x)
...
1 b
2 c
>>> for i, x in it2:
...     print(i, x)
...
1 b
2 c

Of course, this requires materializing the rest of your iterator, and it is not memory efficient (what if you have an infinite iterator?). But as the docs state, if "one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()", and in that case it is potentially not much worse memory-wise.
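The infinite-iterator case is where tee() still works and the list approach does not. A minimal sketch, using itertools.count as a stand-in infinite iterator (an illustrative choice, not something from the question):

```python
from itertools import count, islice, tee

# An infinite iterator: list(count()) would never return,
# so the list-based approach cannot be used here.
source = count()

it1, it2 = tee(source)  # lazy: nothing is consumed yet

# Advancing the copies roughly in step keeps tee()'s buffer small.
print(list(islice(it1, 5)))  # [0, 1, 2, 3, 4]
print(list(islice(it2, 5)))  # [0, 1, 2, 3, 4]
```

Each copy still only pulls values on demand, so only the values one copy has seen and the other has not yet consumed are held in memory.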

edited Nov 13 '18 at 7:45

answered Nov 13 '18 at 7:33

juanpa.arrivillaga

38k33672

Much thanks @juanpa for your answer, I wish I could accept 2 answers here.

– Prince
Nov 13 '18 at 9:03
