Cloning generators in Python without Tee

























I am trying to clone a Python generator object. My design constraints prevent me from making any other changes: all I can do is have a function return a copy of the generator so that I can reuse it. I am aware of the itertools.tee() method, but the documentation says:




In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().




How do I implement this? Is it saying to convert the generator to a list, copy that and convert both the lists to iterators/generators?




































Tags: python, python-3.x, generator
















asked Nov 13 '18 at 7:16 by Prince






















2 Answers
































          Yes, it's just as you say (except you don't copy the list):




Is it saying to convert the generator to a list, copy that and convert both the lists to iterators/generators?




          Let's say you have a generator and you want to make five copies of it. Each of these copies needs to yield the same values as your original generator. A simple solution would be to get all the values from your generator, for example by converting it to a list, and then using this list to produce new generators:



    def orig(n):
        yield from range(n)

    orig_gen = orig(100)

    for _ in range(90):
        next(orig_gen)

    # now we have 10 values left in orig_gen
    values_left = list(orig_gen)

    def copy():
        yield from values_left

    copy_gen1 = copy()
    copy_gen2 = copy()
    print(next(copy_gen1))  # 90
    print(next(copy_gen2))  # 90
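A minimal sketch generalizing the answer's copy() to any number of copies; list_clones is a hypothetical name, not a library function. It also shows two consequences of this approach: draining one copy does not affect the others, because each generator keeps its own position in the shared list, and the original generator is exhausted once list() has consumed it, so only the copies should be used afterwards.

```python
def list_clones(iterable, n=2):
    """Hypothetical helper: materialize the remaining values once, then hand
    out n generators that all read from that same list (it is never copied)."""
    values = list(iterable)        # consumes the source completely

    def clone():
        yield from values          # each clone keeps its own position

    return [clone() for _ in range(n)]


def orig(n):
    yield from range(n)

orig_gen = orig(100)
for _ in range(90):
    next(orig_gen)                 # 10 values are left in orig_gen

c1, c2, c3 = list_clones(orig_gen, 3)
print(list(c1))                    # [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
print(next(c2), next(c3))          # 90 90 (draining c1 did not affect them)
print(next(orig_gen, None))        # None: the original is now exhausted
```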


          This can become very expensive though. The purpose of a generator is to produce new values dynamically. If you convert a generator to a list, you have to do all the calculations necessary to get those values. Also, if the generator produces many values, you will end up using huge amounts of memory.
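To make the cost difference concrete, here is a small sketch with a hypothetical CountingSource class standing in for an expensive generator. It counts how many values were actually computed: list() computes all of them at once, while the copies returned by itertools.tee() only pull a value from the source when one of them asks for it.

```python
from itertools import islice, tee

class CountingSource:
    """Hypothetical stand-in for an expensive generator: counts the values
    it has actually produced."""

    def __init__(self, n):
        self.n = n
        self.produced = 0

    def values(self):
        for i in range(self.n):
            self.produced += 1     # pretend this step is expensive
            yield i * i

# list(): every value is computed immediately, even if nobody reads it.
eager = CountingSource(1_000)
materialized = list(eager.values())
print(eager.produced)              # 1000

# tee(): values are computed only when some copy asks for them.
lazy = CountingSource(1_000)
a, b = tee(lazy.values())
print(list(islice(a, 3)))          # [0, 1, 4]
print(lazy.produced)               # 3
print(next(b))                     # 0 (already buffered for b)
print(lazy.produced)               # still 3
```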



That's why tee() offers a buffered approach. You specify how many copies of the generator you want, and tee() sets up a deque (a list-like container with fast appends and pops at either end) for each copy. When you request a new value from one of the copies, it is taken from that copy's buffer; only if the buffer is empty is a new value produced by the original generator. The docs show roughly equivalent pure-Python code (CPython's actual tee() is written in C and keeps a single buffer shared by all copies, but the idea is the same). Say you want 5 copies; using tee() could look like this:



          • create the original generator and use it a bit

          • create 5 copies with tee(). Each copy has an empty buffer

          • call next() on copy 1. Since the buffer is empty, we call next() on the original generator. The value is added to all the buffers. It is immediately popped from buffer 1 and returned

• call next() on copy 2. The buffer for copy 2 already contains this value, so we pop it from the buffer and return it. No use of the original generator is necessary.

• call next() on copy 1. The previous step didn't use the original generator, so buffer 1 is still empty. We call next() on the original generator. The value is added to all buffers. It is immediately taken out of buffer 1 and returned

          • call next() on copy 3 twice. Both times we can simply read the value from the buffer.
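The steps above can be replayed with a small pure-Python model of this buffering. buffered_tee is a hypothetical helper modeled on the one-deque-per-copy description, not the real itertools.tee(); it also returns the buffers so their sizes can be printed after each step. The last lines preview the extreme case discussed below: exhausting one copy first leaves the other copy's buffer holding every value.

```python
from collections import deque

def buffered_tee(iterable, n=2):
    """Hypothetical helper: a sketch of tee()'s one-deque-per-copy buffering.

    Returns (copies, buffers) so the buffers can be inspected; the real
    itertools.tee() returns only the copies.
    """
    source = iter(iterable)
    buffers = [deque() for _ in range(n)]

    def follower(my_buffer):
        while True:
            if not my_buffer:               # own buffer empty: use the source
                try:
                    value = next(source)
                except StopIteration:
                    return
                for buf in buffers:         # hand the new value to every copy
                    buf.append(value)
            yield my_buffer.popleft()       # serve from own buffer

    return [follower(buf) for buf in buffers], buffers


def orig(n):
    yield from range(n)

gen = orig(10)
next(gen)                                   # use the original a bit (0 is gone)
copies, buffers = buffered_tee(gen, 5)      # 5 copies, all buffers empty

def sizes():
    return [len(buf) for buf in buffers]

print(next(copies[0]), sizes())    # 1 [0, 1, 1, 1, 1]  pulled from the source
print(next(copies[1]), sizes())    # 1 [0, 0, 1, 1, 1]  served from buffer 2
print(next(copies[0]), sizes())    # 2 [0, 1, 2, 2, 2]  pulled from the source
print(next(copies[2]), next(copies[2]), sizes())  # 1 2 [0, 1, 0, 2, 2]

# Extreme case: exhaust copy 1 before touching copy 2.
(first, second), (first_buf, second_buf) = buffered_tee(range(1_000), 2)
for _ in first:
    pass
print(len(second_buf))             # 1000: copy 2's buffer holds every value
```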

If you call next() on the copies at roughly the same rate, this is very efficient. Values are removed from the buffers about as fast as they are added, so the buffers don't grow and you never have to store many values in memory.



But if you only use one of the copies, the buffers for the other copies grow larger and larger. In the extreme case where you exhaust copy 1 before touching the other copies, their buffers end up holding every value from the generator. Now you have 4 buffers with all the values, instead of just 1 list with all the values in the simple approach.






answered Nov 13 '18 at 7:39 by lhk

• Thanks for the awesome explanation. :) – Prince, Nov 13 '18 at 8:56

































The docs are saying that you could achieve the same thing by materializing your iterable into a list and then "copying" the iterable, using the list as auxiliary storage, something to the effect of:



    >>> def tee_two(iterable):
    ...     mem = list(iterable)
    ...     return iter(mem), iter(mem)
    ...
    >>> en = enumerate('abc')
    >>> next(en)
    (0, 'a')
    >>> it1, it2 = tee_two(en)
    >>> for i, x in it1:
    ...     print(i, x)
    ...
    1 b
    2 c
    >>> for i, x in it2:
    ...     print(i, x)
    ...
    1 b
    2 c


Of course, this requires materializing the rest of your iterator, and it is not memory efficient (what if you have an infinite iterator?). But as the docs state, "if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()", and in that case it is potentially not much worse memory-wise either.
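A sketch of how the docs' rule of thumb could be applied when writing the function that returns the copies. make_copies and its one_reader_first flag are hypothetical: the flag says whether one copy will read most or all of the data before another starts. In that case one shared list is used, as in tee_two above; otherwise itertools.tee() is used, which stays lazy and, unlike list(), can handle an infinite iterator as long as the copies are consumed at a similar pace.

```python
from itertools import count, islice, tee

def make_copies(iterable, n=2, one_reader_first=False):
    """Hypothetical helper applying the docs' rule of thumb.

    one_reader_first=True means one copy will read most or all of the data
    before another starts: then one shared list is used (finite input only).
    Otherwise tee() is used, which stays lazy and also handles infinite input.
    """
    if one_reader_first:
        values = list(iterable)                    # materialize the rest once
        return tuple(iter(values) for _ in range(n))
    return tee(iterable, n)                        # lazy, buffered copies

first, second = make_copies(enumerate("abc"), one_reader_first=True)
print(list(first))     # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(second))    # [(0, 'a'), (1, 'b'), (2, 'c')]

# list(count()) would never return, so an infinite iterator must use tee().
evens, more_evens = make_copies(count(0, 2))
print(list(islice(evens, 3)), list(islice(more_evens, 3)))  # [0, 2, 4] [0, 2, 4]
```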






answered Nov 13 '18 at 7:33 by juanpa.arrivillaga (edited Nov 13 '18 at 7:45)

• Much thanks @juanpa for your answer; I wish I could accept 2 answers here. – Prince, Nov 13 '18 at 9:03

































