PyTorch next(iter(training_loader)) extremely slow, simple data, can't num_workers?

























Here x_dat and y_dat are just really long 1-dimensional tensors.



class FunctionDataset(Dataset):
    def __init__(self):
        x_dat, y_dat = data_product()

        self.length = len(x_dat)
        self.y_dat = y_dat
        self.x_dat = x_dat

    def __getitem__(self, index):
        sample = self.x_dat[index]
        label = self.y_dat[index]
        return sample, label

    def __len__(self):
        return self.length

...

data_set = FunctionDataset()

...

training_sampler = SubsetRandomSampler(train_indices)
validation_sampler = SubsetRandomSampler(validation_indices)

training_loader = DataLoader(data_set, sampler=training_sampler, batch_size=params['batch_size'], shuffle=False)
validation_loader = DataLoader(data_set, sampler=validation_sampler, batch_size=valid_size, shuffle=False)


I have also tried pinning the memory for the two loaders. Setting num_workers > 0 gives me runtime errors between the processes (such as EOF errors and interruption errors). I get my batch with:



x_val, target = next(iter(training_loader))


The entire dataset would fit into memory (or onto the GPU), but I would like to emulate batches for this experiment. Profiling my process gives me the following:



16276989 function calls (16254744 primitive calls) in 38.779 seconds

Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1745/1    0.028    0.000   38.780   38.780 built-in method builtins.exec
        1    0.052    0.052   38.780   38.780 simple aprox.py:3(<module>)
        1    0.000    0.000   36.900   36.900 simple aprox.py:519(exploreHeatmap)
        1    0.000    0.000   36.900   36.900 simple aprox.py:497(optFromSample)
        1    0.033    0.033   36.900   36.900 simple aprox.py:274(train)
  705/483    0.001    0.000   34.495    0.071 built-in method builtins.next
      222    1.525    0.007   34.493    0.155 dataloader.py:311(__next__)
      222    0.851    0.004   12.752    0.057 dataloader.py:314(<listcomp>)
  3016001   11.901    0.000   11.901    0.000 simple aprox.py:176(__getitem__)
       21    0.010    0.000   10.891    0.519 simple aprox.py:413(validationError)
      443    1.380    0.003    9.664    0.022 sampler.py:136(__iter__)
  663/221    2.209    0.003    8.652    0.039 dataloader.py:151(default_collate)
      221    0.070    0.000    6.441    0.029 dataloader.py:187(<listcomp>)
      442    6.369    0.014    6.369    0.014 built-in method stack
  3060221    2.799    0.000    5.890    0.000 sampler.py:68(<genexpr>)
  3060000    3.091    0.000    3.091    0.000 tensor.py:382(<lambda>)
      222    0.001    0.000    1.985    0.009 sampler.py:67(__iter__)
      222    1.982    0.009    1.982    0.009 built-in method randperm
  663/221    0.002    0.000    1.901    0.009 dataloader.py:192(pin_memory_batch)
      221    0.000    0.000    1.899    0.009 dataloader.py:200(<listcomp>)
....


This suggests that the data loader is immensely slow compared to the rest of my experiment (training the model and lots of other computations). What's going wrong, and what would be the best way to speed this up?
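For readers who want to reproduce a report in this format, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The `train` function below is a hypothetical placeholder, not the code from `simple aprox.py`; only the profiling calls are the point.

```python
import cProfile
import io
import pstats


def train():
    # Hypothetical placeholder for the real training entry point.
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
train()
profiler.disable()

# Collect the report in a string, sorted by cumulative time as in the listing.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)  # show at most 20 rows
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the callers that dominate the run at the top, which is how `dataloader.py:311(__next__)` stands out in the listing.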










      python performance machine-learning iterator pytorch














asked Nov 13 '18 at 12:24 by ZirconCode
edited Nov 14 '18 at 4:47 by Milo Lu






















          1 Answer

When retrieving a batch with

x, y = next(iter(training_loader))

you actually create a new instance of the DataLoader iterator at each call (!), so the sampler is re-run from scratch for every single batch. See this thread for more information.
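To see why this is costly, here is a minimal pure-Python sketch. `ToyLoader` is a hypothetical stand-in, not PyTorch's `DataLoader`; it only mimics the one behaviour that matters here, namely that every new iterator reshuffles the full index list (the 222 `randperm` calls for 222 `__next__` calls in the profile point the same way).

```python
import random


class ToyLoader:
    """Hypothetical stand-in for a shuffling loader (not PyTorch's class)."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.shuffles = 0  # how many times a fresh iterator reshuffled

    def __iter__(self):
        # Like a random sampler, every new iterator reshuffles all indices.
        self.shuffles += 1
        order = list(range(len(self.data)))
        random.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            yield [self.data[i] for i in order[start:start + self.batch_size]]


def fresh_iterator_per_batch(loader, num_batches):
    # Pattern from the question: a new iterator for every batch.
    return [next(iter(loader)) for _ in range(num_batches)]


def one_iterator_per_epoch(loader, num_batches):
    # Build the iterator once, then pull batches from it.
    it = iter(loader)
    return [next(it) for _ in range(num_batches)]


slow = ToyLoader(list(range(1000)), batch_size=100)
fresh_iterator_per_batch(slow, 10)
print(slow.shuffles)  # 10

fast = ToyLoader(list(range(1000)), batch_size=100)
one_iterator_per_epoch(fast, 10)
print(fast.shuffles)  # 1
```

In this toy version the first pattern also draws every batch from a different shuffle, so samples can repeat across batches within what was meant to be one epoch.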

What you should do instead is create the iterator once (per epoch):

training_loader_iter = iter(training_loader)

and then call next on that iterator for each batch:

for i in range(num_batches_in_epoch):
    x, y = next(training_loader_iter)

I had a similar issue before, and this also made the EOF errors you experience when using multiple workers go away.
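If the training loop needs to pull batches one at a time across several epochs, one possible way to keep the "one iterator per epoch" rule is a small generator wrapper. This is only a sketch: `batches_forever` is a hypothetical helper name, and a plain list stands in for the loader so the example runs without PyTorch.

```python
def batches_forever(loader):
    """Yield batches endlessly, creating a new iterator only once per epoch."""
    while True:
        # A for loop calls iter(loader) exactly once per pass over the data.
        for batch in loader:
            yield batch


# Any re-iterable works for the illustration; a list stands in for a DataLoader.
stream = batches_forever(["batch0", "batch1", "batch2"])
first_five = [next(stream) for _ in range(5)]
print(first_five)  # ['batch0', 'batch1', 'batch2', 'batch0', 'batch1']
```

When a whole epoch is consumed in one place, a plain `for x, y in training_loader:` loop does the same job without any helper. Note that the wrapper would spin forever on an empty loader.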






answered Nov 14 '18 at 5:58 by Shai




























