PyTorch next(iter(training_loader)) extremely slow, simple data, can't use num_workers?
Here x_dat and y_dat are just really long 1-dimensional tensors.
class FunctionDataset(Dataset):
    def __init__(self):
        x_dat, y_dat = data_product()
        self.length = len(x_dat)
        self.y_dat = y_dat
        self.x_dat = x_dat

    def __getitem__(self, index):
        sample = self.x_dat[index]
        label = self.y_dat[index]
        return sample, label

    def __len__(self):
        return self.length
...
data_set = FunctionDataset()
...
training_sampler = SubsetRandomSampler(train_indices)
validation_sampler = SubsetRandomSampler(validation_indices)
training_loader = DataLoader(data_set, sampler=training_sampler, batch_size=params['batch_size'], shuffle=False)
validation_loader = DataLoader(data_set, sampler=validation_sampler, batch_size=valid_size, shuffle=False)
I have also tried pinning the memory for the two loaders. Setting num_workers to > 0 gives me run-time errors between the processes (such as EOF and interruption errors). I get my batch with:
x_val, target = next(iter(training_loader))
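(As an aside on the num_workers errors: on Windows, DataLoader worker processes are started by re-importing the main module, so EOF/BrokenPipe errors when num_workers > 0 are a common symptom of a missing entry-point guard. Whether that applies here depends on the OS, which the post does not state. A minimal sketch, with a toy dataset standing in for FunctionDataset:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy dataset standing in for FunctionDataset (hypothetical values)
    ds = TensorDataset(torch.arange(8.), torch.arange(8.))
    # Building the loader inside the guarded entry point means worker
    # processes spawned by num_workers > 0 can re-import this module
    # without re-running the training code.
    loader = DataLoader(ds, batch_size=4, num_workers=2)
    for x, y in loader:
        pass  # training step would go here

if __name__ == '__main__':
    main()
```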
The entire dataset would fit into memory/GPU, but I would like to emulate batches for this experiment. Profiling my process gives me the following:
16276989 function calls (16254744 primitive calls) in 38.779 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1745/1 0.028 0.000 38.780 38.780 built-in method builtins.exec
1 0.052 0.052 38.780 38.780 simple aprox.py:3(<module>)
1 0.000 0.000 36.900 36.900 simple aprox.py:519(exploreHeatmap)
1 0.000 0.000 36.900 36.900 simple aprox.py:497(optFromSample)
1 0.033 0.033 36.900 36.900 simple aprox.py:274(train)
705/483 0.001 0.000 34.495 0.071 built-in method builtins.next
222 1.525 0.007 34.493 0.155 dataloader.py:311(__next__)
222 0.851 0.004 12.752 0.057 dataloader.py:314(<listcomp>)
3016001 11.901 0.000 11.901 0.000 simple aprox.py:176(__getitem__)
21 0.010 0.000 10.891 0.519 simple aprox.py:413(validationError)
443 1.380 0.003 9.664 0.022 sampler.py:136(__iter__)
663/221 2.209 0.003 8.652 0.039 dataloader.py:151(default_collate)
221 0.070 0.000 6.441 0.029 dataloader.py:187(<listcomp>)
442 6.369 0.014 6.369 0.014 built-in method stack
3060221 2.799 0.000 5.890 0.000 sampler.py:68(<genexpr>)
3060000 3.091 0.000 3.091 0.000 tensor.py:382(<lambda>)
222 0.001 0.000 1.985 0.009 sampler.py:67(__iter__)
222 1.982 0.009 1.982 0.009 built-in method randperm
663/221 0.002 0.000 1.901 0.009 dataloader.py:192(pin_memory_batch)
221 0.000 0.000 1.899 0.009 dataloader.py:200(<listcomp>)
....
This suggests the data loader is immensely slow compared to the remaining activity of my experiment (training the model, lots of other computations, etc.). What's going wrong, and what would be the best way to speed this up?
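(Note: since the profile shows ~3 million `__getitem__` calls dominating the time, and the data already fits in memory, one workaround is to bypass the DataLoader's per-item path and slice the tensors with a shuffled index tensor, so each batch is a single vectorized gather. A sketch with illustrative stand-ins for the tensors returned by data_product():)

```python
import torch

# Illustrative stand-ins for the tensors returned by data_product()
x_dat = torch.randn(100_000)
y_dat = torch.randn(100_000)
batch_size = 256

# One permutation per epoch; each batch is then a single advanced-indexing
# gather instead of batch_size separate __getitem__ calls plus a collate.
perm = torch.randperm(len(x_dat))
for start in range(0, len(perm), batch_size):
    idx = perm[start:start + batch_size]
    x_batch, y_batch = x_dat[idx], y_dat[idx]
    # ... training step on (x_batch, y_batch) ...
```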
python performance machine-learning iterator pytorch
edited Nov 14 '18 at 4:47 by Milo Lu
asked Nov 13 '18 at 12:24 by ZirconCode
1 Answer
When retrieving a batch with

x, y = next(iter(training_loader))

you actually create a new instance of the dataloader iterator at each call(!) See this thread for more information.

What you should do instead is create the iterator once (per epoch):

training_loader_iter = iter(training_loader)

and then call next for each batch on the iterator:

for i in range(num_batches_in_epoch):
    x, y = next(training_loader_iter)

I had a similar issue before, and this also made the EOF errors you experience when using multiple workers go away.
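The same fix can also be written as a plain for loop over the loader, which calls iter() exactly once per epoch under the hood. A minimal sketch, with a toy dataset standing in for the one in the question:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical small dataset standing in for FunctionDataset
ds = TensorDataset(torch.arange(12.), torch.arange(12.))
loader = DataLoader(ds, batch_size=4, shuffle=True)

for epoch in range(2):
    # The for-statement calls iter(loader) once here, per epoch,
    # rather than building a fresh iterator for every batch.
    for x, y in loader:
        pass  # training step would go here
```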
answered Nov 14 '18 at 5:58 by Shai