Workaround for Python MemoryError
How can I change this function to make it more efficient? I keep getting a MemoryError:

    import numpy as np

    def vectorize_sequences(sequences, dimension=10000):
        results = np.zeros((len(sequences), dimension))
        for i, sequence in enumerate(sequences):
            results[i, sequence] = 1.
        return results
I call the function here:
    x_train = vectorize_sequences(train_data)
    x_test = vectorize_sequences(test_data)
The train and test data are the IMDB sentiment-analysis dataset, i.e.

    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
EDIT: I am running this on a 64-bit Ubuntu system with 4 GB of RAM.
Here is the traceback:

    Traceback (most recent call last):
      File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 29, in <module>
        x_test = vectorize_sequences(test_data)
      File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 20, in vectorize_sequences
        results = np.zeros((len(sequences), dimension))
    MemoryError
python keras sentiment-analysis
Looks like 2x 763 MB of data, which is not gigantic. Please post the full error message including the traceback showing the line where it happened. Please also post the details of the hardware and OS where you're running this. – John Zwinck, Nov 11 at 14:27
Basically you have two options: use less memory or make more memory available. – Klaus D., Nov 11 at 14:54
@JohnZwinck I have edited the question accordingly. Thanks. – BlueMango, Nov 11 at 15:00
asked Nov 11 at 14:20, edited Nov 11 at 14:59 – BlueMango (226)
1 Answer
Your array appears to be 10k x 10k, which is 100 million elements of 64 bits each (because the default dtype is float64). So that's 800 million bytes, i.e. about 763 MB.

If you use float32, it will cut the memory usage in half:

    np.zeros((len(sequences), dimension), dtype=np.float32)

Or, if you only care about 0 and 1, this will cut it by 88%:

    np.zeros((len(sequences), dimension), dtype=np.int8)
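Putting the int8 suggestion back into the original function gives a sketch like the one below (one possible version, not the only way to do it; Keras will generally accept integer inputs and cast them as needed, but it's worth verifying with your model). Each element is 1 byte instead of 8, so a 25,000 x 10,000 matrix drops from roughly 2 GB to roughly 250 MB:

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # int8 is enough for a 0/1 multi-hot encoding: 1 byte per element
    # instead of 8 for the default float64, an 8x memory saving.
    results = np.zeros((len(sequences), dimension), dtype=np.int8)
    for i, sequence in enumerate(sequences):
        # Set every word index that occurs in this review to 1.
        results[i, sequence] = 1
    return results

# Tiny demo with dimension=8 so the rows are easy to read:
x = vectorize_sequences([[0, 2, 5], [1, 2]], dimension=8)
print(x.nbytes)  # 16 bytes: 2 rows x 8 columns x 1 byte each
```

If even that is too large, the sequences are very sparse (a review rarely uses more than a few hundred distinct word indices), so a `scipy.sparse` matrix would shrink the footprint much further at the cost of converting to dense batches during training.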
answered Nov 12 at 4:34 – John Zwinck (150k)