Workaround for python MemoryError
























How can I change this function to make it more memory-efficient? I keep getting a MemoryError.



import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


I call the function here:



x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)


The train and test data come from the IMDB dataset for sentiment analysis, i.e.



(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)


EDIT: I am running this on a 64-bit Ubuntu system with 4 GB of RAM.



Here is the traceback:

Traceback (most recent call last):
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 29, in <module>
    x_test = vectorize_sequences(test_data)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 20, in vectorize_sequences
    results = np.zeros((len(sequences), dimension))
MemoryError




























  • Looks like 2x 763 MB of data, which is not gigantic. Please post the full error message, including the traceback showing the line where it happened. Please also post the details of the hardware and OS where you're running this.
    – John Zwinck
    Nov 11 at 14:27

  • Basically you have two options: use less memory or make more memory available.
    – Klaus D.
    Nov 11 at 14:54

  • @JohnZwinck I have edited the question accordingly. Thanks
    – BlueMango
    Nov 11 at 15:00















python keras sentiment-analysis














edited Nov 11 at 14:59

























asked Nov 11 at 14:20









BlueMango













1 Answer














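Your array has len(sequences) x 10,000 elements of 64 bits each (because the default dtype is float64). At 10k x 10k that is 100 million elements, so 800 million bytes, or about 763 MiB. Note that the Keras IMDB dataset has 25,000 reviews in each of the training and test splits, so with the full dataset each array needs 25,000 x 10,000 x 8 bytes = 2 billion bytes (about 1.86 GiB). Two such arrays do not fit in 4 GB of RAM, which is consistent with the traceback failing on the second call.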
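The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper names `dense_array_bytes` and `to_mib` are made up for illustration, and the two row counts are assumptions (10,000 matches the estimate above, 25,000 is the size of each split in the Keras IMDB dataset).

```python
FLOAT64_BYTES = 8  # NumPy's default float64 uses 8 bytes per element


def dense_array_bytes(rows, cols, itemsize=FLOAT64_BYTES):
    """Bytes needed for a dense rows x cols array with the given element size."""
    return rows * cols * itemsize


def to_mib(n_bytes):
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


# Hypothetical row counts: the estimate from the answer and the full IMDB split.
for rows in (10_000, 25_000):
    n = dense_array_bytes(rows, 10_000)
    print(f"{rows} x 10000 float64: {n:,} bytes = {to_mib(n):.0f} MiB")
```

Running the same calculation with your actual `len(train_data)` shows whether the dense array can fit in the available RAM before you try to allocate it.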



If you use float32 it will cut the memory usage in half:



np.zeros((len(sequences), dimension), dtype=np.float32)


Or if you only care about 0 and 1, this will cut it by 87.5% (one byte per element instead of eight):



np.zeros((len(sequences), dimension), dtype=np.int8)
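Put together, the function from the question might look like the sketch below. The `dtype` parameter is an addition that is not in the original function, and the two short sequences are made-up stand-ins for `train_data`, used only to keep the example small.

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000, dtype=np.int8):
    """Multi-hot encode integer sequences into a (len(sequences), dimension) array."""
    results = np.zeros((len(sequences), dimension), dtype=dtype)
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1  # set the columns named by the word indices
    return results


# Tiny illustrative input (not real IMDB data).
sample = [[1, 3, 3], [0, 2]]
x = vectorize_sequences(sample, dimension=5)
print(x)
print(x.dtype, x.nbytes)  # nbytes reports the size of the array's data buffer
```

With int8 the array takes one byte per element, so `x.nbytes` equals the number of elements. If the model later needs floating-point input, passing `dtype=np.float32` keeps the same code at four bytes per element.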

























        answered Nov 12 at 4:34









        John Zwinck



























