Limitations to using LocalCluster? Crashing persisting 50GB of data to 90GB of memory

System info: CentOS, Python 3.5.2, 64 cores, 96 GB RAM



So I'm trying to load a large array (50 GB) from an HDF5 file into RAM (96 GB). Each chunk is around 1.5 GB, which should be just under the worker memory limit. The job never seems to complete: workers sometimes crash or restart, and I don't see the memory usage on the web dashboard increasing or any tasks being executed.



Should this work, or am I missing something obvious here?



import dask.array as da
import h5py

from dask.distributed import LocalCluster, Client

lc = LocalCluster(n_workers=64)
c = Client(lc)

f = h5py.File('50GB.h5', 'r')
data = f['data']
# data.shape == (2000000, 1000)
x = da.from_array(data, chunks=(2000000, 100))
x = c.persist(x)
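
A quick way to check the arithmetic here (a minimal sketch, not part of the original question; it assumes float64 data, since the post doesn't state the dtype):

import h5py
import dask.array as da

f = h5py.File('50GB.h5', 'r')
d = f['data']

# On-disk size can be misleading: if the dataset is compressed,
# 50 GB on disk may expand to much more once loaded into RAM.
print(d.dtype, d.compression)

x = da.from_array(d, chunks=(2000000, 100))
print(x.nbytes / 1e9, 'GB in memory in total')

# One chunk spans all 2,000,000 rows and 100 columns. For float64
# that is 2000000 * 100 * 8 bytes ~= 1.6 GB per chunk, while
# LocalCluster's default memory_limit splits the machine's RAM
# evenly: 96 GB / 64 workers = 1.5 GB per worker. A chunk that
# exceeds its worker's budget can get the worker killed and
# restarted, which matches the symptoms described above.
chunk_bytes = 2000000 * 100 * x.dtype.itemsize
print(chunk_bytes / 1e9, 'GB per chunk')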

dask dask-distributed

asked Nov 13 '18 at 17:06 – dead_zero

  • 50GB is the size on-disc? – mdurant, Nov 13 '18 at 19:17
  • Have you tried loading a single chunk and calculating (using x.nbytes) how much memory it uses? – user32185, Nov 13 '18 at 20:14
  • I think this is just a misunderstanding on my part. I thought each worker would get one chunk of the Dask array, but it seems to try to load the entire array on a single worker, which triggers the memory limit and restarts that worker. – dead_zero, Nov 14 '18 at 11:52
  • @dead_zero That is exactly what it is trying to do. If your data is nicely partitioned for the calculation you want to perform, you can try the dask.array equivalent of map_partitions from dask.dataframe, or use a distributed loop. – user32185, Nov 14 '18 at 14:40
  • Ok, I'm going to mark this as answered. – dead_zero, Nov 14 '18 at 16:42
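
Picking up user32185's suggestion above: the dask.array analogue of dask.dataframe's map_partitions is map_blocks, which applies a function to each chunk independently, so a worker only needs to hold one chunk at a time. A minimal sketch (np.log1p is just a stand-in for a real per-chunk computation):

import numpy as np
import dask.array as da

# 'data' is the h5py dataset from the question; smaller chunks keep
# each task well under the per-worker memory limit.
x = da.from_array(data, chunks=(100000, 1000))

# np.log1p preserves the chunk's shape, so map_blocks can apply it
# to every chunk without any shape bookkeeping.
result = x.map_blocks(np.log1p)
result.compute()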
1 Answer

This was a misunderstanding of how chunks and workers interact. Changing how the LocalCluster is initialised fixes the issue, as described in the comments:



lc = LocalCluster(n_workers=1)  # this way one worker has ~90 GB of memory, so the array can be persisted
answered Nov 14 '18 at 16:45 – dead_zero
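
An alternative worth noting (my addition, not part of the accepted answer): keep the cluster parallel but give each worker a bigger memory budget and make the chunks smaller, so every chunk fits comfortably inside its worker's limit. The worker count and chunk shape below are illustrative.

import h5py
import dask.array as da
from dask.distributed import LocalCluster, Client

# Fewer workers means a larger default memory_limit per worker
# (96 GB / 8 workers = ~12 GB each); it can also be set explicitly.
lc = LocalCluster(n_workers=8, memory_limit='12GB')
c = Client(lc)

f = h5py.File('50GB.h5', 'r')
data = f['data']  # shape (2000000, 1000)

# 100000 * 1000 * 8 bytes ~= 0.8 GB per float64 chunk, well under
# the ~12 GB per-worker budget, so persist can spread the chunks
# across workers without tripping the memory limit.
x = da.from_array(data, chunks=(100000, 1000))
x = c.persist(x)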