Is there a default batch size that is used by MongoDB in the Bulk API?










0















I am new to MongoDB, and I am asking because I read in "How can I improve MongoDB bulk performance?" that MongoDB internally breaks the list down into 1,000 operations at a time. However, I could not find this information in the MongoDB documentation.



Does this mean that when I insert 50,000 documents into a MongoDB collection using the Bulk API, MongoDB will internally break the list into batches of 1,000 and perform the bulk insert operation 50 times? If so, would I achieve the same performance by breaking the list of 50,000 documents into sublists of 1,000 documents myself and running the bulk insert operation in a for loop? Which is the better approach?
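
For concreteness, this is roughly what I mean by the two approaches (a PyMongo sketch using insert_many as the bulk insert; the connection string, database, and collection names are just placeholders):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")    # placeholder connection string
    collection = client["mydb"]["mycollection"]           # placeholder db/collection names

    documents = [{"seq": i} for i in range(50_000)]        # stand-in for my real documents

    # Approach 1: hand the whole list to the driver in a single call.
    # collection.insert_many(documents, ordered=False)

    # Approach 2: split the list into sublists of 1,000 myself and insert in a loop.
    batch_size = 1000
    for start in range(0, len(documents), batch_size):
        collection.insert_many(documents[start:start + batch_size], ordered=False)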



Please help me understand.



Thanks.

















mongodb bulkinsert
















asked Nov 14 '18 at 9:10









Akshay

82
















  • Kind of broad to answer, really. There are existing answers here showing things like "bulk inserts". If it's one of "mine", for instance, then I would certainly show breaking up the batches manually, and the main reason I do that is so you don't build an array of 50,000 items in memory. That's really the point more than anything: the process loading the data should never have read that much into memory in the first place. If you want "performance", it's usually better to have "many worker processes" rather than "one big one".

    – Neil Lunn
    Nov 14 '18 at 9:22











  • Thanks for the reply, Neil. So I take from your reply that MongoDB does indeed break the input list of 50,000 into lists of 1,000 before performing the bulk insert. In that case, does it make sense for me to break the list of 50K documents into smaller lists of any size > 1K (say 2K), given that Mongo will again break such a list into batches of at most 1K? I believe that would just be unnecessary overhead, and that I would be better off inserting all 50K documents in a single go. Am I right?

    – Akshay
    Nov 14 '18 at 9:40











  • It makes sense not to load into memory a huge array of data that you intend to store in a remote database. That's the essence of the reasoning here. If I'm processing 10TB of data, I don't want the process loading it into a database to even attempt holding all of that in memory at once, even if a machine capable of doing so existed. Storing 10TB in a "database", however, is perfectly reasonable.

    – Neil Lunn
    Nov 14 '18 at 9:46











  • Sure, I understand that loading a huge amount of data into memory isn't a good thing to do. However, in my particular use case, my application server can receive a request that leads to the creation of a large number of objects, and these objects need to be saved efficiently in Mongo. This is the exact problem I am trying to solve. In this case I would have these objects in memory anyway, so before inserting them into the DB I was wondering whether I should break the list down into smaller chunks or insert it in one go.

    – Akshay
    Nov 14 '18 at 10:00





























1 Answer


















0














Yes. If you insert 50,000 documents into a MongoDB collection using the Bulk API, the driver will break the list into batches for you, capped by the server's maxWriteBatchSize limit (1,000 operations in older releases, 100,000 since MongoDB 3.6).
Ideally you would build the batches of 1,000 yourself and insert them in a loop, but in your case it will not make any noticeable difference, because the data is already sitting in memory.
That said, a production-ready system should not accept this much data in a single request. The client should send small chunks, which you can put on a queue on the server and process in the background (on another thread).
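
As a rough sketch of that queue-and-worker idea (PyMongo plus Python's standard queue and threading modules; the connection string, names, and chunk size are placeholders, not a prescription):

    import queue
    import threading
    from pymongo import MongoClient

    # Placeholder connection/database/collection names.
    collection = MongoClient("mongodb://localhost:27017")["mydb"]["mycollection"]
    work_queue = queue.Queue()

    def writer():
        # Background worker: drain the queue and insert each chunk with a bulk insert.
        while True:
            chunk = work_queue.get()
            if chunk is None:                 # sentinel to stop the worker
                work_queue.task_done()
                break
            collection.insert_many(chunk, ordered=False)
            work_queue.task_done()

    threading.Thread(target=writer, daemon=True).start()

    # A request handler would enqueue small chunks instead of blocking on one big insert.
    incoming = [{"seq": i} for i in range(50_000)]     # stand-in for the request payload
    for start in range(0, len(incoming), 1000):
        work_queue.put(incoming[start:start + 1000])
    work_queue.join()                                   # wait until every chunk has been written

The chunk size here is arbitrary; the point is simply that the request path hands off small pieces of work instead of holding 50,000 documents and blocking on one large insert.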






        answered Nov 14 '18 at 13:03









Sandeep





























