Is there a default batch size that is used by MongoDB in the Bulk API?
I am new to MongoDB, and I have this question because I read in the following link, "How can I improve MongoDB bulk performance?", that MongoDB internally breaks the list down into 1,000 operations at a time. However, I could not find this information in the MongoDB documentation.
Does this mean that when I insert 50,000 documents into a MongoDB collection using the Bulk API, MongoDB will internally break the list into batches of 1,000 and perform the bulk insert operation 50 times? If so, would I achieve the same performance if I broke the list of 50,000 documents into sublists of 1,000 documents myself and used the bulk insert operation in a for loop (sketched below)? Which is the better approach?
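For concreteness, here is a minimal sketch of the two approaches I am asking about (pymongo assumed; the connection string, database/collection names, and document shape are placeholders, not my real code):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["mydb"]["mycoll"]                    # placeholder collection


def insert_all_at_once(docs):
    # Approach 1: hand the whole 50,000-document list to the driver in one
    # call and let it split the list into batches internally.
    coll.insert_many(docs, ordered=False)


def insert_in_chunks(docs, chunk_size=1_000):
    # Approach 2: break the list into sublists of 1,000 myself and insert
    # each sublist in a for loop.
    for start in range(0, len(docs), chunk_size):
        coll.insert_many(docs[start:start + chunk_size], ordered=False)
```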
Please help me understand.
Thanks.
mongodb bulkinsert
asked Nov 14 '18 at 9:10
Akshay
Kind of broad to answer, really. There are existing answers here showing things like "bulk inserts". If it's one of "mine", for instance, then I would certainly show breaking up the batches manually, and the main reason I do this is so you don't build an array of 50,000 items in memory. That's really the point more than anything: your process loading data should not have read that much into memory. If you want "performance", it's usually better to have "many worker processes" rather than "one big one".
– Neil Lunn
Nov 14 '18 at 9:22
Thanks for the reply, Neil. So I take from your reply that MongoDB indeed breaks the input list of 50,000 into lists of 1,000 before performing the bulk insert operation. In that case, does it make sense for me to break the list of 50K documents into smaller lists of any size > 1K (say 2K), given that Mongo will again break such a list into batches of at most 1K? I believe that would just be unnecessary overhead, and I would be better off inserting all 50K documents in a single go. Am I right?
– Akshay
Nov 14 '18 at 9:40
It makes sense not to load into memory a huge array of data that you intend to store in a remote database. That's the essence of the reasoning here. If I'm processing 10 TB of data, I don't want the process that is loading it into a database to even attempt loading all of it into memory at once, even if there were a machine capable of doing that. Storing 10 TB in a "database", however, is perfectly reasonable.
– Neil Lunn
Nov 14 '18 at 9:46
Sure, I understand that loading a huge amount of data into memory isn't a good thing to do. However, in my particular use case, my application server can receive a request that leads to the creation of a large number of objects, and these objects need to be saved efficiently in Mongo. This is the exact problem I am trying to solve. In this case, I would already have these objects in memory anyway. So before inserting these objects into the DB, I was wondering whether I should break the list down into smaller chunks or insert it in one go.
– Akshay
Nov 14 '18 at 10:00
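For reference, a rough sketch of the chunked-loading pattern Neil describes in the comments, i.e. never materializing the full data set in memory (pymongo assumed; `read_documents()` is a hypothetical stand-in for whatever streams documents from disk or the network):

```python
from itertools import islice

from pymongo import MongoClient


def iter_chunks(iterable, size):
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


client = MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["mydb"]["mycoll"]                    # placeholder collection

# read_documents() is hypothetical: any lazy source of documents works here.
for chunk in iter_chunks(read_documents(), 1_000):
    coll.insert_many(chunk, ordered=False)
```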
1 Answer
Yes. If you insert 50,000 documents into a MongoDB collection using the Bulk API, MongoDB will break the list down into batches of at most 1,000 operations.
Ideally, you would make the batches of 1,000 yourself and do the inserts, but in your case it is not going to make any difference, because the data is already in memory.
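For illustration, a minimal sketch of that manual batching using the bulk write API (pymongo assumed; the connection and collection names are placeholders):

```python
from pymongo import InsertOne, MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["mydb"]["mycoll"]                    # placeholder collection


def bulk_insert(docs, batch_size=1_000):
    # One bulk_write call per 1,000 documents, matching the per-batch limit
    # described above, so the driver should not need to split it further by count.
    for start in range(0, len(docs), batch_size):
        ops = [InsertOne(d) for d in docs[start:start + batch_size]]
        coll.bulk_write(ops, ordered=False)
```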
You should not accept this huge an amount of data in a single request in a production-ready system. The client should be able to send small chunks of data so that you can store them in a queue on the server and process them in the background (on another thread).
answered Nov 14 '18 at 13:03
Sandeep