Is there a default batch size that is used by MongoDB in the Bulk API?
I am new to MongoDB, and I have this question because I read in the following link, "How can I improve MongoDB bulk performance?", that MongoDB internally breaks the list down into 1,000 operations at a time. However, I could not find this information in the MongoDB documentation.
Does this mean that when I insert 50,000 documents into a MongoDB collection using the Bulk API, MongoDB will internally break the list into batches of 1,000 and perform the bulk insert operation 50 times? If so, would I achieve the same performance if I broke the list of 50,000 documents into sublists of 1,000 documents myself and used the bulk insert operation in a for loop (sketched below)? Which is the better approach?
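For concreteness, here is a minimal sketch of the two approaches I am asking about (pymongo assumed; the connection string, database/collection names, and document shape are placeholders, not my real code):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["mydb"]["mycoll"]                    # placeholder collection


def insert_all_at_once(docs):
    # Approach 1: hand the whole 50,000-document list to the driver in one
    # call and let it split the list into batches internally.
    coll.insert_many(docs, ordered=False)


def insert_in_chunks(docs, chunk_size=1_000):
    # Approach 2: break the list into sublists of 1,000 myself and insert
    # each sublist in a for loop.
    for start in range(0, len(docs), chunk_size):
        coll.insert_many(docs[start:start + chunk_size], ordered=False)
```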
Please help me understand.
Thanks.
mongodb bulkinsert
asked Nov 14 '18 at 9:10
Akshay
Kind of broad to answer, really. There are existing answers here showing things like "bulk inserts". If it's one of "mine", for instance, then I would certainly show breaking up the batches manually, and the main reason I do this is so you don't build an array of 50,000 items in memory. That's really the point more than anything: your process loading data should not have read that much into memory. If you want "performance", it's usually better to have "many worker processes" rather than "one big one".
– Neil Lunn
Nov 14 '18 at 9:22
Thanks for the reply, Neil. So I take from your reply that MongoDB indeed breaks the input list of 50,000 into lists of 1,000 before performing the bulk insert operation. In that case, does it make sense for me to break the list of 50K documents into smaller lists of any size > 1K (say 2K), given that Mongo will again break such a list into batches of at most 1K? I believe that would just be unnecessary overhead, and I would be better off inserting all 50K documents in a single go. Am I right?
– Akshay
Nov 14 '18 at 9:40
It makes sense not to load into memory a huge array of data that you intend to store in a remote database. That's the essence of the reasoning here. If I'm processing 10 TB of data, I don't want the process that is loading it into a database to even attempt loading all of it into memory at once, even if there were a machine capable of doing that. Storing 10 TB in a "database", however, is perfectly reasonable.
– Neil Lunn
Nov 14 '18 at 9:46
Sure, I understand that loading a huge amount of data into memory isn't a good thing to do. However, in my particular use case, my application server can receive a request that leads to the creation of a large number of objects, and these objects need to be saved efficiently in Mongo. This is the exact problem I am trying to solve. In this case, I would already have these objects in memory anyway. So before inserting these objects into the DB, I was wondering whether I should break the list down into smaller chunks or insert it in one go.
– Akshay
Nov 14 '18 at 10:00
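For reference, a rough sketch of the chunked-loading pattern Neil describes in the comments, i.e. never materializing the full data set in memory (pymongo assumed; `read_documents()` is a hypothetical stand-in for whatever streams documents from disk or the network):

```python
from itertools import islice

from pymongo import MongoClient


def iter_chunks(iterable, size):
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


client = MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["mydb"]["mycoll"]                    # placeholder collection

# read_documents() is hypothetical: any lazy source of documents works here.
for chunk in iter_chunks(read_documents(), 1_000):
    coll.insert_many(chunk, ordered=False)
```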
1 Answer
Yes. If you insert 50,000 documents into a MongoDB collection using the Bulk API, MongoDB will break the list down into batches of at most 1,000 operations.
Ideally, you would make the batches of 1,000 yourself and do the inserts, but in your case it is not going to make any difference, because the data is already in memory.
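For illustration, a minimal sketch of that manual batching using the bulk write API (pymongo assumed; the connection and collection names are placeholders):

```python
from pymongo import InsertOne, MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection
coll = client["mydb"]["mycoll"]                    # placeholder collection


def bulk_insert(docs, batch_size=1_000):
    # One bulk_write call per 1,000 documents, matching the per-batch limit
    # described above, so the driver should not need to split it further by count.
    for start in range(0, len(docs), batch_size):
        ops = [InsertOne(d) for d in docs[start:start + batch_size]]
        coll.bulk_write(ops, ordered=False)
```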
You should not accept this huge an amount of data in a single request in a production-ready system. The client should be able to send small chunks of data so that you can store them in a queue on the server and process them in the background (on another thread).
answered Nov 14 '18 at 13:03
Sandeep