MongoDB 2.6 Shell Performance
**Note:** I have also written this up <a href="http://stackoverflow.com/q/22719211/1148648" target="_blank">in Q&A format</a> over on StackOverflow for visibility.
When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes, etc. There’s usually nothing particularly complex about this data, so a simple `for` loop generally suffices. Here is a basic example that inserts 100,000 docs:
`for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})};`
Generally, I would just copy and paste that into the <a href="http://docs.mongodb.org/manual/reference/program/mongo/" target="_blank">mongo shell</a>, and then go about using the data. With 2.4 and below, this is pretty fast. To test, I’ve simplified even more, kept it to a single field (`_id`), and added some very basic timing. Here’s the result with the 2.4 shell:
`
> db.timecheck.drop();
true
> start = new Date(); for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i})}; end = new Date(); print(end - start);
2246
`
A little over 2 seconds to insert 100,000 documents, not bad. Now, let’s try the same thing with the 2.6.0-rc2 shell:
Oh dear: over 37 seconds to insert the same number of documents, which is more than 15x slower! You might be tempted to despair and think 2.6 performance is terrible, but in fact this is just a behavioral change in the shell (I will explain that shortly). Just to make it clear that it’s not something weird caused by running things in a single line in the shell, let’s pass the same code in as a <a href="https://gist.github.com/comerford/9834062" target="_blank">JavaScript snippet</a>. This time we’ll just use the <a href="http://en.wikipedia.org/wiki/Time_%28Unix%29" target="_blank">time command</a> to measure:
2.4 shell:
`
$ time mongo ~/mongo/insert100k.js --port 31100
MongoDB shell version: 2.4.6
connecting to: 127.0.0.1:31100/test

real 0m2.253s
user 0m0.942s
sys 0m0.432s
`
2.6 shell:
`
$ time ./mongo ~/mongo/insert100k.js --port 31100
MongoDB shell version: 2.6.0-rc2
connecting to: 127.0.0.1:31100/test

real 0m34.691s
user 0m22.203s
sys 0m2.272s
`
So, no real change: things are pretty slow with the 2.6 shell. Note that I ran both against a 2.6 mongod; only the shells differ. You can, of course, work around it by using the 2.4 shell to connect to 2.6, but that is not exactly future proof.
(**UPDATE**: if anyone saw my original post, I had screwed up and run a 2.4 shell thanks to a PATH mix-up. There is no difference between passing in the file and an interactive loop.)
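To put the two `time` results on a per-document footing, here is a small calculation using only the "real" figures reported above. The variable and function names are mine, chosen for illustration, and this is just arithmetic on the measured numbers rather than a benchmark.

```javascript
// Wall-clock ("real") times from the two `time` runs above, in seconds.
var docs = 100000;
var real24 = 2.253;   // 2.4.6 shell
var real26 = 34.691;  // 2.6.0-rc2 shell

// Average cost of one insert in milliseconds.
function perDocMillis(seconds, count) {
  return (seconds * 1000) / count;
}

var ms24 = perDocMillis(real24, docs);
var ms26 = perDocMillis(real26, docs);
var slowdown = real26 / real24;

var summary = "2.4 shell: " + ms24.toFixed(4) + " ms/doc, " +
              "2.6 shell: " + ms26.toFixed(4) + " ms/doc, " +
              "ratio: " + slowdown.toFixed(1) + "x";

// print() exists in the mongo shell; fall back to console.log elsewhere.
if (typeof print === "function") { print(summary); } else { console.log(summary); }
```

That works out to roughly 0.02 ms per insert with the 2.4 shell against roughly 0.35 ms with the 2.6 shell, in line with the "more than 15x" figure from the interactive test.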
To explain: before 2.6, the interactive shell would run through the loop and only check the success (using <a href="http://docs.mongodb.org/manual/reference/command/getLastError/" target="_blank">getLastError</a>) of the last operation in the loop (more specifically, it called getLastError after each carriage return, with the last operation being the last insert in the loop). With 2.6, the shell now checks the status of each individual operation within the loop. Essentially, that means the “slowness” with 2.6 can be attributed to acknowledged versus unacknowledged write performance rather than an actual issue.
Acknowledged writes have been the <a href="http://docs.mongodb.org/manual/release-notes/drivers-write-concern/#default-write-concern-change" target="_blank">default for some time now</a>, so I think the behavior in 2.6 is more correct, though a little inconvenient for those of us used to the original behavior. We have a workaround with 2.4, but ideally we want to use the latest shell with the latest server. So the question remains: how do I do a simple bulk insert from the 2.6 shell quickly if I truly don’t care about failures?
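The answer is to use the new <a href="http://docs.mongodb.org/master/reference/method/db.collection.initializeUnorderedBulkOp/#db.collection.initializeUnorderedBulkOp" target="_blank">unordered bulk insert API</a>: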
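A minimal sketch of such a bulk insert, assuming the same `timecheck` collection as in the earlier tests. The helper name `bulkInsertSequential` is hypothetical and only there to keep the example tidy; the shell methods it relies on are `initializeUnorderedBulkOp()`, `Bulk.insert()` and `Bulk.execute()`.

```javascript
// Hypothetical helper: queue n single-field documents and send them
// to the server as one unordered bulk operation.
function bulkInsertSequential(collection, n) {
  var bulk = collection.initializeUnorderedBulkOp();
  for (var i = 0; i < n; i++) {
    bulk.insert({ "_id": i });
  }
  // execute() sends the queued inserts and returns a result document.
  return bulk.execute();
}

// This part only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  db.timecheck.drop();
  var start = new Date();
  bulkInsertSequential(db.timecheck, 100000);
  var end = new Date();
  print(end - start);
}
```

The key difference from the plain `for` loop is that the shell no longer waits on each individual insert; the documents are queued locally and the result is checked when `execute()` is called.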
Success! And essentially the same performance at just over 2 seconds. Sure, it’s a little more bulky (pardon the pun), but you know exactly what you are getting, which I think is a good thing in general. There is also an upside here, when you are not looking for timing information. Let’s get rid of that and run the insert again:
Now we get a nice result document when we do the bulk insert. Because it is an unordered bulk operation, it will continue should it encounter an error and report on each one in this document. There are none to be seen here, but it’s easy to create a failure scenario: let’s just pre-insert a value we know will come up and hence cause a duplicate key error on the (default) unique _id index:
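One way to set up that scenario is sketched below. It assumes the shell raises an exception from `execute()` when any write in the batch fails, so the call is wrapped in `try`/`catch`; the helper name `bulkInsertCollectingErrors` and the choice of 500 as the pre-inserted `_id` are illustrative only.

```javascript
// Hypothetical helper: run an unordered bulk insert of n documents and
// hand back either the result document or the error that was raised.
function bulkInsertCollectingErrors(collection, n) {
  var bulk = collection.initializeUnorderedBulkOp();
  for (var i = 0; i < n; i++) {
    bulk.insert({ "_id": i });
  }
  try {
    return { failed: false, detail: bulk.execute() };
  } catch (e) {
    // Assumption: write errors surface here as an exception.
    return { failed: true, detail: e };
  }
}

// This part only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  db.timecheck.drop();
  db.timecheck.insert({ "_id": 500 });  // guarantees one duplicate key
  var outcome = bulkInsertCollectingErrors(db.timecheck, 100000);
  print("failed: " + outcome.failed);
  // Unordered means the other inserts should still have gone through.
  print("documents in collection: " + db.timecheck.count());
}
```

Checking the collection count afterwards is a quick way to confirm that the single duplicate did not stop the rest of the batch.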
When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes etc. There’s nothing particularly complex about this data usually, so a simple `for` loop generally suffices. Here is a basic example that inserts 100,000 docs:
`for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})};`
Generally, I would just copy and paste that into the <a href="http://docs.mongodb.org/manual/reference/program/mongo/" target="_blank">mongo shell</a>, and then go about using the data. With 2.4 and below, this is pretty fast. To test, I’ve simplified even more and kept it to a single field (_id) and added some very basic timing. Here’s the result with the 2.4 shell:
`<br />
> db.timecheck.drop();<br />
true<br />
> start = new Date(); for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i})}; end = new Date(); print(end - start);<br />
2246`
A little over 2 seconds to insert 100,000 documents, not bad. Now, let’s try the same thing with the 2.6.0-rc2 shell:
``**Note:** I have also written this up <a href="http://stackoverflow.com/q/22719211/1148648" target="_blank">in Q&A format</a> over on StackOverflow for visibility.
When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes etc. There’s nothing particularly complex about this data usually, so a simple `for` loop generally suffices. Here is a basic example that inserts 100,000 docs:
`for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})};`
Generally, I would just copy and paste that into the <a href="http://docs.mongodb.org/manual/reference/program/mongo/" target="_blank">mongo shell</a>, and then go about using the data. With 2.4 and below, this is pretty fast. To test, I’ve simplified even more and kept it to a single field (_id) and added some very basic timing. Here’s the result with the 2.4 shell:
`<br />
> db.timecheck.drop();<br />
true<br />
> start = new Date(); for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i})}; end = new Date(); print(end - start);<br />
2246`
A little over 2 seconds to insert 100,000 documents, not bad. Now, let’s try the same thing with the 2.6.0-rc2 shell:
``
Oh dear – over 37 seconds to insert the same number of documents, that’s more than 15x slower! You might be tempted to despair and think 2.6 performance is terrible, but in fact this is just a behavioral change in the shell (I will explain that shortly). Just to make it clear that it’s not something weird caused by running things in a single line in the shell, let’s pass the same code in as a <a href="https://gist.github.com/comerford/9834062" target="_blank">JavaScript snippet</a>. This time we’ll just use the <a href="http://en.wikipedia.org/wiki/Time_%28Unix%29" target="_blank">time command</a> to measure:
2.4 shell:
`<br />
$ time mongo ~/mongo/insert100k.js --port 31100<br />
MongoDB shell version: 2.4.6<br />
connecting to: 127.0.0.1:31100/test`
real 0m2.253s
user 0m0.942s
sys 0m0.432s
2.6 shell:
`$ time ./mongo ~/mongo/insert100k.js --port 31100<br />
MongoDB shell version: 2.6.0-rc2<br />
connecting to: 127.0.0.1:31100/test`
real 0m34.691s
user 0m22.203s
sys 0m2.272s
So, no real change, things are pretty slow with a 2.6 shell. It should be noted that I ran both against a 2.6 mongod, only the shells I am using are different. So, of course, you can work around it by using the 2.4 shell to connect to 2.6 but that is not exactly future proof.
(**UPDATE**: if anyone saw my original post, I had screwed up and run a 2.4 shell thanks to a PATH mix up, there is no difference between passing in the file and an interactive loop).
To explain: before 2.4 the interactive shell would run through the loop and only check the success (using <a href="http://docs.mongodb.org/manual/reference/command/getLastError/" target="_blank">getLastError</a>) of the last operation in the loop (more specifically, it called getLastError after each carriage return, with the last operation being the last insert in the loop). With 2.6, the shell will now check on the status of each individual operation within the loop. Essentially that means that the “slowness” with 2.6 can be attributed to acknowledged versus unacknowledged write performance rather than an actual issue.
Acknowledged writes have been the <a href="http://docs.mongodb.org/manual/release-notes/drivers-write-concern/#default-write-concern-change" target="_blank">default for some time now</a>, and so I think the behavior in the 2.6 is more correct, though a little inconvenient for those of us used to the original behavior. We have a workaround with 2.4 but ideally we want to use the latest shell with the latest server, so the question remains, how do I do a simple bulk insert from the 2.6 shell quickly if I truly don’t care about failures?
The answer is to use the new <a href="http://docs.mongodb.org/master/reference/method/db.collection.initializeUnorderedBulkOp/#db.collection.initializeUnorderedBulkOp" target="_blank">unordered bulk insert API</a>:
```**Note:** I have also written this up <a href="http://stackoverflow.com/q/22719211/1148648" target="_blank">in Q&A format</a> over on StackOverflow for visibility.
When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes etc. There’s nothing particularly complex about this data usually, so a simple `for` loop generally suffices. Here is a basic example that inserts 100,000 docs:
`for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})};`
Generally, I would just copy and paste that into the <a href="http://docs.mongodb.org/manual/reference/program/mongo/" target="_blank">mongo shell</a>, and then go about using the data. With 2.4 and below, this is pretty fast. To test, I’ve simplified even more and kept it to a single field (_id) and added some very basic timing. Here’s the result with the 2.4 shell:
`<br />
> db.timecheck.drop();<br />
true<br />
> start = new Date(); for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i})}; end = new Date(); print(end - start);<br />
2246`
A little over 2 seconds to insert 100,000 documents, not bad. Now, let’s try the same thing with the 2.6.0-rc2 shell:
``**Note:** I have also written this up <a href="http://stackoverflow.com/q/22719211/1148648" target="_blank">in Q&A format</a> over on StackOverflow for visibility.
When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes etc. There’s nothing particularly complex about this data usually, so a simple `for` loop generally suffices. Here is a basic example that inserts 100,000 docs:
`for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})};`
Generally, I would just copy and paste that into the <a href="http://docs.mongodb.org/manual/reference/program/mongo/" target="_blank">mongo shell</a>, and then go about using the data. With 2.4 and below, this is pretty fast. To test, I’ve simplified even more and kept it to a single field (_id) and added some very basic timing. Here’s the result with the 2.4 shell:
`<br />
> db.timecheck.drop();<br />
true<br />
> start = new Date(); for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i})}; end = new Date(); print(end - start);<br />
2246`
A little over 2 seconds to insert 100,000 documents, not bad. Now, let’s try the same thing with the 2.6.0-rc2 shell:
``
Oh dear – over 37 seconds to insert the same number of documents, that’s more than 15x slower! You might be tempted to despair and think 2.6 performance is terrible, but in fact this is just a behavioral change in the shell (I will explain that shortly). Just to make it clear that it’s not something weird caused by running things in a single line in the shell, let’s pass the same code in as a <a href="https://gist.github.com/comerford/9834062" target="_blank">JavaScript snippet</a>. This time we’ll just use the <a href="http://en.wikipedia.org/wiki/Time_%28Unix%29" target="_blank">time command</a> to measure:
2.4 shell:
`<br />
$ time mongo ~/mongo/insert100k.js --port 31100<br />
MongoDB shell version: 2.4.6<br />
connecting to: 127.0.0.1:31100/test`
real 0m2.253s
user 0m0.942s
sys 0m0.432s
2.6 shell:
`$ time ./mongo ~/mongo/insert100k.js --port 31100<br />
MongoDB shell version: 2.6.0-rc2<br />
connecting to: 127.0.0.1:31100/test`
real 0m34.691s
user 0m22.203s
sys 0m2.272s
So, no real change, things are pretty slow with a 2.6 shell. It should be noted that I ran both against a 2.6 mongod, only the shells I am using are different. So, of course, you can work around it by using the 2.4 shell to connect to 2.6 but that is not exactly future proof.
(**UPDATE**: if anyone saw my original post, I had screwed up and run a 2.4 shell thanks to a PATH mix up, there is no difference between passing in the file and an interactive loop).
To explain: before 2.4 the interactive shell would run through the loop and only check the success (using <a href="http://docs.mongodb.org/manual/reference/command/getLastError/" target="_blank">getLastError</a>) of the last operation in the loop (more specifically, it called getLastError after each carriage return, with the last operation being the last insert in the loop). With 2.6, the shell will now check on the status of each individual operation within the loop. Essentially that means that the “slowness” with 2.6 can be attributed to acknowledged versus unacknowledged write performance rather than an actual issue.
Acknowledged writes have been the <a href="http://docs.mongodb.org/manual/release-notes/drivers-write-concern/#default-write-concern-change" target="_blank">default for some time now</a>, and so I think the behavior in the 2.6 is more correct, though a little inconvenient for those of us used to the original behavior. We have a workaround with 2.4 but ideally we want to use the latest shell with the latest server, so the question remains, how do I do a simple bulk insert from the 2.6 shell quickly if I truly don’t care about failures?
The answer is to use the new <a href="http://docs.mongodb.org/master/reference/method/db.collection.initializeUnorderedBulkOp/#db.collection.initializeUnorderedBulkOp" target="_blank">unordered bulk insert API</a>:
```
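As a minimal sketch of the approach (not the exact listing from my original run), the timed insert can be wrapped in a small helper that takes a collection handle. The name `timedBulkInsert` and its parameters are assumptions for illustration; `initializeUnorderedBulkOp()`, `insert()` and `execute()` are the bulk API methods linked above.

```javascript
// Hypothetical helper: queue `count` single-field documents on an unordered
// bulk operation, send them with execute(), and return the elapsed time in ms.
function timedBulkInsert(coll, count) {
    var start = new Date();
    var bulk = coll.initializeUnorderedBulkOp();
    for (var i = 0; i < count; i++) {
        bulk.insert({ "_id": i });   // queued on the client, not sent yet
    }
    bulk.execute();                  // queued operations go to the server here
    return new Date() - start;
}

// In the 2.6 shell, after db.timecheck.drop():
//   print(timedBulkInsert(db.timecheck, 100000));
```

Passing the collection in as an argument keeps the helper independent of the `timecheck` collection name used in this post.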
Success! And essentially the same performance, at just over 2 seconds. Sure, it’s a little more bulky (pardon the pun), but you know exactly what you are getting, which I think is a good thing in general. There is also an upside here when you are not looking for timing information. Let’s get rid of that and run the insert again:
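Without the timing wrapper, the interesting part is the value that `execute()` returns. A sketch along the same lines, again using a hypothetical helper name (`bulkInsertIds`) rather than my original one-liner:

```javascript
// Hypothetical helper: same unordered bulk insert, but hand back whatever
// execute() returns instead of discarding it.
function bulkInsertIds(coll, count) {
    var bulk = coll.initializeUnorderedBulkOp();
    for (var i = 0; i < count; i++) {
        bulk.insert({ "_id": i });
    }
    return bulk.execute();
}

// In the 2.6 shell the returned value is echoed when the call is typed
// at the prompt:
//   bulkInsertIds(db.timecheck, 100000);
```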
Now we get a nice result document when we do the bulk insert. Because it is an unordered bulk operation, it will continue should it encounter an error and report on each one in this document. There are none to be seen here, but it’s easy to create a failure scenario: let’s just pre-insert a value we know will come up, and hence cause a duplicate key error on the (default) unique _id index:
Now we can see how many inserts were successful and which one failed (and why). It may be a little more complicated to set up, but overall I think we can call this a win.
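A sketch of that failure scenario follows. The helper name and the choice of 500 as the pre-inserted `_id` in the usage comment are assumptions for illustration. The `try`/`catch` assumes the shell surfaces write errors from `execute()` by throwing, so the error is caught and returned for inspection.

```javascript
// Hypothetical helper: pre-insert one document so that the bulk insert
// hits a duplicate key on the unique _id index, then run the bulk insert.
function bulkInsertWithDuplicate(coll, count, dupId) {
    coll.insert({ "_id": dupId });            // the colliding document

    var bulk = coll.initializeUnorderedBulkOp();
    for (var i = 0; i < count; i++) {
        bulk.insert({ "_id": i });
    }

    try {
        return bulk.execute();                // no errors: the result document
    } catch (e) {
        return e;                             // assumed: error carries per-write details
    }
}

// In the 2.6 shell, after db.timecheck.drop():
//   bulkInsertWithDuplicate(db.timecheck, 100000, 500);
```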