MongoDB 3.0: Hitting 1.3 Million Inserts/Sec with mongorestore

This will likely be my last post about MongoDB 3.0 performance, as I am leaving MongoDB (after 3 fantastic years) to join Riot Games – more about that in later posts. Before I go, some testing I was doing to verify WiredTiger performance with the journal disabled led to some eye-catching numbers. While performing a restore using 3.0 mongorestore to a 3.0 mongod with 4 collections, I noticed I was pushing close to 300,000 inserts/sec.

MongoDB 3.0: Benefits of Compression with Limited IO

In my last post, I re-ran my compression testing from November to show the on-disk savings that compression in WiredTiger allows. The other part of that testing was to see if we could show a benefit to having compressed data from a read perspective. As I explained back in November, one of the scenarios where compression can help is when there is only limited IO capacity available. Once again, to simulate that, I used a relatively slow external USB 3.0 drive as a reasonable IO constraint and then recorded how long it took to read my entire data set from the on-disk testing into memory from that disk.
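To make the reasoning concrete, here is a rough back-of-the-envelope sketch of why less data on disk should mean a faster read when the disk is the bottleneck. The compression ratio and disk throughput below are hypothetical values picked for illustration, not measurements from my testing, and the model deliberately ignores the CPU cost of decompression.

```javascript
// Toy model: time to pull a data set off a slow disk into memory.
// Assumes the disk is the only bottleneck and ignores decompression CPU cost.
function readSeconds(onDiskBytes, diskBytesPerSec) {
  return onDiskBytes / diskBytesPerSec;
}

const GiB = 1024 * 1024 * 1024;
const MiB = 1024 * 1024;

const uncompressedBytes = 16 * GiB; // roughly the size of the ~16GB sample
const assumedRatio = 0.5;           // hypothetical: compressed data is half the size
const assumedDiskRate = 100 * MiB;  // hypothetical: 100 MiB/s sustained reads

const plain = readSeconds(uncompressedBytes, assumedDiskRate);
const compressed = readSeconds(uncompressedBytes * assumedRatio, assumedDiskRate);

console.log("uncompressed read: " + plain.toFixed(1) + "s");
console.log("compressed read:   " + compressed.toFixed(1) + "s");
```

In this simplified model the read time scales directly with the on-disk size, which is the effect the limited IO test sets out to show (or disprove) with real data.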

MongoDB 3.0: Testing Compression

Back in November, with the newly minted RC0 version of MongoDB 3.0 (then 2.8 of course), I ran a series of tests to check out how well compression using the new WiredTiger storage engine worked with a ~16GB data sample. Compression on disk worked fine, but the follow-up tests to show the benefits of using compression when limited IO is available ran into issues, and I ended up having to run the tests again with an updated version (and the appropriate bug fix).

MongoDB 2.8: Improving WiredTiger Performance

My (rather popular) first post on this topic explained the benefits of compression (which comes as the default option with the new WiredTiger storage engine) for systems with lesser IO capabilities.  The intent was to first show that the new storage engine saved space on disk and then to show that this could be translated into a gain in terms of performance when reading that data (slowly) off disk.

The first part of that story worked out pretty well: the data was nicely compressed on disk and it was easy to show it in the graph.  The second part of that story did not work out as expected: the graph was a little off from expectations and my initial speculation that it was a non-optimal access pattern didn’t pan out.  In fact, I determined that the slowness I was seeing was independent of IO and was due to how slow the in-memory piece was when using WiredTiger to do a table scan.  Needless to say, I started to talk to engineering about the issue and tried tweaking various options – each one essentially reinforced the original finding.

It was soon obvious that we had a bug that needed to be addressed (one that was still present in the first release candidate 2.8.0-rc0). I gathered the relevant data and opened SERVER-16150 to investigate the problem. Thanks to the ever excellent folks in MongoDB engineering (this one in particular), we soon had the first patched build attempting to address the issue (more, with graphs after the jump).  Before that, for anyone looking to reproduce this testing, I would recommend waiting until SERVER-16150 has been closed and integrated into the next release candidate (2.8.0-rc1). You won’t see the same results from 2.8.0-rc0 (it will instead look like the first set of results).

MongoDB 2.8 – New WiredTiger Storage Engine Adds Compression

CAVEAT: This post deals with a development version of MongoDB and represents very early testing. The version used was not even a release candidate – it was 2.7.9-pre, to be specific.  Therefore any and all details may change significantly before the release of 2.8, so please be aware that nothing is yet finalized and, as always, read the [release notes][1] once 2.8.0 ships.

Update (Nov 17th, 2014): Good news! I have re-tested with a patched version of 2.8.0-rc0 and the results are very encouraging compared to figure 2 below.  For full details (including an updated graph), see MongoDB 2.8: Improving WiredTiger Performance

Anyone that follows the keynotes from recent MongoDB events will know that we have demonstrated the concurrency performance improvements coming in version 2.8 several times now.  This is certainly the headline performance improvement for MongoDB 2.8, with concurrency constraints in prior versions leading to complex database/collection layouts, complex deployments and more to work around the per-database locking limitations.

However, the introduction of the new WiredTiger storage engine that was announced at MongoDB London also adds another capability with a performance component that has long been requested: [compression][2]

Eliot also gave a talk about the new storage engines at MongoDB London last week after announcing the availability of WiredTiger in the keynote.  Prior to that we were talking about what would be a good way to structure that talk and I suggested showing the effects and benefits of compression. Unfortunately there wasn’t enough time to put something meaningful together on the day, but the idea stuck with me and I have put that information together for this blog post instead.

It’s not a short post, and it has graphs, so I’ve put the rest after the jump.

Finding the ReadAhead Sweetspot on an EBS Volume

Setting readahead (RA from now on) appropriately is a contentious subject. There are a lot of variables involved, but in my particular case I am setting out to minimize those variables, get a baseline, and have a reasonable idea of what to expect out of this configuration:

  - Environment: Amazon EC2
  - Instance Size: m3.xlarge (4 vCPU, 15GiB RAM)
  - Disk Config: Single EBS Volume, 1000 PIOPS

The testing I am going to be doing is pretty detailed, and intended for use in a future whitepaper, so I wanted to get some prep done and figure out exactly what I was dealing with here before I moved forward.
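When comparing RA settings it helps to keep the units straight. The small sketch below assumes RA is expressed in 512-byte sectors, which is how the Linux `blockdev` tool reports it; the list of candidate values is purely hypothetical and is not the set of values used in the testing.

```javascript
// Convert a readahead value given in 512-byte sectors into KiB.
const SECTOR_BYTES = 512;

function raSectorsToKiB(sectors) {
  return (sectors * SECTOR_BYTES) / 1024;
}

// Hypothetical candidate RA settings, for illustration only.
const candidates = [16, 32, 64, 128, 256];

candidates.forEach(function (sectors) {
  console.log("RA " + sectors + " sectors = " + raSectorsToKiB(sectors) + " KiB");
});
```

Listing the candidates in KiB makes it easier to reason about how much data each setting pulls in per read relative to the size of the documents being fetched.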

MongoDB 2.6 Shell Performance

Note: I have also written this up in Q&A format over on StackOverflow for visibility.

When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes etc. There’s nothing particularly complex about this data usually, so a simple for loop generally suffices. Here is a basic example that inserts 100,000 docs:

    for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})};

Generally, I would just copy and paste that into the mongo shell, and then go about using the data.
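As a variation on that loop, the same documents can be built up as an array first and then handed to the shell in a single call. This is only a sketch: `buildDocs` and its `makeId` parameter are hypothetical names introduced here so the snippet can run outside the mongo shell (where `ObjectId` does not exist), and it makes no claim about which approach is faster.

```javascript
// Build the same kind of test documents as an array.
// makeId is a hypothetical hook: in the mongo shell, pass
// function () { return new ObjectId(); } to match the original loop.
function buildDocs(count, makeId) {
  var docs = [];
  for (var i = 0; i < count; i++) {
    docs.push({ "_id": i, "date": new Date(), "otherID": makeId() });
  }
  return docs;
}

// Outside the shell there is no ObjectId, so this stand-in returns a counter.
var next = 0;
var sample = buildDocs(1000, function () { return next++; });
console.log(sample.length + " docs built, first _id = " + sample[0]._id);

// In the mongo shell, the array could then be passed to insert() in one go:
//   db.timecheck.insert(buildDocs(1000, function () { return new ObjectId(); }));
```

Separating document generation from the insert call also makes it easy to time the two pieces independently when checking where the time is going.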

Testing MongoDB Over the WAN: Part 1

This will be my first multi-part blog post, and I am actually not sure just how many parts it will have by the time I am finished.  My original intent was to test some failure scenarios whereby I would emulate a WAN link disappearing.  That quickly expanded into a more ambitious test: I still wanted to test failure scenarios, but I also wanted to test with some real world (ish) values for latency, packet loss, jitter etc.

In particular, I wanted to see how a sharded MongoDB cluster would behave with this type of variable performance and what I could do to make things better (I have some interesting ideas there), as well as test some improvements in the 2.6 version.  I’ve created configurations like this in labs with hardware (expensive) and XenServer (paid), but I wanted something others could reproduce and reuse, preferably easily and at no cost.  Hence I decided to see if I could make this work with VirtualBox (I also plan to come up with something similar for Docker/Containers having read this excellent summary, but that is for later).

My immediate thought was to use Traffic Control but I had a vague recollection of having used a nice utility in the past that gave me a nice (basic) web interface for configuring various options, and was fairly easy to set up.  A bit of quick Googling got me to WANem and this was indeed what I had used in the past.  I recalled the major drawback at the time was that after booting it, we needed to reconfigure each time because it was a live CD.  Hence the first task was to fix that and get it to the point that it was a permanent VM (note: there is a pre-built VMWare appliance available for those on that platform).

That was reasonably straightforward, and I wrote up the process over at SuperUser as a Q&A:

Once that was done, it was time to configure the interfaces, make routing work and test that the latency and packet loss settings actually worked (continues after the jump).

More MongoDB Content – On StackExchange

As mentioned previously, I like getting my little gamification rewards and I have been meaning to add new content here for quite some time. In order to kill two birds with one stone, I took a couple of my ideas and turned them into the Q&A format that is encouraged on StackOverflow and the DBA StackExchange site. Hence we now have these two new questions (and answers):

  - How to Programmatically Pre-Split a GUID Based Shard Key with MongoDB
  - How to Determine Chunk Distribution (data and number of docs) in a Sharded MongoDB Cluster?

MongoDB Melbourne and Sydney

I recently presented at MongoDB Melbourne and MongoDB Sydney and the slides have now been made available on the 10gen website: These were not recorded, but at least the slides are now up for reference, which several people had asked about. The Amazon whitepaper should be updated shortly too, but still contains good information for reference purposes, even if it is lacking some of the newer features/options. Interestingly, my

Simulating Rollback on MongoDB

The Rollback state of MongoDB can cause a lot of confusion when it occurs.  While there are documents and posts that describe how to deal with the state, there is nothing that explains step by step how you might end up that way.  I often find that figuring out how to simulate the event makes it a lot clearer what is going on, while at the same time allowing people to figure out how they should deal with the situation when it arises.  Here is the basic outline:

  1. Create a replica set, load up some data, verify it is working as expected, start to insert data
  2. Break replication (artificially) on the set so that the secondary falls behind
  3. Stop the primary while the secondary is still lagging behind
  4. This will cause the lagging secondary to be promoted

I’ll go through each step, including diagrams to explain what is going on, after the jump.
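The outline above can be mimicked with a toy model before touching a real replica set. In the sketch below each node's "oplog" is just an array of operation numbers, the counts of writes are arbitrary, and the comparison at the end is a simplification made up for illustration. It is not how MongoDB actually implements replication or rollback; it only shows which writes end up stranded on the old primary when it rejoins.

```javascript
// Toy model of the four steps: arrays of operation numbers stand in for oplogs.
function simulateRollback() {
  var primary = [];
  var secondary = [];
  var op;

  // Step 1: replication is healthy, so both nodes receive every write.
  for (op = 1; op <= 5; op++) {
    primary.push(op);
    secondary.push(op);
  }

  // Step 2: replication is broken, so only the primary sees new writes
  // and the secondary falls behind.
  for (op = 6; op <= 8; op++) {
    primary.push(op);
  }

  // Steps 3 and 4: the primary stops and the lagging secondary is promoted.
  var oldPrimary = primary;
  var newPrimary = secondary;

  // When the old primary comes back, any operation it holds that the new
  // primary never received is what would need to be rolled back.
  var rolledBack = oldPrimary.filter(function (o) {
    return newPrimary.indexOf(o) === -1;
  });

  return { newPrimaryOps: newPrimary, rolledBack: rolledBack };
}

var result = simulateRollback();
console.log("ops to roll back: " + JSON.stringify(result.rolledBack));
```

The writes made in step 2 are the ones at risk: they were accepted by the original primary but never reached the node that was later promoted.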

Gamification – A New Twist on the Carrot and Stick

I recently realised I had become a bit obsessed with increasing my StackOverflow reputation – hardly a surprise really given how close it is to the gaming concept of XP and leveling, which has caused large parts of my life to disappear into a black hole (Civ and WoW, I am looking at you). I have been helping out on SO, mainly on the MongoDB side of things there since I joined 10gen, and it has been a great learning experience as well as a great tool to use. The self moderation means that the questions (and I hope, my answers) are generally well formed and potentially useful to others.