MongoDB 2.8: Improving WiredTiger Performance

My (rather popular) first post on this topic explained the benefits of compression (which comes as the default option with the new WiredTiger storage engine) for systems with lesser IO capabilities.  The intent was to first show that the new storage engine saved space on disk and then to show that this could be translated into a gain in terms of performance when reading that data (slowly) off disk.

The first part of that story worked out pretty well, the data was nicely compressed on disk and it was easy to show it in the graph.  The second part of that story did not work out as expected, the graph was a little off from expectations and my initial speculation that it was a non-optimal access pattern didn’t pan out.  In fact, I determined that the slowness I was seeing was independent of IO and was due to how slow the in-memory piece was when using WiredTiger to do a table scan.  Needless to say, I started to talk to engineering about the issue and tried tweaking various options – each one essentially reinforced the original finding.

It was soon obvious that we had a bug that needed to be addressed (one that was still present in the first release candidate 2.8.0-rc0). I gathered the relevant data and opened SERVER-16150 to investigate the problem. Thanks to the ever excellent folks in MongoDB engineering (this one in particular), we soon had the first patched build attempting to address the issue (more, with graphs after the jump).  Before that, for anyone looking to reproduce this testing, I would recommend waiting until SERVER-16150 has been closed and integrated into the next release candidate (2.8.0-rc1), you won’t see the same results from 2.8.0-rc0 (it will instead look like the first set of results).

Finding the ReadAhead Sweetspot on an EBS Volume

Setting readahead (RA from now on) appropriately is a contentious subject. There are a lot of variables involved, but in my particular case I am setting out to minimize those variables, get a baseline, and have a reasonable idea of what to expect out of this configuration: Environment: Amazon EC2 Instance Size: m3.xlarge (4 vCPU, 15GiB RAM) Disk Config: Single EBS Volume, 1000 PIOPS The testing I am going to be doing is pretty detailed, and intended for use in a future whitepaper, so I wanted to get some prep done and figure out exactly what I was dealing with here before I moved forward.

MongoDB 2.6 Shell Performance

Note: I have also written this up in Q&A format over on StackOverflow for visibility. When I am testing MongoDB, I often need to insert a bunch of data quickly into a collection so I can manipulate it, check performance, try out different indexes etc. There’s nothing particularly complex about this data usually, so a simple for loop generally suffices. Here is a basic example that inserts 100,000 docs: for(var i = 0; i < 100000; i++){db.timecheck.insert({"_id" : i, "date" : new Date(), "otherID" : new ObjectId()})}; Generally, I would just copy and paste that into the mongo shell, and then go about using the data.