MongoDB 3.0: Testing Compression

Back in November, with the newly minted RC0 of MongoDB 3.0 (then versioned 2.8, of course), I ran a series of tests to check how well compression with the new WiredTiger storage engine worked on a ~16GB data sample.  Compression on disk worked fine, but the follow-up tests intended to show the benefits of compression when limited IO is available ran into issues, and I ended up having to re-run them with an updated build (and the appropriate bug fix).

I’ve decided to revisit the compression testing three months on – we are now on release candidate 7 and heading for a GA release in March.  Some new settings have been exposed to tweak compression, along with a whole host of bug fixes and improvements.  I will start with the straight compression testing and leave the restricted-IO piece for later.
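As an example of the newer knobs, the block compressor can also be overridden per collection at creation time in the mongo shell. A minimal sketch, assuming a 3.0 mongod running WiredTiger (the collection name here is just illustrative):

```javascript
// Hypothetical collection name; requires a mongod started with the
// WiredTiger storage engine. This overrides the instance-wide
// blockCompressor setting for this one collection only.
db.createCollection("compressionTest", {
    storageEngine: {
        wiredTiger: { configString: "block_compressor=zlib" }
    }
});
```

This makes it possible to mix compressors within a single instance, e.g. zlib for cold archival collections and the default snappy for hot ones.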

My testing was done as follows:

  1. Create a ~16GB data set using this gist (on MMAPv1, though the storage engine does not really matter for generation)
  2. Dump out the data using mongodump
  3. Restore that same data into WiredTiger backed MongoDB with various compression options
  4. Create a pretty graph to illustrate/compare the resulting disk usage

The config files used for the processes can be found here, and all testing was done on an Ubuntu 14.10 host with an ext4 filesystem.  The only real difference from last time is the use of mongodump/mongorestore to ensure that every test uses identical data (previously I had regenerated the data for each run, so there was probably some variance).
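For reference, the compression choice lives in the storage section of the mongod config file. A minimal sketch of one of the WiredTiger variants, with a placeholder dbPath:

```yaml
# Minimal sketch; dbPath is a placeholder.
storage:
  dbPath: /data/wt-zlib
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: zlib   # snappy (the default), zlib, or none
```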

The graph speaks for itself, so here it is:


On-Disk Usage with various storage engines and compression options

I think it nicely illustrates the on-disk savings that compression in WiredTiger offers.  The savings versus a non-compressed WiredTiger configuration are impressive enough, with the compressed data taking up just 24% of the space; compared to MMAPv1 the savings are huge, at just 16% of the space.  It also shows that the MongoDB defaults for WiredTiger (represented by the WT/Snappy bar) are a good middle ground, at just over 50% of the non-compressed WiredTiger total and 34.7% of the MMAPv1 usage.

It should be noted that the trade-off for using zlib rather than snappy is significant, even with just a mongorestore happening.  The insert rate with snappy was similar to that with no compression, hovering between 65,000/sec and 67,000/sec.  With zlib, that dropped to between 22,000/sec and 25,000/sec.  Insert performance was not the target of this testing, so I mention it only as an aside – for a proper comparison, the first thing I would do is increase the number of collections being restored in parallel from 1 to 4, and then look at what else could be tweaked to improve things.
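A sketch of that first tweak, assuming the 3.0 version of mongorestore and its --numParallelCollections option (host, port, and dump path are placeholders):

```shell
# --numParallelCollections controls how many collections mongorestore
# loads concurrently; bumping it should help keep a zlib-configured
# mongod's CPUs busier during the restore.
mongorestore --host localhost --port 27018 \
    --numParallelCollections 4 /data/dump
```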

Hence, if disk usage with MongoDB is a concern for you, I would recommend starting to test 3.0 with WiredTiger as soon as possible to see if it suits your use case.

Update: I re-ran the read based tests on limited IO also, see this post for details.