In my last post, I re-ran my compression testing from November to show the on-disk savings that compression in WiredTiger allows. The other part of that testing was to see whether compressed data also shows a benefit from a read perspective. As I explained back in November, one of the scenarios where compression can help is when only limited IO capacity is available.
To simulate that constraint, I once again used a relatively slow external USB 3.0 drive, and recorded how long it took to read the entire data set from the on-disk testing into memory from that disk. The results show the expected benefit, though the WiredTiger numbers are somewhat hampered by the issue described in SERVER-16444. I may re-run these tests with a more random, index-driven read load: as well as being more representative of a normal workload, it should show greater benefits by avoiding the table-scan-related overheads.
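To illustrate the underlying mechanism (this is a minimal sketch using Python's stdlib `zlib`, not WiredTiger's actual block compressor, and the sample documents are invented for illustration): compressed data means fewer bytes to pull through a constrained IO path, at the cost of a comparatively cheap decompression step.

```python
import json
import os
import tempfile
import zlib

# Hypothetical sample data: a batch of repetitive JSON documents, similar
# in spirit to typical MongoDB collections (not the original test data set).
docs = [{"_id": i, "status": "active", "payload": "x" * 100} for i in range(1000)]
raw = json.dumps(docs).encode("utf-8")
compressed = zlib.compress(raw, level=6)

with tempfile.TemporaryDirectory() as d:
    raw_path = os.path.join(d, "data.json")
    zip_path = os.path.join(d, "data.json.z")
    with open(raw_path, "wb") as f:
        f.write(raw)
    with open(zip_path, "wb") as f:
        f.write(compressed)

    raw_size = os.path.getsize(raw_path)
    zip_size = os.path.getsize(zip_path)
    # Fewer bytes on disk means fewer bytes to read through a slow device.
    print(f"raw: {raw_size} bytes, zlib: {zip_size} bytes "
          f"({100 * (1 - zip_size / raw_size):.1f}% smaller)")

# Decompressing the smaller on-disk form recovers the identical data.
assert zlib.decompress(compressed) == raw
```

On an IO-constrained device, the time saved reading the smaller file generally outweighs the CPU time spent decompressing it, which is the effect the test below measures.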
In any case, as you can see in the graph below, the benefits are clear: a 23.8% and 34.6% reduction in the time taken to read in the compressed data using snappy and zlib respectively, compared with the non-compressed data:
Once again, I think this reinforces the choice of snappy as a good default for the WiredTiger storage engine in MongoDB.
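For anyone who wants to experiment with the trade-off themselves, the block compressor is configurable. A `mongod` config-file fragment like the following (assuming MongoDB 3.0+ with the WiredTiger engine) selects zlib in place of the snappy default:

```yaml
storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: zlib   # options: snappy (default), zlib, none
```

Individual collections can also override this at creation time via the `storageEngine` option to `createCollection`.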