Finding the Readahead Sweet Spot on an EBS Volume

Setting readahead (RA from now on) appropriately is a contentious subject.  There are a lot of variables involved, but in my particular case I am setting out to minimize those variables, get a baseline, and have a reasonable idea of what to expect out of this configuration:

  • Environment: Amazon EC2
  • Instance Size: m3.xlarge (4 vCPU, 15GiB RAM)
  • Disk Config: Single EBS Volume, 1000 PIOPS

The testing I am going to be doing is pretty detailed, and intended for use in a future whitepaper, so I wanted to get some prep done and figure out exactly what I was dealing with here before I moved forward.  The initial testing (which is somewhat unusual for MongoDB) involves a lot of sequential IO.  Normally, I am tweaking an instance for random IO and optimizing for memory utilization efficiency – a very different beast which generally means low RA settings.  For this testing, I figured I would start with my usual config (and the one I was using on a beefy local server) and do some tweaking to see what the impact was.
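For reference, readahead on Linux is set per block device and is measured in 512-byte sectors, so an RA of 32 corresponds to 16KB and an RA of 16 to 8KB.  Below is a minimal sketch of one way to flip the setting between runs; it assumes the EBS volume is attached as /dev/xvdf (adjust for your attachment point) and that you have root, since it just shells out to blockdev:

```python
import subprocess

# Assumed attachment point for the PIOPS EBS volume -- adjust as needed.
DEVICE = "/dev/xvdf"

def get_ra(device=DEVICE):
    """Current readahead in 512-byte sectors (blockdev --getra)."""
    return int(subprocess.check_output(["blockdev", "--getra", device]).strip())

def set_ra(sectors, device=DEVICE):
    """Set readahead in 512-byte sectors (blockdev --setra); needs root."""
    subprocess.check_call(["blockdev", "--setra", str(sectors), device])

if __name__ == "__main__":
    print("current RA: %d sectors" % get_ra())
    set_ra(32)  # e.g. 32 sectors = 16KB
    print("new RA: %d sectors" % get_ra())
```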

I was surprised to find a huge cliff in operations per second hitting the volume when I dropped RA to 16.  I expected the larger readahead settings to help up to a certain point because of the sequential IO (probably up to the point where I saturate the bandwidth to the EBS volume or similar), but I did not expect the “cliff” between an RA setting of 32 and one of 16.

To elaborate: one of the things I was keeping an eye on was the page faulting rate within MongoDB.  MongoDB only reports “hard” page faults, where the data actually has to be fetched off the disk.  Since I was wiping out the system caches between runs, all of the data I was reading had to come from the disk, so the fault rate should be pretty predictable, and IO was going to be my limiting factor.
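The usual way to wipe the system caches between runs on Linux is to sync and then write to /proc/sys/vm/drop_caches; a rough sketch of that step (it needs root, and the exact invocation here is an assumption rather than something from the original runs):

```python
import subprocess

def drop_caches():
    """Flush dirty pages, then drop the page cache, dentries and inodes so
    the next run has to read everything back from the EBS volume (root only)."""
    subprocess.check_call(["sync"])
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

if __name__ == "__main__":
    drop_caches()
```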

With RA set to 32, my tests took longer than at 64, 64 took longer than 128, and so on, until the results for 256 and 512 were close enough to make no difference and RA was no longer really a factor.  At 32, the faulting rate was relatively normal – somewhere around 20 faults/sec at peak and well within the capacity of the PIOPS volume to satisfy.  That was a little higher than the rate at an RA of 64, which ran at ~15 faults/sec.  I was basically just keeping an eye on it; it did not seem to be playing too big a part.
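One way to sample the same counter yourself is to poll serverStatus, which on Linux exposes the cumulative hard page fault count under extra_info.page_faults (the same counter mongostat reports in its faults column).  A rough sketch, assuming a local mongod on the default port:

```python
import time
from pymongo import MongoClient

# Assumes a local mongod on the default port.
client = MongoClient("localhost", 27017)

def hard_faults():
    """Cumulative hard page faults reported by serverStatus (Linux only)."""
    return client.admin.command("serverStatus")["extra_info"]["page_faults"]

if __name__ == "__main__":
    prev = hard_faults()
    while True:
        time.sleep(1)
        cur = hard_faults()
        print("faults/sec: %d" % (cur - prev))
        prev = cur
```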

With an RA of 16, though, things slowed down dramatically.  The fault rate spiked to over 1000 faults/sec and stayed there.  That’s a ~50x increase over the 32 RA setting, and it essentially pegs the max PIOPS I have on that volume.  Needless to say, the test takes a **lot** longer to run with the IO pegged.  To show this graphically, here are the run completion times with the differing RA settings:

mongodump test runs, using various readahead settings

TL;DR: I will be using an RA setting of 128 for this testing, and will be very careful before dropping RA below 32 on EBS volumes in EC2 in the future.

Update: A bit of digging revealed that the size of an IO request on Provisioned IOPS volumes is 16KB, which means that an RA of 32 (a 16KB readahead) matches it exactly, whereas an RA of 16 (8KB) is a bad mismatch.  Here is the relevant paragraph (from the RDS documentation):

Provisioned IOPS works with an IO request size of 16KB. An IO request smaller than 16KB is handled as one IO; for example, 1000 8KB IO requests are treated the same as 1000 16KB requests.  IO requests larger than 16KB consume more than one IO request; Provisioned IOPS consumption is a linear function of IO request size above 16KB. For example, a 24KB IO request consumes 1.5 IO requests of storage capacity; a 32KB IO request consumes 2 IO requests, etc.

Hence, once you drop below an RA of 32 (and you are doing small IO requests, like a 4KB page for example), every request still consumes a full IOP and “wastes” the difference – definitely something to bear in mind when running any memory-mapped application on EC2.
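To make the accounting concrete, here is a small sketch of the 16KB rule from the quoted documentation, plus the readahead-to-request-size conversion (RA is measured in 512-byte sectors).  It shows why an 8KB read at RA 16 is charged exactly the same as a 16KB read at RA 32 while returning half the data:

```python
def iops_consumed(request_kb, unit_kb=16):
    """IO operations charged for one request under the 16KB Provisioned IOPS
    accounting rule: <=16KB counts as one IO, larger requests scale linearly."""
    return max(1.0, request_kb / float(unit_kb))

# Readahead is measured in 512-byte sectors: RA 16 -> 8KB, RA 32 -> 16KB, ...
for ra in (16, 32, 64, 128):
    kb = ra * 512 // 1024
    print("RA %3d = %3dKB per read -> %.1f IO(s) charged" % (ra, kb, iops_consumed(kb)))

# RA 16 (8KB) is charged a full IO, the same as RA 32 (16KB), so every
# sub-16KB read wastes part of a provisioned operation.
```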