I was recently asked to come up with a solution for hosting some media files for a site. The requirements were pretty straightforward:
- Should be available via HTTP and HTTPS
- Minimal support requirements
- AWS Based
- Assets can be uploaded to the hosting with minimal effort/knowledge
- Site should be performant, particularly within Europe
The rest of the post details why I chose the solution I did to satisfy those requirements and how I went about configuring it. It includes a generalised version of the configuration on GitHub and even a diagram to illustrate!
Something I have been meaning to do for a while is switch away from WordPress. While it is quite a nice collaborative tool with some great options for WYSIWYG editing and content, it’s far too big a hammer for my modest content and collaboration needs. I don’t have guest posters, I don’t get a lot of comments, and the interactions I do get would often work better in another medium (Twitter, for example). I don’t have a lot of dynamic content, and running WordPress myself is not something I am particularly expert in.
I had been considering several options, but when I saw Steve’s work with Hugo I was sold (this was in the early days of the project, and it was already pretty awesome). The big stumbling block was what to do with my old content, and that stalled me for a while. Around the same time as I contemplated the switch, I finally obtained the comerford.net domain - I had watched it for years while someone sat on it, unused. I even made an offer for it (at what I considered a reasonable price) but heard nothing back, so I decided to wait for it to expire, which it eventually did.
This post took a while to write – when I started, it was my last day at MongoDB; by the time I finished, I was over a week into my new role at Riot Games. It was a bittersweet moment – I was very excited to move on to my new adventure, but sad to leave a company that had been such a great place to work for the last three years. I decided to write this post partly as a record of what I enjoyed most and partly as a list of some of my accomplishments – memory fades, and I have very much enjoyed reading similar posts from several years ago. In fact, I often wish I had written more.
So, here it is: some random reflections on three great years, in no particular order. Due to the length, I’ll insert a jump, so click through if you would like to read on.
I was asked to do an “About Me” introduction for my new gig, and wasn’t sure how to mention the fact that I had recently become a parent for the first time. Given that I now work for a games company, I decided to have a bit of fun with it and translate it into a more gamer-friendly format. I think it turned out pretty well: I’ve just started playing a new game called Parenting. It’s in early beta, only at version 0.5 (first released in August 2014).
This will likely be my last post about MongoDB 3.0 performance, as I am leaving MongoDB (after three fantastic years) to join Riot Games – more about that in later posts. Before I go, some testing I was doing to verify WiredTiger performance with the journal disabled led to some eye-catching numbers. While performing a restore using the 3.0 mongorestore against a 3.0 mongod with 4 collections, I noticed I was pushing close to 300,000 inserts/sec.
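For context, a headline number like that is just document count over wall-clock time. A quick sanity-check sketch (the figures below are hypothetical, not taken from the original run):

```python
# Hypothetical figures (not from the original test): suppose a restore
# loads 18 million documents across 4 collections in 60 seconds.
docs_restored = 18_000_000
elapsed_seconds = 60

rate = docs_restored / elapsed_seconds  # overall inserts per second
per_collection = rate / 4               # rough per-collection share

print(f"{rate:,.0f} inserts/sec overall")         # 300,000 inserts/sec overall
print(f"{per_collection:,.0f} inserts/sec each")  # 75,000 inserts/sec each
```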
In my last post, I re-ran my compression testing from November to show the on-disk savings that compression in WiredTiger allows. The other part of that testing was to see whether we could show a benefit from having compressed data from a read perspective. As I explained back in November, one of the scenarios where compression can help is when only limited IO capacity is available. Once again, I used a relatively slow external USB 3.0 drive to simulate reasonable IO constraints, and recorded how long it took to read my entire data set from the on-disk testing into memory from that disk.
Back in November, with the newly minted RC0 version of MongoDB 3.0 (then 2.8, of course), I ran a series of tests to check how well compression with the new WiredTiger storage engine worked on a ~16GB data sample. Compression on disk worked fine, but the follow-up tests to show the benefits of compression when limited IO is available ran into issues, and I ended up having to run the tests again with an updated version (and the appropriate bug fix).
My (rather popular) first post on this topic explained the benefits of compression (the default option with the new WiredTiger storage engine) for systems with limited IO capacity. The intent was first to show that the new storage engine saved space on disk, and then to show that this translated into a performance gain when reading that data (slowly) off disk.
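The space-saving part of the claim is easy to demonstrate in miniature. WiredTiger defaults to snappy, with zlib as the higher-ratio option; the sketch below uses Python’s stdlib zlib on synthetic, repetitive documents (the data shape is made up for illustration, not the original ~16GB sample) to show the kind of on-disk saving involved:

```python
import json
import zlib

# Build a batch of repetitive JSON documents - real-world collections tend
# to repeat field names and common values, which is why compression helps.
docs = [
    {"_id": i, "name": f"user{i % 100}", "status": "active",
     "tags": ["alpha", "beta", "gamma"]}
    for i in range(10_000)
]
raw = json.dumps(docs).encode("utf-8")

# zlib is WiredTiger's higher-ratio (but more CPU-intensive) compressor option.
compressed = zlib.compress(raw, 6)

print(f"raw: {len(raw):,} bytes, compressed: {len(compressed):,} bytes")
print(f"ratio: {len(raw) / len(compressed):.1f}x")
```

The exact ratio depends entirely on how repetitive the data is, which is why results on a real workload are worth measuring rather than assuming.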
The first part of that story worked out pretty well: the data was nicely compressed on disk, and that was easy to show in a graph. The second part did not work out as expected: the graph was a little off from expectations, and my initial speculation that a non-optimal access pattern was to blame didn’t pan out. In fact, I determined that the slowness I was seeing was independent of IO and came down to how slow the in-memory piece was when using WiredTiger to do a table scan. Needless to say, I started talking to engineering about the issue and tried tweaking various options – each one essentially reinforced the original finding.
It was soon obvious that we had a bug that needed to be addressed (one still present in the first release candidate, 2.8.0-rc0). I gathered the relevant data and opened SERVER-16150 to investigate the problem. Thanks to the ever excellent folks in MongoDB engineering (this one in particular), we soon had the first patched build attempting to address the issue (more, with graphs, after the jump). A note for anyone looking to reproduce this testing: I would recommend waiting until SERVER-16150 has been closed and integrated into the next release candidate (2.8.0-rc1) – you won’t see the same results with 2.8.0-rc0 (it will instead look like the first set of results).
CAVEAT: This post deals with a development version of MongoDB and represents very early testing. The version used was not even a release candidate – 2.7.9-pre, to be specific. Therefore any and all details may change significantly before the release of 2.8, so please be aware that nothing is yet finalized and, as always, read the [release notes] once 2.8.0 ships.
Update (Nov 17th, 2014): Good news! I have re-tested with a patched version of 2.8.0-rc0 and the results are very encouraging compared to figure 2 below. For full details (including an updated graph), see MongoDB 2.8: Improving WiredTiger Performance
Anyone who follows the keynotes from recent MongoDB events will know that we have demonstrated the concurrency performance improvements coming in version 2.8 several times now. This is certainly the headline performance improvement for MongoDB 2.8: concurrency constraints in prior versions led to complex database/collection layouts, complex deployments, and more, all to work around the per-database locking limitations.
However, the introduction of the new WiredTiger storage engine that was announced at MongoDB London also adds another capability with a performance component that has long been requested: [compression]
Eliot also gave a talk about the new storage engines at MongoDB London last week, after announcing the availability of WiredTiger in the keynote. Prior to that, we had discussed how best to structure that talk, and I suggested showing the effects and benefits of compression. Unfortunately there wasn’t enough time to put something meaningful together on the day, but the idea stuck with me, and I have put that information together for this blog post instead.
It’s not a short post, and it has graphs, so I’ve put the rest after the jump.
Setting readahead (RA from now on) appropriately is a contentious subject. There are a lot of variables involved, but in my particular case I set out to minimize those variables, get a baseline, and have a reasonable idea of what to expect out of this configuration:
- Environment: Amazon EC2
- Instance Size: m3.xlarge (4 vCPU, 15GiB RAM)
- Disk Config: Single EBS Volume, 1000 PIOPS

The testing I am going to be doing is pretty detailed, and intended for use in a future whitepaper, so I wanted to get some prep done and figure out exactly what I was dealing with before moving forward.
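One unit gotcha worth noting when comparing RA values: `blockdev --getra` reports readahead in 512-byte sectors, while the kernel exposes the same setting in KiB via `/sys/block/<dev>/queue/read_ahead_kb`. A small sketch of the conversion (the helper name is mine):

```python
# Readahead sectors are always 512 bytes for 'blockdev --getra/--setra',
# regardless of the device's actual block size.
SECTOR_BYTES = 512

def ra_sectors_to_kib(sectors: int) -> int:
    """Convert a 'blockdev --getra' readahead value (sectors) to KiB,
    matching what /sys/block/<dev>/queue/read_ahead_kb would show."""
    return sectors * SECTOR_BYTES // 1024

# The common Linux default of 256 sectors is 128 KiB of readahead.
print(ra_sectors_to_kib(256))  # 128
print(ra_sectors_to_kib(32))   # 16
```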