Amazon EC2 persistent storage
Werner Vogels, CTO of Amazon, has just written about Amazon’s latest feature addition: persistent storage for Amazon EC2. The announcement comes a few weeks after Amazon announced static IPs (Elastic IPs) and Availability Zones, which let you specify where an instance runs when you create it.
Persistent storage for Amazon EC2 will be offered as storage volumes that you can mount on your EC2 instance as a raw block storage device. It basically looks like an unformatted hard disk. The first time you mount a volume you can format it with any file system you want. Alternatively, advanced applications such as high-end database engines can use the raw device directly.
From the post, the technology sounds like some sort of home-grown SAN. There are, however, some limitations: a volume can only be mounted by one instance at a time and, more annoyingly, it is only available within one Availability Zone. One nice and unexpected feature, though, is the ability to store snapshots of a volume in S3 and then create volumes in other Availability Zones from that snapshot.
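To make these constraints concrete, here is a toy, in-memory Python model of the rules as the post describes them: a volume can be attached to only one instance at a time and only within its own Availability Zone, while a snapshot can seed new volumes in any zone. It is a sketch under those assumptions only. The class and function names (`Volume`, `Instance`, `attach`, `snapshot`, `create_volume`) and the zone names are hypothetical and are not the EC2 API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Toy model of the volume rules described in the post.
# NOT the EC2 API: every name here is hypothetical.

_ids = count(1)


class VolumeError(Exception):
    """Raised when an operation breaks one of the modelled rules."""


@dataclass
class Snapshot:
    snapshot_id: str
    data: bytes  # point-in-time copy of the volume contents (kept "in S3")


@dataclass
class Volume:
    volume_id: str
    zone: str
    data: bytes = b""
    attached_to: Optional[str] = None  # at most one instance at a time


@dataclass
class Instance:
    instance_id: str
    zone: str


def create_volume(zone: str, snapshot: Optional[Snapshot] = None) -> Volume:
    """Create a blank volume, or one restored from a snapshot, in any zone."""
    data = snapshot.data if snapshot is not None else b""
    return Volume(volume_id=f"vol-{next(_ids)}", zone=zone, data=data)


def attach(volume: Volume, instance: Instance) -> None:
    """Attach a volume, enforcing the two limitations from the post."""
    if volume.attached_to is not None:
        raise VolumeError(
            f"{volume.volume_id} is already attached to {volume.attached_to}")
    if volume.zone != instance.zone:
        raise VolumeError(
            f"{volume.volume_id} is in {volume.zone}, "
            f"instance is in {instance.zone}")
    volume.attached_to = instance.instance_id


def detach(volume: Volume) -> None:
    """Release the volume so another instance in the same zone can use it."""
    volume.attached_to = None


def snapshot(volume: Volume) -> Snapshot:
    """Take a point-in-time copy of the volume's contents."""
    return Snapshot(snapshot_id=f"snap-{next(_ids)}", data=volume.data)


if __name__ == "__main__":
    # Replicate a volume from zone-a into zone-b by way of a snapshot.
    db = Instance("i-a", "zone-a")
    standby = Instance("i-b", "zone-b")
    vol = create_volume("zone-a")
    attach(vol, db)
    vol.data = b"database files"
    copy = create_volume("zone-b", snapshot(vol))
    attach(copy, standby)  # allowed: the copy lives in zone-b
    print(copy.volume_id, copy.zone, copy.data)
```

In this model, restoring a snapshot in a second zone is how the single-zone restriction is worked around: the new volume is an independent copy, so later writes to either volume are not seen by the other.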
I think it is an important step forward, and the pace of development at Amazon is impressive… but I’m really getting annoyed by features that are missing something or come with constraints. What gets to me the most is that only one instance can mount a given volume at any time. A truly distributed file system that allowed multiple running instances to use it concurrently would really blow me away.
I guess the intended context is really a database server: you would have one storage volume per zone, backing the MySQL slave in that zone. Another scenario would be an Apache Solr master instance that uses the volume as persistent storage for its Lucene index and replicates out to slaves that just store their copies on the transient EC2 drive.
Filed under: Virtualization





We’ve been testing the new storage volumes for a while and they really change the game. You can do a lot more than run databases; see http://blog.rightscale.com/2008/04/13/ for some ideas. The Amazon folks are on a roll!
With the addition of storage volumes there’s no doubt in my mind anymore: cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines. Cloud computing is going to be as significant for deployment as agile is for software development. Either you compute in the cloud or you’ll be left behind by competitors who can deploy faster, better, and cheaper than you can.
Yup, there’s always room for version #2… That said, I don’t think it would make sense to mount a storage volume across Availability Zone boundaries: that would defeat the whole point of Availability Zones, because you’d be coupling two zones together. And with the ability to snapshot your volume and then fire up a new one in a different zone, you have an amazing tool available.
To get the effect of mounting a volume read-only on multiple instances, you can use snapshots and cloning. I’ve described some ideas around this on my blog at http://blog.rightscale.com/2008/04/13/ and when you think about the scale that becomes possible, it is really breathtaking.