We have the same setup, use local ephemeral disks on EC2 with Postgres. We never...

fdr · on Jan 27, 2012

I'm glad you like wal-e. I tried rather hard to make it as easy as I could to set up.

Please send feedback or patches, I'm happy to work with someone if they had an itch in mind.

If one has a lot of data, EBS becomes much more attractive because swapping the disks in the case of a common failure (instance goes away) is so much faster than having to actually duplicate the data at the time, presuming no standby. Although a double-failure of ephemerals seems unlikely and the damage is hopefully mitigated by continuous archiving, the time to replay logs in a large and busy database can be punishing. I think there is a lot of room for optimization in wal-e's wal-fetching procedure (pipelining and parallelism come to mind).

Secondly, EBS seek times are pretty good: one can, in aggregate, control a lot more disk heads via EBS. The latency is a bit noisy, but last I checked (not recently) considerably better than what RAID-0 on ephemerals for some instances would allow one to do.

Thirdly, EBS volumes are sharing with one's slice of the network interface on the physical machine. That means larger instance sizes can have less noisy-neighboring effects and more bandwidth overall, and RAID 1/1+0 are going to be punishing. I'm reasonably sure (but not 100% sure) that mdadm is not smart enough to let a disk with decayed performance "fall behind", demoting it from an array, using a mirrored partner in preference. Overall, use RAID-0 and archiving instead.

When an EBS volume suffers a crash/remirroring event they will get slow, though, and if you are particularly performance sensitive that would be a good time to switch to a standby that possesses an independent copy of the data.

[0]: http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...