We have the same setup: local ephemeral disks on EC2 with Postgres. We never even tried EBS, as we had heard too many negative things about it, chiefly its variance in performance.
So our approach is to RAID-10 four local volumes together. We then replicate to at least three slaves, all of which are configured identically and any of which can become master in the event of a failover.
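In case it helps anyone reproduce the layout, here is a minimal sketch of striping and mirroring four ephemerals with mdadm; the device names, filesystem, and mount point are assumptions and vary by instance type:

    # assemble four ephemeral disks into one RAID-10 array (device names assumed)
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
    mkfs.xfs /dev/md0
    mount -o noatime /dev/md0 /var/lib/postgresql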
We use WAL-E [0] to ship WAL logs to S3. WAL-E is totally awesome. Love it!

[0] https://github.com/heroku/WAL-E
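For reference, wiring WAL-E into Postgres looks roughly like the following; the envdir path and values here are assumptions, not our exact production config:

    # postgresql.conf: ship each completed WAL segment to S3 via WAL-E
    wal_level = archive
    archive_mode = on
    archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'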
I'm glad you like wal-e. I tried rather hard to make it as easy as I could to set up.
Please send feedback or patches; I'm happy to work with someone if they have an itch in mind.
If one has a lot of data, EBS becomes much more attractive because swapping the disks in the case of a common failure (instance goes away) is so much faster than having to actually duplicate the data at the time, presuming no standby. Although a double-failure of ephemerals seems unlikely and the damage is hopefully mitigated by continuous archiving, the time to replay logs in a large and busy database can be punishing. I think there is a lot of room for optimization in wal-e's wal-fetching procedure (pipelining and parallelism come to mind).
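For context, replay from the archive is driven by restore_command in recovery.conf, something roughly like this (envdir path assumed), and that one-segment-at-a-time fetch loop is exactly where pipelining and parallel downloads would pay off:

    # recovery.conf: pull archived WAL segments back from S3, one per call
    restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'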
Secondly, EBS seek times are pretty good: one can, in aggregate, control a lot more disk heads via EBS. The latency is a bit noisy, but last I checked (not recently) it was considerably better than what RAID-0 on ephemerals would allow on some instance types.
Thirdly, EBS volumes share one's slice of the network interface on the physical machine. That means larger instance sizes see fewer noisy-neighbor effects and get more bandwidth overall, while RAID 1/1+0, which doubles the write traffic over that shared link, is going to be punishing. I'm reasonably sure (but not 100% sure) that mdadm is not smart enough to let a disk with decayed performance "fall behind", demoting it from the array and preferring its mirrored partner. Overall, use RAID-0 and archiving instead.
When an EBS volume suffers a crash/remirroring event it will get slow, though, and if you are particularly performance sensitive that would be a good time to switch to a standby that possesses an independent copy of the data.
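Promoting the standby at that point is a one-liner on 9.1 (data directory path assumed):

    # on the standby: end recovery and start accepting writes
    pg_ctl promote -D /var/lib/postgresql/9.1/main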
Lots and lots of replication, more or less. I don't know how it works in Postgres, but with something like Mongo you set up a replica set and presume that just under half of the nodes can fail while the remainder still maintain uptime. Postgres, being a relational database rather than a document store, likely has an additional set of challenges to overcome there, but it's very possible to do.
You just copy the WAL log to another server and replay it; it takes a day to set up and test. Once that is in place you have two options: asynchronous replication (which means you'll lose roughly the last 100 ms of data in the event of a crash) or synchronous replication, where a transaction doesn't commit until the WAL is replicated on the other server (that adds latency but doesn't really affect throughput).
I'm not exactly sure how the failover system works in Postgres; the last time I set up replication it would only ship a WAL segment after it was fully written, but I know they have a much more fine-grained (streaming) system now.
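From what I've read, 9.1 streams WAL continuously rather than waiting for full segments, and the async/sync choice above comes down to a couple of settings, roughly like this (the standby name and host are placeholders):

    # primary's postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3
    synchronous_standby_names = 'standby1'   # leave empty for async

    # standby's recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com application_name=standby1'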
If you use SQL Server you can add a third, monitoring (witness) server, and your connections fail over to the new master pretty much automatically as long as you add the second server to your connection string. The setup with a third server can create some very strange failure modes, though.
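The connection-string side is just naming the failover partner, something along these lines (server and database names are placeholders):

    Server=sql-primary;Failover Partner=sql-mirror;Database=mydb;Integrated Security=True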
Another possibility on AWS is to put the WAL archives on S3, which offers very high durability. Heroku has published a tool for doing this, https://github.com/heroku/WAL-E, which they use to manage their database product.
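Besides the per-segment WAL shipping, WAL-E also pushes periodic base backups to S3, e.g. nightly from cron (paths are assumptions):

    # push a full base backup of the data directory to S3
    envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.1/main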
If this is for Postgres, refer to this:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
See the section "Starting Replication with only a Quick Master Restart"; I tested this with a DB under 10 GB, and the setup time for replication was on the order of minutes. On a related note, it would be awesome if someone could share real-world experiences with Postgres 9.1 synchronous replication.
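For anyone curious, the "quick restart" that section refers to is a single master restart to enable WAL streaming; after that the copy happens online, roughly like this (hostnames and paths are placeholders, the wiki page has the authoritative steps):

    # master: set wal_level = hot_standby, max_wal_senders, wal_keep_segments,
    # then do the one quick restart

    # copy the data directory to the standby while the master stays up
    psql -c "SELECT pg_start_backup('clone', true)"
    rsync -a --exclude pg_xlog /var/lib/postgresql/9.1/main/ standby:/var/lib/postgresql/9.1/main/
    psql -c "SELECT pg_stop_backup()"

    # standby: write a recovery.conf pointing at the master, then start it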