There are a number of datastores that claim to shard and replicate automatically, with no worries for dev/ops/devops.
They've been lying. Never trust them.
Some datastores can actually do this, but performance per beefy server is less than you'd expect. You can use Riak, but you have to write proper CRDTs. You can use ZooKeeper or etcd, but those are for small amounts of configuration data, not for large amounts of customer data.
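To give a sense of what "writing proper CRDTs" means, here's a toy grow-only counter (a G-Counter) in Python. The structure is the textbook one; the class itself is just an illustrative sketch, not Riak's actual API:

```python
class GCounter:
    """Grow-only counter CRDT.

    Each node only ever increments its own slot; merging takes the
    per-node maximum, so replicas converge to the same total no matter
    in what order merges happen.
    """
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count contributed by that node

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        # The logical counter value is the sum of all nodes' contributions.
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # which is what makes this safe to replicate asynchronously.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

The point is that the *application* has to pick a data type whose merges can't lose information; a plain integer with last-write-wins would silently drop concurrent increments.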
For all the datastores that claim to do everything automatically and still have great performance, we can thank Aphyr for providing proof that they don't live up to their promises, where the rest of us could only suspect it.
I'd suggest trying to use a simpler model, and understand and accept its failure modes. Maybe your app has to go into read-only mode for a few hours if there's a server failure, etc.
>I'd suggest trying to use a simpler model, and understand and accept its failure modes. Maybe your app has to go into read-only mode for a few hours if there's a server failure, etc.
I'm fine with failure modes like that. I just want it to be automated. I don't want to come home from a trip and find that my database master has fallen over and the database slave has been patiently waiting for me to manually promote it for the last few days. I could probably rig up some cron jobs and shell scripts to automate this, but this is what I'm looking for something to do for me that's hopefully written by people smarter than me.
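The cron-job approach described above might look something like this sketch. Every name here is a hypothetical stub (a real version would query replication status and run the datastore's own promote command), and it deliberately ignores fencing/split-brain, which is exactly the hard part that makes "written by people smarter than me" appealing:

```python
import time

def is_healthy(node):
    """Stub health check (hypothetical): a real version would ping the
    node or query its replication status."""
    return node.get("healthy", False)

def promote(node):
    """Stub promotion (hypothetical): a real version would invoke the
    datastore's failover/promote command."""
    node["role"] = "master"

def failover_check(master, replica, grace_seconds=30):
    """Promote the replica if the master stays unhealthy past a grace period.

    Returns whichever node should be treated as master afterwards.
    NOTE: without fencing the old master, this can cause split-brain;
    that is precisely the part this sketch leaves out.
    """
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        if is_healthy(master):
            return master  # master recovered within the grace period
        time.sleep(1)
    promote(replica)
    return replica
```

Run from cron every minute or so, this at least avoids the "slave waited patiently for days" scenario, at the cost of owning the split-brain risk yourself.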
Networking is the real performance bottleneck these days. CPUs, memory, and SSDs are plenty fast and cheap. If your cluster isn't on at least 10-gig switched Ethernet, you're probably nowhere near the potential performance limit.
It's not about performance and bottlenecks, it's about the CAP theorem: a distributed system can provide at most two of Consistency, Availability, and Partition tolerance.
Consistency: you want all your cluster nodes to have the same data.
Availability: you want to be able to lose one or more nodes and keep serving requests.
Partition tolerance: in case of a net-split (think IRC), the split parts of your cluster can keep working in isolation, then heal when the link comes back up.
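A toy quorum register makes the tradeoff concrete: with N replicas, picking write and read quorum sizes W and R such that R + W > N forces the quorums to overlap, buying consistency at the cost of availability whenever too many replicas are unreachable. This is a simulation sketch, not any real datastore's implementation:

```python
class QuorumStore:
    """Toy N-replica register with tunable write/read quorums (W, R).

    If R + W > N, every read quorum intersects every write quorum, so a
    read always sees the newest acknowledged value. If R + W <= N, the
    system stays available with fewer reachable replicas, but stale
    reads become possible -- the consistency side of the CAP tradeoff.
    """
    def __init__(self, n, w, r):
        self.n, self.w, self.r = n, w, r
        self.replicas = [(0, None)] * n  # (version, value) per replica
        self.version = 0

    def write(self, value, reachable):
        """Write to W reachable replicas; fail if the quorum can't be met."""
        targets = [i for i in reachable if i < self.n][: self.w]
        if len(targets) < self.w:
            raise RuntimeError("not enough replicas for write quorum")
        self.version += 1
        for i in targets:
            self.replicas[i] = (self.version, value)

    def read(self, reachable):
        """Read from R reachable replicas; return the newest value seen."""
        targets = [i for i in reachable if i < self.n][: self.r]
        if len(targets) < self.r:
            raise RuntimeError("not enough replicas for read quorum")
        return max(self.replicas[i] for i in targets)[1]
```

With N=3, W=2, R=2 a read always overlaps the last write; with W=1, R=1 you get speed and availability, but a read can land on a replica the write never reached.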
When someone sells you auto-healing and cluster reliability, they are selling you AP, which means you lose the C, something we all take for granted. Cassandra is one of those. Think of what you can't do when each of your nodes can hold different data for the same record.
Sorry for the useless explanation if you already knew that.
>When someone sells you auto-healing and cluster reliability, they are selling you AP, which means you lose the C, something we all take for granted. Cassandra is one of those. Think of what you can't do when each of your nodes can hold different data for the same record.
This is pretty hyperbolic. Netflix does perfectly fine with this model, given that they run Cassandra at its lowest consistency level[1]. If they can reliably store watch histories, recommendations, settings, and playlists on this model, I wonder what you have in mind when you say "think of what you can't do". Besides, it's not like large AP systems are a new thing: have you ever overdrawn your account?
This is the most hand-wavy, unrigorous talk on a distributed database I've ever seen. You run a test 5 times in optimistic conditions and that gives you confidence that "you can trust it" to replicate your writes?
There are a multitude of failure cases in which it cannot replicate those writes. Ultimately your choice of database has to be a decision based on the availability and consistency needs of your use case, period. "Trust" should never come into the discussion at all; you should be well aware of what your tradeoffs mean in the worst case.