In the stack I am working on we have a variety of databases all serving a different type of data storage:
- memcache: for caching of data that doesn't need to persist
- redis: for caching of data that needs to persist short term but not long term (e.g. sessions)
- MySQL: for user-like data (account details, addresses, projects, ...)
- DynamoDB: for millions of data points that only need to be queried along one dimension, so they are not related or compared to one another. e.g. give me all values from this table with a given datatype, between two dates
- MongoDB: for millions of data points that need to be queried at deeper levels
- etc.
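To make the "one dimension" point concrete, here is a toy in-memory sketch (not real DynamoDB code; the field names are illustrative) of the access pattern described above: items are looked up by a single key plus a date range, with no joins or cross-item comparisons.

```python
from datetime import date

# Toy stand-in for a DynamoDB-style table: each item is independent,
# and every query is "one datatype, one date range" -- nothing more.
points = [
    {"datatype": "temperature", "day": date(2023, 5, 1), "value": 21.4},
    {"datatype": "temperature", "day": date(2023, 5, 3), "value": 19.8},
    {"datatype": "humidity",    "day": date(2023, 5, 2), "value": 0.61},
]

def query(datatype, start, end):
    """Return all points of one datatype between two dates (inclusive)."""
    return [p for p in points
            if p["datatype"] == datatype and start <= p["day"] <= end]

results = query("temperature", date(2023, 5, 1), date(2023, 5, 2))
```

In real DynamoDB this shape maps onto a partition key (the datatype) plus a sort key range (the dates), which is exactly why the data never needs to be compared across items.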
Great, thank you for the detailed overview. If you don't mind me asking, do you ever end up having problems with consistency? (I imagine it could happen if some data was written to 2 databases and the second one rejected the transaction.)
Well, the downside of using multiple DB stores is that the logic of keeping everything consistent is in the hands of the developer. So you have to make sure that everything is written correctly.
For instance, if you write to MySQL and Mongo, but Mongo is down, you'll either have to queue the data item somewhere and write it once the system is back up, or have a migration system in place that replays everything written to MySQL since the downtime back into Mongo.
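The queue-and-replay option can be sketched roughly as below. This is a minimal illustration with in-memory stand-ins for both stores (the class and variable names are made up, not a real client API): the primary write always happens, a failed secondary write is parked in a queue, and the queue is drained once the secondary is back.

```python
import queue

class SecondaryDownError(Exception):
    pass

# Hypothetical stand-in for a Mongo client; can be toggled "down".
class MongoStub:
    def __init__(self):
        self.up = True
        self.docs = []
    def insert(self, item):
        if not self.up:
            raise SecondaryDownError("mongo unavailable")
        self.docs.append(item)

mysql_rows = []           # stand-in for the primary (MySQL) write
mongo = MongoStub()       # stand-in for the secondary (Mongo) write
retry_queue = queue.Queue()

def dual_write(item):
    mysql_rows.append(item)      # primary write must succeed
    try:
        mongo.insert(item)       # best-effort secondary write
    except SecondaryDownError:
        retry_queue.put(item)    # park it until the secondary is back

def drain_retries():
    """Replay queued writes once the secondary is reachable again."""
    while not retry_queue.empty():
        mongo.insert(retry_queue.get())

mongo.up = False
dual_write({"id": 1})     # lands in MySQL, queued for Mongo
mongo.up = True
drain_retries()           # Mongo catches up
```

A production version would persist the retry queue itself (otherwise a process restart loses the pending writes), which is why the alternative, replaying from MySQL by timestamp, is often the safer fallback.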
Depending on the type of data we have a few mitigating factors: for some data stores it is not that big a deal if a write to the 2nd layer (e.g. cache) fails, as we can rewrite it the next time it is requested from layer 1.
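That self-healing behaviour is essentially a read-through cache. A minimal sketch, with plain dicts standing in for the cache layer and the MySQL layer (the key format is made up for illustration):

```python
cache = {}                             # stand-in for memcache/redis (layer 2)
primary = {"user:1": {"name": "Ada"}}  # stand-in for MySQL (layer 1)

def get(key):
    """Read-through: a lost or failed cache write is repaired by
    reloading from layer 1 and rewriting the cache on the way out."""
    if key in cache:
        return cache[key]
    value = primary[key]   # authoritative copy
    cache[key] = value     # heal the cache for the next request
    return value
```

Because the authoritative copy lives in layer 1, a dropped cache write costs one extra primary read, not an inconsistency.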