NoSQL Data Modeling Techniques

michaelmior · on Oct 18, 2016

This post provides a great general-purpose overview of common data modeling techniques. I'm going to take the opportunity to share some of my work in the area. NoSE[0] is a tool I've been building to automate the data modeling process for NoSQL systems. The models it produces cover many of the techniques mentioned in the article.

The high-level idea is that, given a model of the data an application wants to store and a description of its workload, NoSE will suggest a data model for a particular NoSQL store. Currently I've only tested this out with Cassandra, although most of the work to support MongoDB is in place as well.

Happy to work with anyone who may be interested in trying it out :)

[0] https://michael.mior.ca/projects/NoSE/

bogomipz · on Oct 19, 2016

This is very interesting. What was the impetus for starting this, a particular data migration?

michaelmior · on Oct 19, 2016

The idea is that there are a number of complicated tradeoffs that must be made when designing a schema. See [0] for an example; note that every rule comes with an immediate caveat. The result is that you have to be an expert to come up with a great design for a particular system. Even then, if your workload is complex, you might miss a non-obvious choice which will outperform your manual selection.

The goal of NoSE is to automate the process by estimating the cost of executing a workload against a wide range of schema designs. We can then select the one which is likely to perform the best. As for data migration, moving from a relational database is a reasonable use case. Some of the work I'm doing right now is looking at how you can transition between different schemas in a denormalized database. However, our initial use case was to build something that would allow a non-expert user to design a good schema.
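To make the idea concrete, here is a toy sketch of cost-based schema selection in Python. It is not NoSE's actual API or cost model: the class names, query names, weights, and per-query costs are all made up for illustration, and a real tool would have to enumerate the candidate schemas and estimate the costs itself rather than take them as input.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    weight: float  # hypothetical relative frequency of this query in the workload


@dataclass
class Schema:
    name: str
    # Hypothetical estimated cost of one execution of each query, keyed by query name.
    query_costs: dict


def workload_cost(schema, workload):
    """Weighted sum of the estimated per-query costs under the given schema."""
    return sum(q.weight * schema.query_costs[q.name] for q in workload)


def pick_schema(candidates, workload):
    """Return the candidate schema with the lowest estimated workload cost."""
    return min(candidates, key=lambda s: workload_cost(s, workload))


# Made-up workload: mostly reads, with occasional updates.
workload = [
    Query("get_user_by_id", 0.7),
    Query("get_orders_by_user", 0.2),
    Query("update_user_email", 0.1),
]

# Made-up candidates: denormalizing makes the order lookup cheaper
# but the update more expensive, since more copies must be rewritten.
normalized = Schema(
    "normalized",
    {"get_user_by_id": 1.0, "get_orders_by_user": 5.0, "update_user_email": 1.0},
)
denormalized = Schema(
    "denormalized",
    {"get_user_by_id": 1.0, "get_orders_by_user": 1.0, "update_user_email": 4.0},
)

best = pick_schema([normalized, denormalized], workload)
print(best.name, workload_cost(best, workload))
```

With these particular numbers the denormalized candidate wins, but making updates a larger share of the workload would flip the result, which is exactly the kind of tradeoff that is hard to judge by hand.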

[0] http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeli...

dacm · on Oct 18, 2016

It should be noted in the title that this post is from 2012.

bogomipz · on Oct 18, 2016

This is a great survey of the modern data stores landscape. The cartoon at the top is hilarious as well. Thanks for sharing.