Cassandra isn't designed for this. Been there, done that.
Postgres all the way.
The FT use(d) Cassandra to store their membership details (6 million rows of largely static, read-only, well-structured data). The cluster was massive (12+ nodes in at least two regions, from memory), slow, and impossible to upgrade reliably.
The support from DataStax is shite. Backups are not reliable, imports even less so, and you are beta testing the whole system every time you do a point release.
CMSs have highly structured data. Swallow your pride, map the data and build a proper schema. It's really not that hard.
Yes, Cassandra has a graph layer; no, it's really not worth it. Yes, it has Gremlin; no, you shouldn't need it if you've modelled your data correctly.
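To make "map the data and build a proper schema" concrete, here's a minimal sketch of what a relational CMS schema might look like. The tables (author, article, tag, article_tag) are hypothetical, and sqlite3 stands in for Postgres only so the sketch runs without a server; the DDL is close to what you'd write in Postgres. Relationships like "article has tags" live in a plain join table, which is the kind of modelling that removes the need for a graph layer.

```python
import sqlite3

# Hypothetical CMS schema: table and column names are illustrative only.
# sqlite3 is used as a stand-in for Postgres so this runs without a server.
SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE article (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title     TEXT NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Many-to-many link: the "graph edges" are just rows in a join table.
CREATE TABLE article_tag (
    article_id INTEGER NOT NULL REFERENCES article(id),
    tag_id     INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (article_id, tag_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical CMS schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def articles_with_tag(conn: sqlite3.Connection, tag_name: str) -> list:
    """Return titles of articles carrying the given tag, via ordinary joins."""
    rows = conn.execute(
        """
        SELECT a.title
        FROM article a
        JOIN article_tag link ON link.article_id = a.id
        JOIN tag t ON t.id = link.tag_id
        WHERE t.name = ?
        ORDER BY a.title
        """,
        (tag_name,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = build_db()
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'Alice')")
    conn.executemany(
        "INSERT INTO article (id, author_id, title, body) VALUES (?, 1, ?, '')",
        [(1, "Cassandra post-mortem"), (2, "Why Postgres")],
    )
    conn.executemany(
        "INSERT INTO tag (id, name) VALUES (?, ?)",
        [(1, "databases"), (2, "ops")],
    )
    conn.executemany(
        "INSERT INTO article_tag (article_id, tag_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (1, 2)],
    )
    print(articles_with_tag(conn, "databases"))
```

A "traversal" such as "which articles share a tag with this one" is just one more self-join on article_tag; no graph engine needed for data shaped like this.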
Cassandra doesn't have graph support; it's just a sorted, distributed, nested key/value system.
DataStax builds a graph layer on top, or you can use something like JanusGraph, but it's never as good as a real graph database with a natively designed storage system.
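A toy model of the "sorted, distributed, nested key/value" description, as a sketch under loud simplifications: the class and method names are hypothetical, placement is a simple hash-mod over a fixed node count (real Cassandra uses a token ring, by default with the Murmur3 partitioner, plus replication, none of which is modelled here), and md5 is used only because it ships with Python. The point it shows: the outer key picks a node, inside a partition rows are kept sorted by clustering key, and the only efficient query is a contiguous slice within one partition. Anything graph-shaped has to be faked on top of that.

```python
import bisect
import hashlib
from collections import defaultdict


class ToyWideColumnStore:
    """Hypothetical toy model: partition key -> node, rows sorted by clustering key."""

    def __init__(self, num_nodes: int = 3):
        # One dict per "node": partition key -> sorted list of (clustering_key, value).
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def _node_for(self, partition_key: str) -> int:
        # Simplification: hash-mod placement, not Cassandra's token ring.
        digest = hashlib.md5(partition_key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, partition_key: str, clustering_key: str, value) -> None:
        """Upsert one row, keeping the partition sorted by clustering key."""
        rows = self.nodes[self._node_for(partition_key)][partition_key]
        # (key,) sorts before (key, value), so this finds the first row with key >= clustering_key.
        i = bisect.bisect_left(rows, (clustering_key,))
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)  # same key: overwrite
        else:
            rows.insert(i, (clustering_key, value))

    def slice(self, partition_key: str, start: str, end: str) -> list:
        """Rows in one partition with start <= clustering_key < end, in sorted order."""
        rows = self.nodes[self._node_for(partition_key)].get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


if __name__ == "__main__":
    store = ToyWideColumnStore()
    store.put("user:42", "2019-03-01", "login")
    store.put("user:42", "2019-01-15", "signup")
    store.put("user:42", "2019-02-10", "purchase")
    print(store.slice("user:42", "2019-01-01", "2019-03-01"))
```

There are no joins and no traversals in this model: following an "edge" means another lookup by partition key, possibly on another node, which is roughly why bolted-on graph layers struggle compared with storage designed for graphs.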
What were the issues? You're the first person I've heard complain about Cassandra for this. Agreed that any database can handle that type of data at that volume.
It's a 50/50 split between terrible design and horrible support.
I've seen Cassandra shine when it comes to write-optimised loads. Pipe a bucketload of data into Gremlin and magic happens.
But that's a specific workload, which is pretty rare, and certainly not suited to a 999:1 read-to-write ratio. That's not Cassandra's fault; that's the fault of the idiot who chose it, and the boatload of idiots who carried on and added loads of systems that make it much harder to migrate away.
Then we come to support. DataStax is the de facto support provider. They make a lifecycle manager and a backup/restore tool, and push a load of patches into the main codebase. But it is shit:
o Backups fail silently.
o The only way to make alerts work (i.e. trigger an action, rather than create a popup when you log into OpsCenter) requires work to navigate the API.
o It's full of CVEs, which are script-kiddie-able.
o Migrating data from backups to new clusters was impossible without a boatload of manual work, and failed 50% of the time.
o Restoring from automated backups was impossible until August.
o It couldn't use instance profiles on AWS until August.
o Upgrading to a point release silently breaks backups, _always_.
Basically, I spent the first half of this year QAing very expensive software. There are some very, very good support people, but there were some terrible ones as well.
Is this all about their enterprise tools? I can't say I've used any of these.
For backups, we found a tool on GitHub to snapshot to S3. It worked fine as far as I know. It's the guys in the office next to me who were handling this, not me, but I never heard of any major issue.