
Anyone have a recommendation for a NoSQL database?

https://news.ycombinator.com/item?id=23253870

(not Mongo obviously)



This question sounded familiar - turns out I replied to it in another thread: https://news.ycombinator.com/item?id=23286054

To repeat my (non)answer:

There is no way to recommend a NoSQL database without knowing what you need it for, because NoSQL databases are highly specialized systems. If you need a general-purpose database, use an SQL one.

It's kind of a weird question, now that I think about it. Why would anyone seek out a database based on what it doesn't have?


I'd actually say the reverse. SQL databases are highly specialised datastores: they make sense if you need one particular transaction model and one particular query language and are prepared to coerce your data into one particular model to do so.

If you're starting from just "I need to store some data" I'd look to e.g. Riak or Cassandra before looking to an SQL database.
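To make that concrete: the "just store some data" starting point in Cassandra is basically a key/value table. A minimal sketch in CQL (table and column names are mine, purely illustrative):

    CREATE TABLE blobs (
        key   text PRIMARY KEY,   -- you only ever look things up by key
        value blob                -- opaque payload; the schema lives in the app
    );

    INSERT INTO blobs (key, value)
    VALUES ('user:42', textAsBlob('{"name": "ann"}'));

    SELECT value FROM blobs WHERE key = 'user:42';

Riak is the same idea minus the query language: buckets of keys mapping to opaque values.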


SQL DBs are not specialized.... they're incredibly general...

You are never starting from "I need to store some data"; you're always starting from "I need to store and read some data". Otherwise /dev/null would work fine, since you're never going to read the data back.

The problem with Cassandra and Riak is precisely the read side, which quickly degrades the performance of those systems.

I've used both Cassandra and PostgreSQL at scales most companies never reach. Cassandra I'd only touch for immutable time-series data, and only if that data was too large to fit on a single server and I didn't care about consistency. Everything else goes in a SQL RDBMS.
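For anyone curious, the immutable time-series shape I mean looks roughly like this in CQL (names and the per-device/per-day partitioning are illustrative, not a recipe):

    CREATE TABLE readings (
        device_id text,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((device_id, day), ts)  -- partition by device+day, cluster by time
    ) WITH CLUSTERING ORDER BY (ts DESC);

    -- reads stay bounded to a single partition
    SELECT ts, value
      FROM readings
     WHERE device_id = 'sensor-7' AND day = '2020-05-20';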


For simple reads, the SQL model forces significantly worse performance: MySQL benchmarks found that 75% of the time for a primary-key lookup was spent parsing the SQL. For more complex querying, SQL databases can be fast... and they can also be extremely slow, and you can't tell which you'll get for any given query just by looking at it.
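For concreteness, the parse cost that benchmark is measuring is the part a prepared statement skips. A Postgres-flavoured sketch (table name is illustrative):

    -- parsed once per session
    PREPARE get_user (bigint) AS
        SELECT * FROM users WHERE id = $1;

    -- subsequent lookups skip the parse step
    EXECUTE get_user(42);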

The much-vaunted consistency comes at a significant cost: index updates block writes, and more insidiously, it's very easy to be surprised by a deadlock or a stale transaction with a long-running query. I've seen an SQL database stop committing any new writes because someone ran a seemingly innocuous query 23 days ago. And a lot of the time - including every web use case I've seen - you can't actually make any real use of those consistency guarantees.
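If anyone is debugging that situation: the stale transaction is usually findable from the server side. A Postgres-specific sketch (thresholds are arbitrary):

    -- list transactions that have been open suspiciously long
    SELECT pid, state, xact_start, query
      FROM pg_stat_activity
     WHERE xact_start < now() - interval '1 hour'
     ORDER BY xact_start;

    -- or have the server kill idle-in-transaction sessions automatically
    SET idle_in_transaction_session_timeout = '10min';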

Writing either a transformation pipeline that serves the same function as a secondary index, or a deliberate map-reduce style aggregation, takes more up-front effort. But it means you understand what's actually going on a lot more clearly and are much less likely to hit that kind of unpleasant surprise.
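To show what I mean by a pipeline standing in for a secondary index: in Cassandra the usual pattern is a second, denormalized table keyed by the field you want to query, maintained alongside the main table at write time. A sketch in CQL (table names and values are mine):

    CREATE TABLE users_by_id    (id uuid PRIMARY KEY, email text, name text);
    CREATE TABLE users_by_email (email text PRIMARY KEY, id uuid);

    -- every write maintains both tables; the application owns the "index"
    BEGIN BATCH
        INSERT INTO users_by_id (id, email, name)
        VALUES (5f1b3e1c-0000-0000-0000-000000000042, 'ann@example.com', 'Ann');
        INSERT INTO users_by_email (email, id)
        VALUES ('ann@example.com', 5f1b3e1c-0000-0000-0000-000000000042);
    APPLY BATCH;

    -- reads by email go through the query table
    SELECT id FROM users_by_email WHERE email = 'ann@example.com';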


I wonder, given your experience, did you ever try FaunaDB? It grew out of the less-than-optimal experience of scaling databases like Cassandra at Twitter. Consistent + relational + multi-region.



