It probably scales but how is the performance? If I need to load a couple billio...

arjunnarayan · on May 10, 2017

[Cockroach Labs engineer here]

For just a couple billion rows and a dozen joins, a single node will suffice, and you'll get linear speedup as you add more machines. The caveat is that you really want at least 3 nodes, because CockroachDB is built for replication and fault tolerance, and you're not getting that with a single-node cluster.

Your performance on a single node should be on the same order of magnitude as doing this in Postgres right now. We are rapidly closing that gap, and intend to close it completely for TPC-H style queries, while retaining the linear performance speedup with more nodes.

The reason this gap isn't already closed is that, for 1.0, we've been focused on transactional performance in distributed, fault-tolerant situations rather than on analytics performance. There is a lot of low-hanging optimization fruit in analytics scenarios that we are only just getting started on.

gflarity · on May 10, 2017

Hi, Cockroach Labs engineer,

On the feature FAQ, joins are described as 'functional', which doesn't inspire a lot of confidence, but maybe it's just a perception thing. What exactly does functional mean?

A SQL db without joins sounds a lot like just a NoSQL db with a familiar query dialect.

arjunnarayan · on May 10, 2017

If you are using Joins in an OLTP setting, everything should work absolutely as you might expect.

"Functional" is our caveat that if you run Joins across your data in an OLAP setting, it will work, but it may not be the most performant Join possible. For example, our query planner does not currently plan merge joins even if the appropriate secondary indices exist. So past a certain point (joining ~billions of rows of data) it is no longer as performant as it could be. We expect to roll out this particular fix within 6 months. However, optimizing 4- or 5-way nested Joins in OLAP-cube style settings isn't something we're going to be performant at for years. We need a lot more infrastructure built up before we start solving the kinds of problems revealed by, say, the Join Order Benchmark paper (http://www.vldb.org/pvldb/vol9/p204-leis.pdf).
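
For readers unfamiliar with the term, here is a rough sketch of what a merge join does when both inputs already arrive sorted on the join key (which is what a suitable secondary index would provide). This is a generic textbook illustration in Python, not CockroachDB's planner or execution code; the function name `merge_join` and the sample rows are made up for the example.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists that are ALREADY sorted by `key`.

    Illustrative only: a generic merge join, not any database's real code.
    Yields (left_row, right_row) pairs whose keys are equal.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1  # left key is behind, advance the left cursor
        elif lk > rk:
            j += 1  # right key is behind, advance the right cursor
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    yield l_row, r_row
            i, j = i_end, j_end


# Hypothetical sample data, sorted by the first column (the join key).
orders = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
customers = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
pairs = list(merge_join(orders, customers))
```

Because each input is walked once in key order, no hash table or repeated scan of the other side is needed, which is why the sorted order from an index matters at the billions-of-rows scale discussed above.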

MichaelBurge · on May 10, 2017

Thanks for your response. It sounds like CockroachDB might be an alternative to setting up an RDBMS for read replication once you need many connections.