
> SQL constraints are very limited

Oh really? What exactly cannot be expressed in a PostgreSQL check constraint, given that you can write it in any programming language you prefer?
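For instance (a minimal sketch; the table and function names are made up), a check constraint can call a function written in PL/pgSQL, or in any other procedural language PostgreSQL supports:

    -- Hypothetical validator; the body could equally be PL/Python, PL/Perl, etc.
    CREATE FUNCTION is_valid_iban(iban text) RETURNS boolean AS $$
    BEGIN
        -- Real IBAN validation elided; any logic the language can express works.
        RETURN iban ~ '^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}$';
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

    CREATE TABLE account (
        iban text NOT NULL CHECK (is_valid_iban(iban))
    );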




Arbitrary constraints that look at multiple rows require serializable isolation to work correctly under concurrency, and most people do not run their databases at serializable. But you can get pretty far with single-row check constraints and exclusion constraints (exclusion constraints can be used to check for range overlaps).
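For example (a sketch assuming PostgreSQL; the booking table is made up), an exclusion constraint can reject overlapping reservations with no application code:

    -- btree_gist is needed so '=' on an int column can be used in a gist index.
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE booking (
        room   int     NOT NULL,
        during tsrange NOT NULL,
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );

Two bookings for the same room with overlapping ranges will fail at insert time, and unlike hand-rolled checks this is enforced correctly under any isolation level.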


There's no need for the serializable isolation level in the common case. It depends on the particular data model, the particular access pattern, and the particular database implementation (MVCC or locking, and how exactly the transaction isolation levels are implemented; e.g. Oracle's "serializable" is closer to PostgreSQL's "repeatable read").

Anyway, any sufficiently modern RDBMS provides a pretty good level of performance even at the serializable isolation level, thanks to decades of tuning and research in the field.


Not in the common case, but it is necessary in the general case. One of the main motivations I have heard from the few people actually using serializable in their systems is the ability to enforce arbitrary constraints under parallelism.

And, yes, performance should in general still be good, but there is less knowledge out there about how to solve the performance issues specific to serializable since there are few people who use it (at least in the PostgreSQL world).


Well, not that many people use serializable across their whole system (for every transaction), and that is a common and reasonable approach. They use the minimal isolation level, per transaction, that keeps their data consistent. The most widely used safe default is repeatable read, in PostgreSQL terms.
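In PostgreSQL that per-transaction choice looks like this (a sketch; the statements are placeholders):

    -- Pay for a stronger level only where a transaction needs it:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    -- ... statements that need a stable snapshot ...
    COMMIT;

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- ... statements that enforce cross-row invariants ...
    COMMIT;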

Broadly speaking, of course there are some quirks in the field; a good starting point is "A Critique of ANSI SQL Isolation Levels" [1] by Jim Gray et al., an author of the highly respected fundamental book on transactions [2]. But the very kind of problems discussed in the RDBMS world contrasts rather sharply with the "ACID? Why do we need it?" attitude that is so common in the world of "web scale" NoSQL.

[1] http://dl.acm.org/citation.cfm?id=223785

[2] https://www.amazon.com/Transaction-Processing-Concepts-Techn...


> "using serializable in their systems is the ability to enforce arbitrary constraints under parallelism"

That's a funny way to put it. :-) Because what it really means is "wanting to enforce a total lack of parallelism when under parallelism." I mean, I get what you're saying; it was just funny to read.


Can you give a use case for such a constraint that cannot be avoided by better data organization? I'm not sure why you would deliberately design your data with constraints that depend on the contents of other rows.

Also: how do you enforce such constraints with MongoDB exactly? If the answer is "do it in the application", then your answer applies to relational databases too.


A common example is making sure that every event in a bookkeeping system balances to zero (e.g. three rows: -125 EUR bank, +100 EUR office materials, +25 EUR tax). To enforce this at the database level you either need to run at SERIALIZABLE, take a table lock, or do something horrible with your database structure (like putting all rows of the event in a JSON blob).
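A minimal sketch of why this bites (assuming PostgreSQL 11+; the table, function, and trigger names are made up): a deferred constraint trigger can check the sum at commit time, but the SELECT inside it cannot see rows from concurrent uncommitted transactions, so two sessions appending to the same event can each pass the check and jointly break the invariant unless you run at SERIALIZABLE or take a lock.

    CREATE TABLE entry (
        event_id bigint  NOT NULL,
        account  text    NOT NULL,
        amount   numeric NOT NULL
    );

    CREATE FUNCTION check_event_balance() RETURNS trigger AS $$
    BEGIN
        IF (SELECT sum(amount) FROM entry
            WHERE event_id = NEW.event_id) <> 0 THEN
            RAISE EXCEPTION 'event % does not balance to zero', NEW.event_id;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    -- Deferred so the checks run at commit time,
    -- after all rows of the transaction are in place.
    CREATE CONSTRAINT TRIGGER event_must_balance
        AFTER INSERT OR UPDATE ON entry
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW EXECUTE FUNCTION check_event_balance();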

When I have solved this case, I have done it in the application, by making sure that only a small set of carefully audited stored procedures is used to modify the event table.

MongoDB does not even try to solve this kind of problem, and I am the wrong guy to ask if you want someone to try to make a case for using it anywhere.


> A common example is making sure that every event in a bookkeeping system balances to zero (e.g. three rows: -125 EUR bank, +100 EUR office materials, +25 EUR tax). To enforce this at the database level you either need to run at SERIALIZABLE, take a table lock, or do something horrible with your database structure (like putting all rows of the event in a JSON blob).

Or you deny direct table inserts and provide a stored procedure that inserts all rows of an event in one transaction, as sketched below. But I see how it could be useful in the general case.
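Something like this (a sketch; the names and the privilege setup are made up; SECURITY DEFINER so callers do not need direct INSERT rights on the table):

    CREATE SEQUENCE event_id_seq;

    CREATE FUNCTION insert_event(accounts text[], amounts numeric[])
    RETURNS bigint AS $$
    DECLARE
        eid bigint;
    BEGIN
        IF (SELECT sum(a) FROM unnest(amounts) a) <> 0 THEN
            RAISE EXCEPTION 'event does not balance to zero';
        END IF;
        eid := nextval('event_id_seq');
        INSERT INTO entry (event_id, account, amount)
        SELECT eid, u.account, u.amount
        FROM unnest(accounts, amounts) AS u(account, amount);
        RETURN eid;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

    -- The procedure becomes the only write path into the table.
    REVOKE INSERT, UPDATE, DELETE ON entry FROM PUBLIC;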


Which is exactly what I did when I had to solve the problem. :) And you are right to mention that being able to restrict modification of the database to only certain stored procedures is a great help here.


With ON COMMIT triggers or deferrable check constraints it would be easier, but in Oracle you can use a materialized view with "on commit refresh" and add triggers to this MV.
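Roughly like this (a sketch from memory, assuming an ENTRY table as above; fast-refresh MVs have version-specific restrictions, and here a CHECK constraint on the MV stands in for the trigger, a common variant of the trick):

    CREATE MATERIALIZED VIEW LOG ON entry
        WITH ROWID (event_id, amount) INCLUDING NEW VALUES;

    CREATE MATERIALIZED VIEW event_balance
        REFRESH FAST ON COMMIT AS
        SELECT event_id, SUM(amount) AS total, COUNT(*) AS n
        FROM entry
        GROUP BY event_id;

    -- The refresh happens at COMMIT, so an unbalanced event
    -- violates the constraint and the transaction rolls back.
    ALTER TABLE event_balance
        ADD CONSTRAINT event_balances_to_zero CHECK (total = 0);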


Actually, you don't have to avoid such constraints if they are natural to the data model. It's much faster (and safer, thanks to transactions) to check new data against existing data when the checking code is as close to the data as possible, instead of making a couple of extra round trips between the DBMS and the business logic layer.



