
On the contrary, a model that is solid enough that you can continue to improve and build features on it for decades without changing the fundamentals is an extremely rare triumph in this industry. The signal that “something is wrong” is that you have to reinvent the foundation every five years (e.g., the tottering stack of false starts inside all web browsers).



sql has stood the test of time, but we've also figured out how to extend sql databases and the relational model so that they work efficiently in a lot more scenarios. there are column stores that do incredible compression, in-memory row stores, and you can put json in a sql database and query it efficiently.


Features aren’t being “built on it”, they’re being bolted on because the model is insufficiently powerful, which is my point. To be clear, relational databases are great and important, but the relational model will have to be replaced.


There are two issues I know of with SQL where it isn't super great IMHO.

1) Hierarchical data. The relational model is fine here, but the query syntax isn't awesome.

2) Understanding remote data. This is outside the scope of the data model, but it does affect the way software is built.

Neither of these show any evidence that the relational model should be replaced.


> There are two issues I know of with SQL where it isn't super great IMHO.

There are many more issues, you've just gotten used to eating SQL turds. Hierarchical data is solved by CTEs, so that's not a big deal.
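
For example, a recursive CTE walks a parent/child hierarchy in standard SQL. A minimal sketch, assuming a hypothetical employees table with a self-referencing manager_id column:

    -- hypothetical table: employees(id, name, manager_id)
    with recursive reports as (
        select id, name, manager_id
        from employees
        where manager_id is null              -- start at the root(s)
        union all
        select e.id, e.name, e.manager_id
        from employees e
        join reports r on e.manager_id = r.id -- descend one level per iteration
    )
    select * from reports;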

The bigger deal is that relations themselves are impoverished second-class citizens. This means you can't write a query, bind it to a variable and then reuse that query to build another query or store that query in a table column. Something like:

    var firstQuery = select * from Foo where ...
    var compositeQuery = select * from firstQuery where ...

    create table StoredQuery(id int not null primary key, query relation)
    insert into StoredQuery values (0, firstQuery)
That's something like relations as first-class values. This replaces at least 3 distinct concepts in SQL: views, CTEs, and temporary tables, all of which were added to address SQL's expressiveness limitations. Relations as first-class values would not only improve the expressiveness of SQL beyond those 3 constructs, they would also solve many annoying domain problems that currently require a lot of boilerplate.
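
For comparison, the closest you get today is restating the intermediate result as a CTE or a view everywhere you want to reuse it. A rough sketch, where Foo, payment and name are hypothetical placeholders:

    -- the "first query" has to be restated as a CTE in every consumer
    with first_query as (
        select * from Foo where payment > 100
    )
    select * from first_query where name like 'A%';

    -- or persisted as a view, which is a schema object, not a value you can store in a column
    create view first_query_v as
        select * from Foo where payment > 100;

    select * from first_query_v where name like 'A%';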

Then there's pervasive NULL, inconsistent function and value semantics across implementations, and a few other issues.


I like the first-class-relations idea.

As for pervasive NULLs, isn't that more the fault of schema design?

There are a lot of things wrong with SQL. Little things like the fact that INSERT and UPDATE have different syntax for no good reason.
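
For example, the same column/value pairs have to be spelled two different ways (users is a hypothetical table):

    -- INSERT lists the columns and the values separately...
    insert into users (id, name, email)
    values (1, 'Ada', 'ada@example.com');

    -- ...while UPDATE pairs them up as column = value
    update users
    set name = 'Ada', email = 'ada@example.com'
    where id = 1;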

A few of its syntax quirks annoy me as they're nonstandard in today's languages, but it's only a shallow complaint: using '<>' for its inequality operator, and using single-quotes for strings.

I'm also disappointed in the implementations in various ways: the way Microsoft SQL Server sends query text over the wire unencrypted in its default configuration; the way Firebird SQL has such a basic wire protocol that you can see significant performance enhancements by invoking TRIM on text-type fields; the way the optimisers are so damn primitive, especially compared to the baffling wizardry that goes on in today's (programming language) compilers.

Somewhat off topic further ranting:

But the core relational model makes good sense. I see little general value in the freeform graph-databases calling themselves 'NoSQL'. (Do we call functional programming languages "No-assignment"?)

Perhaps some of them can scale well, but can't SQL do that? Google Cloud Datastore, for instance - it can scale marvellously, but only because it imposes considerable constraints on its queries. Can't we do the same thing with an SQL subset?


> As for pervasive NULLs, isn't that more the fault of schema design?

I think NULL defaults are widely regarded as a bad thing by now. You should have to declare what's nullable, not declare what's not null.
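
That is, in standard SQL a column silently admits NULLs unless you remember to opt out. A minimal sketch with a hypothetical table:

    create table orders (
        id    int primary key,        -- PRIMARY KEY implies NOT NULL
        total decimal(10,2),          -- nullable by default, whether you meant it or not
        note  varchar(200) not null   -- the opt-out you have to remember to write
    );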

> There are a lot of things wrong with SQL. Little things like the fact that INSERT and UPDATE have different syntax for no good reason.

Moreover, SELECT should be at the end, not the beginning. Query comprehensions and LINQ did this right.

And yes, the implementation inconsistencies are seriously irritating as well.

> Google Cloud Datastore, for instance - it can scale marvellously, but only because it imposes considerable constraints on its queries. Can't we do the same thing with an SQL subset?

If you extend Map-Reduce with a Merge phase, then you can implement the relational algebra with joins [1]. That scales pretty well.

http://cs.brown.edu/courses/cs295-11/mapreducemerge.pdf


> Moreover, SELECT should be at the end not the beginning. Query comprehensions and LINQ did this right.

I'd one-up this and argue that all SQL clauses should appear in order of execution (within reason). Moving SELECT to the end is definitely a good start.
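
Roughly, the written order already disagrees with the logical evaluation order (a simplified sketch; t is a hypothetical table):

    -- written:   select .. from .. where .. group by .. having .. order by .. limit
    -- evaluated: from -> where -> group by -> having -> select -> order by -> limit
    select dept, count(*) as n   -- 5th
    from t                       -- 1st
    where active                 -- 2nd
    group by dept                -- 3rd
    having count(*) > 10         -- 4th
    order by n                   -- 6th
    limit 5;                     -- 7th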


Mixing execution concerns and the language seems like a bad idea.

It's pretty much impossible to decide what an optimised query should look like on different databases with different data.

Slow network? Fast disks? GPU-optimised joins? There's no way to know what the execution order should be, and leaving that decision to the engine is a strength of the relational model.


Interesting paper, thanks.


The schema aspects of this idea are hard for me to grasp.

How would the database engine efficiently resolve any references to the embedded relation?

I'm also having trouble understanding how your INSERT statement would even work. Is the firstQuery being inserted treated as hierarchical data, where every relation attached to every row has its own schema/structure? Or is the structure fixed in the CREATE TABLE statement, so that you can't use just any relation but only one that conforms to the structure defined there?


I played a little fast and loose to convey the basic idea. To elaborate since you asked: introduce typed tuples/records as first-class entities, and "tables" are then just sequences of records declared with a particular lifetime.

Relations are functions over these sequences. The embedded relation is then a stored function over a set of nested sequences. So to add the concrete types:

    var firstQuery = select Id, Name, Payment from Foo where ...
    var compositeQuery = select Id, Name, Total from firstQuery where ...

    create table StoredQuery(id int not null primary key, query relation{Id int, Name Text, Payment decimal})
    insert into StoredQuery values (0, firstQuery)
Again eliding a few details, but hopefully you get the basic idea.
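
A query over the stored relation might then look something like this (entirely hypothetical syntax, since nothing like it exists in SQL today):

    -- hypothetical: unnest a relation-valued column and filter it
    select s.id, q.Name, q.Payment
    from StoredQuery s, unnest(s.query) as q
    where q.Payment > 100;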


I think what you're describing is already present as of SQL:2003 in the form of multisets:

http://farrago.sourceforge.net/design/CollectionTypes.html

However, good luck finding a database engine that actually supports such features. Some support array types, but I haven't seen too many that support multisets.
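
For reference, the standard's syntax looks roughly like this (a sketch of an SQL:2003 multiset column, which, as noted, almost no engine implements; a relation-valued version would use a ROW element type):

    -- a multiset-typed column, per the standard (not portable in practice)
    create table course (
        name          varchar(50),
        prerequisites varchar(50) multiset
    );

    -- flatten the collection back out with UNNEST
    select c.name, p.prereq
    from course c, unnest(c.prerequisites) as p(prereq);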


What you're asking for is an independently evolving logical schema on top of the physical schema - this is a great idea, but performance guarantees are highly tied to the physical layout of the data.

Apache Spark is the best example I've seen of this: you can compose schemas on the fly and it will unroll them at the end based on the physical layout.


This particular use case sounds like a job for a MATERIALIZED VIEW, and views in general, for doing what you're asking for...
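
For example, in PostgreSQL (a minimal sketch; Foo and its columns are placeholders):

    -- precompute and store the query's result
    create materialized view first_query as
        select id, name, payment from Foo where payment > 100;

    -- compose further queries against it as if it were a table
    select name from first_query where payment > 500;

    -- the stored result is stale until explicitly refreshed
    refresh materialized view first_query;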


A materialized view would be an optimization technique used when compiling a schema containing first-class relations (essentially, memoization). First-class relations are more general though.

But first-class relations would also present some optimization challenges, so a relational system with a restricted form of first-class relations, corresponding to materialized views that can be stored in table columns and used in queries, would get pretty close.



