The result of a query isn't a set but a multiset/bag (sets that allow for multip...

jerf · on March 13, 2023

I imagine that if we were writing a new RDBMS that wasn't worried about backwards compatibility, we could reduce the number of types flying around internally. I think there's a lot of historical accident in which exact variants of set are available at which point. A refinement of relational logic built on a somewhat more pragmatic base data structure would probably also help. Codd's logic is close to pragmatic, but it does still have a bit of ivory-towerism in it. No criticism intended: it was a huge advance, but I think we could tweak it a bit in modern times. And in his defense, I think there were bits of Codd's work that took a long time to make it into the pragmatic systems, to their detriment.

But in the meantime, back here on the ground, it is nice to at least be working in an RDBMS that can convert back and forth between these things, even if it's still clunky. I remember when I just plain couldn't, and the circumlocutions needed to do what I wanted were bigger than the business logic itself.

housecarpenter · on March 14, 2023

The reason sets are important is that they correspond to (Boolean-valued) properties. To each set, there corresponds the property of belonging to that set. To each property, there corresponds the set of all things with that property. I think this is the key reason why the relational algebra is a good foundation for a query language. When I'm writing a query, I'm thinking of some property P, such that I want the results of the query to be all records with property P. By utilizing the correspondence between properties and sets I can translate that property fairly directly into an expression in the relational algebra, and then the magic of a relational query language is that that expression is all I need to write to carry out the query. That's the sense in which relational query languages are declarative. I just write down the property of the results I want, and I get those results automatically without having to specify how to collect those results.
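To make the set/property correspondence concrete, here is a minimal sketch in Python that models a relation as a plain set of tuples. The `employees` data and the `select`, `in_eng`, and `well_paid` names are hypothetical, chosen only for illustration. Selection turns a property into the set of rows having it, membership turns a set back into a property, and AND/OR on properties become intersection/union on results.

```python
# A relation modeled as a set of (name, dept, salary) tuples.
# Hypothetical data, for illustration only.
employees = {
    ("alice", "eng", 120),
    ("bob", "eng", 95),
    ("carol", "sales", 80),
}

def select(relation, prop):
    """Relational selection: the set of all rows that have property `prop`."""
    return {row for row in relation if prop(row)}

def in_eng(row):
    return row[1] == "eng"

def well_paid(row):
    return row[2] >= 100

# Property -> set: each property picks out exactly one set of rows.
engineers = select(employees, in_eng)
high_earners = select(employees, well_paid)

# Set -> property: membership in the set is itself a property.
def is_engineer(row):
    return row in engineers

# Logical connectives on properties correspond to set operations on results.
both = select(employees, lambda r: in_eng(r) and well_paid(r))
either = select(employees, lambda r: in_eng(r) or well_paid(r))

print(sorted(both))    # rows that are in eng AND well paid
print(sorted(either))  # rows that are in eng OR well paid
```

The point of the sketch is that nothing in it says *how* to find the rows; each query is just a predicate, which is the declarative reading described above.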

Having queries return multisets rather than sets "breaks" the relational algebra in the sense that it breaks this correspondence. Results of queries no longer correspond one-to-one to properties, since properties have no multiplicity. To be fair, you can identify "the property is true" (the element belongs to the set) with "multiplicity > 0", and "the property is false" (the element doesn't belong to the set) with "multiplicity 0"; by doing this you can think of SQL queries as corresponding to sets/properties most of the time. But if you're going to think of them that way, you might as well just have them be sets in the first place. The multisets are just a needless complication. Thinking about SQL queries in terms of multisets seems compatible only with a more imperative, non-relational approach to the language, where you still have to think algorithmically about how to assemble the collection of results you want, rather than directly characterizing the results in terms of a property.
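The bag-versus-set distinction can be seen directly in a real SQL engine. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `employee` table and its rows are hypothetical. A plain projection returns duplicates (a bag, modeled here with `collections.Counter`), and taking the elements with multiplicity > 0 recovers the same set that `SELECT DISTINCT` returns.

```python
import sqlite3
from collections import Counter

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("alice", "eng"), ("bob", "eng"), ("carol", "sales")],
)

# A plain projection keeps duplicates: the result is a multiset (bag).
rows = conn.execute("SELECT dept FROM employee").fetchall()
bag = Counter(dept for (dept,) in rows)

# "Multiplicity > 0" read as "true": the support of the bag is a set.
support = {dept for dept, n in bag.items() if n > 0}

# SELECT DISTINCT asks the engine for that set directly.
distinct = {dept for (dept,) in conn.execute("SELECT DISTINCT dept FROM employee")}

print(bag)
print(support == distinct)
conn.close()
```

Under the set reading, the extra count on `"eng"` carries no information about which departments satisfy the property; it only records how many rows happened to produce each value.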