
The result of a query isn't a set but a multiset, or bag (a set that allows multiple instances of the same element). One popular take is that this "breaks" the relational algebra, which is certainly true if sets are what you want to base everything on, as much of mathematics does today. However, multisets are in and of themselves a very interesting structure, with properties that could be helpful in understanding distributed systems and general relativity.
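
To make the set vs. bag point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table and its rows are made up for illustration; the only point is that a plain projection keeps duplicates while DISTINCT collapses them.

```python
import sqlite3
from collections import Counter

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", "tea"), ("ann", "coffee"), ("bob", "tea")],
)

# Projecting onto one column keeps duplicates: the result is a bag, not a set.
rows = [r[0] for r in conn.execute("SELECT customer FROM orders")]
bag = Counter(rows)  # maps each value to its multiplicity

# DISTINCT collapses the bag back down to a set.
distinct_rows = {r[0] for r in conn.execute("SELECT DISTINCT customer FROM orders")}
```

Here `bag` records that "ann" appears twice, while `distinct_rows` only records that she appears at all.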

Things like sum types and the option monad would be powerful additions to an RDBMS, but I wish people would not be so quick to dismiss something just because it is not a set. Sets can be extremely difficult to work with; even something as simple as adding numbers together, as described in Peano arithmetic, is bonkers compared with multisets. Everyone knows how the 19th-century dream of Hilbert of grounding mathematics in logic/set theory failed, but somehow everyone keeps wanting to use sets for everything. In my mind this is a shame, and I don't think that removing multisets from the one place where they have a use in society at large is a good idea.
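
One possible reading of the addition comparison, sketched in Python. The von Neumann encoding of numbers as nested sets is my assumption about what "adding numbers with sets" looks like, not something the comment specifies, and the helper names are hypothetical.

```python
from collections import Counter

def encode_set(n):
    """Von Neumann encoding: n becomes the set {0, 1, ..., n-1} of earlier encodings."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}  # successor(s) = s union {s}
    return s

def add_sets(a, b):
    """Peano-style recursion: a + 0 = a, and a + succ(p) = succ(a + p)."""
    if not b:
        return a
    pred = max(b, key=len)  # the predecessor of b is its largest element
    total = add_sets(a, pred)
    return total | {total}

def encode_bag(n):
    """Encode n as a multiset holding n copies of a single token."""
    return Counter({"*": n})

def add_bags(a, b):
    """Addition is just multiset sum: multiplicities add up."""
    return a + b
```

With sets, addition has to be rebuilt from repeated successor steps; with bags, it is the structure's native sum.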




I imagine if we were writing a new RDBMS that wasn't worried about backwards compatibility, we could reduce the number of types flying around internally. I think there's a lot of historical accident in which exact variants of set are available at which point. A refinement of relational logic built on a slightly more pragmatic base data structure would probably also be helpful. Codd's logic is close to pragmatic, but it still has a bit of ivory-towerism in it. No criticism intended: it was a huge advance, but I think we could tweak it a bit in modern times. In his defense, I think there were bits of Codd's work that took a long time to make it into the pragmatic systems, to their detriment.

But in the meantime, back here on the ground, it is nice to at least be working in an RDBMS that can convert back and forth between these things, even if it's still clunky. I remember when I just plain couldn't, and the circumlocutions to do what I wanted to do were bigger than the business logic I needed.


The reason sets are important is that they correspond to (Boolean-valued) properties. To each set, there corresponds the property of belonging to that set. To each property, there corresponds the set of all things with that property. I think this is the key reason why the relational algebra is a good foundation for a query language. When I'm writing a query, I'm thinking of some property P, such that I want the results of the query to be all records with property P. By utilizing the correspondence between properties and sets I can translate that property fairly directly into an expression in the relational algebra, and then the magic of a relational query language is that that expression is all I need to write to carry out the query. That's the sense in which relational query languages are declarative. I just write down the property of the results I want, and I get those results automatically without having to specify how to collect those results.
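
The correspondence between properties and sets can be sketched in a few lines of Python. The records and the two predicates are invented for illustration; the point is only that a property determines a set, a set determines a property, and "P and Q" lines up with intersection.

```python
# Hypothetical (name, age) records, invented for illustration.
records = frozenset({("ann", 34), ("bob", 17), ("cy", 52)})

def is_adult(record):
    """Property P: the person's age is at least 18."""
    _name, age = record
    return age >= 18

def is_over_50(record):
    """Property Q: the person's age is above 50."""
    _name, age = record
    return age > 50

# Property -> set: collect everything that satisfies the property.
adults = {r for r in records if is_adult(r)}
over_50 = {r for r in records if is_over_50(r)}

# Set -> property: membership in the set is itself a Boolean-valued property.
def in_adults(record):
    return record in adults

# Logical AND on properties corresponds to intersection on sets.
adults_over_50 = {r for r in records if is_adult(r) and is_over_50(r)}
```

Writing the comprehension is the declarative part: it states the property and leaves the collecting to the language.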

Having queries return multisets rather than sets "breaks" the relational algebra in the sense that it breaks this correspondence. Results of queries no longer correspond one-to-one to properties, since properties have no multiplicity. To be fair, you can identify the property being true or the element belonging to the set with having multiplicity > 0, and the property being false or the element not belonging to the set with having multiplicity 0, and by doing this you can think of SQL queries as corresponding to sets/properties most of the time. But if you're going to think of them that way, you might as well just have them be sets in the first place. The multisets are just a needless complication. Thinking about SQL queries in terms of multisets seems to only be compatible with a more imperative, non-relational approach to the language, where you still have to think algorithmically about how to assemble the collection of results that you want, rather than just directly characterizing the results in terms of a property.
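
A small Python sketch of the "multiplicity > 0" identification, using `collections.Counter` as a stand-in for a query result. The two result bags are made up; `support` is a hypothetical helper name.

```python
from collections import Counter

def support(bag):
    """Collapse a multiset to a set: keep the elements with multiplicity > 0."""
    return {x for x, n in bag.items() if n > 0}

# Two hypothetical query results that differ only in multiplicities.
q1 = Counter({"ann": 2, "bob": 1})
q2 = Counter({"ann": 1, "bob": 5})

same_property = support(q1) == support(q2)  # both describe the same set
same_bag = q1 == q2                         # but they are different multisets
```

Distinct bags can collapse to the same set, which is the sense in which results no longer correspond one-to-one to properties.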



