"So the complaint amounts to SQLs failure to fully ‘close’ such that the result ...

KrugerDunnings · on March 13, 2023

The result of a query isn't a set but a multiset/bag (sets that allow for multiple instances of the same element). One popular take on this is that this "breaks" the relational algebra which is certainly true if sets are the thing you want to base everything on like a lot of mathematics today. However multisets are in of themself a very interesting structure that has interesting properties which could be helpful in understanding distributed systems and general relativity.

Things like sum types and the option monad would be powerful additions to a RDBMS but I wish that people would not be so quick to dismiss anything because it is not a set. Sets can be extremely difficult to work with, just simple things like adding numbers together as described in peano-arithmetic is bunkers compared with multisets. Everyone knows how the 19 century dream of Hilbert of grounding mathematics in logic/set theory failed but somehow everyone keeps wanting to use them for everything. In my mind this is a shame and I don't think that removing them from the one place where they have a use in society at large is a good idea.

jerf · on March 13, 2023

I imagine if we were writing a new RBDMS that wasn't worried about backwards compatibility we could reduce the number of types flying around internally. I think there's a lot of historical accident in which exact variants of set are available at which point. A refinement of relational logic based on a base data structure that is a bit more pragmatic (because Codd's logic is close to pragmatic, but does still have a bit of ivory towerism in it; no criticism inteded, it was a huge advance, but I think we could tweak it a bit in modern times, and in his defense I think there were bits in Codd's work that failed to come out to the pragmatic systems for a long time, to their detriment) would probably also be helpful.

But in the meantime, back here on the ground, it is nice to at least be working in an RDBMS that can convert back and forth between these things, even if it's still klunky. I remember when I just plain couldn't, and the circumlocutions to do what I wanted to do were bigger than the business logic I needed.

housecarpenter · on March 14, 2023

The reason sets are important is that they correspond to (Boolean-valued) properties. To each set, there corresponds the property of belonging to that set. To each property, there corresponds the set of all things with that property. I think this is the key reason why the relational algebra is a good foundation for a query language. When I'm writing a query, I'm thinking of some property P, such that I want the results of the query to be all records with property P. By utilizing the correspondence between properties and sets I can translate that property fairly directly into an expression in the relational algebra, and then the magic of a relational query language is that that expression is all I need to write to carry out the query. That's the sense in which relational query languages are declarative. I just write down the property of the results I want, and I get those results automatically without having to specify how to collect those results.

Having queries return multisets rather than sets "breaks" the relational algebra in the sense that it breaks this correspondence. Results of queries no longer correspond one-to-one to properties, since properties have no multiplicity. To be fair, you can identify the property being true or the element belonging to the set with having multiplicity > 0, and the property being false or the element not belonging to the set with having multiplicity 0, and by doing this you can think of SQL queries as corresponding to sets/properties most of the time. But if you're going to think of them that way, you might as well just have them be sets in the first place. The multisets are just a needless complication. Thinking about SQL queries in terms of multisets seems to only be compatible with a more imperative, non-relational approach to the language, where you still have to think algorithmically about how to assemble the collection of results that you want, rather than just directly characterizing the results in terms of a property.

hgsgm · on March 13, 2023

> It was just bad code,

To the point of this thread, it wasn't "bad" then, any more than Newton's ideas about satellites were bad. It was state of the art, and had room for improvement over time.

jerf · on March 13, 2023

Fair point; that was some clumsy phrasing on my part. By the same standard even our best code is bad today in some way, I'm sure. (If only for being more verbose and overspecified than "CodeAI, write an optimized routine to losslessly compress video".)

kragen · on March 13, 2023

in the relational model the construct that corresponds to optional is a second table that is missing some rows

for example, every foo here has a bar, but not every foo has a baz

you can think of this as baz being an optional attribute of whatever entities the foo values identify. if the second relation is not unique on foo (maybe its primary key is both columns!) you could think of it as being a set-valued relationship, associating any number of bazzes with each foo rather than just 0 or 1

from darwen and date's perspective, having null is an error-prone construct analogous to global variables and goto statements

as a historical note, subroutines predate goto statements by about ten years; subroutines predate not only high-level languages but even symbolic assembler. it is true that many people have written programs with global variables and goto statements and no subroutines, but that wasn't because subroutines hadn't been invented yet, or even publicized; wilkes explained how to use subroutines in his 01951 book "The Preparation of Programs for an Electronic Digital Computer" https://archive.org/details/programsforelect00wilk but they were presumably invented somewhat earlier, possibly by david wheeler

the whole array construct in postgres seems like it was probably a huge mistake

jerf · on March 13, 2023

"subroutines predate goto statements by about ten years;"

In academia, perhaps. Subroutines were 'expensive' in the real world for a long time. Many things we take for granted today are too expensive when 16KB is a lot of RAM. I was referring to common practice rather than academia.

"the whole array construct in postgres seems like it was probably a huge mistake"

In a cousin comment where I talk about how we might do things differently if we were starting from scratch, this is one of the biggest things I had in mind. As a column type it's plausible enough, as a generic sequence type you can wrap around any other type. Seems a basic enough primitive that we should have it around. But probably everywhere else I'm using it, I really want a more native "table" type. But I can't really have that. I can kind of see it forming, I think, as the versions march on. But at least as of the version I've been using lately, it's not quite here yet.

In general SQL struggles with type recursion. While it's not hard to see why from a performance perspective you want an engine that doesn't have to deal with arbitrarily-sized values, it gets more and more limiting every year. I've gotten a distressing amount of mileage out of using the JSON types the major engines have been forced to incorporate to stay competitive just as "Look, I just need a little tree here... I can promise it'll stay bounded in size, but I don't want to encode that as a crapton of rows" or "hey, honestly, I've got a modestly complicated structure here that my clients will decode and breaking it out the Official SQL Way is just crazy". This is the thing that really gives me that 1970s Tech feel when I'm working with SQL, and while I appreciate the slow and steady lifting of the restrictions I would definitely like to use a tech that lifted them in a thoughtful, principled manner instead of a multi-decade drunkard's walk across the design landscape.

kragen · on March 14, 2023

using subroutines was common practice when 16 kibibytes was a lot of core, it was common practice before fortran existed, it was common practice before people used assemblers, and it wasn't just common practice in academia but also in places like ibm, cdc, hp, burroughs, and librascope general precision

(it's true, though, that they were expensive)

from my point of view the problem with arrays, json, etc., is not the performance but the failures of orthogonality which require you to memorize a zillion special cases, which i think is what you're saying about 'a thoughtful, principled manner'

some of my own thoughts on what that might look like are at http://canonical.org/~kragen/binary-relations