Out of my depth here (no experience) but "Codds relational database model adds the further constraint that nested tables are not allowed" may be wrong. He allowed nested stuff, it's just that SQL didn't support it.
Can anyone elucidate? Please don't shout that I'm wrong because there was something there in his first paper.
No, he explicitly disallows nested relations. This is the definition of first normal form.
Hierarchical databases (which predate relational) can be understood as nested relations, and Codds first example of normalization is how to extract the nested relations in such a database into seperate tables and instead express the relationships through foreign keys.
Date and Darwen (spritual successors to Codd?), have a relational theory which allows nested relation values and I think it doesn't contradict the motivation behind first normal form, and it isn't hierarchical in the pre-relational database sense. I think they cleaned up Codd's ideas very effectively.
The motivation behind first normal form is to keep the query language simple and powerful at the same time. Allowing nested tables would require extensions to the query language but wouldn’t give any additional expressive power since you can already express the same relationships using foreign keys.
Also I dont see how it isn’t hierachical? Nested tables create a hierachical structure just like nested records in a hierachical database. What is the difference?
I consider Date and Darwen more like fan-fiction writers than spiritual sucessors. It is fine they propose a modified version of the relational model, but they are doing it in a weird way where they try to redefine Codds words to mean the opposite of what he is saying, instead of just noting how they disagree. Like if Codd was a prophet whose words cannot be directly contradicted.
> Also I dont see how it isn’t hierachical? Nested tables create a hierachical structure just like nested records in a hierachical database. What is the difference?
I haven't seen any reasonable uses of nested relations in base relvars, I think they are slightly different to nested records if you can query them declaratively instead of via imperative looking nested loop code.
But nested tables can be used in queries - the group and ungroup operators are pretty useful. It's a straightforward way to define (and even implement) group by aggregates for instance.
> Allowing nested tables would require extensions to the query language but wouldn’t give any additional expressive power since you can already express the same relationships using foreign keys.
Everything is a tradeoff. There are lots of queries that you can write using correlated subqueries in SQL, but I think would be much simpler if you could use group and ungroup.
I don't really follow your last paragraph at all. I would much rather use Date and Darwen's relational theory than Codd's. I think they have the right amount of respect for Codd, and definitely aren't shy to highlight where they differ from him, or to contradict him.
Do you have an example of them trying to redefine something in the way you say?
For example Date redefines first normal form to allow nested tables, while in Codds paper it is defined as eliminating nested tables. Why don’t they just give their concept a different name then? It just creates confusion.
Can you say where he does this? I only remember him referring to first normal form as something historical, that he doesn't use that particular normal form directly at all.
It may upset someone's sense of "purity", but the SQL standard and thoughts about the relational model have long since moved on from this dogmatic view of atoms, and no, foreign keys can't perfectly model what folks need. Eventually the rubber hits the road, theory and practice diverge, and different approaches are clearly needed.
There is a place for a document atom in a database (JSON, XML, etc. that don't map well or efficiently to a traditional relational model). For arrays. (Putting in a separate table with its own PK to join is somehow more flexible? Just use unnest(…) for that.) For label trees. For geometries (GIS). For references to tables on other servers (SQL/MED). For tuples. For UUIDs. For ranges. For intervals.
You may call Date and Darwen's work fan fiction, but I have seen tremendous efficiency gains over the last few decades from engines drawing upon their insights without compromising data integrity. In fact, exclusion constraints with timestamp ranges is one area where the old strict relational model would be wholly ineffective for data integrity and avoiding race conditions cascading out to the application level.
I belive in using the right tool for the job, and this might somtimes be document databases, key-value stores, xml or json values or whatever is needed to solve the problem.
But let me point out that hierachical databases (which document databases, xml etc are variants of) predate the relational model (and certainly predate Dates work) so it it worth to be aware of the challenges and limitations which caused the development of the relational model as an alternative.
And more of these tools (RDBMSs) are able to perform more jobs. There's definitely use cases for correlating parts of documents with traditional relational data. Quite often there's no need to choose between a relational database engine and a document database engine when the engines support both and more.
Relations are, themselves, values too, and relation attributes can therefore be declared to be of another relation type. Such attributes are called 'Relation-valued attributes' (RVA's for short).
In the RA, two operators are available that allow us to manipulate relations in connection with RVA's : GROUP and UNGROUP
"
Like I said, I'm a bit out of my depth here so take the above as evidence rather than proof that such things existed, but I'm pretty sure I saw this, hand-drawn, in one of Codd's original papers.
.
Edit: you are right
"Codd proposed a normal form thathe called first normal form (1NF), and he included a requirement for 1NF in his definitions for 2NF,3NF, and subsequently BCNF. Under 1NF as he defined it, relation-valued attributes were “outlawed”;that is to say, a relvar having such an attribute was not in 1NF."
No, it doesn't mean he's right. The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.
No one has provided convincing evidence that Codd intended to exclude nested tables entirely. People seem to be conflating i) good database design, as suggested by Codd ii) the feature-set of a DBMS, also as suggested by Codd.
> The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.
I think most of the motivation for normal forms is to avoid 'update anomalies', which is essentially, don't represent the same information in two places in your base relation variables (aka tables in SQL). So you can have repeated values or nested relations in queries, and you can have them in base tables which are morally normalized, as long as there's no possibility that these lead to the same information being recorded in two distinct places.
When people talk about 'denormalizing' and it's justified, I think it's breaking this rule about representing information in two or more places in exchange for performance. If you do this, the application programmer has to be careful to keep these multiple locations in sync - a kind of consistency you don't have to think about in a clean database design. I think that database management software in general cannot enforce normalisation - it can only make it easier or more difficult to use it with normalized databases.
In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.
(Posted under a different account because I'm being slow-posted again by HN)
> In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.
Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in. Unfortunately I've been able to find pretty well nothing to describe what it would look like and how it would be done. If you can provide any links that would be so incredibly helpful!
It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.
I'm a bit fuzzy, but I think Vertica allows duplicating tables stored in multiple orders - then I think the appropriate version is picked automatically by the query optimiser. So this works not that differently to an index (which is also dbms managed performance denormalization).
There's also materialized views - if you have automatic incrementally updated materialized views, which are transparently substituted into queries, that's along these lines. I think there's a lot of progress being made here, and plenty of compromises used in the field that have been in production for a long time.
I think there's some ambitious work on materialized views being done in postgres.
> It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.
I don't think they ever convincingly got into the details on it.
> Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in.
> I think most of the motivation for normal forms is to avoid 'update anomalies', which is essentially, don't represent the same information in two places
This is true for the second and higher normal forms, but not for first normal form. First normal form is about eliminating nested tables, not about eliminating redundant data.
> No one has provided convincing evidence that Codd intended to exclude nested tables entirely.
See Codds original paper (linked in a sibling comment) section 1.4.
Note that the relational algebra developed by Codd does not support querying nested tables, which would make them practically useless, even if allowed.
Can anyone elucidate? Please don't shout that I'm wrong because there was something there in his first paper.