"
Relations are, themselves, values too, and relation attributes can therefore be declared to be of another relation type. Such attributes are called 'Relation-valued attributes' (RVA's for short).
In the RA, two operators are available that allow us to manipulate relations in connection with RVA's: GROUP and UNGROUP
"
Like I said, I'm a bit out of my depth here so take the above as evidence rather than proof that such things existed, but I'm pretty sure I saw this, hand-drawn, in one of Codd's original papers.
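For what it's worth, here is a toy sketch of what GROUP and UNGROUP do, with relations modeled as lists of dicts. This is purely illustrative, not the API of any real DBMS, and the attribute names ("order", "item", "qty", "lines") are invented for the example:

```python
# GROUP folds some columns into a nested relation-valued attribute (RVA);
# UNGROUP flattens it back out. Toy code, not any DBMS's actual operators.

def group(rows, rva_attrs, rva_name):
    """GROUP: fold the rva_attrs columns of each row into a nested
    relation stored under the single attribute rva_name."""
    buckets = {}
    for r in rows:
        key = tuple(sorted((k, v) for k, v in r.items() if k not in rva_attrs))
        inner = tuple(sorted((k, r[k]) for k in rva_attrs))
        buckets.setdefault(key, set()).add(inner)
    return [dict(key, **{rva_name: frozenset(inner)})
            for key, inner in buckets.items()]

def ungroup(rows, rva_name):
    """UNGROUP: flatten the nested relation back into ordinary columns."""
    out = []
    for r in rows:
        base = {k: v for k, v in r.items() if k != rva_name}
        out.extend(dict(base, **dict(inner)) for inner in r[rva_name])
    return out

orders = [
    {"order": 1, "item": "apple", "qty": 2},
    {"order": 1, "item": "pear",  "qty": 1},
    {"order": 2, "item": "apple", "qty": 5},
]
nested = group(orders, {"item", "qty"}, "lines")   # one row per order
flat = ungroup(nested, "lines")                    # round-trips to 3 rows
```

The round trip is the key property: UNGROUP(GROUP(r)) gives back the original flat relation.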
Edit: you are right
"Codd proposed a normal form that he called first normal form (1NF), and he included a requirement for 1NF in his definitions for 2NF, 3NF, and subsequently BCNF. Under 1NF as he defined it, relation-valued attributes were “outlawed”; that is to say, a relvar having such an attribute was not in 1NF."
No, it doesn't mean he's right. The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.
No one has provided convincing evidence that Codd intended to exclude nested tables entirely. People seem to be conflating (i) good database design, as recommended by Codd, with (ii) the feature set of a DBMS, also as suggested by Codd.
> The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.
I think most of the motivation for normal forms is to avoid 'update anomalies', which essentially means: don't represent the same information in two places in your base relation variables (aka tables, in SQL). So you can have repeated values or nested relations in queries, and you can have them in base tables which are morally normalized, as long as there's no possibility that they lead to the same information being recorded in two distinct places.
When people talk about 'denormalizing' and it's justified, I think they mean breaking this rule about representing information in two or more places in exchange for performance. If you do that, the application programmer has to be careful to keep the multiple locations in sync, a kind of consistency you don't have to think about in a clean database design. I think database management software in general cannot enforce normalisation; it can only make working with normalized databases easier or more difficult.
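The 'same information in two places' problem can be shown in a few lines. The tables and names here are invented for illustration, not from the thread:

```python
# A denormalized table repeats the customer's city in every order row.
# The same fact now lives in two places, so a careless update can leave
# the database contradicting itself: the classic update anomaly.

denorm = [
    {"order": 1, "customer": "ada", "city": "London"},
    {"order": 2, "customer": "ada", "city": "London"},
]

denorm[0]["city"] = "Paris"   # updated one copy, forgot the other
cities = {r["city"] for r in denorm if r["customer"] == "ada"}
assert cities == {"Paris", "London"}   # two answers to one question

# Normalized design: the city is stored exactly once, so this anomaly
# cannot arise no matter how the update is written.
orders = [{"order": 1, "customer": "ada"}, {"order": 2, "customer": "ada"}]
customers = {"ada": {"city": "London"}}
customers["ada"]["city"] = "Paris"
assert {customers[r["customer"]]["city"] for r in orders} == {"Paris"}
```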
In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.
(Posted under a different account because I'm being slow-posted again by HN)
> In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.
Automatic, application-transparent physical denormalisation, managed entirely by the database, is something I am very, very interested in. Unfortunately I've been able to find pretty much nothing describing what it would look like or how it would be done. If you can provide any links, that would be incredibly helpful!
It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.
I'm a bit fuzzy on the details, but I think Vertica allows storing duplicate copies of a table in multiple sort orders; the appropriate copy is then picked automatically by the query optimiser. So this works not that differently from an index (which is also DBMS-managed performance denormalisation).
There are also materialized views: if you have automatic, incrementally updated materialized views which are transparently substituted into queries, that's along these lines. I think a lot of progress is being made here, and plenty of compromises used in the field have been in production for a long time.
I think there's some ambitious work on materialized views being done in postgres.
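To make the 'incrementally updated materialized view' idea concrete, here is a rough sketch of the mechanism in miniature. All names are invented, and real systems differ enormously in the details; the point is only that maintenance work is proportional to the change, not to the base table:

```python
# A tiny incrementally maintained materialized view:
# SELECT key, SUM(amount) ... GROUP BY key, kept up to date per insert.

class SumView:
    def __init__(self):
        self.totals = {}          # the "materialized" query result

    def insert(self, key, amount):
        # Incremental maintenance: touch only the affected group,
        # instead of re-running the aggregate over the whole base table.
        self.totals[key] = self.totals.get(key, 0) + amount

    def query(self, key):
        # The application just reads; the precomputed answer is returned
        # directly, which is the performance win being discussed above.
        return self.totals.get(key, 0)

v = SumView()
for key, amount in [("a", 10), ("b", 5), ("a", 7)]:
    v.insert(key, amount)
```

A DBMS doing this transparently would additionally rewrite matching queries to read from the view, so application code never knows it exists.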
> It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.
I don't think they ever convincingly got into the details on it.
> Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in.
> I think most of the motivation for normal forms is to avoid 'update anomalies', which is essentially, don't represent the same information in two places
This is true for the second and higher normal forms, but not for first normal form. First normal form is about eliminating nested tables, not about eliminating redundant data.
> No one has provided convincing evidence that Codd intended to exclude nested tables entirely.
See Codd's original paper (linked in a sibling comment), section 1.4.
Note that the relational algebra developed by Codd does not support querying nested tables, which would make them practically useless, even if allowed.
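A toy way to see the point (relations as dicts again; the "lines" RVA is invented for the example): classical restriction compares whole attribute values, so with a nested attribute you cannot ask about its *contents*; after flattening to 1NF, the same question is an ordinary restriction.

```python
# Classic relational restriction: keep the rows satisfying a predicate.
def restrict(rows, pred):
    return [r for r in rows if pred(r)]

nested = [
    {"order": 1, "lines": frozenset({("apple", 2), ("pear", 1)})},
    {"order": 2, "lines": frozenset({("pear", 3)})},
]

# On the nested form, equality on "lines" can only match a whole inner
# relation at once; "which orders contain an apple?" is not expressible
# as a plain attribute comparison.
whole = restrict(nested, lambda r: r["lines"] == frozenset({("pear", 3)}))
assert whole == [nested[1]]

# After flattening (unnesting to 1NF), the question becomes an ordinary
# restriction on an ordinary column.
flat = [{"order": r["order"], "item": i, "qty": q}
        for r in nested for (i, q) in r["lines"]]
apple_orders = {r["order"]
                for r in restrict(flat, lambda r: r["item"] == "apple")}
assert apple_orders == {1}
```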
also https://shark.armchair.mb.ca/~erwin/RA_Intro.htm
https://fliphtml5.com/qprz/cxon/basic/201-235