Hacker News new | past | comments | ask | show | jobs | submit login

I don't really understand your comment about it always being join x+n that causes the problem instead of join x. What is different about later joins that is not true when the later join is the current join, when you get around to writing it? If joins for tables T1::T2 and T2::T3 are 1:1, they don't magically become 1:n when you do T1::T2::T3.

To do it right, you'd need to mark up the relations with expected cardinality (effectively unique constraints), so that the DB would be able to verify ahead of time whether a join is going to be 1:1 or 1:n. That would be a better solution than a join working fine up until the cardinality expectation is violated and it suddenly stops working.

If we had to stick with SQL syntax, it might be something like:

   select a.*, b.*
   from a
   join one b on b.id = a.b_id
(Unique constraint from primary key on b.id. It may still filter down the set of rows in a, but it won't duplicate.)

Or:

   select a.*, b.*
   from a
   join many b on b.some_key = a.some_key
(No unique constraint on b.some_key)

I think the interesting cardinality distinctions are 1:1, 1:n and 1:0.

(The above syntax is ambiguous with aliases, so it wouldn't fly as is. But it gives a flavour.)




So the idea is that when you write the queries with these explicit join requirements, you'll want the query to fail if someone changes the schema later in such a way that the requirement is no longer met? If so, it seems like something that comes up so infrequently as to not warrant the noise, but maybe I'm still misunderstanding.

What I was saying was that often people have T1::T2 (1-1) and then throw in a later join T2::T3 (1-many) without considering the implications for the T1 (maybe they were doing a count(T1.id)). At least that seems to be an aspect that confuses developers newer to sql (that the many join is duplicating other parts of the data).


It comes up quite a bit if you have repeated data in batches where the batches are a first class concept. Any relations need to be qualified by both normal fk id and batch id, or else you bring back across all batches.

Think something like measurements samples that come in an envelope with metadata, or account updates that come in a statement, that kind of thing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: