Hacker News new | past | comments | ask | show | jobs | submit login

It's an interesting article with interesting ideas.

I'm squarely in the camp of using natural PKs until there is good reason to use surrogate PKs.

I generally disagree with the notion of using a combination of surrogate and natural keys. In SQL, a PK isn't just another unique key: a PK an important block of communication.

As a rule of thumb, when a PK is attached to a semantic value, it is saying that this is the identifier of the table. If a PK is on a surrogate key, it is saying that there is no good unique identifying value in the table. When you are dealing with larger data sets, this distinction isn't minor, as it helps to understand the intent of the data when working with it. PKs serve as guideposts in your design along with guideposts to the person who has to maintain (or fix) your database later on.

I know some disagree with me on that, but there are many undeniably good reasons to use a natural key. A good place is a check-constraint table, where you say, have a list of US states and you want to ensure that "New York" and not "New Yoerk" inserted into the state column of an address table. Put a PK on valid_state_names and FK to the PK from addresses.




> If a PK is on a surrogate key, it is saying that there is no good unique identifying value in the table.

Almost always there is no good unique unchanging identifying value on a table. Take something as simple as "Person" -- there is literally no unique unchangeable value for such an entity. And that's the rule more than the exception.

I take exactly the the opposite approach; nearly all entities should be identified with a surrogate key. You pretty much cannot go wrong with this approach.

In the US state names are pretty stable, but in other countries state/province names have changed so even that value is poor choice for a primary key.


Yes, each set of data should reflect the data you want to use. That's why I said I default to natural keys unless there is good reason to use a surrogate key.

If I have to compare databases I've worked on over the years, the databases that are all surrogate will have many more data errors than a database that is predominantly natural keys. I could write a book on why that happens, but if you don't know relational theory cold (not saying you don't), you should just use natural keys. The short answer is that using all surrogate keys is equivalent to not using unique constraints at all.

Data is bound to change at some point in the future. The entire point of having constraints is to ensure that your data keeps its integrity, now and later on, no matter how the data needs to be updated.


I'm working with a 3rd party database right that used more than a few natural keys and it's awful. The primary key for "users" is username so we can't rename our users to improve our company's username security policy. And that's just one of the problems. We'll be offline for a full 24 hours in April to handle changing the value (for necessary business reasons) of the natural primary key in the client table.

I'm all for constraints, especially uniqueness constraints, where appropriate. You can use unique indexes; you don't have to use natural keys as primary keys. Data is bound to change, so make that change possible.


A badly built database isn't the reason to toss out good practices.

It's actually quite obnoxious to see how bad many databases are, and this often causes a blow-back of tossing the baby out with the bathwater. It's a trap.

I'm not really sure what all the problems are, but if changing username to something else is causing a 24 shutdown, there are many deeper issues than a PK on username.


> A badly built database isn't the reason to toss out good practices.

It's a good reason to toss out bad practices like, for example, using natural keys as primary keys. Which is, almost universally, considered a bad practice. It is, in fact, the bad practice that causes the most pain in this bad database.

> if changing username to something else is causing a 24 shutdown, there are many deeper issues than a PK on username.

This is true; the software is so old it pre-dates most modern RDBMS features and that is a factor in the database design. And it contains a few decades worth of data. And the process to go through the all records and re-do every foreign and primary key value is expensive (through the app server) but is a well-tested path. None of this would be necessary, however, if they had used surrogate keys for these tables. They did use surrogate keys for other tables so it wasn't a technical limitation.


I'm curious what database this is. Even mysql supports ON UPDATE CASCADE back to 5.5 (or earlier?).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: