
Using IDs may lead to a performance gain if it leads to a smaller database: fewer bytes to read/write/cache => less I/O volume and better cache hit rates, which compensates for the potentially higher number of seeks on mass storage. This is especially true when seeks are quick (SSD...).

Therefore, as long as the size of the data type used for an ID (C's "sizeof") is smaller than the average size of the column contents, using an ID will very probably lead to a performance gain.

For some DB states and usage patterns (where the commonly used data+index cannot fit in RAM, i.e. the cache (DB+OS) hit ratio is below 0.99), this gain will be roughly proportional (beware of diminishing returns) to the ratio (total data+index size BEFORE using IDs)/(total data+index size AFTER using IDs).
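To make that ratio concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the row counts and the byte sizes are all made up for illustration, and it ignores per-row overhead, indexes and page padding, so treat the result as an order of magnitude at best.

```python
def size_ratio(rows, distinct_values, avg_value_bytes, id_bytes):
    """Estimate (size BEFORE using IDs) / (size AFTER using IDs) for one column.

    Hypothetical model: ignores per-row overhead, indexes and page padding.
    """
    # Before: every row stores the full value inline.
    before = rows * avg_value_bytes
    # After: every row stores an ID, plus a lookup table that holds each
    # distinct value once next to its ID.
    after = rows * id_bytes + distinct_values * (id_bytes + avg_value_bytes)
    return before / after


# Assumed numbers: 10M rows, 1000 distinct 24-byte values, 4-byte IDs.
ratio = size_ratio(rows=10_000_000, distinct_values=1_000,
                   avg_value_bytes=24, id_bytes=4)
print(f"estimated size ratio: {ratio:.2f}")

# Edge case: if every value is distinct, the lookup table costs more than it saves.
no_gain = size_ratio(rows=100, distinct_values=100, avg_value_bytes=24, id_bytes=4)
print(f"all-distinct ratio: {no_gain:.2f}")
```

Under this simple model the ratio only exceeds 1 when values repeat often enough to pay for the lookup table.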

Writing queries then becomes more difficult (one has to use 'JOIN'); however, there are ways to alleviate this: using views, "natural join"...
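As a minimal sketch of the view approach, here is a small example using Python's built-in sqlite3 module. The schema (city, customer, customer_v) and the sample rows are invented for illustration; the idea is only that the JOIN is written once in the view, and everyday queries read as if the city name were stored inline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: each city name stored once, referenced by a small ID.
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city_id INTEGER NOT NULL REFERENCES city(id)
    );
    -- The view hides the JOIN from the people writing queries.
    CREATE VIEW customer_v AS
        SELECT customer.id, customer.name, city.name AS city
        FROM customer JOIN city ON city.id = customer.city_id;
""")
conn.executemany("INSERT INTO city (id, name) VALUES (?, ?)",
                 [(1, "Lyon"), (2, "Oslo")])
conn.executemany("INSERT INTO customer (name, city_id) VALUES (?, ?)",
                 [("Ada", 1), ("Bob", 2), ("Cy", 1)])

# Query the view as if the city name were a plain column.
rows = conn.execute(
    "SELECT name, city FROM customer_v WHERE city = ? ORDER BY name",
    ("Lyon",),
).fetchall()
print(rows)
```

Whether the engine plans a query through the view as efficiently as a hand-written JOIN depends on the DB, so it is worth checking the query plan.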

Some modern DB engines let you store extra column data in an index, in order to spare an access to the table when the index is used (PostgreSQL: see the "INCLUDE" clause of "CREATE INDEX"). As far as I understand, using a proper ID (i.e. one that is on average smaller than the data it represents) will also lead to a gain there(?)



