Hacker News new | past | comments | ask | show | jobs | submit login

On Postgres I prefer to use a single sequence to generate ids across all tables. This reduces the chance of accidents (eg accidentally deleting the wrong thing with id #123) and reduces information leakage ("oh, I see I'm customer #5, you must only have 4 other customers").



Another way I've seen done to preserve uniqueness where order leakage doesn't matter is prefixing IDs uniquely per table. Something like uid147... for user IDs pid12... for post ID... It is certainly helpful when debugging and isn't hard to write a test to verify prefixes aren't duplicated across tables.


That's not a bad option, but I think I'd find locally-unique integer IDs spread across tables to be a little confusing. In that case I think I'd lean towards UUIDs, which may be a little easier overall, or use separate sequences but expose IDs via Hashids in my API/frontend: http://hashids.org


In Postgres, having multiple tables share a single ID sequence is trivial thanks to SEQUENCE/NEXTVAL.


UUIDs are easier because they immediately scan as random values when you look at them. They are a clear indicator of surrogacy in a way that a spread-out sequence is not.

It doesn't matter how difficult picking a sequence is (and your post is odd because I don't think I gave any indication of unfamiliarity with Postgres?). It's a cognitive concern with keys that look simple but are not obviously unrelated (as a normal sequence is by convention or UUIDs are by definition) to the rest of the table.


Sure, but Postgres doesn't to my knowledge support a cross-table UNIQUE index to insure your sequences are followed in all cases. I imagine that all it takes is one confused junior developer to add a bunch of manually IDed data by hand to create some weird and confusing corner cases in this scheme.


You can define the field as GENERATED ALWAYS, though, in which case specifying the value explicitly in INSERT won't work (there's an escape hatch, but it has to be explicitly and very verbosely invoked).


What is the advantage over UUIDs? The author says that the surrogate keys should never be exposed outside of the database anyway, so no problems with data leakage. If an artificial key is to be exposed then it can be obfuscated to prevent information leakage.

I think it should be obvious that it is obfuscated, though, like Youtube video IDs, for example. If you generate integers then your users might assume they are not obfuscated. Eventually a customer will get some low value integer and might assume that they are "customer #5" even if they are not.


Using a global sequence means 8 bytes less storage per row (including index entries and all foreign keys which refer to the PK), and more importantly typically mostly sequential inserts into B-tree indexes which means less bloat and faster insert and access.

Disadvantages are data leakage and, theoretically, minor contention on accessing the sequence.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: