UUIDs are popular, but bad for performance (2019) (percona.com)
256 points by timeoperator on Jan 8, 2022 | 232 comments



This article talks about random IDs leading to page thrashing, and MySQL b-tree indexes not handling them well. They are bad for _MySQL performance_.

It doesn't talk about NoSQL or sharding, where random IDs usually perform much better than sequential due to a lack of hot shards.

If you distribute your reads and writes at random across N machines you can get ~Nx performance vs one machine. If you make your writes sequential, you'll usually get ~1x performance because every insert goes to the same machine for this second/minute/day, then rolls to a new one for the next period. There are sharding schemes that can counter this, but they require insight into your data design before implementation.


Sure, but each individual machine still has to do the same slow random lookups, right? Generally you want some deterministic component (for caching) and some random component (for sharding). Snowflakes work well for this, since you can use the upper bits for predictable caching and the lower bits for random entropy.
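
A minimal sketch of that split, assuming a 42-bit millisecond timestamp in the upper bits and 22 random bits in the lower bits (illustrative widths, not Twitter's actual Snowflake layout):

    import secrets
    import time

    def snowflake_like_id() -> int:
        # Upper bits: millisecond timestamp (deterministic, cache/sort friendly).
        # Lower bits: random entropy (spreads writes across shards).
        ts_ms = int(time.time() * 1000) & ((1 << 42) - 1)
        return (ts_ms << 22) | secrets.randbits(22)

    print(hex(snowflake_like_id()))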


It's a MySQL blog, why would it talk about NoSQL?


I don't know if you've actually been to their blog recently, but Percona's community includes NoSQL as they maintain their own distro of MongoDB--similar in spirit to their MySQL and PostgreSQL offerings.

https://www.percona.com/blog/category/mongodb

https://www.percona.com/software/mongodb


Even in MySQL - one can use Sequential UUIDs.


I never understood why people would use sequential UUIDs. That rather defeats the purpose of a UUID. If you need something sequential then just use a much simpler number


It depends on how the non-sequential part is derived.

IIRC this is often a mix of hardware network address and a portion that is either random or based on a high-precision time, so it is still unlikely that you'll see collisions between machines.

MS SQL Server's NEWSEQUENTIALID function returns something akin to a v4 UUID (fully random, aside from variant indicator bits; I'm not sure if the variant bits are respected in NEWSEQUENTIALID's output or if it just returns a 128-bit number), but after the first is generated the rest follow in sequence until the sequence is reset (by a reboot). Assuming the variant indicators are present, there are 122 random bits. Even if your system is up long enough to use 2^58 (2.88×10^17 if you want that in decimal) IDs generated this way, you still effectively have 64 bits of randomness even if the variant bits are present. For most systems the chance of collision with sequential UUIDs is, while larger than with other types, still so small as to be inconsequential. These are "atoms in several galaxies" level numbers.

You don't want to use sequential UUIDs in place of v4 UUIDs where security matters, of course, as it is easy to work out the next in the sequence.

> If you need something sequential then just use a much simpler number

Sequential isn't their only property. UUIDs, including those that increment from the start point, are intended not to collide with those generated elsewhere.

Sequential UUIDs are a compromise - giving away a small amount of collision protection in order to gain what could be a significant efficiency boost in some circumstances (DB indexes being the main one).


https://docs.microsoft.com/en-us/sql/t-sql/functions/newsequ...

Each GUID generated by using NEWSEQUENTIALID is unique on that computer. GUIDs generated by using NEWSEQUENTIALID are unique across multiple computers only if the source computer has a network card.


Because simple integers are not universally unique, a major feature of UUIDs…


UUIDs are integers, they just happen to be 128 bits long instead of the more common 32 or 64, and they are usually printed in a specific way that differs from how integers are usually printed. But that's just smoke and mirrors.

UUIDs are generally not guaranteed to be universally unique. The name is misleading marketing.

If you generate them randomly, then the probability of collision is small enough not to matter, but the more you introduce deterministic elements, like generating them sequentially, the more your probability of collision increases.


Yes, of course. But making them sequential does not take away from the universal(ish) uniqueness


Meant ordered here, not sequential.


Is this sarcasm? If you generate an incrementing-UUID then its predictability is going to make it not-universally unique too


No. Incrementing an integer might be unique for the local database, but not unique for the universe.


That is…not what universally unique means.


From RFC4122: "A UUID is an identifier that is unique across both space and time, with respect to the space of all UUIDs."

Hard to achieve if everyone starts from 00000000-0000-0000-0000-000000000000.


Right that’s why nobody starts at 0. And you don’t have to only add 1 to be ordered, no DB I’m aware of just adds 1 to sequential UUIDs. There’s always randomness/entropy involved.


Sequential UUIDs don’t start at 0. They are a 128-bit composite of two integers; a temporal component in the high order bits and a random component in the low order bits.
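
A minimal sketch of that composition (similar in spirit to the draft UUIDv7 layout; the 48/80 split is an assumption and the version/variant bits are omitted):

    import secrets
    import time
    import uuid

    def time_ordered_uuid() -> uuid.UUID:
        # 48-bit millisecond timestamp in the high-order bits,
        # 80 random bits in the low-order bits.
        ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
        return uuid.UUID(int=(ts_ms << 80) | secrets.randbits(80))

    ids = [time_ordered_uuid() for _ in range(3)]
    print(ids == sorted(ids))  # True, apart from same-millisecond ties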


UUIDs are just 128-bit simple integers.


Interestingly, for other systems you sometimes want the exact opposite: for your key space to be distributed across indexes to balance the load (vs wanting them all to hit the same “hot” index for MySQL).

For example, Google’s Cloud Firestore is bottlenecked to 500 writes/second if your key is monotonically increasing (like a timestamp or these timestamp-based UUIDs) causing you to “hotspot” the index: https://cloud.google.com/datastore/docs/best-practices#high_...


Yeah S3 has similar performance issues where accessing objects with the same prefixes has lower throughput because they get sharded onto the same server. It's very counterintuitive when you're used to how performance works on single computers where you want to optimize for cache-locality.


This used to be the case, but it's not true any more.

For example, previously Amazon S3 performance guidelines recommended randomizing prefix naming with hashed characters to optimize performance for frequent data retrievals. You no longer have to randomize prefix naming for performance, and can use sequential date-based naming for your prefixes.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimi...


The doc is talking about how the full prefix is now part of the sharding, in contrast to previous times when only the first few characters mattered.

But if you put many files with the exact same prefix - even sequential dates - then you hit the threshold the doc says, about 5000 ops/sec.


I re-read your comment and you do say "same" prefixes (I think I read it as "shared"), which AWS hasn't changed the behavior of (AFAIK). You're right that objects with the exact same prefix are still routed to the same shard (set?) and have that throughput limit.

P.S.: In a few projects I've built prefixes using timestamps, but not at the very beginning, and worried that they weren't getting sharded out. The change I linked to fixes that problem.


Ugh really? Why would they not hash the whole filename for shard assignment?


In these systems you typically can list blobs based on prefix. If you shard blobs sharing prefix to different nodes that becomes difficult.


Because the file name includes the "directory" prefix, i.e. each file's name stores the `/entire/bucket/dir/tree`, which can get large.


The size of the input doesn't affect the hashing.


It affects the runtime of the hashing.


Is it that they truly want it distributed across the key space or distributed across the shards?

With MySQL, you're writing to a single box. Because of that, you'll want to be on hot pages. With Firestore, you're writing to many boxes. If you hotspot with Firestore, you're only writing to one box. If you write completely randomly to Firestore, maybe each box is bottlenecked to 400 writes/second (rather than 500) because there's no locality on each box. But if Google isn't charging you based on locality, there's no penalty for you since it'll scale up to more boxes (invisibly behind the scenes).

If you're only writing to one box, might as well take advantage of locality. If you have the opportunity to write to many boxes, you don't want all of the load going to one box.

However, one could imagine a distributed system where you could try and get both. Create an ID that has the ShardId to route it to the correct box and then something sequential after the ShardId. If you had 1,000 shards and 25 boxes, each box would be responsible for 40 shards so you wouldn't be perfectly sequential, but you'd kinda end up with 40 hot pages rather than randomly writing all over the place. It would also give you ample room to scale up your cluster.
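
A rough sketch of that shard-prefixed layout; the bit widths and the in-process counters are illustrative only (a real system would use database sequences or per-node allocators):

    import itertools

    NUM_SHARDS = 1000
    _counters = {s: itertools.count(1) for s in range(NUM_SHARDS)}

    def make_id(routing_key: int) -> int:
        # Shard id in the high bits routes the write to a box; the per-shard
        # sequence in the low bits keeps inserts on that box mostly append-only.
        shard = routing_key % NUM_SHARDS
        return (shard << 48) | next(_counters[shard])

    print(make_id(42), make_id(42), make_id(7))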

So there is room to apply this technique to a distributed system as well.

Oh, one thing I'd note is that MySQL's InnoDB uses index-oriented (clustered) tables as they note in the article (which is why this is important with MySQL). PostgreSQL doesn't use index-oriented tables so new rows are (I believe) just written sequentially by default regardless of ID. It will have to update the primary-key index, but because the PK will be a lot smaller than the whole rows, you probably don't have to worry as much about randomly writing there. It's easier to keep an index hot than to keep a whole table hot.


Your other option is to hash your monotonically increasing numbers. You don't want to use something like SHA, as it does not give good distribution. You use something like murmur hash (https://en.m.wikipedia.org/wiki/MurmurHash). I've used it before for indexes at Google for values that may hotspot. See Google's impl here: https://github.com/google/zetasketch/blob/master/java/com/go...
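
A rough sketch of the idea, assuming the third-party mmh3 package (MurmurHash3 bindings); any fast non-cryptographic hash with good distribution would serve the same purpose:

    import mmh3

    def spread(seq_id: int) -> int:
        # Hash the monotonically increasing id so consecutive ids land far
        # apart in the key space instead of piling up on a hot index range.
        return mmh3.hash(str(seq_id)) & 0xFFFFFFFF  # unsigned 32-bit value

    print([spread(i) for i in range(3)])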


> You don't want to use something like SHA, as it does not give good distribution

How come? Even distribution is one of the requirements for crypto hash functions.


You know, I'm likely wrong here. I think murmur is good because it has a good distribution and is much faster than SHA? I did find someone that compared murmur to other non-crypto hashes here: https://softwareengineering.stackexchange.com/a/145633. I'd love to see this compared with sha1.


Can you elaborate on the claim about SHA? If I'm not mistaken, all these cryptographic hashes have a 50% chance of flipping any one output bit upon changing a single input bit -- is there some hidden gotcha I'm unaware of?


It’s about random distribution. If you output an array of the key space as a bitmap, it should look like white noise. SHA doesn’t look like white noise. There are certain parts of the space it rarely goes with certain classes of inputs (numeric, usernames, etc) while murmur and md5 give that white noise look.


That is generally true for most if not all NewSQL databases, as far as I know. Spanner, CockroachDB, TiDB, … Maybe even Aurora?

The second argument on secondary indexes’ size still holds water, if it matters for a given application.


Isn't this easily solved by supporting 128 bit keys and using UUIDs as intended, i.e. as integers and not in their string serialization? This is as nonsensical as storing IPv4 as strings instead of 32 bit integers.


It should be noted that some database providers already provide a UUID data type which is a 128bit integer.

https://www.postgresql.org/docs/14/datatype-uuid.html


Length isn't the primary issue. Locality is.

UUIDs are generally generated randomly. This results in terrible insert performance into B+tree-based indexes, and terrible (= no) lookup locality with basically any database. In a large table, successive entries end up in separate disk pages.

Even with time-based UUIDs, the time fields are ordered backward, which produces the same issue.

One way to fix this (beside the methods outlined in the article) is to create time-based UUIDs with fields sorted the other way around. Or reverse the time fields during indexing (slower, but still not as slow as an unnecessary disk read!).
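
A sketch of that field reordering using Python's uuid module; it follows the UUIDv6 idea (most-significant timestamp bits first) but doesn't claim to match the draft spec bit for bit:

    import uuid

    def reorder_uuid1(u: uuid.UUID) -> uuid.UUID:
        ts = u.time  # 60-bit timestamp in 100 ns units
        time_high = (ts >> 28) & 0xFFFFFFFF          # top 32 bits first
        time_mid = (ts >> 12) & 0xFFFF               # next 16 bits
        time_low_and_ver = 0x6000 | (ts & 0x0FFF)    # version 6 + last 12 bits
        return uuid.UUID(fields=(time_high, time_mid, time_low_and_ver,
                                 u.clock_seq_hi_variant, u.clock_seq_low, u.node))

    ids = [reorder_uuid1(uuid.uuid1()) for _ in range(3)]
    print(ids == sorted(ids))  # byte order now roughly matches creation order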


Firestore has exactly the opposite problem - indexed sequential values put pressure on the "last tablet" and require the same tablet to repeatedly split. This becomes the limiting factor on insert volume. Random generated keys are better, because they spread inserts across multiple tablets (ie servers).

I don't know for certain, but I suspect DynamoDB and most other databases that can trace their origins to the bigtable whitepaper have similar behavior.


Indeed, traditional rdbms are generally optimized for patterns with high locality. High locality patterns with NoSQL and NewSQL style dbs often end up causing one node in the system to receive disproportionately high load that defeats attempts to spread operations across the cluster.


DynamoDB runs the partition key through a hash function, so sequential values end up being evenly distributed across partitions.


If so, how does it iterate in key order?

I should clarify - the problem with Firestore is the index tablets. The indexes inherently need to be in order or you can't perform ordered queries.


It can't. You can only perform range queries and the like on "sort keys." All sort keys are on the same partition, so there are limits on storage size and throughput.

Partitioning is very explicit in DynamoDB, for better or for worse. Harder to shoot yourself in the foot, but also limits what you can do.


I can't say I've had much issue with this in postgres, but even still, this is easily worked around by having a pkey/cluster key of a monotonically increasing int--which is useful for other things such as data migrations, as you can pick defined start and end targets without worrying about new things being inserted in between.


Pg does not cluster, so the only impact of a uuid is that you're inserting in a random location of the index, which is not great but not the end of the world (probably).

InnoDB clusters on the PK by default, so when you're inserting a UUID you're not only inserting in the middle of the index (on average) you're also inserting in the middle of the table.

And I don't know how much the on-disk storage has been optimised for this sort of thing, but if the answer is "not" and sparse pages are not really a thing, you might need to rewrite half the table in order to do so.


It's going to depend on the database. This article is primarily focused on InnoDB and goes over the read/write locality issues with a uuid.

That isn't always going to matter (or can be a very good thing). For a KV store like dynamodb a uuid is a great key because it'll distribute nicely across your shards.

It also depends a lot on your query patterns. If you have a uuid primary key but you're partitioning by some other value you may end up significantly reducing the number of pages you have to look through.


> Or reverse the time fields during indexing (slower, but still not as slow as an unnecessary disk read!).

When you're IO bound, I'm not sure it actually is slower in practice.


Depending on caching strategy, using random UUIDs also decreases the maximum working set size before you become I/O bound by a huge factor. Postgres e.g. relies heavily on the O/S block cache and is subject to this.


> using random UUIDs

I wasn't discussing using random UUIDs.


There are new UUID versions (draft) that solve the locality problem: https://datatracker.ietf.org/doc/html/draft-peabody-dispatch... (one of the new ones, can't remember which exactly)


That helps with storage, but still is larger than a bigint, and doesn't help with the random distribution of data. I believe newer versions of MySQL have a data type for this.


MySQL doesn't have a UUID type, nor 128-bit int type. But MySQL 8 does provide functions UUID_TO_BIN and BIN_TO_UUID, which make it trivial to convert between the human-readable string representation of a UUID and the more efficient BINARY(16) representation.
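
For reference, a rough Python equivalent of the byte shuffle that UUID_TO_BIN(uuid, 1) is documented to perform (swapping the time-low and time-high groups so v1 UUIDs index in roughly chronological order); the exact byte layout here is an assumption, so verify against your server:

    import uuid

    def uuid_to_bin_swapped(s: str) -> bytes:
        b = uuid.UUID(s).bytes
        # groups: time-low b[0:4], time-mid b[4:6], time-high b[6:8], rest b[8:16]
        return b[6:8] + b[4:6] + b[0:4] + b[8:16]

    print(len(uuid_to_bin_swapped(str(uuid.uuid1()))))  # 16 bytes, not a 36-char string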

MariaDB now has a native UUID column type as of MariaDB 10.7. This is brand new -- 10.7 had its RC release in Nov, not GA yet but very soon I'd imagine.


If you use a type1 UUID, you shouldn't have quite so random distribution, particularly if there's only a single machine generating the UUIDs (which admittedly, kind of misses the point, but alas is done all the time).


Well, you probably shouldn't cluster on a UUID, but it's not really a great idea to blindly cluster on a sequential id, either. You should cluster based on how you query the data to minimize I/O.


That's only part of it, the major problem with classic UUIDs are that writes occur randomly across the entire keyspace. Whether it's represented as a string or in binary, ordering does not change, and so the underlying indexing method must cope with these random orderings at insertion time.

At least btrees perform much worse with random insertions. I don't know how much impact it has on LSMs.


LSMs and fractal trees handle random inserts much better. But you still lose locality of lookups: items generated near each other time-wise are not correlated on disk.


I'm kind of shocked MySQL doesn't do this?


Experience has taught me to never be surprised when you learn that MySQL does something incorrectly. Having used it since v3.3 it's not unusual.


They do test a binary(16) field which is basically what you're talking about (and you could use a binary(32) for a 128-bit UUID).


16 bytes is 128 bits. Why would you use 32?


In terms of space usage and its impact on the buffer pool, storing as 128-bit integers (if your DBMS supports it, which MySQL does not) is essentially equivalent to the bin16 case in the first "UUID insertion rates" graph. Performance still falls off a cliff, just a little bit later. As the author says, "The use of a smaller representation for the UUID values just allows more rows to fit in the buffer pool but in the long run, it doesn't really help the performance, as the random insertion order dominates."

If you look at the fastest approach the author tried in the second "UUID insertion rate" graph, it actually stores a modified UUID in a 36-character string. There is a missed optimization to store the modified UUID as 16 bytes, rather than 36, but imposing some order on the keys is the dominant improvement.


You know what's nonsensical? Assuming one rule of thumb applies everywhere. I will stand by that rule of thumb until I'm cold and in the ground.


Not to mention the string serialization is also ugly...


If it's ever being turned into a string and stored or transmitted it's always better to use a standard base64 encode to keep the string to 22 chars instead of the standard 36/38 chars.

You can also use base85 and go to 20, but you get into some funky chars there.
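
A small sketch of the 22-character form using URL-safe base64 on the raw 16 bytes (padding stripped, re-added on decode):

    import base64
    import uuid

    def uuid_to_b64(u: uuid.UUID) -> str:
        return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

    def b64_to_uuid(s: str) -> uuid.UUID:
        return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))

    u = uuid.uuid4()
    short = uuid_to_b64(u)
    assert len(short) == 22 and b64_to_uuid(short) == u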


I recently read a book by Google’s head guy on API design that was specifically about designing APIs and it had a big section on what makes a good identifier and why people reach for UUIDs and why specifically it is a problem on multiple levels.

The thing that he ended up recommending however was super interesting in that I had never seen it mentioned before but it was basically to use this instead http://www.crockford.com/base32.html


I see this around the crypto world a lot lately (notably for Substrate/Polkadot addresses): base58 https://tools.ietf.org/id/draft-msporny-base58-01.html

Seems to have the same idea of "human-read-write without visually identical characters" but with an expanded set for shorter string length.


I use crockford 32 to _represent_ my UUIDs, but they are obviously stored as binary. Is the only problem with UUIDs that sometimes they get stored as strings?

The only issue I've had with UUIDs is when they don't sort in increasing chronological order. RDBMSs don't appreciate high insert loads into random points in the index. Take care of that, however, and they're a treat.


ULIDs or time prefixed UUIDs are other options.


The linked article discusses the issues - the storage size when encoded as char, as 16byte values, as well as the impact on read/write locality.


Where does it discuss any of this? It compares base 10/16/32/64. There’s no mention at all of UUIDs or GUIDs.

(Or do you mean the original article, rather than the linked Crockford article?)


I mean the original.


Yeah I seem to recall he also mentioned that as an approach.


This format was discussed in a HN first page post just this week:

> This alphabet, 0123456789ABCDEFGHJKMNPQRSTVWXYZ, is Douglas Crockford's Base32, chosen for human readability and being able to call it out over a phone if required.

https://news.ycombinator.com/item?id=29794186


People in my industry in my country have specifically avoided using B and D together as they sound too similar over the phone.

Also 2 and Z can be similar in writing.

However it is nice to not see 0 and O, 1,I,l in the same string.


F and S sound similar over the phone, at least on POTS landlines, as they don't carry the higher frequencies (> 4 kHz) that distinguish the S from the F. Note that cat names tend to have S sounds.

POTS = Plain old telephony service is restricted to a narrow frequency range of 300–3,300 Hz, called the voiceband, which is much less than the human hearing range of 20–20,000 Hz [from https://en.wikipedia.org/wiki/Plain_old_telephone_service ]


Anybody who has to relay things like API or CD keys over a POTS line on a regular basis quickly learns the NATO phonetic alphabet.


If you're worried about clarity over the phone, you should look into the NATO phonetic alphabet: https://en.wikipedia.org/wiki/NATO_phonetic_alphabet


I prefer to use Aeon, Bdellium, Czar, Djinn, Eye, etc.


The bomb defusal scene in Archer was an absolute classic for this. https://youtu.be/_4jxLxZrMfs


> Djinn

Fun fact: dzs counts as a single letter in Hungarian (e.g. in alphabetical ordering).


Quite a challenge for non-English crowd.


You still have to know that 0 is 0 and not O, and that 1 is 1 and not I or l.


But if a mistake is made, and you wrote down L instead of 1 and sent it to me in an e-mail, I, knowing that it is Crockford base32, would easily deduce what mistake was made.


Right, I didn't realize the decoder is specified to be lenient in that way, so the confounded characters are actually equivalent in the encoding.


Yeah when he lays out the arguments for it in the book you can clearly see why it makes a huge amount of sense. The usability, the performance, the value of a checksum etc…


Looking for that quote I can't find it on that page.


I'm a big fan of Crockford-flavored Base32, as in $dayjob the folks who did some of the fundamental engineering work for the product I work with decided 10+ years ago to use Crockford-flavored Base32 as the way to expose most user-visible IDs. Things I love about it are all mentioned in Crockford's spec, but I'll restate them here:

- It's case-insensitive allowing data encoded with it to survive going through most random line-of-business applications which may have errant UPPER() or lower() calls somewhere in their depths, as well as making it easy for humans to talk about out-loud (no need to specify case makes it easy)

- Making all the "pillar" shaped characters (I, i, 1, l) equivalent and all the "donut" shaped characters (o, O, 0) equivalent means big swaths of human typing mistakes are avoided. After so much exposure to Crockford Base32, I now loathe having to decipher "is it a capital 'I'? Or a lowercase 'l'?"

Overall, it's a great way to encode any user-facing ID; just make sure you understand that just using this encoding won't stop users from realizing that Crockford Base32 encoded IDs are sequential. If you want sequential IDs to appear non-obviously sequential after encoding you'll need to use additional techniques. Assuming Crockford Base32 obscures sequential IDs is the one case where I saw someone do something they shouldn't as it directly relates to Crockford Base32.


> (...) but it was basically to use this instead

Base32 is a representation format which provides a textual representation of numbers, not a storage format.

I mean, anyone is free to dump Base32, or even base 2, into a string and just run with that, but that would be highly inefficient.


Oh wow thanks! I was familiar with Zimmerman's base 32, (z-base-32, https://philzimmermann.com/docs/human-oriented-base-32-encod...) but not Crockford's.

I just added it to the "Other projects" section of my base converter. https://convert.zamicol.com/


ULIDs are base 32 as well. https://github.com/oklog/ulid


Is the spec not a better link than a go implementation?

https://github.com/ulid/spec


What was the book?


https://www.bookdepository.com/API-Design-Patterns-JJ-Geewax...

I loved it, think it all made a ton of sense. Lots of code samples, no weird technology choices (normally they would do it all via protobufs and gRPC but they keep the same principles and just use HTTP instead and Typescript for code samples)


Glad to hear ! :-)


https://livebook.manning.com/book/api-design-patterns/chapte...

Section 6.3.3 gets to the point about base32


We use crockford tokens heavily where I work, highly recommend them. We typically use a prefix to denote the type of thing being identified, like “U-“ for User followed by a 10-15ish long crockford token. Works great.


Would you happen to have a link to the book?




Which book?



Bad for performance as primary keys.

But, still provide strong value as a unique identifier which is what makes them popular.

I’ve used integers as primary keys, with UUIDs as alternate keys for external-to-the-data-store queries.


"Bad for performance as primary keys" -> "Bad for performance as primary keys in MySQL". This isn't an issue in PostgreSQL and perhaps the lesson here is that as you scale, you need to understand more about the internals of the DB system you've chosen. This isn't limited to RDBMS as it's pretty easy to show trade-offs in choosing a NoSQL as well.


You are right, the internals are important. When I tested this, a long time ago [1] [2], I found that randomly distributed keys on PostgreSQL were indeed faster than e.g. on MySQL. Which surprised me! I still don't quite understand why. But, even with PostgreSQL, sequential (or nearly sequential) are much faster.

[1] https://markmail.org/thread/3jzqjy6cavxcrpbq [2] https://markmail.org/download.xqy?id=3jzqjy6cavxcrpbq&number...


> Which surprised me! I still don't quite understand why.

Because MySQL (specifically innodb) will cluster by the PK by default, while Postgres will not.

Meaning in MySQL, by default, the table will be stored in PK order, so when you insert a UUID at a random location, any record which sorts after it has to be moved to make room, for nothing, since UUID ordering is not really relevant (at least for the current standard UUIDs).

In pg, clustering is not a property (at least for now), it's an explicit point operation, which is not performed by default. So when you insert a row, whether UUID or INTEGER-keyed, it's just tacked on wherever there's free space. The UUID still has to be inserted at the right place in the index's btree, which can lead to a lot of cascading changes, and it's more expensive to compute, store, and load than a simple integer (especially 32b), but it doesn't induce an arbitrarily high amount of shuffling in the storage layer of the table itself.


While the problem in the article is less of an issue in Postgres (the index cache-locality issue is still there), they are still slower than serial ones.

I don't know if you avoid enough problems by using them as alternative keys in Postgres for it to be faster; my guess is that just using them as the primary key would be faster than a serial primary key plus a UUID alternative one. Still, UUIDs are much more useful as client-facing data than as a pure database value, so I would also do that (and standardize my PK format), probably paying a performance penalty.


"Slower" is a matter of context. Like with most things, there are trade offs to allow for more concurrency. UUIDs are intended to avoid serialization with ID generation, which makes sense in a federated system. Yes, they take up more space and aren't all neatly ordered, but the trade off works out well for the right kind of systems (FlakeIDs are a better trade off for more contexts). The problem this article is describing is you're getting the worst of both worlds, because you're still serializing in the MySQL database...


They are still generally found to be slower to use than sequential primary keys in postgresql.


There are remedies for that while still keeping the advantages of UUIDs. https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...


What if you convert from MySql to postgres? Will the uuids still benefit from postgres optimizations?


Rather: bad for performance (as primary keys) when read/write in sequential order.

Great for (primary keys or anything) in a distributed/sharded database (e.g. CockroachDB), when data access is mostly by keys.


this is what i do, any public facing id - url, or js/html that would show an id would be a uuid and then the backend queries are done with primary keys / integers.


We do the same.


I do something similar. It works well.


Done the same :-)


Yeah. The main problem is people using them as primary keys in naïve systems like relational databases. You can't just expect a relational database to magically become a distributed system just by using UUIDs. There is a bit more work to do than that.


People use UUIDs for more reasons than just making things distributed. You can generate them client side if that's advantageous, you prevent leaking information about how many records there are in the system and prevent guessing of other potential PKs and potential unauthorized access, and there's some optimization strategies that benefit from not relying on a serial PK.


If you're obfuscating, then you should be using a DHT to map surrogate keys to internal keys that are more convenient for use in a MySQL index... or just stop using MySQL.


Well they shouldn't. UUIDs are for distributed systems. Generating keys client side? That's a distributed system.


I've had occasional uses for it in systems that aren't distributed at all. It's a handy property of UUIDs, if UUIDs are useful for a particular use case, I'm going to use them, what they're 'supposed' to be for is irrelevant.


> You can't just expect a relational database to magically become a distributed system just by using UUIDs.

Nobody thinks this, do they?


You’d be surprised. Unpleasantly, unfortunately.

Not sure if the term gets used much nowadays, but ‘cargo cult’ programming is definitely a thing still. Never stopped.


Somebody thought RabbitMQ did load balancing because it had three redundant nodes. Pretty much ruined the company. Definitely caused all of us to lose our equity.


I’d love to hear that story.


Is it too much to ask people to explain what's wrong with this well-written and on-topic comment instead of downvoting it?


Calling relational databases (HN’s preferred storage system) naive probably sounds like trolling to most people. There are also plenty of distributed relational databases. People downvote comments that sounds like trolling or flamebait.

I use UUIDs but I don’t know why they would magically make my Postgres a distributed system. I like them because the client can generate them offline.


What? Really? Naïve in this context means a general purpose solution that doesn't "know" about your use case. Have people never heard of a naïve algorithm or solution?

Distributed relational databases aren't naïve in this context. MySQL is.


> Have people never heard of a naïve algorithm or solution?

I have a fairly recent PhD in algorithms and I haven't heard naïve used this way. When I hear naïve, it usually just means "does the immediately obvious thing".

For what you're trying to say, the term I'm familiar with is "oblivious", e.g. "oblivious routing" or "oblivious local search", occasionally with modifiers such as "cache-oblivious".


I've never heard use of the word "oblivious", but my education is 15 years old at this point.


Naive is an emotionally loaded word outside of academic communities.


"naïve systems" doesn't make sense.

Nobody is assuming simply adding a UUID transforms a system into a distributed one.

The parent comment is about exposing UUIDs probably in order to not expose the sort order of the database.


> and purely random (version 3)

This is a typo, v3 isn't random. It is generated deterministically from inputs.

> The only “repeated” value is the version, “4”, at the beginning of the 3rd field. All the other 124 bits are random.

And this is close but not quite correct. UUID v4 has a couple of other fixed bits, there are only 121-122 random ones. There are patterns in the text representation other than constant numbers. :)


This blog post is over two years old, that 3 vs 4 error is inexcusable.


I'm sure I take bigger penalty hits for less. Sorry, but UUIDs "click" in my head and they prevent a whole slew of foot-guns. It would be one thing if auto-inc was the same as UUID, but there are really annoying things with auto-inc (not knowing the id until after insert being top of mind), and also you can generate UUIDs client-side/offline if needed. Yes, I know some people argue for auto-inc as primary and still use UUID for client-facing, but I don't understand what that really gets you and it seems way more complicated.


Typically, it gets you performance. And if performance is what you need, it may be one of the simpler things you can do.


I am very much not a database person, so forgive me if this is a dumb question.

I'm reading this article and it says that UUID are compared byte by byte, and seems to be indicating they're stored as string. Is that actually the case? I would have assumed that SQL supported 128 bit ints, but this seems to imply it does not.

Another question: if a column is set to char(fixed size), do the various SQL engines really not optimise to do multi-word comparisons? (e.g. 8 bytes at a time, then 4, 2, 1, as size requires)


Not a dumb question, I think you've hit on a key oddity here.

This article is about MySQL, apparently it's really the case in MySQL?

It's not the case in every rdbms universally. Postgres has a uuid type that stores them how you would (rightfully) expect.

I have no idea why MySQL does it this way, it does seem odd.


MySQL doesn't have a UUID type, so naive implementations use (VAR)CHAR. Those in the know use BINARY(16) and MySQL 8 now has helper functions to convert to and from hex. Apparently MariaDB will soon have a native UUID type. PostgreSQL has had them from for years.


It often doesn't matter if you use a 128bit int or a string since either way you're loading 4K/8K/16K from ssd/hdd. Database people do all sorts of silly things like storing datetimes as normalized ISO strings (2022-01-08T08:18:20) instead of using 64bit unix time. They get away with it because both the backend (durable storage) and the clients (webapps responding to a user 10ms away) are incredibly slow compared to CPUs.

That aside, Postgres stores UUIDs as a 128bit int, not as a string, so it almost certainly uses multi-word comparisons via memcmp: https://stackoverflow.com/a/29882952/703382


> Database people do all sorts of silly things like storing datetimes as normalized ISO strings (2022-01-08T08:18:20) instead of using 64bit unix time

That's super weird. I can't think of a major RDBMS that doesn't have a native date/time data type, unless you count SQLite.


You can fit many more uuids encoded as 128 bit numbers than as strings in a 4K page. Hence you'll need fewer pages to store your data and that might make a difference between fetching from cache vs fetching from disk.


At least in the case of Postgres, I know from reading the source before that it stores UUIDs as 128-bit values and does optimized binary comparator operations on those values. It can export and import UUIDs as text strings but does not treat them as text internally. I'm not sure why any competently engineered database engine would do otherwise.


I recently wrote about how encoding ULIDs in the UUID format could help with some of these problems

https://news.ycombinator.com/item?id=29794186

https://sudhir.io/uuids-ulids


Hi Sudhir, Love your blog posts. Keep them coming.


A lot of things that we do for security or privacy are bad for performance, but I think they are still good tradeoffs.


Agreed. Using UUIDs for keys is useful to exclude entire classes of security issues. Most notable are many kinds of enumeration attacks.


Also entire classes of bugs. You screwed up a JOIN, or an application-level lookup for that matter? With UUIDs you'll get no results, with sequential ints you'll get a valid but wrong result.

Or worse, the right result for the wrong reason. I've actually seen a case where creating a new entity in the application populated X records in X child tables, each with a sequential ID, and as a result all of them had the same surrogate PK. They were 1:N relationships in principle, but the software wasn't feature complete yet so the actual records were all 1:1.

Years later one of those tables finally received some extra records, and it caused a really weird bug because a query had accidentally used the PK instead of the FK as a join key, but for years it had happily chugged along because the two columns were in sync.


Quit exposing your keys.


Yep, so we need solutions for using these practices in a performant way rather than hearing they are not 'good' and then having to explain over and over again why they are there. Our datasets that use UUIDs have not had issues with performance, but of course we keep looking for ways to keep using UUIDs while improving performance. It would be better to provide solutions on how to do that and spend time getting performance on par with int keys. Like someone else said, 32-bit int keys are no good anyway for many cases, so let's go to 128 bit, optimize for that, and everyone is happy.

Edit: like https://news.ycombinator.com/item?id=29851653


And for data safety as well.

In an environment where millions of events are processed every second, being able to uniquely identify them is a must, and temporal keys are not always an option.


Just a month ago we migrated many of the columns from integers to UUIDs (encoded as binary(16)) in several critical tables which are pretty large (Percona server, too), and so far I haven't heard about any serious performance degradation after the release.


What's pretty large for you though? There are fields where 100k entries is a pretty large dataset and others where "large" starts at a petabyte.


I didn't mention that our DB setup uses sharding, and every tenant has their own DB shard (there are tens of thousands of shards). I just checked that one of the largest tenants has 2.2 mln rows in one of the affected tables, which is usually joined with 2-4 more related tables using UUIDs (another such table is 1.1 mln rows, for example), and they're on the hot code path because it's the core of the system. Maybe with sharding the difference is negligible? During code review I raised the concern that inserts and joins can become much slower after we migrate it to UUIDs, but so far my fears haven't materialized. Usually with these tables we have performance problems on the application side, not in the DB, such as ORM fetching data in a very inefficient way, or using too much RAM. Maybe it'll bite us in the long term as the tenants' shards grow in size, who knows.


A few million rows is tiny for a database, so it shouldn't be an issue in most cases. That much data will trivially fit into cache (these days, possibly even CPU cache). It probably won't become noticeable until your tables are much larger than cache memory.


> Let’s begin by the base64 notation. The cardinality of each byte is 64 so it takes 3 bytes in base64 to represent 2 bytes of actual value.

Wait, what? I thought it takes 4 base-64 digits to represent 3 bytes of data. Not 3 base-64 digits to represent 2 bytes of data.


base64 means the "vocabulary" used has 6 bits (2^6 = 64). Hence, to complete a full number of bytes (without using any padding), you need 4 b64-letters:

    4 * 6 = 24 = 8 * 3
So, you're correct... the exact amount of bytes that it takes to represent "actual" bytes goes like this:

    Actual bytes | b64 bytes required | overhead
    1            | 2                  | 2x
    2            | 3                  | 1.5x
    3            | 4                  | 1.33x
    4            | 6                  | 1.5x
    5            | 7                  | 1.4x
    6            | 8                  | 1.33x
    7            | 10                 | 1.43x
    8            | 11                 | 1.37x
    9            | 12                 | 1.33x
EDIT: As you can see, this averages with an overhead of between 33% (best case scenario where the encoding requires no padding, happens every 3 rows above) and something like 37%, decreasing with the number of bytes being encoded and approaching the minimum, 33% (e.g. to encode 1024 bytes, you need 1366 b64 digits, an overhead of 1.333984375x).


Maybe they're using a 9-bit byte? :)


Microsoft SQL Server / Azure SQL support sequential UUIDs to solve the index distribution problem: https://docs.microsoft.com/en-us/sql/t-sql/functions/newsequ...

It's better than nothing, but one of the values of UUIDs for identifiers is that you can create new ones client-side while offline. These "sequential" UUIDs will fail standard UUID validation because of the byte swapping and, in my experience, when used offline-capable apps, will result in sparse clusters of sequential UUIDs that yield an unpredictable improvement over truly random UUIDs.


I think the author is missing an overall picture, e.g. event-driven scenarios.

Where you don't have to check collisions with a db. He mentioned generating the pk's on remote client, but that doesn't capture the interesting bits.

You generate the newly created object with the guid.

You send it to the API/Microservices and it's generated, fire-and-forget style. And the remote client has an Id of the newly created object to do something with.


But now the remote client has an ID of something that may or may not exist the next time they try to use it depending on whether or not it actually made its way into the database.

I've seen this kind of architecture before. It sounds nice but is loaded with consistency problems.


There's patterns for implementing it. It's not really any different than using more than 1 database which is unavoidable in many scenarios (interacting with 3rd parties)

https://chrisrichardson.net/post/sagas/2019/08/04/developing...


That's why the pattern is eventual consistency.

You receive a message "{entity}Created" and it contains the Id of the full object.


It's like, _maybe_ eventual consistency. Hopefully the client doesn't try to do anything important with the ID/new object


If the client needs to use the object right there and then, it can start polling the GET for that ID until it returns 200.

A slightly more sophisticated pattern is to have the initial creation command return an ID of the creation event. Then the client can poll that ID and just check that the event has been processed successfully. This requires the client to trust the server a little, but it means it can just poll a small, temporary, probably in-memory "recent events" table instead of hammering the main API.

A more efficient design is to YOLO, submit the second command while the first is still being processed, and trust the server to handle the commands in the right order. This is fine for backend services, unfortunately human users get annoyed at their frontend when it says "sorry you have to do the purchase finalization again because an earlier 'add to cart' command failed and I only noticed it now".


If you get a "Created" back, it is possible for it to be guaranteed that it'll be created eventually.


What if the object is deleted again? You could still have an inconsistent picture, no? And you could create and delete over and over again, so that a client has an inconsistent picture all the time.


That's what eventual consistency is, isn't it? It's definitely not the easiest thing to design systems around, but at a big enough scale you sometimes don't have a choice.

There's also many ways to "work around it", so that it doesn't seem inconsistent to the user.

I'm definitely not arguing that it's how most systems should be designed though.


That's a different design issue then - it means lack of synchronization. It is orthogonal to using uuids at a client. In fact, (properly) using uuids are a reasonable way to resolve this.


Not arguing against using UUID. Just exploring that async concept further and the challenges in that. OK 2 separate issues : )


I don't think you would ever get a 201 Created in an eventual consistency scenario. You would get a 202 Accepted. Unless the API is lying to the clients.


Created can mean different things to different people. Maybe an ecommerce order is created in the sense it's been updated in the inventory database and a record added to some sort of fulfilment system but there's still outstanding data warehouse sync jobs, some order analysis/product recommender job, and some anti fraud jobs (that just need to complete before the order picking in the warehouse begins)


Fair enough, though what I said remains true for an "Accepted" response.


But Accepted doesn't mean created.

The feedback flow is async from the creation flow.

If you get created through the feedback flow it's created.


Accepted can mean that it will eventually be created though. In the sense that it's validated and guaranteed to be created at a later point ("eventually").


Well, you would have some kind of synchronisation protocol where the database confirms it has received that record and it now exists.


People should have a look at k-sortable unique identifiers (KSUID). In binary they are represented by 20 bytes and their string representation has 27 characters, which is shorter than UUIDs since they use a base62 encoding. They are sortable since the 20 bytes start with a 32-bit UNIX timestamp followed by 128 random bits. They should be very efficient for clustered indexes / B+-trees.
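
A minimal sketch of that 20-byte composition (timestamp prefix plus random bytes); the base62 text encoding and other details of the actual KSUID spec are left out:

    import os
    import struct
    import time

    def ksuid_like() -> bytes:
        # 4-byte big-endian UNIX timestamp followed by 16 random bytes.
        return struct.pack(">I", int(time.time())) + os.urandom(16)

    a = ksuid_like()
    time.sleep(1)
    b = ksuid_like()
    assert a < b  # byte order follows creation order, keeping B-tree inserts near the end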

Note also that as long as you have a single central database you don't need UUIDs. They are only needed if you have several processes creating objects without coordination.


Sounds like a ULID that doesn't fit in a UUID column - https://github.com/ulid/spec


Note some of the newly proposed UUID formats would take care of some of these issues [1]. They are time-ordered but still have a good bit of random entropy.

1. https://news.ycombinator.com/item?id=28088213


The author seems to have forgotten the obvious alternative of using ints for database primary keys and uuids externally.


Performance is really bad, if you save them as strings and use those strings as keys. If you take them as 128 bit integers, it’s kind of fine.

Often 32 bit integers are anyway not long enough as a primary key, so you need at least 64 bit keys. UUID is just double the size then.


Only if you are creating persistent objects in multiple places that may not be in communication with one another, and that at some point need to be distinguished from each other, do you need UUIDs.

Even in mobile apps, which seem like the most common use case for needing them: Unless your app creates these objects while not connected, you don't need UUIDs.

If your backend database is where objects get created, you don't need UUIDs.


> The missing 4 bits is the version number used as a prefix to the time-hi field.

Why would you use 4 bits for a version number in something that's supposed to be unique? What is the benefit of following this specification despite such cost, versus creating 128 unique bits based on time / random generators / machine IDs yourself?


The ability to mix different types of ID in one column or one business data store.

When you start with random UUIDs, and then decide you actually wanted per-host-namespaced ones halfway through, if you have allocated zero bits for the version ID you're up the creek.

You're trading off present-day efficiency for future-day flexibility. Whether that's wise for a particular case depends on that case.


Because 124 random bits are still more than enough to make collisions extremely unlikely (needs on the order of 2^62 UUIDs even with the birthday paradox - good luck storing them all), yet they are still recognizable as being of that specific format.


Slightly related question-

does anyone know what data type Reddit uses for their post/comment id? It seems like a short alphanumeric yet unique identifier and much shorter than a uuid.

Also what does twitter use for their post/comment id? Seems like some sort of big int?


Twitter created Snowflake many years ago to generate IDs. Unsure whether it’s still in use.

https://blog.twitter.com/engineering/en_us/a/2010/announcing...


Twitter uses strings instead of ints to represent the 64 bit UID, because Javascript only supports 53bit ints instead of 64bit... https://developer.twitter.com/en/docs/twitter-ids


That sounds nice and simple, and fits in 64 bits (vs 128):

"we settled on a composition of: timestamp, worker number and sequence number. Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper"


> The remaining of the UUID value comes from the MD5 of a random value and the current time at a precision of 1us.

I might be misunderstanding something here but if your random seed is based on time, under high-concurrency, doesn't this risk collisions? I can't see any thread-safety guarantees in the documentation.[1]

[1] https://dev.mysql.com/doc/refman/8.0/en/mathematical-functio...


If the column is UNIQUE there are no collisions; it will just fail to INSERT.


Then it doesn't have the same quality as a UUID which is supposedly guaranteed to be unique across space _and_ time[1].

[1] https://datatracker.ietf.org/doc/html/rfc4122


It’s not guaranteed, it’s just extremely probable that it’s unique given that its generation follows a uniformly random distribution.


Is this specific to MySQL or does it apply to Postgres too?


As far as I can gather from this post and looking at the data type documentation, MySQL does not have a specific UUID type, but Postgres does.[0] I'll assume that Postgres has some internal optimisations to UUID that MySQL thus lacks.

Addendum: I also realise this is anecdotal, but someone on Stackoverflow mentions a significant speed up from changing `text` to `uuid` in Postgres.[1] But this also fits with what I've been told on #postgresql on libera.chat. That being said, integers would still outperform uuid.

[0] https://www.postgresql.org/docs/14/datatype-uuid.html [1] https://stackoverflow.com/questions/29880083/postgresql-uuid...


The UUID type index in Postgres is optimized: https://brandur.org/sortsupport


The Postgres UUID type works just like BINARY(16) covered in the article.


A lot of the problems listed in the post are physical issues with larger data types that are somewhat random - eg the size, how clustered indexes work, and you will have the same problems with them in SQL Server.


If you order the data based on the uuid and your uuid is randomly distributed, then you will almost always be writing the data into the middle of your table, physically. You can cut the impact somewhat by using sparse tables (leaving lots of empty space) but eventually you'll be re-writing the data.

SQL Server has a sequential uuid type which avoids exactly this problem.


If that's the underlying issue, sequentiality... Just use UUID v6 or v7. They are time based and approximately sortable (unlike uuid1).

https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...

Disclaimer, I have no data to back up this solves the performance problems described. It's just likely to solve that "writing in the middle of the table" part.


Why use a sequential UUID over integers as an id and then associating them with a random UUID?

Integers are a superior way to record sequential data.


Because you don't want your record IDs to be perfectly sequential data.

You only care about them being sequential enough that your database engine will write them down quickly and efficiently (for engines like InnoDB or MSSQL which have such behaviour). But as a developer you typically want them to be random so that you can generate them in a decentralised manner, prevent guessing or accidental joins, safely merge records from multiple tables, etc.

Sequential UUIDs usually preserve enough randomness for the latter (as long as you don't do stuff like generating a trillion IDs exactly at midnight), while providing enough sequentiality for the former.


But now the ID has a random concatenation of data in it, like the MAC address of the node that inserted it into the database and a timestamp.

It'd be much more orderly for each node to have a sequential ID, the ID of the node that created it, a timestamp when it was created and a real UUID v4 for if someone wants to prevent collisions. Same data, but the primary key on an individual node is a lot smaller (half the size) and the metadata is available for use if someone wants it. There is natural room to build up namespaces and such. It is better to keep these observations separate instead of munging them all into the record ID.


I 100% agree that metadata like timestamps and machine IDs, if you care about them, should have their own columns.

I only disagree with:

> It'd be much more orderly for each node to have a sequential ID

I've outlined some reasons above why a random ID has advantages over a sequential one.

I run Postgres in prod so I can use purely random UUIDv4, but for those running mysql or mssql, UUIDv6 and the like are a useful compromise between sequentiality and randomness. The fact that they happen to use timestamps as the source of sequentiality is only an implementation detail.


> SQL Server has a sequential uuid type which avoids exactly this problem.

you refer to the uuid generated by sql server?


Yes. When generating you can use NewSequentialId() which generates a sequential guid to avoid needing to reorganise indexes.



There are several problems, one of which is also ergonomics.

I've only read it partially (it's a very interesting read nonetheless), however, this is a key point (italic mine):

> Let’s assume a table of 1B rows having UUID values as primary key and five secondary indexes. If you read the previous paragraph, you know the primary key values are stored six times for each row. That means a total of 6B char(36) values representing 216 GB.

It assumes that MySQL users store UUIDs as CHAR(36), which is very wasteful, since a UUID actually requires 16 bytes (128 bits).

Now, one can store UUIDs as binary blobs in MySQL, however, they are not human-readable, so one tends to store them as CHAR(36) instead, wasting 20 bytes per entry (in total, requiring 2.25 times the strictly necessary amount).

By supporting UUID as native data type, the storage can use the strict necessary amount of bytes, but still maintain readability, because the RDBMS will convert the data to a human-readable form.

Additionally, MySQL's clustered indexes are subject to write amplification, which makes things worse.

I haven't read the rest of the article though, which likely includes other considerations about the locality problems due to randomness. Things get even more complex due to the relationship with the data structures (I haven't fully read the article).


> they are not human-readable, so one tends to store them as CHAR(36) instead

one is an idiot


One does tend to be an idiot at times...


Postgres is somewhat different mainly because it doesn't use clustered index primary keys. So the row's position on disk is not related to the primary key index entry's position on disk.

Additionally using the less cryptographically secure uuid v1 can be a performance optimization since it has implicit time based sorting.


> Additionally using the less cryptographically secure uuid v1 can be a performance optimization since it has implicit time based sorting.

Except the way the fields are laid out basically defeats the point: UUIDv1 lays a 60-bit timestamp starting from the lower 32 bits, so it only sorts within a ~7-minute (2^32 * 100 ns) bucket.

Hence the proposal for UUIDv6, which lays the exact same timestamp in reverse order (starting from the "top" 32b), making it naturally sortable.


Huh, you're right. I guess I gotta message an old coworker and tell them their "perf optimization" didn't work.

Lesson learned, thanks!


Do check that they're not using a "cheating" implementation, I think some DBs have "sequential UUIDs" which are either UUIDv1 the right way up (aka uuidv6) internally, or an other scheme which yields a sequence (e.g. mssql's NEWSEQUENTIALID).

Alternatively, it's possible that they created pseudo-UUIDv1 by hand putting data in UUIDv6.


Some years back (maybe 5-10?) I remember doing a test with UUIDs on Postgres, and found no speed difference between UUIDs and integer PKs.

I don't remember the parameters of the test, however.


In my testing (years ago, don't have the data) there was no detectable difference between PKs of the uuid and bigint types in my application, but there was a difference between uuid and int (int was faster). uuid is 16 bytes, bigint is 8 bytes, int is 4 bytes, so at some scale there will be a performance difference.


I've experienced some performance differences with UUIDs vs integers, but it has been minor enough to never cause any issues, even in tables with billions of records.


This topic has been raised more than once at my work and it scared a lot of people. It's important to understand your use case before you embrace any other solution.

This will affect you only when you are frequently creating and storing new IDs, which leads to reshaping btrees.

If you have IDs and its number is under control and doesn't change a lot you're just fine.


That's the reason I've chosen postgres over mysql years ago and I don't use clustered indexes.


What does postgres do differently here to still have good performance?


What's the purpose of using non-random data such as time and MAC address in UUID? It increases probability of collision compared to purely random bits and if you need time or MAC it seems better to store them as separate fields - easier to perform selects, groupings, etc.


> Even if you use pseudo-ordered UUID values stored using binary(16), it is still a very large data type which will inflate the size of the dataset

Every IPv6 datagram has a pair of source and destination UUIDs. :)


Why would anyone use a random value as a primary key? I haven’t done databases in a long time but isn’t it standard practice to use a sequential incrementing value for the primary key?


Reasons are: 1. No need for coordination in a distributed system - no need to check what the next available id is. 2. If the userid is visible to users, a sequential userid gives away information about the number of users and allows guessing other valid userids.


This is exactly over-engineering and solving for the wrong problem. It can be solved easily and efficiently in other ways.


Sequential UUIDs fix this problem.


Fun improvement from my project a long time ago: a UUID has 36 chars, and to save space we created a shortened version by removing all the dash characters. The result is that 4 chars could be removed; the field in the MySQL table only needs 32 chars.


The point is, a uuid contains 128 bits of data, so a varchar(32) is still somewhat excessive


Why varchar over char? The length is always the same.


Yeah, true. My bad. I'm used to calling everything TEXT.


This seems to be just about using UUIDs as indexes in a DB, not using UUIDs in general as an ID for things which I'm not seeing any reason not to continue doing.


Sure, as long as you never want to look things up or reference them by ID, then there's no reason to worry. Otherwise, yes, you'll have the exact same problems laid out in the article



