> Except you literally can't do random distribution AND be compliant with UUIDv7 if you use any sort of normal lexical sorting/indexing, as they use the start of the key as the most significant bits. UUIDv7 is literally designed to have stable lexical sorting orders, have the time as the most significant bits, and have the most significant bits of the time as the most significant bits of the key! It's their primary design criteria!
> You can't 'prioritize' random parts of a key for sorting without writing a bunch of custom sorting (and key parsing) logic, which is generally undesirable for a number of reasons - and frankly completely unnecessary in these cases. You just wouldn't use UUIDv7 (or probably a UUID in general), and the benefits would pay for themselves very quickly anyway.
Forget prioritizing, that was about going fully random. Seriously, let's pretend I never said that specific sentence.
Let's focus on just the sharding scenario. None of what you said there conflicts with what I said about sharding.
Unless these database engines are so incompetent that you can't shard on something as simple as id[12:16]?
> I'm pointing out that for some systems, that makes UUIDv7 unsuitable because you WANT the keys to be randomly distributed to avoid hotspots. Using UUIDv7 in these situations will result in a single node receiving all writes (and all reads for a given time range), which in the dataset sizes I'm referring to is usually impossible to handle. No single node can handle that kind of load, regardless of how efficient it may be.
You only want the keys to be randomly distributed at the sharding layer. Once it reaches its home node, you don't want random distribution within that node. At best you begrudgingly accept it.
It's within a node that things like "normal lexical sorting" matter the most, so UUIDv7 does a great job of making that smooth.
You don't need lexical sorting between shards, especially when you're randomizing the shard.
The point I'm making is all these shenanigans are completely unnecessary, don't really help, and make everything extremely hard to manage, reason about, and get performance from - all to try to force usage of a specific key format (UUID) in a situation which it is not designed for, and for which it is not suited.
It's square peg, round hole.
And folks working on Exabyte sized indexed datasets generally already get this. So I'm not sure why i'm even having this discussion? I'm not even getting paid for this!
"it allows easy/cheap write combining" is not "completely unnecessary". What the heck, at least be consistent.
And it's not shenanigans! You could shard based on the first bytes of a key, or you could shard based on the last bytes of the key. Neither one should be harder. Neither one is shenanigans.
Wow a long thread of back and forth and confusion :)
Fwiw I’m with Dylan on this one!
I have direct experience of absolutely humongous data processing using random bits for shard selection where each shard uses sorted storage and benefits from the sortability of the time bits so, with just the smallest buffering, all inserts are basically super fast appends.
This is super normal in my experience. And I can’t wait for the new UUID formats to land and get widely supported in libs to simplify discussions with event producers :)
> You can't 'prioritize' random parts of a key for sorting without writing a bunch of custom sorting (and key parsing) logic, which is generally undesirable for a number of reasons - and frankly completely unnecessary in these cases. You just wouldn't use UUIDv7 (or probably a UUID in general), and the benefits would pay for themselves very quickly anyway.
Forget prioritizing, that was about going fully random. Seriously, let's pretend I never said that specific sentence.
Let's focus on just the sharding scenario. None of what you said there conflicts with what I said about sharding.
Unless these database engines are so incompetent that you can't shard on something as simple as id[12:16]?
> I'm pointing out that for some systems, that makes UUIDv7 unsuitable because you WANT the keys to be randomly distributed to avoid hotspots. Using UUIDv7 in these situations will result in a single node receiving all writes (and all reads for a given time range), which in the dataset sizes I'm referring to is usually impossible to handle. No single node can handle that kind of load, regardless of how efficient it may be.
You only want the keys to be randomly distributed at the sharding layer. Once it reaches its home node, you don't want random distribution within that node. At best you begrudgingly accept it.
It's within a node that things like "normal lexical sorting" matter the most, so UUIDv7 does a great job of making that smooth.
You don't need lexical sorting between shards, especially when you're randomizing the shard.