depending on scale and architecture, either behavior can be better. it’s easier ...

Dylan16807 · on July 22, 2023

Being random within each shard is still bad for write performance. Going fully random seems like a bad way to accomplish this goal.

Why not keep the timestamp bits to use when appropriate, but use some of the random bits for shard selection?

lazide · on July 22, 2023

Only when writing all at once and when you know what the shard boundaries are and the number of shards (and boundaries) are stable. If they’re changing, growing, et c. you can’t tell where they’re at predictably and random is the least likely to cause problems and allow sub-sharding dynamically.

Very large real world datasets are unlikely to be static long enough, and equipment stable enough, to not consider this effect.

Dylan16807 · on July 22, 2023

> If they’re changing, growing, et c. you can’t tell where they’re at predictably and random is the least likely to cause problems and allow sub-sharding dynamically.

I'm confused by your reply, because I never suggested not to use random bits for sharding.

I'm just saying that 60+ random bits should be enough to shard, change, grow, and sub-shard with. You don't need 122.

lazide · on July 22, 2023

I never said anything about bit or key length at all? Let alone how much was random or not? Perhaps you’re confused?

Dylan16807 · on July 22, 2023

Let's start over.

People were talking about the value of time+random UUIDs versus all-random UUIDs, and how those behave.

You said that sometimes the random behavior is preferable.

In response to that, I was saying that even if you want to sort randomly at some particular step, you should use the time+random format, because other steps might not want to sort randomly. You should directly choose to use the random part, instead of indirectly forcing it by making the entire UUID random.

Then you said "Only when writing all at once and when you know what the shard boundaries are and the number of shards (and boundaries) are stable."

I can't figure out how that relates to my post. I thought you were worried about insufficient random bits to use for sharding, but apparently that wasn't your concern. So I have no idea what your concern is. If you have a use case for randomness, use the random half of the UUID.

lazide · on July 22, 2023

UUIDv7 has a specific format that doesn’t support that.

For the case I’m describing, you can’t use it.

For situations you want write coalescing it’s fine though.

Not sure it we’re agreeing here?

Dylan16807 · on July 22, 2023

What do you mean it doesn't support that?

There's some flexibility in how you fill in a UUIDv7, but let's go ahead and say that the ones we're worried about have the first 32 bits filled with timestamp and the last 32 bits filled with random.

If you want pure sort-by-time, then use it the normal way. If you want pure sort-by-random, then it's slightly awkward but you can prioritize the random part.

But the additional power is that you can shard by the last 32 bits, then sort by the first 32 bits within a shard. And you don't need weird workarounds like hashing the UUID.

You said "it’s easier to shard when writes occur randomly over the overall space. it’s easier to coalesce when writes all happen in a given place (head or tail)". But you can have both at the same time. You can have easy sharding and easy coalescing.

lazide · on July 22, 2023

Except you literally can't do random distribution AND be compliant with UUIDv7 if you use any sort of normal lexical sorting/indexing, as they use the start of the key as the most significant bits. UUIDv7 is literally designed to have stable lexical sorting orders, have the time as the most significant bits, and have the most significant bits of the time as the most significant bits of the key! It's their primary design criteria!

You can't 'prioritize' random parts of a key for sorting without writing a bunch of custom sorting (and key parsing) logic, which is generally undesirable for a number of reasons - and frankly completely unnecessary in these cases. You just wouldn't use UUIDv7 (or probably a UUID in general), and the benefits would pay for themselves very quickly anyway.

To quote the UUIDv7 RFC:

"This document presents new time-based UUID formats which are suited for use as a database key." (as the first line of the abstract)

"Due to the shortcomings of UUIDv1 and UUIDv4 details so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key."

"- Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms.

- Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering.

- Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source. Although, there is some variation to this among implementations depending on the application requirements.

- The ID format SHOULD be Lexicographically sortable while in the textual representation.

- IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp.

- IDs MUST NOT require unique network identifiers as part of achieving uniqueness.

- Distributed nodes MUST be able to create collision resistant Unique IDs without a consulting a centralized resource."

[https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...]

I'm pointing out that for some systems, that makes UUIDv7 unsuitable because you WANT the keys to be randomly distributed to avoid hotspots. Using UUIDv7 in these situations will result in a single node receiving all writes (and all reads for a given time range), which in the dataset sizes I'm referring to is usually impossible to handle. No single node can handle that kind of load, regardless of how efficient it may be.

For other types of systems (such as single machine databases or 'tight' clusters of databases without extreme write loads), UUIDv7 and similar is great, as it allows easy/cheap write combining when that is actually possible for a machine to handle the load.

Dylan16807 · on July 22, 2023

> Except you literally can't do random distribution AND be compliant with UUIDv7 if you use any sort of normal lexical sorting/indexing, as they use the start of the key as the most significant bits. UUIDv7 is literally designed to have stable lexical sorting orders, have the time as the most significant bits, and have the most significant bits of the time as the most significant bits of the key! It's their primary design criteria!

> You can't 'prioritize' random parts of a key for sorting without writing a bunch of custom sorting (and key parsing) logic, which is generally undesirable for a number of reasons - and frankly completely unnecessary in these cases. You just wouldn't use UUIDv7 (or probably a UUID in general), and the benefits would pay for themselves very quickly anyway.

Forget prioritizing, that was about going fully random. Seriously, let's pretend I never said that specific sentence.

Let's focus on just the sharding scenario. None of what you said there conflicts with what I said about sharding.

Unless these database engines are so incompetent that you can't shard on something as simple as id[12:16]?

> I'm pointing out that for some systems, that makes UUIDv7 unsuitable because you WANT the keys to be randomly distributed to avoid hotspots. Using UUIDv7 in these situations will result in a single node receiving all writes (and all reads for a given time range), which in the dataset sizes I'm referring to is usually impossible to handle. No single node can handle that kind of load, regardless of how efficient it may be.

You only want the keys to be randomly distributed at the sharding layer. Once it reaches its home node, you don't want random distribution within that node. At best you begrudgingly accept it.

It's within a node that things like "normal lexical sorting" matter the most, so UUIDv7 does a great job of making that smooth.

You don't need lexical sorting between shards, especially when you're randomizing the shard.

lazide · on July 22, 2023

The point I'm making is all these shenanigans are completely unnecessary, don't really help, and make everything extremely hard to manage, reason about, and get performance from - all to try to force usage of a specific key format (UUID) in a situation which it is not designed for, and for which it is not suited.

It's square peg, round hole.

And folks working on Exabyte sized indexed datasets generally already get this. So I'm not sure why i'm even having this discussion? I'm not even getting paid for this!

Ciao!

Dylan16807 · on July 22, 2023

"it allows easy/cheap write combining" is not "completely unnecessary". What the heck, at least be consistent.

And it's not shenanigans! You could shard based on the first bytes of a key, or you could shard based on the last bytes of the key. Neither one should be harder. Neither one is shenanigans.

> It's square peg, round hole.

Going entirely random is an even worse peg.

wood_spirit · on July 23, 2023

Wow a long thread of back and forth and confusion :)

Fwiw I’m with Dylan on this one!

I have direct experience of absolutely humongous data processing using random bits for shard selection where each shard uses sorted storage and benefits from the sortability of the time bits so, with just the smallest buffering, all inserts are basically super fast appends.

This is super normal in my experience. And I can’t wait for the new UUID formats to land and get widely supported in libs to simplify discussions with event producers :)

eru · on July 23, 2023

Just explicitly use (time, random uuid) as a key in your sorting, instead of sullying your uuid with time information?

15155 · on July 23, 2023

ULID schemes aren't just about big endian sorting advantages, they often better enable time-based sorting mechanisms.

remram · on July 21, 2023

If that is the case then why shouldn't the storage system hash the IDs itself, to spread them as it requires?

lazide · on July 21, 2023

Because sometimes you want some data to be collocated, while the rest sharded.

For instance, you might use a random object ID as a prefix value in the index, followed by attribute ID which isn’t. Or a modified time, so you can have a history of values which can be read out linearly.

If using it directly, that means Objects and their data are sharded randomly across, but when looking for an objects attributes (or attribute by time), their index entries are always co-located and you can read them out linearly with good performance.

If blindly hashing keys to distribute them, you can’t do that. Also, you can’t really do a linear read at all, since no data will be ‘associatable’ with others, as the index value is randomized, and what is stored in the index has no related to the key provided by the user.

You can only do a straight get, not a read. That is very limiting, and expensive with large data sets as most algorithms benefit greatly from having ordered data. (Well, you could do a read, but you’d get back entries in completely random order)

Needless to say, this is ‘advanced’ usage and requires pretty deep understanding of your data and indexing/write/read patterns, which is why random hashing is the most common hash map behavior.

remram · on July 22, 2023

Sounds like it should be an attribute of the index and not require a change in the data. To me, anyway.

    CREATE INDEX ... USING HASH;

lazide · on July 22, 2023

I’ve never seen that kind of optimization on a dataset that would fit on a database server of any kind. Tens of PB or EB usually, but sometimes only several hundred TB if it’s high load/in-memory only.

andromeduck · on July 22, 2023

Just swizzle the ID.