
> there’s no way to meaningfully count change over time.

You're right, I should have finished my morning coffee first =) I guess for client-side generation it makes sense then. But how often is that really needed? I don't know...

As for the other points, I still don't buy it. For example, implementing UUIDs vs. implementing the transformation: the CPU cost is negligible (you don't need a cryptographically secure cipher). UUIDs might be a little simpler to implement, but combined with the space and performance savings it's well worth it: 16 bytes vs. 4/8 bytes (which might be stored multiple times in related tables) adds up and eats resources. Again I understand, most people don't seem to care about that, because they were born into cloud culture and have no clue what they are doing in terms of efficiency money/resource-wise.
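
To sketch what I mean by "the transformation" (just an illustration; the round function and keys below are made up, not a vetted design): keep a plain 32-bit sequential PK internally and expose a cheap keyed permutation of it, e.g. a tiny Feistel network. It's a bijection, so there are no collisions, and the cost is a handful of integer ops per ID:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy 4-round Feistel permutation over 32-bit IDs: a bijection, so no
     * collisions, and cheap enough that the CPU cost is noise. The round
     * keys and round function are illustrative only. */
    static const uint16_t KEYS[4] = { 0xBEEF, 0xCAFE, 0xF00D, 0x1234 };

    static uint16_t round_fn(uint16_t half, uint16_t key) {
        uint32_t x = (uint32_t)half * 0x9E3779B1u + key;
        return (uint16_t)(x ^ (x >> 16));
    }

    uint32_t obfuscate(uint32_t id) {
        uint16_t l = id >> 16, r = id & 0xFFFF;
        for (int i = 0; i < 4; i++) {
            uint16_t tmp = r;
            r = l ^ round_fn(r, KEYS[i]);
            l = tmp;
        }
        return ((uint32_t)l << 16) | r;
    }

    uint32_t deobfuscate(uint32_t id) {
        /* Same rounds, run in reverse: maps an external ID back to the PK. */
        uint16_t l = id >> 16, r = id & 0xFFFF;
        for (int i = 3; i >= 0; i--) {
            uint16_t tmp = l;
            l = r ^ round_fn(l, KEYS[i]);
            r = tmp;
        }
        return ((uint32_t)l << 16) | r;
    }

    int main(void) {
        for (uint32_t id = 1; id <= 5; id++)
            printf("%u -> %u -> %u\n", (unsigned)id,
                   (unsigned)obfuscate(id),
                   (unsigned)deobfuscate(obfuscate(id)));
        return 0;
    }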

Maybe I'm just too old for this!




On the contrary, I'd say perhaps you're not old enough, otherwise you'd have come across the advantages of:

- you can merge databases while guaranteeing ids won't conflict

- first hand support across various databases/systems

- you use it and never have to deal with ids again


> merge databases

You mean combining logically (or even physically) separated databases, collapsing their tuples into one? Why would you do that, and how often does that occur?

> first hand support across various databases/systems

UUIDs? Of the RDBMSs most likely to be used (MySQL, Postgres, SQLite), only Postgres has a native UUID type. The others store them as strings (please no) or binary types. MariaDB and Oracle have UUID types, and SQL Server has a GUID type (essentially the same thing), but those are all less commonly seen.

What does have universal support is integers. They scale just fine (PlanetScale uses them internally [0]), and you can use them in a distributed system – if you even need one in the first place – via a variety of methods: interleaved ranges or a central server allocating chunks are two popular options that come to mind.

[0]: https://github.com/planetscale/discussion/discussions/366
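
To make the interleaved-ranges idea concrete, here's a rough sketch (my own illustration, not PlanetScale's scheme): each of N nodes gets a fixed offset and strides by N, so the ranges can never collide. MySQL exposes the same idea natively through auto_increment_increment / auto_increment_offset.

    #include <stdint.h>
    #include <stdio.h>

    /* Interleaved-range allocator sketch: with N nodes, node i hands out
     * i, i+N, i+2N, ... so ranges never overlap and no coordination is
     * needed after the one-time assignment of node IDs. */
    typedef struct {
        uint64_t next;   /* next ID this node will hand out */
        uint64_t stride; /* total number of nodes */
    } id_allocator;

    void allocator_init(id_allocator *a, uint64_t node_id, uint64_t num_nodes) {
        a->next = node_id;      /* node IDs are 0..num_nodes-1 */
        a->stride = num_nodes;
    }

    uint64_t allocator_next(id_allocator *a) {
        uint64_t id = a->next;
        a->next += a->stride;
        return id;
    }

    int main(void) {
        id_allocator node0, node1;
        allocator_init(&node0, 0, 2);
        allocator_init(&node1, 1, 2);
        for (int i = 0; i < 3; i++)
            printf("node0: %llu  node1: %llu\n",
                   (unsigned long long)allocator_next(&node0),
                   (unsigned long long)allocator_next(&node1));
        return 0;
    }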


Sometimes various but similar databases get smashed together when companies merge, or when putting all the things in a "data lake": think inventory lot IDs or transaction IDs from hundreds of stores' local databases uploaded to the corporate global DB (e.g. Albertsons, Kroger, Walmart).


> Again I understand, most people don't seem to care about that, because they were born into cloud culture and have no clue what they are doing in terms of efficiency money/resource-wise.

They have no clue about how computers work, full stop. Sure, they know programming languages, but generally speaking, if you ask them about IOPS, disk or network latency, NUMA, cache lines, etc. they’ll tell you it doesn’t matter, and has been abstracted away for them. Or worse, they’ll say sub-optimal code is fine because shipping is all that matters.

There is certainly a difference between sub-optimal and grossly unoptimized code. Agonizing over a few msec outside of hot loops is probably not worthwhile from an efficiency standpoint, but if it's trivial to do correctly, why not do it correctly? One recent surprising example I found was `libuuid` in its various forms: util-linux's implementation [0] at its most recent tag is shockingly slow in larger loops. I'm fairly certain it's due to entropy exhaustion, but I haven't looked into it enough yet.

macOS uses arc4random [1] (which, on Linux, is in glibc as of v2.36; otherwise you can get it from libbsd-dev), and it's much, much faster (again, on large loops).

I made some small C programs and a shell runner to demonstrate this [2].
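
Roughly the shape of the comparison, if you want to reproduce it yourself (not the exact programs from the gist; assumes libuuid headers are installed and, on Linux, glibc >= 2.36 or libbsd for arc4random_buf; build with -luuid):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <uuid/uuid.h>      /* util-linux libuuid; link with -luuid */

    /* Illustrative comparison, not the exact benchmark from the gist:
     * generate N random UUIDs / 16-byte blobs and print wall-clock time. */
    #define N 1000000

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        uuid_t uu;
        unsigned char buf[16];
        double t0, t1;

        t0 = now_sec();
        for (int i = 0; i < N; i++)
            uuid_generate_random(uu);        /* libuuid, version-4 UUIDs */
        t1 = now_sec();
        printf("uuid_generate_random: %.3f s\n", t1 - t0);

        t0 = now_sec();
        for (int i = 0; i < N; i++)
            arc4random_buf(buf, sizeof buf); /* arc4random family: glibc >= 2.36 or libbsd */
        t1 = now_sec();
        printf("arc4random_buf:       %.3f s\n", t1 - t0);
        return 0;
    }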

[0]: https://github.com/util-linux/util-linux/blob/stable/v2.40/l...

[1]: https://man7.org/linux/man-pages/man3/arc4random.3.html

[2]: https://gist.github.com/stephanGarland/f6b7a13585c0caf9eb64b...



