Why UUIDs won't protect your secrets (alexsci.com)
75 points by 8organicbits 9 days ago | 59 comments




> If you use secret UUIDs, think of them as toxic assets. They taint anything they touch. If they end up in logs, then logs must be kept secret. If they end up in URLs, then browser history must be kept secret. This is no small challenge.

a fun retail banking variation of this misadventure: (1) someone designs an elegant RESTful API for something or other, (2) it gets applied to credit cards, where the credit card number is used as the natural primary key and is RESTfully embedded in URLs, which people endeavour to avoid logging, but then (3) someone integrates middleware to report metrics to some SaaS monitoring platform, and the end result is that you're spraying all your customers' credit card numbers into the monitoring platform
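
A minimal sketch of the usual fix (all names here are hypothetical): mint an opaque surrogate key per card and put that in the REST path, so access logs and metrics middleware never see the PAN.

    import secrets

    # Server-side vault: surrogate id -> PAN. In practice this lives in a
    # locked-down tokenization service, not an in-memory dict.
    VAULT: dict[str, str] = {}

    def surrogate_for(pan: str) -> str:
        card_id = secrets.token_urlsafe(16)
        VAULT[card_id] = pan
        return card_id

    card_id = surrogate_for("4242424242424242")
    url = f"/cards/{card_id}/transactions"  # safe to log: says nothing about the PAN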


why would anyone ever think that using a credit card number would be a good primary key?

why would anyone who ever suggested such a thing not be relegated to permanent headlight fluid fetching duty?


Well, there is a segment of the database world that thinks natural keys are better than artificial keys. A credit card number is a natural key, so I can see the database logic to it.

This problem simply highlights the failure of depending on natural keys.


I have had this fight so many times ... "we don't need to generate a random key, we can just use this 'unique' identifier" where the unique identifier is always some form of PII.

Some form of PII that can change too. Natural keys are a no go for me.

> there is a segment of the database side that thinks natural keys are better than artificial keys.

Aren't those guys hammered nearly daily (at least weekly) with one real-world example after another about how natural keys aren't unique?


I like natural keys... if you can prove that they're actually immutable and unique for the thing they're representing. Credit card number is a decent natural key for a table of payment instruments, not for users. Even for a natural-key-believer, users pretty much always need a synthetic ID, because anything you might possibly believe to be constant about humans turns out not to be.

And what if you need to be able to refer to those instruments, e.g. check or use them, but at the same time not expose the credit card number to whatever entity needs to do the check?

How do you store the payment instrument for a certain purchase?


I’m not sure it does the job of a natural key though. When you renew a card you can keep the number but get a new expiry/security number, which should probably be considered a different record.

The bigger issue: I also wouldn’t be quick to assume I know the idiosyncrasies of all card systems, like whether they reuse numbers.


The natural step is a multicolumn key (e.g. the key includes all three data points).

I've seen tables where primary keys were 7-10 columns long. It's a nightmare to work with when joining.
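
For illustration, a hypothetical schema shows the pain: even a three-column natural key forces every child table, and every join, to carry all the key columns.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE cards (
            pan TEXT, expiry TEXT, holder TEXT,
            PRIMARY KEY (pan, expiry, holder)
        );
        -- every referencing table must repeat the whole key
        CREATE TABLE charges (
            pan TEXT, expiry TEXT, holder TEXT, amount_cents INTEGER
        );
    """)
    # ...and every join must match on all of the key columns
    rows = con.execute("""
        SELECT c.holder, ch.amount_cents
        FROM cards AS c
        JOIN charges AS ch
          ON ch.pan = c.pan AND ch.expiry = c.expiry AND ch.holder = c.holder
    """).fetchall()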


Surely the next thought is “I don’t want customer CC leaking”, regardless of what your background is?

When you introduce another ID that maps 1:1 to the old one, what do you get?

With the external id I get a single layer of obfuscation. If the external id is leaked, they still don’t have any data that allows them to make a CC purchase. And if you designed it correctly, you could invalidate the id, i.e. a token that expires.

If the external id represents a single customer uniquely, it still counts as personal information. That you hand out money for one id but not for another is really arbitrary and doesn't make any sense. Knowing an identifier of another person should not allow you to withdraw money either way.

But you can’t hand out money with either value. The external ID is only valid in the system it was generated in.

And as I said, you can design it so that the ID expires periodically.
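
A rough sketch of that design (names are illustrative): the external token resolves to the card server-side, lapses on its own, and can be revoked at any time.

    import secrets
    import time

    TOKENS: dict[str, tuple[str, float]] = {}  # token -> (card reference, expiry)

    def issue(card_ref: str, ttl_seconds: float = 3600.0) -> str:
        token = secrets.token_urlsafe(16)
        TOKENS[token] = (card_ref, time.time() + ttl_seconds)
        return token

    def resolve(token: str) -> str | None:
        entry = TOKENS.get(token)
        if entry is None or entry[1] < time.time():
            return None  # unknown, revoked, or expired
        return entry[0]

    def revoke(token: str) -> None:
        TOKENS.pop(token, None)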


> But you can’t hand out money with either value.

Then there is no problem using the natural key?


This statement in the YouTube example is unfair: "Users may expect that the video has proper access control restrictions"

The selection is clearly labeled as "Unlisted", which I find pretty easy to understand. It's a term used on other sites as well (e.g. Imgur) and is hopefully becoming colloquial. For those less certain, a tooltip with more detail about the implications could easily be added.

Please don't wall your gardens behind logon credentials. Too much of the web has already done this.

Obviously sensitive things like billing should be subject to strictly controlled security. But being able to share semi-private media in a low-friction way, without forcing consumers of your content to log in or create an account, is extremely handy.


Author here, I think we're mostly in agreement. Unlisted is a great approach for usability, especially for content with low sensitivity. It's a risk versus effort thing.

I'll point out that users don't always have a perfect understanding of the security websites provide. I'm a fan of Krug's book, Don't Make Me Think, as a model for how users interact with website UI, and I think that's often how users interact with security. If you expect users to read, understand, and retain technical information, you're going to be disappointed.

For YouTube in particular I noticed a number of articles that explain how unlisted content can leak and how to recover from a leak, so I think the 'may' in the quote is quite correct.


I vaguely remember seeing a small help text on YouTube when you try to share an unlisted video, where they tell you as the user that it's unlisted and to "be mindful" about sharing it. I think most people who read that don't understand what it means, which may be why they don't seem to show it anymore. I wonder if it has anything to do with that `si` URL parameter that the Share button introduces into the URL.

Anyway, maybe I didn't fully comprehend your article, but it still seems a bit mysterious to me how a database of unlisted YouTube videos could exist. Is it just based on user submissions? If so, how are they finding the URLs to the videos in the first place? In my experience, the way they might leak is if the uploader mistakenly adds an unlisted video into a public-facing playlist, but that doesn't seem to me like it would be a common error among content creators.


We are generally on the same page, and your thoughtful reply got my upvote. I bet we'd agree good UI derives from familiar concepts and offers affordance.

You don't need to be technical to have a decent idea of what "Unlisted" means.

A pretty good analog is the days when the White Pages had everyone's phone number printed in them. If you wrote your unlisted number on a billboard or gave it to someone else who compromised it, you wouldn't expect to retain your privacy. There are lots of things on the Internet that quack like a duck but bite like a snake; this isn't one of them. It does exactly what's written on the tin.

It's unfortunate (if not surprising) that malicious individuals are compiling the digital equivalent of "Dark Pages", and I salute your effort to raise awareness of that risk amongst web developers.

I grant "unfair" wasn't a precise enough word; I just felt at that point you slipped from spotlighting unintentional architectural oversights, which are fair game, to advocating opinionated change to a deliberate design choice. Sometimes we need real scissors, not safety scissors!


Yes, I think familiarity is quite important. The white pages example is quite dated though. I'm in my late thirties and I think the only time I've seen a phone book being used was in the Terminator movie. So the analogy feels like the save icon being a floppy disk. I'd be curious what the younger crowd thinks.

"Once the URL is shared with others, the owner loses the ability to assert access control over the video."

That reminds me of Stallman's apocryphal story about favoring a password instead of ACLs, and why GNU doesn't have a "wheel" group :)

https://administratosphere.wordpress.com/2007/07/19/the-whee...

Sometimes a few of the users try to hold total power over all the rest. For example, in 1984, a few users at the MIT AI lab decided to seize power by changing the operator password on the Twenex system and keeping it secret from everyone else. (I was able to thwart this coup and give power back to the users by patching the kernel, but I wouldn't know how to do that in Unix.)

However, occasionally the rulers do tell someone. Under the usual `su' mechanism, once someone learns the root password who sympathizes with the ordinary users, he or she can tell the rest. The "wheel group" feature would make this impossible, and thus cement the power of the rulers.

I'm on the side of the masses, not that of the rulers. If you are used to supporting the bosses and sysadmins in whatever they do, you might find this idea strange at first.

> Under the usual `su' mechanism, once someone learns the root password who sympathizes with the ordinary users, he or she can tell the rest. The "wheel group" feature would make this impossible, and thus cement the power of the rulers.

How would that work? What is it about the wheel group that stops the sympathetic wheel from revealing his own login information to other people?


>What is it about the wheel group that stops the sympathetic wheel from revealing his own login information to other people?

Then there's a paper trail: Bob logs in 10 times as much as anyone else, and from all over the place.

Anyway, this is all silly ancient politics, and shared admin passwords as a method of freeing the people are long past relevant.


> How would that work? What is it about the wheel group that stops the sympathetic wheel from revealing his own login information to other people?

The wheel member probably doesn't want to reveal their own login information. They want to share the root password with other users; on systems without a wheel group that works, but on systems with a wheel group non-wheel users can't su.


> The wheel member probably doesn't want to reveal their own login information. They want to share the root password with other users

What's the difference? What would be something the other user could do in one of those scenarios that they couldn't do in the other one?


The root password is a shared secret that does not implicate the leaker; sharing your personal credentials, however, does.

The other user would get root access in one scenario and not the other (assuming this is a system that only allows root login from the actual console, or doesn't allow root login at all).

> The other user would get root access in one scenario and not the other

What?


If it is not possible to log in as root directly (which on a multi-user system might only be possible at the physical console), then the wheel group might be one piece of protection against accidentally revealing the password, since other users then cannot use the password without physical access to the computer. It should not be relied on alone; the wheel users should still avoid revealing the root password anyway. And if a wheel user deliberately wants to share access, they could presumably just add the other users to the wheel group (or modify the software on the computer so that it does not require wheel, etc.), since they have root access.

I thought this article was just stating the obvious over and over?

The answer is _always_ auth over obfuscation.


Seems like the article was written as a personal journey of learning why UUIDs are not suitable for security or authorization in any way.

For anyone with prior knowledge and experience of UUIDs, it should be common sense that a UUID will not protect any secrets, because that's not what they're for. They're a relatively unique, hard-to-guess identifier, that's all.


Yeah, I guess I expected something much more interesting when I saw it on the front page (and the other positive comments)

Yeah, the article is a one-liner turned into a 500-liner:

Don't use a random id for security, 'cause once it's known, it's insecure.


> this article was just stating the obvious over and over?

This article gives me a lot of AI vibes

All statements, no logic, no solution.


Author here. I used a spelling and grammar checker, but nothing LLM-based. I think AI generates very poor content, so I take this as a criticism. I welcome any constructive feedback; I enjoy writing and would love to improve.

Author here. I think you missed the nuance I was going for.

I use YouTube and AWS as examples since they both have implementations that are vulnerable to IDOR, but I think they made the right call. Sometimes usability takes precedence over security. Sometimes 'obfuscation' is better than proper authorization.

> There are use cases where the effort needed to individually grant users access outweighs the risk of using unlisted. Not everyone is dealing in highly sensitive content, after all.


Title could have been “Python’s UUIDv7 only has 32 random bits” and just leave it at that. 2500 words to say “if you don’t authenticate access to a URL then people can access it”. Wow, shocker.

Great piece, but worth mentioning that you generally can't use a presigned URL with CDN endpoints. So great for sensitive content, but if you really want the thing to be widely and quickly accessible there's more work to be done.

Well, you can if the signed URL is signed for the CDN's verification instead of the underlying storage.

Generalising this: you don't need stateful logged-in authentication to defeat IDOR. You can include an appropriately salted HMAC in the construction of a shared identifier, optionally incorporating time or other scoping semantics as necessary, and verify it at your application's trust boundary.

This tends to make identifiers somewhat longer, but they still fit well inside a reasonable emailed URL for downloading your phone bill without having to dig up what your telco password was.
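
A minimal sketch of that construction (the secret, names, and token format are all illustrative):

    import hashlib
    import hmac
    import time

    SECRET = b"server-side key"  # placeholder; load from a KMS or env in practice

    def make_link_token(object_id: str, ttl_seconds: int = 86400) -> str:
        expires = int(time.time()) + ttl_seconds
        msg = f"{object_id}.{expires}".encode()
        sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:32]
        return f"{object_id}.{expires}.{sig}"

    def check_link_token(token: str) -> str | None:
        object_id, expires, sig = token.rsplit(".", 2)
        msg = f"{object_id}.{expires}".encode()
        expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:32]
        if hmac.compare_digest(sig, expected) and int(expires) > time.time():
            return object_id  # verified at the trust boundary, no session needed
        return None

The emailed phone-bill URL would then carry something like make_link_token("bill-2024-06") rather than a bare id.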

However, note that one of the baseline requirements of privacy-oriented data access is issuing different and opaque identifiers for the same underlying thing to each identifiable principal that asks for it. Whether that's achieved cryptographically or by a lookup table is a big can of engineering worms.


You can use pre-signed URLs with CloudFront.
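
For instance, a sketch along the lines of the AWS docs using botocore's CloudFrontSigner (the key pair id, key file, domain, and expiry are placeholders):

    import datetime

    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def rsa_signer(message: bytes) -> bytes:
        # CloudFront key pairs use RSA signatures over SHA-1
        with open("private_key.pem", "rb") as f:
            key = serialization.load_pem_private_key(f.read(), password=None)
        return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner("KEYPAIRID123", rsa_signer)
    url = signer.generate_presigned_url(
        "https://d111111abcdef8.cloudfront.net/bills/march.pdf",
        date_less_than=datetime.datetime(2030, 1, 1),
    )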

IDOR is not "Indirect Object Reference"; it's "Insecure Direct Object Reference".

I am a bit "meh" on the YouTube "unlisted video" example. The name itself is fairly transparent in implying that there's really no security; the video is just not listed in a public-facing way. This is significantly different from the article's billing example, where customers would be quite right in assuming their bills will be accessible only to them.

> The name itself is fairly transparent in implying that there's really no security

A password-capability system is a password-capability system. Not requiring an account does not make it not an access control. (Though it does make it, e.g., not selectively revocable, which is a known weakness of password capabilities.)


Correct me if I am misunderstanding your point but unlisted YouTube videos don’t need a password or anything to be accessed. Anyone who has the URL can access it. It’s just not indexed/searchable on YouTube.

Right. And neither do Google Docs shared by a no-login link (which used to be the only option) or for that matter RSA signing keys. You could in theory guess any of these, given all of the time in the universe (quite literally). A “password capability” is any mechanism where knowing the designation of an object (such as the “unlisted” link) is a necessary and sufficient condition to access it. The designation has to be hard to guess for the system to make sense.

(The intended contrast is with “object capabilities”, where the designation is once again necessary and sufficient but also unforgeable within the constraints of the system. Think handles / file descriptors: you can’t guess a handle to a thing that the system did not give you, specifically, a handle for.)


I get people won’t reasonably guess it, but an unlisted link is still an exposed link literally anyone with internet access can open. It’s simply not the same as a login + password, neither functionally nor technically.

The fact that this site exists says it all: https://unlistedvideos.com/indexm.html


Yeah at the end of the day the yt video isn’t under lock and key in any way, shape, or form vs. my billing info with my various utilities and such which is. It’s just “security through not knowing the exact URL (yet).”

Why would you use UUIDv7 rather than UUIDv4 though?

UUIDv4 is much more scattered (i.e., uniformly distributed), which heavily degrades indexing performance in databases.
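
To make the contrast concrete (note that uuid.uuid7() only landed in Python 3.14; older versions need a third-party package such as uuid6):

    import uuid

    # v4: fully random, so consecutive inserts scatter across a B-tree index
    print(uuid.uuid4())
    print(uuid.uuid4())

    # v7: a millisecond-timestamp prefix keeps consecutive ids adjacent,
    # so inserts cluster at the right-hand edge of the index
    print(uuid.uuid7())
    print(uuid.uuid7())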

But mainly on writes, not much for reads.

And if your database is 99% reads 1% writes, the difference probably doesn't really matter.

And tons of database indexes operate on randomly distributed data -- looking up email addresses or all sorts of things. So in many cases this is not an optimization worth caring about.


This depends on the database and should not be written as gospel.

Which databases doesn't it degrade performance with when used as an indexed field?

UUIDv7 seems popular for Postgres performance improvements, but it causes issues with databases like Spanner.

https://medium.com/google-cloud/understanding-uuidv7-and-its...


Lots of distributed, NoSQL databases work (or partially work) this way too (e.g., HBase rowkey, Accumulo row ID, Cassandra clustering key, DynamoDB sort key). They partition the data into shards based upon key ranges and then spread those shards across as many servers as possible. UUIDv7 is (by design) temporally clustered. Since many workloads place far more value on recent data, and all recent data is likely to end up in the same shard, you bottleneck on the throughput of a single server or, even with replication, a small number of servers.

FWIW it looks like Cassandra doesn't belong on this list, and DynamoDB only with qualifications.

Though Cassandra is more like quasi-SQL than NoSQL, the bigger issue is that the clustering key is never actually used for sharding. Cassandra (today) always puts all data with the same partition key on the same shard, and the partition key is hashed, meaning there's no situation in which UUIDv7 would perform differently (better or worse) than UUIDv4.

In DynamoDB, it is possible for sort keys to be used for sharding, but only if there is a large number of distinct sort keys for the same partition key. Generally, you would be putting a UUID in the partition key and not the sort key, so UUIDv7 vs UUIDv4 typically has no impact on DB performance.


I think the standard recommendation is to do range partitioning on the hash of the key, aka hash-range partitioning (I know YugabyteDB supports this out of the box; I'd be surprised if others don't). This prevents the situation of all recent UUIDs ending up on the same shard.
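
A toy sketch of the idea, with a modulo standing in for splitting the hashed key space into ranges: hashing first spreads time-adjacent keys across shards.

    import hashlib
    import uuid

    def shard_for(key: uuid.UUID, num_shards: int = 16) -> int:
        digest = hashlib.sha256(key.bytes).digest()
        return int.from_bytes(digest[:8], "big") % num_shards

    # ids created in the same millisecond land on (very likely) different
    # shards; this works the same for UUIDv7 as for the v4 used here
    print(shard_for(uuid.uuid4()), shard_for(uuid.uuid4()))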

Indeed. In fact, Cassandra and DynamoDB have both hash keys and range keys; I've edited my comment to be more specific.

...the solution to IDORs is to authenticate each user and check authorization per object.

Full stop.
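
Something like this minimal sketch (an in-memory stand-in for a real database; all names hypothetical):

    INVOICES = {"inv-1001": {"owner_id": "alice", "total_cents": 4200}}

    def get_invoice(current_user_id: str, invoice_id: str) -> dict:
        invoice = INVOICES.get(invoice_id)
        # identical failure for "missing" and "not yours": the id leaks nothing
        if invoice is None or invoice["owner_id"] != current_user_id:
            raise LookupError("not found")
        return invoice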



