> If you use secret UUIDs, think of them as toxic assets. They taint anything they touch. If they end up in logs, then logs must be kept secret. If they end up in URLs, then browser history must be kept secret. This is no small challenge.
a fun retail banking variation of this misadventure: (1) someone designs an elegant RESTful API for doing something or other, (2) it gets applied to credit cards, where the credit card number is used as the natural primary key and is RESTfully embedded in URLs, which people endeavour to avoid logging, but then (3) someone integrates middleware to report metrics to a SaaS monitoring platform, and the end result is that you're spraying all your customers' credit card numbers into the monitoring platform
Well, there is a segment of the database world that thinks natural keys are better than artificial keys. A credit card number is a natural key, so I can see the database logic behind it.
That problem simply highlights the failure mode of depending on natural keys.
I have had this fight so many times ... "we don't need to generate a random key, we can just use this 'unique' identifier" where the unique identifier is always some form of PII.
I like natural keys... if you can prove that they're actually immutable and unique for the thing they're representing. Credit card number is a decent natural key for a table of payment instruments, not for users. Even for a natural-key-believer, users pretty much always need a synthetic ID, because anything you might possibly believe to be constant about humans turns out not to be.
And what if you need to be able to refer to those instruments, e.g. check or use them, but at the same time not expose credit card number to whatever entity needs to do the check?
How do you store the payment instrument for a certain purchase?
I’m not sure it does the job of a natural key, though. When you renew a card you can keep the number but get a new expiry/security code, which should probably be considered a different record.
The bigger issue: I also wouldn’t be quick to assume I know the idiosyncrasies of all card systems, like whether they reuse numbers.
With the external id I get a single layer of obfuscation. If the external id is leaked they still don’t have any data that allows me to make a CC purchase. And if you designed it correctly, you could invalidate the id. ie a token that expires.
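A minimal sketch of that idea, an expiring opaque token that stands in for a payment instrument (the names `TOKEN_TTL`, `issue_token`, and `resolve_token` are illustrative assumptions, not any real API):

```python
# Sketch: hand out a random, expiring external id instead of the card number.
# The instrument id / card data never leaves the server-side store.
import secrets
import time

TOKEN_TTL = 15 * 60  # assumed default: tokens expire after 15 minutes

_tokens = {}  # token -> (internal_instrument_id, expiry_timestamp)

def issue_token(instrument_id: str, ttl: float = TOKEN_TTL) -> str:
    """Return a random external id; leaking it reveals nothing about the card."""
    token = secrets.token_urlsafe(32)
    _tokens[token] = (instrument_id, time.time() + ttl)
    return token

def resolve_token(token: str):
    """Return the internal id if the token is valid and unexpired, else None."""
    entry = _tokens.get(token)
    if entry is None:
        return None
    instrument_id, expires = entry
    if time.time() > expires:
        del _tokens[token]  # invalidate lazily on expiry
        return None
    return instrument_id
```

Because the mapping lives server-side, invalidation is just deleting the entry, which is the "token that expires" property described above.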
If the external id represents a single customer uniquely, it still counts as personal information. Handing out money to someone presenting one id but not another is really arbitrary and doesn't make any sense. Knowing another person's identifier should not allow you to withdraw money either way.
This statement in the YouTube example is unfair: "Users may expect that the video has proper access control restrictions"
The selection is clearly labeled as "Unlisted", which I find pretty easy to understand. It's a term used on other sites as well (e.g. Imgur) and is hopefully becoming colloquial. For those less certain, a tooltip with more detail about the implications could easily be added.
Please don't wall your gardens behind logon credentials. Too much of the web has already done this.
Obviously sensitive things like billing should be subject to strictly controlled security. But being able to share semi-private media in a low-friction way, without forcing consumers of your content to log in or create an account, is extremely handy.
Author here, I think we're mostly in agreement. Unlisted is a great approach for usability, especially for content with low sensitivity. It's a risk versus effort thing.
I'll point out that users don't always have a perfect understanding of the security websites provide. I'm a fan of Krug's book, Don't Make Me Think, as a model for how users interact with website UI and I think that's often how users interact with security. If you expect users to read, understand, and retain technical information you're going to be disappointed.
For YouTube in particular I noticed a number of articles that explain how unlisted content can leak and how to recover from a leak, so I think the 'may' in the quote is quite correct.
I vaguely remember seeing a small help text on YouTube when you try to share an unlisted video, telling you as the user that it's unlisted and to "be mindful" about sharing it. I suspect most people who read it didn't understand what it meant, which may be why (I believe) they don't show it anymore. I wonder if it has anything to do with that `si` URL parameter that the Share button introduces into the URL.
Anyway, maybe I didn't fully comprehend your article, but it still seems a bit mysterious to me how a database of unlisted YouTube videos could exist. Is it just based on user submissions? If so, how are they finding the URLs to the videos in the first place? In my experience, the way they might leak is if the uploader mistakenly adds an unlisted video into a public-facing playlist, but that doesn't seem to me like it would be a common error among content creators.
We are generally on the same page, and your thoughtful reply got my upvote. I bet we'd agree good UI derives from familiar concepts and offers affordances.
You don't need to be technical to have a decent idea of what "Unlisted" means.
A pretty good analogue is the days when the White Pages had everyone's phone number printed in them. If you wrote your unlisted number on a billboard, or gave it to someone else who compromised it, you wouldn't expect to retain your privacy. There are lots of things on the Internet that quack like a duck but bite like a snake; this isn't one of them. It does exactly what it says on the tin.
It's unfortunate (if not surprising) that malicious individuals are compiling the digital equivalent of "Dark Pages", and I salute your effort to raise awareness of that risk amongst web developers.
I grant "Unfair" wasn't a precise enough word, I just felt at that point you slipped from spotlighting unintentional architectural oversights that are fair game, to advocating opinionated change to a deliberate design choice. Sometimes we need real scissors, not safety scissors!
Yes, I think familiarity is quite important. The white pages example is quite dated though. I'm in my late thirties and I think the only time I've seen a phone book being used was in the Terminator movie. So the analogy feels like the save icon being a floppy disk. I'd be curious what the younger crowd thinks.
Sometimes a few of the users try to hold total power over all the rest. For example, in 1984, a few users at the MIT AI lab decided to seize power by changing the operator password on the Twenex system and keeping it secret from everyone else. (I was able to thwart this coup and give power back to the users by patching the kernel, but I wouldn't know how to do that in Unix.)

However, occasionally the rulers do tell someone. Under the usual `su' mechanism, once someone learns the root password who sympathizes with the ordinary users, he or she can tell the rest. The "wheel group" feature would make this impossible, and thus cement the power of the rulers.

I'm on the side of the masses, not that of the rulers. If you are used to supporting the bosses and sysadmins in whatever they do, you might find this idea strange at first.
> Under the usual `su' mechanism, once someone learns the root password who sympathizes with the ordinary users, he or she can tell the rest. The "wheel group" feature would make this impossible, and thus cement the power of the rulers.
How would that work? What is it about the wheel group that stops the sympathetic wheel from revealing his own login information to other people?
> How would that work? What is it about the wheel group that stops the sympathetic wheel from revealing his own login information to other people?
The wheel member probably doesn't want to reveal their own login information. They want to share the root password with other users; on systems without a wheel group that works, but on systems with a wheel group non-wheel users can't su.
The other user would get root access in one scenario and not the other (assuming this is a system that only allows root login from the actual console, or doesn't allow root login at all).
If it is not possible to log in as root directly (on a multi-user system, that might only be possible when physically at the computer), then the wheel group might be one piece of protection (not to be used alone; the wheel users should still avoid revealing the root password anyway) against accidentally telling someone the password, since other users then cannot use the password without physical access to the machine. If a wheel user deliberately wants to share access, they could presumably just add the other users to the wheel group (or modify the software on the computer so that it does not require wheel, etc.), since they have root access.
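For context, on modern Linux systems this restriction is typically enforced via the `pam_wheel` module rather than a hardcoded group check in `su` itself; a typical line from `/etc/pam.d/su` (exact paths and defaults vary by distribution) looks like:

```
# /etc/pam.d/su -- only members of the wheel group may use su
auth  required  pam_wheel.so use_uid
```

With this line active, a non-wheel user who knows the root password still cannot `su`, which is exactly the behavior being debated above.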
Seems like the article was written as a personal journey of learning why UUID is not suitable for security or authorization in any way.
For anyone with prior knowledge and experience of UUIDs, it should be common sense that a UUID will not protect any secrets, because that's not what they're for. They're relatively unique, hard-to-guess identifiers, that's all.
Author here. I used a spelling and grammar checker, but nothing LLM based. I think AI generates very poor content, so I take this as a criticism. I welcome any constructive feedback; I enjoy writing and would love to improve.
Author here. I think you missed the nuance I was going for.
I use YouTube and AWS as examples since they both have implementations that are vulnerable to IDOR, but I think they made the right call. Sometimes usability takes precedence over security. Sometimes 'obfuscation' is better than proper authorization.
> There are use cases where the effort needed to individually grant users access outweighs the risk of using unlisted. Not everyone is dealing in highly sensitive content, after all.
Title could have been “Python’s UUIDv7 only has 32 random bits” and just leave it at that. 2500 words to say “if you don’t authenticate access to a url then people can access it” wow shocker
Great piece, but it's worth mentioning that you generally can't use a presigned URL with CDN endpoints. So it's great for sensitive content, but if you really want the thing to be widely and quickly accessible there's more work to be done.
Well, you can if the signed URL is signed for the CDN's verification instead of the underlying storage.
Generalising this: you don't need stateful logged-in authentication to defeat IDOR. You can include an appropriately keyed HMAC in the construction of a shared identifier, optionally incorporating time or other scoping semantics as necessary, and verify it at your application's trust boundary.
This tends to make identifiers somewhat longer, but they still fit well inside a reasonable emailed URL for downloading your phone bill without having to dig up what your telco password was.
However, note that one of the baseline requirements of privacy-oriented data access is issuing different and opaque identifiers for the same underlying thing to each identifiable principal that asks for it. Whether that's achieved cryptographically or by a lookup table is a big can of engineering worms.
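A minimal sketch of the HMAC-in-the-identifier approach, assuming a server-side `SECRET_KEY` and a simple `internal_id|expiry` payload layout (all names here are illustrative, not from the article):

```python
# Sketch: a self-verifying shared identifier. The HMAC tag makes the id
# unguessable and tamper-evident; the embedded expiry adds time scoping.
import base64
import hashlib
import hmac
import time

SECRET_KEY = b"server-side-secret"  # assumption: kept out of URLs, logs, clients
TAG_LEN = 16                        # truncated SHA-256 HMAC tag, 128 bits

def make_shared_id(internal_id: str, ttl_seconds: int = 86400) -> str:
    expiry = int(time.time()) + ttl_seconds
    payload = f"{internal_id}|{expiry}".encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:TAG_LEN]
    return base64.urlsafe_b64encode(payload + tag).decode().rstrip("=")

def verify_shared_id(token: str):
    """Return the internal id if the tag checks out and it hasn't expired."""
    try:
        raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    except Exception:
        return None
    payload, tag = raw[:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None  # forged or tampered identifier
    internal_id, _, expiry = payload.decode().rpartition("|")
    if time.time() > int(expiry):
        return None  # valid tag, but the time scope has passed
    return internal_id
```

Note that this only verifies the identifier itself; it deliberately issues the same token for the same (id, expiry) inputs, so it does not by itself give each principal a distinct opaque identifier as discussed above.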
I am a bit "meh" on the YouTube "unlisted video" example. The name itself is fairly transparent in implying that there's really no security, the video is just not listed in a public-facing way. This is significantly different than the article's billing example, where customers would be quite right in assuming their bills will be only accessible to them.
> The name itself is fairly transparent in implying that there's really no security
A password-capability system is a password-capability system. Not requiring an account does not make it not an access control. (Though it does make it e.g. not selectively revokable, which is a known weakness of password capabilities.)
Correct me if I am misunderstanding your point but unlisted YouTube videos don’t need a password or anything to be accessed. Anyone who has the URL can access it. It’s just not indexed/searchable on YouTube.
Right. And neither do Google Docs shared by a no-login link (which used to be the only option) or for that matter RSA signing keys. You could in theory guess any of these, given all of the time in the universe (quite literally). A “password capability” is any mechanism where knowing the designation of an object (such as the “unlisted” link) is a necessary and sufficient condition to access it. The designation has to be hard to guess for the system to make sense.
(The intended contrast is with “object capabilities”, where the designation is once again necessary and sufficient but also unforgeable within the constraints of the system. Think handles / file descriptors: you can’t guess a handle to a thing that the system did not give you, specifically, a handle for.)
I get people won’t reasonably guess it, but an unlisted link is still an exposed link literally anyone with internet access can open. It’s simply not the same as a login + password, neither functionally nor technically.
Yeah at the end of the day the yt video isn’t under lock and key in any way, shape, or form vs. my billing info with my various utilities and such which is. It’s just “security through not knowing the exact URL (yet).”
And if your database is 99% reads 1% writes, the difference probably doesn't really matter.
And tons of database indexes operate on randomly distributed data -- looking up email addresses or all sorts of things. So in many cases this is not an optimization worth caring about.
Lots of distributed, NoSQL databases work (or partially work) this way too (e.g., HBase rowkey, Accumulo row ID, Cassandra clustering key, DynamoDB sort key). They partition the data into shards based upon key ranges and then spread those shards across as many servers as possible. UUIDv7 is (by design) temporally clustered. Since many workloads place far more value on recent data, and all recent data is likely to end up in the same shard, you bottleneck on the throughput of a single server or, even with replication, a small number of servers.
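The temporal clustering is easy to see with a toy UUIDv7 generator following the RFC 9562 layout (48-bit millisecond timestamp, then version and variant bits, then random bits). This is an illustrative sketch, not Python's own implementation (`uuid.uuid7` only shipped in Python 3.14):

```python
# Toy UUIDv7: ids minted in the same millisecond share their leading
# 48 timestamp bits, so range-partitioned stores route them to one shard.
import os
import time
import uuid

def uuidv7() -> uuid.UUID:
    ts_ms = time.time_ns() // 1_000_000              # 48-bit Unix ms timestamp
    rand = int.from_bytes(os.urandom(10), "big")     # 80 random bits
    value = (ts_ms & (2**48 - 1)) << 80 | rand
    value &= ~(0xF << 76); value |= 0x7 << 76        # version = 7
    value &= ~(0x3 << 62); value |= 0x2 << 62        # variant = 0b10
    return uuid.UUID(int=value)

ids = [uuidv7() for _ in range(3)]
# First 13 chars of the string form ("xxxxxxxx-xxxx") cover the 48
# timestamp bits; ids generated back-to-back share this range-key prefix.
prefixes = {str(i)[:13] for i in ids}
```

Contrast with UUIDv4, where all 122 non-fixed bits are random, so consecutive inserts scatter uniformly across key-range shards instead of piling onto the newest one.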
FWIW it looks like Cassandra doesn't belong on this list, and DynamoDB only with qualifications.
Though Cassandra is more like quasi-SQL than NoSQL, the bigger issue is that the clustering key is actually never used for sharding. Cassandra (today) always puts all data with the same partition key on the same shard, and the partition key is hashed, meaning there's no situation in which UUIDv7 would perform differently (better or worse) than UUIDv4.
In DynamoDB, it is possible for sort keys to be used for sharding, but only if there is a large number of distinct sort keys for the same partition key. Generally, you would be putting a UUID in the partition key and not the sort key, so UUIDv7 vs UUIDv4 typically has no impact on DB performance.
i think the standard recommendation is to do range partitioning on the hash of the key, aka hash range partitioning (i know yugabyte supports this out of the box, i'd be surprised if others don't). this prevents the situation of all recent uuids ending up on the same shard.