I'm still in the dark about the problem with storing a salted hash, assuming a d...

eadmund · on Feb 1, 2025

The very short version is that decent hash functions are very, very fast, take almost no storage to compute and can thus easily be run in parallel. If you store `(,salt ,(HMAC-SHA2-256 salt password)) in your database and it leaks, an attacker can check hundreds of millions of possible passwords every second. The password ‘open sesame’ will get found pretty early, but so will ‘0p3n#s3s4m3$’. Pretty much anything which is not of the form ‘62Jm1QyJKGIZUpb5zARRLM’ will get found pretty quickly.
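To make that concrete, here is a minimal Python sketch of the offline attack, assuming the leaked record is the salt plus HMAC-SHA-256 pair described above. The function names and the four-word list are made up for illustration; a real attacker would run the same loop over billions of candidates on parallel hardware.

```python
import hashlib
import hmac
import os


def salted_hash(salt: bytes, password: str) -> bytes:
    # One fast HMAC-SHA-256 call, keyed with the salt.
    return hmac.new(salt, password.encode("utf-8"), hashlib.sha256).digest()


def crack(salt: bytes, stored: bytes, candidates):
    # The salt is stored next to the hash, so the attacker has it too.
    # Each guess costs a single cheap hash evaluation.
    for guess in candidates:
        if hmac.compare_digest(salted_hash(salt, guess), stored):
            return guess
    return None


# Hypothetical leaked record.
salt = os.urandom(16)
stored = salted_hash(salt, "open sesame")

wordlist = ["password", "letmein", "0p3n#s3s4m3$", "open sesame"]
found = crack(salt, stored, wordlist)
```

The salt stops precomputed tables from being reused across users, but it does nothing to slow down this per-user loop.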

You want something that is slow and takes a lot of resources to run. PBKDF2 was an early attempt which uses lots and lots of CPU, but it doesn’t use a lot of space. scrypt, bcrypt and Argon2 use lots of CPU and also lots of space in an attempt to make it more expensive to run (which is the point: you want the expected cost of finding a password to be more than the expected value of knowing that password).
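As a rough sketch of what "slow and takes a lot of resources" looks like in practice, Python's standard `hashlib` exposes both PBKDF2 and scrypt. The parameter values below are illustrative assumptions, not recommendations from this thread, and `hashlib.scrypt` is only available when Python is built against a recent enough OpenSSL.

```python
import hashlib
import os


def pbkdf2_hash(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # CPU-hard only: many chained HMAC-SHA-256 calls, very little memory.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def scrypt_hash(password: str, salt: bytes) -> bytes:
    # CPU-hard and memory-hard: n=2**14, r=8 needs roughly 16 MiB per guess
    # (128 * n * r bytes), which makes massively parallel guessing costlier.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


salt = os.urandom(16)
cpu_hard = pbkdf2_hash("open sesame", salt)
memory_hard = scrypt_hash("open sesame", salt)
```

Both functions are deterministic for a given password, salt and parameter set, so verification is just "derive again and compare".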

That’s one level of issue.

The next level of issue is that when you run a simple system like that the user shares his password with you for you to check that it’s valid. This sounds fine: you trust yourself, right? But you really shouldn’t: you make mistakes, after all. And you may have employees who make mistakes or who are malicious. They might log the password the user sent. They might steal the user’s password. Better systems use a challenge-response protocol in which you issue a one-time challenge to the user, the user performs some operation on the challenge and then responds with a proof that he knows his secret.

But those have their own issues! And then there are issues with online systems and timing attacks and more. It’s a very difficult problem, and it all matters what you are protecting and what your threat model is.
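For readers who have not seen one, the challenge-response idea mentioned above can be sketched as a toy HMAC exchange. This is an assumption-laden illustration, not a real protocol: the function names are hypothetical, the "network" is just local variables, and the server is assumed to have stored a password-derived key at enrolment.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes) -> bytes:
    # Both sides derive the same key from the password and a per-user salt.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


def new_challenge() -> bytes:
    # One-time random challenge issued by the server.
    return os.urandom(32)


def respond(key: bytes, challenge: bytes) -> bytes:
    # Client proves knowledge of the key without sending the password.
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    return hmac.compare_digest(respond(key, challenge), response)


salt = os.urandom(16)
server_key = derive_key("open sesame", salt)  # stored at enrolment

challenge = new_challenge()                   # server -> client (with the salt)
client_key = derive_key("open sesame", salt)  # computed on the client
response = respond(client_key, challenge)     # client -> server
ok = verify(server_key, challenge, response)
```

The password never crosses the wire, but in this toy version the stored key is itself enough to answer challenges, so a database leak is still serious. That is one example of these schemes having their own issues.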

lelanthran · on Feb 1, 2025

> But those have their own issues! It’s a very difficult problem, and it all matters what you are protecting and what your threat model is.

It's all a trade-off - those challenge/response systems are better, but they also have more moving parts. There's more bits in the system to go wrong.

When there's only two pieces of the system (compare salted hash of user-provided password against stored salted hash) there's very little room for errors to creep in. Your auditing will all be focused on ensuring that the user-provided password is not leaked/stored/etc after an HTTP sign-in request is rxed.

When using challenge/response, there is some pre-shared information that must be synchronised between the systems (algorithm in use, key length, etc depending on the specific challenge/response system chosen). That's a great deal more points of attack for malicious actors.

And then, of course, you need to add versioning to the system (algorithms may be upgraded over time, key lengths may be expanded, etc) that all present even more points of attack.

Compare to the simple "compare salted hash of password to stored salted hash": even with upgrading the algorithms and the key lengths there still remains only one point of attack - downloading the salted hashes!
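A minimal sketch of that server-only upgrade path, assuming each stored record carries its own parameters and is re-hashed on the next successful login. The record layout and the names `CURRENT`, `make_record` and `login` are hypothetical, and scrypt stands in for whichever password hash is actually in use.

```python
import hashlib
import hmac
import os

# Parameters the server currently wants every record to use.
CURRENT = {"alg": "scrypt", "n": 2**14, "r": 8, "p": 1}


def _derive(password: str, salt: bytes, params: dict) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=params["n"], r=params["r"], p=params["p"], dklen=32,
    )


def make_record(password: str, params: dict = CURRENT) -> dict:
    salt = os.urandom(16)
    return {"params": dict(params), "salt": salt,
            "hash": _derive(password, salt, params)}


def login(record: dict, password: str):
    """Return (ok, record). The record is re-hashed if its parameters are old."""
    candidate = _derive(password, record["salt"], record["params"])
    ok = hmac.compare_digest(candidate, record["hash"])
    if ok and record["params"] != CURRENT:
        # The plaintext is only available right now, so upgrade here.
        record = make_record(password)
    return ok, record


# A record created under older, weaker parameters.
old = make_record("open sesame", {"alg": "scrypt", "n": 2**12, "r": 8, "p": 1})
ok, upgraded = login(old, "open sesame")
```

Nothing on the client changes: the versioning lives entirely in the stored record.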

It doesn't matter how much more secure a competing system is if it introduces, in practice anyway, more points of failure.

My takeaway after doing some cryptography for some parts of my career is that choosing a hash function that is expensive in computational power and expensive in space, and keeping the "user enters a password and we verify it" flow, is still going to have fewer points of attack than "Synchronising pre-shared information between user and authenticator as the security is upgraded over the years".

Basically, the trade-off is between "we chose a quick hash function, but can at least upgrade everything without the client software noticing" and the digital equivalent of "It's an older code, sir, but it checks out" problems.

eadmund · on Feb 1, 2025

You keep returning to salted hashes. Please read the first two paragraphs of what I wrote for why that is a mistake. If you are going to use a shared secret system, do not use salted hashes. There is almost never a good reason to use salted hashes in 2025.

> you need to add versioning to the system

You need this with salted hashes, too! And of course with any password-based system.

lelanthran · on Feb 1, 2025

> Please read the first two paragraphs of what I wrote for why that is a mistake.

Okay, I read it, and then re-read it. I still don't get why (for example) `bcrypt` (a salted hash function) is a bad idea.

I fully accept that I am missing something here, but I really would like to know why using `bcrypt` is a problem.

>> you need to add versioning to the system

> You need this with salted hashes, too! And of course with any password-based system.

Not in the client software, you don't. The pre-shared information with a password-based system is generally stored in the user's head.

The pre-shared information in the challenge/response system means both the submitting software (interacting with the user and rxing the challenge) as well as the receiving software (txing the challenge and rxing the response) need to be synchronised.

Now, once again, I fully accept that I might be missing something here, but AFAIK, that synchronisation contains extra points of attack; points of attack that don't exist in the password/salted-hash system.

And since absolutely no system ever discards existing mechanisms completely when upgrading, each mechanism that is deprecated but still supported for a few more months is yet another point of attack.

Once again, I am trying to understand, not be contentious, and I want to fully understand:

a) The problem with salted hashes like `bcrypt`

b) What changes need to be made to client software when upgrading algorithms and key lengths in a password-based system.

tptacek · on Feb 1, 2025

bcrypt is not a "salted hash function". It's a password hash construction (at the time of its invention, it was called an "adaptive hash"; today, we'd call it a "password KDF"). If you're using bcrypt, you're fine.

What you can't do is use SHA2 with a random salt and call it a day.

lelanthran · on Feb 1, 2025

My understanding of bcrypt math is that the input to the algorithm is a random salt and a message, with the output being a hash.

I believe the actual implementation gives two output fields as a single value, with that value containing the salt and the hash.

This might be why we appear to be talking past each other - I consider bcrypt to be a salted hash because it takes a salt in the inputs and produces a hash in the output.

The fact that the output also contains the salt is, in my mind, an implementation detail.
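For reference, the single value most bcrypt implementations emit packs the version, the cost, the salt and the hash into one 60-character string. Here is a small sketch that splits one apart; `split_bcrypt` is a hypothetical helper and the sample string is only an illustration of the format, not a hash taken from any real system.

```python
def split_bcrypt(encoded: str) -> dict:
    # Layout: $<version>$<cost>$<22-char salt><31-char hash>
    _, version, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt chars followed by 31 hash chars")
    return {
        "version": version,   # e.g. "2a" or "2b"
        "cost": int(cost),    # log2 of the number of key-setup rounds
        "salt": rest[:22],    # 128-bit salt in bcrypt's base64 alphabet
        "hash": rest[22:],    # the hash output itself
    }


sample = "$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW"
fields = split_bcrypt(sample)
```

Note the cost field sitting alongside the salt: that number, rather than the salt, is the tunable part.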

tptacek · on Feb 1, 2025

Yes. We aren't talking past each other anymore. If you aren't composing a "salted hash" out of a cryptographic hash function and a salt, but are instead using bcrypt, scrypt, PBKDF2, or Argon2, you have nothing to worry about. Just to complete your understanding: the salt --- randomizing the hash --- has very little to do with what makes bcrypt a good password hash.

You'll avoid this confusion in the future if you don't refer to bcrypt as a "salted hash". Salted hashes are the technology bcrypt was invented to replace.
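A small sketch of the distinction, using PBKDF2 from Python's standard library as a stand-in for bcrypt (an assumption, since bcrypt is not in the standard library). The salt and the work factor do different jobs; the attacker speed used at the end is a made-up figure.

```python
import hashlib
import os


def kdf(password: str, salt: bytes, iterations: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# What the salt buys: two users with the same password get unrelated
# hashes, so one precomputed table cannot be reused across users.
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = kdf("open sesame", salt_a, 100_000)
hash_b = kdf("open sesame", salt_b, 100_000)
salts_separate_users = hash_a != hash_b


# What the work factor buys: every single guess costs `iterations`
# HMAC evaluations, which divides the attacker's guess rate.
def guesses_per_second(hmacs_per_second: float, iterations: int) -> float:
    return hmacs_per_second / iterations


# Hypothetical attacker speed, for illustration only.
fast = guesses_per_second(1e9, 1)
slow = guesses_per_second(1e9, 600_000)
```

With one iteration the salt is still doing its job, yet the guess rate is unchanged; only raising the work factor brings it down.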

noduerme · on Feb 6, 2025

OP here: I am using bcrypt to hash passwords, but I struggle to understand how not salting i.e. not randomizing it against a stored secret would still be safe for storing short strings. Be that as it may, I'm glad to hear I'm not doing anything horribly wrong when building login systems.