The real problem of passwords in general is the complete lack of a standard to what a password can be. Basically every service with a login reinvents the whole damn thing with completely arbitrary rules that are incompatible with each other. Minimun and maximun length, case-sensitivity, digits, letters, special characters, spaces.
Ok, after reading a lot about passwords I decide that I'm gonna go with a condensed passphrase more than 10 characters long and with a couple service-contextual caharcters in the end. That oughta keep me safe right? Well, except I have to pick a 6-8 length password that MUST have a number AND a capitalized letter in it!
This is at least partly true. I remember using passphrases which involved spaces, until PayPal said "no we don't like spaces". I have no idea why; the only case that I can think of where spaces would really matter is if you were piping your password to an encryption command via a shell script -- in any other case it shouldn't be tokenized that way. There also should never be a rule about maximum length, and all passwords should be valid Unicode strings. Storing a hash of a unicode string should not be hard in this day and age.
Part of the problem is that languages never standardized one password interchange format. It could be really simple. Here's one example:
This is an interchange format based on computing PBKDF2(HMAC(SHA256)) with 2^13 = 8,192 rounds and a salt of ffH+5dfGr64h, to hash the given password. The exact mechanisms aren't important here, what's important is that there is no portable primitive which does this.
The closest that the above comes to being realized is a system called bcrypt. It isn't PBKDF2, but oh well: find a bcrypt module and use it.
Collisions can be handled with two different approaches. The first problem with collisions is when two people have the same password. This is one of the reasons you should salt a password before hashing. If you choose a random m-bit salt then due to the "birthday phenomenon", you can go to around 2^{m/2} users before there is a significant risk that some pair of them shares the same password. This actually isn't too hard to understand. If you have N people who want to high-five each other (e.g. N ~= 18 at the end of an Ultimate frisbee match, everyone from both teams generally high-fives everyone on their own team as well as everyone on the other team), then there are N * (N - 1) / 2 high-fives, or roughly N^2 / 2 for very large N. To see this, enumerate all of the pairs (1, 1), (1, 2), ..., (N, N -1), (N, N), N^2 of them in total, then cross out the ones of the form (k, k), N of them in total, because nobody high-fives themselves, and then cross out the half (a, b) for which a > b, because 3 high-fiving 2 is the same as 2 high-fiving 3. Then (N^2 - N) / 2 = N (N - 1) / 2.
But if each of these comparisons has a roughly independent chance p of sharing the same salt, like 1/2^72, then the number you need before you get towards a 50/50 chance is approximately p * N^2/2 = 1/2, or N = sqrt(1/p). If p = 1/2^72, sqrt(1/p) = 2^36 = 68.7 billion.
So a 72-bit random salt will make sure that everyone shares a salt with less than a billionth of your users or so. If you want even better, you might want to know that a salt doesn't have to be random, and you could generate them incrementally to get even more security. For example in Node.js:
var new_salt = (function () {
var counter = 0, buf = new Buffer(8);
return function () {
counter = (counter + 1) % 1e6;
buf.writeDoubleLE(new Date().getTime() * 1e6 + counter, 1);
return buf.toString('base64')
};
});
Now as long as I have fewer than one million new passwords per millisecond, there is no chance at all of two people having the same salt. (I'm sure Windows has timing issues so that this has to be much more than once per millisecond, but the point stands.)
You asked about hash collisions in particular, and in the context of not having a maximum length. That's a little trickier of a problem. Yes, there exist attacks, particularly on MD5, which use a bunch of arbitrary output blocks to steer a hash to a particular value. But that usually requires a hash to be well-broken. In the case of SHA-256, you would need something like 2^128 user accounts before you'd map different inputs to different outputs. So you're protected from both: you ensure different inputs to the hash function and then you choose a hash function which requires too much work to generate collisions.
Silent truncation is another gotcha. They drop everything after character 8 (so it fits in the database?). Then they change the schema, and stop truncating at 8 characters, but only on some forms. Oh, the joy.
I think the common problem of silent truncation after 8 characters suggests they are using the (outdated) crypt(3) API. The man page says "By taking the lowest 7 bits of each of the first eight characters of the key, a 56-bit key is obtained."
This is so true. You may have a great password, like this:
9rIqIiGByCzcWq
But, it may be considered too long by some services, or not complex enough by others (as it lacks special chars), or it can't begin with a number, etc. Adding special chars breaks it for some DB apps (oracle can't take some chars in the password). It's really frustrating that it's 2012 and there's no global ISO password standard (or something similar).
Maximum length is the worst. If I discover that a service enforces a maximum length (besides simple practical limits like, say, 255 characters), I do not use that service if I can avoid it. You know when you see that that you are not dealing with someone who understands security.
Everybody is dancing around reality: passwords are incompatible with the average person. The best passwords are almost always difficult to memorize and once they are memorized it is costly for us to change them. It's an ill-fitting solution to a social problem.
That's why password managers are so useful. For every service a user can use a different strong password, but for the user there is only one password.
Of course, password managers have their weaknesses (a compromised system will often give access to all accounts, rather than just a few), but it is far more secure than having people using the same trivial password for every service.
It is a single password/pass phrase. This makes it simple to manage.
You can easily create a simple paper and geographically-redundant system (e.g. using a one-time pad) that means you can keep the passphrase in the open, for example, in your wallet or purse.
Password managers are the only way to go and eventually all password/passphrase users realise this because they end-up emulating these systems anyway (e.g. using extemely poor entropy additions to their existing passwords for different applications).
Passwords are only hard to memorize because of the old advice about "do not write them down".
But really, if you write a password on a $50 dollar bill you're not likely to then stick it to your monitor or leave it under a draw. You'd keep it in your wallet, and you hopefully keep that safe. Once you've remembered the password you can destroy the bit of paper. (uh, not a $50!)
I'd be interested to see if there's a bias in words that are selected by people when using a diceware system. People generate a passphrase, but then say "I won't use the first word because I can't remember it, I'll just generate another word". Also whether that would lead to exploitable weakness.
Passwords are also much easier to memorize when you have to use them every few days, and extremely hard to remember when you only use them once every several months when your login cookie expires.
I also would like to know whether there is a simple way to strongly encrypt a password that one keeps in one's wallet. The goal should be that if I am pickpocketed in a Dutch train (which seemingly never happens but the Dutch always warn me about it), my wallet does not contain some piece of paper saying "my.address@gmail.com h6tHVtcj3DsY".
Whoever solves the 'password problem' and has their solution adopted is going to make a lot of money, I think. I'm always working on this in the back of my head, but haven't come up with something that will really work, but I'm sure someday, someone will come up with a solution that is a secure, reliable and easy to use way of personal identification.
For offline encrytion that is true, but if you rate limit the login attempts (to something like once every 3 seconds after 4 tries) then even a password with 20 bits of entropy is going to take a riddicously long time to bruteforce.
A random 4-word passphrase is more than enough to stop online attacks against high-value targets (e.g. bank accounts) that have any throttling at all.
A random 5-word passphrase with three random characters tossed at the end has better than 82 bits of entropy and will halt offline attacks, at least by non-TLAs, for many years to come.
These are not hard to memorize. What's hard to memorize is 101 different passphrases, many of which you don't use every day, which is why the people actively working to make it easier are focusing on the management problem.
So each of those lines has a little over 64 bits of entropy. You might also consider the following one-liner in the shell:
drostie@signy:~$ head -c 9 /dev/urandom | base64
ST2QpSb0eKHk
If you're not paranoid, consider typing "password" into DuckDuckGo and it will give you a random password. If you're slightly more paranoid, consider using mouse-movements to generate your own random password with a random tool I coded for making random API keys: http://code.drostie.org/api.html . Just truncate to the first 10-12 characters or so.
This doesn't actually surprise me. If you consider the role of random brute forcing via dictionary attacks is to locate the appropriate order of tokens that work together to create a coherent meaning you're essentially not providing any more security with more words. A "Password" is a phrase composed of tokens that are the alphabet, numbers, symbols. A "Passphrase" is composed of tokens that are known english words. By taking the corpus search method to determine natural phrases they're essentially trying to identify the total breadth of 2+ token combinations that make up the english language.
This does break down like they said when you stop using coherent meanings. A passphrase that is HorseQuoteBulb would be hard to guess in comparison to HorsesEatHay or something of the same style.
The same goes for passwords: while it may be easy to guess a password as "phrase" it suddenly becomes a lot more difficult to randomly attempt guesses at "7_-Az!e".
Eventually I think we'll all be forced to use two factor authentication for added security. Here the user is mostly safe from their own ignorance where the danger of having credentials stolen is more prominent in the form of Man in the Middle attacks.
As with normal passwords, you will have people completely disregarding any advice on how a password should look like. People will just slowly learn that having a faulty security system on the web is the same as it is in real life. You don't put a curtain as your front door, and you shouldn't put "Harry Potter" as your passphrase, especially if you are holding a wand in your Facebook photo.
In my case, I use multi-word passphrases with words from a obscure dialect of a tiny European country's language. Being a dialect I grew up with, it has the benefit of being easy to remember, and the obscureness of it means the phrases itself are more or less a random string of characters to any brute-force attack. Not exactly a "best practice" for anyone but myself, but I'm happy with it :).
It doesn't even have to be random, simply choosing "SimplyChoosing" or "AssumingPeople" is way more secure than "ManchesterUnited" or "HarryPotter", but still people are creatures of habit.
Ok, after reading a lot about passwords I decide that I'm gonna go with a condensed passphrase more than 10 characters long and with a couple service-contextual caharcters in the end. That oughta keep me safe right? Well, except I have to pick a 6-8 length password that MUST have a number AND a capitalized letter in it!