Some evidence on multi-word passphrases

ttt_ · on March 8, 2012

The real problem of passwords in general is the complete lack of a standard to what a password can be. Basically every service with a login reinvents the whole damn thing with completely arbitrary rules that are incompatible with each other. Minimun and maximun length, case-sensitivity, digits, letters, special characters, spaces.

Ok, after reading a lot about passwords I decide that I'm gonna go with a condensed passphrase more than 10 characters long and with a couple service-contextual caharcters in the end. That oughta keep me safe right? Well, except I have to pick a 6-8 length password that MUST have a number AND a capitalized letter in it!

drostie · on March 8, 2012

This is at least partly true. I remember using passphrases which involved spaces, until PayPal said "no we don't like spaces". I have no idea why; the only case that I can think of where spaces would really matter is if you were piping your password to an encryption command via a shell script -- in any other case it shouldn't be tokenized that way. There also should never be a rule about maximum length, and all passwords should be valid Unicode strings. Storing a hash of a unicode string should not be hard in this day and age.

Part of the problem is that languages never standardized one password interchange format. It could be really simple. Here's one example:

    13 ffH+5dfGr64h UYDpAQkrzmvI8fZhqtXQNqCGLouWn2edKowiyqTlzsg

This is an interchange format based on computing PBKDF2(HMAC(SHA256)) with 2^13 = 8,192 rounds and a salt of ffH+5dfGr64h, to hash the given password. The exact mechanisms aren't important here, what's important is that there is no portable primitive which does this.

The closest that the above comes to being realized is a system called bcrypt. It isn't PBKDF2, but oh well: find a bcrypt module and use it.

tomg · on March 8, 2012

> There also should never be a rule about maximum length

What about hash collisions? Is there a good way to handle that?

drostie · on March 9, 2012

Collisions can be handled with two different approaches. The first problem with collisions is when two people have the same password. This is one of the reasons you should salt a password before hashing. If you choose a random m-bit salt then due to the "birthday phenomenon", you can go to around 2^{m/2} users before there is a significant risk that some pair of them shares the same password. This actually isn't too hard to understand. If you have N people who want to high-five each other (e.g. N ~= 18 at the end of an Ultimate frisbee match, everyone from both teams generally high-fives everyone on their own team as well as everyone on the other team), then there are N * (N - 1) / 2 high-fives, or roughly N^2 / 2 for very large N. To see this, enumerate all of the pairs (1, 1), (1, 2), ..., (N, N -1), (N, N), N^2 of them in total, then cross out the ones of the form (k, k), N of them in total, because nobody high-fives themselves, and then cross out the half (a, b) for which a > b, because 3 high-fiving 2 is the same as 2 high-fiving 3. Then (N^2 - N) / 2 = N (N - 1) / 2.

But if each of these comparisons has a roughly independent chance p of sharing the same salt, like 1/2^72, then the number you need before you get towards a 50/50 chance is approximately p * N^2/2 = 1/2, or N = sqrt(1/p). If p = 1/2^72, sqrt(1/p) = 2^36 = 68.7 billion.

So a 72-bit random salt will make sure that everyone shares a salt with less than a billionth of your users or so. If you want even better, you might want to know that a salt doesn't have to be random, and you could generate them incrementally to get even more security. For example in Node.js:

    var new_salt = (function () {
        var counter = 0, buf = new Buffer(8);
        return function () {
            counter = (counter + 1) % 1e6;
            buf.writeDoubleLE(new Date().getTime() * 1e6 + counter, 1);
            return buf.toString('base64')
        };
    });

Now as long as I have fewer than one million new passwords per millisecond, there is no chance at all of two people having the same salt. (I'm sure Windows has timing issues so that this has to be much more than once per millisecond, but the point stands.)

You asked about hash collisions in particular, and in the context of not having a maximum length. That's a little trickier of a problem. Yes, there exist attacks, particularly on MD5, which use a bunch of arbitrary output blocks to steer a hash to a particular value. But that usually requires a hash to be well-broken. In the case of SHA-256, you would need something like 2^128 user accounts before you'd map different inputs to different outputs. So you're protected from both: you ensure different inputs to the hash function and then you choose a hash function which requires too much work to generate collisions.

tomg · on March 12, 2012

Thanks for this post, it's very informative!

wisty · on March 8, 2012

Silent truncation is another gotcha. They drop everything after character 8 (so it fits in the database?). Then they change the schema, and stop truncating at 8 characters, but only on some forms. Oh, the joy.

cpeterso · on March 8, 2012

I think the common problem of silent truncation after 8 characters suggests they are using the (outdated) crypt(3) API. The man page says "By taking the lowest 7 bits of each of the first eight characters of the key, a 56-bit key is obtained."

ovi256 · on March 8, 2012

MSN and Hotmail used, and maybe still do this, after 14 characters.

bwooce · on March 8, 2012

My bank does it after 6! It even reminds you that it is 6 chars max on the login page.

kijin · on March 8, 2012

Now it's 16 characters at Hotmail.

16s · on March 8, 2012

This is so true. You may have a great password, like this:

9rIqIiGByCzcWq

But, it may be considered too long by some services, or not complex enough by others (as it lacks special chars), or it can't begin with a number, etc. Adding special chars breaks it for some DB apps (oracle can't take some chars in the password). It's really frustrating that it's 2012 and there's no global ISO password standard (or something similar).

mistercow · on March 8, 2012

Maximum length is the worst. If I discover that a service enforces a maximum length (besides simple practical limits like, say, 255 characters), I do not use that service if I can avoid it. You know when you see that that you are not dealing with someone who understands security.

lhnz · on March 8, 2012

Everybody is dancing around reality: passwords are incompatible with the average person. The best passwords are almost always difficult to memorize and once they are memorized it is costly for us to change them. It's an ill-fitting solution to a social problem.

danieldk · on March 8, 2012

That's why password managers are so useful. For every service a user can use a different strong password, but for the user there is only one password.

Of course, password managers have their weaknesses (a compromised system will often give access to all accounts, rather than just a few), but it is far more secure than having people using the same trivial password for every service.

pnathan · on March 8, 2012

Also, if you forget your password to the password manager... :)

drucken · on March 8, 2012

It is a single password/pass phrase. This makes it simple to manage.

You can easily create a simple paper and geographically-redundant system (e.g. using a one-time pad) that means you can keep the passphrase in the open, for example, in your wallet or purse.

Password managers are the only way to go and eventually all password/passphrase users realise this because they end-up emulating these systems anyway (e.g. using extemely poor entropy additions to their existing passwords for different applications).

DanBC · on March 8, 2012

Passwords are only hard to memorize because of the old advice about "do not write them down".

But really, if you write a password on a $50 dollar bill you're not likely to then stick it to your monitor or leave it under a draw. You'd keep it in your wallet, and you hopefully keep that safe. Once you've remembered the password you can destroy the bit of paper. (uh, not a $50!)

I'd be interested to see if there's a bias in words that are selected by people when using a diceware system. People generate a passphrase, but then say "I won't use the first word because I can't remember it, I'll just generate another word". Also whether that would lead to exploitable weakness.

drostie · on March 8, 2012

Passwords are also much easier to memorize when you have to use them every few days, and extremely hard to remember when you only use them once every several months when your login cookie expires.

I also would like to know whether there is a simple way to strongly encrypt a password that one keeps in one's wallet. The goal should be that if I am pickpocketed in a Dutch train (which seemingly never happens but the Dutch always warn me about it), my wallet does not contain some piece of paper saying "my.address@gmail.com h6tHVtcj3DsY".

minikomi · on March 8, 2012

Anagrams .. Something that looks like it should be in a wallet (eg shopping list) .. Use the sum of the digits on your credit card plus a word

vacri · on March 8, 2012

"Everybody is dancing around reality:"

This is a fine password. Long, easy to remember, and has five special characters.

kruhft · on March 8, 2012

Whoever solves the 'password problem' and has their solution adopted is going to make a lot of money, I think. I'm always working on this in the back of my head, but haven't come up with something that will really work, but I'm sure someday, someone will come up with a solution that is a secure, reliable and easy to use way of personal identification.

tomjen3 · on March 8, 2012

For offline encrytion that is true, but if you rate limit the login attempts (to something like once every 3 seconds after 4 tries) then even a password with 20 bits of entropy is going to take a riddicously long time to bruteforce.

nknight · on March 8, 2012

A random 4-word passphrase is more than enough to stop online attacks against high-value targets (e.g. bank accounts) that have any throttling at all.

A random 5-word passphrase with three random characters tossed at the end has better than 82 bits of entropy and will halt offline attacks, at least by non-TLAs, for many years to come.

These are not hard to memorize. What's hard to memorize is 101 different passphrases, many of which you don't use every day, which is why the people actively working to make it easier are focusing on the management problem.

drostie · on March 8, 2012

Just to give people a flavor, here is a script I wrote in Python to generate various random passwords: http://www.bpaste.net/show/24775/

What it looks like in practice:

    drostie@signy:~$ words -c 5 4
    (Line entropy: 64.7055497954 bits.)
    rebelliousness subpoenas garrulous coach
    mono peroxide underused fathers
    entreat entwined meatier umbilicuses
    relay torrential husband Jenkins
    Kristine sluggers feverishly Lessie

So each of those lines has a little over 64 bits of entropy. You might also consider the following one-liner in the shell:

    drostie@signy:~$ head -c 9 /dev/urandom | base64
    ST2QpSb0eKHk

If you're not paranoid, consider typing "password" into DuckDuckGo and it will give you a random password. If you're slightly more paranoid, consider using mouse-movements to generate your own random password with a random tool I coded for making random API keys: http://code.drostie.org/api.html . Just truncate to the first 10-12 characters or so.

TheCapn · on March 8, 2012

This doesn't actually surprise me. If you consider the role of random brute forcing via dictionary attacks is to locate the appropriate order of tokens that work together to create a coherent meaning you're essentially not providing any more security with more words. A "Password" is a phrase composed of tokens that are the alphabet, numbers, symbols. A "Passphrase" is composed of tokens that are known english words. By taking the corpus search method to determine natural phrases they're essentially trying to identify the total breadth of 2+ token combinations that make up the english language.

This does break down like they said when you stop using coherent meanings. A passphrase that is HorseQuoteBulb would be hard to guess in comparison to HorsesEatHay or something of the same style.

The same goes for passwords: while it may be easy to guess a password as "phrase" it suddenly becomes a lot more difficult to randomly attempt guesses at "7_-Az!e".

Eventually I think we'll all be forced to use two factor authentication for added security. Here the user is mostly safe from their own ignorance where the danger of having credentials stolen is more prominent in the form of Man in the Middle attacks.

napoleoncomplex · on March 8, 2012

As with normal passwords, you will have people completely disregarding any advice on how a password should look like. People will just slowly learn that having a faulty security system on the web is the same as it is in real life. You don't put a curtain as your front door, and you shouldn't put "Harry Potter" as your passphrase, especially if you are holding a wand in your Facebook photo.

In my case, I use multi-word passphrases with words from a obscure dialect of a tiny European country's language. Being a dialect I grew up with, it has the benefit of being easy to remember, and the obscureness of it means the phrases itself are more or less a random string of characters to any brute-force attack. Not exactly a "best practice" for anyone but myself, but I'm happy with it :).

16s · on March 8, 2012

diceware is very good for these sort of passwords. I have no affiliation, but would use it for passphrases if I needed to. http://world.std.com/~reinhold/diceware.html

gizzlon · on March 8, 2012

Interesting read.. Assuming people would choose random words in their passphrases always seemed too simplistic and naive.

user24 · on March 8, 2012

It doesn't even have to be random, simply choosing "SimplyChoosing" or "AssumingPeople" is way more secure than "ManchesterUnited" or "HarryPotter", but still people are creatures of habit.

gizzlon · on March 8, 2012

True, but when calculating entropy people often assumes random choice of words

cpeterso · on March 9, 2012

The system can let the user choose from a selection of random passphrases.

user24 · on March 9, 2012

massive reduction in entropy.