That comic is unfortunately not telling the truth. The passphrase IS 44 bits of entropy, assuming you input random ASCII. But any reasonably knowledgeable person trying to crack passwords will use a dictionary to build passphrases, rendering this less useful than the first password, even if you do substitute o/0/ø, i/l/1, a/4, etc. randomly. You still need a rather long sentence, preferably spiced up with simple substitution - and now this passphrase is no longer easy or simple to remember.
If you choose four random words from a list of 2048 common words, and your attacker knows that's what you're doing, then your entropy is 4 * log_2(2048) = 44 bits. If the attacker didn't know your strategy and tried to brute force letter by letter it would be much higher - around 48log_2(26)=150 bits assuming around eight letters per word - but like you said, we should assume the attacker knows exactly what strategy we're using, so 44 bits is the better number to work with.
Since that's still more than 28 bits, a 'correct horse battery staple' type password is harder to crack than a short random string of mixed characters, even if the attacker knows exactly what your password generation strategy is.
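The arithmetic above can be checked with a short Python sketch. The function name `uniform_entropy_bits` is made up for illustration; it assumes every pick is independent and uniform, which is the "attacker knows the strategy" model described above.

```python
import math

def uniform_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of `length` independent, uniform picks from `alphabet_size` options."""
    return length * math.log2(alphabet_size)

# Four words from a 2048-word list, attacker knows the list.
print(uniform_entropy_bits(2048, 4))               # 44.0
# Letter-by-letter brute force: 4 words * 8 lowercase letters each.
print(round(uniform_entropy_bits(26, 4 * 8), 1))   # 150.4
```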
I was a bit hasty about the entropy of the passphrase, my mistake. I still stand by my point that even if we choose from 2048 common words, generating a good passphrase (one that isn't a common sentence) is harder than we think.
Yes, 2048 is tiny. I've been using a 4096-word dictionary I found online for years, along with my family and two kids since they were about 6 years old.
There is absolutely no trouble with a 4096-word dictionary. Yes, they (and I too) sometimes bump into words we don't recognize, but it's not that common.
Here, I just generated you a few passwords:
* hefty march attempt force bowel scuff
* between sepia book sweat lemma saint
* safe warn magical cask hefty wish
* alum glib puck adieu dour lazy
* telephone pine cavort good knee swank
* numeral plan jewel conch slate tube
* pastry piano sure proxy unit brew
* trig rise taint current sans gallop
Here are the same random numbers, but encoded as ASCII instead of words:
* 81Pk3t?Rq6S}
* ]CPcYrT^?iE3
* +qV`J9ZU&.,C
* `>sp=~V);3g>
* E&_ff7a|Z4B[
* ?OX~[J>0K'S*
These each have the exact same amount of entropy as the word-based ones.
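One way to see why the two forms can carry equal entropy: the same random number can be written in different bases. A minimal sketch, assuming a 4096-word list (12 bits per word) and, purely for illustration, a 64-symbol alphabet (6 bits per character), so that 6 words and 12 characters both hold exactly 72 bits. `WORDS` is a placeholder list, not a real dictionary, and `SYMBOLS` is not necessarily the alphabet used for the strings above.

```python
import secrets
import string

def encode(value, alphabet, length):
    """Write `value` as `length` digits in base len(alphabet), most significant first."""
    base = len(alphabet)
    digits = []
    for _ in range(length):
        value, remainder = divmod(value, base)
        digits.append(alphabet[remainder])
    return digits[::-1]

# Hypothetical stand-in for a real 4096-word list.
WORDS = [f"word{i:04d}" for i in range(4096)]
# Assumed 64-symbol alphabet: letters, digits, '+' and '/'.
SYMBOLS = string.ascii_letters + string.digits + "+/"

value = secrets.randbits(72)                  # one 72-bit secret
phrase = " ".join(encode(value, WORDS, 6))    # 6 words * 12 bits
compact = "".join(encode(value, SYMBOLS, 12)) # 12 chars * 6 bits

print(phrase)
print(compact)
```

Both lines are renderings of the same `value`, so neither is easier to guess than the other; only the typing and memorizing effort differs.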
Yes and good luck writing those passwords like "between sepia book sweat lemma saint" without typos :)
I just watched a friend using some kind of long password and it took about a minute to get the password correctly entered. It was also easy to watch the keyboard while he was typing and guess the words used.
There are not 3000 _core_ words. You don't teach elementary school children 3000 words. That list is significantly smaller. In Denmark it's 120 words, and with those you'll be well on your way to reading and writing most basic stuff.
The fact that someone has selected 2048 words for generating passphrases doesn't make them easy to remember.
Eh, you may want to peek at XKCD's Thing Explainer to see what life is like under 1k words.
Everybody uses more than a few thousand words in at least one language. And selecting the least used ones will make your passphrases easier to memorize, because they have much more concrete meaning (by virtue of their rarity) than the most used words.
I've been doing this for years and it is indeed easier to remember. You won't be keeping all your passwords in your head in any format if you're using a unique password per site or service as you should. But you occasionally have to buffer them mentally between your password manager and the input box (especially on mobile), and the same number of bits of entropy are infinitely easier to copy correctly in this format vs something like "?G[G6n|4".
Err.. I would expect most (non-English?) people on the internet to know 2000 words. (By virtue of being at least bilingual, they would, so the challenge boils down to whether they can type them. Unicode support should help, but I've found that in most cases people simply type those sounds in English.)
As for people whose primary language is English, I have no clue about the number of words, but if we can expect them to type 1024 words, we'd still get 20 bits.
The GfyCat URL generator list is about 1800 animals and 8000 adjectives [0]. All fairly memorable words. Add in other types of words and 2048 quickly becomes a small number.
The words don't have to be incredibly simple, either. Grabbing a random GfyCat image from the front page gives me "OrangeLankyBasilisk", which is easy to remember; none of those words are particularly foreign. But they're not in your list of most common words, nor are any of them in XKCD's simplewriter [1], which keeps track of the 1000 most common words.
Edit: The words need to be randomly chosen by a computer, so it doesn't matter what the most common words are. You could generate the password from a list of 2048 Spanish words or Japanese words or emojis and the entropy is the same as in the previous scenario (assuming the attacker knows your dictionary of symbols just like they'd know your dictionary of English words). If you let a human choose, of course the smiley-face and poop emojis are going to be picked 50% of the time, but that's not the intention of the comic or passphrases.
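A minimal sketch of the "computer picks" step, using Python's `secrets` module for the randomness. The three tiny symbol lists are hypothetical stand-ins (a real list would have 2048 or more entries); the point is only that the entropy depends on the size of the list, not on what the symbols are.

```python
import math
import secrets

def generate(symbols, count):
    """Pick `count` symbols independently and uniformly at random."""
    return [secrets.choice(symbols) for _ in range(count)]

def entropy_bits(symbols, count):
    """Entropy of the scheme, assuming the attacker knows the list and the count."""
    return count * math.log2(len(set(symbols)))

# Toy lists for illustration only; far too small for real use.
english = ["horse", "staple", "battery", "correct"]
spanish = ["caballo", "grapa", "pila", "correcto"]
emoji = ["🙂", "💩", "🔋", "🐴"]

for symbols in (english, spanish, emoji):
    # Same list size, same count, so the same entropy for all three.
    print(" ".join(generate(symbols, 4)), entropy_bits(symbols, 4))
```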
You can if it's a computer picking the password from presumably common words.
If the human is picking anything, then there WILL be a bias in selection. Effort should be made to minimize that, but even with education this is a difficult task for any worker.
It got to the point where I actually took a classic literary work and made a 'password words' dictionary from it just so that I could have the computer generate possible new passwords. (The bias there is mostly from filtering out things that might be offensive... because someone can always take it the wrong way, even if you explain in advance the theory of how the password was created by a computer.)
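The cost of human bias can be put in numbers with Shannon entropy. A rough sketch, assuming a made-up 1/rank (Zipf-like) preference over a 2048-word list as a stand-in for how a person might favor familiar words; real human choices are not claimed to follow this distribution.

```python
import math

def shannon_entropy_bits(weights):
    """Shannon entropy in bits of a distribution given as non-negative weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

N = 2048
uniform = [1.0] * N                                 # computer-chosen: every word equally likely
skewed = [1.0 / rank for rank in range(1, N + 1)]   # hypothetical human preference

print(shannon_entropy_bits(uniform))   # 11 bits per word, up to float rounding
print(shannon_entropy_bits(skewed))    # lower: favorite words are easier to guess
```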
I've been pretty happy using diceware[0] for my word-based passwords. Just be sure to use actual dice and not a software-based random number generator if you want it to be truly random.
Mix in some other languages or slang words; adding a number is always possible and easy to remember. You may choose to consistently replace the space with a comma, a semicolon, or another easy-to-type character (not one that requires a shift if you keep everything lowercase).
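For reference, Diceware maps five rolls of a six-sided die to one entry in a 7776-word list (6^5 = 7776), which works out to about 12.9 bits per word. A sketch of that mapping; `rolls_to_index` is a name invented here, and the rolls in the example are typed in by hand rather than generated by software.

```python
import math

def rolls_to_index(rolls):
    """Map five d6 rolls (each 1-6) to a zero-based index in a 7776-entry list."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("expected five rolls, each between 1 and 6")
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)   # treat the rolls as a base-6 number
    return index

print(rolls_to_index([1, 1, 1, 1, 1]))   # 0, the first entry
print(rolls_to_index([6, 6, 6, 6, 6]))   # 7775, the last entry
print(round(math.log2(6 ** 5), 2))       # 12.92 bits per word
```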
> If you choose four random words from a list of 2048 common words, and your attacker knows that's what you're doing, then your entropy is 4 * log_2(2048) = 44 bits. If the attacker didn't know your strategy and tried to brute force letter by letter it would be much higher - around 48log_2(26)=150 bits assuming around eight letters per word - but like you said, we should assume the attacker knows exactly what strategy we're using, so 44 bits is the better number to work with.
> Since that's still more than 28 bits, a 'correct horse battery staple' type password is harder to crack than a short random string of mixed characters, even if the attacker knows exactly what your password generation strategy is.
Typo in the above, but I can't edit on my phone: the second sum should be 4 * 8 * log_2(26). Apologies.
No, you are wrong. The number of bits of entropy in "horsestaple..." is estimated by assuming the words were chosen at random from the 2^11=2048 most common words. 4*11=44 bits in total. In practice it is even better since a hacker would also try different kinds of passwords! So no, you do not need to substitute characters.
Yes, I was wrong about the entropy when writing that. But I still don't think that passphrases are as much of a godsend as the comic makes them seem. Can we really assume 2048 common words? The 100 most commonly used words make up 50% of written words.
A common sentence like "I drove to the mall yesterday" is not a good passphrase, but I'm certain that people who use "rocket" as a password would do something similar.
The intention is that the random words are selected from a list of 2000 unique, common words.
Choosing a sentence is a different strategy, which is less secure.
$ wget -O ⅓Mwords http://norvig.com/ngrams/count_1w.txt
$ for i in $(seq 10); do awk '/^[a-z]{3,}/ { print $1 }' ⅓Mwords | head -n 2000 | shuf -n 5 | tr '\n' ' ' && echo; done
videos possible disease maintenance chair
teen documents than without son
research interface library largest drive
location ball beauty coming files
files middle fri meet air
guarantee samsung click super inn
legal previous rent resort use
reply thought better fresh phentermine
bad command once vehicle australian
fun random professor course sponsored
I'm not suggesting that 20 random characters are easier to remember, but for the average Joe, it might as well be the same. They have to remember not only the words, but also the sequence and how to spell them. Unfortunately we cannot expect this from users in general - the worst offenders write down a password like "rocket", so there is no hope that they'll try to remember a sequence of random words.
We shouldn't have to remember passwords at all IMO. It's creating entropy by remembering things, but the human brain is inherently bad at remembering exact things. Something like a YubiKey is a better idea: plug it in, enter your PIN, and use a key pair to authenticate. All the user has to do is keep track of the physical thing and the PIN.
Even those 44 bits are too little nowadays. Passphrases are not a godsend, but something good to use when the correct technology - a password manager - is not available.
A notable use case is choosing a master password for your password manager. And you'll want a longer phrase.
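How much longer depends on the target. A small sketch, where the 80-bit target is an arbitrary number picked for illustration, not a recommendation from this thread:

```python
import math

def words_needed(target_bits: float, list_size: int) -> int:
    """Smallest number of uniformly chosen words that reaches `target_bits`."""
    return math.ceil(target_bits / math.log2(list_size))

# List sizes mentioned in the thread (2048, 4096) plus the 7776-word Diceware size.
for size in (2048, 4096, 7776):
    print(size, words_needed(80, size))
```

A bigger list buys fewer words per phrase, at the price of rarer vocabulary.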
The idea is that you use a mnemonic generator to pick the words. The fact that "100 most commonly used, make up 50% of written words" (a dubious statistic, source?) is irrelevant.
> the idea is you use a mnemonic generator to pick the words.
I know you are supposed to use a generator to pick the words, that is how BIP39 for bitcoin works. But average Joe is not going to do that. He will select "I went to highschool in 1992". Authentication is a hard problem, and unless you force a reasonable scheme, it will be weak.
I think you failed to think that through. Random alphanumeric input of that length would be 25 * log_2(36) ≈ 5 * 25 = 125 bits. It is 11 bits per word because each word is chosen from a ~2000-word dictionary.
Edit: should have reloaded, said by enough people already :D