i suspect some people here don't really know in any detail how password cracking works.
you start with a dictionary. a good dictionary combines multiple passwords, slang, common patterns of keys on the keyboard, and old, known, passwords (so all the entries here will be added, for example).
but that's just the start. the cracking programs also have rules. in simple terms these can be "shift to upper case" or "combine two words from dictionary" or "reverse word" or "replace "e" with "3". and there are common sets of rules that people develop (and test) that do standard transformations like automatically making passwords "leet", or extending with common sequences (123), etc.
and you can automate this. so something like hashcat will generate random rules and apply them to the dictionary. because the rules are written so that they can be composed this is surprisingly powerful and probably explains some of the "harder" cracks on this list.
the only way to beat this, then, is to use random passwords (for example, as generated by keepass) and to make them large enough that just running through every combination doesn't work.
on size: what is important is the number of available combinations. systematic cracking - trying every possible combination - still uses common characters first so "size" is a combination of character variety and number of characters. again, something like keepass makes this easy.
finally, note that on some old or poorly maintained systems, you may be restricted (perhaps without knowing) to 8 characters. then using a wide range of characters is critical.
i know this is obvious to many, but some of the comments about "hard" passwords made me think it might be useful...
So this is not the complete set of passwords, only the ones that have been cracked so far, using a dictionary? Scanning through the list, very few look like a random set of characters that a password generator would come up with. Does that mean that passwords which were randomly generated are, and are likely to remain, safe?
So this is not the complete set of passwords, only the ones that have been cracked so far, using a dictionary?
a dictionary + rules + time. yes (i assume - i have no inside knowledge).
[...] passwords which were randomly generated are, and are likely to remain, safe?
it depends on the length and range of characters. random passwords are still vulnerable if they are short and/or use a restricted range of characters. they can be found "by accident" when using rules and dictionaries. they can also be found by simply trying all combinations ("a", "b", ..., "aa", ..., "ab"...). length and character range protects against both of these.
it's likely that attackers will use all available approaches, including simple brute force. i focused on dictionaries + rules above because (1) that's generally more efficient and (2) some people don't seem to realize how powerful it is.
Thanks. I was under the impression somehow that cracking the first few passwords would make it easier to crack the rest (by figuring out the salting scheme perhaps)?
Also, if there is no restricted range of characters, and a solid, random password is used, how can the attacker know when they have broken it, short of trying them all against the actual server?
No cracking one doesn't help you with the rest at all. Salt and encryption scheme figured out to even start.
You just keep going in bruteforce and build bigger rainbow tables / variant dictionary attacks. Given a large number of GPU's you will get all passwords.. time though could be a number of years to thousands of years.
actually, the last point is interesting, but goes the other way: the attacker doesn't have to find your password, just some text that hashes to the same value that is in the password file. that places an upper limit on how secure you can make passwords for any given hash (although in practice it's usually irrelevant).
Ok, this is interesting, so now we're talking about hashing collisions, right? The idea might be that if you just start iterating through all possible alphanumeric combinations, that it's possible you'd stumble on a successful collision before you'd stumble on the actual password. It seems this would only be likely once the number of possible passwords was greater than the number of possible hash outputs (32^66 for MD5, right)? That would occur if the length restriction on the password was greater than the length of the hash output, or a broader charset was allowed for the password than for the hashing output.
I guess if this ever actually became a relevant concern, you'd simply keep adding a few extra chars to the hashing algorithm output to keep it beyond the range of reasonable.
You can actually calculate password bit strength/safety (assuming the chars are indeed random... and that's a big assumption). A randomly selected character from all printable ASCII characters (roughly 95 chars) has about 6.5 bits of entropy. So an eight character password is about 52-bits strong.
I have a tangentially related question. During this LulzSec situation, I stepped into 2009 and started using LastPass. I generated 16 character random passwords for every place that I've been visiting. As I understand it, these passwords are encrypted on my machine and then uploaded to LastPass servers. When I login with my master password, I have access to all of these passwords unencrypted.
My question: is there an intrinsic weakness in storing these passwords at a central location? E.g. would it be feasible that LastPass gets targeted by a complicated blackhat attack, and then instead of losing just one of my passwords I lost them all? Or is the manner in which they are encrypted and transmitted secure enough to prevent this?
If someone managed to compromise LastPass' servers, they would be able to modify the content that was sent to your browser, and thereby do anything they wanted with your master password after you typed it in.
That's the weakness of services that are supposedly "Host-proof". If the host is sending you the webapp/javascript/webpage in which your data is decrypted, each and every time you visit their website, you have to trust that they wont (or someone else wont) modify it in a malicious way in order to gain access to your data.
The only way to be safe from this kind of attack is to use the same trusted copy of the code, and not use any new versions until you trust that it is also safe, or to somehow verify that the webpage sent to your browser is exactly what you expect it to be.
Yes, it is. Use 1Password or KeePass because they aren't centrally hosted. 1Password even provides a little security through obscurity by using DropBox for its optional keychain syncing. People store a lot of uninteresting crap on Dropbox, filtering through to gather .agilekeychains would be a real PITA.
If it was simple as filtering for those files, you would have a point. It's not though. If you can't see the difference between a big compiled database of 1 million+ users password DBs vs a multiple petabyte mess of all types of files including a handful of agilekeychains, then it's not worth discussing further.
Correct me if I'm wrong: you mean that in the real world, salted passwords have only one advantage over plain text passwords, which is to buy you some time ? (albeit it's a strong advantage)
A salted, hashed (with a slow hash function) strong (long, random, large character set) password, will take hundreds, thousands, or millions of years (average case) to crack with brute force.
It's only technically "buying some time." In reality it's keeping it completely secure.
And what use would a hacker have attacking a whole database full of them? It would take too long to do it, even with less secure passwords. It takes your fruit out of the "low hanging" category, which is a bit of security in itself.
you can do the maths. if the words are drawn from a dictionary of size D and you use N words then the number of choices that need to be explored is D^N. the equivalent number of bits is log2 of that.
so what's D? i imagine most people would use common words, so let's take the most popular 1 percent of "dict/words" on my computer. that's 4,000 words. log2(D^N) = N * log2(D) = N * 12 (since log2(1000)~10).
now in another thread here there's an article saying that 9 characters drawn from an alphabet of 100 (lower+upper+symbols) is around the current limit. the equivalent number of bits there is log2(100^9) = 60. so you'd need about 5 words (5*12=60) for the same level of security.
tl;dr: you need more than 5 words (very) roughly. i am not 100% sure i have the maths right, either :o)
in more practical terms, i don't know of a program that actually does this. the programs i know about typically only combine a few words. but i am no great expert, and it would be fairly easy to write such a program.
[edit: actually, you are less safe than i estimate above if the attacker is smart, because the words probably aren't independent (most people will use meaningful sentences). so you can use markov chains to hugely reduce the search space (in more general terms: "the" will be followed by a noun phrase, etc). so you need significantly more than 5 words. heh. if someone wants to pay me to write such a program, please get in touch... ;o]
I think there are a few grumpy types that downvote for little reason, or perhaps completely arbitrarily. They're often the first votes in, as this action requires little consideration, and presumably such people have few other distractions.
Also, some downvotes are cast in error. This is very easy on touch-screens, and occasionally a problem elsewhere. Since totals are no longer displayed, a person may not even realize their intended upvote registered as a downvote.
So assume it was a mistake. Or a hopelessly grumpy person whose bad attitude is best ignored, rather than amplified with curses and complaints. Brush off the first downvote or two, and well-considered upvotes follow soon enough.
Yeah, I've gotten downvoted a lot for correcting people...but I think that not even approaching proper english makes you look like an idiot, so really, I'm helping.
A few people seem to be commenting that the salt used for all of the passwords is the same.
This is not the case.
These are crypt(3) strings and the $1$ at the beginning signifies the hash scheme used, not the salt--$scheme$salt$hash. The 1 means that FreeBSD-MD5 is the algorithm used, which is basically MD5 with the salt iterated 1000 times.
Your comment that this is MD5 x 1000 is very informative and got me searching for the FreeBSD MD5 crypt function, and I found this gem from /usr/src/lib/libcrypt/crypt-md5.c:
/*
* and now, just to make sure things don't run too fast
* On a 60 Mhz Pentium this takes 34 msec, so you would
* need 30 seconds to build a 1000 entry dictionary...
*/
for(i = 0; i < 1000; i++) {
MD5Init(&ctx1);
if(i & 1)
MD5Update(&ctx1, (const u_char *)pw, strlen(pw));
...
... which seems to still be the default implementation in FreeBSD. I'd guess it's time to increase the iterations to 200000.
A lot of them are just 6 characters. Even a random 6 character password can be brute forced in a few minutes. Many of the short ones were just single dictionary words, so wouldn't even get past the initial check for stupid passwords--they would fall in seconds.
Most of the longer ones seem to be simply combinations of a couple dictionary words, or a word and a number, or similar fairly low entropy combinations that would come early in a brute forcer's search.
Most of the ones that aren't like that are based on simple patterns on the keyboard, which would also be in a brute forcer's list of things to check early.
Looks great but no Linux support. Since I use Ubuntu on my work (and home) laptop and netbook, this is largely useless for me.
KeePassX + Dropbox is a great way to go, but it's kind of more "DIY"-ish. Still not hard to set up, keeps all my passwords in sync, and KeePassX has some great features like "auto-type" which basically is a one-click website login.
I now use the max-length passwords on all the sites I use, and they're all crazy random ones. I don't memorize any of them because it's so easy to reset a password if I lost my KeePassX access (unlikely since it's on Dropbox + 4 computers + CrashPlan backups).
On occasion when signing up for an account that I'm just testing out, I don't feel like creating a random password for it, so I just use a simple one. If it turns out I want to use the service, I will change it to a more complex one. This could be what happened here.
It's okay, Bitcoins don't have any potential monetary value!
If I were to sign up for MtGox, I don't know if I'd use anything above my standard "don't care" password, at least until I decided to get serious about it. Typing "shifty3rd" is easier than generating a LastPass password - and I only use it for accounts when I don't care if they're compromised.
The $1$ part of the hash is purely to indicate that MD5 was the encryption algorithm used. The salt used is the first 12 characters, so from "$1$" to the next "$"
For example, for "$1$JVf3ep..$uvo634QwxDsAuHMOZlKws1:riprip" the salt is "$1$JVf3ep..$" (although those periods may just be padding, IIRC).
I take it straight up dic attack, i see no gen password with 32 chars in it. Guess this teaches you a lesson, 32 character generated password (or max pass size) as a requirement for 99% of sites. Now if only windows had a standard password storage API which programs can access using special rules and special admin programs can manage this way just like the web browser we can have password stores for windows + sync to cloud.
Windows does have a password keychain (and has since XP), but AFAIK the only programs that use it are Microsoft programs. Internet Explorer uses the Windows keychain to store saved passwords.
KeePassX + Dropbox bro. It's only an alt-tab and a right-click -> log in to site on whatever you want to get into, once you enter your master password and/or present your password file.
What kind of impact would having used something like sha-512 with a 128 bit salt have had over md5? How many more cycles do those take to generate? I assume the attacker had to brute force the salt from a known password as well, if that's sufficiently random that should provide some security as well shouldn't it?
I'm pretty sure this article[1] is posted daily on HN. General purpose hashing functions are all fast (which, for a password storage system, is synonymous with bad).
Well, considering that 292 of the passwords contain mixed case, 79 of them are 12 characters or more (this one's nice: "qwe123QWE!@#"), 59 of them contain non-alphanumeric characters, and 6713 of the 8655 passwords posted are unique ... it's probably only a matter of time.
Nice. I think it's time to upgrade all my passwords.
For what it's worth, my password was 8 random alphanumerics, and it's not in the list, while /.,mnbvcxz (12 alphanumerics with symbols) is there. The cracker must have some sort of algorithm that looks for consecutive patterns on the keyboard.
It goes to show that the old rules - non-dictionary word, mixed case, etc - really don't cut it anymore. Psychologically, picking a password that has high entropy is quite difficult, and the crackers are only going to develop better algorithms in the future. I think using a good random generator is the only way to ensure you have a decent password these days.
I dunno, they managed to crack "G7io5639*%V64ioT5h9" -- 19 characters including lowercase, uppercase, numbers, and symbols. It doesn't seem to follow any pattern on a QWERTY layout -- maybe another layout?
I wasn't aware they could crack passwords that long though -- wasn't that supposed to take years, even with a GPU?
I was surprised at how few randomly generated passwords there were in that list, if any. Like you mention, there's a lot of variants on qwerty/12345/!@#$%: 1qaz2wsx, Zaq1Xsw2, 1qazxsw23e, 1q2w3e, !@#$1234, etc.
"ZXasqw12!@" was about as random as it got from a quick scroll-through, and even that is basically just keys directly next to each other.
I assume randomness doesn't mean as much as sheer password length does when it comes to crackability, but I wonder if there's anything to be learned from this. Maybe only that random password generators tend to default to a safer (longer) length? :)
Random passwords are probably not going to be cracked by a run-through of John with a ruleset, which is likely similar to -- or exactly -- what was used to create this list.
These are salted FreeBSD MD5s (iterated hash); even with a powerful GPU, you're probably only going to be able to check a few hundred thousand per second at best.
Yeah mine niether. I have confidence it won't end up there either. The way the difficulty and time hockeysticks up once you hit the low hanging fruit is amazing(<8 char, non complex, simple patterns and common passwords, dictionary words and number / symbol variants). http://www.rakkhis.com/2011/06/i-was-hacked-mtgox-bitcoin-3-...
Nor mine. Of course mine was only 8 characters, character set of size 72, so only a "mere" 722,204,136,308,736 possible permutations. In any case the couple places that one was reused have been updated already, I'm still waiting for it to show up eventually.
Yeah dude, mine was on the bigger list. It's a 10chr semi-random password (all lower case tho).
I pick passwords based on easy things to type and the memorize the pattern / commit it to muscle memory. I also try to use obscure but pronounceable patterns somewhere in the pass.
Couldnt find mine.. thats a comfort. =) Unfortunately most web based system ive encountered has 20char as most for key. My new passphrase model is based on 128 char key using two values piped through sha512. It works... but friendface probably wont accept it :)
Could someone help this newbie understand why proper salting doesn't make this hard enough? And how does the cracker know they have the right password? Wouldn't they need to check it against the authentication mechanism or figure out the algorithm?
If the crypto method being used is strong enough, then there is no need to obscure the authentication algorithm or the salted value. That just provides a little bit of security through obscurity, it doesn't actually harden the passwords anymore. Thus, modern password hashing libraries just put the algorithm and salt information in the password string itself, so that the string of data you store in the database contains 3 pieces of data: The algorithm used to hash the password, the salt, and the password hash. The benefit of this is convenience: Different platforms/languages can all create/authenticate various password hash types with ease as long as everybody sticks to this format.
And because of this, if you choose a weak crypto method to hash your passwords (which is what MtGox did), then an attacker conveniently has the 3 things they need to attempt to crack the password: The algorithm, the salt and the hashed value all in the same place.
However, if a strong crypto is used (IE a computationally intensive and slow hash), then the fact that the algorithm, salt and hashed value are in the same place doesn't really weaken the passwords. With an expensive enough hash, even if the attacker has all the information about how to crack the password right in front of them, actually performing the necessary computation would be so expensive that a brute force attack is effectively protected against.
You can see more details about the format of the password strings and where the 3 parts of data are stored here: http://php.net/manual/en/function.crypt.php
(the PHP page had the best explanation of my quick search, though this isn't just PHP specific)
The purpose of salting is to prevent the use of "rainbow tables" (precomputed hashes). With salted passwords, the attacker has to compute the password hash for every password he tries (takes time), he cannot use pre-computed hash tables and just compare the password hash to the precomputed hashes (a simple string compare).
Using one salt defeats rainbow tables. Using multiple salts does not defeat it any better.
Using multiple salts would seem to help defeat brute forcing or dictionary attacks, but that is only true if the salts are secret... the problem is that they are not secret because the at this point the attacker access to everything, including the salts.
I accidentally up-voted parent, so felt I had to reply since this part is somewhat wrong:
> Using multiple salts would seem to help defeat brute forcing or dictionary attacks, but that is only true if the salts are secret
If I have a dictionary of common passwords, I need to hash this with the salt used by MtGox and then I can test the hashes against all the passwords from MtGox.
Had they used a different salt for each password, my work would be n times as expensive (n being the number of passwords to test against).
Using multiple salts makes it more expensive, but it doesn't defeat an attack unless it makes the attack impossibly expensive. My contention is is that multiple salts by themselves will not make the attack impossibly expensive.
Better encryption, e.g. bcrypt http://en.wikipedia.org/wiki/Bcrypt attempts to make it impossibly expensive to crack passwords. Part of their technique is to use per-password salts to increase the time for all passwords, but the key is to have an encryption algorithm that takes an "impossibly" long time per password as well. Key is that it is "an adaptive hash: over time it can be made slower and slower so it remains resistant to specific brute-force search attacks against the hash and the salt."
Rainbow tables are O(1). Salts are O(n). Bcrypt is O(big), where "big" can be increased to keep it bigger than a "practical" attack will be willing to attempt.
Normally you use a different salt for each password—having the same salt for all passwords makes a rainbow table for that salt actually useful in cracking your hashes.
you start with a dictionary. a good dictionary combines multiple passwords, slang, common patterns of keys on the keyboard, and old, known, passwords (so all the entries here will be added, for example).
but that's just the start. the cracking programs also have rules. in simple terms these can be "shift to upper case" or "combine two words from dictionary" or "reverse word" or "replace "e" with "3". and there are common sets of rules that people develop (and test) that do standard transformations like automatically making passwords "leet", or extending with common sequences (123), etc.
and you can automate this. so something like hashcat will generate random rules and apply them to the dictionary. because the rules are written so that they can be composed this is surprisingly powerful and probably explains some of the "harder" cracks on this list.
the only way to beat this, then, is to use random passwords (for example, as generated by keepass) and to make them large enough that just running through every combination doesn't work.
on size: what is important is the number of available combinations. systematic cracking - trying every possible combination - still uses common characters first so "size" is a combination of character variety and number of characters. again, something like keepass makes this easy.
finally, note that on some old or poorly maintained systems, you may be restricted (perhaps without knowing) to 8 characters. then using a wide range of characters is critical.
i know this is obvious to many, but some of the comments about "hard" passwords made me think it might be useful...