This is the second[0] post I've read this weekend where someone preferred a file-based system to MySQL. In the other, they explain why MySQL wouldn't be preferable to SQLite in their use case. Here, moving to flat files was faster and easier to work with, came in at 1/5 the total size, and allowed much more granular access.
They essentially built an ad-hoc database complete with sequential keys and indexing. I'm curious to see what the resulting performance would be if they used a database that's appropriate for the type of data they have. I'm not familiar with non-relational databases but I suspect that a key-value or document store would work very well.
Any developer building an application today who isn't using something like Argon2, bcrypt, or scrypt should have started planning a move away from whatever they're currently using yesterday. There is no reason to use anything less than those three, and continued use is, in my mind, negligence.
If at all possible you shouldn't be storing passwords to begin with and instead relying on another service for authentication.
> If at all possible you shouldn't be storing passwords to begin with and instead relying on another service for authentication.
Should we all be using "Login with LinkedIn", then?
Passwords are always difficult to deal with, even when using bcrypt. Who knows whether bcrypt will still be considered secure in 5 years? How long would it take to implement a change which uses a new hashing algorithm for new logins while still using the old algorithm for old ones? When should you erase the passwords of inactive users who haven't logged in since the change and thus still use the old algorithm? (If you are interested in this problem, Django's user model uses a pretty straightforward and good approach[1].)
Outsourcing them is not the answer. It's fine to add it for the user's convenience, but I hate it when websites only offer the option to log in with "your" favorite social network. And even then, by outsourcing the passwords, you are risking your users' privacy by handing them to Google/Facebook/etc. It even hurts the privacy of users who don't use Facebook to authenticate, because Facebook can see that user X visited your website (and sometimes even every URL on that website you visited). This is because those "Login with" and "Like" buttons hit Facebook's and Google's servers on every page load.
It very much is, if you're outsourcing to someone who can do it with greater competence than the average team can. Keeping current on the crypto, designing with the ability to sunset algorithms in mind, continuous pen testing, investing in physical security/network security/HSMs/you name it definitely isn't cheap or easy. Unless you're in the business of doing _that_ you're almost certainly better off having someone do it for you.
That said, I'm with you on the social logins front. I have/had? hope for OpenID Connect as an alternative so it would be great if someone neutral like Mozilla jumped on the bandwagon.
You are right, if the company you outsource the task to is actually better at it than you. In LinkedIn's case, outsourcing was the wrong decision because they used pretty bad "tools". You should also only outsource if you trust the other company to be "competent" and to protect your interests. For instance, I would have trusted Mozilla with their OpenID alternative but not Google, Facebook, or LinkedIn (though I'm pretty sure Google knows how to keep the login data safe; I'm more worried about privacy in that case).
In this case, what I would do is use a framework that makes getting those things wrong hard. Django is a great example of that. It provides you with a generic user model that does password handling for you, and it adds a few middlewares by default to protect you against CSRF, clickjacking and more. While Django can be really slow, and hard to use when doing something "unusual", you can learn a lot from it. I don't know many frameworks that make security so much easier. In Go, you can do all those things as well, but that requires that you are aware of those security measures in order to use them, which is not ideal for junior developers or "fast moving startups that don't have time to invest in security measures".
Storing usernames and passwords is really hard. Using a secure hashing algorithm is trivial compared to other time-consuming problems, like handling brute-force attacks or other kinds of anomalies, such as a distributed attack trying a database of emails and passwords leaked from other services.
That is pretty much the first thing I do when I inherit a project with authentication. You don't need to make another company your application's doorman, there are a lot of PAM backends that you can run on premises that "do it for you". If you have the competency to manage a LAMP stack - then you can likely handle a well tested and existing authentication server.
All the years in physical security might have broken my brain, because I am always surprised by how willing people are to leak information that doesn't need to be leaked. One project I was pulled into was on the precipice of uploading millions of customers' addresses to Google's geolocation API - had I not been able to bring the lead to his senses, I might have made a run for the network closet.
PAM is great, and it's especially great as a layer of indirection, but I can't agree with your overall point that using PAM = problem solved. To your no harder than LAMP point, most teams can't competently manage a security critical LAMP stack. They're in good company given that big companies/governments get pwned with great regularity. Survival requires defense in depth, and that gets expensive. It's a matter of everything from policy (are there two-man change controls on firewall rules, do separate teams own the route tables and firewall, do separate teams develop/audit/deploy security critical code) to hardware (is private key material stored on an HSM, are sensitive services physically isolated, does entropy come from a hardware RNG). Most small companies aren't thinking about those things.
Also, given that the P is for pluggable, what's the backend? You wouldn't use pam_unix for users outside your org. A DB? Now you're back to square one. LDAP+Kerberos/AD? That beats the DB but it doesn't do anything for your defense in depth requirement.
> ...I can't agree with your overall point that using PAM = problem solved.
I don't think we have the same problem definition. I'm saying that it solves the problem of authentication implementation details - where the just-enough-to-be-dangerous types screw up (salting, keyspace, the keeping current on crypto part). LDAP can certainly be leveraged for defense in depth, authorization vs authentication, but that is much less off-the-shelf. This also provides some separation between the authentication server and braindead PHP scripts that barf the results of ";select * from users;".
> Also, given that the P is for pluggable, what's the backend?
Kerberos is the obvious choice for authentication, LDAP integration for authorization if you're needing a fine granularity. You'd really have to go out of your way to end up with a PAM that dumps right into a DB with a poor crypto policy - I've never seen it. You could use /etc/passwd - but you're right, you wouldn't want to... the option is nice though.
I don't disagree that a company that makes money primarily on identity management could do it better, if you assign a low value to the information that is necessarily leaked. But let me just point out the context in which we are having this conversation: LinkedIn offered such a service, as does Facebook - both have suffered breaches. While that isn't how they made their money, plenty of people used the service - following the letter of your advice, if not the spirit of it.
The point of bcrypt, though, is that it is (more) future-proof, in that hash generation is slow as balls and can be made exponentially slower by increasing the difficulty.
Barring some kind of algorithmic weakness, this slowness carries straight through to a brute-force attack, so each guess costs the attacker correspondingly more. Bcrypt also has a property that makes it harder for a GPU to crack: its inner loop constantly reads and writes a small mutable table in RAM, which GPUs handle poorly.
As for migrating passwords - you can use a fallback password hasher that checks against a previous hash should the main one fail - then once the login is successful re-hash with the newest algorithm!
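That fallback-and-rehash pattern can be sketched in a few lines. This is a hypothetical stand-in, not anyone's production code: "legacy" here is unsalted SHA-256 and "current" is salted PBKDF2 from Python's stdlib, purely to illustrate the upgrade-on-login flow with whatever old and new schemes are actually in play.

```python
import hashlib, hmac, os

def legacy_hash(password: str) -> str:
    # Stand-in for the old, weak scheme being migrated away from.
    return hashlib.sha256(password.encode()).hexdigest()

def current_hash(password: str, salt: bytes = None) -> str:
    # Stand-in for the new scheme (salted PBKDF2 here; bcrypt/scrypt
    # in real life). Format: "pbkdf2$<salt-hex>$<digest-hex>".
    salt = salt or os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return "pbkdf2$" + salt.hex() + "$" + dk.hex()

def verify_and_upgrade(password: str, stored: str):
    """Return (ok, possibly-upgraded stored value)."""
    if stored.startswith("pbkdf2$"):
        _, salt_hex, dk_hex = stored.split("$")
        dk = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 bytes.fromhex(salt_hex), 100_000)
        return hmac.compare_digest(dk.hex(), dk_hex), stored
    # Not in the new format: fall back to the legacy hasher, and
    # re-hash with the current scheme once the login succeeds.
    if hmac.compare_digest(legacy_hash(password), stored):
        return True, current_hash(password)
    return False, stored
```

The caller writes the second element of the tuple back to the database, so old hashes quietly disappear as users log in.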
You can't really know how future proof it is, though. As far as I know, nobody has proven it to be unbreakable.
Right now we can't break it, we can only brute-force it, but maybe tomorrow a mathematician finds some property that lets you calculate all possible inputs of a certain length for the hash within a reasonable time. Or maybe they find another way that doesn't involve brute force (statistics, ...).
What I'm saying is that passwords are hard in general (to store, to enforce policies on properly, ...) and just because bcrypt is "unbreakable" right now doesn't mean it still will be in 5 years.
Your last paragraph describes nicely what needs to be done, but is your code ready for that? Maybe your database password/salt column only allows X characters, now you need to rebuild the database. Maybe something else expects the passwords to have that specific format (another micro service, some script, ...). Re-hashing a password can be hard if this possibility was not considered from day one.
I was talking about the algorithm. Let's say bcrypt is considered insecure in 5 years (even with 100 iterations) and you are supposed to use another hash algorithm. Is your code and database able to handle that? If so, great but I don't think many people implement this and depending on your code, this can get difficult.
> ...instead relying on another service for authentication.
If you are suggesting something like on-premises Kerberos, I agree - better yet: just design with PAM in mind. But if you are suggesting something like "login with facebook" then I have to disagree - unless your goal is to add a serious point of failure, major dependency, and huge source of information leakage all in one step.
I complained to my bank that their 12 character password limit suggests they are storing passwords. Their reply was little more than don't worry about it, you aren't responsible for fraud. I asked for them to add some kind of second factor authentication (I'm a fan of TOTP systems) and was told they are thinking about making that available for their business accounts.
It bothers me that my most valuable login is probably my weakest.
> How is this better than double salted sha256. I.e sha256("secret", sha256("secret2", $pwd))
ITYM hmac-sha256("secret", hmac-sha256("secret2", $pwd)), but that's neither here nor there; the operation requires either two (your version) or four (my HMAC version) SHA256 operations, which really isn't much: your version would make an exhaustive password search twice as expensive; mine would make it four times as expensive. Neither is very much.
Note too that salts are not secret; they must be stored with the password.
> The double sha/md5 would even give rainbow tables a super hard time right?
If you're using even a single high-entropy salt, a rainbow table is useless. What's not useless is just trying lots and lots and lots of potential passwords: first sha256("salt", sha256("salt2", "12345")), then sha256("salt", sha256("salt2", "hunter2")), then sha256("salt", sha256("salt2", "opensesame")), then sha256("salt", sha256("salt2", "love")) and on and on and on.
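That loop is short enough to write out. A minimal sketch, with made-up salts and a four-word "wordlist" standing in for the millions an attacker would actually use:

```python
import hashlib

def h(salt1: bytes, salt2: bytes, pwd: bytes) -> str:
    # The double-salted SHA-256 construction discussed above.
    inner = hashlib.sha256(salt2 + pwd).digest()
    return hashlib.sha256(salt1 + inner).hexdigest()

# A precomputed rainbow table is useless once a high-entropy salt is
# involved, but a straight wordlist run is not:
leaked = h(b"salt", b"salt2", b"hunter2")  # pretend this came from a dump
wordlist = [b"12345", b"hunter2", b"opensesame", b"love"]
cracked = next((w for w in wordlist
                if h(b"salt", b"salt2", w) == leaked), None)
```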
'But that will take forever!' you might cry. Not really: the article noted a 209.7 million hashes per second rate. At that rate, one could try every possible date in a century (36,525 possibilities: lots of people use birthdays or anniversaries in passwords) in U.S., British, German, French date formats (x4) with the top 100 male and female names prepended and appended (x400: 100 each, before & after), with and without spaces between (x2) in approximately half a second. If one adopted your approach, it'd slow it down to just over a second; under my approach, it'd be 2¼ seconds. Not very good.
PBKDF2, bcrypt, scrypt & Argon2 all attempt to slow this down by requiring not one or two or four hash operations, but thousands or hundreds of thousands. scrypt goes even further by requiring lots of memory access, since memory access is slow on GPUs while raw hashing is very cheap. Argon2 goes even further still.
Under any of the above, it might require a full second on even a GPU cluster to verify a password, which would mean that it'd take 3 years and 8 months to try all of those possibilities earlier. While a real-world system wouldn't be tuned to take a full second on a GPU cluster (it'd be slower on a server), it might very well be tuned to take say 10 or 100 milliseconds, which is still relatively slow for someone trying every single hash but relatively fast for validating a user login.
PBKDF2 is absolutely for password storage. In fact RFC 2898 specifically notes that use case (for KDFs in general):
> Another approach to password-based cryptography is to construct key derivation techniques that are relatively expensive, thereby increasing the cost of exhaustive search.
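Python's stdlib exposes exactly that knob; the iteration count below is illustrative, not a recommendation - tune it to what your hardware can absorb per login:

```python
import hashlib, os

# PBKDF2 from the stdlib: the iteration count is the "relatively
# expensive" cost factor RFC 2898 describes. 100_000 is an
# illustrative figure only.
salt = os.urandom(16)
dk = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)
```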
Do you have a source for that? It was my understanding that PBKDF2 is 'good enough for now', but not necessarily the most future-proof of techs, given how easily the algorithm is optimised for GFX cards.
Most of the attacks described in this article are not solved by any of those though right? They protect against hacking one person but if you just do these advanced dictionary attacks you can still crack people with weak passwords.
No. You cannot effectively use the techniques you can use against SHA/MD5 to attack the three I mentioned.
SHA and MD5 can be calculated entirely in a CPU's registers without having to rely on RAM. Bcrypt, for example, requires repeatedly reading and modifying an in-memory table as part of its calculation, slowing down the process. A GPU has so few channels from the processors to memory that this cannot be done effectively in parallel.
All one has to do with Bcrypt is just adjust the difficulty and any advances in GPU technology or whatever can be nullified.
Thanks. I thought I saw an article about a recent leak with salted, bcrypted passwords where they cracked the weak ones. The conclusion was that there is no way to prevent a weak password from being compromised, so to prevent it, require longer, more complicated passwords.
The author of this piece was doing about 156 hashes per second, and after just over five days he had only cracked 4,000 account passwords--we're talking 0.0001% of Ashley Madison's supposed userbase here. To run just over the 14,000,000 passwords from RockYou.txt against every single user account from AM would take at least two billion years.
Weak passwords are still weak, but things like bcrypt buy you time. A bcrypt configuration might have hashing take 250ms on a server's Xeon CPU (as opposed to nanoseconds with a sha-based hash). You can try the password 'password' on say 50k hashes that were dumped, but it will take about 3.5 hours. You can get this time down by parallelizing or with better hardware like GPUs / FPGAs / ASICs, but you're still looking at a base time of 3.5 hours per attempt to go through each account. Or if you're only targeting one account, 3.5 hours to try 50k words.
If the server had bcrypt configured to take a second per password, multiply everything by 4. And so on. What I think is needed as a supplement to "use bcrypt / scrypt" is an "and use it this way so you don't accidentally open your server to DoSing or give a poor experience": a 10-seconds-to-compute-on-a-Xeon hash is great from a security standpoint if your hashes get leaked, but it sucks from a user-experience standpoint to wait at least 10 seconds to log in, and if you have to service multiple logins at once your server isn't going to be able to do anything else if you just run bcrypt/scrypt synchronously.
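One common way to avoid that synchronous stall is to push the slow hash onto a worker pool. A minimal sketch, where stdlib PBKDF2 stands in for bcrypt/scrypt and the pool size and iteration count are assumptions:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Keep the slow hash off the request thread so concurrent logins
# don't serialize behind one 250ms+ computation.
pool = ThreadPoolExecutor(max_workers=4)

def slow_hash(password: bytes, salt: bytes) -> bytes:
    # Stand-in for bcrypt/scrypt; iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)

def handle_login(password: bytes, salt: bytes):
    # Returns a Future immediately; the worker serving this request
    # is free to do other work until .result() is needed.
    return pool.submit(slow_hash, password, salt)
```

Bounding the pool size also doubles as a crude throttle: a flood of login attempts queues up instead of eating every core.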
If you're in a situation where you're relying on hashing a user's password for every action, your application has far worse problems than which password storage method is in play. Moving from an inadequate password storage method to a better one also presupposes that you haven't done something completely wrong with session state.
[edit]
I misread the poster I was responding to, assuming that they meant the user was re-authenticating on each request. My thought process was that if they're storing credentials in a cookie in lieu of a session ID, then that needs to be addressed before even going down the avenue of correcting password storage.
You can't protect against people using "password123" or "dadada" as their passwords, but for those of us using long randomly generated passwords bcrypt makes cracking it well outside the realms of possibility for anyone short of nation-state attackers. (I bet the NSA can get quite a lot more than 100kH/s for bcrypt if they're determined enough, but I wonder if even _they_ can throw 6 orders of magnitude more compute at the task?)
Some of these don't make sense. The point of bcrypt, scrypt, and pbkdf2 is the difficulty of them is configurable.
Digging into the comments on that gist, it says that the bcrypt benchmark used a workfactor of 5 (= 32 rounds). The lowest possible bcrypt workfactor is 4 (= 16 rounds).
For comparison, the default for login hashes on OpenBSD is 8 (= 256 rounds) for standard accounts and 9 (= 512 rounds) for the root account. OpenBSD's bcrypt_pbkdf(3) function uses a workfactor of 6 (= 64 rounds), and it's intended to be called with multiple rounds itself: signify(1) uses 42 rounds (a bit over the equivalent of bcrypt with a workfactor of 11), and ssh-keygen(1) defaults to 16 (roughly equivalent to bcrypt with a workfactor of 10 (= 1024 rounds)).
The point I'm trying to make here is that the bcrypt benchmark uses a ridiculously low workfactor, and it looks like the scrypt and pbkdf2 ones did too.
Yep - and even in its "ridiculously low work factor" configuration, it's around 6 orders of magnitude slower than SHA. In context, the author of that piece was trying to show off the hashrate, so you'd expect him to err on the side of making his monster 8-GPU rig look better rather than worse.
Any of bcrypt, scrypt, or PBKDF2 can easily (as in, by design via the hash function parameters) be made however much slower you need, so long as your normal login process is then still "fast enough". I had a WordPress site a while back where I couldn't wind the bcrypt plugin up much past 11 before the inexpensive webhosting would time out before login succeeded... (And yeah, feel free to mock me for "securing" WP with bcrypt - it was mostly because I wanted to be confident that if/when the site got exploited, the hashes in the DB weren't going to be too easily attackable for anyone who'd used a decent password.)
bcrypt uses a unique salt per password. The hash output by the bcrypt function is actually several delimited fields joined into a single string.
This means that if multiple users use a common password like 'pass123', the hashes stored in the database will still each be unique. Any attacker trying to reverse all of the password hashes will have to reverse every single one individually in a targeted manner rather than using pre-computed rainbow tables or generating hashes from a wordlist and scanning the database for matching hashes.
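The effect of a per-password salt is easy to demonstrate. A sketch using stdlib PBKDF2 as a stand-in for bcrypt (which packs salt, cost and digest into one "$2b$..." string rather than returning them separately):

```python
import hashlib, os

def store(password: bytes):
    # Fresh random salt per password, stored alongside the digest.
    salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

# Two users pick the same weak password; the stored hashes still
# differ, so nothing in the database reveals the collision and no
# precomputed table covers both.
s1, h1 = store(b"pass123")
s2, h2 = store(b"pass123")
```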
General-purpose cryptographic hash functions like the (now-broken) MD5, SHA1, SHA256, etc. are designed to be computationally easy, ie. fast.
Salting protects against rainbow tables [1], but it doesn't change the fact that computing a SHA256 hash is fast.
Password hash functions like PBKDF2, bcrypt, scrypt, Argon2 are designed to be computationally expensive, to make a password-cracking endeavor take even longer.
Argon2, the winner of the Password Hashing Competition and the current state-of-the-art, for example, has two ready-made variants: Argon2d is more resistant to GPU cracking, while Argon2i is more resistant to time-memory tradeoff attacks [2].
bcrypt is significantly slower to compute - something like 5 or 6 orders of magnitude. (See my other comment in here with numbers for cracking various hash types on an 8-GPU rig...)
If X is millions (or even billions), maybe, but you shouldn't. Just use one of the real password algorithms. Never ever roll your own hashing system assuming it's secure enough. It won't be.
Of course. I just wanted to get an idea of the reasons without going too much into mathematical detail. For projects I would just use Argon2 or bcrypt.
At some point, the cracker is going to reach diminishing returns.
Presumably some fraction of the remaining 15% will be completely random, from people using password managers or similar. These could only be cracked by brute force (assuming they are truly random), and since they're also likely longer, it's unlikely they would be cracked any time soon - they're just not worth the computational investment to the cracker.
> Any estimate on the time to crack the remaining 15%?
I know of at least one person on LinkedIn using passwords of the form 2jzAwGyOzfxNoW0u3lTIIa (i.e., 22 characters of digits & mixed-case letters); at 209.7 million hashes per second, one should be able to crack the first password of that sort in about 182,686,540 years.
Anybody using a modern password safe should be in exactly that position. I use 1Password and default to 25 chars including upper/lower/digits/specials. It's occasionally annoying when I need to transcribe one of those passwords from my phone into a system I trust, but not enough to keep my password safe on (my work laptop, for example); that's rare enough that I just suck it up and cope.
I'm assuming it's one of them, given it was a gift from Sagitta. Quite a while back, when I was in that scene, they were using FPGAs to offload cracking from our Pentium 3's and 4's. High-end FPGAs cost $1-2k; systems that integrated them were often a lot more expensive, but one could get coprocessor cards. So I'm not sure if password cracking has come down in cost or gone up versus FPGAs.
The combo of algorithms supported, flexibility, and performance is certainly better than cranking out HW implementations on FPGA's.
I forgot the root password on my old IRIX / SGI Octane2 and had to "crack" it a few months ago. Turned out it was using the full eight characters and was at the tail end of the alphabet. It took less than a day to guess it on an older two-GPU 680GTX machine with cudaHashcat. Also, how awesome of the IRIX guys to not allow more than 8 characters in a password?
Some of the answers on that page suggest it changed around 2000, and SGI Octane 2 was still being sold in 2004(?), but I don't know how rapidly systems changed.
Pretty good writeup. My takeaway is that it's more important to use a long password than to mix and match letters/digits/special characters if that's the choice you have (since the cracking process is greedy and going from short to long not from low entropy to high entropy...for lack of a better description). Would that be a correct assumption?
Just to pose a silly example... take a password of 300x"x" (maybe turn the 42nd into an "o" for good measure). Since many attacks probably won't enumerate that many characters before they reach a sufficient mass of cracked PWs, that would be reasonably safe even though it is kind of a silly PW, right?
Edit: no need to do that in practice since you can just use a randomly generated PW with a PW safe but maybe there's a case where you need to remember the PW just in case.
> My takeaway is that it's more important to use a long password than to mix and match letters/digits/special characters
Yes. Trivially, if you use only two symbols (eg: "0" and "1"), a (random) password of 128 letters should be pretty safe. Note that your example of just 300 of one letter wouldn't be all that safe. In general, a good password won't really be easy to remember, because it needs to encode a lot of entropy.
More generally, you probably want log2(Nsymbols^length) >= 64, possibly >= 96 (ie: equivalent to at least 64 or 96 bits of entropy). If you're using big and small letters, digits, and say ten printable symbols, every single character (each random pick of one of the 26+26+10+10=72 symbols) adds roughly 6.17 bits of entropy. So you'll need at least eleven characters in your password. If you just use small letters, every character in your password adds about 4.7 bits - so to "climb over" 64 bits of entropy, you'd need at least 14 letters in your password.
Using just digits, each digit 0-9 adds about 3.32 bits, so for 64 bits you'd need a string of 20 random* digits.
To enumerate half of 2^64 passwords at 200 million tries/second, would take about 2^63/(200 000 000 * 3600 * 24 * 365) ~ 1 499 years. Clearly, if you had 3 000 machines, you could do this in about half a year - so depending on your risk profile, you might choose to aim for 96 bits: 2^95/(200 000 000 * 3600 * 24 * 365) ~ 6 439 554 927 618 years ... (That's eg: 29 random digits).
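The arithmetic above boils down to one formula, which is short enough to write out:

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    # log2(N^length) = length * log2(N)
    return length * math.log2(alphabet_size)

# 72-symbol alphabet: 11 characters clear the 64-bit bar;
# lowercase only needs 14; digits only needs 20.
```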
Yes, sure. You should not be using anything but Bcrypt et al for passwords (salt, salt, salt!) – but... Out of curiosity. What if these passwords were SHA-512 hashed (unsalted) rather than SHA1?
As part of a presentation I did at a local OWASP chapter, here are some numbers based on just using CPython's Hashlib processing of 14,000,000 someodd passwords:
It's presumably not the most optimized for attacks, but you can try the "openssl speed" command (including specific algorithms, if you want, like "openssl speed sha1 sha512").
If you want to throw a wrench in a password cracker's gears, why can't you just run your 'crypt' function on its own output a thousand times in a row, so that anyone attempting to crack it will need to run it a thousand times with every candidate password? What I mean is 'crypt(crypt(crypt(p)))' should take three times as long as 'crypt(p)', right? And scale 'a thousand' to however many iterations takes one second on contemporary hardware.
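Yes, that's the basic idea behind key stretching. A minimal sketch with SHA-256 standing in for 'crypt' (the salt, password, and round count are placeholders):

```python
import hashlib

def stretched(password: bytes, salt: bytes, rounds: int) -> bytes:
    # Feed each digest back in, so testing one candidate password
    # costs `rounds` hash operations. Real KDFs (PBKDF2 et al.) do
    # essentially this, plus details that avoid pitfalls such as the
    # output space collapsing into short cycles.
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest
```

As the replies below this note, naive iteration has subtle problems that the purpose-built functions were designed around, so this is illustration, not a recommendation.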
> Both are largely unapproachable on GPUs and ASICs.
Not true with respect to bcrypt. Bcrypt is not designed to be memory hard, it uses a constant and relatively small amount of memory (~4kB IIRC) which is only incrementally better than PBKDF2.
However its memory access pattern means it needs fast RAM, which is what makes GPUs inefficient for it: they have lots of memory[0] but it's slow to access from the parallel cores[1]. You can absolutely have small amounts of fast ram in FPGA and ASICs.
[0] and a good amount of memory/core, a 1080 has ~3MB/core, a 1070 has 4 (they both have 8GB RAM but the 1080 has 2560 CUDA cores versus 1920 for the 1070)
[1] GPUs have very high memory throughput but very high latencies even for caches[2] e.g. on Kepler (a few generations back) the L1 was already 48 cycles, Skylake can go up to L3 in 42 cycles
[2] which are shared by multiple cores and are very very small: from what I've found Pascal has 64KiB of L1 per SM (with each SM grouping 64 CUDA cores) and 4MB of L2 for the entire chip
In general, you can. Just remember that you are stacking it N times on a CPU while, depending on the function, there is hardware that can compute it far more efficiently than a CPU: with a cheap function, you're technically inconveniencing yourself more than the attacker. You'll gain a lot more by using a function built for high computational cost from the outset.
Furthermore,
> It can be shown that the repeated hashing reduces the space of possible values, but should reach an inner "cycle" of size roughly sqrt(N) if the hash output values are in a space of size N[1]
Using the correct type of function just makes more sense.
Honest question, how does somebody know whether a dumped hashed/"encrypted" password has actually been broken and exists in plaintext?
Some time ago I reset almost all my passwords to 1passwd $RAND, but some of these dumps are ooooold. Is there a legit way to find what's available for my email?
There are some efforts, like https://haveibeenpwned.com/. Personally, though? I always feel naked dishing out my email(s) in a <form>... regardless of HTTPS or HTTP.
> At hashes.org, 87% completed: https://hashes.org/public.php
There are loads of cracked hashes from other public leaks as well at hashes.org - worth adding these to your pen testing dictionaries.
[1] https://www.reddit.com/r/netsec/comments/4ozdz8/introduction...